PyData London 2023

Driving down the Memray lane - Profiling your data science work
06-04, 15:00–15:40 (Europe/London), Minories

When handling large amounts of data, memory profiling your data science workflow becomes more important. It gives you insight into which processes consume the most memory. In this talk, we will introduce Memray, a Python memory profiling tool, and its new Jupyter plugin.


In this talk, we will explore what memory profiling is and how it can help with data science work. We will start with a basic explanation of how Python arranges memory for various objects. This lays the foundation for explaining why we need a special tool to memory profile Python programs.
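As a small illustration of why Python's memory layout can be surprising, here is a minimal sketch using only the standard library. It assumes CPython, and the helper name `shallow_and_deep_size` is made up for this example. It shows that a list stores references to separate objects, so the size of the container alone understates the memory actually in use.

```python
import sys


def shallow_and_deep_size(values):
    """Return (size of the list object, size of the list plus its items)."""
    # sys.getsizeof on a list counts the list header and its array of
    # references, but not the objects those references point to.
    shallow = sys.getsizeof(values)
    # Add the size of every element to get a rough total for a flat list.
    deep = shallow + sum(sys.getsizeof(v) for v in values)
    return shallow, deep


numbers = [float(i) for i in range(1000)]
shallow, deep = shallow_and_deep_size(numbers)
print(f"list object alone: {shallow} bytes")
print(f"list plus its float objects: {deep} bytes")
```

The gap between the two numbers is the memory held by the individual float objects, which a quick look at the container alone would miss.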

Then we will go through a data science use case in which we memory profile parts of the process with the Memray Jupyter plugin. This will be a use case that data science practitioners and learners are familiar with, so they can see how memory profiling could be useful.
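To give a rough idea of what profiling one step of a workflow looks like, here is a minimal sketch. It does not use Memray: it swaps in `tracemalloc` from the standard library as a stand-in, and `build_table` and `measure_peak` are hypothetical names invented for this example. Note that `tracemalloc` only sees allocations made through Python's own allocator, which is one reason a dedicated profiler can be useful.

```python
import tracemalloc


def build_table(n_rows):
    """Hypothetical data-preparation step: build one dict per row."""
    return [{"id": i, "value": i * 0.5} for i in range(n_rows)]


def measure_peak(func, *args):
    """Run func(*args) and return (result, peak traced bytes while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


table, peak_bytes = measure_peak(build_table, 10_000)
print(f"rows: {len(table)}, peak traced memory: {peak_bytes / 1024:.1f} KiB")
```

Wrapping a single step like this makes it possible to compare alternatives, for example a list of dicts against a more compact representation, before profiling the whole notebook.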

We will then explain how to interpret the frame diagram (Memray's flame graph), a diagram commonly used in memory profiling to show how much memory a process and its sub-processes use. For a new user, it can be hard to read and to know what to look for. From this example, the audience can see what they can learn from the frame diagram.

Goal

This talk is for data scientists, learners, or anyone who is interested in memory profiling their Python programs. Although the talk will use a data science use case as an example, the explanation and the tool can be applied to any Python program. For data science practitioners and learners who have been using Python to process data, this may be a step forward in improving their data workflow and preventing memory leaks in their programs.

Outline

  • Introduction (5 mins)
  • Why we need a special tool for memory profiling (5 mins)
  • How to use Memray in Jupyter notebook (5 mins)
  • Demonstration for using Memray in data science work (10 mins)
  • How to interpret a frame diagram (5 mins)
  • Conclusion (5 mins)
  • Q & A (5 mins)

Prior Knowledge Expected

No previous knowledge expected

Before working in Developer Relations, Cheuk was a Data Scientist at various companies, a role that demands strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk now works with the open-source community. Cheuk also contributes to multiple open-source libraries such as Hypothesis, Django and Pandas.

Besides her work, Cheuk enjoys talking about Python on personal streaming platforms and podcasts. Cheuk has also been a speaker at universities and various conferences. Besides speaking at conferences, Cheuk also organises events for developers. Conferences that Cheuk has organised include EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk regularly organises workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation fellow.
