PyData London 2023

From Passive to Active: Exploring the Benefits of Active Learning in Data Science
06-02, 11:00–12:30 (Europe/London), Salisbury

Active Learning is a technique in data science that makes efficient use of labelling resources by choosing which data points are sent for annotation. In this 90-minute hands-on tutorial, we will provide a step-by-step guide to applying basic Active Learning techniques to a document classification problem.

The tutorial will begin with an introduction to Active Learning, followed by a brief discussion of the cost and time it can save. Next, we will use clustering to select the first batch of training data. Then, we will train a document classification model and analyse fundamental Active Learning concepts such as diversity, isolation, and model uncertainty. We will compare different metrics for selecting the best points for annotation.
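As an illustration of what comparing selection metrics can look like, here is a minimal sketch of three commonly used model-uncertainty scores: least confidence, margin, and entropy. It is not taken from the tutorial materials; the function names, the `rank_for_annotation` helper, and the example probabilities are all hypothetical, and the input is assumed to be one list of predicted class probabilities per unlabelled document.

```python
import math


def least_confidence(probs):
    """1 minus the top class probability: high when the model has no clear favourite."""
    return 1.0 - max(probs)


def margin(probs):
    """1 minus the gap between the two most likely classes: high when they are close."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def entropy(probs):
    """Shannon entropy (in nats) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_for_annotation(prob_rows, score, k):
    """Return the indices of the k rows with the highest uncertainty score."""
    order = sorted(range(len(prob_rows)),
                   key=lambda i: score(prob_rows[i]),
                   reverse=True)
    return order[:k]


# Hypothetical predicted probabilities for three unlabelled documents.
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # probability spread over all classes
    [0.50, 0.49, 0.01],  # torn between two classes
]

for score in (least_confidence, margin, entropy):
    print(score.__name__, rank_for_annotation(predictions, score, k=2))
```

On this toy input the metrics do not agree on which document to annotate first: the margin score prefers the document torn between two classes, while least confidence and entropy prefer the one whose probability is spread over all three.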

Finally, we will evaluate the model's performance and compare the results of Active Learning with random annotation. Throughout the tutorial, attendees will have the opportunity to work on their own implementations and receive assistance.
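A comparison of this kind can be sketched as two labelling loops that differ only in how the next point is chosen. The example below is a toy under loud assumptions, not the tutorial's code: 2-D Gaussian blobs stand in for document embeddings, a nearest-centroid rule stands in for the classifier, the seed set is one labelled example per class, and every name (`make_data`, `run`, and so on) is hypothetical.

```python
import math
import random


def make_data(n, rng):
    """Two overlapping 2-D Gaussian blobs standing in for document embeddings."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre_x = -1.0 if label == 0 else 1.0
        data.append(((rng.gauss(centre_x, 1.0), rng.gauss(0.0, 1.0)), label))
    return data


def fit_centroids(labelled):
    """Toy model: the mean point of each class."""
    centroids = {}
    for label in (0, 1):
        pts = [p for p, y in labelled if y == label]
        centroids[label] = (sum(p[0] for p in pts) / len(pts),
                            sum(p[1] for p in pts) / len(pts))
    return centroids


def predict(centroids, point):
    """Assign the class whose centroid is nearest."""
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))


def accuracy(centroids, test):
    return sum(predict(centroids, p) == y for p, y in test) / len(test)


def uncertainty(centroids, point):
    """Higher when the point is almost equally far from both centroids."""
    return -abs(math.dist(point, centroids[0]) - math.dist(point, centroids[1]))


def run(pool, test, budget, strategy, rng):
    """Label `budget` points one at a time; record test accuracy before each query."""
    pool = list(pool)
    labelled = []
    for label in (0, 1):  # assumed seed set: one labelled example per class
        idx = next(i for i, (_, y) in enumerate(pool) if y == label)
        labelled.append(pool.pop(idx))
    curve = []
    for _ in range(budget):
        centroids = fit_centroids(labelled)
        curve.append(accuracy(centroids, test))
        if strategy == "uncertainty":
            idx = max(range(len(pool)),
                      key=lambda i: uncertainty(centroids, pool[i][0]))
        else:  # random annotation baseline
            idx = rng.randrange(len(pool))
        labelled.append(pool.pop(idx))  # "annotate" the chosen point
    return curve


rng = random.Random(0)
pool = make_data(200, rng)
test = make_data(500, rng)
active_curve = run(pool, test, budget=20, strategy="uncertainty", rng=rng)
random_curve = run(pool, test, budget=20, strategy="random", rng=rng)
print("active:", [round(a, 2) for a in active_curve])
print("random:", [round(a, 2) for a in random_curve])
```

Both curves are measured on the same held-out set, so they can be compared point by point. No particular outcome is implied here: on a problem this simple, uncertainty sampling may or may not beat the random baseline, which is one reason to evaluate the two side by side.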

By the end of this tutorial, attendees will better understand the principles of Active Learning and how to apply them to their own supervised learning problems, enabling them to make more efficient use of their labelling resources.


Participants will be able to run the examples and exercises on their own computers in Jupyter notebooks as the tutorial is presented. To facilitate this, a GitHub repository will be made available two weeks before the conference, with instructions for setting up the Python environment to run the tutorial locally.

The tutorial session aims to familiarise the audience with the primary abstraction levels of Active Learning, along with its typical challenges, by solving a simple problem. This will give them the foundations to implement more complex solutions for their projects.


Prior Knowledge Expected

No previous knowledge expected

Mate Timar is a physicist turned data scientist, with expertise in both fields. He obtained his degree in physics and went on to specialise in strongly correlated quantum systems during his research career.

Driven by his passion for exploring the intersection of physics and data science, Mate eventually transitioned into the world of data science. He is now an expert in Bayesian Statistics, Interpretability, Experimentation, and Active Learning.