PyData London 2023

Mate Timar

Mate Timar is a physicist turned data scientist with expertise in both fields. After obtaining his degree in physics, he specialised in strongly correlated quantum systems during his research career.

Driven by his passion for exploring the intersection of physics and data science, Mate eventually moved into data science. He is now an expert in Bayesian statistics, interpretability, experimentation, and Active Learning.



From Passive to Active: Exploring the Benefits of Active Learning in Data Science
Mate Timar

Active Learning is a data science technique for making efficient use of labelling resources. Instead of annotating examples at random, the model helps choose which examples are most worth labelling next. In this 90-minute hands-on tutorial, we will provide a step-by-step guide to applying basic Active Learning techniques to a document classification problem.

The tutorial will begin with an introduction to Active Learning, followed by a brief discussion of its cost- and time-saving benefits. Next, we will use clustering to select the first batch of training data. Then, we will train a document classification model, analyse fundamental Active Learning concepts such as diversity, isolation, and model uncertainty, and compare different metrics for selecting the most informative points for annotation.
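The selection ideas above can be sketched in plain Python. This is a minimal sketch, assuming each document has already been turned into a numeric feature vector (for example a TF-IDF or embedding vector) and that the classifier outputs per-class probabilities. All function names are hypothetical. The cold-start step picks the document nearest to each k-means centroid, and the three uncertainty scores (least confidence, margin, entropy) are standard choices. The definitions of diversity and isolation shown here are one common reading and not necessarily the ones used in the tutorial.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on a list of vectors; returns the final centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cold_start_batch(points, k, seed=0):
    """Pick the document nearest to each k-means centroid as the first batch to label."""
    chosen = []
    for c in kmeans(points, k, seed=seed):
        best = min(range(len(points)), key=lambda i: squared_distance(points[i], c))
        if best not in chosen:
            chosen.append(best)
    return chosen


# --- Model uncertainty: inputs are the classifier's predicted class probabilities ---
def least_confidence(probs):
    """1 minus the top class probability; higher means less certain."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """1 minus the gap between the two most likely classes; higher means less certain."""
    first, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (first - second)


def entropy(probs):
    """Shannon entropy (in nats) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# --- Diversity and isolation: assumed, illustrative definitions ---
def diversity(point, labelled_points):
    """Distance to the closest already-labelled document; high = unlike anything labelled."""
    return min(math.sqrt(squared_distance(point, q)) for q in labelled_points)


def isolation(index, pool, n_neighbours=5):
    """Mean distance to the nearest other pool documents; high = likely outlier."""
    dists = sorted(
        math.sqrt(squared_distance(pool[index], q))
        for j, q in enumerate(pool)
        if j != index
    )
    nearest = dists[:n_neighbours]
    return sum(nearest) / len(nearest)
```

A query strategy would then rank the unlabelled pool by one of these scores, or a weighted combination (for instance, favouring uncertain and diverse documents while penalising isolated outliers), and send the top-ranked documents to annotators. In practice, a library implementation such as scikit-learn's `KMeans` would replace the hand-rolled clustering.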

Finally, we will evaluate the model's performance and compare the results of Active Learning with random annotation. Throughout the tutorial, attendees will have the opportunity to work on their own implementations and receive assistance.
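The comparison between Active Learning and random annotation can be simulated offline by hiding the labels of a pool and revealing them only when a document is queried. This is a minimal sketch under several assumptions. Synthetic two-dimensional clusters stand in for document vectors. A hypothetical nearest-centroid classifier stands in for a real document classifier. Entropy sampling serves as the Active Learning strategy. None of these names come from the tutorial itself.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidClassifier:
    """Hypothetical stand-in classifier: softmax over negative squared centroid distances."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {}
        for c in self.classes_:
            members = [x for x, label in zip(X, y) if label == c]
            self.centroids_[c] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict_proba(self, x):
        scores = [-squared_distance(x, self.centroids_[c]) for c in self.classes_]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        probs = self.predict_proba(x)
        return self.classes_[probs.index(max(probs))]


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def make_blobs(n_per_class, centres, spread, rng):
    """Synthetic Gaussian clusters standing in for document feature vectors."""
    X, y = [], []
    for label, (cx, cy) in enumerate(centres):
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, spread), rng.gauss(cy, spread)])
            y.append(label)
    return X, y


def run(strategy, X_pool, y_pool, X_test, y_test, seed_idx, rounds, batch, rng):
    """Test accuracy per round, measured before that round's batch is queried."""
    labelled = list(seed_idx)
    history = []
    for _ in range(rounds):
        model = NearestCentroidClassifier().fit(
            [X_pool[i] for i in labelled], [y_pool[i] for i in labelled]
        )
        correct = sum(model.predict(x) == t for x, t in zip(X_test, y_test))
        history.append(correct / len(y_test))
        unlabelled = [i for i in range(len(X_pool)) if i not in labelled]
        if strategy == "entropy":
            # Query the documents the current model is least sure about.
            unlabelled.sort(key=lambda i: entropy(model.predict_proba(X_pool[i])), reverse=True)
            picked = unlabelled[:batch]
        else:  # "random" baseline
            picked = rng.sample(unlabelled, batch)
        labelled.extend(picked)  # the simulated annotator reveals y_pool for these
    return history


if __name__ == "__main__":
    data_rng = random.Random(42)
    centres = [(0, 0), (3, 0), (0, 3)]
    X_pool, y_pool = make_blobs(100, centres, 1.0, data_rng)
    X_test, y_test = make_blobs(50, centres, 1.0, data_rng)
    seed_idx = [0, 100, 200]  # one labelled document per class to start
    for strategy in ("entropy", "random"):
        acc = run(strategy, X_pool, y_pool, X_test, y_test, seed_idx,
                  rounds=5, batch=5, rng=random.Random(0))
        print(strategy, [round(a, 2) for a in acc])
```

Plotting both histories against the number of labelled documents gives the usual learning-curve comparison. The size of any gap depends on the data, the model, and the query strategy, so this sketch does not imply a particular result.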

By the end of this tutorial, attendees will better understand the principles of Active Learning and how to apply them to their own supervised learning problems, enabling them to make more efficient use of their labelling resources.