PyData London 2023

Konstantin Lopukhin

I lead Machine Learning research and development at Zyte, where we work on making web data accessible to more people through products, services and open source projects. I have also participated in and won Kaggle competitions, earning the Competitions Grandmaster title, and I contribute to the community by giving talks and sharing code and knowledge.



Web Data Extraction with Deep Learning
Konstantin Lopukhin

Extracting data from web pages is a problem that is less well covered and researched than image or text classification, object detection or named entity recognition. But it is an extremely exciting and rewarding problem to work on, because a web page can be represented in so many ways: as a screenshot, as its text, as an HTML tree, as a sequence of elements with discrete and continuous properties, and more. This leads to many diverse approaches, which often combine different input types and data representations inside one model. In this talk we will explore several intriguing approaches to web data extraction, and see how to come up with novel approaches and grow a model according to the task at hand.
This talk is intended for anyone interested in neural networks. I hope it gives you inspiration and intuition for building deep learning models tailored to the structure of your data. This applies not only to web information extraction, but also to document extraction and other domains with structured text or image inputs.
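As a rough illustration of the "sequence of elements with discrete and continuous properties" view of a page, here is a minimal sketch that flattens HTML into elements in document order, using only Python's standard `html.parser`. The chosen properties (tag name, nesting depth, text length), the `VOID_TAGS` set and all helper names are illustrative assumptions, not the representation used in the talk.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical set of tags that never get a closing tag.
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}


@dataclass
class Element:
    tag: str          # discrete property: tag name
    depth: int        # nesting depth in the HTML tree
    text: str         # text directly inside this element
    text_len: float   # numeric property: length of that text


class ElementSequenceParser(HTMLParser):
    """Flatten an HTML tree into a document-order list of Element objects."""

    def __init__(self):
        super().__init__()
        self.stack = []      # indices (into self.elements) of open elements
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append(Element(tag, len(self.stack), "", 0.0))
        if tag not in VOID_TAGS:
            self.stack.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.elements[self.stack[i]].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            el = self.elements[self.stack[-1]]
            el.text = (el.text + " " + text).strip()
            el.text_len = float(len(el.text))


def page_to_sequence(html):
    """Return the page as a flat sequence of elements."""
    parser = ElementSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.elements


def to_features(el, tag_vocab):
    """Split an element into a discrete id (e.g. for an embedding layer)
    and a list of numeric features. tag_vocab is a hypothetical mapping."""
    tag_id = tag_vocab.get(el.tag, 0)  # 0 = unknown tag
    return tag_id, [float(el.depth), el.text_len]


if __name__ == "__main__":
    page = "<html><body><h1>Blue Mug</h1><p>Price: <b>$12</b></p></body></html>"
    for el in page_to_sequence(page):
        print(el.tag, el.depth, repr(el.text))
```

In a model, the discrete ids would typically feed an embedding layer while the numeric features are normalized and concatenated with it; screenshot or text-based inputs could be combined with the same sequence, which is the kind of multi-representation design the abstract alludes to.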