PyData London 2023

Zhaozhi Qian

I am a postdoc at the van der Schaar Lab at the University of Cambridge. In the past, I have led and contributed to the development of a host of novel algorithms for synthetic data (a list of publications can be found here). I am also leading the development of Synthcity, an open-source software library that aims to democratise cutting-edge research in synthetic data.

Prior to joining academia, I worked as a data scientist at one of the largest mobile games companies in the world, designing and implementing AI-powered systems that automatically optimize performance marketing campaigns. I also proudly worked for the NHS as a volunteer during the pandemic, contributing to the UK's first ICU capacity planning and forecasting system.

Synthetic data: what is it and why do we need it?
One of the biggest barriers to machine learning and data analytics is the difficulty of accessing high-quality data. Synthetic data has been widely recognized as a promising remedy to this problem. It allows sharing, augmenting, and de-biasing data to build performant and socially responsible ML systems. In this talk, I will give an overview of the significant progress in the theory and methodology of synthetic data over the past five years. I will also introduce the open-source library Synthcity, which implements an array of cutting-edge synthetic data generators to address data scarcity, privacy, and bias. Participants will walk away with a deeper understanding of the theory and practice of synthetic data, a sense of which methods apply (or do not apply) to their specific use case, and the readiness to apply them in hackathons, competitions, and their day-to-day work.