PyData London 2023

Mastering Great Expectations: Ensuring Data Quality in Your Data Pipelines
06-03, 15:45–16:25 (Europe/London), Warwick

Join me as we dive into the world of Great Expectations, an open-source tool that helps data-driven teams ensure their pipelines deliver high-quality data. You will be introduced to the key concepts of Great Expectations, including data validation, documentation, and lineage. I'll also show you how to set up Great Expectations in a cloud environment, using Google Cloud Platform as an example. By the end of this talk, you'll have a solid understanding of how Great Expectations can improve the reliability and correctness of your datasets and transformations.


In this talk, I will cover the key concepts of Great Expectations, including how to connect to a data source, bundle expectations into a suite, and test data against that suite. You'll learn how to formulate expectations in pure Python and translate them into YAML configuration files that can be kept in different storage backends. You'll also see the data documentation tool, Data Docs, which makes it easy to pinpoint problems when some validations fail.
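
To make the idea of a suite concrete, here is a minimal sketch in plain Python. It deliberately does not call the Great Expectations library, whose API differs between versions; the `Expectation` and `Suite` classes are hypothetical stand-ins, and only the two expectation type names follow the library's naming convention. The config is written as JSON rather than YAML so that the sketch needs nothing beyond the standard library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Expectation:
    """Hypothetical stand-in for a single expectation on one column."""
    expectation_type: str
    column: str
    kwargs: dict = field(default_factory=dict)

    def check(self, rows):
        values = [row.get(self.column) for row in rows]
        if self.expectation_type == "expect_column_values_to_not_be_null":
            return all(v is not None for v in values)
        if self.expectation_type == "expect_column_values_to_be_between":
            lo, hi = self.kwargs["min_value"], self.kwargs["max_value"]
            # Simplification: missing values are skipped, not counted as failures.
            return all(lo <= v <= hi for v in values if v is not None)
        raise ValueError(f"unknown expectation type: {self.expectation_type}")


@dataclass
class Suite:
    """Hypothetical bundle of expectations that is validated as one unit."""
    name: str
    expectations: list

    def validate(self, rows):
        # One boolean result per expectation, keyed by type and column.
        return {
            f"{e.expectation_type}({e.column})": e.check(rows)
            for e in self.expectations
        }

    def to_config(self):
        # Serialise the suite so it can be stored outside the code base.
        return json.dumps(
            {
                "suite_name": self.name,
                "expectations": [
                    {
                        "expectation_type": e.expectation_type,
                        "kwargs": {"column": e.column, **e.kwargs},
                    }
                    for e in self.expectations
                ],
            },
            indent=2,
        )


# Made-up sample data: the last row has an implausible age.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 151},
]

suite = Suite(
    name="customers_suite",
    expectations=[
        Expectation("expect_column_values_to_not_be_null", "id"),
        Expectation(
            "expect_column_values_to_be_between",
            "age",
            {"min_value": 0, "max_value": 120},
        ),
    ],
)

results = suite.validate(rows)
config_text = suite.to_config()
```

Here `results` maps each expectation to a pass/fail flag, and `config_text` is the kind of declarative description that could be checked into a repository or stored in a bucket. In the real tool, the same roles are played by its own suite objects and configuration files.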

Finally, I will give you an overview of the customization possibilities for expectations and how this setup can be used in a scheduled ETL workflow, as well as strategic hints for distinguishing between necessary expectations and optional conditions. Don't miss this opportunity to learn how Great Expectations can help you ensure the reliability and efficiency of your data validation.
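
One possible way to separate necessary expectations from optional conditions inside a scheduled ETL step is sketched below. This is an illustrative pattern in plain Python, not the mechanism Great Expectations itself provides; `Check`, `run_checks`, `etl_step`, and `ValidationError` are hypothetical names, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Callable


class ValidationError(Exception):
    """Raised when at least one required check fails."""


@dataclass
class Check:
    name: str
    predicate: Callable[[list], bool]
    required: bool  # True: failure stops the pipeline; False: failure only warns


def run_checks(rows, checks):
    """Return the names of failed required and failed optional checks."""
    failed_required, failed_optional = [], []
    for check in checks:
        if not check.predicate(rows):
            target = failed_required if check.required else failed_optional
            target.append(check.name)
    return failed_required, failed_optional


def etl_step(rows, checks):
    """Validate rows before handing them to the next stage."""
    failed_required, failed_optional = run_checks(rows, checks)
    for name in failed_optional:
        print(f"WARNING: optional condition not met: {name}")
    if failed_required:
        raise ValidationError(f"required checks failed: {failed_required}")
    return rows


checks = [
    Check("id is never null", lambda rs: all(r.get("id") is not None for r in rs), True),
    Check("email is present", lambda rs: all(r.get("email") for r in rs), False),
]

# A missing email only triggers a warning, so the rows pass through.
good_rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
passed = etl_step(good_rows, checks)

# A missing id is a required failure, so the step raises.
bad_rows = [{"id": None, "email": "b@example.com"}]
try:
    etl_step(bad_rows, checks)
    stopped = False
except ValidationError:
    stopped = True
```

The design choice is that a scheduler sees a hard failure only for conditions the downstream consumers truly depend on, while softer conditions surface as warnings that can be reviewed later.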


Prior Knowledge Expected

No previous knowledge expected

Carsten works as a data science consultant for Datadrivers, a consulting company based in Hamburg.
After working in risk management and graduating in mathematics, he entered the field five years ago. He focuses on developing end-to-end AI solutions for customers in various industries, preferably in the cloud.