PyData London 2023

Delta Lake 101: How many water metaphors does it take to describe data?
06-02, 15:30–17:00 (Europe/London), Warwick

Delta Lake is an open-source storage framework for building a Lakehouse architecture on top of compute engines such as Spark, PrestoDB, Flink, Trino, and Hive, and it can be used from Python. Its data reliability guarantees and optimized query performance make it well suited to big data use cases, including batch and streaming data ingestion, fast interactive queries, and machine learning.


In this tutorial, you will learn about the requirements of modern data engineering and the challenges data engineers face in ensuring data reliability and performance. Through presentations, hands-on code examples, and notebooks, we will look at how Delta Lake can help overcome these obstacles.

By the end of the tutorial, you will understand how Delta Lake can be applied to your data architecture and the benefits it can bring. You will also gain insight into how the wider open-source community is using Delta Lake as an open standard to develop the next generation of data engineering and data science tools in Python.


Prior Knowledge Expected

No previous knowledge expected

Holly Smith is a multi-award-winning Data & AI expert with over a decade of experience working with Data & AI teams in a variety of capacities, from individual contributor all the way up to leadership. She has spent the last four years at Databricks working with many multinational companies as they embark on their journey to the cutting edge of data. She has also worked with non-profits through Datakind UK to advise on data strategy and bring data skills to social change organisations.

Eoin is a Senior Resident Solutions Architect at Databricks. He has worked on data platforms in a variety of industries, including Retail, Financial and Manufacturing.