PyData London 2023

Serverless Python Analytics at Petabyte scale using ArcticDB
06-03, 15:00–15:40 (Europe/London), Warwick

At Man Group we ingest data of all shapes and sizes, from market prices to weather, ESG reporting to news media. Storing that data in a format that is useful both to researchers and to strategies trading more than a billion dollars is a unique technical challenge - one that has resulted in two iterations of our high-performance data storage product, ArcticDB. The aim is nothing less than to bring the performance and analytical capabilities of a server infrastructure right into the Python client. With the client now available on Conda and PyPI, and the source published on GitHub, I want to take you on a tour of our database: the design rationale, the ups and downs of developing a new database while supporting a billion-dollar trading estate, and the lessons we've learned along the way.

ArcticDB is a client-only database that talks directly to storage, whether that's local disk, S3 or potentially any other key-value store. This makes it uniquely horizontally scalable, up to the limits of your network and storage bandwidth. It also means that there is no single point of failure - no server to go down in the middle of the night, causing a flurry of pings that get people out of bed. It's also an immutable database - you can efficiently store every version of a table that has ever existed. Made a bad update? No problem, you can just roll back to the last known good version. In an ongoing research project, you can take a snapshot of a whole database at a particular point in time, then carry on appending, updating and deleting, safe in the knowledge that you can always revert to things as they were before your changes.
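The versioning and snapshot ideas above can be illustrated with a toy model. This is a minimal sketch in plain Python, not ArcticDB's API or implementation: the class `VersionedStore`, its method names and the key layout are all hypothetical, and a dict stands in for the key-value storage. It only shows why never overwriting data keys makes rollback and snapshots cheap.

```python
class VersionedStore:
    """Toy client-side versioned store over a dict-like key-value backend.

    Hypothetical illustration only; not ArcticDB's API or storage format.
    """

    def __init__(self, backend=None):
        # Any mutable mapping stands in for local disk, S3, and so on.
        self.kv = {} if backend is None else backend

    def latest(self, symbol):
        return self.kv[("latest", symbol)]

    def write(self, symbol, rows):
        # Each write goes to a fresh data key; old data keys are never touched.
        version = self.kv.get(("latest", symbol), -1) + 1
        self.kv[("data", symbol, version)] = tuple(rows)
        self.kv[("latest", symbol)] = version
        return version

    def read(self, symbol, as_of=None):
        version = self.latest(symbol) if as_of is None else as_of
        return self.kv[("data", symbol, version)]

    def append(self, symbol, rows):
        # Appending creates a new version built from the current one.
        return self.write(symbol, self.read(symbol) + tuple(rows))

    def snapshot(self, name):
        # A snapshot records which version of every symbol is current.
        pointers = {key[1]: v for key, v in self.kv.items() if key[0] == "latest"}
        self.kv[("snapshot", name)] = pointers

    def read_snapshot(self, symbol, name):
        return self.read(symbol, as_of=self.kv[("snapshot", name)][symbol])

    def restore(self, symbol, version):
        # Rolling back writes a new version with the old content.
        return self.write(symbol, self.read(symbol, as_of=version))


store = VersionedStore()
store.write("prices", [100.0, 101.5])   # version 0
store.snapshot("before_research")
store.append("prices", [-1.0])          # version 1, a bad update
store.restore("prices", 0)              # version 2, same content as version 0
```

In this sketch a rollback is itself a new version, so the bad update stays readable at version 1 for later inspection, and the snapshot keeps pointing at version 0 regardless of what is written afterwards.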

With the philosophy of Pandas-in, Pandas-out, there are no custom data structures to learn: you can read, write, filter, aggregate and so on, all on familiar Python objects. I will take you through some challenging examples we have encountered and touch on our philosophy of software design, our approach to performance analytics, and how to build a great product with a small, focussed team.

Prior Knowledge Expected

No previous knowledge expected

William Dealtry has been working in both Python and C++ for many years, and has been a member of the C++ standardization committee for more than a decade. Currently he is the Architect of a new open-source Dataframe database, ArcticDB, which is backed by long-time Python enthusiasts Man Group and Bloomberg.