PyData London 2023

Deploying Real-Time Machine Learning Models Using Serverless AWS
06-04, 16:30–17:10 (Europe/London), Salisbury

In this presentation, I will show how to use AWS Lambda and API Gateway to deploy real-time machine learning models developed in Python. I will use these tools to create a serverless web endpoint and serve model predictions with high availability/scalability. These tools provide a relatively simple and cost-effective solution for data scientists and machine learning engineers looking to deploy models without the hassle of managing servers and without needing to rely on third parties. I will cover potential pitfalls to be aware of, such as Lambda's cold start delays and memory limitations. Through code examples and practical tips, attendees will gain a solid understanding of how to use serverless AWS to deploy and serve their own machine learning models at scale.


This presentation aims to provide an overview of how to deploy machine learning models using serverless AWS. The target audience is data scientists and machine learning engineers who want to deploy real-time models at scale without the hassle of managing servers.

One of the major challenges that machine learning practitioners often face is the process of deploying their models for use in production environments. This can be especially difficult for those without a background in software/backend engineering and without engineers to help. I will start by discussing model deployment in data science: Why is the last mile the most important? Why should data scientists do it themselves sometimes? What are batch and real-time models? What are examples of unavoidable real-time models?

To address these challenges, I will use AWS Lambda and API Gateway to quickly and easily serve machine learning predictions at scale. Lambda is a serverless computing platform that can automatically scale to meet the demand of incoming requests. I will show how Lambda can serve predictions from pickled models in Python (whether from scikit-learn, XGBoost or even TensorFlow). API Gateway wraps Lambda functions and provides web endpoints that can be accessed by external users or systems to actually make predictions using your model.
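As a rough illustration of the idea (not the code from the talk), a Lambda handler serving a pickled model behind API Gateway's Lambda proxy integration might look like the sketch below. The `MODEL_PATH` variable, the `features` request field, and the helper names `load_model` and `predict_response` are assumptions made for this example, and the model is assumed to expose a scikit-learn-style `predict` method returning a numeric value.

```python
import json
import os
import pickle

# Hypothetical location of the pickled model inside the deployment package.
MODEL_PATH = os.environ.get("MODEL_PATH", "model.pkl")

_model = None  # cached so warm invocations skip the unpickling step


def load_model():
    """Unpickle the model once per execution environment and reuse it."""
    global _model
    if _model is None:
        with open(MODEL_PATH, "rb") as f:
            _model = pickle.load(f)
    return _model


def predict_response(model, event):
    """Turn an API Gateway proxy event into a proxy-style response dict."""
    try:
        payload = json.loads(event.get("body") or "{}")
        features = payload["features"]  # assumed request schema
    except (ValueError, KeyError, TypeError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected a JSON body with 'features'"}),
        }

    # scikit-learn-style estimators expect 2-D input: one row per sample.
    prediction = model.predict([features])[0]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": float(prediction)}),
    }


def lambda_handler(event, context):
    """Entry point configured as the Lambda function handler."""
    return predict_response(load_model(), event)
```

Keeping the request handling in `predict_response` separate from model loading means the logic can be exercised locally with any object that has a `predict` method, without deploying to AWS.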

However, it’s important to be aware of some of the potential pitfalls and challenges when using these tools. For example, Lambda may experience delays when first starting and it also has memory limitations that can impact performance. During the presentation, I will discuss strategies for managing these potential issues and ensuring reliable performance of your deployed models. I will also compare the proposed serverless deployment architecture against more traditional deployment techniques such as using a Flask web server.
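One commonly used way to reduce and observe cold starts, which may or may not be among the strategies covered in the talk, is to invoke the function on a schedule and return early from those warm-up calls. The sketch below assumes the warm-up calls come from an EventBridge scheduled rule, whose events carry `"source": "aws.events"`; the names `is_warmup_ping` and `with_cold_start_tracking` are hypothetical. Lambda's provisioned concurrency setting is another option that needs no code changes.

```python
import time

_cold_start = True  # True only for the first invocation in an execution environment


def is_warmup_ping(event):
    """Detect a scheduled EventBridge event used purely to keep the function warm."""
    return isinstance(event, dict) and event.get("source") == "aws.events"


def with_cold_start_tracking(handler):
    """Wrap a handler: skip warm-up pings and log cold starts and latency."""

    def wrapped(event, context):
        global _cold_start
        was_cold, _cold_start = _cold_start, False

        if is_warmup_ping(event):
            # Nothing to predict; the call only keeps the environment alive.
            return {"warmup": True, "cold_start": was_cold}

        started = time.perf_counter()
        response = handler(event, context)
        elapsed_ms = (time.perf_counter() - started) * 1000
        # Printed output ends up in CloudWatch Logs when running on Lambda.
        print(f"cold_start={was_cold} handler_ms={elapsed_ms:.1f}")
        return response

    return wrapped
```

Logging the cold-start flag next to the latency makes it easier to see how much of the slow tail comes from environment start-up rather than from the model itself.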

The presentation outline is as follows:
* [5 min] Introduction: Batch vs real-time models, examples of real-time models, why should a data scientist do the last mile?
* [10 min] Setting up a Lambda function and configuring it to accept incoming requests and process them using your Python machine learning model pickle.
* [10 min] Using API Gateway to create an endpoint that can be accessed by internal or external users or systems to make predictions using your model.
* [10 min] Tips, best practices, and strategies around potential issues.
* [5 min] Q&A
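To make the API Gateway step in the outline above concrete, a client calling the deployed endpoint could look roughly like this. It is a minimal sketch using only the standard library: `ENDPOINT_URL` is a placeholder rather than a real endpoint, and the `features` field is an assumed request schema.

```python
import json
import urllib.request

# Placeholder invoke URL; API Gateway provides the real one after deployment.
ENDPOINT_URL = "https://example.execute-api.eu-west-1.amazonaws.com/prod/predict"


def build_prediction_request(features, url=ENDPOINT_URL):
    """Build a JSON POST request carrying one row of features."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(features, url=ENDPOINT_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_prediction_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `request_prediction([5.1, 3.5, 1.4, 0.2], url=...)` would then return whatever JSON the Lambda function places in its response body.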

By the end of this session, you should have a clear understanding of how to use Lambda and API Gateway to deploy your machine learning models, with the help of open-source code examples that will be made available on GitHub.


Prior Knowledge Expected

Previous knowledge expected

Pedro Tabacof is based in Dublin and is currently a staff machine learning scientist at Intercom. Previously, he worked at Wildlife Studios (mobile gaming), Nubank (fintech), and iFood (food delivery app). He has used and deployed machine learning models for anti-fraud, credit risk, lifetime value and marketing attribution, using XGBoost or LightGBM in almost all cases. Academically, he has a master's degree in deep learning and 300+ citations.