PyData London 2023

Building a data science solution for an NGO when you don’t know what infrastructure it will run on: a case study predicting tutor supply and demand mismatch
06-03, 10:15–10:55 (Europe/London), Minories

The Brilliant Club supports less advantaged students to access and succeed in the UK’s most competitive universities. They do this by mobilising the PhD community to support students in schools via their courses and tutoring programme. A challenge they face is anticipating the tutor supply they need to meet the increasing demands of their programmes as they expand nationally. A team of six DataKind UK volunteers worked with The Brilliant Club to develop a way to forecast and visualise the mismatch between tutor supply and demand across the UK. This is a talk about how we collaboratively explored their data and built a valuable new tool for them and, crucially, how we did so in a flexible, scalable way that provides them with immediate value but will also fit into their future use of digital and cloud-based tools. This talk is for people intrigued by deploying new, data-driven solutions in organisations that are only just maturing into the data space. No previous knowledge is required.

The Brilliant Club’s (TBC) 2021-26 strategy focuses on two goals: student access and student success. TBC wants more pupils from less advantaged backgrounds to access university; their aim is to work with 100,000 pupils on their flagship Scholars Programme over the next five years.

To achieve these aims, TBC needs to ensure their systems and processes are scalable and, crucially, that they have the right number of PhD researcher volunteers in the right areas to meet school demand. One particular challenge is estimating the number of tutors they need to recruit. This was previously done with Excel spreadsheets, which did not yield accurate predictions at the regional level and sometimes left them with tutor shortages. Moreover, as TBC now supports 25,000 students, it was difficult to visualise and monitor where there was a tutor supply/demand mismatch.

Six DataKind UK volunteers took on the challenge of delivering a better data-driven solution to this problem. This collaboration is an example of a DataKind DataCorps project, which unites a charity with a team of experienced data scientists to build and implement a bespoke data science solution or tool for the organisation. Over an 8-month period, the team explored the data and built a maintainable, functional set of models that predict The Brilliant Club’s tutor supply and demand from the latest available data and visualise the results in an easy-to-use manner.

A large part of the challenge was that TBC currently has no cloud infrastructure, so the solution needed to be easy to use and maintain while remaining able to be scaled and deployed to the cloud in the future. To this end, we built our solution around open-source tools and technologies including Python, scikit-learn, Jupyter, Docker, PostgreSQL, dbt and Prefect. The visualisation aspect of the project was delivered via Tableau, as it was a tool The Brilliant Club already had licensed and was familiar with; however, we could easily have replaced it with a variety of other dashboarding tools.
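As a rough illustration of the quantity being forecast and visualised, the regional supply/demand mismatch can be sketched in plain Python. This is not the project's code: the `RegionForecast` class, the `mismatch` and `shortages` helpers, the region labels and all the numbers are hypothetical, and in a real pipeline the inputs would come from model predictions rather than hard-coded values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionForecast:
    """Hypothetical container for one region's forecast."""
    region: str            # illustrative region label
    tutors_needed: int     # forecast demand
    tutors_available: int  # forecast supply


def mismatch(forecasts):
    """Return {region: shortfall}; a positive value means too few tutors."""
    return {f.region: f.tutors_needed - f.tutors_available for f in forecasts}


def shortages(forecasts):
    """Regions with a forecast shortfall, largest shortfall first."""
    gaps = mismatch(forecasts)
    short = [region for region, gap in gaps.items() if gap > 0]
    return sorted(short, key=lambda region: -gaps[region])


# Made-up numbers, for illustration only.
example = [
    RegionForecast("Region A", tutors_needed=120, tutors_available=100),
    RegionForecast("Region B", tutors_needed=80, tutors_available=95),
    RegionForecast("Region C", tutors_needed=60, tutors_available=30),
]

if __name__ == "__main__":
    for region in shortages(example):
        print(region, mismatch(example)[region])
```

Keeping this step as a pure function over simple records, with no dependency on where the data is stored, is one way a calculation like this could run unchanged on a laptop, in a container or in the cloud.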

In this talk, we will discuss how we designed the solution to deliver it in an accessible way and demonstrate how we implemented a variety of technologies to produce a reproducible suite of tools and outputs to support The Brilliant Club’s mission to help more students access education in the UK.

Prior Knowledge Expected

No previous knowledge expected

I spent 10 years as an astrophysics researcher analysing high-energy data from space telescopes in the search for new objects in the universe and a better understanding of what we already knew to be out there. In 2015 I transitioned to data science, joining a smart-cities startup called HAL24K. Over the next 8 years, I built data science solutions that enabled city governments and suppliers to derive actionable intelligence from their data to make cities more efficient, better informed and to make better use of resources. During that time I built and led a team of 10 data scientists and helped the company spin out four new companies. In 2022, I joined ComplyAdvantage as a Senior Data Scientist working to combat financial crime and fraud.

I have supported DataKind UK since 2015 in their mission to bring pro-bono data science support to charities and NGOs in the third sector. And I have been an active member of the PyData community over the same time period.