PyData London 2023

Code Smells in Data Science: What can we do about them?
06-03, 11:00–11:40 (Europe/London), Salisbury

We all want to write cleaner code, but we often don't know where to start. It doesn't help that most available guides are written for software engineers rather than data scientists.

Code smells are a taxonomy of typical antipatterns, each paired with well-defined instructions for spotting it in your code and removing it in a few small steps.
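As a minimal illustration of the "spot it, then fix it in a few steps" idea, here is one smell covered in this talk, Duplicate Code, and its standard fix, Extract Function. The data (row dictionaries holding numbers as strings) and every function name are hypothetical, chosen only for this sketch; the same pattern shows up just as often in pandas cleaning code.

```python
# Hypothetical example: raw rows hold numbers as strings such as " 1,234.5 ".

# Before (Duplicate Code): the same parsing expression is pasted once per column,
# so a fix applied to one copy is easy to forget in the other.
def clean_before(rows):
    return [
        {
            **row,
            "price": float(str(row["price"]).strip().replace(",", "")),
            "volume": float(str(row["volume"]).strip().replace(",", "")),
        }
        for row in rows
    ]


# After (Extract Function): the repeated expression gets one name and one home.
def to_number(value):
    """Parse a value such as ' 1,234.5 ' into a float."""
    return float(str(value).strip().replace(",", ""))


def clean_after(rows, columns=("price", "volume")):
    # Build new dicts; the input rows are left unchanged.
    return [{**row, **{col: to_number(row[col]) for col in columns}} for row in rows]


rows = [{"ticker": "ABC", "price": " 1,234.5 ", "volume": "10"}]
print(clean_after(rows))  # [{'ticker': 'ABC', 'price': 1234.5, 'volume': 10.0}]
```

Both versions return the same result: the refactoring changes the structure, not the behaviour, which is what makes it safe to apply in small steps.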

In this talk, I will select a short list of typical code smells that frequently appear in data-intensive workflows and walk you through how to resolve them.

Data science code is prone to degrade because it usually starts as experimental code that we then iterate on ad hoc.

With a little effort and a couple of techniques, we can clean it up periodically, which makes future iterations much faster.

There are too many code smells to cover in one presentation, so I will pick those most typical of modelling and analysis workflows.

Code smell categories, with a few typical examples of each:

  • Bloaters (Long Method, Long Parameter List, Data Clumps)
  • OOP Abusers (Switch Statements)
  • Change Preventers (Shotgun Surgery)
  • Dispensables (Duplicate Code, Speculative Generality)
  • Couplers (Feature Envy, Middle Man)
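Two of the smells above can be sketched in plain Python. The first part shows Long Parameter List / Data Clumps (a column name, a start date, and an end date that always travel together) resolved with Introduce Parameter Object, here a frozen dataclass. The second shows a Switch Statement on a metric name replaced by dictionary dispatch, a lightweight Python alternative to Fowler's Replace Conditional with Polymorphism. All names (`Window`, `METRICS`, `split_after`, `score_after`, etc.) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# --- Long Parameter List / Data Clumps ---
# Before: column, start and end always travel together; in real code this trio
# tends to be repeated across many function signatures.
def split_before(rows, date_column, start, end):
    return [r for r in rows if start <= r[date_column] < end]


# After (Introduce Parameter Object): the clump becomes one named, immutable type,
# and the filtering rule lives next to the data it uses.
@dataclass(frozen=True)
class Window:
    column: str
    start: str  # ISO "YYYY-MM-DD" strings compare correctly as plain strings
    end: str    # exclusive upper bound

    def contains(self, row):
        return self.start <= row[self.column] < self.end


def split_after(rows, window):
    return [r for r in rows if window.contains(r)]


# --- Switch Statements ---
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Before: every new metric means editing this if/elif chain (and any copies of it).
def score_before(y_true, y_pred, metric):
    if metric == "mae":
        return mae(y_true, y_pred)
    elif metric == "mse":
        return mse(y_true, y_pred)
    raise ValueError(f"unknown metric: {metric!r}")


# After: a registry maps names to functions; adding a metric is one new entry.
METRICS = {"mae": mae, "mse": mse}


def score_after(y_true, y_pred, metric):
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    return METRICS[metric](y_true, y_pred)
```

For example, `split_after(rows, Window("date", "2023-01-01", "2023-02-15"))` keeps only the rows dated in the first six weeks of 2023. The dictionary only replaces the conditional; when each case needs several behaviours rather than one function, small classes sharing an interface are the fuller version of the same refactoring.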

For each smell, I will show how to identify it, walk through the steps to resolve it, and demonstrate the solution with a quick example.

Prior Knowledge Expected

No previous knowledge expected.

I run Hypergolic, a consultancy in London specialising in Machine Learning Product Management.

Formerly, I was Head of Data Science at Arkera, a fintech startup in London, where I built market intelligence products using natural language processing for Tier 1 investment banks and hedge funds.

Before that, I worked in mobile gaming for King Digital (makers of Candy Crush), specialising in player behaviour and monetisation.

I started my career as a quant researcher writing trading strategies at multiple investment managers.
