The tutorial aims to introduce the audience to the power of hyperparameter optimization. It will help them learn how, using simple Python libraries, one can make a huge difference in the behaviour of their ML models.
We start by understanding the importance of hyperparameters and the different distributions they are selected from. We then review some basic methods of optimizing hyperparameters, moving on to distributed methods and then to Bayesian optimization methods. We'll use these algorithms hands-on, play around with search spaces, and try out packages like Hyperopt, Dask, and Optuna to tune hyperparameters.
This tutorial will help beginner-level ML practitioners and working professionals use these methods in their applied ML tasks. They will be able to enhance model quality and tune hyperparameters for large experiments more effectively.
Prior knowledge expected: basic Python and a very basic understanding of machine learning.
Good to have: experience with libraries like scikit-learn (just knowing model.fit() should be enough).
We will build an operational ML system to predict air quality in London. Instead of a single monolithic ML pipeline, we will build a more manageable system as 3 FTI pipelines: a Feature pipeline, a Training pipeline, and an Inference pipeline, connected together by a feature store. The feature pipeline scrapes new data and provides historical data (air quality observations and weather forecasts). The training pipeline produces a model using the air quality observations and features. The inference pipeline takes weather forecasts and predicts air quality for London, visualized in a UI. The system will be hosted on free serverless services - Modal, Hugging Face Spaces, and Hopsworks. It will be a continually improving ML system that keeps collecting more data, makes better predictions, and provides a hindcast with insights into its historical performance.
sktime is a widely used scikit-learn compatible library for learning with time series. sktime is easily extensible by anyone, and interoperable with the pydata/numfocus stack.
This tutorial explains how to use sktime for three learning tasks with independent instances of time series: time series classification, regression, and clustering. It also explains their close connection to time series distances, kernels, and time series alignment, and how to flexibly combine such estimators into classifiers, regressors, and clusterers with custom distances/kernels or feature extraction steps.
This is a continuation of the sktime introductory tutorial at pydata global 2021.
Polars is a next-generation dataframe library which aims to be fast, efficient, composable, and lazy! This introductory tutorial will take you through the basics of getting started with Polars in Python. We will demonstrate its out-of-the-box multi-core efficiency by composing advanced filters and joins, before comparing with traditional pandas workflows. As a finale, we will look at lazy processing when applying Polars to large-scale datasets.
This 90-minute tutorial provides an introduction to using TensorFlow for building random forest models. The tutorial will begin with an overview of the random forest algorithm and its advantages in the context of machine learning. Next, participants will learn how to implement a random forest model using TensorFlow Decision Forests, which exposes a Keras-style high-level API. The tutorial will cover important concepts such as model architecture, hyperparameter tuning, and training and evaluation techniques. Additionally, participants will learn how to use TensorFlow's TensorBoard to visualize and monitor their models during training. The tutorial will conclude with a discussion of best practices and tips for improving the performance of random forest models. By the end of the tutorial, participants will have gained a solid understanding of how to use TensorFlow to build powerful and accurate random forest models.
Active Learning is a powerful technique in the field of data science that enables efficient use of labelling resources. In this 90-minute-long hands-on tutorial, we will provide a step-by-step guide on how to apply basic Active Learning techniques for a document classification problem.
The tutorial will begin with an introduction to Active Learning, followed by a brief discussion of its cost and time savings benefits. Next, we will implement clustering to select the first batch of training data. Then, we will train a document classification model and analyse fundamental Active Learning concepts such as diversity, isolation, and model uncertainty. We will compare different metrics to select the best points for annotation.
Finally, we will evaluate the model's performance and compare the results of Active Learning with random annotation. Throughout the tutorial, attendees will have the opportunity to work on their implementation and receive assistance.
By the end of this tutorial, attendees will better understand the principles of Active Learning and how to apply them to their own supervised learning problems, enabling them to make more efficient use of their labelling resources.
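The core uncertainty-sampling step of the loop described above can be sketched as follows; the dataset, model, and batch size are all illustrative stand-ins for whatever your document classification setup uses:

```python
# Pool-based uncertainty sampling: train on the labelled set, then pick the
# pool points the model is least sure about as the next annotation batch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Pretend only 20 points (10 per class) have been annotated so far.
labelled = list(np.where(y == 0)[0][:10]) + list(np.where(y == 1)[0][:10])
pool = [i for i in range(len(X)) if i not in labelled]

model = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])

# Model uncertainty: how close the predicted probability is to 0.5.
proba = model.predict_proba(X[pool])[:, 1]
uncertainty = -np.abs(proba - 0.5)

# The 10 most uncertain pool points become the next batch for annotation.
batch = [pool[i] for i in np.argsort(uncertainty)[-10:]]
print(batch)
```

In the tutorial this selection metric is compared against diversity-based alternatives and against simply annotating random points.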
If part of your job is to constantly poke your fellow data scientists to isolate project environments, update requirements, clean code, write consistent docstrings, etc., then you should definitely join us for this very hands-on tutorial, with reproducibility, compliance, and consistency in mind.
Join us for a 90-minute tutorial on how to build an end-to-end open-source modern data platform for biomedical data using Python-based tools. In this tutorial, we will explore the technologies related to data warehousing, data integration, data transformation, data orchestration, and data visualization. We will use open-source tools such as DBT, Apache Airflow, Openmetadata, and Querybook to build the platform. All materials will be available on GitHub for attendees to access.
Over the last decade, the commercial use of recommendation engines/systems by businesses has grown substantially, enabling the flexible and accurate recommendation of items/services to users. Popular examples include (to name a few) the movie, video, and book recommendation engines offered by Netflix, YouTube, and Amazon respectively.
Most recommender systems are "black-box" algorithms trained to provide inference of relevant items to users using techniques such as collaborative filtering, content-based filtering, or hybrid models. The algorithms used in these systems are broadly opaque, making the predicted recommendations lack full interpretability/explainability. Making recommenders explainable is essential, as explanations provide transparency and address the question of why particular items were recommended by the engine to users/system designers.
Over the last few years there has been a growing area of research and development in explainable recommendation systems. Explainable recommendation systems are generally classified as post-hoc (explainability is done post-recommendation) or intrinsic (explainability is integrated into the recommender model) approaches. This workshop will provide a hands-on implementation of some of these approaches.
Delta Lake is an open-source storage framework that enables the creation of a Lakehouse architecture using a variety of compute engines such as Spark, PrestoDB, Flink, Trino, and Hive from Python. Its high data reliability and optimized query performance make it an ideal solution for supporting big data use cases, including batch and streaming data ingestion, fast interactive queries, and machine learning.
In this tutorial, we will learn the basics of MLflow. After introducing the library and the problems it solves, we will implement an end-to-end machine learning lifecycle using MLflow.
Object detection is arguably the most common Computer Vision task. It is applied to images and videos across various domains. However, action recognition is a tad different from object detection because it can be difficult to tell certain actions from a single image. It is hard to tell if a door is being opened or closed or tell what martial art technique is being executed from an image.
In this tutorial, the MMAction2 framework will be used to train an action recognition model to detect what Judo throws are being performed in videos. While it will be fun seeing Machine Learning techniques applied to Martial Arts, the knowledge and techniques applied can easily be generalized to other action recognition tasks where simple object detection does not suffice.
Keynote with Ines Montani
The Brilliant Club supports less advantaged students to access and succeed in the UK’s most competitive universities. They do this by mobilising the PhD community to support students in schools via their courses and tutoring programme. A challenge they face is being able to anticipate the tutor supply they need to meet the increasing demands of their programmes as they expand nationally. A team of six DataKind UK volunteers worked with The Brilliant Club to develop a way to forecast and visualise the mismatch between tutor supply and demand across the UK. This is a talk about how we collaboratively explored their data and built a valuable, new tool for them and, crucially, how we did so in a flexible, scalable way that provides them with immediate value but also will fit into their future use of digital and cloud-based tools. This talk is for people intrigued by deploying new, data-driven solutions in organisations that are only just maturing into the data space. No previous knowledge is required.
Large scale call centres are the frontline of customer experience across many industries. Optimizing their operations is crucial for achieving better customer service. We model agent customer pairing as a “talent” allocation problem. In this talk, we discuss how we used uplift modelling to provide real time agent-customer pairings that drive a positive lift in overall interaction score (which can come from any arbitrary scoring function). We discuss the challenges of developing and deploying such models to make real-time interventions in call centres. Similar approaches can be used to drive uplift of any important business KPI with respect to an allocation decision.
Have you ever struggled with choosing the right tools for your Machine Learning projects? As a Lead Data Scientist in a consulting firm, I faced this challenge repeatedly and finally converged on a small set of technologies which allow me to build reliable and scalable projects with a great DX (Developer Experience). In this talk, I will share the key components of my ML stack, including DVC, Streamlit, FastAPI, Terraform and other powerful tools to streamline the development and experimentation processes. Through a live demo, I will finally show you the Project Generator I've built to encourage adoption of these technologies and to help Data Scientists focus on the ML itself rather than the "plumbing" around it. Attendees should have a basic understanding of Python and Machine Learning concepts.
We all want to write cleaner code but usually don't know where to start. It also doesn't help that most guides available are written for software engineers and not data scientists.
Code smells are a taxonomy of typical antipatterns, with a well-defined set of instructions on how to identify them in your code and change them in a few steps.
In this talk, I will select a short list of typical code smells that frequently appear in data-intensive workflows and walk you through how to resolve them.
Executives at PyData is a facilitated discussion session for executives and leaders to discuss challenges around designing and delivering successful data projects, organizational communication, product management and design, hiring, and team growth.
We'll announce the agenda at the start of the session, you can ask questions or raise issues to get feedback from other leaders in the room, NumFOCUS board members and Ian and James.
Organized by Ian Ozsvald (London) and James Powell (New York)
What’s the optimal way to upgrade a broadband network to fibre? In this session we’ll talk about how we used actor-based simulations and discrete optimisation to build a planning tool that has not only optimised one of the biggest fibre upgrade operations in the UK, but also unlocked powerful scenario testing capabilities. We’ll go through how to architect scalable, agent-based simulations using only open-source libraries and Python, and take you on our journey (including pointing out pitfalls) towards optimising UK wide fibre broadband rollout.
In this talk, I will take you on a rollercoaster tour of how data science is delivering for the public good at the Office for National Statistics’ Data Science Campus and beyond. Drawing on examples from the dozens of data scientists working at the Campus, you’ll find out how Python is improving the public sector already in a myriad of ways, from creating or improving national statistics, to forecasting the economy, to dealing with Covid-19, to evaluating efforts to tackle the gender pay gap. We’ll even see how a tweet by a food campaigner led to a huge effort to web-scrape budget brand offerings in UK supermarkets—analysis that made it onto every major UK news programme! And we’ll look ahead to the challenges, and potential, of Python for the public sector in the future.
Were you aware that the cloud infrastructure powering modern computing has a larger greenhouse gas footprint than commercial aviation? This talk is aimed at developers and data scientists who are concerned about the impact of their work on the environment and want to explore practical solutions to address this challenge. We will explain how greenhouse gas emissions are categorized and estimated for computing. We will also introduce approaches to developing more sustainable software and provide practical examples using Python data analytics.
I will present the challenges we encountered while migrating an ML model from batch to real-time predictions and how we handled them. In particular, I will focus on the design decisions and open-source tools we built to test the code, data and models as part of the CI/CD pipeline and enable us to ship fast with confidence.
An exchange of views on FastAPI in practice.
FastAPI has become an integral part of the PyData ecosystem. FastAPI is great: it helps many engineers create REST APIs based on the OpenAPI standard and run them asynchronously. It has a thriving community and educational documentation.
FastAPI does a great job of getting people started with APIs quickly.
This talk will point out some obstacles and dark spots that we wish we had known about before, and highlight solutions based on our experience building a data hub in asset management.
This talk focuses on the benefits of using an event-driven approach for machine learning products. We will cover the basics of event-driven architecture for software development and provide examples of how it can be applied for machine learning use cases. The talk will be accompanied by live examples and code for you to follow along, using open source tools such as Apache Kafka, FastAPI and River. By the end of the talk, you'll have a good understanding of the advantages of event-driven architectures, such as improved scalability and responsiveness. If you are a machine learning practitioner interested in exploring this topic, this talk is a great starting point in which we will cover the concepts, tools and common pitfalls of the event driven framework for machine learning products.
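As a minimal, broker-free illustration of the pattern described above, here is a toy in-process sketch in which training and prediction events arrive on a queue and are handled asynchronously by an online model. A real deployment would use Apache Kafka as the broker and an online-learning library such as River; the queue and the running-mean "model" here are stand-ins:

```python
import queue

events = queue.Queue()

class RunningMeanModel:
    """A trivially simple online model: it predicts the running mean."""
    def __init__(self):
        self.n, self.total = 0, 0.0
    def learn(self, y):
        self.n += 1
        self.total += y
    def predict(self):
        return self.total / self.n if self.n else 0.0

model = RunningMeanModel()
predictions = []

# Producer: interleaved training and prediction events land on the queue.
for ev in [("train", 10.0), ("train", 20.0), ("predict", None),
           ("train", 30.0), ("predict", None)]:
    events.put(ev)

# Consumer: reacts to each event as it arrives, so the model is always
# as fresh as the last training event it has seen.
while not events.empty():
    kind, payload = events.get()
    if kind == "train":
        model.learn(payload)
    else:
        predictions.append(model.predict())

print(predictions)
```

The scalability benefits come from the decoupling: producers and consumers can be scaled and deployed independently once the queue is a real broker.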
Pydantic is a data validation library for Python that has seen massive adoption over the last few years - it's used by major data science and ML libraries like spaCy, Hugging Face, and Jina AI - overall, Pydantic is downloaded over 50 million times a month!
In this talk Samuel Colvin, the creator of Pydantic, will cover two subjects which have seen massive interest in recent years:
- How Pydantic can be used to prepare data for processing, thereby saving time and avoiding errors
- The emergence of Rust as the go-to language for high-performance Python libraries - how this might go in the future, and the benefits and drawbacks of the trend
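The first subject, using Pydantic to prepare data before processing, can be sketched minimally; the model fields here are illustrative:

```python
from pydantic import BaseModel, ValidationError

class Record(BaseModel):
    name: str
    score: float

# Pydantic coerces clean-but-untyped input (e.g. numbers arriving as
# strings from JSON or CSV) into the declared types...
rec = Record(name="alice", score="3.5")
print(rec.score + 1)  # score is now a real float

# ...and rejects data that cannot be made valid, before it reaches your
# processing code.
try:
    Record(name="bob", score="not a number")
except ValidationError as err:
    print("rejected:", type(err).__name__)
```

Pushing validation to the boundary like this is what saves the debugging time: bad records fail loudly at ingestion rather than silently downstream.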
At Man Group we ingest data of all shapes and sizes, from market prices to weather, ESG reporting to news media. Storing that data in a format that is useful both to researchers and to strategies trading more than a billion dollars is a unique technical challenge - one that has resulted in two iterations of our high-performance data storage product, ArcticDB. The aim is nothing less than to bring the performance and analytical capabilities of a server infrastructure right into the Python client. With the client now available on Conda and PyPI, and the source published on GitHub, I want to take you on a tour of our database: the design rationale, the ups and downs of developing a new database while supporting a billion-dollar trading estate, and the lessons we've learned along the way.
This discussion session is for educators to talk about how we teach data science in industry and academia. It'll be a guided discussion, we'll vote on the top topics to discuss at the start and then we'll work our way through problems, solutions and new ideas. Maybe we'll get to talk about ChatGPT, or using Jupyter, or when to "teach in an IDE", or how to balance lecture vs problem solving vs homework - all topics can be up for voting at the start.
Music streaming services like Spotify and YouTube are famous for their recommendation systems, and each service takes a unique approach to recommending and personalizing content. While most users are happy with the recommendations provided, some users are curious about how and why a certain track is recommended. Complex recommendation systems take into account various factors like track metadata, user metadata, and play counts, along with the track content itself.
Inspired by Andrej Karpathy's "build your own GPT", we will use language models to build our own music recommendation system.
Join me as we dive into the world of Great Expectations, an open-source tool that helps data-driven teams ensure their pipelines deliver high-quality data. You will be introduced to the key concepts of Great Expectations, including data validation, documentation, and lineage. Plus, I'll show you how to set up Great Expectations in a cloud environment, using Google Cloud Platform as an example. By the end of this talk, you'll have a solid understanding of how Great Expectations can improve the reliability and correctness of your datasets and transformations.
Pandas 2 brings new Arrow data types, faster calculations and better scalability. Dask scales Pandas across cores. Polars is a new competitor to Pandas designed around Arrow with native multicore support. Which should you choose for modern research workflows? We'll solve a "just about fits in ram" data task using the 3 solutions, talking about the pros and cons so you can make the best choice for your research workflow. You'll leave with a clear idea of whether Pandas 2, Dask or Polars is the tool for your team to invest in.
AutoEncoders (AEs) are among the most popular techniques in modern machine learning. Thanks to their strong representation learning capability, they can be used not only to generate data, but also for many other tasks, e.g. clustering, dimensionality reduction and transfer learning.
Despite their popularity, their application is usually advertised mostly for applications with static tabular (e.g. for recommender systems) and image data (e.g. for computer vision tasks). With this talk we will try to shed some light on a less well-known area of application, namely the use of AEs with time series data. After a brief introduction on AEs we will highlight challenges to their application in the time-series domain with a particular focus on clustering, features extraction and transfer learning.
The talk is for everyone with an interest in deep learning, time series, and their intersection. Although some working knowledge of applied machine learning (deep learning in particular) and time series analysis would be beneficial, the talk will be delivered in a format accessible to all data science practitioners.
This discussion session is for anyone using Python for higher performance work. You probably use Pandas, NumPy, Polars, Dask, Vaex, Modin, cuDF or any of the related tools, you've got questions, you want to know what other people are using, what's pragmatic and where new opportunity might exist.
This will be a guided discussion, we'll vote on topics at the start of the session and then host Ian will work through the list.
Join us for an insightful session on active learning and its applications in machine learning. You will learn how leading teams are embedding active learning into their ML pipelines and how to build your first active learning loop. This session is for ML engineers and data scientists (aspiring or practitioners) who want to stay updated on the latest techniques and learn how to implement active learning with open-source tools.
This talk will introduce the PyData community to dbt and demonstrate how to leverage Python to unlock its full potential. Attendees will learn best practices for working with dbt, how to integrate it with other tools in their data stack, and how to use Python packages like fal to perform complex data analysis. With real-world examples and use cases, this talk will equip attendees with the tools to build a modern, scalable, and maintainable data infrastructure.
Keynote with Lisa Carpenter and Antonio Campello
There is no publicly available data on the skills that are commonly required in UK online job adverts, despite this information being useful for a range of use cases. To address this, we have built an open-source skills extraction Python library using spaCy and Hugging Face. Our approach is twofold: we train a named entity recognition model to extract skill entities from job adverts, then map them onto any standardised skills taxonomy. By applying this algorithm to a dataset of scraped online job adverts, we are then able to find skill similarities amongst occupations, and regional differences in skill requirements.
Have you ever wanted a standard and efficient process for approaching new datasets? Do you want a systematic way of highlighting complex nonlinear or low-frequency patterns in your data?
In this talk, I will share the open-source stack that I use to efficiently extract interesting insights from any dataset. I will teach you how to use data visualisation, gradient boosted decision trees, and XAI tools to quickly find hidden patterns, de-risk your projects early, or debug your models.
We're in a new era of dataframe development. Libraries like Arrow, Polars, DuckDB, Vaex, Modin, and others stretch the bounds of performance on what we think can be done with tabular data in Python. These systems have great benchmarking results and generate significant buzz on social media.
Pandas, the community favorite, is also innovating, although with less buzz. Structural improvements like Arrow data types, copy-on-write, and more bring the world's most popular dataframe library (used by 55% of Python users) to significantly better performance and memory use. Additionally, Dask, a parallel computing library developed closely with Pandas, has added new features in the last year, like memory-stable shuffling, task queueing, and recent experiments in query optimization, which we'll discuss as well.
In this talk we'll highlight some of these new features and show the impact they make on speed and cost on real-world workloads, as well as a vision for future development.
Data is everywhere. It is through analysis and visualization that we are able to turn data into information that can be used to drive better decision making. Out-of-the-box tools will allow you to create a chart, but if you want people to take action, your numbers need to tell a compelling story. Learn how elements of storytelling can be applied to data visualization.
Today most conventional ML systems look to exploit correlations in data in order to draw inferences. However, as we learned back in statistics class at school, correlation is not causation. So when you need to know the 'why' behind a particular prediction, or why A outperforms B in an experiment, relying on correlations is insufficient. Furthermore, some ML models are built purely for explainability and insight rather than prediction, in order to understand how the world works so we could potentially make some kind of policy change, e.g. what if we had chosen a different strategy or tactic - would the outcome have been different, and if so, by how much? To answer these kinds of questions, you need to delve into the world of causality.
This talk is a gentle (and occasionally entertaining) introduction to the interdisciplinary field of causality and how it is starting to impact machine learning. You will learn what kinds of questions causal inference can answer, and how it can address some of the limitations of current explainable ML methods, under certain conditions. I draw upon use cases from financial services and marketing, and I will show a short practical example of how combining human domain knowledge (expressed intuitively via graphical causal models) with your data can sometimes unlock insights not recoverable by purely data-driven approaches.
The European Parliament has proposed a Cyber Resilience Act, which basically wants all software to carry a "CE" stamp. There is a non-commercial carve-out, but it is still not enough to ensure that open-source projects with limited resources are exempted from the Act. How will it affect the OSS ecosystem?
Extracting data from web pages is a problem which is not so well covered and researched compared to image or text classification, object detection, or named entity recognition. But this problem is extremely exciting to look into and rewarding to work on, because web pages can be represented in so many ways: as a screenshot of a page, as its text, as an HTML tree, as a sequence of elements with discrete and continuous properties, and in other ways. This leads to many diverse approaches, which often combine different input types and ways of data representation inside one model. In this talk we will explore several intriguing approaches for web data extraction, and see how one can come up with novel approaches and grow a model according to the task at hand.
This talk is intended for anyone with interest in neural networks. I hope it gives you inspiration and intuition for building deep learning models tailored to the structure of your data. This is applicable not only to web information extraction, but also to document extraction and other domains with structured text or image inputs.
ChatGPT has reignited worldwide interest in text data, capturing the imaginations of thousands of developers, but how do we actually build large scale production pipelines for working with and processing this highly unstructured data?
SQL is a great language for simple data modalities that fit in a database table, but when it comes to complex "unstructured" data, it is Python that really shines. In this talk, we show how easy it is to go from data storage to querying and processing large amounts of unstructured data using modern open-source Python tooling such as Ray, Daft, and Hugging Face models.
This session will facilitate a discussion exploring the differences in data work between different industries, including eCommerce, Insurance, Cyber Security, and Finance.
We will discuss the challenges and opportunities of data work in each industry, as well as the skills and knowledge that data professionals may need, in order to be successful. We will cover Data Engineering and Data Science specifically, but this is an open forum for anyone to discuss data challenges in different industries.
This talk presents Taipy, a new low-code Python package that allows you to create complete Data Science applications, including graphical visualization and managing algorithms, pipelines, and scenarios.
We will follow master detective Robot Holmes on his way to solve one of his hardest cases so far - a series of mysterious murders in the city of MLington. The traces lead him to the Vision-Language part of town, which has been a quiet and tranquil place with few incidents until lately. For a few months the neighbourhood has been growing extensively and careless benchmark leaders are dropping dead at an alarming rate.
Robot Holmes sets out to find the cause for this new development and will gather intel on some of the most notorious of the new citizens of the Vision-Language neighbourhood and find out what makes them tick.
We welcome all PyData Organizers to join us for an open discussion during lunch.
PyScript brings the full PyData stack to the browser, opening up unprecedented use cases for interactive data-intensive applications. In this scenario, the web browser becomes a ubiquitous computing platform, operating within a (nearly) zero-installation, serverless environment.
ChatGPT and the GPT models by OpenAI have brought about a revolution not only in how we write text, but in how we process information about the world. Let's discuss the capabilities and limitations of large language models including ChatGPT, along with possible applications, tooling, data security, wider societal implications, and ethics. Some applications have gone as far as automating data analysis, which also poses a question about the future of data science.
When handling a large amount of data, memory profiling the data science workflow becomes more important. It gives you insight into which processes consume lots of memory. In this talk, we will introduce Memray, a Python memory profiling tool, and its new Jupyter plugin.
This will be a gentle introduction to the world of clinical gait analysis and how your gait (a.k.a the way you walk) is a digital biomarker for predicting physical and cognitive health. We will talk about digital biomarker engineering from unconventional sources of data (footstep sounds for example). To demonstrate a real life application, I will briefly mention how the R&D team at MiiCare uses acoustic machine learning for fall risk assessment for older adults living alone at home and care homes across the UK.
You should join this talk if you are interested in digital health, digital biomarker engineering and applications of AI in social care.
This talk is about grouped weighted summary statistics in pandas… and some other things.
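One flavour of the statistics in question, a grouped weighted mean, can be sketched in pandas like this; the data is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group":  ["a", "a", "b", "b"],
    "value":  [1.0, 3.0, 10.0, 20.0],
    "weight": [1.0, 3.0, 1.0, 1.0],
})

# Weighted mean per group: np.average handles the weighting, groupby
# handles the grouping.
weighted_mean = df.groupby("group").apply(
    lambda g: np.average(g["value"], weights=g["weight"])
)
print(weighted_mean)
```

The "other things" in the talk grow out of exactly this kind of pattern: `apply` is flexible but slow on large frames, so vectorised alternatives become interesting.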
Reinforcement learning (RL) has become the go-to framework when working with decision processes. Originally demonstrating superhuman performance in videogames, applications of reinforcement learning providing state-of-the-art results now extend to a myriad of areas: from drug discovery to autonomous driving and computer vision, just to name a few.
In this talk, we will concentrate on the application of RL to pricing environments. In particular, we will consider how Ben, our friendly neighbourhood gelato merchant, might approach the dynamic problem of pricing his products throughout the year with RL. We will introduce the problem as a Markov decision process and review the most common archetypes of RL algorithms to solve it, highlighting various pitfalls and challenges, always with a focus on their application to pricing.
By the end of the talk, we will be able to help Ben set up a pricing model for his delicious gelato!
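A toy version of the MDP formulation can be sketched with tabular Q-learning: two alternating seasons as states, two price levels as actions, and made-up demand-driven rewards. This is an illustration of the setup, not the talk's actual pricing model:

```python
import numpy as np

rng = np.random.default_rng(0)

# reward[state, action]: in summer (state 0) the high price (action 1)
# still sells well; in winter (state 1) only the low price (action 0)
# moves gelato. These numbers are invented for illustration.
reward = np.array([[1.0, 3.0],
                   [2.0, 0.5]])

n_states, n_actions = reward.shape
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    r = reward[state, action] + rng.normal(0, 0.1)  # noisy demand
    next_state = 1 - state                          # seasons alternate
    Q[state, action] += alpha * (r + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state

policy = np.argmax(Q, axis=1)
print(policy)  # learned price level per season
```

Even this tiny example shows the pitfalls the talk discusses: without enough exploration the agent can lock in on the first price that earns anything at all.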
This discussion session focuses on exploring the application of software engineering practices in the field of data science. Join us to delve into essential aspects such as Python packages, IDEs, testing, refactoring, and architecture that play a crucial role in building robust and scalable data science solutions. We will discuss how adopting software engineering principles can enhance the reliability, maintainability, and efficiency of data science projects. Whether you're a DS manager or practitioner, this session offers a platform to exchange insights, share experiences, and discover innovative approaches to integrating software engineering practices into the data science workflow.
One of the biggest barriers to machine learning and data analytics is the difficulty to access high quality data. Synthetic data has been widely recognized as a promising remedy to this problem. It allows sharing, augmenting and de-biasing data for building performant and socially responsible ML systems. In this talk, I will overview the significant progress in the theory and methodology of synthetic data over the past five years. I will also introduce the open-source library, Synthcity, which implements an array of cutting-edge synthetic data generators to address data scarcity, privacy, and bias. The participants will walk away with a deeper understanding of the theory and practice of synthetic data, an understanding of when which methods apply (or do not apply) to their specific use case, and be ready to apply them in hackathons, competitions, and their day-to-day work.
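As a toy illustration of the fit-then-sample idea behind model-based synthetic data, here is a sketch that fits a Gaussian to "real" records and samples new ones from it. Synthcity's generators are far more sophisticated (and handle privacy and bias); everything here is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Real" data: 500 records with two correlated numeric features.
real = rng.multivariate_normal([10.0, 5.0],
                               [[2.0, 1.2], [1.2, 1.5]],
                               size=500)

# Fit a simple generative model: here just the mean and covariance.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample synthetic records that mimic the real distribution.
synthetic = rng.multivariate_normal(mean, cov, size=500)
print(synthetic.mean(axis=0))
```

The synthetic records can then be shared or used for augmentation without exposing the original rows, which is the core promise the talk expands on.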
This talk educates the audience on how to create end-to-end data products using the Python data ecosystem, from data integration to reporting, dashboards, apps, and surfacing insights.
After analyzing the features found in popular proprietary analytics products across various verticals, this talk will demonstrate how data teams can use open-source libraries to create and deploy applications which are accessible to non-technical end users but hold distinct advantages over proprietary alternatives.
In this presentation, I will show how to use AWS Lambda and API Gateway to deploy real-time machine learning models developed in Python. I will use these tools to create a serverless web endpoint and serve model predictions with high availability/scalability. These tools provide a relatively simple and cost-effective solution for data scientists and machine learning engineers looking to deploy models without the hassle of managing servers and without needing to rely on third parties. I will cover potential pitfalls to be aware of, such as Lambda's cold start delays and memory limitations. Through code examples and practical tips, attendees will gain a solid understanding of how to use serverless AWS to deploy and serve their own machine learning models at scale.
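The shape of the Lambda handler the presentation describes can be sketched as follows; the model and payload format are illustrative stand-ins, with a stubbed predictor in place of a real trained model:

```python
import json

def predict(features):
    """Stand-in for a real model, which would be loaded once outside the
    handler to amortize cold-start cost across invocations."""
    return sum(features) / len(features)

def lambda_handler(event, context):
    # API Gateway delivers the request body as a JSON string.
    body = json.loads(event["body"])
    score = predict(body["features"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": score}),
    }

# Local smoke test with a fake API Gateway event.
fake_event = {"body": json.dumps({"features": [1.0, 2.0, 3.0]})}
print(lambda_handler(fake_event, None))
```

Because the handler is a plain function taking a dict, it can be unit-tested locally exactly like this before ever being deployed behind API Gateway.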
This talk is about how advances in Large Language Models (LLMs) are helping make inroads into the 11 types of comedy. For many years most, but not all, types of humour were beyond the reach of automated systems. This talk is for those interested in comedy, how it is created, the state of the art in LLMs, and comedy datasets. This talk will include specific code examples as well as trying to be humorous in its own way. At the end the audience will have learnt how LLMs are changing the human/computer comedy landscape.