CAIML #23

CAIML #23 happened on January 25, 2023, at taod, with food sponsored by how.fm. We had two talks, one on a machine learning application and one on machine learning operations, followed by time for networking with food and drinks:

18h30 Open Doors

19h00 Welcome & Intro

19h15 Christopher König (Data Scientist at taod Consulting GmbH): What the Fuzz? Record Linkage using SPLINK

“We need more data” & “Garbage in, garbage out” are not empty phrases. Merging data from heterogeneous sources while ensuring data quality is paramount for successful AI / ML applications. Drawing on a real-world example of duplicated customer data, this talk shares hands-on experience with probabilistic record linkage. After introducing the statistical theory of the Fellegi-Sunter (1969) model, we walk you through the most important modeling decisions data scientists face in this context. Along the way, you will learn how to implement record linkage models using SPLINK on a PySpark back-end.
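The Fellegi-Sunter model scores a candidate record pair by comparing it field by field and adding up log-likelihood ratios. For field i, with m_i = P(field agrees | true match) and u_i = P(field agrees | non-match), an agreement contributes log2(m_i / u_i) and a disagreement contributes log2((1 - m_i) / (1 - u_i)); the total is added to the log2 prior odds of a match. The following is a minimal plain-Python sketch of that scoring rule, not the speaker's code and not Splink's API. It assumes binary agree/disagree comparisons, independence between fields given match status, and hypothetical field names, m/u probabilities, and prior. In practice these parameters are estimated from the data (for example with expectation maximisation), and Splink also supports multi-level comparisons such as fuzzy string similarity.

```python
import math

# Hypothetical (m, u) probabilities per comparison field, for illustration only.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
PARAMS = {
    "first_name": (0.95, 0.01),
    "surname": (0.90, 0.005),
    "dob": (0.98, 0.001),
}


def field_weight(m, u, agrees):
    """Log2 match weight contributed by one binary field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(comparison, params=PARAMS, prior=1e-4):
    """Total log2 match weight: log2 prior odds plus per-field weights.

    Assumes fields are independent given match status (the classic
    Fellegi-Sunter assumption); `prior` is a hypothetical P(match).
    """
    total = math.log2(prior / (1 - prior))
    for field, agrees in comparison.items():
        m, u = params[field]
        total += field_weight(m, u, agrees)
    return total


def match_probability(weight):
    """Convert a log2-odds match weight back into a posterior probability."""
    odds = 2 ** weight
    return odds / (1 + odds)
```

For example, `match_probability(match_weight({"first_name": True, "surname": True, "dob": False}))` scores a pair that agrees on name but not on date of birth; pairs above a chosen probability threshold would then be treated as links. Working in log2 odds keeps the per-field contributions additive, which makes it easy to see which field drives a decision.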

- 5 Minute Break -

19h45 Dr. Richard Sieg (Data Scientist at how.fm): Getting Your Model to Production with AWS SageMaker

When talking about machine learning, we mostly focus on the design & training of models for specific use cases. However, the subsequent step of serving a model in our products is just as crucial and intricate. In this talk, I will show you how to host a model as a scalable & maintainable API using AWS and its ML platform SageMaker. You can use this API as a template and easily adapt it to your use case. We will take a closer look at the different layers of the API, its deployment, and how to configure it for different scenarios & stages. This talk will be quite hands-on and you will see a lot of code, which you do not need to copy, since I will provide a GitHub repository at the end.
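The speaker's actual template lives in his GitHub repository and is not reproduced here. As a hedged illustration of one layer such an API sits on, the sketch below shows the HTTP contract SageMaker uses for custom inference containers: the container listens on port 8080, answers `GET /ping` health checks, and handles prediction requests on `POST /invocations`. The `predict` function, its `"features"` input, and the JSON response shape are hypothetical placeholders, and the standard-library `http.server` is used only to keep the example self-contained; production containers typically run a dedicated web server instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    """Hypothetical stand-in for a real model: sums a numeric 'features' list."""
    features = payload.get("features", [])
    return {"prediction": sum(features)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # SageMaker calls GET /ping as a health check; HTTP 200 signals readiness.
        if self.path == "/ping":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        # SageMaker forwards endpoint invocations to POST /invocations.
        if self.path != "/invocations":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            self._send(200, predict(payload))
        except (ValueError, TypeError, AttributeError) as exc:
            # Malformed JSON or unexpected payload shape -> client error.
            self._send(400, {"error": str(exc)})

    def _send(self, status, body):
        # Serialize the response body as JSON with explicit headers.
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Start the inference server; SageMaker expects it on port 8080."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Inside the container, calling `serve()` starts the server; the scaling, deployment stages, and configuration layers discussed in the talk would sit around this contract rather than inside it.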

20h15 Networking with food and drinks sponsored by how.fm and taod
