DATA Pill #098 - Deploy LLM in your Private Kubernetes Cluster, The Real Cost of Self-Hosting MLflow

ARTICLES

Data Quality Error Detection powered by LLMs | 17 min | LLM | Simon Grah | Towards Data Science Blog

This article is the second in a series about cleaning data using Large Language Models (LLMs), with a focus on identifying errors in tabular data sets. For background, first review the introductory article on the Data Dirtiness Score, which explains the key assumptions and demonstrates how to calculate this score.

Unlocking Kafka's Potential: Tackling Tail Latency with eBPF | 7 min | Data Engineering | Maciej Mościcki, Piotr Rżysko | Allegro Tech Blog

This blog post describes how Allegro's team used Kafka protocol sniffing and eBPF to identify and remove a performance bottleneck.

Evaluating Large Language Model (LLM) systems: Metrics, challenges, and best practices | 11 min | LLM | Jane Huang, Kirk Li, Daniel Yehdego | Data Science at Microsoft

This article thoroughly examines LLM system evaluation, distinguishing between model and system evaluation and scrutinizing online and offline strategies. It focuses on AI assessing AI and Responsible AI metrics. The article highlights the relevance of diverse evaluation tools and frameworks across application scenarios, urging readers to stay informed about evolving metrics and frameworks for a comprehensive understanding.

How we expose data in BigQuery | 8 min | Data Engineering | Roxanne Ricci | Back Market Blog

Back Market's transition highlights a user-centric approach, focusing on building a domain-oriented, self-service data platform through experimentation. The company aims to improve user experience and operational efficiency by prioritizing seamless data organization and access policies.

The Real Cost of Self-Hosting MLflow | 5 min | ML | Aurimas Griciunas | neptune.ai blog

TL;DR

MLflow is a popular experiment-tracking and end-to-end ML platform
Since MLflow is open source, it’s free to download, and hosting an instance does not incur license fees
Hosting MLflow requires multiple infrastructure components and comes with maintenance responsibilities, the cost of which can be difficult to estimate

On AWS, which offers various options for hosting MLflow, a medium-sized instance comes in at about $200 per month, plus storage and data transfer costs

SKILL LAKE

Data Learning Week | Online | 8-11th April

Would you like to test one of our courses before investing money in it? Then come to our Data Learning Week, a series of 4 free hands-on workshops. Each session is a free first-trial lesson for the full training. We will also have a special bonus from the Academy for all workshop participants.

Choose your topic, check the agenda and sign up:
Distributed Machine Learning
Deep Learning - Demystifying AI
Data Visualisation Magic
Production Ready Machine Learning

TUTORIALS

Deploy a custom Docker image on Azure ML using a blue-green deployment with Python | 13 min | ML | Timo Uelen | Xebia Blog

This tutorial dives into such a custom solution:

Deploy our ML model using a custom Docker image.
Use a blue-green deployment strategy to ensure there is no downtime when deploying our model.
Run smoke tests to see if our deployment is working as expected, before we replace our previous model.
Use the Azure ML Python SDK to configure and manage deployment to Azure ML.
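
The blue-green flow listed above can be sketched in plain Python. This is a toy simulation under stated assumptions, not the tutorial's code and not the Azure ML SDK: `Endpoint`, `smoke_test` and `blue_green_deploy` are hypothetical names, and the "models" are simple functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Endpoint:
    """Toy stand-in for a managed endpoint with named deployments (hypothetical)."""
    deployments: Dict[str, Callable[[float], float]] = field(default_factory=dict)
    traffic: Dict[str, int] = field(default_factory=dict)  # percent per deployment

    def invoke(self, name: str, x: float) -> float:
        # Call one specific deployment directly, bypassing the traffic split.
        return self.deployments[name](x)


def smoke_test(endpoint: Endpoint, name: str) -> bool:
    """Assumed check: the model must return 4.0 for the input 2.0."""
    try:
        return endpoint.invoke(name, 2.0) == 4.0
    except Exception:
        return False


def blue_green_deploy(endpoint: Endpoint, new_name: str,
                      new_model: Callable[[float], float]) -> bool:
    # 1. Add the new deployment next to the live one, with 0% traffic.
    endpoint.deployments[new_name] = new_model
    endpoint.traffic[new_name] = 0

    # 2. Smoke-test the new deployment; on failure, remove it and keep the old one live.
    if not smoke_test(endpoint, new_name):
        del endpoint.deployments[new_name]
        del endpoint.traffic[new_name]
        return False

    # 3. Switch all traffic to the new deployment, then retire the old ones.
    old_names = [n for n in endpoint.deployments if n != new_name]
    endpoint.traffic = {new_name: 100}
    for old in old_names:
        del endpoint.deployments[old]
    return True
```

Because the previous deployment keeps serving 100% of traffic until the smoke test passes, a failed rollout never reaches users.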

DATA TUBE

In this tutorial, Marcin Zabłocki shows how to deploy an LLM in your private Kubernetes cluster in 5 simple steps, using Mistral as the example.

Streams Forever: Kafka Summit London 2024 Keynote | 1 h 48 min | LLM | Jay Kreps | Confluent

Jay Kreps, Co-creator of Apache Kafka and CEO of Confluent, will present his vision of unifying the operational and analytical worlds with data streams and showcase exciting new product capabilities. During this keynote, the winner and finalists of the $1M Data Streaming Startup Challenge will showcase how their use of data streaming is disrupting their categories.

PODCAST

ML for Finance and Storytelling through Data | 1 h 7 min | ML | Daniel Bashir, Ben Wellington

On challenges for ML in quantitative trading and investing, and telling stories through data.

CONFS EVENTS AND MEETUPS

Big Data Technology Warsaw Summit | Warsaw and Online | 10th and 11th April

Join the independent conference, whose agenda arranges presentations into nine categories, and find the topics that interest you most. Examples include:

Data Engineering
Streaming and real-time analytics
ML & Data Science
Gen AI

And more! Learn from speakers from companies like Dropbox, IKEA, Cloudera, Allegro, Ververica, and Freenow.

Shhh… Use the DataPill200 code to get a 200 PLN discount!

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DataPill

2024-03-28 16:01