DATA Pill feed

DATA Pill #084 - MLOps BABY! MLOps -> MLFlops -> LLMOps?

MLOps, baby! MLOps has been a huge topic this year, and we believe it will stay that way in ’24. It will probably evolve further toward LLMOps, but still.

So save this pill for ’24! You may need it.

ARTICLES

Building an End-to-End MLOps Pipeline with Open-Source Tools | 11 min | MLOps & Open Source | Grig Duta | Qwak Blog

This article is a focused guide to moving from experimental machine learning to production-ready MLOps pipelines. It identifies the limitations of traditional ML setups and introduces essential open-source tools that can help you build a more robust, scalable and maintainable ML system. How is it different from the traditional setup?

MLOps and MLflops | 9 min | MLOps architecture | Andrew Blance | Better Programming Blog

This is an introduction to standard and modern methods of storing data, creating resources and deploying AI. Let’s make sure your next model deployment isn't an MLflop.

From concept to production in 2 months: sales forecasting Machine Learning model for dema.ai | 9 min | ML Case Study | Michał Madej | GetInData Blog

How to bring a sales forecasting prototype to production in less than 2 months.
The solution architecture
Nixtla + Kedro

Declarative Feature Engineering at PayPal | 4 min | Feature Engineering | Marina Lyan | The PayPal Technology Blog

How PayPal's declarative feature engineering approach helps its engineers address scale, time-to-market (TTM) and total cost of ownership (TCO) requirements.
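To make "declarative" concrete, here is a minimal sketch, not PayPal's system: features are declared as data (a hypothetical `FeatureSpec` naming a column, a window and an aggregation), and one generic engine, the hypothetical `compute_features`, evaluates all of them. Adding a feature means adding a spec rather than writing new pipeline code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical aggregation registry: the only place where computation logic lives.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda xs: float(sum(xs)),
    "count": lambda xs: float(len(xs)),
    "mean": lambda xs: sum(xs) / len(xs) if xs else 0.0,
    "max": lambda xs: max(xs) if xs else 0.0,
}


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical declarative feature definition: what to compute, not how."""
    name: str
    column: str
    window: int  # number of most recent events to aggregate over
    agg: str     # key into AGGREGATIONS

    def __post_init__(self) -> None:
        # Validate declarations up front so bad specs fail before any data is touched.
        if self.window < 1:
            raise ValueError("window must be >= 1")
        if self.agg not in AGGREGATIONS:
            raise ValueError(f"unknown aggregation: {self.agg}")


def compute_features(specs: List[FeatureSpec], events: List[dict]) -> Dict[str, float]:
    """Generic engine: evaluate every declared feature over events ordered oldest to newest."""
    features = {}
    for spec in specs:
        recent = events[-spec.window:]
        values = [float(event[spec.column]) for event in recent]
        features[spec.name] = AGGREGATIONS[spec.agg](values)
    return features
```

For example, a 3-event sum and a 5-event count over `amount`, applied to four events of 10, 20, 30 and 40, yield 90.0 and 4.0 (only four events exist for the 5-event window).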

5 Levels of MLOps Maturity | 10 min | MLOps | Maciej Balawejder | Towards Data Science Blog

This blog post aims to synthesize the best of two MLOps maturity frameworks, Google's and Microsoft's. Maciej analyzes five maturity levels and shows the progression from manual processes to advanced automated infrastructure. He also argues that some of Google's and Microsoft's recommendations should not be followed blindly but adjusted to your needs.

Building a large scale unsupervised model anomaly detection system — Part 1 | 8 min | ML Platform | Anindya Saha, Han Wang, Rajeev Prabhakar | Lyft Engineering Blog

Lyft’s ML Platform is machine learning infrastructure built on top of Kubernetes that powers diverse applications such as dispatch, pricing, ETAs, fraud detection and support. This post focuses on how Lyft uses the compute layer of LyftLearn to profile model features and predictions and perform anomaly detection at scale.
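The post describes Lyft's own profiling and detection system. As a much simpler stand-in, the sketch below profiles each feature by its daily mean and flags a day whose mean lies more than three standard deviations from past daily means. This plain z-score rule is chosen here for illustration and is not Lyft's method; the helper names and the threshold are assumptions.

```python
import statistics
from typing import Dict, List


def daily_profile(values: List[float]) -> float:
    """Deliberately minimal profile of one day's feature or prediction values: their mean."""
    return statistics.fmean(values)


def is_anomalous(history: List[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's profile if it is more than `threshold` standard deviations from past profiles."""
    if len(history) < 2:
        return False  # too little history to estimate spread
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return today != mean  # any change from a constant history is suspicious
    return abs(today - mean) / spread > threshold


def flag_features(history: Dict[str, List[float]], today: Dict[str, List[float]]) -> List[str]:
    """Return the sorted names of features whose profile for today looks anomalous."""
    return sorted(
        name
        for name, values in today.items()
        if is_anomalous(history.get(name, []), daily_profile(values))
    )
```

Profiling first and comparing profiles afterwards keeps the expensive pass over raw data separate from the cheap detection step, which is what makes this kind of check scale to many features.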

DATA LIBRARY

Build Feature Stores Faster. An Introduction to Vertex AI, Snowflake and dbt Cloud | 7 min | MLOps & Feature Store | Jakub Jurczak | GetInData Blog

Introduction to MLOps
A step-by-step guide to designing and building a Feature Store
Example of MLOps architecture and workflow
How to integrate GCP with Snowflake using Terraform
Vertex AI platform: how it works in practice

CLICK THROUGH ARCHITECTURE SCHEME

We weren't sure which category this belongs in, but since a lot of this issue's content refers to solution architecture, we thought this would be a good resource: a diagram of the MLOps Platform architecture that you can click through to see the technical details.

Click Through MLOps Platform Architecture Map

2-Minute Video with Platform Walkthrough

TUTORIALS

Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch | 15 min | LLMOps | Sebastian Raschka | Lightning.AI Blog

10 techniques to reduce the memory consumption of PyTorch models. Applied to a vision transformer, these techniques reduced memory consumption 20x on a single GPU.
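Why lower precision helps is easy to see with back-of-envelope arithmetic. The sketch below assumes that weights and gradients share one dtype and that Adam keeps two float32 moment tensors per parameter. It ignores activations, buffers and any float32 master copy of the weights that mixed-precision setups may keep, so treat it as a rough floor, not a profiler.

```python
# Bytes per element for common training dtypes.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}
GIB = 2 ** 30


def weights_gib(n_params: int, dtype: str = "float32") -> float:
    """Memory for the model weights alone."""
    return n_params * BYTES_PER_ELEMENT[dtype] / GIB


def adam_training_gib(n_params: int, dtype: str = "float32") -> float:
    """Weights + gradients in `dtype`, plus Adam's two float32 moment estimates (activations ignored)."""
    per_param = 2 * BYTES_PER_ELEMENT[dtype] + 2 * BYTES_PER_ELEMENT["float32"]
    return n_params * per_param / GIB
```

For a model with 2**30 (about 1.07 billion) parameters, this gives 16 GiB in float32 versus 12 GiB with bfloat16 weights and gradients, so in this rough model the fixed float32 optimizer states become the dominant cost once weights and gradients shrink.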

Writing modular MLOps-ready Python code for easy explainability and interpretation | 7 min | MLOps | Samar Deen, Ceren Altincekic | Data Science at Microsoft Blog

Covers what it takes to turn Python scripts into fully fledged, production-ready code for real business use cases, with an overview of the Python main function and why it matters for getting code production-ready.
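A minimal sketch of the pattern, with hypothetical names (`clean_scores` and the scoring job itself are made up for illustration): business logic stays in importable, testable functions, argument parsing is isolated, and the `if __name__ == "__main__":` guard ensures the job runs only when the file is executed directly, not when it is imported by tests or a pipeline.

```python
import argparse
from typing import List, Optional


def clean_scores(scores: List[float]) -> List[float]:
    """Pure, importable business logic (hypothetical step): drop negative scores."""
    return [s for s in scores if s >= 0]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Keep CLI parsing separate so it can be tested with an explicit argv list."""
    parser = argparse.ArgumentParser(description="Hypothetical scoring job.")
    parser.add_argument("scores", nargs="*", type=float, help="Raw scores to clean.")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    """Entry point: wire parsing, logic and output together, and nothing else."""
    args = parse_args(argv)
    cleaned = clean_scores(args.scores)
    print(f"kept {len(cleaned)} of {len(args.scores)} scores")


if __name__ == "__main__":
    # Runs only when executed as a script, never on import.
    main()
```

Saved as a hypothetical `score.py`, running `python score.py 1 -1 3` would print `kept 2 of 3 scores`, while a test can simply `import` the module and call `clean_scores` directly.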

How to use LLMs for data enrichment in BigQuery? | 16 min | LLMOps | Piotr Pilis | GetInData Blog

This post details the integration of LLMs with Google's BigQuery for data enrichment. By leveraging Cloud Functions and BigQuery Remote Functions, you can easily interface BigQuery with LLM APIs. How can dbt help with data transformations? How should you address the limitations and security concerns of LLMs?

LLMs with BigQuery offer an easy-to-deploy, cost-effective way to enhance data analysis capabilities.
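For context on the Remote Functions side: BigQuery sends the Cloud Function a JSON body whose `calls` field holds one argument list per row, and expects back a JSON object whose `replies` list has exactly one result per call, in the same order (errors are reported via `errorMessage` with a non-200 status). A minimal sketch of that mapping follows. `enrich_text` is a hypothetical placeholder for the actual LLM API call, and the HTTP wrapping (for example, passing `request.get_json()` from a Cloud Functions entry point) is left out so the sketch runs offline.

```python
from typing import Any, Dict, List, Optional


def enrich_text(text: str) -> str:
    """Hypothetical stand-in for an LLM API call; a real version would send `text` to the model."""
    return f"summary: {text[:20]}"


def handle_remote_function_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Turn BigQuery's batched `calls` into `replies`: one reply per row, same order."""
    try:
        replies: List[Optional[str]] = []
        for args in body["calls"]:
            value = args[0]
            # Pass SQL NULLs through as NULLs instead of sending them to the model.
            replies.append(None if value is None else enrich_text(str(value)))
        return {"replies": replies}
    except (KeyError, IndexError, TypeError) as exc:
        # A real HTTP handler would return this with a non-200 status code.
        return {"errorMessage": f"malformed request: {exc!r}"}
```

Keeping the reply list strictly aligned with `calls` matters more than anything else here: BigQuery matches replies to rows by position.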

NEWS: COURSE

Building Maintainable Data Pipelines | 12 min intro | MLOps | Kedro

Finally! A free MLOps course from Kedro! On the agenda:

How to get started with Kedro
How to run Kedro pipelines
Kedro project deployed on Apache Airflow
Kedro nodes
and more
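Kedro's real API lives in `kedro.pipeline` (`node`, `pipeline`). As a dependency-free illustration of the idea rather than Kedro's implementation, the toy below treats each node as a plain function with named inputs and a single named output (a simplification; Kedro nodes can have several), and a hypothetical `run_pipeline` executes nodes in dependency order, whatever order they are declared in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(eq=False)
class Node:
    """Toy node: a plain function plus the dataset names it reads and writes."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run each node once all its named inputs exist (a toy dependency resolver)."""
    data = dict(catalog)  # don't mutate the caller's catalog
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = {name for n in pending for name in n.inputs if name not in data}
            raise ValueError(f"unresolvable inputs: {sorted(missing)}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data
```

Because nodes only refer to datasets by name, they stay pure functions that can be unit-tested on their own, which is the main point of structuring code this way.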

As you can see in the intro video, our DATA Pill community contributes quite a bit to Kedro development. Did you notice GetInData's mention, or Marcin Zabłocki listed as a committer (he's an active DATA Pill contributor who gets FRIENDS jokes)? Marcin, we are proud!

BTW, you can schedule a free MLOps consultation with Marcin.

Just saying ;)

DATA TUBE

The story of the MLOps platform that makes you productive, everywhere! | 26 min | MLOps Platform | Marcin Zabłocki | Big Data Warsaw

A recording of a presentation from the Big Data Technology Warsaw ’23 conference.

There is a wide selection of managed and cloud-native machine learning services on which you can run your data science pipelines and deploy your trained models, but no single way of interacting with platforms like Amazon SageMaker, Google Vertex AI, Microsoft Azure ML and Kubeflow. In this presentation, you will learn how battle-tested technologies such as Kedro, MLflow and Terraform can make your data scientists' lives easier and more productive, regardless of which cloud provider you use.

GitHub for Kedro plugins: https://github.com/orgs/getindata/repositories?q=kedro&type=all&language=&sort=

Building ML pipelines with Kedro and Vertex AI on GCP | 1 h 5 min | MLOps Workshop | Michał Bryś | GetInData

Two practical exercises to help you build the ML pipeline yourself in an hour (the GitHub links are on the YouTube page)
Why do we need a pipeline for Machine Learning models?
Kedro, an open-source Python framework for creating reproducible, maintainable and modular data science code

PODCAST

MLOps in the Cloud at Swedbank - Enterprise Analytics Platform | 55 min | MLOps + Cloud | Adam Kawa & Varun Bhatnagar | Radio Data Podcast

An overview of the solution - What is an Enterprise Analytics Platform (EAP)?
Evolution of MLOps at Swedbank
Iterative development for ML models - How can one improve the iterative development process for ML models?
Key takeaways and lessons learned from the ML cloud transformation.

Best Practices for Building LLM-Backed Applications | 53 min | LLMOps | Ben Lorica & Waleed Kadous | The Data Exchange Podcast

Open Source LLMs: when and how to use them
Code Llama vs. GitHub Copilot
Deploying open source LLMs
Reimagining "AutoML" in the age of LLMs
AMD and other hardware options for LLM inference

2023-12-20 16:50