DATA Pill feed

DATA Pill #096 - Full Steam Ahead: Riding the MLOps Wave


Enabling near real-time data analytics on the data lake | 8 min | Data Science | Shi Kai Ng, Shuguang Xiang | Grab Tech Blog
Data lakes support efficient data processing and analytics by serving as a bridge between analytics and production. Managing frequent data updates is a challenge, however, and it has led to the adoption of the Hudi format for its fast writes and ACID compliance.
10 Best Open-Source Monitoring Tools for DevOps in 2024 | 9 min | DevOps | Eduardo Messuti | StatusPal Blog
The article covers the following open-source monitoring and observability tools that modern DevOps teams should be aware of in 2024:

  • Checkmk
  • HyperDX

and more.
Data lakehouse with Snowflake Iceberg tables - introduction | 9 min | Data Engineering | Michał Rudko | GetInData | Part of Xebia Blog
Snowflake introduces data lakehouses, combining the benefits of data warehouses and data lakes with the Iceberg format to address their limitations and improve cost-efficiency, flexibility, and security. This series explores Snowflake Iceberg Tables and their advantages and provides a blueprint for adoption.
Transforming data science with Vertex AI: Telepass journey into MLOps | 7 min | MLOps | Gioia Sarti | Google Cloud Community Blog
This article explores Telepass' adoption of MLOps, detailing initial hurdles, architectural outcomes, and benefits. It highlights key lessons and future directions, offering insights into Telepass' transition to a modern ML platform with Google Cloud and Go Reply.
Demystifying MLOps: From Notebook to ML Application | 12 min | MLOps | Yke Rusticus | Xebia Blog
This post demystifies MLOps and takes you through the process of going from a notebook to your very own industry-grade ML application. The first part will be about the what and why of MLOps and the second part about technical aspects of MLOps.


Apache Kafka: Architecture, Real-Time CDC, and Python Integration | 20 min | Data Streaming | Ahmed Sayed | Personal Blog
Apache Kafka's architecture centers on fundamental components such as producers, consumers, brokers, topics, partitions, and ZooKeeper for coordination. These elements are crucial for effective data streaming, enabling Kafka's high performance in real-time messaging and data processing scenarios. This tutorial delves into the architecture of Kafka, its key components, and how to interact with Kafka using Python.
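As a rough illustration of how topics, partitions, and offsets relate, here is a toy in-memory model in plain Python. It is not taken from the tutorial and does not use any Kafka client library: `ToyTopic` and its methods are hypothetical names, and `zlib.crc32` stands in for the murmur2 hash that real Kafka clients use for keyed records.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a Kafka topic: a fixed set of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: crc32 replaces Kafka's murmur2 hash; the idea is the same,
        # a stable hash of the key modulo the number of partitions.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        # Append the record to its partition and return (partition, offset).
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition, offset):
        # Return every record in the partition from the given offset onward.
        return self.partitions[partition][offset:]


topic = ToyTopic("orders", num_partitions=3)
first = topic.produce("user-1", "created")
second = topic.produce("user-1", "paid")
```

Records with the same key land in the same partition, so `first` and `second` share a partition and differ only in offset, which is what preserves per-key ordering for consumers.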
MLflow on AWS with Pulumi: A Step-by-Step Guide | 14 min | MLOps | Bojan Jakimovski | MLOps Community Blog
This tutorial covers deploying an MLflow tracking server on AWS with Pulumi, using Python to automate AWS resource setup, minimizing manual errors, and ensuring a scalable, reproducible ML environment.


Coding LLaMA-2 from scratch in PyTorch - Part 1, Part 2 | 2 h | LLM | Prince Canuma
In this video series, you will learn how to train and fine-tune the LLaMA 2 model from scratch. The goal is to code LLaMA 2 from scratch in PyTorch to create models with 100M, 250M, and 500M parameters.

In the first video, you'll learn about transformer architecture in detail and implement a basic model with 100M params using PyTorch.

In the second video, you'll learn in detail about different attention mechanisms (MHA, MQA, and GQA) and how to implement them in the 100M model we built last time.
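The three attention variants differ mainly in how many key/value heads the query heads share. The sketch below is not code from the videos; it is a minimal plain-Python illustration, and the function name `kv_head_map` is hypothetical. It assumes the usual grouping in which consecutive query heads share one key/value head.

```python
def kv_head_map(n_heads, n_kv_heads):
    """Return, for each query head, the index of the key/value head it reads from."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    group_size = n_heads // n_kv_heads  # query heads per key/value head
    return [head // group_size for head in range(n_heads)]


mha = kv_head_map(8, 8)  # MHA: every query head has its own key/value head
mqa = kv_head_map(8, 1)  # MQA: all query heads share a single key/value head
gqa = kv_head_map(8, 2)  # GQA: query heads are split into groups of four
```

With 8 query heads, MHA and MQA are the two extremes of the same scheme, and GQA sits between them, trading some key/value capacity for a smaller key/value cache.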


The Power of Vector Databases and Semantic Search | 37 min | LLM | Elan Dekel, Richie Cotton | DataCamp Podcast
Explore LLMs, vector databases and the best use-cases for them, semantic search, the tech stack for AI applications, emerging roles within the AI space, the future of vector databases and AI, and much more.


Gain a better understanding of how to leverage LLM technology responsibly and achieve operational excellence in your organization.

What to expect:

  • A look at how LLMs and GenAI are reshaping organizational landscapes with real-world case studies showcasing customers' success stories.
  • Practical steps to prepare your infrastructure, workforce, and processes for LLM adoption.
  • Expert advice on data privacy, security, and staying ahead of regulatory shifts.
  • Overview of ethical LLM guidelines, focusing on mitigating biases and ensuring transparency.
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill