DATA Pill #147 - Are you ready for MLOps? 🫵 DuckDB goes distributed?

ARTICLES

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data | 5 min | Data Engineering | Mehdi Ouazza | Personal Blog

DeepSeek’s smallpond extends DuckDB to distributed computing using Ray and a custom storage system, balancing scalability with added complexity.

How much does it cost to run DeepSeek-R1 locally? | 3 min | AI | Mehul Gupta | Data Science in your pocket

Running DeepSeek-R1 (671B params) locally? It’ll set you back ~$106K in hardware alone—GPUs, RAM, storage, and cooling make it an enterprise-scale investment.

Estimating Incremental Lift in Customer Value (Delta CV) using Synthetic Control | 7 min | Data Engineering | Mahshid Moha | The PayPal Technology Blog

PayPal uses ML-driven causal inference to measure how new product adoption impacts revenue and engagement, refining decision-making with user-matching techniques.

TUTORIALS

Step-by-Step Guide to Boosting Enterprise RAG Accuracy | 8 min | RAG | Madhukar Kumar | Software, AI and Marketing

Improve retrieval from PDFs using semantic chunking, entity extraction, and knowledge graphs—enhancing RAG/KAG performance.

FacetController: How we made infrastructure changes at Lyft simple | 7 min | DevOps | Miguel Molina, Arvind Subramanian | Lyft Engineering Blog

Lyft’s Kubernetes-based FacetController automates deployments, scales infra efficiently, and eliminates mass redeployments.

The caching strategy of our Teads SSP | 13 min | Data Engineering | Tristan Sallé | Teads Engineering Blog

Teads scales its SSP to handle massive traffic using Redis caching, Kubernetes, and automated rollbacks for reliability.

Are you ready for MLOps? 🫵 | 6 min | MLOps | Jeroen Overschie | Xebia Blog

MLOps without a DevOps foundation is a recipe for failure—this blog breaks down key adoption steps and best practices.

A practical n8n workflow example from A to Z — Part 1: Use Case, Learning Journey and Setup | 19 min | AI | Personal Blog

A deep dive into automating knowledge collection, AI-powered summarization, and LinkedIn post generation using n8n.

DATA TUBE

Agentic AI: A Progression of Language Model Usage | 57 min | AI | Insop Song | Stanford Online

A webinar on agentic LMs—covering planning, tool usage, and iterative workflows to enhance AI performance.

CONFS, EVENTS AND MEETUPS

How Drata Built a Secure, Real-Time Agentic AI System in 60 Days | Webinar | 12th March

Discover how Drata uses Change Data Capture (CDC) and Apache Flink to build a scalable RAG system, ensuring compliance and real-time data ingestion with Decodable and Vellum.

PINNACLE PICKS

Your top picks from last week:

SQL is all you need! | 5 min | Data Analytics | Paul Marcombes | Google Cloud - Community Blog

SQL is at the heart of modern data operations, eliminating the need for external tools and custom scripts. Learn how Nickel’s approach enables self-service analytics through a governed SQL function catalog using BigFunctions, an open-source framework.

AI-Ready Organization: How AI is Changing the Hiring Process | 3 min | AI | Giovanni Lanzani | Xebia Blog

AI is transforming recruitment by automating screening and improving efficiency. However, human judgment remains irreplaceable. This article explores how organizations can optimize AI for fair and ethical hiring decisions.

Cloud Native Warsaw - March 2025 Edition | Warsaw | 12th March

Learn how to deploy AI inference workloads on Amazon EKS using Terraform, Triton Inference Server, and Prometheus Adapter for autoscaling, monitoring, and optimization.

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

2025-03-03 13:27