DATA Pill feed

DATA Pill #147 - Are you ready for MLOps? 🫵 DuckDB goes distributed?

ARTICLES

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data | 5 min | Data Engineering | Mehdi Ouazza | Personal Blog
DeepSeek’s smallpond extends DuckDB to distributed computing using Ray and a custom storage system, balancing scalability with added complexity.
Running DeepSeek-R1 (671B params) locally? It’ll set you back ~$106K in hardware alone—GPUs, RAM, storage, and cooling make it an enterprise-scale investment.
Estimating Incremental Lift in Customer Value (Delta CV) using Synthetic Control | 7 min | Data Engineering | Mahshid Moha | The PayPal Technology Blog
PayPal uses ML-driven causal inference to measure how new product adoption impacts revenue and engagement, refining decision-making with user-matching techniques.

TUTORIALS

Step-by-Step Guide to Boosting Enterprise RAG Accuracy | 8 min | RAG | Madhukar Kumar | Software, AI and Marketing
Improve retrieval from PDFs using semantic chunking, entity extraction, and knowledge graphs—enhancing RAG/KAG performance.
FacetController: How we made infrastructure changes at Lyft simple | 7 min | DevOps | Miguel Molina, Arvind Subramanian | Lyft Engineering Blog
Lyft’s Kubernetes-based FacetController automates deployments, scales infra efficiently, and eliminates mass redeployments.
The caching strategy of our Teads SSP | 13 min | Data Engineering | Tristan Sallé | Teads Engineering Blog
Teads scales its SSP to handle massive traffic using Redis caching, Kubernetes, and automated rollbacks for reliability.
Are you ready for MLOps? 🫵 | 6 min | MLOps | Jeroen Overschie | Xebia Blog
MLOps without a DevOps foundation is a recipe for failure; this blog post breaks down key adoption steps and best practices.
A deep dive into automating knowledge collection, AI-powered summarization, and LinkedIn post generation using n8n.

DATA TUBE

Agentic AI: A Progression of Language Model Usage | 57 min | AI | Insop Song | Stanford Online
A webinar on agentic LMs—covering planning, tool usage, and iterative workflows to enhance AI performance.

CONFS, EVENTS AND MEETUPS

Discover how Drata uses Change Data Capture (CDC) and Apache Flink to build a scalable RAG system, ensuring compliance and real-time data ingestion with Decodable and Vellum.

PINNACLE PICKS

Your top picks from last week:
SQL is all you need! | 5 min | Data Analytics | Paul Marcombes | Google Cloud - Community Blog
SQL is at the heart of modern data operations, eliminating the need for external tools and custom scripts. Learn how Nickel’s approach enables self-service analytics through a governed SQL function catalog using BigFunctions, an open-source framework.
AI-Ready Organization: How AI is Changing the Hiring Process | 3 min | AI | Giovanni Lanzani | Xebia Blog
AI is transforming recruitment by automating screening and improving efficiency. However, human judgment remains irreplaceable. This article explores how organizations can optimize AI for fair and ethical hiring decisions.
Learn how to deploy AI inference workloads on Amazon EKS using Terraform, Triton Inference Server, and Prometheus Adapter for autoscaling, monitoring, and optimization.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub