ARTICLES
Scaling Pinterest ML Infrastructure with Ray: From Training to End-to-End ML Pipelines | 8 min | ML | Andrew Yu, Jiahuan Liu, Qingxian Lai, Kritarth Anand | Pinterest Engineering Blog
Pinterest unified its ML stack using Ray to enable scalable training, hyperparameter tuning, and modular end-to-end pipelines.

Measuring Commercial Impact at Scale at Canva | 6 min | Data Analytics | Jun Ye | Canva Engineering Blog
Canva connects experimentation with business outcomes by measuring impact at scale across its product ecosystem.

The Transactional Outbox Pattern: Transforming Real-Time Data Distribution at SeatGeek | 7 min | Data Engineering | ChairNerd Blog
SeatGeek shares how it ensures reliable and fault-tolerant event publishing across microservices using the transactional outbox pattern.

High concurrency mode for Fabric notebooks in pipelines | 4 min | Data Engineering | Adrian Chodkowski | SeeQuality Blog
Microsoft Fabric notebooks now support high-concurrency mode for faster and more efficient pipeline execution.
Fine-Tuning LLMs is a Huge Waste of Time | 8 min | ML | Personal Blog
This opinionated take argues that RAG and prompt engineering are often more effective than fine-tuning large language models.
TUTORIAL
Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix | 15 min | Data Management | Alex Hutter, Alexandre Bertails, Claire Wang, Haoyuan He, Kishore Banala, Peter Royal, Shervin Afshar | Netflix Engineering Blog
Netflix introduces its Unified Data Architecture (UDA), which lets teams model a business concept once and represent it consistently across data systems.

NEWS
Introducing BigQuery ObjectRef: Supercharge your multimodal data and AI processing | 7 min | Data Analytics | Jamy Su, Gaurav Soni | Google Cloud Blog
Google BigQuery adds ObjectRef, a data type that references unstructured objects such as PDFs, images, and audio so they can be queried alongside structured data.

TOOLS
A new open-source Python notebook for building reactive dashboards with reproducible, modular code and minimal boilerplate.
Introducing Firebolt Core - Self-Hosted Firebolt, For Free, Forever | 3 min | Data Warehouse | Mosha Pasumansky, Benjamin Wagner | Firebolt Blog
Firebolt releases its high-speed query engine as a free, self-hosted option for local or hybrid data environments.
A solution accelerator that connects Genie with Slack through n8n to trigger workflows and automate operations from chat.
DATA TUBE
A Framework for GenAI App and Agent Development | 52 min | GenAI | Jerry Liu, Richie Cotton | DataCamp
In this podcast, the LlamaIndex CEO breaks down how to build GenAI systems that handle complex document workflows and scale in the enterprise.
PINNACLE PICKS
Your top picks from last week:
How did Meta modernize their lakehouse? | 10 min | Lakehouse | Vu Trinh | Data Engineer Things Blog
How Meta’s initial approach caused problems, and how the company worked to fix them at organizational scale.
Preventing Revenue Loss With Real-Time A/B Test Monitoring | 15 min | Streaming | Lukasz Krawiec | Expedia Group Technology - Engineering Blog
How Expedia uses real-time A/B test monitoring with Apache Flink to detect anomalies early, preventing revenue loss and improving experiment reliability.
Dimensional Data Modeling with Databricks | 15 min | Lakehouse Architecture | Mariusz Krajewski | Personal Blog
A practical guide to dimensional data modeling in Databricks using Delta Lake, Unity Catalog, and Delta Live Tables to build scalable, BI-ready star schemas and fact/dimension tables.
____________________
Have any interesting content to share in the DATA Pill newsletter?