DATA Pill feed

DATA Pill #162 - Netflix’s UDA, Claude’s Control Protocol, and the Kafka Fix That Saved 10M

ARTICLES

Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix | Data Architecture | 15 min | Alex Hutter, Alexandre Bertails, Claire Wang, Haoyuan He, Kishore Banala, Peter Royal, Shervin Afshar | Netflix Technology Blog
Netflix connects GraphQL, Avro, Iceberg, and more through a unified modeling layer backed by a knowledge graph. UDA makes schemas reusable, discoverable, and consistent across systems.
How Kafka Saved Our Payment System And Helped Us Scale to 10 Million Users | 5 min | System Design | Himanshu Singour | Personal Blog
A fragile payment flow was rebuilt as a scalable, event-driven architecture on Kafka. One topic, many consumers, instant results.
Claude’s 24,000-token system prompt reads like a runtime spec, not guidance. Full of logic trees, fallback rules, and control layers that shape every response.
Get your data ducks in a row with DuckLake | 14 min | Platform Engineering | Xe Iaso, Katie Schilling | Tigris Data Blog
DuckLake pairs SQL metadata with object storage, turning DuckDB into a portable lakehouse. It’s cloud-agnostic, serverless-ready, and built for nomadic compute.

TUTORIALS

How Nexthink built real-time alerts with Amazon Managed Service for Apache Flink | 10 min | Streaming Architecture | Nikos Tragaras, Raphaël Afanyan, Lorenzo Nicora, Simone Pomata, and Subham Rakshit | AWS Blog
From database polling to event-time alerting, Nexthink explains how they rebuilt monitoring with Apache Flink on AWS.

NEWS

Polaris now supports more parallel transactions with lower latency, thanks to a refactored JDBC-backed persistence layer.

TOOLS

Native data lineage in Debezium with OpenLineage | 3 min | Data Engineering | Fiore Mario Vitale | Debezium Blog
Track every CDC event from source tables to Kafka topics with built-in lineage metadata. Great for auditability and pipeline observability.
Tributary DuckDB Extension | 5 min | Data Engineering | Query.Farm Blog
Tributary lets you query Kafka topics from DuckDB using pure SQL. Ingest, analyze, or write back; ideal for streaming-first workflows.

DATA TUBE

Efficient NLP: Fine-Tuning Small Language Models Using Azure | 40 min | LLM | Ben Keen | MLOps London
Learn how to train and deploy small LLMs using Azure. A practical session packed with tooling, workflows, and use cases.
Alice 2: Building and Scaling an AI Agent During HyperGrowth | 11x | LangChain Interrupt | AI | 20 min | Sherwood Callaway, Keith Fearon | LangChain
Behind the scenes of building and scaling an AI agent inside a fast-moving company. Lessons on infra, quality, and shipping fast.

PINNACLE PICKS

Your top picks from last week:
The Lakehouse Is Dead. Long Live the Lakehouse | Data Engineering | 6 min | Thomas F McGeehan V | Personal Blog
A sharp, funny critique of today’s bloated lakehouse stacks that questions whether complexity has become the product. If you’ve ever wrangled Iceberg and wondered “why?”, read this.
AI Agent Architecture via A2A/MCP | 15 min | AI | Jeffrey Richter | Personal Blog
A hands-on breakdown of how to design autonomous AI agents that plan, act, and learn. If you’re building beyond chat interfaces, this is a practical architectural guide.
Introducing Agent Bricks: Auto-Optimized Agents Using Your Data | 6 min | AI | Xiangrui Meng, Kasey Uhlenhuth, Hanlin Tang, Patrick Wendell, Matei Zaharia | Databricks Blog
Agent Bricks is a new framework for building, monitoring, and deploying RAG agents at scale. Integrated with Unity Catalog and MLflow for production-readiness.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
2025-06-17 11:35