DATA Pill feed

DATA Pill #163 - Is dbt Core dead? Lakehouse at Meta. Flink and Streaming

ARTICLES

Operating Flink Is Hard: What does this really mean? And how to go about it? | Data Processing | 10 min | Sharon Xie | Decodable Blog
Why Flink stream jobs require microservice‑style discipline - capacity planning, checkpoint health, backpressure, ownership by multiple teams - and offers best practices for metrics, staging, and monitoring to manage complexity.
Preventing Revenue Loss With Real-Time A/B Test Monitoring | Streaming | 15 min | Lukasz Krawiec | Expedia Group Technology - Engineering Blog
How Expedia uses real-time A/B test monitoring with Apache Flink to detect anomalies early, preventing revenue loss and improving experiment reliability.
Is dbt Fusion the death of dbt Core? | 5 min | Platform Engineering | Toby Mao, Andrew Madson | Tabico Cloud Blog
Is dbt Core dead? What ‘source available’ really means, why dbt Labs’ shifting to Fusion, and how it marks the end of dbt’s open source innovation.
1/ Chain-of-thought prompting works because it slows down the model’s reasoning. Slower = smarter.
How did Meta modernize their lakehouse? | 10 min | Lakehouse | Vu Trinh | Data Engineer Things Blog
How Meta’s initial approach caused them troubles and their effort to fix them at the organizational scale.

TUTORIALS

Dimensional Data Modeling with Databricks | 15 min | Lakehouse Architecture | Mariusz Kujawski| Personal Blog
A practical guide to dimensional data modeling in Databricks using Delta Lake, Unity Catalog, and Delta Live Tables to build scalable, BI-ready star schemas and fact/dimension tables.
How to bridge that gap by combining ADF diagnostic settings with Databricks system tables. How to create a centralized overview to analyze the amount of data ingested and the end-to-end runtime for a specific use case.

NEWS

Collibra acquires data access startup Raito | 3 min | Data Governance | Rebecca Szkutak | TechCrunch Blog

TOOLS

Apache Flink MQTT Source Connector | 3 min | Streaming | George Leonard | Personal Blog
Enabling real-time ingestion of IoT data streams into Flink pipelines using Eclipse Paho.
Launching: The Boring Semantic Layer | 7 min | Data Engineering | Julien Hurault | Ju Data Engineering Newsletter
Introduces a lightweight Python semantic layer built on Ibis, designed for simplicity and version control.

PINNACLE PICKS

Your last week top picks:
Polaris now supports more parallel transactions with lower latency, thanks to a refactored JDBC-backed persistence layer.
How Nexthink built real-time alerts with Amazon Managed Service for Apache Flink | 10 min | Streaming Architecture | Nikos Tragaras, Raphaël Afanyan, Lorenzo Nicora, Simone Pomata, and Subham Rakshit | AWS Blog
From database polling to event-time alerting, Nexthink explains how they rebuilt monitoring with Apache Flink on AWS.
How Kafka Saved Our Payment System And Helped Us Scale to 10 Million Users | 5 min | System Design | Himanshu Singour | Personal Blog
A fragile payment flow became a scalable, event-driven architecture using Kafka. One topic, many consumers, instant results.
____________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
2025-06-26 22:39