DATA Pill feed

DATA Pill #168 - SQL is Back in ClickHouse, Kedro Hits 1.0, and LLMs Learn to Reason

ARTICLES

How we built fast UPDATEs for the ClickHouse column store: Part 1, Part 2 | 9 min | Databases | Tom Schreiber | ClickHouse Blog
ClickHouse now supports SQL-style UPDATE and DELETE plus new table engines tailored for high-performance workloads.
Anthropic uncovers how reasoning behavior in models comes from internal structure, not surface-level prompts.
Why Startups Are Betting Everything on Apache DataFusion | 5 min | Databases | Andrew Lamb | The New Stack Blog
Is Your Data AI Ready? Are You? | 4 min | Data Governance | Jennifer Belissent | Snowflake Blog
Snowflake explains how clean labeling, governance, and discovery turn raw data into model-ready inputs.

TUTORIAL

Stream Kafka Topic to the Iceberg Tables with Zero-ETL | 12 min | Data Streaming | Vu Trinh | Data Engineer Things
Learn how to stream Kafka data into Iceberg tables using Flink for real-time, zero-ETL pipelines.

NEWS

Announcing Kedro 1.0 | 6 min | ML | QuantumBlack, AI by McKinsey
Kedro reaches 1.0 with improved modularity, long-term support, and new hooks for ML pipelines.

TOOLS

From Stream to Lake: Hands-On with Fluss Tiering into Paimon on Minio | 5 min | Data Streaming | Yang Guo | Apache Fluss Blog
Apache Fluss brings transactional consistency to streaming pipelines, with built-in tiering into Paimon on MinIO.
Open-source FastAPI app that connects to Databricks Lakebase with built-in token refresh and DB optimizations.
MCP Server uses AI to analyze Spark jobs, surface bottlenecks, and improve performance across pipelines.

PODCAST

Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics | 55 min | Data Processing | Tobias Macey, Dan Sotolongo | Data Engineering Podcast
Explore how delayed view semantics and dynamic tables are reshaping incremental data workflows.

EVENTS, CONFS, AND MEETUPS

ML in PL | 15-18th October | Warsaw
ML in PL 2025 gathers top machine learning minds from academia and industry for keynotes, talks, tutorials, and panels on AI, open models, security, and more.

📢 New deadline for Call for Contributions applications: 08.08.2025

PINNACLE PICKS

Your top picks from last week:
Apache Flink Agents | Agentic AI
Build fault-tolerant, long-running AI agents directly on Flink using native state and streaming.
Test Driven Development (TDD) with dbt: Test First, SQL Later | 5 min | Data Engineering | Dumky de Wilde | Xebia Blog
Write tests before models and catch logic errors early.
The slow death of scaling and what comes next | 1 h 2 min | ML | Sara Hooker | Cohere
Sara Hooker explores the limits of scale in machine learning and what’s coming next for open research and efficient models.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub