DATA Pill feed

DATA Pill #169 - Persona vectors, HA Postgres on K8s, streaming lakehouses

ARTICLES

Anthropic introduces latent vectors that shape tone, expertise, and goals in LLMs without touching the prompt. A practical path toward more modular and controllable AI.
Achieving High Availability with distributed database on Kubernetes at Airbnb | Platform Engineering | 6 min | Artem Danilov | Airbnb Tech Blog
Airbnb explains how they built a multi-tenant database platform with native Kubernetes tools, custom operators, and automated failover strategies.
5 Ways Dremio Makes Apache Iceberg Lakehouses Easy | 8 min | Data Lakehouse | Alex Merced | Dremio Blog
Dremio cuts the complexity of working with Iceberg using catalog federation, instant metadata refresh, and no-copy table creation.
Five Python Tips You Won’t Find in Most Curriculums | 5 min | Lucy Sheppard | Data Engineering | Xebia Blog
Quick wins for cleaner Python. Learn about else clauses in loops, function factories, and assignment expressions that keep your code lean and expressive.

TUTORIAL

Build a Streaming Lakehouse with Flink, Kafka, Iceberg, and Polaris | 8 min | Data Engineering | Gilles Philippart | Personal Blog
A hands-on guide to setting up a streaming data lakehouse with schema evolution and end-to-end reliability using open-source tools.

NEWS

New AI-native connectors, unified batch and stream processing, improved autoscaling, and hardened production stability make this Flink's most capable release yet.
Spark pipelines can now read Snowflake data without data movement. This new integration simplifies hybrid workflows and keeps full access control in place.

TOOLS

Introducing LangExtract: A Gemini powered information extraction library | 4 min | NLP | Akshay Goel, Atilla Kiraly | Google for Developers Blog
A lightweight Python library for information extraction with built-in schema validation and few-shot support. Built for fast, type-safe NLP pipelines.
Databricks Labs LSQL | LLM | Databricks Labs
lsql turns LLMs into SQL-native copilots with prompt optimization and semantic query generation. Great for analytics teams building natural language interfaces.

EVENTS, CONFS, AND MEETUPS

Data Expo 2025 | 10-11th September | Utrecht
The largest data event in the Netherlands returns with 100+ vendors, 150+ sessions, and a packed agenda for engineers, scientists, and data leaders. Free to attend.

PINNACLE PICKS

Your last week top picks:
Announcing Kedro 1.0 | 6 min | ML | QuantumBlack, AI by McKinsey
Kedro reaches 1.0 with improved modularity, long-term support, and new hooks for ML pipelines.
Stream Kafka Topic to the Iceberg Tables with Zero-ETL | 12 min | Data Streaming | Vu Trinh | Data Engineer Things
Learn how to stream Kafka data into Iceberg tables using Flink for real-time, zero-ETL pipelines.
Why Startups Are Betting Everything on Apache DataFusion | Databases | 5 min | Andrew Lamb | The New Stack Blog
DataFusion is winning over startups with its fast Rust-based query engine and plug-and-play architecture.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
Made on
Tilda