ARTICLES
Making the Right Choice: Flink or Kafka Streams? | 9 min | Stream Processing | Juliusz Nadbereżny | GetInData | Part of Xebia Blog
A hands-on comparison of Flink and Kafka Streams, breaking down time handling, state, deployment, and scalability. Flink comes out on top for flexibility and long-term reliability.
Faster, Better, Cheaper: How to Measure the Business Impact of LLMs | 3 min | LLM | Eva Bosma | Xebia Blog
LLMs boost productivity and reduce costs—but only when used strategically. Here's how to measure ROI across real business use cases.

Querying Structured and Unstructured Data Using LLMs — No PhD Required | 10 min | LLM | Peter Lawrence | Personal Blog
Combining LLMs with a unified knowledge model allows natural language queries across structured and unstructured data—making complex questions simple.
TUTORIALS
Building a modern Data Warehouse from scratch | 15 min | Data Warehouse | Rihab Feki | Personal Blog
From architecture to analytics—follow this practical guide to designing a scalable, SQL Server-based data warehouse using the Medallion Architecture.
Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode | 5 min | Data Engineering | Aarthi Srinivasan, Parul Saxena | AWS Blog
Learn how to manage hybrid access to Iceberg tables, using IAM for writes and Lake Formation for fine-grained reads—without disrupting existing setups.

Real Time Fraud Detection Using Apache Flink — Part 1 | 11 min | Stream Processing | Shriram Ravichandran | Yugen.ai Technology Blog
Detect suspicious transactions on the fly with Apache Flink, using stateful processing and event-time logic. Complete with Kafka integration and production tips.
NEWS
Apache Airflow® 3 is Generally Available! | 3 min | Data Engineering | Kaxil Naik, Vikram Koka | Apache Airflow Blog
Airflow’s biggest release yet—featuring DAG Versioning, Event-Driven Scheduling, a new UI, and support for multi-cloud task execution.
DATA LIBRARY
How I use LLMs | 2 h 11 min | LLM | Andrej Karpathy | Personal Channel
A practical, example-rich walkthrough of how LLMs are applied in daily workflows—perfect for both enthusiasts and builders.
CONFS, EVENTS AND MEETUPS
PDFs – When a Thousand Words Are Worth More Than a Picture (or Table) | Online | May 1st
In this hands-on session, learn how multimodal models break down complex tables and figures into question-ready chunks—boosting the quality of inputs for vector databases and RAG systems.
Practical GenAI: Building LLM-powered Applications | Online | May 1st
GenAI is full of potential—but building LLM apps can feel daunting. This hands-on workshop makes it practical, guiding you through Python-based LLM interaction, creating your first RAG, and building your first agent.
PINNACLE PICKS
Your top picks from last week:
Data quality on Databricks - Spark Expectations | 5 min | Data Quality | Bartosz Konieczny | Waiting for Code Blog
Understand how to enforce data quality in Apache Spark using Spark Expectations. This tutorial covers defining and applying various validation rules.
GenAI + dbt = dbt-sqlx: The Easiest Way to Switch SQL Dialects | 4 min | GenAI | Nikhil Suthar | Data Engineer Things
Discover dbt-sqlx, a GenAI-powered CLI tool that translates dbt models across SQL dialects, simplifying warehouse migrations and reducing manual rewrites.
10 tips for migrating from SAS Viya to Snowflake + dbt | 3 min | Analytics Engineering | Lasse Benninga | Xebia Blog
Get practical advice on transitioning from SAS Viya to Snowflake and dbt. This guide covers handling true deletes, SAS-specific logic, and implementing robust testing practices.
________________________
Have any interesting content to share in the DATA Pill newsletter?