DATA Pill #154 - Flink or Kafka Streams? Apache Airflow® 3

ARTICLES

Making the Right Choice: Flink or Kafka Streams? | 9 min | Stream Processing | Juliusz Nadbereżny | GetInData | Part of Xebia Blog
A hands-on comparison of Flink and Kafka Streams, breaking down time handling, state, deployment, and scalability. Flink comes out on top for flexibility and long-term reliability.
LLMs boost productivity and reduce costs—but only when used strategically. Here's how to measure ROI across real business use cases.
Querying Structured and Unstructured Data Using LLMs — No PhD Required | 10 min | LLM | Peter Lawrence | Personal Blog
Combining LLMs with a unified knowledge model allows natural language queries across structured and unstructured data—making complex questions simple.

TUTORIALS

Building a modern Data Warehouse from scratch | 15 min | Data Warehouse | Rihab Feki | Personal Blog
From architecture to analytics—follow this practical guide to designing a scalable, SQL Server-based data warehouse using the Medallion Architecture.
Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode | 5 min | Data Engineering | Aarthi Srinivasan, Parul Saxena | AWS Blog
Learn how to manage hybrid access to Iceberg tables, using IAM for writes and Lake Formation for fine-grained reads—without disrupting existing setups.
Real Time Fraud Detection Using Apache Flink — Part 1 | 11 min | Stream Processing | Shriram Ravichandran | Yugen.ai Technology Blog
Detect suspicious transactions on the fly with Apache Flink, using stateful processing and event-time logic. Complete with Kafka integration and production tips.

NEWS

Apache Airflow® 3 is Generally Available! | 3 min | Data Engineering | Kaxil Naik, Vikram Koka | Apache Airflow Blog
Airflow’s biggest release yet—featuring DAG Versioning, Event-Driven Scheduling, a new UI, and support for multi-cloud task execution.

DATA LIBRARY

How I use LLMs | 2 h 11 min | LLM | Andrej Karpathy | Personal Channel
A practical, example-rich walkthrough of how LLMs are applied in daily workflows—perfect for both enthusiasts and builders.

CONFS, EVENTS AND MEETUPS

In this hands-on session, learn how multimodal models break down complex tables and figures into question-ready chunks—boosting the quality of inputs for vector databases and RAG systems.
GenAI is full of potential—but building LLM apps can feel daunting. This hands-on workshop makes it practical, guiding you through Python-based LLM interaction, creating your first RAG, and building your first agent.

PINNACLE PICKS

Your top picks from last week:
Data quality on Databricks - Spark Expectations | 5 min | Data Quality | Bartosz Konieczny | Waiting for Code Blog
Understand how to enforce data quality in Apache Spark using Spark Expectations. This tutorial covers defining and applying various validation rules.
GenAI + dbt = dbt-sqlx: The Easiest Way to Switch SQL Dialects | 4 min | Gen AI | Nikhil Suthar | Data Engineer Things
Discover dbt-sqlx, a GenAI-powered CLI tool that translates dbt models across SQL dialects, simplifying warehouse migrations and reducing manual rewrites.
10 tips for migrating from SAS Viya to Snowflake + dbt | 3 min | Analytics Engineering | Lasse Benninga | Xebia Blog
Get practical advice on transitioning from SAS Viya to Snowflake and dbt. This guide covers handling true deletes, SAS-specific logic, and implementing robust testing practices.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub