ARTICLES
How AI and Machine Learning are Fixing Data Quality Fast | 4 min | AI | Michał Kardach, Katarzyna Kusznierczuk | GetInData | Part of Xebia Blog
Discover how AI-driven tools like Monte Carlo and Talend Data Fabric improve data quality for faster insights.
TUTORIALS
Every System is a Log: Avoiding coordination in distributed applications | 13 min | Software Engineering | Stephan Ewen, Jack Kleeman, Giselle van Dongen | Restate Blog
Reduce complexity in distributed applications by using a single log to manage failures and concurrency.

Don’t count rows in ETL, use Delta Log metrics! | 7 min | Data Engineering | Adrian Chodkowski | Seequality Blog
Leverage Delta Lake’s transaction log for automated ETL monitoring and performance optimization.
From RAG to fabric: Lessons learned from building real-world RAGs at GenAIIC – Part 1 | 8 min | RAG | Aude Genevay | AWS Blog
Explore RAG architecture and strategies for optimizing retrieval-augmented generation models.

Databricks Lakehouse Optimization: A deep dive into Delta Lake’s VACUUM | 4 min | Data Engineering | Frank Mbonu | Xebia Blog
Learn to cut Databricks storage costs with VACUUM and advanced cleanup techniques.
DATA LIBRARY
Agents | AI | Julia Wiesinger, Patrick Marlow, Vladimir Vuskovic | Google
See how AI agents enhance decision-making with external tool access for real-time actions.
TOOL
Cellm | LLM
An Excel extension that integrates Large Language Models (LLMs) like ChatGPT into formulas.
DATA TUBE
“The Coding Machine” at Meta | 1 h 15 min | Software Engineering | Gergely Orosz, Michael Novati | The Pragmatic Engineer
Inside Meta’s engineering culture, career growth, and hiring process with insights from top engineers.
CONFS, EVENTS AND MEETUPS
Looker Community Event #6 | Rotterdam | 6th February
Join talks on Looker Explore Assistant, BI transformation, and AI-powered reporting.
DuckDB Amsterdam Meetup #2 | Amsterdam | 20th February
Dive into DuckDB with Unity Catalog, WASM-powered spreadsheets, and Postgres Data Warehouses.
PINNACLE PICKS
Your top picks from last week:
Airflow in a multi-teams / multi-tenant environment. Deployment strategies | 22 min | Data Engineering | Kacper Muda | GetInData | Part of Xebia Blog
Explore deployment solutions for Apache Airflow in multi-team environments. Highlights include resource isolation, shared access options, and a glimpse at Airflow 3's upcoming capabilities.

drawdata | Data Visualization
A Python library for interactive dataset creation directly in Jupyter notebooks. Perfect for machine learning tutorials and algorithm demos.
Paimon 1.0: Unified Lake Format for Data + AI | 4 min | AI | Martin Grund, Stefania Leone | Alibaba Cloud Blog
Introducing Apache Paimon, a groundbreaking data lakehouse solution integrating batch and streaming operations for real-time AI workflows.
________________________
Have any interesting content to share in the DATA Pill newsletter?