ARTICLES
Tackling AI Hallucinations in LLM Apps | 6 min | LLM | Denis Kazakov | Gusto Engineering
Explore how LLM confidence scores help filter poor-quality responses, improving AI reliability in customer support and automated workflows.

10 Future Apache Iceberg Developments to Look forward to in 2025 | 12 min | Data Engineering | Alex Merced | Data, Analytics & AI with Dremio
Apache Iceberg is evolving with scan planning, federated catalogs, geospatial support, and delete file optimizations, enhancing data governance and performance.
Hard-Earned Lessons from a Year of Building AI Agents | 7 min | AI | Maya Murad | Personal Blog
The author shares insights from a year of AI agent development, emphasizing transparency, governance, and open-source adoption to drive real-world impact.
Open Standards for Data Lineage: OpenLineage for Batch AND Streaming | 11 min | Data Engineering | Kai Waehner | Personal Blog
Understand how OpenLineage is shaping data governance in streaming platforms like Kafka and Flink, ensuring enterprise-wide visibility.

Why a Strong Data Management Strategy Starts with a Conceptual Information Model | 5 min | Data Management | Steven Nooijen | Xebia Blog
A conceptual information model streamlines data alignment, governance, and traceability, supporting scalable and efficient operations.
TUTORIALS
What is the best way to release code in Microsoft Fabric? | 4 min | Data Engineering | Matias Samblancat | Personal Blog
Compare Fabric Deployment Pipelines (easy, no-code) vs. DevOps REST API (customizable automation) to find the best deployment approach.
CI/CD in Microsoft Fabric with Azure DevOps using fabric-cicd accelerator | 5 min | DevOps | Matias Samblancat | Personal Blog
Learn how to set up fabric-cicd, a new open-source tool, for automating deployments in Microsoft Fabric.
NEWS
Data Science Agent in Colab: The future of data analysis with Gemini | 3 min | AI | Jane Fine, Mahi Kolla, Ilai Soloducho | Google for Developers
Google’s Data Science Agent automates Colab notebook creation, helping researchers and developers streamline data analysis with Gemini AI.
DATA LIBRARY
Smarter Data, Brighter Decisions: Data Quality Tools Comparison | Data Quality | GetInData | Part of Xebia
Compare AI-powered data quality solutions like Monte Carlo, Collibra, Talend, and AWS Glue DataBrew for better data management.
CONFS, EVENTS AND MEETUPS
Data&AI Spotlight: Data-Driven Transformation | Online | 18th March
Learn how Catella, Eurowings, GE Healthcare, XTB & Resistant AI, and DSV are using AI, cloud, and machine learning to transform their businesses. Q&A + ticket giveaway included!
PINNACLE PICKS
Your top picks from last week:
DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data | 5 min | Data Engineering | Mehdi Ouazza | Personal Blog
DeepSeek’s smallpond extends DuckDB to distributed computing using Ray and a custom storage system, balancing scalability with added complexity.
How much does it cost to run DeepSeek-R1 locally? | 3 min | AI | Mehul Gupta | Data Science in your pocket
Running DeepSeek-R1 (671B params) locally? It’ll set you back ~$106K in hardware alone: GPUs, RAM, storage, and cooling make it an enterprise-scale investment.
Estimating Incremental Lift in Customer Value (Delta CV) using Synthetic Control | 7 min | Data Engineering | Mahshid Moha | The PayPal Technology Blog
PayPal uses ML-driven causal inference to measure how new product adoption impacts revenue and engagement, refining decision-making with user-matching techniques.
________________________
Have any interesting content to share in the DATA Pill newsletter?