DATA Pill feed

DATA Pill #149 - Data Lakehouse - is it a holy grail we have been looking for?

ARTICLES

Using generative AI to scale DuoRadio 10x faster | 5 min | Gen AI | Luis Mas Castillo, Sophie Mackey, Cindy Berger | Duolingo Blog
Duolingo revolutionized content creation for DuoRadio by leveraging generative AI, boosting daily sessions from 500K to 5M and expanding episodes from 300 to over 15,000 in just two quarters.
The Future of AI Agents is Event-Driven | 12 min | AI | Sean Falconer | Personal Blog
Discover how Event-Driven Architecture empowers AI agents to communicate asynchronously and scale without rigid dependencies, enabling adaptive and resilient AI systems.

TUTORIALS

Every System is a Log: Part 1, Part 2 | 33 min | Data Architecture | Stephan Ewen, Jack Kleeman, Giselle van Dongen | Restate Blog
Bridging Flink SQL and Custom Java Pipelines with the Decodable SDK | 12 min | Data Streaming | Hans-Peter Grahsl | Decodable Blog
Master building real-time data pipelines by combining Apache Flink’s Java API with SQL for efficient data ingestion and processing.
Databricks Custom Data Source — Practical Examples | 5 min | Data Engineering | Mariusz Kujawski | Personal Blog
Learn how to create custom data sources in Spark 4.0 to optimize data ingestion from APIs and generate synthetic data using the Python Data Source API.
How To Delete a Topic in Apache Kafka®: A Step-By-Step Guide | 5 min | Data Engineering | Confluent Blog
A practical guide to safely deleting Kafka topics, covering self-managed, cloud-hosted, and Confluent Cloud setups, along with automation tips.
Introducing Serverless Batch Inference | 5 min | LLM | Ankit Mathur, Ahmed Bilal, Youngbin Kim | Databricks Blog
Explore how Databricks enhances batch AI inference with serverless architecture, boosting processing speed while ensuring data governance and performance.

TOOL

Streamline event-driven architecture documentation with EventCatalog, a markdown-powered platform for maintaining clarity across domains and tracking event changes.

PODCAST

Astronomer's Role in the Airflow Ecosystem: A Deep Dive | 51 min | Data Engineering | Tobias Macey, Pete DeJoy | Data Engineering Podcast

CONFS, EVENTS AND MEETUPS

Data Lakehouse is becoming the new buzzword, but not every organization needs one. This webinar will explore its potential, key challenges, and how to build an architecture that truly delivers value.

PINNACLE PICKS

Your top picks from last week:
Tackling AI Hallucinations in LLM Apps | 6 min | LLM | Denis Kazakov | Gusto Engineering
Explore how LLM confidence scores help filter poor-quality responses, improving AI reliability in customer support and automated workflows.
Smarter Data, Brighter Decisions: Data Quality Tools Comparison | Data Quality | GetInData | Part of Xebia
Compare AI-powered data quality solutions like Monte Carlo, Collibra, Talend, and AWS Glue Databrew for better data management.
10 Future Apache Iceberg Developments to Look forward to in 2025 | 12 min | Data Engineering | Alex Merced | Data, Analytics & AI with Dremio
Apache Iceberg is evolving with scan planning, federated catalogs, geospatial support, and delete file optimizations, enhancing data governance and performance.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub