
DATA Pill #129 - From ETL to AI, dbt: Incremental but Incomplete

ARTICLES

How We Generated Millions of Content Annotations | 10 min | ML | Dana Puleo, Meghana Seetharam, Katarzyna Drzyzga | Spotify Engineering Blog
Spotify shares how they scaled high-quality annotations for ML and GenAI across millions of tracks, moving from manual efforts to a scalable, efficient platform.
DuckDB Beyond the Hype | 9 min | Data Engineering | Alireza Sadeghi | Personal Blog
Curious if DuckDB is worth it? Alireza covers why this lightweight database is a solid bridge between SQL and Python for data pipelines.
Questions we’re tired of hearing: Why can’t I just query raw data? | 6 min | Data Governance | Bo Lemmers | Xebia Blog
Explore why structured data beats querying raw data, and unpack common governance pitfalls and best practices.
NVIDIA and Deloitte’s AI agents are transforming patient care at Ottawa Hospital by streamlining interactions and reducing admin tasks. See how their tech is changing healthcare.
Databricks Migration Strategy: Lessons Learned | 10 min | Data Management | George Komninos, Jaimin Shah, Soham Bhatt, Kanad Sharma | Databricks Blog
Databricks shares key insights and a structured five-phase process for smoother, more effective data warehouse migrations, from initial assessment to complete execution.

TUTORIALS

Avoiding Issues: Monitoring Query Pushdowns in Databricks Federated Queries | 11 min | Data Engineering | Adrian Chodkowski | Seequality Blog
Adrian Chodkowski covers strategies to optimize Databricks federated queries and manage pushdowns effectively. A must-read for those working with external data sources in Databricks.
dbt Semantic Layer - implementation | 7 min | Data Processing | Przemysław Baran | GetInData | Part of Xebia Blog
Przemysław Baran offers a practical guide to implementing dbt’s Semantic Layer, from setup to production, to support centralized business logic in data projects.
dbt: Incremental but Incomplete | 7 min | Data Engineering | Toby Mao | Tobiko Data Blog
Toby Mao examines dbt’s latest microbatching feature and highlights SQLMesh as a more robust alternative for time-based, incremental processing.

NEWS

Debezium 3.0.1.Final Released | 2 min | Data Streaming | Chris Cranford | Debezium Blog
Debezium 3.0.1.Final is out, adding support for Cassandra 5, PostgreSQL 17, and MySQL 9.1, plus new YAML configuration options for Debezium Server.

PODCAST

How AI Can Bring Advanced Data Outcomes to More Businesses | 47 min | AI | Eric Dodds, John Wessel, Taylor Murphy | The Data Stack Podcast
Taylor Murphy explores AI’s impact on ETL, data analytics, and cost efficiency, with insights for anyone working in data.

DATA TUBE

Learn how Airflow, Atlan, and OpenLineage enable metadata management and column-level lineage across platforms like AWS and Google Cloud.

CONFS, EVENTS AND MEETUPS

Discover strategies from Heineken and Van Oord for building a data-first culture in this leadership-focused webinar.
Join industry experts at the Annual MLOps World & Generative AI Summits! Enjoy FREE virtual workshops and hands-on sessions to boost your skills.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Browse previous editions of DATA Pill