DATA Pill feed

DATA Pill #105 - Mastering AI and Data: From OpenLineage to GPT-4o Innovations

ARTICLES

Open Standards for Data Lineage: OpenLineage for Batch AND Streaming | 11 min | Data Streaming | Kai Waehner | Personal Blog
Discover how OpenLineage sets a new standard for data lineage in batch and streaming with insights into IBM, Google, Confluent, and Collibra data governance solutions.
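To make the idea concrete, here is a minimal sketch of the kind of run event OpenLineage standardizes: a job run, identified by a UUID, that reads input datasets and writes output datasets. The field names follow the OpenLineage RunEvent object as we understand it, but the structure is simplified (no facets, no schemaURL), and the namespaces, job and dataset names, and producer URL are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_name, inputs, outputs, run_id=None):
    """Build a simplified, OpenLineage-style run event as a plain dict.

    Facets and schemaURL are omitted; namespaces and the producer URI are placeholders.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": "example-namespace", "name": job_name},
        "inputs": [{"namespace": "kafka://broker:9092", "name": n} for n in inputs],
        "outputs": [{"namespace": "warehouse", "name": n} for n in outputs],
        "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
    }


# One run ID ties the START and COMPLETE events of a single run together,
# whether the job is a batch job or a long-running streaming job.
run_id = str(uuid.uuid4())
start = make_run_event("START", "orders_enrichment", ["orders"], ["orders_enriched"], run_id)
done = make_run_event("COMPLETE", "orders_enrichment", ["orders"], ["orders_enriched"], run_id)
print(json.dumps(done, indent=2))
```

Because the event is plain JSON, any emitter (a Spark job, a Flink job, an Airflow task) can produce it and any lineage backend can consume it, which is the point of an open standard.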
Learn how LLMs and retrieval-augmented generation (RAG) can turn natural-language questions into SQL, and why a semantic layer is key to reliable data access. The article also introduces WrenAI, an open-source tool that organizes metadata and business semantics so that generated queries retrieve data more accurately and efficiently.
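A semantic layer, in this sense, sits between the raw schema and the LLM: it maps business terms to tables, columns, and metric definitions, so the model retrieves meaning instead of guessing it from column names. The sketch below is a hypothetical, heavily simplified version of that retrieval step. It is not WrenAI's model format or API, and the table, metric, and function names are invented for illustration; keyword matching stands in for the embedding search a real RAG setup would use.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    sql: str          # SQL expression that defines the metric
    description: str


@dataclass
class Model:
    table: str
    description: str
    columns: dict     # column name -> business description
    metrics: list = field(default_factory=list)


def retrieve(models, question):
    """Naive keyword retrieval: keep models whose metadata shares a word with the question."""
    words = {w.strip("?,.").lower() for w in question.split()}
    hits = []
    for m in models:
        text = " ".join([m.description, *m.columns.values(), *(x.description for x in m.metrics)])
        if words & set(text.lower().split()):
            hits.append(m)
    return hits


def build_context(models):
    """Render the retrieved semantic metadata as prompt context for the LLM."""
    lines = []
    for m in models:
        lines.append(f"Table {m.table}: {m.description}")
        for col, desc in m.columns.items():
            lines.append(f"  - {col}: {desc}")
        for metric in m.metrics:
            lines.append(f"  * metric {metric.name} = {metric.sql} ({metric.description})")
    return "\n".join(lines)


orders = Model(
    "orders", "customer orders",
    {"amount": "order value in EUR", "status": "order status"},
    [Metric("revenue", "SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END)", "paid revenue")],
)
users = Model("users", "registered customers", {"country": "customer country"})
ctx = build_context(retrieve([orders, users], "What was revenue last month?"))
print(ctx)
```

Only the orders model reaches the prompt, together with the agreed definition of "revenue", which is what keeps the LLM from inventing its own.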
How to Determine Causal Effects when A/B Tests are Infeasible through Adopter Analysis | 6 min | Data Science | Avanti Chande | Walmart Tech Blog
Explore the Adopter Analysis framework for measuring the impact of changes in e-commerce subscription businesses when A/B testing isn't possible.
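The article's exact Adopter Analysis method is not reproduced here. As a related illustration of estimating an effect without randomization, the sketch below uses difference-in-differences, a standard technique that compares how an outcome changes for adopters versus non-adopters before and after a change. Adopters self-select, so simply comparing the two groups after the change would be biased; subtracting each group's own baseline helps. The data and function name are hypothetical, and the estimate is only valid under the parallel-trends assumption.

```python
from statistics import mean


def diff_in_diff(adopters_pre, adopters_post, others_pre, others_post):
    """Difference-in-differences estimate of the effect of adoption.

    (change among adopters) - (change among non-adopters). Subtracting the
    non-adopters' change removes trends that affect both groups, assuming
    both groups would have moved in parallel without the change.
    """
    adopter_change = mean(adopters_post) - mean(adopters_pre)
    baseline_change = mean(others_post) - mean(others_pre)
    return adopter_change - baseline_change


# Hypothetical weekly orders per customer.
effect = diff_in_diff(
    adopters_pre=[4.0, 5.0, 6.0],
    adopters_post=[7.0, 8.0, 9.0],
    others_pre=[3.0, 4.0, 5.0],
    others_post=[4.0, 5.0, 6.0],
)
print(effect)  # 2.0
```

Adopters gained 3 orders per week, but non-adopters also gained 1 over the same period, so only 2 is attributed to adoption.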
How We Migrated From dbt Cloud and Scaled Our Data Development | 12 min | Data Engineering | GlossGenius Blog
See how GlossGenius improved productivity and efficiency by migrating from dbt Cloud to dbt-core and integrating Apache Airflow and GitHub Actions.
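When dbt-core runs outside dbt Cloud, orchestration and CI become your responsibility. A common pattern, not necessarily GlossGenius's setup, is for an Airflow task to run a full dbt build while a CI job, such as a GitHub Actions workflow, builds only the models changed in a pull request, using dbt's state selection against production artifacts. The helper below only assembles those command lines. The function name and artifact path are hypothetical; `build`, `--target`, `--select state:modified+`, `--defer`, and `--state` are standard dbt CLI options.

```python
import shlex


def dbt_build_command(ci=False, state_dir="prod-artifacts", target="prod"):
    """Assemble a dbt-core command line (hypothetical helper).

    In CI, build only models modified relative to the production manifest
    stored in state_dir, plus everything downstream of them (the trailing +),
    and defer references to unchanged models to their production versions.
    """
    cmd = ["dbt", "build", "--target", target]
    if ci:
        cmd += ["--select", "state:modified+", "--defer", "--state", state_dir]
    return cmd


print(shlex.join(dbt_build_command()))
print(shlex.join(dbt_build_command(ci=True, target="ci")))
```

In Airflow, the first string could be passed as a BashOperator's `bash_command`; in GitHub Actions, the second would run as a workflow step after downloading the production `manifest.json` into the state directory.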
dbt-flink-adapter - job lifecycle management. Transforming data streaming | 10 min | Data Engineering | Maciej Maciejko | GetInData | Part of Xebia Blog
Read how the dbt-flink-adapter transforms data streaming by integrating dbt's SQL models with Flink SQL, improving analytics, streamlining data job management, and boosting workflow efficiency.
Agents & Agentic Workflows | AI | Dheeren Velu | Personal Blog
Read how AI "agents" and "agentic workflows" let autonomous systems handle tasks, make decisions, and collaborate with one another, and why the author sees them as a major driver of the next advances in AI.
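At its core, an agentic workflow is a loop: the model picks an action, a tool executes it, and the observation is fed back until the model decides it is done. The sketch below shows only that control flow. `decide` is a hypothetical rule-based stand-in for an LLM call, and the tools are toy functions.

```python
def lookup_price(item):
    # Toy tool: a hard-coded price table.
    return {"apple": 0.5, "bread": 2.0}.get(item, 0.0)


def total(*values):
    # Toy tool: arithmetic the "model" delegates instead of doing itself.
    return sum(values)


TOOLS = {"lookup_price": lookup_price, "total": total}


def decide(goal, history):
    """Stand-in for an LLM: returns (tool_name, args) or ("finish", answer)."""
    prices = [obs for step, obs in history if step[0] == "lookup_price"]
    if len(prices) < len(goal):
        return "lookup_price", (goal[len(prices)],)
    if not any(step[0] == "total" for step, _ in history):
        return "total", tuple(prices)
    return "finish", history[-1][1]


def run_agent(goal, max_steps=10):
    history = []  # list of ((tool_name, args), observation)
    for _ in range(max_steps):  # step budget guards against endless loops
        action, payload = decide(goal, history)
        if action == "finish":
            return payload
        observation = TOOLS[action](*payload)
        history.append(((action, payload), observation))
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent(["apple", "bread"]))  # 2.5
```

Swapping `decide` for a real model call, and the toy tools for APIs or databases, gives the basic shape of most agent frameworks; the step budget is a cheap safeguard against a model that never says "finish".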

NEWS

GPT-4o | 5 min | AI | OpenAI Blog
OpenAI has introduced GPT-4o, a new model that reasons across text, audio, and vision in real time. GPT-4o makes human-computer interaction more natural by accepting any combination of text, audio, and image inputs and generating any combination of text, audio, and image outputs, while offering faster responses, better multilingual performance, and lower cost than previous models.

TUTORIAL

One Big Table vs. Dimensional Modeling on Databricks SQL | 11 min | Data Modeling | Sepideh Jahangiri, Philip Laserstein | DBSQL SME Engineering Blog
Explore the benefits and challenges of Dimensional Modeling versus One Big Table, and learn best practices for implementing these techniques on Databricks.
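The trade-off is easiest to see side by side. This sketch uses Python's built-in sqlite3 rather than Databricks SQL and stores the same sales data twice: as a small star schema (a fact table plus a product dimension) and as one denormalized wide table. The table and column names are hypothetical. Both queries return the same answer; the star schema needs a join, while One Big Table repeats product attributes on every row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Dimensional model: facts reference a product dimension by key.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, qty INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES (1, 1, 1200.0), (1, 2, 2400.0), (2, 1, 300.0);

    -- One Big Table: product attributes are copied onto every sales row.
    CREATE TABLE obt_sales AS
    SELECT p.name, p.category, f.qty, f.amount
    FROM fact_sales f JOIN dim_product p USING (product_key);
    """
)

star = con.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category ORDER BY p.category
    """
).fetchall()

obt = con.execute(
    "SELECT category, SUM(amount) FROM obt_sales GROUP BY category ORDER BY category"
).fetchall()

print(star)  # [('Electronics', 3600.0), ('Furniture', 300.0)]
print(star == obt)  # True
```

The wide table makes queries simpler and avoids joins at read time, but renaming a category means rewriting every affected sales row, whereas the dimensional model changes a single dimension row.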

PODCAST

Data Engineering and its Streams, Rivers, and Lakes | 48 min | Data Engineering | Ned Bellavance, Kyler Middleton | Day Two Cloud Podcast
Keith Gregory simplifies data engineering for DevOps professionals and hydrologists, explaining how data engineers build pipelines to transform and transport data for analysis. He covers key concepts like streaming vs. polling pipelines, data lakes vs. data warehouses, and terms like ELT, OLTP, and columnar storage.
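To illustrate the polling side of the streaming-vs-polling distinction: a polling pipeline repeatedly asks the source for rows changed since its last high-water mark, whereas a streaming pipeline has each change pushed to it as an event. The sketch below shows one polling pass over an in-memory list that stands in for a source table; the row layout and function name are hypothetical.

```python
from datetime import datetime


def poll_once(source_rows, watermark):
    """Return rows updated after `watermark` and the new watermark.

    Between polls, new rows wait at the source, so latency is at least the
    poll interval; a streaming consumer would instead receive each change
    as an event shortly after it happens.
    """
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


rows = [
    {"id": 1, "updated_at": datetime(2024, 5, 1, 10, 0)},
    {"id": 2, "updated_at": datetime(2024, 5, 1, 10, 5)},
]
batch, wm = poll_once(rows, datetime(2024, 5, 1, 9, 0))
print([r["id"] for r in batch], wm)  # [1, 2] 2024-05-01 10:05:00

rows.append({"id": 3, "updated_at": datetime(2024, 5, 1, 10, 7)})
batch, wm = poll_once(rows, wm)
print([r["id"] for r in batch])  # [3]
```

The strict `>` comparison keeps rows from being re-read, but it can miss a row that commits late with a timestamp equal to or older than the watermark; that kind of edge case is one reason teams move to change-data-capture streams.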

DATA TUBE

How And Why Data Engineers Need To Care About Data Quality Now - And How To Implement It | 16 min | Data Quality | Benjamin Rogojan | Seattle Data Guy
Watch Benjamin Rogojan's video on the importance of data quality and how to implement checks like anomaly detection and data freshness to ensure reliable real-time data.
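Two of the checks mentioned, freshness and anomaly detection, can be sketched in a few lines. This version flags a table as stale if its latest load is older than an allowed lag, and flags today's row count as anomalous if it sits more than three standard deviations from recent history. The simple z-score rule is chosen for illustration and is not necessarily the video's approach; the thresholds and inputs are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def is_fresh(last_loaded_at, max_lag, now=None):
    """Freshness check: the latest load must be no older than max_lag."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at <= max_lag


def is_anomalous(history, today, threshold=3.0):
    """Z-score check on a daily metric such as row count (needs >= 2 history points)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is unusual
    return abs(today - mu) / sigma > threshold


now = datetime(2024, 5, 14, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(hours=2), timedelta(hours=1), now=now))  # False
print(is_anomalous([1000, 1020, 980, 1010, 990], 400))  # True
```

Run on a schedule after each load, checks like these catch a silently stalled pipeline (freshness) or a partial load (volume anomaly) before downstream dashboards do.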

CONFS EVENTS AND MEETUPS

Join the event at Tech-Talk Petersplatz in Zürich to explore GenAI's potential and challenges, organizational structuring, and operating models. Attendees will learn about the RAG approach, integrating Google Cloud services with Open Source tools, and AI-driven insights for organizations.
🦄 From Colleague to Supervisor: Building Your AI-Driven Management System

That is the title that won our competition!
It's the most clickbait, cringe-worthy title you will NOT hear at Infoshare.
Congrats to its author, who wins a free Developer Pass to Infoshare.
For the rest of our community members, we have a discount code: ISC24-GetInData10
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill