DATA Pill #128 - dbt™ at BlaBlaCar, What CDC is (and isn’t)

ARTICLES

The Semantic Layer Movement: The Rise & Current State | 11 min | Data Product Platform | Animesh Kumar | Modern Data 101 Blog

Find out how the semantic layer changes modern data systems by adding context to raw data, making it more accessible across tools. This post examines how this layer solves challenges like "semantic mistrust" and powers scalable, purpose-driven data applications.

What The Heck is Apache Polaris? | 5 min | Data Engineering | Shawn Gordon | Personal Blog

Curious about the new kid on the block, Apache Polaris? This article explains how Snowflake's open-source catalog for Apache Iceberg tables is shaking up the table format landscape, making it easier to manage metadata and work across multiple engines.

Scaling Success: dbt™ at BlaBlaCar | 11 min | Data Science | Tushar Bhasin | BlaBlaCar Blog

BlaBlaCar leveled up their SQL transformations by adopting dbt™, which helped them manage over 4,000 tables with ease. Learn how dbt™ improved collaboration, automation, and the overall developer experience for their team.

TUTORIALS

Why Do I Need CDC? | 7 min | Data Management | Robin Moffatt | Decodable Blog

Change Data Capture (CDC) extracts incremental data changes for analytics and system synchronization with minimal impact on operational databases. This tutorial breaks down the benefits of log-based CDC, which reads changes from the database's transaction log instead of repeatedly querying tables, making it efficient and light on performance.
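To make the log-based idea concrete, here is a minimal Python sketch of the consuming side: row-level change events are applied to a replica in log order, and the last applied log position is kept as a checkpoint so replays do not double-apply changes. The event shape (`ChangeEvent` with `lsn`, `op`, `key`, `after`) is a hypothetical simplification, not the format of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One row-level change (hypothetical shape, not a real tool's format)."""
    lsn: int                                 # position in the source's transaction log
    op: str                                  # "insert", "update" or "delete"
    key: int                                 # primary key of the affected row
    after: Optional[Dict[str, Any]] = None   # new row image (None for deletes)


def apply_changes(replica: Dict[int, Dict[str, Any]],
                  events: List[ChangeEvent],
                  last_lsn: int = 0) -> int:
    """Apply events in log order to an in-memory replica; return the new checkpoint."""
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_lsn:
            continue  # already applied: skipping keeps replays idempotent
        if ev.op in ("insert", "update"):
            replica[ev.key] = dict(ev.after or {})
        elif ev.op == "delete":
            replica.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
        last_lsn = ev.lsn
    return last_lsn


if __name__ == "__main__":
    replica: Dict[int, Dict[str, Any]] = {}
    events = [
        ChangeEvent(1, "insert", 42, {"status": "booked"}),
        ChangeEvent(2, "update", 42, {"status": "paid"}),
        ChangeEvent(3, "insert", 7, {"status": "booked"}),
        ChangeEvent(4, "delete", 7),
    ]
    checkpoint = apply_changes(replica, events)
    print(replica, checkpoint)  # {42: {'status': 'paid'}} 4
```

Only the changes flow downstream, and the source tables are never scanned; the checkpoint is what lets a consumer resume after a restart without reprocessing everything.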

The Rise of the Declarative Data Stack | 10 min | Data Engineering | Simon Späti | Rill Data Blog

Explore the shift toward declarative data stacks, which make data processes simpler and more flexible. This approach lets engineers define what the end result should be while the system works out how to get there.
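One common way declarative tools separate the "what" from the "how" is a plan step: the user states the desired state, and the system diffs it against the current state to derive the actions. The Python sketch below shows that pattern under stated assumptions; the spec format (table name mapped to its defining SQL) and the `plan` function are hypothetical and not taken from the article or any specific tool.

```python
from typing import Dict, List, Tuple

# Hypothetical declarative spec: table name -> SQL that defines it.
Spec = Dict[str, str]


def plan(desired: Spec, current: Spec) -> List[Tuple[str, str]]:
    """Diff desired vs. current state and return the actions needed to converge."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))      # declared but missing
        elif current[name] != desired[name]:
            actions.append(("replace", name))     # definition changed
    for name in sorted(current):
        if name not in desired:
            actions.append(("drop", name))        # no longer declared
    return actions


if __name__ == "__main__":
    desired = {
        "daily_rides": "SELECT day, COUNT(*) FROM rides GROUP BY day",
        "users": "SELECT * FROM raw_users",
    }
    current = {
        "users": "SELECT id FROM raw_users",
        "old_report": "SELECT 1",
    }
    print(plan(desired, current))
    # [('create', 'daily_rides'), ('replace', 'users'), ('drop', 'old_report')]
```

The engineer only edits `desired`; running `plan` again after the actions are applied returns an empty list, which is what makes the approach repeatable.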

Dynamic Data Pipelines with Airflow Datasets and Pub/Sub | 4 min | Data Engineering | Nawfel Bacha | Astrafy Blog

Airflow Datasets let you trigger workflows automatically when the data they depend on is updated. This tutorial walks you through using Google Cloud Pub/Sub with Airflow Datasets to create dynamic, event-driven pipelines.

NEWS

Azure Databricks Mirrored Catalog | 9 min | Data Engineering | Jose Mendes | Telefonica Tech Blog

Microsoft announced the public preview of Azure Databricks Mirrored Catalog, enabling direct access to Databricks Unity Catalog tables from Fabric. Users can now create a read-only, replicated copy in OneLake via the UI and explore data with SQL Endpoint or Power BI. This blog covers setting up the mirrored database and its pros and cons.

Preview Release of Apache Flink 2.0 | 9 min | Data Streaming | Xintong Song | Flink Blog

The Apache Flink community is preparing for Flink 2.0, the first major release in 8 years, bringing new features and compatibility-breaking changes. A preview release is now available to help users and partners adapt early and provide feedback.

DATA TUBE

How Heineken Is Brewing Success With Generative AI | 35 min | Gen AI | Bernard Marr, Tony Costella | Personal Channel

Heineken shares how it uses generative AI to drive consumer insights and streamline operations. This session offers a practical look at how AI is reshaping large-scale businesses.

CONFS EVENTS AND MEETUPS

Big Data Technology Warsaw 2025 - CFP | 24th November

The Big Data Technology Warsaw Summit returns on April 9-10, 2025! Submit your speaking proposal and join over 500 professionals as they dive into the latest in data engineering and big data technology.

Infoshare Katowice | Conference | November 26-27

We are a community partner of Infoshare Katowice, an event for developers and architects, as well as for IT team leaders, managers, and entrepreneurs from tech companies and the GameDev industry.

3 stages:

GROWTH – Business development, growth strategies, case studies from leaders, insight into people & culture
DEV ARCHITECTURE – System architecture, programming, and software engineering
DEV CODE STAGE – Coding techniques, programming languages and developer tools

Promo code for our community: ISK24-DP10

MLOPS WORLD | Conference | November 7-8 | Austin TX

An event for the ML/Gen AI community of over 20,000 ML researchers, engineers, scientists, and entrepreneurs across several disciplines.

Drawing on the real-life experiences of practitioners, the Steering Committee has selected the top applications, achievements, and knowledge areas to highlight across the event.
You can use code DataPill for 20% off all tickets.

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DataPill

2024-10-24 12:38