DATA Pill #133 - CDC at Pinterest, GCP & Iceberg, Databricks vs. Snowflake

ARTICLES

The Heart of the Data Mesh Beats Real-Time with Apache Kafka | 5 min | Data Engineering | Kai Waehner | Personal Blog
Discover how Apache Kafka drives scalable, decentralized data sharing, enabling real-time, flexible, and efficient data mesh architectures.
Change Data Capture at Pinterest | 7 min | Data Engineering | Liang Mou, Elizabeth (Vi) Nguyen | Pinterest Engineering Blog
A deep dive into Pinterest’s CDC solution, tackling scalability and reliability challenges for real-time data integration in distributed databases.
Apache Iceberg Won the Future — What’s Next for 2025? | 5 min | Data Engineering | Yingjun Wu | Data Engineer Things
Explore Apache Iceberg’s future, featuring advanced CDC, RBAC catalogs, and materialized views, solidifying its role in multi-engine ecosystems.
Follow the journey of a leading telecom migrating its massive Hadoop cluster to an open-source, Kubernetes-based platform, achieving scalability and cloud-agnostic flexibility.
I spent 3 hours learning how Uber manages data quality | 7 min | Data Quality | Vu Trinh | Data Engineer Things
Learn how Uber manages data quality across 2,000+ datasets with automation, incident management, and standardized metrics.

TUTORIALS

GCP & Iceberg | 12 min | Data Engineering | Julien Hurault, Borja Vazquez-Barreiros | Ju Data Engineering Newsletter
A detailed guide to integrating Apache Iceberg with Google Cloud’s BigLake for multi-cloud lakehouse architectures.
Databricks vs (Optimized) Snowflake by the Numbers | 7 min | Data Engineering | Paul Needleman | Personal Blog
Updated benchmarks reveal Snowflake’s Managed Iceberg Tables can outperform Databricks when optimized. Learn how to replicate these results.

NEWS

Apache DataFusion is now the fastest single node engine for querying Apache Parquet files | 6 min | Data Processing | Andrew Lamb | Apache DataFusion Project News & Blog
Apache DataFusion is now the fastest single-node engine for querying Apache Parquet files, outpacing DuckDB and ClickHouse. Learn about its recent optimizations and future developments.

CONFS, EVENTS AND MEETUPS

Warsaw Data Tech Talks Meetup | Warsaw | 10th December
The event aims to explore actionable strategies for data utilization and implementation in 2025, enabling businesses to extract value from data early in the year. Experts will share practical insights, discuss challenges, and propose solutions to create impactful data strategies.
AI or ROI? | Webinar Series | 10th December
This webinar series offers key insights to help you turn AI strategy into measurable growth while avoiding costly pitfalls. Learn how to maximize value, architect for success, and focus on what truly drives ROI. Register today.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill
2024-11-28 10:08