DATA Pill feed

DATA Pill #131 - Embeddings are underrated, The advent of the Open Data Lake

ARTICLES

The advent of the Open Data Lake | 7 min | Data Engineering | Julien Le Dem | The Sympathetic Ink Blog
Julien Le Dem maps out the shift from Hadoop to the Open Data Lake, showing how cloud-native architecture eliminates data silos and improves scalability.
Demandbase Ditches Denormalization By Switching off ClickHouse | 4 min | Data Engineering | StarRocks Engineering
Demandbase moved from ClickHouse to CelerData Cloud, cutting storage costs and simplifying data pipelines to handle real-time updates at scale.

TUTORIALS

Embeddings are underrated | 6 min | ML | Kayce Basques | Technical Writing Blog
Embeddings bring new power to technical docs, enabling content connections without complex models. Learn how these vectors organize data at a massive scale.
Streamlining Contract Management in Revenue Infrastructure | 6 min | Event Based Architecture | Austin Gundry, Travis Chun, Zian Hu | Netflix Tech Blog
Netflix introduces a new tool that centralizes and automates partner contract data, simplifying workflows for their growing subscription model.
Rethinking Data Layers: When Medallion Architecture Isn’t Enough | 9 min | Data Engineering | Annu Joshi | Data Engineer Things Blog
Annu Joshi argues for adding layers to the Medallion model for complex, cross-functional data setups that need regulatory and performance flexibility.
BI-as-Code and the New Era of GenBI | 8 min | BI | Simon Späti | Rill Data Blog
Simon Späti dives into GenBI, a new approach to BI that leverages AI to simplify dashboard creation and make analytics accessible to business users.

NEWS

Introducing Apache Kafka® 3.9 | 5 min | Data Streaming | Confluent Blog
Kafka 3.9 wraps up the 3.x series with flexible KRaft quorum management, streamlined ZooKeeper migration, and production-ready tiered storage.

TOOL

IdentityRAG combines identity resolution with retrieval-augmented generation to provide accurate, unified views of customer data, which is ideal for comprehensive LLM responses.

PODCAST

An Opinionated Look At End-to-end Code Only Analytical Workflows | 56 min | Data Analytics | Tobias Macey, Burak Karakan | Data Engineering Podcast
Burak Karakan explains the benefits of fully code-driven analytics workflows, making integrations faster and more cohesive across the data stack.

CONFS EVENTS AND MEETUPS

The Big Data Technology Warsaw Summit returns on April 9-10, 2025! Submit your speaking proposal and join over 500 professionals as they dive into the latest in data engineering and big data technology.
Infoshare Katowice | 26th-27th November | Katowice
This conference is for founders, CEOs, decision-makers, managers, and professionals from the new technologies sector.

About:
Inspiration | Trends | BizDev | Case studies | People & Culture | FinTech | EduTech
Architecture | AI/ML/Data | DevOps & Cloud | People & Culture | Java | UX & Front-end | GameDev | Cloud | Big Data | AI | CyberSecurity | eCommerce

As a community partner, we share a discount code: ISK24-DP10
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill