DATA Pill feed

DATA Pill #095 - Real-Time RAG, pick between Kimball, One Big Table, and Relational Modeling


Apache Kafka is NOT real real-time data streaming! | 4 min | Data Streaming | Kai Waehner | Personal Blog
This blog post explores the architecture of NASDAQ that combines critical stock exchange trading with low-latency streaming analytics.
Why Gemini 1.5 (and other large context models) are bullish for RAG | 7 min | RAG | Chia Jeng Yang | Enterprise RAG Blog
This post makes the case that large-context models such as Gemini 1.5 are good news for RAG.
News Recommendation: the challenging area in building recommendation systems | 8 min | Recommendation Systems | Adam Cierlik | GetInData | Part of Xebia Blog
Exploring the ever-changing world of news recommendation systems? This blog dives deep into how to blend user preferences with real-time news context for a genuinely personalized reading experience.
Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data Platform | 13 min | Machine Learning | Binbing Hou, Stephanie Vezich Tamayo, Xiao Chen, Liang Tian, Troy Ristow, Haoyuan Wang, Snehal Chennuru, Pawan Dixit | Netflix Technology Blog
This blog post introduces our Auto Remediation project, which combines a rule-based classifier with a machine learning service to automatically remediate failed jobs without human intervention.
A Deep Dive into the Latest Performance Improvements of Stateful Pipelines in Apache Spark Structured Streaming | 5 min | Data Streaming | Mojgan Mazouchi, Mrityunjay Kumar, Anish Shrigondekar and Karthikeyan Ramasamy | Databricks Blog
This is the second part of a two-part series on the latest performance improvements of stateful pipelines. The first part covered Performance Improvements for Stateful Pipelines in Apache Spark Structured Streaming.

In this section, we will dig deeper into the various issues we observed while analyzing performance and outline specific enhancements we have implemented to address those issues.


Evaluate LLMs with Hugging Face Lighteval on Amazon SageMaker | 8 min | LLM | Philipp Schmid | Personal Blog
Let’s learn how to evaluate LLMs using Hugging Face Lighteval. Lighteval supports the evaluation suite used in the Hugging Face Open LLM Leaderboard.
Easy Introduction to Real-Time RAG | 5 min | RAG | Hubert Dulay | Personal Blog
This tutorial delves into the practical application of RAG in real-time scenarios, using an innovative approach to answer questions with updated and precise information from a set of documents.


We'll be covering:
- When to use One Big Table modeling vs Kimball
- How to use Struct, Array, and Array of Struct to get what you want


Optimizing both hardware and software for GenAI | 26 min | Gen AI | Ryan Donovan, Raymond Lo | The Stack Overflow Podcast
Ryan and Ben chat with Raymond Lo, AI software evangelist at Intel, about the AI PC, the software that powers AI breakthroughs, and optimizing hardware and software in unison to improve generative AI performance. Bonus: what’s the difference between a GPU optimized for graphics and a VPU or NPU optimized for AI?


Big Data Technology Warsaw Summit | Warsaw and Online | 10th and 11th April
Join this independent conference, whose agenda groups presentations into nine categories, and find the topics that interest you most. For example:

  • Data Engineering
  • Streaming and real-time analytics
  • ML & Data Science
  • Gen AI

And more! Learn from speakers from companies like Dropbox, IKEA, Cloudera, Allegro, Ververica, and Freenow.

Shhh… Use the DataPill200 code to get a 200 PLN discount!
Journey to the Cloud | Zurich | 20th March
Gain expert insights into migrating sensitive workloads securely and optimizing costs. Dive into detailed case studies, including the migration and modernization journeys of Just Eat and Truecaller, to see these principles in action.

Don't miss out on this invaluable opportunity to learn from industry leaders and propel your business forward with confidence!
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill