DATA Pill feed

DATA Pill #099 - Conventional RAG → Graph RAG, Knowledge Graphs using Neo4j and Vertex AI


From Conventional RAG to Graph RAG | 13 min | LLM | Terence Lucas Yap | Government Digital Services, Singapore
LLMs such as ChatGPT are trained on fixed datasets, which leads to outdated responses and makes it hard to update their knowledge without retraining. Retrieval-augmented generation (RAG), including its graph-based variant Graph RAG, addresses this by pulling information from external knowledge bases at query time, enriching responses with current information for better accuracy and depth.
How I Built This Data Platform in One Week | 7 min | Data Engineering | Dorian Teffo | DataDrivenInvestor
Dorian applies recently acquired DevOps skills to build a complete data platform that refreshes analytics daily. The setup streamlines data processing and workflow orchestration with modern tools such as Snowflake, Airbyte, and dbt, prioritizing simplicity and functionality.
Platform Engineering Essentials: 5 Key Learnings Before You Start | 5 min | DevOps | Bert Rijsdijk | Xebia Blog
Platform engineering can greatly improve organizational efficiency and developer experience. Yet navigating its complexities means dealing with challenges such as conflicting objectives, ambiguous goals, and pressure to drive adoption. Drawing on firsthand experience implementing internal developer platforms (IDPs) and CDaaS, the article highlights five key insights for successful platform initiatives.
Real-Time Twitch Chat Sentiment Analysis with Apache Flink | 8 min | Data Streaming | Volker Janz | Towards Data Science
Learn how real-time sentiment analysis with Apache Flink can help creators decipher audience emotions in Twitch chat and steer their content toward viewer satisfaction.


An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin | Online, 11 lessons | LLM | Paul Iusztin | Decoding ML
The course is split into 11 lessons. Every Medium article will be its own lesson.

  1. An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin
  2. The importance of Data Pipelines in the Era of Generative AI
  3. CDC [Module 1] …WIP
  4. Streaming ingestion pipeline [Module 2] …WIP
  5. Vector DB retrieval clients [Module 2] …WIP
  6. Training data preparation [Module 3] …WIP
  7. Fine-tuning LLM [Module 3] …WIP
  8. LLM evaluation [Module 4] …WIP
  9. Quantization [Module 5] …WIP
  10. Build the digital twin inference pipeline [Module 6] …WIP
  11. Deploy the digital twin as a REST API [Module 6] …WIP


Building Knowledge Graphs from Scratch Using Neo4j and Vertex AI | 19 min | ML | Rubens Zimbres | Personal Blog
Rubens explored the free "Knowledge Graphs for RAG" course, which walks through building knowledge graphs from SEC forms in detail, defining nodes and relationships along the way. Rubens set out to replicate the course results by combining its code snippets and visualizing the resulting knowledge graph in Neo4j Workspace. Check out how it went.
From MongoDB to Dashboards with Dremio and Apache Iceberg | 14 min | Data Engineering | Alex Merced | Dremio blog
This post explores how Dremio's data lakehouse platform simplifies data delivery for business intelligence, using a prototype that can run on your laptop.


GID Data Copilot Demo | 5 min | Gen AI | GetInData | Part of Xebia
GID Data Copilot - An extensible AI programming assistant for SQL and dbt code:
  • Powered by state-of-the-art Large Language Models (SOTA LLMs)
  • Robust Retrieval Augmented Generation (RAG) architecture
  • Hybrid search techniques
  • Fast Vector Database
  • Curated Prompts
  • Built-in Data commands


Are long context windows the end of RAG? | 29 min | LLM | Michael Foree, Cassidy Williams | Stack Overflow Podcast
The home team is joined by Michael Foree, Stack Overflow’s director of data science and data platform, and occasional cohost Cassidy Williams, CTO at Contenda, for a conversation about long context windows, retrieval-augmented generation, and how Databricks’ new open LLM could change the game for developers.


Deep Learning on Rails with PyTorch Lightning | Online | 9th April 11 AM ET
Key Takeaways:

  • Learn how to create and run a deep learning model.
  • Learn how to perform machine learning workflows in PyTorch Lightning.
  • See how Lightning Studio can be used for deep learning and AI development.
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill