DATA Pill feed

DATA Pill #124 - SQL Has Problems, RAG API, QueryGPT

ARTICLES

Industrial IoT Middleware for Edge and Cloud OT/IT Bridge powered by Apache Kafka and Flink | 11 min | IoT / Data Streaming | Kai Waehner | Personal Blog
Explore how Apache Kafka and Flink bridge real-time operations and business systems, enabling predictive maintenance and smart decision-making through seamless data flow in industrial IoT environments.
I spent 5 hours learning how ClickHouse built their internal data warehouse | 8 min | Data Warehouse | Vu Trinh | Personal Blog
A deep dive into how ClickHouse engineers developed and optimized their internal data warehouse to process 50 TB of data daily. Discover their strategies for performance enhancement and scaling.
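
The Kafka/Flink article centres on turning raw sensor streams into maintenance signals. Here is a minimal plain-Python sketch of one common pattern: a per-machine tumbling-window average with a threshold alert, the kind of computation a Flink job might run over a Kafka topic. It does not use the Kafka or Flink APIs. The event layout, the 60-second window and the 4.5 threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor events: (timestamp_seconds, machine_id, vibration_mm_s)
readings = [
    (0, "pump-1", 2.1), (20, "pump-1", 2.3), (45, "pump-1", 2.2),
    (61, "pump-1", 4.8), (75, "pump-1", 5.1), (110, "pump-1", 5.4),
    (5, "pump-2", 1.9), (70, "pump-2", 2.0),
]

def tumbling_window_alerts(events, window_s=60, threshold=4.5):
    """Average each machine's readings per fixed, non-overlapping window
    and flag windows whose average exceeds the (assumed) threshold."""
    windows = defaultdict(list)
    for ts, machine, value in events:
        window_start = ts - ts % window_s  # align timestamp to its window
        windows[(machine, window_start)].append(value)
    alerts = []
    for (machine, start), values in sorted(windows.items()):
        avg = mean(values)
        if avg > threshold:
            alerts.append((machine, start, round(avg, 2)))
    return alerts

alerts = tumbling_window_alerts(readings)
print(alerts)
```

A production Flink job would also deal with event time, watermarks and late-arriving data, which this in-memory sketch ignores.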

TUTORIALS

QueryGPT – Natural Language to SQL Using Generative AI | 5 min | AI | Jeffrey Johnson, Callie Busch, Abhi Khune, Pradeep Chakka | Uber Engineering Blog
Discover QueryGPT, Uber’s innovative tool that transforms natural language into SQL queries using generative AI, drastically reducing the time required to generate complex queries.
Generate a preference dataset | 4 min | LLM | Distilabel Docs
A step-by-step guide on building a pipeline to generate preference datasets using the Distilabel SDK and Hugging Face Inference API, from data preparation to model evaluation.
ETL for Beginners: Data Ingestion at Scale with S3 and Snowflake | 6 min | ETL | Tamara Fingerlin | Astronomer Blog
This post walks through creating an automated daily ingestion pipeline from S3 to Snowflake using Airflow, offering a hands-on guide to setting up and managing data flows, even with trial versions of the required tools.
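
Preference datasets are commonly stored as prompt/chosen/rejected records, the shape used by DPO-style fine-tuning. The sketch below shows one common way to build such a record: rate several generations for a prompt, then keep the best and worst. It is plain Python, not the Distilabel SDK. The `Generation` class, the rating scale and the field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Generation:
    text: str
    rating: float  # assumed score, e.g. from an LLM judge or a human rater

def to_preference_pair(prompt: str, generations: list) -> Optional[dict]:
    """Turn rated generations into one chosen/rejected record (hypothetical helper)."""
    if len(generations) < 2:
        raise ValueError("need at least two generations to form a pair")
    ranked = sorted(generations, key=lambda g: g.rating, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.rating == worst.rating:
        return None  # all ratings tied: no preference signal
    return {"prompt": prompt, "chosen": best.text, "rejected": worst.text}

pair = to_preference_pair(
    "What is 2 + 2?",
    [Generation("4", 9.0), Generation("5", 2.0), Generation("Four.", 8.5)],
)
print(pair)
```

Keeping only the extremes is one design choice. Pairing the best response with a random lower-rated one is another.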

DATA LIBRARY

SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL | Database Technologies | Jeff Shute, Shannon Bales, Matthew Brown, Jean-Daniel Browne, Brandon Dolphin, Romit Kudtarkar, Andrey Litvinov, Jingchi Ma, John Morcos, Michael Shen, David Wilhite, Xi Wu, Lulan Yu | Google Research
GoogleSQL introduces piped data flow syntax to address usability and extensibility challenges in SQL, making it more flexible and user-friendly without significant system changes.
Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot | LLM | Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain | Cornell University arXiv
This case study examines expert-verified chatbots in healthcare, showing how an LLM-powered bot improved verification rates and reduced errors over time.
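
The core idea of the pipe-syntax paper is that a query reads as a linear sequence of steps, each taking a table and producing a table, as in `FROM orders |> WHERE amount > 10 |> AGGREGATE SUM(amount) AS total GROUP BY customer |> ORDER BY total DESC`. The plain-Python sketch below mimics that top-to-bottom flow. It is an analogy, not GoogleSQL. The `orders` table and the helper names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable

Relation = list[dict]                   # a table as a list of row dicts
Stage = Callable[[Relation], Relation]  # each pipe stage maps table -> table

def where(pred: Callable[[dict], bool]) -> Stage:
    return lambda rel: [row for row in rel if pred(row)]

def aggregate_sum(col: str, group_by: str, alias: str) -> Stage:
    def stage(rel: Relation) -> Relation:
        totals: dict = defaultdict(int)
        for row in rel:
            totals[row[group_by]] += row[col]
        return [{group_by: key, alias: total} for key, total in totals.items()]
    return stage

def order_by(col: str, descending: bool = False) -> Stage:
    return lambda rel: sorted(rel, key=lambda row: row[col], reverse=descending)

def pipe(rel: Relation, *stages: Stage) -> Relation:
    # Apply stages strictly in order, like successive |> operators.
    return reduce(lambda acc, stage: stage(acc), stages, rel)

orders = [
    {"customer": "a", "amount": 5},
    {"customer": "a", "amount": 20},
    {"customer": "b", "amount": 15},
    {"customer": "b", "amount": 30},
    {"customer": "c", "amount": 12},
]

result = pipe(
    orders,
    where(lambda r: r["amount"] > 10),
    aggregate_sum("amount", group_by="customer", alias="total"),
    order_by("total", descending=True),
)
print(result)
```

Because every stage has the same table-in, table-out shape, steps can be added or reordered without restructuring nested subqueries.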

PODCAST

Process mining with LLMs | 26 min | LLM | Kyle Polich, David Obembe | Data Skeptic Podcast
David Obembe discusses how LLMs can enhance process mining tools, sharing insights from his research on conversational interfaces and future advancements using RAG.
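
For readers new to process mining: many discovery tools start from an event log (case ID plus activity) and count how often one activity directly follows another in the same case, producing a directly-follows graph. Here is a minimal plain-Python sketch of that counting step. The event log is hypothetical, and the sketch assumes events are already in time order within each case. It is not taken from the podcast.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity), already time-ordered per case
event_log = [
    ("case1", "register"), ("case1", "check"), ("case1", "approve"),
    ("case2", "register"), ("case2", "check"), ("case2", "reject"),
    ("case3", "register"), ("case3", "approve"),
]

def directly_follows(log):
    """Count (a, b) pairs where activity b immediately follows a in a case."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)
    dfg = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = directly_follows(event_log)
print(dfg)
```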

DATA TUBE

RAG API - 30 lines of code is all you need for RAG | 23 min | ML | Sascha Heyer | ML Engineer
Learn how to implement RAG with minimal code using Google Cloud's RAG API, providing an efficient way to retrieve and integrate relevant documents for smarter query responses.
Data Streaming in the Age of AI | 1h 37 min | Data Streaming | Jay Kreps | Confluent
Learn how top leaders in services, media, and automotive industries are using Confluent's Data Streaming Platform to innovate, transform decisions, and drive business forward with real-time data.
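
The RAG video relies on Google Cloud's managed RAG API for retrieval. To show what that retrieval step does conceptually, here is a self-contained sketch that swaps in bag-of-words cosine similarity for the embedding model and vector store a real system would use. It does not call the RAG API. The documents, query and helper names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query (bag-of-words stand-in)."""
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]

def build_prompt(query: str, docs: list, k: int = 2) -> str:
    """Augment the question with retrieved context before sending it to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Kafka is a distributed event streaming platform.",
    "Flink processes unbounded data streams with low latency.",
    "Snowflake is a cloud data warehouse.",
]
query = "Which platform processes data streams?"
print(build_prompt(query, docs))
```

The generation step (sending the prompt to a model) is left out, since it depends on which LLM service you use.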

CONFS EVENTS AND MEETUPS

Dive into the world of LLMOps to learn how to transition from demo applications to production-grade systems, tackling challenges like prompt sensitivity, cost control, and model tuning.
Infoshare DEV | Gdynia | 16th October
Two stages dedicated entirely to the latest technologies await you at this conference.
What topics does it cover?
Architecture | AI/ML | Data Science | DevOps & Cloud | People & Culture | Java | Tests | UX | Front-end | CyberSecurity | Programming
And since we are the community partner of Infoshare DEV, we have a discount code! Use the code DEV24-DP10 to get a 10% discount.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill
2024-09-25 12:26