DATA Pill feed

DATA Pill #106 - OpenAI GPT-4o to query your database, Postgres on Kubernetes


My First Billion (of Rows) in DuckDB | 12 min | Data Processing | João Pedro | Towards Data Science
João explores DuckDB by revisiting the challenge of processing Brazilian electronic ballot box logs to calculate vote-time metrics, providing a benchmark for performance and user experience.
Scale Real-Time Streams to Delta Lakehouse with Apache Flink on Azure HDInsight on AKS | 7 min | Real-Time Streaming | Sairam Yeturi, Keshav Singh | Microsoft Blog
This blog explores using Delta format as a source and sink for Apache Flink stream processing. Delta, an ACID-compliant lakehouse format, supports petabyte-scale processing and acts as a single source of truth, seamlessly integrating with Microsoft Fabric.
Unveiling the Future of Streaming Data Platforms | 10 min | Data Streaming | Filip Yonov, Kaye Lincoln | Ververica Blog
Filip Yonov, Head of Streaming at Aiven, is joining this year's Flink Forward Program Committee. Read a short Q&A session about his journey with streaming data platforms and his insights on upcoming industry trends.
How to use OpenAI GPT-4o to query your database? | 5 min | SQL | Howard Chi | WrenAI Blog
This post will guide you through setting up GPT-4o with WrenAI to query your PostgreSQL database, enhancing your data retrieval process with faster responses and cost efficiency.
Fine-tuning AWS ASGs with Attribute Based Instance Selection | 5 min | Data Engineering | Ajay Pratap Singh | Yelp Engineering
This post covers how attribute-based instance selection improved Yelp's autoscaling and their switch from Clusterman to Karpenter.


Amazon DocumentDB's zero-ETL integration with Amazon OpenSearch Service simplifies your data architecture and boosts search capabilities. Read about the setup process, making advanced search analytics effortless.


Building a Real-Time Data Pipeline | 11 min | Data Engineering | Andy Sawyer | Personal Blog
Andy demonstrates creating a real-time data pipeline using Kafka, Polars, and Delta Lake. It’s easier than you might think, and you can find the code on his GitHub to try it yourself.


Postgres on Kubernetes | 1 h 12 min | Data Engineering | Álvaro Hernández | Kubernetes Podcast
Álvaro Hernández is the founder and CEO of OnGres, a company that provides, among other things, a distribution of Postgres that runs on Kubernetes, called “StackGres”.


What's next for AI agentic workflows ft. Andrew Ng of AI Fund | 14 min | AI | Andrew Ng | Sequoia Capital
Andrew Ng, founder of DeepLearning.AI and AI Fund, uses this talk to show the difference between a non-agentic (plain LLM) workflow and an agentic workflow in a smooth, insightful way. Zero-shot vs. iterative workflow, RAG vs. agentic RAG: see how the latter in each pair gives a better outcome.


Data Learning Week | Online | 28-31 May
Would you like to test one of our courses before investing? Then come to our Data Learning Week, a series of 4 free workshops. Each session is a free trial lesson taken from the full training. Choose your topic, check the agenda, and sign up:

  • GenAI taster: discover the power of ChatGPT
  • dbt Learn training taster: the new standard for data transformation
  • Find valuable data use cases with Analytics Translation
  • Power BI in an hour
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill