DATA Pill feed

DATA Pill #067 - GPT-4 + Streaming Data = Real-Time Gen AI, Instagram Explore recommendations system


Sidekick’s Improved Streaming Experience | 6 min | LLM | Ates Goral | Shopify Engineering
This tutorial addresses two ongoing user experience challenges in Large Language Model (LLM) chatbots: Markdown rendering glitches and response delays. It introduces a solution that combines buffered Markdown rendering with asynchronous content resolution, so that LLM responses stream to the user more smoothly.
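To make the buffering idea concrete, here is a minimal sketch in Python. It is not Shopify's implementation: the class name `MarkdownStreamBuffer` and its rules are hypothetical, and it only handles one case, holding back a Markdown link until the whole `[text](url)` has arrived, so that a half-streamed link is never rendered.

```python
class MarkdownStreamBuffer:
    """Hypothetical helper: holds back text that may be an unfinished Markdown link."""

    def __init__(self) -> None:
        self._pending = ""

    @staticmethod
    def _link_is_open(tail: str) -> bool:
        # `tail` always starts with "[".
        right = tail.find("]")
        if right == -1:
            return True                      # still inside the link text
        if right + 1 >= len(tail):
            return True                      # "]" seen, next character unknown
        if tail[right + 1] != "(":
            return False                     # plain brackets, not a link
        return ")" not in tail[right + 2:]   # waiting for the URL to close

    def feed(self, chunk: str) -> str:
        """Add a streamed chunk and return the text that is safe to render now."""
        self._pending += chunk
        start = self._pending.rfind("[")
        if start != -1 and self._link_is_open(self._pending[start:]):
            safe, self._pending = self._pending[:start], self._pending[start:]
        else:
            safe, self._pending = self._pending, ""
        return safe

    def flush(self) -> str:
        """Return whatever is still held back once the stream has ended."""
        safe, self._pending = self._pending, ""
        return safe


chunks = ["See the ", "[do", "cs](https://exa", "mple.com) now"]
buffer = MarkdownStreamBuffer()
rendered = [buffer.feed(chunk) for chunk in chunks]
```

In this run the first chunk is released immediately, the two middle chunks are held back, and the complete link is released with the last chunk. A real renderer would need similar rules for emphasis markers, code fences, and other constructs.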
This one covers the challenges of serving personalized content recommendations to billions of users while keeping them relevant and diverse. It explains the machine learning techniques and infrastructure enhancements used to reach that scale, and offers insight into the complexities of running recommendation systems at massive scale.
GPT-4 + Streaming Data = Real-Time Generative AI | 13 min | AI | Michael Drogalis | Confluent Tech Blog
Many people have experimented with ChatGPT, but the true test is using it for real-world applications, such as deploying an AI support agent for airline customer assistance. Handling personal and real-time queries, however, requires access to internal data. This one delves into the synergy between event streaming and ChatGPT, outlining the architecture and considerations for building a real-time support agent.
Automate the archive and purge data process for Amazon RDS for PostgreSQL using pg_partman, Amazon S3, and AWS Glue | 6 min | ETL | Anand Komandooru, Li Liu, Neil Potter, Vivek Shrivastava | AWS Blog
Let's explore the optimization of archival and data purging procedures for Amazon RDS for PostgreSQL databases using tools such as pg_partman, Amazon S3 and AWS Glue. This tutorial provides valuable insights into automating data management tasks, enabling users to manage their PostgreSQL databases on AWS efficiently. Automation enhances performance and reduces storage costs by streamlining data archival and purging processes.
This one shows how using environment-based Terraform workspaces together with Terraform Cloud improves infrastructure management during development. Teams can create distinct environments and manage their infrastructure as code efficiently, with collaboration and version control built into the cloud infrastructure development process.
Adopting dbt as the Data Transformation Tool at Instacart | 12 min | ETL | James Zheng | tech-at-instacart
Read Instacart's journey of adopting dbt as their data transformation solution. It offers an inside look into how this decision has revolutionized their data engineering and analytics workflows, emphasizing the pivotal role dbt plays in enhancing data quality and accelerating insights for Instacart's grocery delivery platform.
Multiple Stateful Operators in Structured Streaming | 12 min | Data Streaming | Angela Chu, Jungtaek Lim | Databricks Blog
Discover Databricks' approach to using multiple stateful operators within Structured Streaming, a technology designed for real-time data processing. This article explores the benefits of this approach, highlighting how it makes stateful processing more efficient and lets streaming applications express more complex logic.


The Urgent Need for Responsible Use of Generative AI | 6 min | AI | Heiko Hotz | Towards Data Science
This blog post delves into four key dimensions (Scale & Speed, Personalization, Provenance, Diffusion) that set apart the current era of GenAI from previous phases, emphasizing the urgency of examining ethical and responsible AI utilization. By addressing the question of "Why now?" and focusing on these pivotal factors, this article lays the groundwork for exploring potential solutions in a future piece.
This one aims to blend two solutions, harnessing the capabilities of both PyCaret and the BigQuery ML inference engine. It walks through the use of ONNX and examines whether this integrated approach provides the simplest and most effective means of training and deploying a machine learning model.
Ask like a human: Implementing semantic search on Stack Overflow | 7 min | Data Engineering | David Haney, David Gibson | The Overflow Blog
This article focuses on Stack Overflow's implementation of semantic search, which aims to create a more human-like and context-aware search experience. It offers valuable insights into how this technology elevates user satisfaction on the platform, by boosting the precision and relevance of search results, ultimately empowering developers to discover solutions to their programming inquiries with greater efficiency.
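As a rough illustration of the idea behind semantic search, the sketch below ranks a few titles by cosine similarity between embedding vectors rather than by keyword overlap. It is not Stack Overflow's implementation: the three-dimensional vectors are made up for the example, and in practice both the query and the documents would be embedded by a language model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, for illustration only.
DOCS = {
    "How do I reverse a list in Python?": [0.9, 0.1, 0.0],
    "Undo the last git commit": [0.0, 0.2, 0.9],
    "Sort a dictionary by value": [0.7, 0.6, 0.1],
}


def search(query_vec: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k titles whose embeddings are closest to the query."""
    ranked = sorted(docs, key=lambda title: cosine(query_vec, docs[title]), reverse=True)
    return ranked[:k]


# Hypothetical embedding of a query such as "flip the order of an array".
results = search([1.0, 0.2, 0.0], DOCS)
```

The query shares no keywords with the top result, which is the point: ranking happens on meaning as captured by the vectors, not on matching words.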


Microsoft Excel has revolutionized data organization and analysis, serving as a crucial tool for daily decision-making. Now, with the introduction of Python in Excel's Public Preview, users can seamlessly integrate Python analytics into their workbooks, simplifying data processing and visualization without any setup hassles.
Fine-tuning is ready for GPT-3.5 Turbo, and it's planned for GPT-4 later this year, allowing developers to optimize models for specific tasks at scale. Initial tests indicate that fine-tuned GPT-3.5 Turbo can excel in specific tasks, rivaling base GPT-4 capabilities. Importantly, data used with the fine-tuning API remains the exclusive property of the customer, so it cannot be utilized for training other models by OpenAI or any other organization.


Data Journey with Kacper Łodzikowski (Pearson) - Data and AI in learning and education | 44 min | AI | Host: Adam Kawa Guest: Dainius Kniuksta | Radio DaTa Podcast
What will you find in this episode?

  • Data & AI functionalities provided by Pearson in their products
  • Learning new (human) languages with Pearson and/or AI
  • How AI changes the access to education and opens new opportunities worldwide
  • The most important skills that one should focus on in the future

and way more.
Björn Hansen & Kristofer Ågren on creating 20% YoY IoT revenue growth for Telia's Division X | 31 min | IoT | Guests: Björn Hansen, Kristofer Ågren | Accelators
Telia's Division X is a standout success in the IoT industry, exceeding connectivity revenue expectations and diversifying to derive 40% of their income from new streams. With a remarkable 20% year-on-year revenue growth, they're redefining IoT revenue generation and aspire to contribute significantly to Telia's corporate revenue through innovative business models, aiming for a 10x improvement for customers.


Data Mass | Gdańsk | 5th October 2023
This Summit is aimed at people who use the cloud in their daily work to solve Data Engineering, Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges.

Psst… Use the DataPill10 code to get a 10% discount!
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DataPill