DATA Pill feed

DATA Pill #153 - GenAI + dbt = dbt-sqlx, 10 tips for migrating from SAS Viya to Snowflake + dbt

ARTICLES

Overclocking dbt: Discord's Custom Solution in Processing Petabytes of Data | 19 min | Data Engineering | Chris Dong | Discord Engineering Blog
Discover how Discord scaled dbt to manage petabytes of data and a large developer team. Learn about their custom solutions to overcome challenges like slow compile times and inefficient backfills.
10 tips for migrating from SAS Viya to Snowflake + dbt | 3 min | Analytics Engineering | Lasse Benninga | Xebia Blog
Get practical advice on transitioning from SAS Viya to Snowflake and dbt. This guide covers handling true deletes, SAS-specific logic, and implementing robust testing practices.
Explore Docker's new tool that simplifies running and testing AI models locally. It standardizes model packaging and supports GPU acceleration for efficient local development.
Optimizing Our E2E Pipeline | 8 min | DevOps | Dan Carton | Slack Engineering Blog
Learn how Slack's DevXP team reduced frontend build times in their CI/CD pipeline by 80% using conditional builds and prebuilt asset caching.

TUTORIALS

Data quality on Databricks - Spark Expectations | 5 min | Data Quality | Bartosz Konieczny | Waiting for Code Blog
Understand how to enforce data quality in Apache Spark using Spark Expectations. This tutorial covers defining and applying various validation rules.
GenAI + dbt = dbt-sqlx: The Easiest Way to Switch SQL Dialects | 4 min | Gen AI | Nikhil Suthar | Data Engineer Things
Discover dbt-sqlx, a GenAI-powered CLI tool that translates dbt models across SQL dialects, simplifying warehouse migrations and reducing manual rewrites.

TOOL

HyperDX | Data Engineering
HyperDX centralizes logs, metrics, traces, exceptions, and session replays, helping engineers quickly diagnose production issues. It's an open-source alternative to Datadog and New Relic.

DATA LIBRARY

polars-bio – fast, scalable and out-of-core operations on large genomic interval datasets | 15 min | Data Engineering | Marek Wiewiórka, Pavel Khamutou, Marek Zbysiński, Tomasz Gambin
Explore polars-bio, a high-performance Python library for analyzing large genomic datasets. Built on Apache Arrow and DataFusion, it offers significant speed and memory efficiency improvements.

PODCAST

Prompts as Functions: The BAML Revolution in AI Engineering | 38 min | AI | Ben Lorica, David Hughes | The Data Exchange Podcast
Learn about BAML, a domain-specific language that transforms prompts into structured functions, enabling more deterministic and maintainable AI applications.

CONFS, EVENTS AND MEETUPS

GoDataFest | Amsterdam | May 26-28th
Join GoDataFest 2025 in Amsterdam for three days of expert-led sessions, hands-on workshops, and networking focused on the latest in data and AI technology. Hosted by Xebia, this in-person event covers topics like modern data platforms, analytics engineering, and MLOps.

PINNACLE PICKS

Last week's top picks:
Announcing Airbyte Embedded | 3 min | AI | Teo Gonzalez | Airbyte Blog
Airbyte now lets you embed data pipelines directly into your AI app. A must-have for building context-rich assistants or copilots.
A simple Python tool that turns docs into Markdown, preserving structure for LLM consumption. Clean, readable, and tailor-made for pipelines.
The Top 7 MCP-Supported AI Frameworks | 19 min | AI | Amos Gyamfi | Personal Blog
A hands-on guide to frameworks like LangChain, Chainlit & Mastra that make integrating tools into LLM agents a breeze using the Model Context Protocol (MCP).
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub