DATA Pill #153 - GenAI + dbt = dbt-sqlx, 10 tips for migrating from SAS Viya to Snowflake + dbt

ARTICLES

Overclocking dbt: Discord's Custom Solution in Processing Petabytes of Data | 19 min | Data Engineering | Chris Dong | Discord Engineering Blog

Discover how Discord scaled dbt to manage petabytes of data and a large developer team. Learn about their custom solutions to overcome challenges like slow compile times and inefficient backfills.

10 tips for migrating from SAS Viya to Snowflake + dbt | 3 min | Analytics Engineering | Lasse Benninga | Xebia Blog

Get practical advice on transitioning from SAS Viya to Snowflake and dbt. This guide covers handling true deletes, SAS-specific logic, and implementing robust testing practices.

Introducing Docker Model Runner: A Better Way to Build and Run GenAI Models Locally | 3 min | AI | Deanna Sparks | Docker Blog

Explore Docker's new tool that simplifies running and testing AI models locally. It standardizes model packaging and supports GPU acceleration for efficient local development.

Optimizing Our E2E Pipeline | 8 min | DevOps | Dan Carton | Slack Engineering Blog

Learn how Slack's DevXP team reduced frontend build times in their CI/CD pipeline by 80% using conditional builds and prebuilt asset caching.

TUTORIALS

Data quality on Databricks - Spark Expectations | 5 min | Data Quality | Bartosz Konieczny | Waiting for Code Blog

Understand how to enforce data quality in Apache Spark using Spark Expectations. This tutorial covers defining and applying various validation rules.

GenAI + dbt = dbt-sqlx: The Easiest Way to Switch SQL Dialects | 4 min | Gen AI | Nikhil Suthar | Data Engineer Things

Discover dbt-sqlx, a GenAI-powered CLI tool that translates dbt models across SQL dialects, simplifying warehouse migrations and reducing manual rewrites.

TOOL

HyperDX | Data Engineering

HyperDX centralizes logs, metrics, traces, exceptions, and session replays, helping engineers quickly diagnose production issues. It's an open-source alternative to Datadog and New Relic.

DATA LIBRARY

polars-bio – fast, scalable and out-of-core operations on large genomic interval datasets | 15 min | Data Engineering | Marek Wiewiórka, Pavel Khamutou, Marek Zbysiński, Tomasz Gambin

Explore polars-bio, a high-performance Python library for analyzing large genomic datasets. Built on Apache Arrow and DataFusion, it offers significant speed and memory efficiency improvements.

PODCAST

Prompts as Functions: The BAML Revolution in AI Engineering | 38 min | AI | Ben Lorica, David Hughes | The Data Exchange Podcast

Learn about BAML, a domain-specific language that transforms prompts into structured functions, enabling more deterministic and maintainable AI applications.

CONFS, EVENTS AND MEETUPS

GoDataFest | Amsterdam | May 26-28th

Join GoDataFest 2025 in Amsterdam for three days of expert-led sessions, hands-on workshops, and networking focused on the latest in data and AI technology. Hosted by Xebia, this in-person event covers topics like modern data platforms, analytics engineering, and MLOps.

PINNACLE PICKS

Your top picks from last week:

Announcing Airbyte Embedded | 3 min | AI | Teo Gonzalez | Airbyte Blog

Airbyte now lets you embed data pipelines directly into your AI app. A must-have for building context-rich assistants or copilots.

MarkItDown | LLM

A simple Python tool that turns docs into Markdown, preserving structure for LLM consumption. Clean, readable, and tailor-made for pipelines.

The Top 7 MCP-Supported AI Frameworks | 19 min | AI | Amos Gyamfi | Personal Blog

A hands-on guide to frameworks like LangChain, Chainlit & Mastra that make integrating tools into LLM agents a breeze using the Model Context Protocol (MCP).

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub