ARTICLES
FireDucks: Pandas but 100x faster | 5 min | Data Science | Herman Martinus | Personal Blog
Discover FireDucks, a high-performance library compatible with Pandas, designed to speed up data manipulation without changing your existing workflow.
So I Have A Data Product… Now What? | 10 min | Data Strategy | Ryan Duffy | Modern Data 101 Blog
Explore how Demandbase transitioned from ClickHouse to CelerData Cloud, optimizing storage and streamlining real-time data updates.
Did You Update The Documentation? | 4 min | Platform Engineering | Matthijs van der Veer, Rutger Buiteman | Xebia Blog
Learn how automation and modern tools keep API documentation up-to-date, enhancing team collaboration and onboarding efficiency.
TUTORIALS
Data Pipeline Development with MinIO, Iceberg, Nessie, Polars, StarRocks, Mage, and Docker | 15 min | Data Engineering | George Zefkilis | Data Engineer Things
A guide to building a modern data pipeline using lightweight, scalable tools. This tutorial covers medallion architecture and the WAP pattern for streamlined data engineering.
DATA LIBRARY
It's About Time: What A/B Test Metrics Estimate | Data Analytics | Sebastian Ankargren, Mattias Frånberg, Mårten Schultzberg
A detailed comparison of cumulative vs. windowed metrics in A/B testing, helping you choose the right approach based on experiment specifics.
LLM Prompt Tuning Playbook | LLM | Varun Godbole, Ellie Pavlick
A practical guide to crafting effective prompts for post-trained LLMs, offering strategies and frameworks for improving interaction outcomes.
NEWS
Store and access your Iceberg data in OneLake using Snowflake and shortcuts | 5 min | Data Analytics | Microsoft Fabric Blog
Discover how Snowflake users on Azure can now write Iceberg tables directly to OneLake, reducing duplication and improving workflow efficiency.
TOOLS
chDB | Data Engineering
A Python module leveraging ClickHouse for high-performance, serverless OLAP solutions, optimized for SQL-on-Parquet tasks and more.
NEO | ML
An AI-powered ML engineering tool that automates data preparation, model selection, and deployment, boosting productivity for machine learning workflows.
PODCAST
ML Infrastructure Without The Ops: Simplifying The ML Developer Experience With Runhouse | 1 h 16 min | ML | Tobias Macey, Donny Greenberg | AI Engineering Podcast
Learn how to simplify ML workflows and build scalable AI systems with insights from the Runhouse team.
CONFS, EVENTS AND MEETUPS
MOPS - Meetup #6 | Poznań | 9th November
Join talks on FastAPI for model serving, scaling ML experiments on AWS, and leveraging LLMs for MLOps, followed by networking (and pizza!).