DATA Pill feed

DATA Pill #132 - MinIO, Iceberg, Polars, chDB, NEO, and more!

ARTICLES

FireDucks: Pandas but 100x faster | 5 min | Data Science | Herman Martinus | Personal Blog
Discover FireDucks, a high-performance, pandas-compatible library designed to speed up data manipulation without changing your existing workflow.
So I Have A Data Product… Now What? | 10 min | Data Strategy | Ryan Duffy | Modern Data 101 Blog
Explore how Demandbase transitioned from ClickHouse to CelerData Cloud, optimizing storage and streamlining real-time data updates.
Did You Update The Documentation? | 4 min | Platform Engineering | Matthijs van der Veer, Rutger Buiteman | Xebia Blog
Learn how automation and modern tools keep API documentation up to date, improving team collaboration and onboarding efficiency.

TUTORIALS

Data Pipeline Development with MinIO, Iceberg, Nessie, Polars, StarRocks, Mage, and Docker | 15 min | Data Engineering | George Zefkilis | Data Engineer Things
A guide to building a modern data pipeline with lightweight, scalable tools. The tutorial covers the medallion architecture and the Write-Audit-Publish (WAP) pattern for streamlined data engineering.
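For readers new to WAP, the pattern can be sketched in a few lines: new data is written to an isolated branch, audited there, and published to the main branch only if the checks pass. This is a minimal, illustrative sketch, not the tutorial's code. The `Catalog` class is a hypothetical in-memory stand-in for a branching catalog (in the tutorial's stack that role is played by Nessie branches), and the `audit` rules are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a branching catalog such as Nessie."""
    branches: dict = field(default_factory=lambda: {"main": []})

    def create_branch(self, name, from_branch="main"):
        # The new branch starts as a copy of the source branch's rows
        self.branches[name] = list(self.branches[from_branch])

    def merge(self, source, target="main"):
        # Publish: the target branch now holds the audited rows; drop the work branch
        self.branches[target] = list(self.branches[source])
        del self.branches[source]


def audit(rows):
    """Hypothetical quality checks: every row has an id and a non-negative amount."""
    return all(r.get("id") is not None and r.get("amount", 0) >= 0 for r in rows)


def write_audit_publish(catalog, new_rows, branch="etl_run"):
    catalog.create_branch(branch)              # isolate the run from main
    catalog.branches[branch].extend(new_rows)  # Write
    if audit(catalog.branches[branch]):        # Audit
        catalog.merge(branch)                  # Publish
        return True
    del catalog.branches[branch]               # discard bad data; main is untouched
    return False
```

Usage: `write_audit_publish(cat, [{"id": 1, "amount": 10}])` publishes the row, while a later batch containing `{"id": None, "amount": 5}` fails the audit and leaves `main` unchanged. Consumers reading `main` therefore never see data that has not passed the checks.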

DATA LIBRARY

It's About Time: What A/B Test Metrics Estimate | Data Analytics | Sebastian Ankargren, Mattias Frånberg, Mårten Schultzberg
A detailed comparison of cumulative vs. windowed metrics in A/B testing, helping you choose the right approach based on experiment specifics.
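To make the distinction concrete, here is a simplified illustration of our own (not the authors' estimators). A cumulative metric counts each user's events from exposure up to the analysis date, so its value depends on when you look. A windowed metric counts events in a fixed window after exposure and includes only users whose window has fully elapsed. The data layout (a dict of exposure dates and a list of `(user, date)` events) and the function names are assumptions made for the example.

```python
from datetime import date, timedelta


def cumulative_metric(exposures, events, analysis_date):
    """Mean per-user event count from each user's exposure up to analysis_date."""
    totals = []
    for user, exposed_on in exposures.items():
        n = sum(1 for u, d in events if u == user and exposed_on <= d <= analysis_date)
        totals.append(n)
    return sum(totals) / len(totals)


def windowed_metric(exposures, events, analysis_date, window_days):
    """Mean per-user event count within the first `window_days` days after exposure.

    Users whose window has not fully elapsed by analysis_date are excluded.
    """
    totals = []
    for user, exposed_on in exposures.items():
        window_end = exposed_on + timedelta(days=window_days - 1)
        if window_end > analysis_date:
            continue  # incomplete window: user not yet measurable
        n = sum(1 for u, d in events if u == user and exposed_on <= d <= window_end)
        totals.append(n)
    return sum(totals) / len(totals) if totals else float("nan")


exposures = {"a": date(2024, 11, 1), "b": date(2024, 11, 5)}
events = [
    ("a", date(2024, 11, 1)), ("a", date(2024, 11, 2)), ("a", date(2024, 11, 6)),
    ("b", date(2024, 11, 5)), ("b", date(2024, 11, 6)),
]
```

With this data, the cumulative metric is 1.5 on 5 November and 2.5 on 7 November: the number grows as users accumulate exposure time. The 3-day windowed metric on 7 November is 2.0 and would stay 2.0 on any later analysis date. Conversely, with a 5-day window, user "b" drops out on 7 November because that user's window is still open. This is the trade-off the paper examines: which quantity each metric actually estimates, and how that depends on experiment duration.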
LLM Prompt Tuning Playbook | LLM | Varun Godbole, Ellie Pavlick
A practical guide to crafting effective prompts for post-trained LLMs, offering strategies and frameworks for improving interaction outcomes.

NEWS

Discover how Snowflake users on Azure can now write Iceberg tables directly to OneLake, reducing duplication and improving workflow efficiency.

TOOLS

chDB | Data Engineering
An in-process SQL OLAP engine powered by ClickHouse and available as a Python module, so no separate database server is needed; well suited to SQL-on-Parquet queries and more.
NEO | ML
An AI-powered ML engineering tool that automates data preparation, model selection, and deployment, boosting productivity for machine learning workflows.

PODCAST

ML Infrastructure Without The Ops: Simplifying The ML Developer Experience With Runhouse | 1 h 16 min | ML | Tobias Macey, Donny Greenberg | AI Engineering Podcast
Learn how to simplify ML workflows and build scalable AI systems with insights from the Runhouse team.

CONFS EVENTS AND MEETUPS

MOPS - Meetup #6 | Poznań | 9th November
Join talks on FastAPI for model serving, scaling ML experiments on AWS, and leveraging LLMs for MLOps, followed by networking (and pizza!).
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill
2024-11-20 11:45