DATA Pill feed

DATA Pill #135 - LLM Fine-Tuning for Modern AI Teams, Data Pipelines with Apache Airflow

ARTICLES

Our journey to Snowflake monitoring mastery | 6 min | Data Platform | Rob Scriva | Canva Engineering Blog
See how Canva uses Snowflake to power scalable data workflows, improve cost visibility, and drive insights with tools like dbt and Looker.

DATA LIBRARY

Data Pipelines with Apache Airflow, Second Edition | Data Engineering | Julian de Ruiter, Ismael Cabral, Kris Geusebroek, Daniel van der Ende, Bas Harenslak
Learn to build scalable, efficient data pipelines with Airflow’s latest features, including the TaskFlow API and LLM integration. This guide offers best practices for workflow design and production-ready deployments. Perfect for modern data engineers.

As a bonus: use the code 'mlderuiter' to get a 50% discount! This code is valid until December 25th, making it a perfect Christmas gift!

TUTORIALS

Querying OneLake Delta Lake Tables from DuckDB CLI | 4 min | Data Lakehouse | Aitor Murguzur | Personal Blog
Learn how DuckDB's CLI simplifies querying OneLake Delta tables, offering a lightweight, high-performance approach for data exploration.
Democratizing access to AI through GitHub Models | 7 min | AI | Rob Bos | Xebia Blog
Explore how GitHub-hosted AI models are making AI more accessible, driving innovation and collaboration for developers and organizations.
How to Read Unity Catalog Tables in Snowflake, in 4 Easy Steps | 6 min | Data Management | Aniruth Narayanan, Randy Pitcher, Susan Pierce | Databricks Blog
Understand how to seamlessly integrate Unity Catalog tables within Snowflake, bridging the gap between modern data platforms.
Model Validation Techniques, Explained: A Visual Guide with Code Examples | 26 min | ML | Samy Baladram | Towards Data Science
A visual guide to common model validation techniques, complete with code examples for practical implementation in your ML projects.
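To go with the model validation guide, here is a minimal sketch of plain k-fold cross-validation using only the Python standard library. It is our own illustration, not code from the article. The function names (`k_fold_indices`, `cross_val_scores`) and the mean-predictor baseline are hypothetical. Folds are taken in order without shuffling, in the style of scikit-learn's `KFold(shuffle=False)`, and the first `n % k` folds each get one extra sample.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for unshuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        # Everything outside the validation fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_scores(xs: Sequence, ys: Sequence, k: int,
                     fit: Callable, score: Callable) -> List[float]:
    """Fit on k-1 folds, score on the held-out fold, once per fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return scores


def fit_mean(train_x: List, train_y: List[float]) -> float:
    # Hypothetical baseline "model": always predict the mean training target.
    return sum(train_y) / len(train_y)


def mae(model: float, val_x: List, val_y: List[float]) -> float:
    # Mean absolute error of the constant prediction on the validation fold.
    return sum(abs(model - y) for y in val_y) / len(val_y)


scores = cross_val_scores(list(range(6)), [float(v) for v in range(6)], 3, fit_mean, mae)
print(scores)  # [3.0, 0.5, 3.0]
```

Because the toy data is sorted, the middle fold scores far better than the edge folds. This is a small reminder of why real projects usually shuffle, or stratify, before splitting.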

NEWS

vLLM, a high-performance inference engine for large language models, is now part of the PyTorch ecosystem, bringing innovations like PagedAttention for scalable AI serving.
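The core idea behind PagedAttention is that the KV cache is split into fixed-size blocks. Each sequence keeps a block table that maps its logical blocks to physical blocks, and those physical blocks need not be contiguous. The toy allocator below illustrates only that bookkeeping, under our own simplifying assumptions. It is not vLLM's code or API: the class and method names are hypothetical, no attention math is done, and block sharing and copy-on-write are left out.

```python
from typing import Dict, List, Tuple


class PagedKVCacheAllocator:
    """Toy block-table allocator illustrating the PagedAttention idea (not vLLM's code)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        # Free list of physical block ids; pop() initially hands out 0, 1, 2, ...
        self.free_blocks: List[int] = list(range(num_blocks - 1, -1, -1))
        self.block_tables: Dict[str, List[int]] = {}  # seq_id -> physical block ids
        self.lengths: Dict[str, int] = {}              # seq_id -> tokens stored so far

    def append_token(self, seq_id: str) -> Tuple[int, int]:
        """Reserve a KV slot for the next token of seq_id; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        pos = self.lengths.get(seq_id, 0)
        if pos % self.block_size == 0:
            # First token, or the last block is full: allocate a new block on demand.
            if not self.free_blocks:
                raise MemoryError("KV cache out of blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = pos + 1
        # Logical position -> (block table entry, offset within that block).
        return table[pos // self.block_size], pos % self.block_size

    def free(self, seq_id: str) -> None:
        """Return every block of a finished sequence to the free list."""
        self.free_blocks.extend(reversed(self.block_tables.pop(seq_id, [])))
        self.lengths.pop(seq_id, None)


cache = PagedKVCacheAllocator(num_blocks=4, block_size=2)
# Two sequences decoding in an interleaved order, as in continuous batching.
slots = [cache.append_token(s) for s in ["a", "a", "b", "a", "b"]]
print(slots)               # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1)]
print(cache.block_tables)  # {'a': [0, 2], 'b': [1]}
```

Sequence `a` ends up in physical blocks 0 and 2, which are not contiguous. Memory is reserved one block at a time as tokens arrive rather than for a maximum length up front, and that is the waste PagedAttention is designed to avoid.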

TOOLS

Distributed Restate - A first look | 11 min | Cloud Computing | Stephan Ewen, Ahmed Farghal, Till Rohrmann | Restate Blog
Distributed Restate delivers a geo-distributed, scalable, fault-tolerant runtime with strong consistency and seamless failover, powered by its log-first architecture.
Skimpy is a lightweight tool that provides summary statistics about variables in pandas or Polars data frames, right in the console or your interactive Python window.

DATA TUBE

Learn how fine-tuned models like Mistral 7B rival commercial LLMs with proper datasets. Explore model selection, dataset preparation, fine-tuning, and evaluation with Airtrain AI tools.
Alaska Airlines enhances the travel experience with generative AI-powered searches | 32 min | AI | David Nguyen, Charu Jain, Nemo Hajiyusuf | Google Cloud
Explore how Alaska Airlines uses generative AI and Google Cloud to revolutionize destination search with personalized, intuitive experiences.

CONFS EVENTS AND MEETUPS

AI or ROI? How to Measure Value? | Webinar | 16th December
This session offers proven tools from our Data and AI consultancy, helping you assess your organization, craft improvement plans, and turn AI investments into real business outcomes.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill
2024-12-12 09:14