DATA Pill feed

DATA Pill #166: Small Language Models Are the Future, Streaming ETL for ML, Multi-Agent System Patterns

ARTICLES

NVIDIA Says Small Language Models Are The Future of Agentic AI | SLM | 5 min | Cobus Greyling | Personal Blog
Small models are faster, safer, and better suited for real-time AI. NVIDIA explains why they may outpace LLMs in practical applications.
Streaming pipelines reduce latency, improve feature freshness, and unlock continuous model updates.
What Every AI Engineer Should Know About A2A, MCP & ACP | 7 min | AI | Edwin Lisowski | Personal Blog
A practical guide to three core agentic system patterns for reasoning and structured control.

TUTORIALS

How to Profile Models in PyTorch | 5 min | MLOps | Quentin-Anthon | Personal Blog
Learn how to trace, debug, and optimize model performance using PyTorch’s native tools.
Walkthrough for running Langfuse locally to trace and debug LLM agents with full control.
Large Language Models as Classification Engines: Overkill, or Awesome? | 12 min | LLM | Katherine Munro | Towards AI Blog
A comparison of LLMs versus traditional classifiers in terms of cost, performance, and practicality.
Handling Long-Running Operations in Microsoft Fabric REST API | 3 min | Data Engineering | Microsoft Ignite Blog
How to manage async operations in Fabric using polling and status tracking.

TOOL

Embedding User-Defined Indexes in Apache Parquet Files | 7 min | Data Engineering | Qi Zhu, Jigao Luo, Andrew Lamb | Apache DataFusion Blog
DataFusion introduces custom Parquet indexing for faster queries on large datasets.

DATA LIBRARY

MedGemma Technical Report | LLM | Hugging Face
New findings suggest long-context LLMs may be overrated for many tasks, and retrieval methods often perform better.

DATA TUBE

The Agent Factory - Episode 2: Multi-Agent Systems, Concepts & Patterns | 23 min | Gen AI | Vlad Kolesnikov, Shir Meir Lador | Google Cloud Tech
Vlad Kolesnikov and Shir Meir Lador explain how to design collaborative agents using swarms, supervisors, and context engineering.

PINNACLE PICKS

Your top picks from last week:
Direct Data Sharing using Delta Sharing - Introduction: Our Journey to Empower Partners at Zalando | Data Governance | 5 min | Lokeshbabu Radhakrishnan | Zalando Engineering Blog
Zalando is rolling out Delta Sharing to give partners real-time, governed access to data. No more manual exports, just scalable interoperability across teams and systems.
Introducing DataFrame API Support for Table-Valued Functions in Databricks | 5 min | Data Frames | Allison Wang, Takuya Ueshin, Jules Damji | Databricks Blog
You can now reuse complex logic with parameterized TVFs directly in the DataFrame API. Write cleaner pipelines without losing SQL-style reusability.
Why We Replaced Kafka with gRPC for Service Communication | 5 min | Data Engineering | Himanshu Singour | Personal Blog
One team ditched Kafka in favor of gRPC to reduce latency and simplify infra. A thoughtful case study that challenges default architectural choices.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub