DATA Pill feed

DATA Pill #195 – Platform Engineering, Container Scaling, Analytics Skills, Agent Frameworks, File Formats & Hybrid LLMs

ARTICLES

Digital transformation at Santander: How platform engineering is revolutionizing cloud infrastructure | Julio Bando, Jaime Nagase, Robert da Costa, Edgar Costa Filho, Guilherme Greco, Joao Melo, Jacob Mevorach, and Michael Silva | AWS | Platform Engineering
Santander’s “Catalyst” platform reduces infrastructure provisioning from weeks to hours. Built on Amazon EKS and Crossplane, the platform defines resources through Git repositories; ArgoCD syncs changes into multi‑cloud clusters and Open Policy Agent enforces compliance centrally. The bank has rolled out generative‑AI agent stacks using Bedrock, S3 and KMS with custom IAM policies; proof‑of‑concept preparation dropped from 90 days to 1 hour and monthly data‑platform tickets fell by 3,000. The result is a self‑service culture where developers can launch secure environments rapidly and platform teams maintain governance.
Netflix rethought its container strategy to harness modern CPUs. Containers provide flexible deployment and shorten time to production, but new CPUs with hyper‑threading and larger caches required revisiting resource allocation. Engineering teams identified bottlenecks in per‑container mount operations and implemented advanced monitoring to track resource usage and AI‑driven orchestration that anticipates demand spikes and reallocates workloads
Kickstart your analytics engineering career | Pádraic Slattery, Camila Birocchi | Xebia | Analytics Engineering
Xebia offers a step‑by‑step guide to build a modern data transformation pipeline using free tools. It recommends a serverless warehouse such as Google BigQuery and dbt for SQL‑based transformations; Git/GitHub handle version control and GitHub Actions automate CI/CD. The roadmap covers provisioning BigQuery, connecting dbt Cloud to a GitHub repository, organising models into staging/intermediate/marts layers, adding tests and exposures, and integrating BI tools
Introducing RockBot | Rockford Lhotka | AI Agents
RockBot is an open‑source framework for building multi‑agent systems where agents and user proxies communicate through a message bus. The author created it after finding existing frameworks insecure and monolithic; many run LLM‑generated code in the host process, making it hard to swap providers and isolating runaway code. RockBot solves this by running each agent as an isolated process that subscribes to topics on a RabbitMQ‑backed bus, calls tools/LLMs as needed and emits responses
Introducing the Apache Iceberg File Format API | Apache Iceberg | Data Formats
Iceberg has finalized a File Format API that makes file formats pluggable and engine‑agnostic. The new layer replaces fragmented logic across Spark, Flink and Java readers, eliminating large switch statements and uneven feature support. Core concepts include a FormatModel describing a format’s name, reader/writer construction and capabilities, and a FormatModelRegistry that engines query to obtain read/write builders.
The Allen Institute for AI released Olmo Hybrid 7B, a language model that mixes transformer attention with Gated DeltaNet linear recurrence. By replacing 75 % of attention layers with DeltaNet heads, the model achieves the same accuracy as Olmo 3 7B while using 49 % fewer tokens

FREE COURSES

Anthropic’s course portal offers a range of free learning paths. Courses include: Claude 101, AI Fluency, Building with the Claude API, Introduction to Agent Skills and more

DATATube

A short explainer on the hidden technical debt created when AI generates code that humans no longer fully understand. It cautions teams to pair LLM‑generated code with documentation and code reviews to avoid long‑term maintainability issues.
Dlaczego UV jest lepszy niż PIP - Praktyczny Tutorial| Marcin Zabłocki | Wojtek Mikołajczyk | ML-Workout | 25 min
A practical tutorial comparing Python’s uv package manager to pip. The presenter demonstrates faster installs, offline caching and dependency resolution, showing why uv improves the developer experience.

TOOLS

A terminal‑based tool that inspects your hardware and suggests large language models that fit your CPU, GPU and memory. The interactive TUI lists models with estimated token/s throughput, context length and quantization; CLI mode supports searching, planning required hardware and exposing a REST API
Modal’s research platform hosts an API endpoint for Z.ai’s GLM‑5 model (745 billion parameters). The endpoint allows one concurrent request and is free until 30 April 2026; a curl example shows how to call the /chat/completions API with the zai-org/GLM-5-FP8 model
Fluss is a streaming storage layer for real‑time analytics and lakehouse architectures. The Rust client enables table management and log streaming operations; it details how to start a local Fluss cluster (requiring Java 17+), create tables and append logs.
The official Anthropic marketplace (claude-plugins-official) comes preconfigured; categories include code intelligence(connecting language‑server plugins for C++, Python, Rust, etc.), external integrations (GitHub, Jira, Slack), development workflows (commit commands, PR review agents) and output styles for educational or learning‑focused responses

CONFS, EVENTS, WEBINARS & MEETUPS

Operationalizing AI Agents: From Experimentation to Production | MLOps Community | March 25, 2026

A livestream panel exploring the challenges of deploying AI agents at scale. Speakers from Databricks, GrottoAI and Sentick discuss reliability, cost, governance and security barriers; they compare LLMOps/AIOps/AgentOps with traditional MLOps and outline success criteria for generative‑AI frameworks, including evaluation and observability

Data & AI That Matter From systems to people: building data and AI with real-world impact| Webinar Online | March 26, 2026
Not all data and AI initiatives deliver value. This webinar series focuses on impact over hype — how data platforms, AI systems, and teams can be designed to genuinely support people, decisions, and outcomes.

PINNACLE PICKS

Your last edition top picks:
Scaling LLM Post‑Training at Netflix | Netflix | Baolin Li, Lingyi Liu, Binh Tang, Shaojing Li | 7 min | LLM Infrastructure
Netflix shares how it scales post‑training of large language models (fine‑tuning and reward modelling) across hundreds of GPUs. Topics include distributed optimization, scheduling on heterogeneous clusters, evaluation pipelines and lessons learned from deploying domain‑specific LLMs for personalization and content creation.

Scaling Localization with AI at Lyft | Lyft | Stefan Zier | 6 min | AI in Localization
Lyft describes its AI‑powered localization platform that translates and adapts UI strings across dozens of languages. The system combines machine translation, LLM‑based context extraction, and human review loops to deliver high‑quality localized copy at scale, reducing turnaround times for product launches.

Make Your AI Better at Data Work with dbt’s Agent Skills | dbt | Joel Labes & Jason Ganz | 14 min | Developer Tools
dbt Labs introduces agent skills—bundles of prompts and scripts that embed dbt best practices into AI assistants. Skills cover analytics engineering (building models, writing tests), semantic modeling with MetricFlow, platform operations (troubleshooting, configuring MCP servers) and migration tasks. The post explains how to install and use these skills to turn general coding agents into competent data agents
_____________________
Have any interesting content to share in the DATA Pill newsletter? Reach Out!
Made on
Tilda