Netflix shares how it scales post‑training of large language models (fine‑tuning and reward modelling) across hundreds of GPUs. Topics include distributed optimization, scheduling on heterogeneous clusters, evaluation pipelines and lessons learned from deploying domain‑specific LLMs for personalization and content creation.
Pinterest processes over 90k Spark jobs daily. Felix explains how they cut OOM failures by detecting high‑memory tasks and retrying them on larger executors (“Auto Memory Retries”). The approach makes executor sizing elastic, launching larger profiles only when needed, leading to fewer job failures and lower on‑call load.
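The escalation idea behind Auto Memory Retries can be sketched in a few lines. This is an illustrative Python sketch of the pattern, not Pinterest's actual code; the function and profile names are hypothetical.

```python
# Try the default executor profile first; escalate to a larger one
# only when the job fails with an out-of-memory error.
MEMORY_PROFILES = ["4g", "8g", "16g"]  # default first, largest last

def run_with_memory_retries(submit_job, profiles=MEMORY_PROFILES):
    """Call submit_job(executor_memory), retrying on MemoryError
    with the next larger executor profile."""
    last_err = None
    for mem in profiles:
        try:
            return submit_job(mem)
        except MemoryError as err:
            last_err = err  # job OOMed; retry on a bigger executor
    raise last_err  # even the largest profile failed
```

Only jobs that actually hit OOM pay for the larger executors, which is what makes the sizing elastic rather than a blanket over-provision.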
Lyft describes its AI‑powered localization platform that translates and adapts UI strings across dozens of languages. The system combines machine translation, LLM‑based context extraction, and human review loops to deliver high‑quality localized copy at scale, reducing turnaround times for product launches.
Feast integrates OpenLineage to automatically capture feature lineage. It records every step from data sources to materialized feature tables, enabling unified visibility across systems and simplifying debugging and compliance checks.
Piotr illustrates how to improve integration tests by grouping stubs by service rather than using monolithic helpers. He proposes a stub‑builder pattern that centralises external service stubs, uses fluent APIs to express expectations (stub.userRegistry().willReturnUserPermissions()), and makes test intent obvious. This pattern reduces boilerplate and makes complex integration flows easier to read.
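A minimal sketch of the service-grouped stub-builder idea, here in Python. The class and method names echo the article's example but are illustrative, not Piotr's implementation.

```python
class UserRegistryStub:
    """Stub for one external service, with fluent expectation methods."""

    def __init__(self):
        self.permissions = []

    def willReturnUserPermissions(self, *perms):
        self.permissions = list(perms)
        return self  # fluent: further expectations can be chained


class Stubs:
    """Single entry point that groups stubs by external service."""

    def __init__(self):
        self._user_registry = UserRegistryStub()

    def userRegistry(self):
        return self._user_registry


# In a test, the setup reads like a sentence stating intent:
stubs = Stubs()
stubs.userRegistry().willReturnUserPermissions("read", "write")
```

Because each service gets its own builder, tests state only the expectations they care about instead of inheriting a monolithic helper's defaults.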
This primer outlines why thoughtful agent design beats naive prompting. It covers setting clear goals, picking the right abstraction (single vs multi‑agent), measuring agent behaviour and iterating on prompts, policies and tools. A good starting point if you’re moving from chat demos to production agents.
Ollama introduces subagents that run tasks in parallel (file search, code exploration, research) and built‑in web search for Claude Code. The post explains how to spawn subagents to audit security, find performance bottlenecks, or map database queries, and shows how web search integrates current information into coding sessions.
A comprehensive guide comparing three approaches to semantic layers: warehouse‑native (Snowflake/Databricks), transformation‑layer (dbt MetricFlow) and OLAP‑acceleration (Cube). It breaks down where to locate semantic logic, trade‑offs in performance and governance, and how the different patterns evolved.
dbt Labs introduces agent skills—bundles of prompts and scripts that embed dbt best practices into AI assistants. Skills cover analytics engineering (building models, writing tests), semantic modeling with MetricFlow, platform operations (troubleshooting, configuring MCP servers) and migration tasks. The post explains how to install and use these skills to turn general coding agents into competent data agents.
Pandas 3.0 introduces a dedicated string dtype, copy‑on‑write semantics for predictable behavior, improved datetime resolution, and a new pd.col syntax. It also removes deprecated features and may require code updates.
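Copy-on-write is the behavioral change most likely to affect existing code. A small sketch of what it guarantees (on pandas 2.x you opt in via an option; in 3.0 it is the default):

```python
import pandas as pd

# On pandas 2.x, copy-on-write must be enabled explicitly;
# pandas 3.0 makes it the default and drops the option.
if pd.__version__.split(".")[0] == "2":
    pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1, 2, 3]})
subset = df["a"]       # behaves like a lazy copy under copy-on-write
subset.iloc[0] = 99    # writing triggers an actual copy

print(df["a"].iloc[0])  # 1 -- the parent DataFrame is never silently mutated
```

Under the old semantics this write could propagate back to `df` (with a SettingWithCopyWarning); under copy-on-write, any object derived from another always behaves as an independent copy.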
DuckDB announces native support for the Vortex columnar file format. Vortex offers late decompression and compute-in-storage capabilities, allowing DuckDB to filter and process data directly on compressed blocks. The extension is available as a core DuckDB plugin and supports heterogeneous data types and GPU acceleration.
A hands‑on tutorial covering how to build, test and distribute skills for Claude. It walks through creating skill files, using evaluation harnesses, and publishing skills for use in Claude Code or other AI assistants.
AI-driven development ("vibe coding") on Databricks just got a whole lot better: the Databricks AI Dev Kit gives your AI coding assistant (Claude Code, Cursor, etc.) the trusted sources it needs to build faster and smarter on Databricks.
SaaS‑based stocks are down more than 60%, and traditional software companies are facing the first wave of AI disruption. How will the AI industry and the broader economy value software companies as unit economics change and pricing shifts toward agents and agentic applications that interact with software on users' behalf?
A curated collection of agent skills for dbt tasks (model building, testing, debugging, semantic layer creation) packaged for Claude Code and other agents. Use these to improve your agent’s data‑engineering abilities.
An open library of skills for Terraform and Packer. Skills encode best practices and patterns, helping AI assistants generate, test, refactor and manage infrastructure code with consistent style and security.
Qwen’s new model version features improved reasoning, longer context and API endpoints for custom agent applications.
Databricks’ open-source kit provides templates, connectors and orchestration scripts for building generative AI applications on the Databricks Lakehouse.
A server and library that aggregates content for MCP agents. It supports indexing web pages, documents and codebases, exposing them through a unified MCP interface.
Not all data and AI initiatives deliver value. This webinar series focuses on impact over hype — how data platforms, AI systems, and teams can be designed to genuinely support people, decisions, and outcomes.