How LinkedIn uses Hoptimator to automate end-to-end data pipelines for Apache Pinot—shifting from producer-driven to fully managed, AI-aware ingestion.
Meet dbt’s experimental MCP Server, bridging structured data and LLMs by exposing dbt models, metrics, and lineage through the Model Context Protocol.
A raw, honest take on setting up Polaris as an Iceberg REST catalog in production. Spoiler: it hurt, but it worked.
By switching to a distroless base image and adopting best practices, Dorian cut image size from 380 MB to 60 MB—boosting security and performance.
A practical walkthrough of building a data lake pipeline using MinIO, Trino, Iceberg, dbt, and Airflow—end-to-end and production-ready.
Starlake lets you define extract, load, transform, and test tasks in YAML, and auto-generates DAGs—like Terraform for your data pipelines.
PyCharm merges Community and Pro editions into a single product with a free Pro trial and built-in Jupyter support for all.
From NLP research to building Bauplan, Jacopo dives deep into AI architectures, startup realities, and the evolution of “Git for Data.”
Join us for a behind-the-scenes webinar organized by Xebia, Databricks, and Red Flag Alert. During the session, we’ll share the journey of platform design, full production deployment, and workload migration, all completed in just 8 weeks.
DATA Pill is partnering with Infoshare 2025, taking place on May 27-28 in Gdańsk, Poland. As the country’s largest technology conference, Infoshare brings together business and technology leaders to explore the latest trends, tools, and insights across seven thematic stages: AI & Data, DevTrends, Architecture, Growth, Leaders, Marketing, and Inspire. Attendees can expect expert meetups, roundtable discussions, extensive networking opportunities, and side events including a Great Networking Party and a Sunset Leaders Boat Trip. Tickets are available now, and you can get 10% off with the discount code ISC25-DATAPill10.
From architecture to analytics—follow this practical guide to designing a scalable, SQL Server-based data warehouse using the Medallion Architecture.
A hands-on comparison of Flink and Kafka Streams, breaking down time handling, state, deployment, and scalability. Flink comes out on top for flexibility and long-term reliability.
A practical, example-rich walkthrough of how LLMs are applied in daily workflows—perfect for both enthusiasts and builders.