ARTICLES
Apache Iceberg: The Hadoop of the Modern Data Stack? | 6 min | Data Engineering | Dani | Data Engineer Things
Apache Iceberg is likened to Hadoop for its role in managing evolving datasets with ACID compliance and schema evolution. However, rapid adoption may lead to technical debt and bottlenecks without proper planning.
My LLM’s outputs got 1000% better with this simple trick | 5 min | LLM | Nikhil Anand | AI Advances
Learn how a technique called "logit transformation" and filtering functions improved LLM accuracy and fluency during an Adobe Research experiment.
TUTORIALS
Should You Ditch Spark for DuckDB or Polars? | 35 min | Data Engineering | Miles Cole | Personal Blog
Benchmarking DuckDB and Polars against Spark for smaller workloads reveals performance and cost advantages—though engine maturity varies.
Microsoft Fabric and Databricks Mirroring | 5 min | Data Engineering | Mariusz Kujawski | Personal Blog
Discover how to integrate Databricks with Microsoft Fabric to simplify data processing and reporting via Unity Catalog and SQL endpoints.
Exploring Flink CDC | 9 min | Data Engineering | Robin Moffatt | Decodable Blog
Simplify pipelines with Flink CDC’s YAML configurations, handling tasks like schema evolution and primary key management with ease.
Two ways to perform CI/CD for SQL databases in Fabric using YAML Pipelines | 9 min | DevOps | Kevin Chant | Personal Blog
Step-by-step guide to creating YAML pipelines in Azure DevOps for SQL database CI/CD, including schema extraction and deployment tips.
Best 5 Frameworks To Build Multi-Agent AI Applications | 26 min | ML | Samy Baladram | Stream Blog
Explore frameworks like Phidata and LangGraph for developing advanced multi-agent AI systems powered by LLMs.
Real-Time AI Stock Advisor with Ollama (Llama 3) & Streamlit | 6 min | LLM | Tapan Babbar | InsiderFinance Wire
Build a stock advisor app with Streamlit and Ollama’s Llama 3 for minute-by-minute market analysis and insights.
WEBINAR ON-DEMAND
LLMOps: from Demo to Production-Ready GenAI Systems | 46 min | LLMOps | Marek Wiewiórka | GetInData | Part of Xebia
Explore how LLMOps tackles challenges like prompt sensitivity, cost control, and model tuning for operationalizing GenAI systems.
DATA TUBE
OpenLineage: From operators to hooks | 52 min | Data Engineering | Maciej Obuchowski | Apache Airflow
Dive into Airflow’s latest OpenLineage updates, enhancing data pipeline lineage coverage with AIP-62 and beyond.
CONFS EVENTS AND MEETUPS
Big Data Technology Warsaw Summit | Warsaw and Online | 9th and 10th April
Join over 600 attendees and 90 speakers for technical sessions, workshops, and networking opportunities in one of the biggest Big Data events of the year.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Dig into previous editions of DATA Pill