DATA Pill #136 - From Apache Iceberg to Real-Time AI: Trends, Tutorials, and Tools for Modern Data Pros

ARTICLES

Apache Iceberg: The Hadoop of the Modern Data Stack? | 6 min | Data Engineering | Dani | Data Engineer Things

Apache Iceberg is likened to Hadoop for its central role in managing evolving datasets, with ACID transactions and schema evolution. However, adopting it rapidly without proper planning may lead to technical debt and bottlenecks.
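Part of what makes Iceberg's schema evolution safe is that it tracks columns by unique field IDs rather than by name, so renaming or adding a column is a metadata change and existing data files stay readable. The Python below is a toy model of that idea, not Iceberg's API. The `Schema` class and `read_row` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Toy schema: maps a stable field ID to the column's current name (hypothetical)."""
    fields: dict

    def rename(self, field_id: int, new_name: str) -> "Schema":
        # A rename only changes the name attached to an existing ID.
        return Schema({**self.fields, field_id: new_name})

    def add(self, field_id: int, name: str) -> "Schema":
        # A new column needs an ID that is not already in use.
        if field_id in self.fields:
            raise ValueError(f"field ID {field_id} is already in use")
        return Schema({**self.fields, field_id: name})


def read_row(stored: dict, schema: Schema) -> dict:
    """Project a row stored by field ID onto the current schema's column names.

    Columns added after the row was written come back as None.
    """
    return {name: stored.get(fid) for fid, name in schema.fields.items()}


v1 = Schema({1: "id", 2: "amount"})
old_row = {1: 42, 2: 9.99}  # "data file" written under v1, keyed by field ID
v2 = v1.rename(2, "total").add(3, "currency")
print(read_row(old_row, v2))  # {'id': 42, 'total': 9.99, 'currency': None}
```

Because the stored row never mentions column names, the rename and the new column are resolved entirely at read time.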

My LLM’s outputs got 1000% better with this simple trick | 5 min | LLM | Nikhil Anand | AI Advances

Learn how a technique called "logit transformation" and filtering functions improved LLM accuracy and fluency during an Adobe Research experiment.
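The article's exact method is not reproduced here. As a generic illustration of what transforming and filtering logits means in LLM decoding, the sketch below applies temperature scaling, then top-k filtering, then a softmax. The function name and parameter values are arbitrary choices, not taken from the article.

```python
import math


def transform_logits(logits, temperature=0.7, top_k=3):
    """Temperature-scale logits, keep only the top_k, and return probabilities.

    Tokens outside the top_k receive probability 0.0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    # Indices of the top_k largest scaled logits.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = set(ranked[:top_k])
    # Mask everything else with -inf so it becomes 0 after exponentiation.
    masked = [s if i in keep else -math.inf for i, s in enumerate(scaled)]
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = transform_logits([2.0, 1.0, 0.5, -1.0], temperature=0.5, top_k=2)
print([round(p, 4) for p in probs])  # [0.8808, 0.1192, 0.0, 0.0]
```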

TUTORIALS

Should You Ditch Spark for DuckDB or Polars? | 35 min | Data Engineering | Miles Cole | Personal Blog

Benchmarking DuckDB and Polars against Spark for smaller workloads reveals performance and cost advantages—though engine maturity varies.

Microsoft Fabric and Databricks Mirroring | 5 min | Data Engineering | Mariusz Kujawski | Personal Blog

Discover how to integrate Databricks with Microsoft Fabric to simplify data processing and reporting via Unity Catalog and SQL endpoints.

Exploring Flink CDC | 9 min | Data Engineering | Robin Moffatt | Decodable Blog

Simplify pipelines with Flink CDC’s YAML configurations, handling tasks like schema evolution and primary key management with ease.
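Flink CDC pipelines are declared in YAML rather than written as code, so the following is only a conceptual Python sketch of what a primary-key-aware sink does with a change stream: inserts, updates and deletes are resolved by key, and a column that appears in later events widens the table. The event format (`op`, `row`) and the `apply_changes` function are hypothetical, not Flink CDC's API.

```python
def apply_changes(events, primary_key="id"):
    """Materialize a table from an ordered stream of change events.

    Each event is a dict with "op" ("insert", "update" or "delete") and "row".
    Returns (columns, rows_by_key).
    """
    table = {}    # primary key -> row dict
    columns = []  # column order as first seen
    for event in events:
        op, row = event["op"], event["row"]
        key = row[primary_key]
        if op == "delete":
            table.pop(key, None)
            continue
        if op not in ("insert", "update"):
            raise ValueError(f"unknown op: {op!r}")
        for col in row:
            if col not in columns:
                columns.append(col)  # schema change: a new column appeared upstream
        # Upsert: merge into any existing row with the same key.
        table[key] = {**table.get(key, {}), **row}
    # Rows written before a column existed get None for it.
    rows = {k: {c: r.get(c) for c in columns} for k, r in table.items()}
    return columns, rows


events = [
    {"op": "insert", "row": {"id": 1, "name": "alice"}},
    {"op": "insert", "row": {"id": 2, "name": "bob"}},
    {"op": "update", "row": {"id": 1, "name": "alice", "email": "alice@example.com"}},
    {"op": "delete", "row": {"id": 2}},
    {"op": "insert", "row": {"id": 3, "name": "carol"}},
]
columns, rows = apply_changes(events)
print(columns)  # ['id', 'name', 'email']
print(rows[3])  # {'id': 3, 'name': 'carol', 'email': None}
```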

Two ways to perform CI/CD for SQL databases in Fabric using YAML Pipelines | 9 min | DevOps | Kevin Chant | Personal Blog

Step-by-step guide to creating YAML pipelines in Azure DevOps for SQL database CI/CD, including schema extraction and deployment tips.

Best 5 Frameworks To Build Multi-Agent AI Applications | 26 min | ML | Samy Baladram | Stream Blog

Explore frameworks like Phidata and LangGraph for developing advanced multi-agent AI systems powered by LLMs.

Real-Time AI Stock Advisor with Ollama (Llama 3) & Streamlit | 6 min | LLM | Tapan Babbar | InsiderFinance Wire

Build a stock advisor app with Streamlit and Llama 3 running locally via Ollama for minute-by-minute market analysis and insights.
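A minimal sketch of the kind of model call such an app makes, assuming a local Ollama server on its default port (11434) with a `llama3` model pulled. The summary statistics, prompt wording, function names and the `ACME` ticker are hypothetical choices rather than the article's code, and the Streamlit UI and market-data feed are left out.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def summarize(prices, window=5):
    """Summarize minute-level closing prices (hypothetical feature set)."""
    if len(prices) < window:
        raise ValueError("need at least `window` prices")
    last = prices[-1]
    return {
        "last": last,
        "change_pct": round((last - prices[0]) / prices[0] * 100, 2),
        "sma": round(sum(prices[-window:]) / window, 2),  # simple moving average
        "window": window,
    }


def build_request(ticker, stats, model="llama3"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    prompt = (
        f"{ticker} last traded at {stats['last']}, {stats['change_pct']}% since the "
        f"first bar, with a {stats['window']}-minute SMA of {stats['sma']}. "
        "Briefly comment on the short-term trend."
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(request):
    """Send the request; with stream=False the reply text is in "response"."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]


stats = summarize([100.0, 101.0, 102.0, 101.0, 103.0, 104.0])
req = build_request("ACME", stats)  # "ACME" is a placeholder ticker
# print(ask(req))  # requires a running Ollama server
```

Keeping the numeric summary in plain Python and sending only a short prompt keeps the model's job small, which matters when it is refreshed every minute.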

WEBINAR ON-DEMAND

Explore how LLMOps tackles challenges like prompt sensitivity, cost control, and model tuning for operationalizing GenAI systems.

DATA TUBE

OpenLineage: From operators to hooks | 52 min | Data Engineering | Maciej Obuchowski | Apache Airflow

Dive into Airflow’s latest OpenLineage updates, enhancing data pipeline lineage coverage with AIP-62 and beyond.
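Lineage collected by Airflow's OpenLineage integration is reported as OpenLineage run events. As a rough illustration of that event shape, the sketch below builds a START/COMPLETE pair by hand with the standard library. The job and dataset names, the producer URI, and the spec version in `schemaURL` are placeholders to check against the OpenLineage spec. In Airflow, the OpenLineage provider emits such events automatically for supported operators.

```python
import json
import uuid
from datetime import datetime, timezone

# Placeholder values; verify against the OpenLineage spec version you target.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
PRODUCER = "https://example.com/hypothetical-producer"


def make_run_event(event_type, run_id, job_namespace, job_name, inputs, outputs):
    """Build an OpenLineage-style RunEvent as a plain dict.

    inputs and outputs are lists of (namespace, name) pairs identifying datasets.
    """
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE" or "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# START and COMPLETE share one runId so consumers can pair them into a single run.
run_id = str(uuid.uuid4())
datasets = dict(
    inputs=[("postgres://db:5432", "public.orders")],  # hypothetical source table
    outputs=[("s3://warehouse", "orders_clean")],      # hypothetical target dataset
)
start = make_run_event("START", run_id, "my_airflow", "daily_etl.load_orders", **datasets)
complete = make_run_event("COMPLETE", run_id, "my_airflow", "daily_etl.load_orders", **datasets)
print(json.dumps(complete, indent=2))
```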

CONFS EVENTS AND MEETUPS

Big Data Technology Warsaw Summit | Warsaw and Online | 9th and 10th April

Join over 600 attendees and 90 speakers for technical sessions, workshops, and networking opportunities in one of the biggest Big Data events of the year.

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DATA Pill

2024-12-19 11:05