DATA Pill feed

DATA Pill #141 - Multi-Team Airflow, The Dawn of AI Agents

ARTICLES

Exploring the Potential of Graph Neural Networks to Transform Recommendations at Zalando | 4 min | Recommendations Systems | Mariia Bulycheva | Zalando Engineering Blog
Zalando leverages Graph Neural Networks to enhance homepage CTR predictions, achieving a 0.6% improvement in ROC-AUC. Discover how user-content interaction patterns drive personalized recommendations.
Learn how eBay’s custom Llama-based models deliver domain-specific capabilities while maintaining efficiency and security, setting new standards for AI in e-commerce.

TUTORIALS

Simplify Amazon MSK topic provisioning with Terraform. A step-by-step guide to improving consistency and scalability while reducing manual errors.
Airflow in a multi-teams / multi-tenant environment. Deployment strategies | 22 min | Data Engineering | Kacper Muda | GetInData | Part of Xebia Blog
Explore deployment solutions for Apache Airflow in multi-team environments. Highlights include resource isolation, shared access options, and a glimpse at Airflow 3's upcoming capabilities.
Discover tools like ExtractThinker and Ollama for secure, on-prem document processing. Tailored for industries with strict data privacy regulations, like finance and healthcare.

DATA LIBRARY

The Dawn of AI Agents | AI | Maja Vujinovic, Jensen Huang | 51x
A comprehensive white paper on AI agents’ real-world applications, key metrics, and the convergence of generative AI with blockchain technologies.
How to build effective AI Agents | AI | Rakesh Gohel | Anthropic
Dive into the core components and frameworks like LangChain for constructing AI agents tailored for complex decision-making tasks.

TOOL

drawdata | Data Visualization
A Python library for interactive dataset creation directly in Jupyter notebooks. Perfect for machine learning tutorials and algorithm demos.

NEWS

Paimon 1.0: Unified Lake Format for Data + AI | 4 min | AI | Martin Grund, Stefania Leone | Alibaba Cloud Blog
Introducing Apache Paimon, a groundbreaking data lakehouse solution integrating batch and streaming operations for real-time AI workflows.

CONFS, EVENTS AND MEETUPS

Discover how WHOOP maximizes customer lifetime value with AI-powered personalization using Hightouch and Snowflake.

PINNACLE PICKS

Your top picks from last week:
Apache Kafka + Vector Database + LLM = Real-Time GenAI | 12 min | Gen AI | Kai Waehner | Personal Blog
An exploration of how event-driven architectures with Kafka and Flink enable real-time GenAI use cases, combining large language models (LLMs) with vector databases for semantic search.
Building a Unified Healthcare Data Platform: Architecture | 14 min | Data Platform | Alexandre Guitton | Doctolib blog
Doctolib's shift from a centralized monolithic platform to a data mesh architecture that supports scalable AI, analytics, and robust data governance.
An overview of trends in AI, highlighting agentic workflows, inference optimizations, and the societal impacts of AI-driven automation.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub