DATA Pill #152 - The Top 7 MCP-Supported AI Frameworks, Extraction introduction

ARTICLES

The Top 7 MCP-Supported AI Frameworks | 19 min | AI | Amos Gyamfi | Personal Blog

A hands-on guide to frameworks like LangChain, Chainlit & Mastra that make integrating tools into LLM agents a breeze using the Model Context Protocol (MCP).

Extraction introduction | 3 min | Data Engineering | Shu Zhao | Xebia Blog

Why most of your valuable data lives in PDFs, emails, and images, and how to start extracting it smartly (LLMs included).

Improving Pinterest Search Relevance Using Large Language Models | 7 min | LLM | Han Wang, Mukuntha Narayanan, Onur Gungor, Jinfeng Rao | Pinterest Engineering Blog

Pinterest boosts search relevance with a distilled LLM-based relevance model that scales globally, improves nDCG@20 by 2.18%, and better understands user queries.

TUTORIALS

How to Build a Multi-Agent Orchestrator Using Flink and Kafka | 8 min | Gen AI | Sean Falconer | Personal Blog

Step-by-step guide to orchestrating multiple agents using Apache Flink and Kafka. Includes a real-world sales AI assistant example.

Process millions of observability events with Apache Flink and write directly to Prometheus | 6 min | Streaming Data & Analytics | Lorenzo Nicora, Francisco Morillo | AWS Blog

Learn how to process millions of events from distributed devices and write straight to Prometheus. Great for IoT-scale monitoring.

TOOL

MarkItDown | LLM

A simple Python tool that turns docs into Markdown, preserving structure for LLM consumption. Clean, readable, and tailor-made for pipelines.

NEWS

The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation | 12 min | LLM | Meta Engineering blog

Meet Scout, Maverick, and Behemoth – Meta’s new natively multimodal models, which Meta reports outperform models such as GPT-4.5 and Gemini 2.0 on coding, reasoning, and vision benchmarks.

Announcing Airbyte Embedded | 3 min | AI | Teo Gonzalez | Airbyte Blog

Airbyte now lets you embed data pipelines directly into your AI app. A must-have for building context-rich assistants or copilots.

DATA TUBE

DevOps in Data Engineering: End-to-End Automation with CI/CD, Terraform & AWS - PART 1 | 57 min | Data Engineering | Yusuf Ganiyu | CodeWithYu

Learn how to automate data workflows with CI/CD, Terraform, and AWS – from scratch to secure deployment.

CONFS, EVENTS AND MEETUPS

AI Learning Week | Online | 28th April-1st May

Attend the Data & AI Learning Week, a free webinar series where you can explore both the technical and strategic sides of AI.

Choose from 4 topics in 2 paths:
Tech Track – Learn Prompt Engineering, AI coding skills, and Generative AI.
Base Track – Discover AI strategies, ethics, and how AI can drive success.

PINNACLE PICKS

Last week's top picks:

My data governance framework | 13 min | Data Governance | Willem Koenders | ZS Associates Blog

Willem Koenders shares a practical framework built from a decade of experience. It covers strategy, roles, capabilities, and how to embed governance into day-to-day operations.

Data and AI Monitor 2025 | 5 min

We’re proud to be a community partner of the Data & AI Monitor 2025! Share your perspective on the evolving world of data & AI by joining this quick 5-minute survey on the latest trends, tools, and technologies.

Mastering Spark: The Art and Science of Table Compaction | 20 min | Data Engineering | Miles Cole | Personal Blog

A benchmark of compaction strategies in Delta Lake on Fabric Spark. Learn why Auto Compaction + Optimized Write offers the best long-term performance.

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

2025-04-10 12:03