DATA Pill #152 - The Top 7 MCP-Supported AI Frameworks, Extraction introduction

ARTICLES

The Top 7 MCP-Supported AI Frameworks | 19 min | AI | Amos Gyamfi | Personal Blog
A hands-on guide to frameworks like LangChain, Chainlit & Mastra that make integrating tools into LLM agents a breeze using the Model Context Protocol (MCP).
Extraction introduction | 3 min | Data Engineering | Shu Zhao | Xebia Blog
Why most of your valuable data lives in PDFs, emails, and images—and how to start extracting it smartly (LLMs included).
Improving Pinterest Search Relevance Using Large Language Models | 7 min | LLM | Han Wang, Mukuntha Narayanan, Onur Gungor, Jinfeng Rao | Pinterest Engineering Blog
Pinterest boosts search relevance with a distilled LLM that scales globally, improves nDCG@20 by 2.18%, and understands user queries better than ever.

TUTORIALS

How to Build a Multi-Agent Orchestrator Using Flink and Kafka | 8 min | Gen AI | Sean Falconer | Personal Blog
Step-by-step guide to orchestrating multiple agents using Apache Flink and Kafka. Includes a real-world sales AI assistant example.
Process millions of observability events with Apache Flink and write directly to Prometheus | 6 min | Streaming Data & Analytics | Lorenzo Nicora, Francisco Morillo | AWS Blog
Learn how to process millions of events from distributed devices and write straight to Prometheus. Great for IoT-scale monitoring.

TOOL

A simple Python tool that turns docs into Markdown, preserving structure for LLM consumption. Clean, readable, and tailor-made for pipelines.

NEWS

Meet Scout, Maverick, and Behemoth – Meta’s new multimodal models that outperform GPT-4.5 and Gemini 2.0 in coding, reasoning, and vision.
Announcing Airbyte Embedded | 3 min | AI | Teo Gonzalez | Airbyte Blog
Airbyte now lets you embed data pipelines directly into your AI app. A must-have for building context-rich assistants or copilots.

DATA TUBE

Learn how to automate data workflows with CI/CD, Terraform, and AWS – from scratch to secure deployment.

CONFS, EVENTS AND MEETUPS

AI Learning Week | Online | 28th April-1st May
Attend the Data & AI Learning Week, a free webinar series where you can explore both the technical and strategic sides of AI.

Choose from 4 topics in 2 paths:
Tech Track – Learn Prompt Engineering, AI coding skills, and Generative AI.
Base Track – Discover AI strategies, ethics, and how AI can drive success.

PINNACLE PICKS

Your top picks from last week:
My data governance framework | 13 min | Data Governance | Willem Koenders | ZS Associates Blog
Willem Koenders shares a practical framework built from a decade of experience. It covers strategy, roles, capabilities, and how to embed governance into day-to-day operations.
We’re proud to be a community partner of the Data & AI Monitor 2025! Share your perspective on the evolving world of data & AI by joining this quick 5-minute survey on the latest trends, tools, and technologies.
Mastering Spark: The Art and Science of Table Compaction | 20 min | Data Engineering | Miles Cole | Personal Blog
A benchmark of compaction strategies in Delta Lake on Fabric Spark. Learn why Auto Compaction + Optimized Write offers the best long-term performance.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
2025-04-10 12:03