DATA Pill feed

DATA Pill #114 - Real-time Fraud Detection & Supercharged Data Pipelines

ARTICLES

Leverage graph technology for real-time Fraud Detection and Prevention | 7 min | ML | Deepak Patankar, Mathijs de Jong | Booking Engineering Blog
This blog demonstrates how graphs reveal complex fraud patterns and discusses our real-time graph service that supports our machine learning models and fraud experts, ensuring a secure and trustworthy platform for our customers and partners.
Supercharging Airflow & dbt with Astronomer Cosmos on Azure Container Instances | 6 min | Data Engineering | Daniel van der Ende | Xebia Blog
Transform your dbt project into an Airflow DAG with Astronomer Cosmos on Azure Container Instances.
How PostNL processes billions of IoT events with Amazon Managed Service for Apache Flink | 5 min | Real-time Data Processing | Çağrı Çakır, Ozge Kavalci, Amit Singh, Lorenzo Nicora | AWS Tech Blog
Discover how PostNL, a leading logistics provider in the Netherlands, transformed its IoT data stream processing by migrating to Amazon Managed Service for Apache Flink. This detailed case study explores the challenges of handling real-time IoT data, the benefits of Flink’s event time semantics, and the journey to a scalable, robust stream processing solution.
SmolLM - blazingly fast and remarkably powerful | 7 min | LLM | Loubna Ben Allal, Anton Lozhkov, Elie Bakouch | Hugging Face blog
This blog post introduces SmolLM, a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. It covers data curation, model evaluation, and usage.

TUTORIALS

Building “Auto-Analyst” — A data analytics AI agentic system | 10 min | LLM | Arslan Shahid | FireBird Technologies blog
This blog post provides a step-by-step guide to building the agent, including code blocks for each component and demonstrating how they integrate seamlessly.
Slim CI with dbt Core: Efficient Pipelines Using Azure DevOps | 16 min | Data Engineering | Allan Rasmussen | Personal Blog
Optimize CI/CD pipelines with GitHub and dbt in Azure DevOps, featuring the Jaffle Shop project.
Upsert operation in Azure Data Factory Copy Activity | 6 min | Data Factory | Adrian Chodkowski | Personal Blog
Explore Azure Data Factory’s UPSERT feature for efficient data comparison and modification within target tables.

PODCAST

The framework helping devs build LLM apps | 34 min | LLM | Hosts: Eira May, Ben Popper; Guests: Jerry Liu, Jerry Chen | Stack Overflow Podcast
Ben and Eira chat with LlamaIndex CEO Jerry Liu and venture capitalist Jerry Chen about simplifying LLM app development. They discuss the importance of high-quality training data, prompt engineering, larger context windows, and the challenges of RAG.

DATA TUBE

A Short Summary of the Last Decades of Data Management | 50 min | Data Management | Hannes Mühleisen | GOTO Conferences
Data systems have evolved from restrictive 90s models. Open source, open formats, and cloud computing have transformed data management, supporting semi-structured data and vector databases. This presentation covers key trends and innovations in data management.

CONFS, EVENTS AND MEETUPS

Real-time data processing can be time-consuming and complex. Decodable simplifies this process, ensuring high data quality and smooth pipeline operations. Join Eric Sammer, CEO of Decodable, for an in-depth look at its architecture and real-time ETL capabilities.