DATA Pill feed

DATA Pill #146 - SQL is all you need, 30 Must-Know Tools for Python Development

ARTICLES

SQL is all you need! | 5 min | Data Analytics | Paul Marcombes | Google Cloud - Community Blog
SQL sits at the heart of modern data operations and can remove the need for external tools and custom scripts. Learn how Nickel’s approach enables self-service analytics through a governed SQL function catalog built with BigFunctions, an open-source framework.
AI-Ready Organization: How AI is Changing the Hiring Process | 3 min | AI | Giovanni Lanzani | Xebia Blog
AI is transforming recruitment by automating screening and improving efficiency. However, human judgment remains irreplaceable. This article explores how organizations can optimize AI for fair and ethical hiring decisions.
30 Must-Know Tools for Python Development | 4 min | Data Engineering | KDNuggets
From pip and Poetry to Scalene and pytest, this guide covers the essential open-source tools for debugging, profiling, and securing your Python projects. Improve your efficiency with this must-read list.

TUTORIALS

Traditional PDF processing is costly and complex. Gemini 2.0 Flash simplifies the workflow by handling OCR and text extraction in a single step, reducing costs and improving efficiency with KDB.AI for vector search.
Kubecost: Cross Charging Costs of Data Processing Pipelines in Data Mesh Architecture | 9 min | DevOps | Daniel Noworyta | GetInData | Part of Xebia Blog
Learn how Kubecost helps teams running Airflow DAGs on Kubernetes monitor and optimize their costs. Get insights into cost monitoring, budgeting, and resource allocation that help you keep cloud spending under control.
Revenue Automation Series: Building Revenue Data Pipeline | 6 min | Data Engineering | Yizheng Zhang, Yirun Zhou | Yelp Engineering Blog
Yelp automated its revenue recognition process by integrating a Revenue Recognition SaaS (REVREC) with a Data Lake + Spark ETL pipeline. Discover how they tackled data gaps, architecture decisions, and complex business logic.
Sync users and groups automatically from Microsoft Entra ID | 3 min | Data Engineering | Microsoft Blog
Azure Databricks now supports real-time identity synchronization with Microsoft Entra ID, eliminating the need for SCIM provisioning. This simplifies access management, security, and user group administration.

NEWS

Introducing SAP Databricks | 3 min | Data Engineering | Ali Ghodsi, Reynold Xin, Arsalan Tavakoli-Shiraji, Michael Kiermaier | Databricks Blog
SAP and Databricks have teamed up to integrate SAP data with enterprise analytics and AI applications. This partnership enables businesses to unlock deeper insights while maintaining original data semantics.

CONFS, EVENTS AND MEETUPS

Learn how to deploy AI inference workloads on Amazon EKS using Terraform, Triton Inference Server, and Prometheus Adapter for autoscaling, monitoring, and optimization.
Discover how Drata uses Change Data Capture (CDC) and Apache Flink to build a scalable RAG system, ensuring compliance and real-time data ingestion with Decodable and Vellum.

PINNACLE PICKS

Your top picks from last week:
Introducing Impressions at Netflix | 6 min | Data Engineering | Tulika Bhatt | Netflix Tech Blog
Netflix tracks homepage image interactions (‘impressions’) to optimize personalization and content recommendations. This blog series details how they process billions of impressions daily to refine engagement strategies.
Data vs. Business Strategy | 10 min | Data Strategy | Jens Linden | Personal Blog
A strong data strategy must align with business goals, not exist separately. Learn how to embed data initiatives within broader strategic frameworks to maximize impact.
Top Themes in Data in 2025 | 3 min | Data | Tomasz Tunguz | Personal Blog
Data in 2025 is shaped by two forces: consolidation of the modern data stack and AI-driven expansion. Companies are streamlining their architectures while leveraging AI-driven SQL execution and cost-efficient models.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub