ARTICLES
Databricks SDKs vs. CLI vs. REST APIs vs. Terraform provider vs. DABs | 6 min | Data Engineering | Alex Ott | Personal Blog
This comparison explains when to use Databricks REST APIs, SDKs, CLI, DABs, or Terraform, depending on whether you need flexibility, simplicity, or management of complex environments.
Stream Processing Demystified: Stateless vs. Stateful | 4 min | Stream Processing | David Fabritius | Decodable Blog
Explore why stateful processing is essential for complex real-time analytics, handling event correlation, and maintaining context across streams, while stateless processing shines in more straightforward use cases.
Step-by-Step Guide to Creating Your Own Large Language Model | 6 min | LLM | Sciforce Blog
Learn how to build and fine-tune private LLMs, covering data curation, training, customization, and data security advantages.
Content Creation Copilot - AI-assisted product onboarding | 3 min | ML | Michał Kubacki, Nikhil Iyer, Bhagyesh Prabhu | Zalando Engineering Blog
This post highlights Zalando's use of AI to automate product attribute generation, improving data quality and reducing errors in the content creation process. The AI-assisted tool speeds up product onboarding and shortens time-to-market.
TUTORIALS
From keywords to relationships: Reveal deeper insights with full-text search and Spanner Graph | 5 min | Data Engineering | Bei Li, Jeff Sosa | Google Cloud Blog
Learn how integrating full-text search with Spanner Graph streamlines data retrieval and relationship modeling for improved workflow efficiency.
NEWS
BigQuery Engine for Apache Flink overview | 3 min | Data Processing | Google Cloud Blog
BigQuery Engine for Apache Flink simplifies infrastructure management for running Apache Flink, offering autoscaling and easy integration with other Google Cloud services.
PODCAST
Unlocking the Power of LLMs with Data Prep Kit | 38 min | LLM | Ben Lorica, Petros Zerfos, Hima Patel | The Data Exchange Podcast
A deep dive into Data Prep Kit’s scalability, cloud-native architecture, and integration with popular tools like Ray for large-scale LLMs.
Looking under the hood at the tech stack that powers multimodal AI | 29 min | AI | Ryan Donovan, Russ d’Sa | The Stack Overflow Podcast
Russ d’Sa, CEO of LiveKit, discusses the technology behind multimodal AI, including WebRTC and real-time streaming, as well as privacy challenges such as end-to-end encryption.
DATA TUBE
AI prompt engineering: A deep dive | 1h 17 min | AI | Amanda Askell, Alex Albert, David Hershey, Zack Witten | Anthropic
Anthropic's prompt engineering experts discuss the evolution of prompt engineering, offering practical tips and insights into how prompting might change as AI capabilities advance. Key topics include refining prompts, model reasoning, and the differences between enterprise, research, and general chat prompts.
CONFS, EVENTS AND MEETUPS
MOPS - Meetup #5 | Warsaw | 25th September
Join MOPS #5 for an evening of insightful discussions on cutting-edge AI topics, including the power of Small Language Models for on-device intelligence, deploying generative AI at scale with NVIDIA NIM, and practical strategies for self-hosting LLMs.