ARTICLES
AWS Lambda vs. Cloudflare Workers Detailed Comparison | 7 min | Data Engineering | Kiryl Anoshka | Fively Blog
This article compares AWS Lambda and Cloudflare Workers, focusing on their theoretical capabilities and practical differences across key categories such as performance, runtime, and pricing. It also includes insights on which platform excels and a cold start comparison to highlight their distinctions, particularly for smaller tasks.
Apache Flink® on Kubernetes | 15 min | Streaming | Ran Zhang | Airbnb Tech Blog
Evolution of the Flink architecture at Airbnb, comparing their prior Hadoop YARN platform with the current Kubernetes-based architecture.
Machine Learning in Content Moderation at Etsy | 15 min | ML | David Azcona | Etsy Blog
Transforming Sports Data with Databricks | 12 min | Data Infrastructure | Jared Chavez | Personal Blog
Basketball Analytics looked to the cloud for its next evolution, and the organization turned toward centralization to dramatically reduce operational costs and improve synergy across our brands. This article is about redesigning the infrastructures of our respective departments and redefining how data operated within the organization.
TUTORIALS
How we built RudderStack’s real-time personalization engine | 9 min | Real-time personalization | Mackenzie Hastings, Matt Kelliher-Gibson, Chandler Van De Water, Eric Dodds | Rudderstack Blog
Creating real-time personalized website and app experiences. From identity resolution to tracking success, this tutorial will walk you through how to build a dynamic, user-focused experience that drives engagement and conversions.
Making WAF ML models go brrr: saving decades of processing time | 23 min | ML | Alex Bocharov | The Cloudflare Blog
Covers the performance optimizations for Cloudflare's WAF ML product, with code examples, benchmarks, and the impressive latency reductions achieved.
Flink with metadata catalog | 5 min | Data Streaming | Maciej Maciejko | GetInData | Part of Xebia Blog
Setting up Flink with Hive Metastore Service (HMS) as an alternative to platforms like Ververica. Discover how to avoid duplicating table definitions and efficiently manage sources and sinks across various projects.
Crazy Challenge: Run Llama 405B on a 8GB VRAM GPU | 4 min | LLM | Gavin Li | AI Advances Blog
Addresses the challenge of running the massive 820GB Llama 3.1 405B model on a GPU with just 8GB of VRAM.
DATA LIBRARY
Accelerate ETL, data warehousing, BI and AI | ebook | databricks
- Building applications with traditional AI and generative AI
- Databricks Data Intelligence Platform
DATA TUBE
Realtime Streaming with Data Lakehouse - End to End Data Engineering Project | 1h | Streaming | CodeWithYu
How to design, implement and maintain secure, scalable and cost-effective lakehouse architectures leveraging Apache Spark, Apache Kafka, Apache Flink, Delta Lake, AWS, and open-source tools.
CONFS, EVENTS AND MEETUPS
Airflow Summit 2024 | San Francisco | 10-12 September
This conference needs no introduction. On the agenda:
- Mastering LLM Batch Pipelines: Handling Rate Limits, Asynchronous APIs, and Cloud Scalability
- OpenLineage: From Operators to Hooks by Maciej Obuchowski - our community member 👏
- How we use Airflow at Booking to orchestrate Big Data workflows