The latest blog post examines why Decodable chose Apache Flink as the foundation of its real-time data processing platform. It delves into Flink's architectural design, stateful stream processing capabilities, and production readiness, highlighting how these qualities enable agile, scalable solutions for dynamic ETL, ELT, and data movement workflows.
Jeroen explores the benefits of RAG for machine learning engineers. Learn how to overcome common challenges and improve both data retrieval and model accuracy using RAG techniques.
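The core retrieve-then-generate loop behind RAG can be sketched in a few lines. This is a toy illustration, not code from the post: a keyword-overlap retriever stands in for a real embedding model and vector store, and the assembled prompt would normally be sent to an LLM.

```python
# Minimal sketch of the RAG pattern. The keyword-overlap retriever is a
# stand-in for an embedding model + vector store (an illustrative
# assumption, not the article's implementation).

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model by prepending retrieved context to the question."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "Feature stores serve precomputed features to models.",
    "RAG retrieves documents to ground LLM answers.",
    "Airflow schedules batch data pipelines.",
]
query = "How does RAG ground LLM answers?"
prompt = build_prompt(query, retrieve(query, docs))
```

In a production setup, the lexical ranking would be replaced by similarity search over embeddings, but the prompt-assembly step stays essentially the same.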
Netflix introduces Maestro, a horizontally scalable workflow orchestrator that is now open source. Learn how Maestro manages large-scale Data/ML workflows, handles retries, and integrates seamlessly with tools like Docker, providing a robust solution for complex data processing needs.
In June 2024, Snowflake announced the Polaris Catalog to enhance data control and interoperability for organizations and the Iceberg community. Now open source under Apache 2.0 and available on GitHub, it is also in public preview for Snowflake customers.
How is Apple training LLMs for Apple Intelligence? A new technical report reveals insights into the architecture, training, distillation, and benchmarks for the 2.7B-parameter on-device (iPhone) model and a larger server-based model for Private Cloud Compute.
This tutorial covers a real-time data engineering project using Apache Spark Structured Streaming, Kafka, Cassandra, and Airflow. It involves retrieving random user data from an API, processing it in real-time, and storing it for analysis, all containerized with Docker for seamless deployment.
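A stack like the one described (Kafka for ingestion, Spark for stream processing, Cassandra for storage, Airflow for orchestration) is typically wired together with a compose file along these lines. Image tags, service names, and ports here are illustrative assumptions, not the tutorial's actual file:

```yaml
# Illustrative sketch of the containerized stack; images and ports
# are assumptions, not the tutorial's actual compose file.
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
  cassandra:
    image: cassandra:latest
    ports:
      - "9042:9042"
  airflow:
    image: apache/airflow:latest
    ports:
      - "8080:8080"
  spark:
    image: bitnami/spark:latest
    depends_on:
      - kafka
      - cassandra
```

Running everything under one `docker compose up` is what gives the project its "seamless deployment" property: each component resolves the others by service name on the compose network.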
This blog post explores the importance of maintaining data quality and integrity using AWS Glue DataBrew. Discover practical strategies and tools from the upcoming eBook, "Data Quality No-Code Automation with AWS Glue DataBrew: A Proof of Concept," and learn how to implement effective data quality rules for accurate and reliable datasets.
Outline:
- What is DataOps?
- Productivity, failure, and emotional crisis in data & analytics teams - are LLMs (not) the solution?
- Lean Agile, Lean, and DevOps
- Conclusions from a decade in DataOps and the future
This presentation covers how Shared Clusters and Unity Catalog enable cost reduction and minimize operational toil, allowing secure and economical workload execution on shared compute resources.
Join one of our four specialized Data Science, Data Analysis, Analytics Engineering, or Data Literacy cohorts. Each cohort offers targeted, hands-on training sessions scheduled throughout the week for an immersive learning experience.