This text explains how Airflow orchestrates a DBT Core project, creating an intuitive pipeline for data analysts and product owners to develop and maintain their data models. With just SQL and basic Git knowledge, anyone in the business can turn their models into Airflow DAGs within minutes, ready for execution with built-in alerting, data quality tests, and access control. Importantly, the UI is the only part of Airflow they need to interact with to understand their DAGs. Key areas covered include (a sketch of such an operator follows the list):
- Mono vs. Multi DAG approach
- Project structure and DAGs layout
- DAG generation pipeline
- Creation of DBTOperator
- Conclusion and plans
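The post's actual DBTOperator isn't reproduced here, but as a rough idea of what such an operator could look like, here is a minimal sketch of a custom Airflow operator that shells out to the dbt CLI. The class layout, default paths, and flags are assumptions for illustration, not the author's implementation:

```python
import subprocess

from airflow.models.baseoperator import BaseOperator


class DBTOperator(BaseOperator):
    """Hypothetical operator running one dbt command for a single model."""

    template_fields = ("model", "dbt_command")

    def __init__(self, model: str, dbt_command: str = "run",
                 project_dir: str = "/opt/dbt/project",      # assumed layout
                 profiles_dir: str = "/opt/dbt/profiles",    # assumed layout
                 **kwargs):
        super().__init__(**kwargs)
        self.model = model
        self.dbt_command = dbt_command
        self.project_dir = project_dir
        self.profiles_dir = profiles_dir

    def execute(self, context):
        # Builds e.g. `dbt run --select my_model --project-dir ...`
        cmd = [
            "dbt", self.dbt_command,
            "--select", self.model,
            "--project-dir", self.project_dir,
            "--profiles-dir", self.profiles_dir,
        ]
        self.log.info("Running: %s", " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        self.log.info(result.stdout)
        if result.returncode != 0:
            # A non-zero exit fails the task, triggering Airflow's alerting.
            raise RuntimeError(
                f"dbt {self.dbt_command} failed for {self.model}:\n{result.stderr}"
            )
```

In a multi-DAG setup, a generation pipeline could parse dbt's manifest.json and emit one such task per model, mapping model dependencies onto task dependencies.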
This blog explores the journey of taking an LLM project from concept to completion, highlighting key steps, tips, and considerations to ensure success.
Real-time data processing is vital for businesses, and Apache Flink excels in this area. This blog explores strategies to tackle data skew in Flink SQL, ensuring efficient and balanced processing.
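As a hedged illustration of the kind of strategy such a post covers, the PyFlink sketch below enables two documented Flink SQL skew mitigations, local-global (two-phase) aggregation and split-distinct aggregation; the table schema and query are invented for the example:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
conf = t_env.get_config()

# Local-global aggregation pre-aggregates records per subtask before the
# final shuffle, so a hot key no longer overwhelms a single downstream task.
# It requires mini-batch processing to be enabled.
conf.set("table.exec.mini-batch.enabled", "true")
conf.set("table.exec.mini-batch.allow-latency", "5 s")
conf.set("table.exec.mini-batch.size", "5000")
conf.set("table.optimizer.agg-phase-strategy", "TWO_PHASE")

# Split-distinct rewrites COUNT(DISTINCT ...) into a two-level aggregation
# with an automatic bucket key, spreading a hot group across subtasks.
conf.set("table.optimizer.distinct-agg.split.enabled", "true")

# Example skewed workload: a handful of page_ids receive most of the traffic.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE clicks (
        page_id STRING,
        user_id STRING,
        ts TIMESTAMP(3)
    ) WITH ('connector' = 'datagen')
""")
uv = t_env.sql_query(
    "SELECT page_id, COUNT(DISTINCT user_id) AS uv FROM clicks GROUP BY page_id"
)
```

Both options leave the SQL untouched; the optimizer rewrites the plan, which is usually preferable to manually salting keys in the query.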
Over the last 10 years, Torsten has worked in analytics at various companies, from startups to big tech firms. Each company had unique challenges and data cultures. Key learnings include the importance of data storytelling, business acumen, and pragmatism in analytics.
It's an exciting time for LLMs, which are now effective for real-world applications and driving significant AI investments. While creating a proof-of-concept is easy, building a successful product remains challenging. This post shares key lessons and tips for developing LLM-based products based on practical experiences.
This text explores the pros and cons of denormalized models, the challenges of managing changes, and the evolving landscape of real-time streaming technologies, ultimately questioning the balance between performance and data modeling.
Databricks announces its acquisition of Tabular, Inc., bringing together the creators of Apache Iceberg™ and Delta Lake to lead in data compatibility. This blog will outline Databricks' plans to collaborate with the Iceberg and Delta Lake communities to achieve format compatibility and evolve towards a single open standard of interoperability.
Snowflake announces Polaris Catalog, offering enhanced data choice, flexibility, and security, with interoperability across major platforms like AWS, Google Cloud, and Azure. To be open-sourced within 90 days, Polaris allows seamless data interoperability without moving or copying data.
Orchestration manages multiple systems and tasks to make workflows run smoothly and efficiently. This tutorial shows how to manage and run various notebooks from a main notebook using the runMultiple method in Microsoft Fabric. You'll learn to easily create and execute notebooks with built-in dependencies, helping streamline your data processing tasks.
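As a quick taste of the pattern, here is a minimal sketch of a runMultiple call from an orchestrating Fabric notebook. The notebook names, arguments, and timeouts are illustrative placeholders, not items from the tutorial:

```python
# Runs inside a Microsoft Fabric notebook, where mssparkutils is preloaded
# (newer runtimes also expose it as notebookutils).
DAG = {
    "activities": [
        {
            "name": "ingest",
            "path": "ingest",                   # notebook item to run
            "timeoutPerCellInSeconds": 600,
            "args": {"run_date": "2024-06-01"}, # parameters for the notebook
        },
        {
            "name": "transform",
            "path": "transform",
            "timeoutPerCellInSeconds": 600,
            "dependencies": ["ingest"],         # waits for `ingest` to succeed
        },
        {
            "name": "report",
            "path": "report",
            "dependencies": ["transform"],
        },
    ],
    "timeoutInSeconds": 3600,                   # overall timeout for the DAG
    "concurrency": 2,                           # max notebooks in parallel
}

results = mssparkutils.notebook.runMultiple(DAG)
```

A plain list of notebook names can also be passed when there are no dependencies; the DAG form adds ordering, per-activity timeouts, and bounded concurrency.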
In this video, you will dive into the platform architecture and see how a real-life streaming application works, built on SQL queries using Apache Flink and Jupyter Notebooks.
Watch how Albert Heijn optimized their demand forecasting services. Learn why they chose a custom solution, the processes, people, and technology it required, and the challenges of scaling forecasts.
ChatGPT was only the beginning. Generative AI is now revolutionizing every industry. Join us for RADAR: AI Edition, exploring how businesses and individuals can unlock their full potential with AI.