Read about one team's journey from a traditional architecture rooted in Apache Phoenix on HBase to its adoption of ClickHouse eighteen months ago. The author shares key lessons and tips throughout the narrative, making it well worth your time.
In a past role, the author had to migrate SQL code to dbt (data build tool) models, which revealed the need for knowledge that goes deeper than SQL alone. The exploration of dbt covers crucial aspects such as data modeling, macros, and advanced tests, highlighting that mastering dbt involves a skill set beyond conventional SQL workflows.
Delve into the KPIs that are crucial considerations for your Generative AI projects. The list is not exhaustive; Girish highlights the most significant ones that can help you optimize your Generative AI initiatives.
This blog post delves into various experiments conducted with generative AI models over the past few years, offering a behind-the-scenes glimpse into the key learnings derived from them. It also traces the transition from concept to product, showcasing the journey of building with a radically new technology.
In this article, the author engages with the argument, made by the Malloy team, that semantic layers should be written in real programming languages, and explains why GoodData chose YAML for its semantic layer instead. The article lays out the advantages of that choice, including widespread support in the developer community, ease of understanding, a declarative nature, flexibility, and seamless integration with IDEs for enhanced productivity.
Azure DevOps pipelines offer an excellent means of automating your CI/CD process and are typically configured on a per-project basis. While this works well for a handful of projects, it becomes hard to scale across many. This blog post demonstrates how to make your Azure DevOps CI/CD setup more scalable, more reusable, and easier to maintain.
Discover a seamless solution for monitoring data pipelines from raw data to ML models. Integrated into Unity Catalog, it simplifies tracking quality and governance and offers deep insight into performance. Fully serverless, it eliminates infrastructure worries. This unified approach streamlines quality tracking, error diagnosis, and resolution within the Databricks Intelligence Platform.
Google recently introduced Gemini, its latest AI model available in three sizes. Now, Gemini Pro is publicly accessible on Vertex AI, Google Cloud's end-to-end platform. This empowers developers to create intelligent "agents" capable of quickly processing and responding to information.
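If you want to try Gemini Pro on Vertex AI right away, a minimal sketch with the Vertex AI Python SDK might look like the following; the project ID, region, and prompt are placeholder assumptions, and the exact module path can vary between SDK releases.

```python
# Minimal sketch: calling Gemini Pro through the Vertex AI Python SDK.
# Assumes `google-cloud-aiplatform` is installed; project and region below
# are placeholders to replace with your own values.
import vertexai
from vertexai.preview.generative_models import GenerativeModel

# Initialize the SDK against a hypothetical project and region.
vertexai.init(project="my-gcp-project", location="us-central1")

# Load the publicly available Gemini Pro model.
model = GenerativeModel("gemini-pro")

# Ask the model a question, as a simple "agent" would.
response = model.generate_content(
    "Summarize the trade-offs between data lakes and data warehouses."
)
print(response.text)
```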
Fury is a blazing-fast multi-language serialization framework powered by JIT (just-in-time) compilation and zero-copy techniques, delivering up to 170x performance gains and ultimate ease of use.
Northvolt aims to be Europe's leading sustainable battery and gigafactory producer. Leveraging AWS for its "Platform" initiative, including Connected Factory and Battery Systems cloud platforms, Northvolt adopts a "factory as code" approach to deploy new facilities swiftly. Integrating technology, data and automation enhances productivity and quality, whilst reducing time to market. Advanced analytics, simulation techniques and applied AI further contribute to Northvolt's success.
Dive deep into the world of Change Data Capture (CDC) and how it can be implemented for real-time data streaming with a powerful tech stack. You will integrate technologies like Docker, Postgres, Debezium, Kafka, Apache Spark and Slack to create an efficient and responsive data pipeline.
You will learn how to:
- Configure and save data into the PostgreSQL database
- Configure and capture changes on PostgreSQL with Debezium
- Stream data into Kafka
- Add a streaming layer on top of Kafka with Apache Spark, Flink, Storm or ksqlDB (a PySpark sketch follows below)
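To make that streaming layer concrete, here is a minimal PySpark Structured Streaming sketch that consumes Debezium change events from Kafka; the broker address, topic name, and event schema are illustrative assumptions rather than details from the original walkthrough.

```python
# Minimal sketch: reading Debezium CDC events for a Postgres table from Kafka
# with Spark Structured Streaming. Broker address, topic name, and schema
# are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = (
    SparkSession.builder
    .appName("cdc-pipeline")
    # The Kafka connector package must match your Spark/Scala versions.
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0")
    .getOrCreate()
)

# Shape of the "after" image inside a Debezium change event (assumed columns).
after_schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("updated_at", StringType()),
])
payload_schema = StructType([
    StructField("op", StringType()),   # c = create, u = update, d = delete
    StructField("after", after_schema),
])
event_schema = StructType([StructField("payload", payload_schema)])

# Subscribe to the topic Debezium writes for the captured Postgres table.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "cdc.public.customers")
    .load()
)

# Parse the JSON envelope and flatten the "after" image.
changes = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("event"))
    .select("event.payload.op", "event.payload.after.*")
)

# Write the parsed change stream to the console; a real pipeline would push
# alerts to Slack or persist to a downstream store instead.
query = changes.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```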
The home team talks about Google’s new AI model, Gemini; the problems with regulating technology that evolves as quickly as AI; how governments can spy on their citizens via push notifications; and more.
The data lakehouse architecture emerged to combine the benefits of scalability and flexibility of data lakes with the governance, schema enforcement, and transactional properties of data warehouses. Iceberg Tables (Public Preview) bring Snowflake’s easy management and great performance to data stored externally in the open source Apache Iceberg format.
In this lab, our instructor will help you follow along to build an open data lakehouse architecture; a rough code sketch follows the list below. You’ll learn how to:
- Create Iceberg Tables to store data in cloud object storage
- Perform read and write operations on Iceberg Tables
- Perform time travel on Iceberg Tables
- Apply governance policies on Iceberg Tables
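As a rough illustration of those steps, here is a hedged sketch that drives Snowflake Iceberg Tables from the Snowflake Python connector; the connection parameters, external volume, and table names are placeholders, and the lab's own DDL and data may differ.

```python
# Hedged sketch: exercising Snowflake Iceberg Tables from the Python connector.
# Connection parameters, the external volume, and table/location names are
# placeholders, not values from the lab.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="lab_wh",
    database="lab_db",
    schema="public",
)
cur = conn.cursor()

# 1. Create an Iceberg table whose data lives in cloud object storage,
#    referenced through a pre-configured external volume.
cur.execute("""
    CREATE OR REPLACE ICEBERG TABLE orders_iceberg (
        order_id INTEGER,
        amount   NUMBER(10, 2)
    )
    CATALOG = 'SNOWFLAKE'
    EXTERNAL_VOLUME = 'lab_iceberg_volume'
    BASE_LOCATION = 'orders_iceberg/'
""")

# 2. Perform write and read operations.
cur.execute("INSERT INTO orders_iceberg VALUES (1, 99.90), (2, 14.50)")
cur.execute("SELECT * FROM orders_iceberg")
print(cur.fetchall())

# 3. Time travel: query the table as it looked five minutes ago.
cur.execute("SELECT * FROM orders_iceberg AT(OFFSET => -60 * 5)")
print(cur.fetchall())

# 4. Governance policies (e.g. masking or row access policies) can be applied
#    to Iceberg Tables much like native tables; omitted here for brevity.

cur.close()
conn.close()
```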