DATA Pill #083 - ClickHouse is in the house, bringing your Azure Devops CI/CD setup to the next level

ARTICLES

ClickHouse is in the house | 12 min | Data Engineering | Vimeo Engineering Blog

Read about Vimeo's journey from a traditional architecture rooted in Apache Phoenix on HBase to adopting ClickHouse eighteen months ago. The author shares key lessons and tips throughout the narrative, making it a worthwhile read.

3 Reasons Why Working With Data Build Tool (dbt) Is Not “Just Doing SQL” | 4 min | Analytics Engineering | Luís Oliveira | Personal Blog

In a past role, the author had to migrate SQL code to Data Build Tool (dbt) tables, revealing the necessity for a deeper understanding beyond SQL. The exploration of dbt covers crucial aspects like data modeling, macros and advanced tests, highlighting that mastering dbt involves a skill set beyond conventional SQL workflows.

KPI's (Key Performance Indicators) in Generative AI Projects | 3 min | Gen AI | Girish Pai | Personal Blog

Delve into KPIs that are crucial considerations for your Generative AI projects. The list is not exhaustive; Girish highlights the significant ones that could help you optimize your Generative AI projects.

How we’re experimenting with LLMs to evolve GitHub Copilot | 8 min | LLM | Sara Verdi | GitHub Blog

This blog post delves into various experiments conducted with generative AI models over the past few years and provides a behind-the-scenes glimpse into the key learnings derived from them. It also explores the transition from concept to product, showcasing the journey with a radically new technology.

5 Reasons Why to Write Your Semantic Layer in YAML | 4 min | Data Science | Tomáš Muchka | GoodData Developers

In this article, the author engages with the argument that semantic layers should be written in real programming languages and, with reference to the perspective of the Malloy team, outlines the choice to use YAML for GoodData's semantic layer. It shows the advantages of YAML, including widespread support in the developer community, ease of understanding, declarative nature, flexibility, and seamless integration with IDEs for enhanced productivity.

TUTORIALS

Scaling up: bringing your Azure Devops CI/CD setup to the next level | 11 min | DevOps | Jeroen Overschie, Timo Uelen | Xebia Blog

Azure DevOps pipelines offer an excellent means of automating your CI/CD process and are typically configured on a per-project basis. While this works well for a few projects, it becomes a challenge to scale to many. This blog post demonstrates how to enhance the scalability, reusability and ease of maintenance of your Azure DevOps CI/CD setup.

NEWS

Lakehouse Monitoring: A Unified Solution for Quality of Data and AI | 3 min | AI | Jacqueline Li, Alkis Polyzotis, Kasey Uhlenhuth | Databricks Blog

Discover a seamless solution for monitoring data pipelines from raw data to ML models. Integrated into Unity Catalog, it simplifies tracking quality and governance, offering deep insights into performance. Fully serverless, it eliminates infrastructure worries. This unified approach streamlines quality tracking, error diagnosis and solutions within the Databricks Intelligence Platform.

Gemini, Google’s most capable model, is now available on Vertex AI | 3 min | AI | burak göktürk kim | Google Cloud Blog

Google recently introduced Gemini, its latest AI model available in three sizes. Now, Gemini Pro is publicly accessible on Vertex AI, Google Cloud's end-to-end platform. This empowers developers to create intelligent "agents" capable of quickly processing and responding to information.

TOOL

Fury | Data Engineering

Fury is a blazing-fast multi-language serialization framework powered by JIT (just-in-time compilation) and zero-copy, providing up to 170x performance and ultimate ease of use.

DATA TUBE

Northvolt’s software-defined factories | 51 min | AI | Karthik Krishnamurthy, Marcus Ulmefors | AWS re:Invent 2023

Northvolt aims to be Europe's leading sustainable battery and gigafactory producer. Leveraging AWS for its "Platform" initiative, including Connected Factory and Battery Systems cloud platforms, Northvolt adopts a "factory as code" approach to deploy new facilities swiftly. Integrating technology, data and automation enhances productivity and quality, whilst reducing time to market. Advanced analytics, simulation techniques and applied AI further contribute to Northvolt's success.

Dive deep into the world of CDC (change data capture) and how it can be implemented for real-time data streaming using a powerful tech stack. You will integrate technologies like Docker, Postgres, Debezium, Kafka, Apache Spark and Slack to create an efficient and responsive data pipeline.

You will learn how to:

Configure and save data into the PostgreSQL database
Configure and capture changes on PostgreSQL with Debezium
Stream data into Kafka
Add a streaming layer on top of Kafka with Apache Spark, Flink, Storm or ksqlDB
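
To give a feel for what the consuming side of such a pipeline deals with, here is a minimal Python sketch that replays Debezium-style change events into an in-memory copy of a table. It assumes the standard Debezium envelope fields `op`, `before` and `after`; the `id` primary key, the sample rows and the helper names are hypothetical, and in a real setup the events would arrive from a Kafka consumer rather than a list of strings.

```python
import json


def apply_change_event(table, raw_event, key="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key."""
    event = json.loads(raw_event)
    # When the JSON converter embeds schemas, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: "after" holds the new row state
        row = payload["after"]
        table[row[key]] = row
    elif op == "d":
        # delete: "after" is null, so the key is taken from "before"
        table.pop(payload["before"][key], None)
    return table


def replay(raw_events, key="id"):
    """Replay a sequence of change events and return the resulting table."""
    table = {}
    for raw in raw_events:
        apply_change_event(table, raw, key)
    return table


# Hypothetical events for illustration only.
sample_events = [
    json.dumps({"op": "c", "before": None, "after": {"id": 1, "amount": 10}}),
    json.dumps({"op": "c", "before": None, "after": {"id": 2, "amount": 25}}),
    json.dumps({"op": "u", "before": {"id": 1, "amount": 10},
                "after": {"id": 1, "amount": 15}}),
    json.dumps({"op": "d", "before": {"id": 2, "amount": 25}, "after": None}),
]

if __name__ == "__main__":
    print(replay(sample_events))
```

Keying the state by primary key keeps the replay idempotent for repeated updates, which is handy when a consumer re-reads part of a topic.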

PODCAST

What Gemini means for the GenAI boom | 21 min | Gen AI | Ben Popper, Ryan Donovan, Eira May | The Stack Overflow Podcast

The home team talks about Google’s new AI model, Gemini; the problems with regulating technology that evolves as quickly as AI; how governments can spy on their citizens via push notification; and more.

CONFS, EVENTS AND MEETUPS

How to Build an Open Lakehouse on Snowflake With Apache Iceberg™ | Virtual Hands-on Lab | 10th January

The data lakehouse architecture emerged to combine the benefits of scalability and flexibility of data lakes with the governance, schema enforcement, and transactional properties of data warehouses. Iceberg Tables (Public Preview) bring Snowflake’s easy management and great performance to data stored externally in the open source Apache Iceberg format.

In this lab, our instructor will help you follow along to build an open data lakehouse architecture. You’ll learn how to:

Create Iceberg Tables to store data in cloud object storage
Perform read and write operations on Iceberg Tables
Perform time travel on Iceberg Tables
Apply governance policies on Iceberg Tables

________________________

Have any interesting content to share in the DATA Pill newsletter?

➡ Join us on GitHub

➡ Dig into previous editions of DataPill