DATA Pill feed

DATA Pill #068 - Amazon S3, Athena & AWS Glue ❤️Iceberg, ClickHouse 🤝 DuckDB = OLAP²


Zero Configuration Service Mesh with On-Demand Cluster Discovery | 9 min | Cloud | David Vroom, James Mulcahy, Ling Yuan, Rob Gulewich | Netflix TechBlog
How Netflix worked with Kinvolk and the Envoy community on on-demand cluster discovery - a feature that streamlines service mesh adoption in complex microservice environments.
Less data, less problems: Airbyte’s column selection is finally here | 14 min | dbt | Jakub Szafran | GetInData | Part of Xebia Blog
Airbyte 0.50 introduces platform changes, including checkpointing, automatic schema propagation and highly anticipated column selection. To address community demand, the GetInData team conducted tests on this feature, exploring issues such as column extraction and CDC incremental ingestion handling. Find detailed insights in this blog post.


ClickHouse 🤝 DuckDB = OLAP² | 4 min | BigData | Lorenzo Mangani | qryn dev
Explore the seamless integration of ClickHouse and DuckDB in the OLAP ecosystem through the innovative tool Quackpipe. This tutorial demonstrates how Quackpipe enables effortless data exchange between these two platforms, offering both installation guidance and exciting use cases, highlighting the collaborative power of ClickHouse and DuckDB for data analytics and manipulation.
AWS users: Amazon S3, Athena & AWS Glue ❤️ Iceberg | 15 min | Data Engineering | Anna Geller | AWS in Plain English
This tutorial walks you through getting started with Apache Iceberg on AWS. After reading, you will be able to create Iceberg tables, work with data stored in S3 in Parquet format, run SQL queries on data and table details, and manage data ingestion efficiently.
Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps | 6 min | AI | Kasey Uhlenhuth, Xiangrui Meng, Hagay Lupesko, Sean Owen, Corey Zumar, Liang Zhang, Ina Koleva, Vladimir Kolovski, Arpit Jasapara | Databricks Blog
This blog will guide you through creating and deploying a RAG application on the Databricks Lakehouse AI platform. Utilize the Llama2-70B-Chat model for text generation and Instructor-XL for text embeddings, both efficiently hosted and optimized with MosaicML's Starter Tier Inference APIs. This setup enables a swift and cost-effective start for low throughput experiments.
High-performance computing on AWS | 8 min | Cloud | Steyn Huizinga | Xebia Tech Blog
The article "High-Performance Computing on AWS" explores how Amazon Web Services (AWS) can change high-performance computing (HPC) across a range of applications. Discover how cloud resources can empower HPC workloads, drive innovation and transform industries.


OpenTF Announces Fork of Terraform | 5 min | Cloud | OpenTF Blog
HashiCorp changed the license for their core products, including Terraform, to BSL. In response, the community crafted the OpenTF manifesto, garnering support from 100+ companies, 10 projects and 400 individuals to create OpenTF.
Let’s explore the capabilities and implications of Code Llama, a Large Language Model designed to revolutionize coding practices. Code Llama is an LLM capable of generating code, and natural language about code, from both code and natural language prompts. In benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks. Let's find out more.
In the last three years, Vertex AI has evolved into a comprehensive AI/ML platform, supporting generative AI, user-friendly tools and a large model repository. It emphasizes data science and machine learning, introduces productivity enhancements like Colab Enterprise, expands open-source capabilities with Ray on Vertex AI and focuses on MLOps for gen AI, enabling organizations to excel in AI adoption and readiness.


The talk will focus on Garima’s experience and journey in executing company-wide digital transformation in decentralized, globally distributed large enterprises, with the help of automated versions of enterprise architecture.


How Azure Embraces Terraform For Infrastructure As Code | 46 min | Cloud | Hosts: Ned Bellavance, Ethan Banks; Guests: Mark Gray, Steven Ma | Day Two Cloud Podcast
Delve into the world of Infrastructure as Code (IaC) with Microsoft's Mark Gray and Steven Ma. Discover how Microsoft is embracing Terraform to enhance its Azure offerings, including the Terraform Export Tool, the AzAPI Provider and the thriving Terraform in the Azure community. Explore the collaboration between Microsoft and HashiCorp, learn about the tool's capabilities and gain insights into the future of Terraform on Azure.
Navigating Event Streaming | 31 min | Streaming | Host: Tim Berglund; Guest: Eric Sammer | Real-Time Analytics Podcast
Join Eric Sammer, Founder and CEO of Decodable, as he discusses stream processing, real-time data management and integration with systems like Apache Pinot and StarTree. Dive into the complexities of data management, the balance between generalization and specialization and the role of stream processing in intelligent data distribution in this insightful discussion.


In this live hands-on workshop, you’ll follow a step-by-step guide to achieving production-grade data transformation using dbt Cloud with Databricks. You’ll build a scalable transformation pipeline for analytics, BI and ML – entirely from scratch.

You’ll learn how to:

  • Quickly connect dbt Cloud and Databricks in Databricks Partner Connect
  • Model data with dbt Cloud using data in Delta Lake, following software engineering best practices like version control, testing and documentation
  • Build highly scalable and reliable data transformation pipelines for analytics, BI and ML
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill