DATA Pill feed

DATA Pill #059 - Snowflake's Document AI and their expanding partnership with Microsoft


Top 10 Snowflake Query Optimization Tactics | 7 min | Data Engineering | John Ryan | Analytics Today Blog
This article covers Snowflake query optimization and explains how to tune SQL queries even though standard Snowflake tables have no indexes. It lists the author's top 10 optimization tips, from understanding the Snowflake architecture and using caching effectively to choosing scaling policies, writing better WHERE clauses and avoiding overly complex transformation code, all aimed at improving query performance and reducing costs.
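Because standard Snowflake tables have no indexes, a selective WHERE clause pays off mainly through pruning: Snowflake keeps min/max metadata for each micro-partition and can skip partitions whose range cannot match the filter. The Python sketch below models that principle with a hypothetical `MicroPartition` class and `scan` function. It illustrates the idea only; it is not Snowflake's implementation or code from the article.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    # Hypothetical stand-in for a micro-partition: min/max metadata for one
    # column (sale_date) plus the rows it holds as (sale_date, amount) tuples.
    min_date: str
    max_date: str
    rows: list


def scan(partitions, start, end):
    """Return rows with start <= sale_date <= end and the number of partitions read."""
    matches, partitions_read = [], 0
    for p in partitions:
        # Pruning: if the partition's [min, max] range cannot overlap the
        # filter range, skip it without reading any of its rows.
        if p.max_date < start or p.min_date > end:
            continue
        partitions_read += 1
        matches.extend(r for r in p.rows if start <= r[0] <= end)
    return matches, partitions_read


# ISO-8601 date strings compare correctly as plain strings.
partitions = [
    MicroPartition("2023-01-01", "2023-01-31", [("2023-01-05", 10), ("2023-01-20", 5)]),
    MicroPartition("2023-02-01", "2023-02-28", [("2023-02-14", 7)]),
    MicroPartition("2023-03-01", "2023-03-31", [("2023-03-02", 3)]),
]

rows, read = scan(partitions, "2023-02-01", "2023-02-28")
print(rows, read)  # → [('2023-02-14', 7)] 1
```

In this model, if rows were scattered randomly across partitions, every min/max range would overlap the filter and nothing could be skipped, which is why data layout matters as much as the predicate itself.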
Metis: Building Airbnb’s Next Generation Data Management Platform | 8 min | Data Engineering | Erik Ritter, Jiaxin Ye, Sylvia Tomiyama, Woody Zhou, Xiaobin Zheng, Zuzana Vejrazkova | The Airbnb Tech Blog
Read about Metis, the platform built by Airbnb's Data Management team to capture, manage and consume accurate metadata at scale. Metis grew out of the original Dataportal project, which democratized data access, and now includes features such as Apache Atlas for data lineage and a data catalog that supports governance, data quality and cost management.
From concept to production in 2 months: sales forecasting Machine Learning model for | 9 min | Machine Learning | Michał Madej | GetInData | Part of Xebia Blog
From concept to production in just 2 months! This blog post describes how a sales forecasting machine learning model was developed for … Discover how GetInData | Part of Xebia used modern technologies to give e-commerce businesses accurate predictions, optimized inventory and higher profitability.
This one discusses streaming systems, which process a continuous flow of data from many sources. It walks through moving Azure's pricing system from batch processing to streaming and the challenges caused by prices that change over time. It also covers how state is managed in a streaming system and why that state is hard to scale. The author proposes versioning price changes and stresses understanding the system's requirements before designing a streaming system.
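The summary mentions a versioning strategy for price changes without spelling it out. One common approach, assumed here rather than taken from the article, is to store every price change as a new version with an effective-from time and to look up prices by each event's own timestamp. The `VersionedPriceBook` class below is a hypothetical minimal sketch of that idea.

```python
import bisect
from collections import defaultdict


class VersionedPriceBook:
    """Hypothetical price store: each SKU keeps (effective_from, price) versions in time order."""

    def __init__(self):
        self._versions = defaultdict(list)

    def add_version(self, sku, effective_from, price):
        # A price change never overwrites history; it adds a new version.
        bisect.insort(self._versions[sku], (effective_from, price))

    def price_at(self, sku, event_time):
        history = self._versions.get(sku, [])
        times = [t for t, _ in history]
        # Index of the last version whose effective_from <= event_time.
        idx = bisect.bisect_right(times, event_time) - 1
        if idx < 0:
            raise KeyError(f"no price for {sku!r} at time {event_time}")
        return history[idx][1]


book = VersionedPriceBook()
book.add_version("vm-small", effective_from=0, price=10.0)
book.add_version("vm-small", effective_from=100, price=12.0)

# A late event stamped at t=50 still gets the price that was valid at t=50.
print(book.price_at("vm-small", 50))   # 10.0
print(book.price_at("vm-small", 100))  # 12.0
```

Keying lookups on event time rather than processing time is what keeps results stable when events arrive late or are replayed.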


How to simplify unstructured data analytics using BigQuery ML and Vertex AI | 6 min | Data Analytics | Rachael Deacon-Smith | Google Cloud Blog
Dig into a Google tutorial that explains, in four clear steps, how to simplify unstructured data analytics with BigQuery ML and Vertex AI. Curious what they are?

1. Define your AI models in BigQuery.
2. Use the Vision AI API to detect text in images stored in Cloud Storage.
3. Use the Translation AI API to translate foreign movie titles.
4. Use natural language processing (NLP) to run sentiment analysis on movie reviews.
A four-stage tutorial that walks through building a modern data pipeline with Terraform, AWS Lambda and S3, Snowflake, dbt, Mage AI and Dash. Building a modern data pipeline might seem like a daunting task, but with the right tools and a clear roadmap, it becomes a manageable and even enjoyable process.

Shh… We are talking about Mage AI in the tools section.


Mage AI | 4 min | AI
Hey, you. Stop wasting time waiting around for your DAGs to finish testing. Get instant feedback from your code each time you run it.

  • Interactive code: Immediately see results from your code’s output with an interactive notebook UI.
  • Data is a first-class citizen: Each block of code in your pipeline produces data that can be versioned, partitioned and cataloged for future use.
  • Collaborate in the cloud: Develop collaboratively on cloud resources, version control with Git, and test pipelines without waiting for a shared staging environment to become available.


Snowflake has announced an expanded partnership with Microsoft covering joint product integrations across AI, low-code/no-code app development and data governance. The two companies also plan closer go-to-market and field collaboration to bring joint solutions to customers. Snowflake will increase its Azure spend commitment, and the integrations will let data scientists and developers use AI solutions and connect the Data Cloud with Microsoft's technologies and AI capabilities, helping customers manage and understand their data better.
Introducing Materialized Views and Streaming Tables for Databricks SQL | 6 min | Data Analysis | Paul Lappas, Michael Armbrust, Yannis Papakonstantinou, Nitin Sharma, Andreas Neumann | Databricks Blog
Databricks SQL on AWS and Azure now makes materialized views and streaming tables available to the public. Materialized views store precomputed query results, and streaming tables let you ingest data incrementally from cloud storage and message queues. Check out this blog post to see how analysts and analytics engineers can now deliver data and analytics applications more efficiently in the data warehouse.
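Ingesting "incrementally" means each refresh picks up only the data that arrived since the previous one instead of reprocessing the whole source. The Python sketch below illustrates that bookkeeping with a hypothetical `IncrementalIngestor` class and made-up file names. In Databricks SQL the platform tracks this for you, so treat this as a conceptual model, not the product's API.

```python
class IncrementalIngestor:
    """Hypothetical sketch of incremental ingestion: remembers which source files it has loaded."""

    def __init__(self):
        self.loaded_files = set()
        self.table = []

    def refresh(self, source_files):
        """Load only files not seen in earlier refreshes; return their names."""
        new_files = sorted(name for name in source_files if name not in self.loaded_files)
        for name in new_files:
            self.table.extend(source_files[name])
            self.loaded_files.add(name)
        return new_files


ingestor = IncrementalIngestor()
landing_zone = {"part-001.json": [1, 2], "part-002.json": [3]}
print(ingestor.refresh(landing_zone))   # ['part-001.json', 'part-002.json']

landing_zone["part-003.json"] = [4, 5]  # a new file lands in cloud storage
print(ingestor.refresh(landing_zone))   # ['part-003.json']
print(ingestor.table)                   # [1, 2, 3, 4, 5]
```

The second refresh touches only the new file, so the cost of each refresh grows with the amount of new data rather than with the size of the whole source.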
Exciting news from Databricks: their latest article announces Delta Lake 3.0, which introduces the Universal Format (UniForm) and brings more automatic data management to data lakes. With features such as automatic schema evolution, data compaction and time travel, Delta Lake helps organizations simplify and automate their data pipelines, protect data integrity and run analytics more efficiently.
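Time travel works because Delta Lake records every change to a table as a new version in its transaction log, so earlier versions can be reconstructed and queried. The toy Python model below illustrates that idea; the `VersionedTable` class and its methods are hypothetical and far simpler than Delta Lake's actual log format.

```python
class VersionedTable:
    """Hypothetical table backed by an append-only log of commits, one per version."""

    def __init__(self):
        self._log = []  # each entry: (rows_added, rows_removed)

    def commit(self, added=(), removed=()):
        self._log.append((tuple(added), tuple(removed)))
        return len(self._log) - 1  # version number of this commit

    def as_of(self, version):
        """Rebuild the table as it looked right after `version` was committed."""
        if not 0 <= version < len(self._log):
            raise ValueError(f"unknown version {version}")
        rows = []
        # Replay the log from the first commit up to the requested version.
        for added, removed in self._log[: version + 1]:
            rows = [r for r in rows if r not in removed]
            rows.extend(added)
        return rows


t = VersionedTable()
v0 = t.commit(added=["a", "b"])
v1 = t.commit(added=["c"], removed=["a"])
print(t.as_of(v0))  # ['a', 'b']
print(t.as_of(v1))  # ['b', 'c']
```

Because old commits are never rewritten, reading an older version is just a matter of replaying fewer entries.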


Let's get into the Data + AI Summit mood and talk about enhancing AI. Have you seen it yet?


Snowflake made some great announcements last week. One of them, Document AI, uses a purpose-built, multimodal LLM to analyze PDF files. It needs no prior training and is ready to use immediately. Because the model is natively integrated into the Snowflake platform, organizations can easily extract content, such as invoice amounts or contractual terms, from documents stored securely in Snowflake.
Unlocking Data Value with Large Language Models | 34 min | AI | Talha Chattha | Data Phoenix Events
Large Language Models, or Foundation Models (FMs), power Generative AI applications. FMs challenge classical Machine Learning with a paradigm shift towards Prompt Engineering, a new way of building ML applications for businesses. In this talk we will discuss how businesses can use FMs with Prompt Engineering to build Generative AI applications in the cloud. We will also cover the architectural components, resources for getting started, and what it all costs.
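The paradigm shift described here is that, instead of training a model on labeled data, you describe the task and a few labeled examples in the prompt itself (few-shot prompting). The hypothetical `build_few_shot_prompt` function below sketches that pattern; it only assembles the prompt string and does not call any particular model or cloud API.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: instructions, labeled examples, then the new input."""
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved every minute.", "positive"), ("A waste of two hours.", "negative")],
    "The plot dragged, but the ending was great.",
)
print(prompt)
```

Changing the task then means editing the instructions and examples rather than retraining a model, which is the trade-off the talk builds on.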


GoDataFest | Amsterdam | 5th July
Take part in a wide range of sessions on data & AI technologies and platforms. From fireside chats and presentations to ask-me-anything sessions and workshops, each session is hosted by seasoned experts.

You can expect insights, developments and tutorials about the latest and greatest data technology. Topics include modern data platforms, analytics engineering, data democratization, AI, MLOps, pipeline orchestration and much, much more.
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill