An entire article of tips to help you deliver faster, more efficient and simpler data pipelines. You'll also find an overview of the data transformation landscape on Snowflake, with the steps and available options explained, and finally a summary of the key lessons learned from over 50 engagements with Snowflake customers.
If you have ever wondered how to get started with data lineage on AWS using OpenLineage, this is a must-read. Check out the step-by-step configuration guide for the openlineage-airflow plugin on Amazon MWAA. You will also find an AWS Cloud Development Kit (CDK) project that deploys a pre-configured demo environment for evaluating and experiencing OpenLineage first-hand.
In this article, you will learn about the problems the team encountered while administering Yelp's clusters. The team decided to tackle them in two parts:
This blog post shows how to establish the ML life cycle using MLflow, an open-source platform for managing the machine learning life cycle. It is a short, hands-on, step-by-step demonstration of MLOps standardization on a Mesh Platform.
Read the story of how the Pinterest team decided to build an end-to-end pipeline with the following characteristics:
A traditional software engineer writes explicit rules in code. A data scientist, in contrast, works with learning algorithms that find patterns in data. Yet analytics projects are still held together by conventional code, and as a data scientist you can benefit from best practices first pioneered in software engineering.
When you start writing code for a prototype, maintainability and consistency may not be your priority, but adopting a culture and way of working that is already proven can get your prototype production-ready faster.
dbt or Delta Live Tables: which is better? As always, it depends on your needs. But if you know exactly what you need, weigh the pros and cons of dbt and Delta Live Tables (DLT) with this article. After reading it, you will know where dbt and DLT shine and where they fall short, and finally what Rahul likes and dislikes about each.
In this article, you will learn how to seamlessly track your machine learning experiments by integrating Kedro with MLflow via the kedro-mlflow plugin (which we use extensively in our projects too!).
Cloud data warehouses have become extremely popular in recent years. Their low cost and fully managed services make it easy for businesses to get started and scale their data analysis as needed. However, their pricing models can be complicated, with many factors affecting cost, so the choice between Snowflake and BigQuery depends on an organization's specific needs and usage patterns. Discover which solution suits you.
A new adapter that makes the dbt Python experience 10x better is now available. It is the easiest way to run your dbt Python models. Read how it runs your Python code both locally and in the cloud, lets you run code in the same environment, and provides easy environment management and isolation between models.
Azure OpenAI Service is now generally available, so businesses can apply for access to the most advanced AI models in the world, including GPT-3.5, Codex and DALL•E 2. Customers will also soon be able to access ChatGPT, a fine-tuned version of GPT-3.5 that has been trained and runs inference on Azure AI infrastructure, through Azure OpenAI Service.
As data grows, it becomes increasingly challenging to find what you need. Enter OpenSearch. Learn how OpenSearch uses search techniques to surface the data that is relevant to you.
This year you can choose whether to watch the conference online or attend in person in Warsaw. Over 90 outstanding speakers working with Big Data at top data-driven companies are waiting to share their knowledge with you. Use the code DataPill15 to get a 15% discount.