Today, 75% of executives don’t trust their own data, and only 27% of data projects succeed. DataOps could be the remedy for data chaos.
Gartner released predictions that DataOps will fully penetrate the market in 2–5 years.
In this comprehensive article, Prukalpa explains what DataOps is and its assumptions and values.
In this article, the marginal contributions of a linear regression model are calculated.
Conclusions:
Explaining a linear regression model is a straightforward process which is easily implemented. Calculating the marginal contributions gives a clear view of the mechanics of the model. This allows a data scientist to validate the output, explain the predictions to stakeholders with more confidence and tune the model based on the findings.
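The idea above can be sketched in a few lines. This is a minimal, illustrative example (synthetic data, invented feature names): for a linear model, the marginal contribution of feature j to a prediction is simply coef_j * x_j, with the intercept as the baseline.

```python
# Sketch: marginal contributions of a fitted linear regression.
# Data and coefficients here are synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

row = X[0]
contributions = model.coef_ * row            # per-feature marginal contributions
prediction = model.intercept_ + contributions.sum()

# The contributions plus the intercept reconstruct the model's prediction exactly.
assert np.isclose(prediction, model.predict(row.reshape(1, -1))[0])
```

Because the decomposition sums exactly to the prediction, each term can be shown to stakeholders as "how much this feature pushed the prediction up or down."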
A very helpful resource for Flink development. Let's look at the first two takeaways: 1. Having the right profiling tools on hand is key to getting insights into how to solve a performance problem. In Shopify's case they are:
The T-Mobile journey to cloud computing. From weekly reporting in Excel and PowerPoint in 2018, through V1 - Delta Lake (TMUS) to finally: V2 - Data Lakehouse.
It covers the drivers for change, along with the architecture and approach behind both iterations and the move to centralization.
AutoML can be an easy start to a Machine Learning journey. We'll be introduced to some AutoML models, followed by an example of the AutoML Classifier model trained in BigQuery ML. Michał discusses:
- AutoML's applicability with examples
Who are Data Scientists, what expectations do they have and what percentage of their time is spent on data preparation versus training models? All can be found in this report.
An article on how Teads manages its BigQuery data warehouse and monitors three important topics:
Business Intelligence tools have gone through three main stages of evolution: Traditional BI, Self-Service BI and Augmented Analytics. All three are characterized in this read.
Common issues with self-service BI tools, e.g.:
Fresh from GitHub, a still piping-hot release: Airflow 2.4.0.
The most anticipated change is this one:
New to this release is the concept of Datasets, and with it a new way of scheduling DAGs: data-aware scheduling.
https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html
We're working with Astronomer to make this integration seamless - and data-aware Airflow will help a lot.
Druid has really caught up with the competition, from ClickHouse to closed-source DWHs, while keeping its unique millisecond-latency OLAP query execution.
In this release they’ve added INSERT INTO / REPLACE INTO, which enables SQL-based batch ingestion and also in-Druid data transformation with SQL (imagine dbt on Druid right now…).
Covers pretty much everything about the Feature Store: how to build a well-functioning Feature Store, which Feature Store to choose and an example of MLOps Platform architecture.
Also provides a more extended insight into the dependencies of ML processes.
Check it out, especially if you're struggling with latency, data silos, data drift or data skew! (free download)
Flink Table Store is a data lake storage layer for streaming ingestion of changelogs (updates/deletes) and high-performance queries in real time.
A goldmine of knowledge from the conference, focused around Apache Beam and data and stream processing.
Big Data, Data Science, Machine Learning and AI, all in the context of cloud solutions.
Remember the 10% discount with this code: DATAPILL10!
for 2 or more participants – 20% OFF