Did you know that the top 10 cloud service providers capture ~77% of spending on cloud infrastructure services? On the other hand, a significant number of smaller vendors are still present in the global market, such as Huawei Cloud and UCloud in China or Bleu in Europe.
There are really just three companies pushing the AI industry forward: Facebook, OpenAI, and Google. What do they have in common? The ability to run massive models at scale. In other words, they're doing AI in a way that you and I can't. They're not trying to be secretive; they simply have the infrastructure, and the know-how to run it, that you and I lack.
Many commercial tools and open-source frameworks can build data quality checks into the data engineering process, but the author explores how to implement data observability using only core dbt.
This is the second part of a blog post in which the author describes how Studio Search supports querying the data available in indices.
The article above explains how to read and write tables in each data lake format using an AWS Glue Studio notebook.
Google Cloud has added cloud-native support for batch workloads! Last week it announced the release of Batch: a fully managed service for scheduling, queuing, and executing batch jobs on Google's infrastructure.
Dataplex is an intelligent data fabric that helps you unify distributed data and automate data management and governance across that data to power analytics at scale.
During the OpenLineage meeting, the speaker covered recent releases, progress on the Flink integration, streaming services, and more.
This is an interview with Dong L, an Apache Flink committer. The main lessons the host, Robert Metzger, took away were that Flink ML is particularly well suited to feature engineering and that there's a growing ML ecosystem around Flink.
In this video, you will learn about the skills that data engineers need, as well as skills that arguably have little or nothing to do with data.
In episode #108 of the MLOps Coffee Sessions, Byron Allen explains why MLflow and Kubeflow are not playing the same game!
Mark Etherington, CTO at Crux, discusses, among other topics, the different costs involved in managing external data.