BigQuery change history (pre-GA) lets you track the history of changes made to a BigQuery table. With it, you can incrementally maintain a table replica outside of BigQuery instead of making costly full copies. The post also describes an interesting use case: linking change history with dbt incremental models (be aware of the limitations).
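To illustrate the idea of incremental replica maintenance, here is a minimal sketch in plain Python. It assumes a hypothetical change record with a key, a change type and a timestamp; the `Change` shape and the `"INSERT"`/`"UPDATE"`/`"DELETE"` values are assumptions for illustration, not BigQuery's actual change-history output schema. A watermark ensures that each run applies only changes newer than the previous run.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    # Hypothetical change record (NOT BigQuery's real schema).
    key: str
    change_type: str  # illustrative values: "INSERT", "UPDATE", "DELETE"
    change_ts: datetime
    value: Optional[dict] = None


def apply_changes(replica: Dict[str, dict], changes: List[Change],
                  watermark: datetime) -> datetime:
    """Apply changes newer than `watermark` to `replica`; return the new watermark."""
    new_watermark = watermark
    for c in sorted(changes, key=lambda ch: ch.change_ts):
        if c.change_ts <= watermark:
            continue  # already applied in a previous incremental run
        if c.change_type == "DELETE":
            replica.pop(c.key, None)
        else:
            # INSERT or UPDATE: upsert the latest version of the row
            replica[c.key] = c.value
        new_watermark = max(new_watermark, c.change_ts)
    return new_watermark


if __name__ == "__main__":
    replica: Dict[str, dict] = {}
    log = [
        Change("a", "INSERT", datetime(2022, 10, 2), {"qty": 1}),
        Change("b", "INSERT", datetime(2022, 10, 3), {"qty": 5}),
        Change("a", "UPDATE", datetime(2022, 10, 4), {"qty": 2}),
        Change("b", "DELETE", datetime(2022, 10, 5)),
    ]
    wm = apply_changes(replica, log, datetime(2022, 10, 1))
    print(replica, wm)
```

Persisting the watermark between runs is what makes the process incremental: only the delta since the last run is read and applied, rather than the whole table.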
The data journey at Volt.io: this blog post describes the architecture of their Modern Data Platform.
They designed and built a Modern Data Platform on cloud components, including:
The Sync Autotuner API lets you continuously monitor and tune your Apache Spark jobs at scale by exposing the capabilities of the Sync Autotuner programmatically.
The Sync Autotuner quickly recommends optimal cluster configurations in terms of cost, runtime, and infrastructure selection, and it can do so using data from a single run.
Etsy’s journey from a decision tree model to a unified deep learning model for search ranking.
Along the way we had to modernize our ML development pipeline, and we moved to open-source tools and functions – with TF Ranking, TensorFlow's learning-to-rank library, at the core, along with off-the-shelf losses and metrics – to build the new model.
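TF Ranking ships standard ranking metrics such as NDCG (normalized discounted cumulative gain). To show what such a metric measures, here is a minimal plain-Python NDCG@k sketch. It is not TF Ranking's implementation; the exponential gain `2**rel - 1` is one common variant, and the relevance grades are assumed inputs.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    # Discounted cumulative gain: higher-graded items near the top count more;
    # the item at 0-based position `rank` is discounted by log2(rank + 2).
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    # Normalize by the DCG of the ideal ordering (relevances sorted descending),
    # so a perfect ranking scores 1.0.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance grades of search results, listed in the order the model ranked them.
    print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))
```

A learning-to-rank model is typically trained with a loss that correlates with metrics like this one, so that listings users find relevant rise to the top of the results.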
This tutorial aims to show how you can take advantage of the power of Iceberg tables and the convenience of AWS Glue for streaming.
ING's journey to the CC 2.0 platform, and the patterns applied in CC 2.0 to achieve high availability and resiliency.
What would you say if you stored 1 000 records in a database, and the database claimed that there were only 998 of them? Behavior like this is not necessarily an error, as long as you use a database that implements probabilistic algorithms and data structures. Solutions based on these methods allow for some inaccuracy in the results, but in return they are able to provide us with great savings in the resources used.
In this post, you will learn about two probability-based techniques, see experimental results, and consider when a database like this is worth using.
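This summary does not name the two techniques, but the "1 000 records reported as 998" example is typical of approximate distinct counting. HyperLogLog is one well-known algorithm of that kind, and the minimal sketch below (our own illustration, not necessarily what the post covers) shows the trade-off: memory is fixed at `m = 2**p` small registers no matter how many items are added, at the cost of a small estimation error.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for estimating the number of distinct items."""

    def __init__(self, p: int = 10) -> None:
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m (this formula is valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-256.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)  # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Keeping the maximum makes duplicate items have no effect.
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=10)
    for i in range(1000):
        hll.add(f"record-{i}")
    print(round(hll.count()))  # close to 1000, but usually not exactly 1000
```

With `p=10` the structure stores 1024 small integers; the standard error of HyperLogLog is roughly `1.04 / sqrt(m)`, about 3% here, which is the kind of inaccuracy the paragraph above describes.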
The OpenLineage Airflow integration detects which Airflow operators your DAG is using and extracts lineage data from them using extractors.
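The "detect the operator, then run a matching extractor" idea can be illustrated with a small registry keyed by operator class name. This is a hypothetical sketch of the pattern only: `LineageData`, `SqlCopyOperator`, `ShellOperator`, `EXTRACTORS` and `extract_lineage` are invented names, not the OpenLineage or Airflow APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LineageData:
    # Hypothetical container: datasets a task reads from and writes to.
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


# Hypothetical stand-ins for Airflow operators (not real Airflow classes).
class SqlCopyOperator:
    def __init__(self, source_table: str, target_table: str) -> None:
        self.source_table = source_table
        self.target_table = target_table


class ShellOperator:
    def __init__(self, command: str) -> None:
        self.command = command


# Registry: operator class name -> function that extracts lineage from a task.
EXTRACTORS: Dict[str, Callable[[object], LineageData]] = {
    "SqlCopyOperator": lambda op: LineageData(
        inputs=[op.source_table], outputs=[op.target_table]
    ),
}


def extract_lineage(task: object) -> Optional[LineageData]:
    # Detect which operator the task uses and dispatch to its extractor;
    # tasks without a registered extractor produce no lineage.
    extractor = EXTRACTORS.get(type(task).__name__)
    return extractor(task) if extractor else None


if __name__ == "__main__":
    print(extract_lineage(SqlCopyOperator("public.orders", "public.daily_orders")))
    print(extract_lineage(ShellOperator("echo hi")))
```

Dispatching on the operator type keeps each extractor small and focused on how one kind of operator exposes its inputs and outputs.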
In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices.
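AlphaTensor searches for algorithms that multiply matrices using fewer scalar multiplications, framing the search as finding low-rank decompositions of the matrix multiplication tensor. The classic example of such an algorithm is Strassen's scheme for 2x2 matrices, which uses 7 multiplications instead of the naive 8 (a rank-7 decomposition). The sketch below is Strassen's algorithm, shown for context rather than one of AlphaTensor's discovered algorithms, checked against the naive product.

```python
from typing import List

Matrix = List[List[int]]


def strassen_2x2(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 2x2 matrices using 7 scalar multiplications (Strassen)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Only additions and subtractions are needed to combine the 7 products.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def naive_2x2(a: Matrix, b: Matrix) -> Matrix:
    """Textbook product: 8 scalar multiplications."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    a, b = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    print(strassen_2x2(a, b), naive_2x2(a, b))
```

Applied recursively to matrix blocks, saving one multiplication per 2x2 step lowers the asymptotic cost below the naive cubic bound, which is why finding decompositions with fewer multiplications matters.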
A new streaming application development framework and operations platform is incubating. It is hard to tell exactly what the framework is, but it seems to be roughly this:
Google has announced startup CPU boost for Cloud Run and Cloud Functions 2nd gen. The feature temporarily allocates extra CPU while an instance starts up, which can drastically reduce cold start times for Cloud Run services and Cloud Functions.
Internal testers and private preview customers reported the following startup time reductions for their Java applications:
Meta announced Make-A-Video, a new AI system that lets people turn text prompts into brief, high-quality video clips.
Google has unveiled TensorStore, an open-source C++ and Python library designed to reduce ML system development time.
Developed by Google Research, the library addresses common problems in storing and manipulating large multidimensional arrays, giving users an API that can handle large datasets without requiring powerful hardware.
A talk about the key findings of the 2022 DevOps report published by Google, especially in the security space. The most notable findings include the adoption of DevOps security practices and a lower incidence of burnout on teams that use those practices. Nathen and Derek discuss what changed in this year's research compared with last year and what stayed the same.
Confluent announced Stream Designer, which will enable you to:
Airbyte is handing out prizes to anyone who helps to build or edit a connector.
Anyone who submits a connector within the event dates will receive $500 and Airbyte swag.
24 hours of broadcasts featuring professionals from around the world, with stages in NORAM, Tokyo, Bengaluru, and Munich, plus live programming and much more.
A chance to speak in front of an audience of Big Data professionals.
More than 500 professionals will attend the conference to hear dozens of technical presentations. One of them could be yours ;) BDTW has opened a call for presentations.