Cloud lock-in may be unavoidable, but the risk is greatly reduced if you introduce good architecture patterns as an abstraction layer. The author describes this concept nicely here.
It’s difficult to set up the infrastructure needed to support high-throughput updates and low-latency retrieval of data.
Starting this month, the Vertex AI Matching Engine and Feature Store will support real-time Streaming Ingestion as Preview features. With Streaming Ingestion for Matching Engine, a fully managed vector database for vector similarity search, items in an index are updated continuously and reflected in the similarity search results immediately.
This blog post covers how these new features can improve predictions and enable near real-time use cases, such as recommendations, content personalization and cybersecurity monitoring.
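To make the idea of streaming updates concrete, here is a minimal sketch in plain Python. It is a toy in-memory index, not the Vertex AI API: the class `ToyVectorIndex`, its methods, and the item names are all hypothetical. It only shows the behaviour described above, where an item written to the index shows up in the very next similarity search.

```python
import math


class ToyVectorIndex:
    """In-memory stand-in for a vector index (hypothetical, not Vertex AI)."""

    def __init__(self):
        self._vectors = {}  # item id -> embedding

    def upsert(self, item_id, embedding):
        # A streaming update: insert or overwrite a single item in place.
        self._vectors[item_id] = list(embedding)

    def search(self, query, k=3):
        # Brute-force cosine similarity over whatever is in the index right now.
        scored = [(_cosine(query, vec), item_id) for item_id, vec in self._vectors.items()]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


index = ToyVectorIndex()
index.upsert("article-1", [1.0, 0.0])
index.upsert("article-2", [0.0, 1.0])
before = index.search([0.9, 0.1], k=1)

# A new item arrives on the stream and is searchable immediately.
index.upsert("article-3", [0.9, 0.1])
after = index.search([0.9, 0.1], k=1)
```

A managed service would use approximate nearest-neighbour search rather than this brute-force scan; the sketch only illustrates the update-then-query flow.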
Airbnb’s experience with upgrading their Data Warehouse infrastructure to Spark and Iceberg.
In our data ingestion framework, we found that we could take advantage of Iceberg’s flexibility to define multiple partition specs to consolidate ingested data over time. Ingested tables write new data with an hourly granularity (ds/hr), and a daily automated process compresses the files on a daily partition (ds), without losing the hourly granularity, which later can be applied to queries as a residual filter.
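The hourly-to-daily consolidation can be pictured with a small plain-Python model. This is a sketch of the concept only and does not use the Iceberg or Spark APIs; the function names, dates, and rows are made up for illustration. Files are first grouped by (ds, hr), then merged per ds, and a query prunes by ds and applies hr as a per-row residual filter.

```python
from collections import defaultdict

# Hypothetical ingested rows; each keeps both the day (ds) and the hour (hr).
rows = [
    {"ds": "2022-09-01", "hr": 0, "event": "a"},
    {"ds": "2022-09-01", "hr": 1, "event": "b"},
    {"ds": "2022-09-01", "hr": 1, "event": "c"},
    {"ds": "2022-09-02", "hr": 0, "event": "d"},
]


def write_hourly(rows):
    """Ingestion step: one small file per (ds, hr) partition."""
    files = defaultdict(list)
    for row in rows:
        files[(row["ds"], row["hr"])].append(row)
    return dict(files)


def compact_daily(hourly_files):
    """Daily job: merge the hourly files into one file per ds.

    The hr column stays in every row, so no granularity is lost.
    """
    daily = defaultdict(list)
    for (ds, _hr), file_rows in sorted(hourly_files.items()):
        daily[ds].extend(file_rows)
    return dict(daily)


def query(daily_files, ds, hr):
    # Partition pruning: only the file for the requested ds is read.
    candidates = daily_files.get(ds, [])
    # Residual filter: hr is no longer a partition, so it is checked per row.
    return [row["event"] for row in candidates if row["hr"] == hr]


hourly = write_hourly(rows)
daily = compact_daily(hourly)
result = query(daily, ds="2022-09-01", hr=1)
```

In this model three hourly files become two daily files, and the hour-level query still returns only the rows for the requested hour.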
The title is a bit provocative, but the content offers concrete keep-it-simple alternatives to complex MLOps solutions.
It also gives a rule of thumb for when complexity is actually necessary, so it's not all hype.
Lyft’s journey of evolving its streaming platform and pipeline to scale better and support new use cases. Each iteration scaled better, but also exposed shortcomings.
Timestone is a high-throughput, low-latency priority queueing system which Netflix built in-house, to support the needs of their media encoding platform, Cosmos. Over the past 2.5 years, its usage has increased, and Timestone is now also the priority queueing engine backing the general-purpose workflow orchestration engine (Conductor) and the scheduler for large-scale data pipelines (BDP Scheduler). All in all, millions of critical workflows within Netflix now flow through Timestone on a daily basis.
The article dives into its architecture and core concepts.
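For readers new to priority queueing, the basic data structure can be sketched with Python's standard `heapq` module. This is a generic illustration and says nothing about how Timestone is actually built; the class name, the "lower number is more urgent" convention, and the message names are assumptions of the sketch.

```python
import heapq
import itertools


class ToyPriorityQueue:
    """Minimal single-process priority queue (hypothetical, not Timestone)."""

    def __init__(self):
        self._heap = []
        # Sequence number breaks ties, keeping FIFO order within one priority.
        self._counter = itertools.count()

    def enqueue(self, priority, message):
        # Assumption of this sketch: a lower number means more urgent.
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def dequeue(self):
        if not self._heap:
            return None
        _priority, _seq, message = heapq.heappop(self._heap)
        return message


queue = ToyPriorityQueue()
queue.enqueue(5, "encode-trailer")
queue.enqueue(1, "encode-live-event")
queue.enqueue(5, "encode-archive")
order = [queue.dequeue() for _ in range(3)]
```

The urgent message jumps ahead, while the two messages with equal priority come out in the order they were enqueued.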
Super Tables are LinkedIn's answer to Big Data issues that have been causing problems for the last decade:
1) multiple similar datasets often led to inconsistent results and wasted resources
2) a lack of standards in data quality and reliability made it hard to find a trustworthy dataset among the long list of potential matches
3) complex and unnecessary dependencies among datasets led to poor and difficult maintainability
Super Tables (ST) are pre-computed, denormalized, and consistently consolidated attributes and insights of entities or events that are optimized for common and efficient analytic use cases. STs have well-defined service level agreements (SLAs) and simplify data discovery and downstream data processing.
The Super Tables idea seems to suit the Data Mesh.
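As a rough illustration of "pre-computed and denormalized", the sketch below folds two small source datasets into one table with a single row per entity. The datasets, field names, and the `build_super_table` function are hypothetical and are not taken from LinkedIn's implementation.

```python
# Hypothetical source datasets, each of which might be owned by a different team.
profiles = {
    1: {"name": "Ada", "country": "PL"},
    2: {"name": "Lin", "country": "US"},
}
page_views = [
    {"member_id": 1, "views": 3},
    {"member_id": 1, "views": 2},
    {"member_id": 2, "views": 7},
]


def build_super_table(profiles, page_views):
    """Precompute one denormalized row per member."""
    totals = {}
    for event in page_views:
        member_id = event["member_id"]
        totals[member_id] = totals.get(member_id, 0) + event["views"]
    return [
        {"member_id": member_id, **attrs, "total_views": totals.get(member_id, 0)}
        for member_id, attrs in sorted(profiles.items())
    ]


super_table = build_super_table(profiles, page_views)
```

Downstream consumers read the consolidated rows directly instead of each re-implementing the join and aggregation.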
Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics (“OLAP” queries) on large data sets. It is one of the most popular open source solutions for OLAP. It is designed for serving both real-time (streaming sources like Kafka, Kinesis) and historical data (batch sources like HDFS, S3).
A use case example:
Used for IoT and device metrics: Druid is often used as a time series solution. Data generated by devices can be ingested in real time and used for ad-hoc analytics. Druid lets you search and filter on tag values faster than traditional time series databases.
A tutorial on setting up a multi-node kind cluster with extraPortMappings to forward requests from your host to an NGINX ingress controller. The controller uses the request path to route each request to the appropriate service, rewriting the target so the service can recognise the request.
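The routing-and-rewrite step can be modelled in a few lines of Python. This is only a conceptual sketch of path-based routing with a rewritten target, not NGINX ingress configuration; the path prefixes, service names, and the `route` function are hypothetical.

```python
# Hypothetical path prefixes and service names, for illustration only.
ROUTES = {
    "/foo": "foo-service",
    "/bar": "bar-service",
}


def route(path):
    """Return (service, rewritten_path), or None when no rule matches."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Rewrite the target: strip the routing prefix so the backend
            # sees the path it actually serves.
            rewritten = path[len(prefix):] or "/"
            return service, rewritten
    return None


hello = route("/foo/hello")
root = route("/bar")
missing = route("/baz")
```

Here a request for `/foo/hello` reaches `foo-service` as `/hello`, which is the effect the rewrite is meant to achieve.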
A short insight into why McDonald's open sourced OpenTest.
The open sourcing for us led to another significant benefit by reducing the unnecessary friction involved in getting the software onto people’s machines. No more approvals required and no more dependencies on other teams for the actual binaries and updates.
Yashar Behzadi is the CEO & Founder of Synthesis AI, a startup that uses synthetic data technologies to enable teams to build AI applications, as well as gaming and metaverse applications.
The third edition of the Big Data, AI, ML and Data Science conference organized by Computerworld Magazine.
Michał Bryś - Senior ML Engineer and Technical Product Owner will cover:
Product updates and a lot about accelerating app development.