An in-depth read analyzing Uber's new testing platform project.
Uber had an experimentation platform called Morpheus, built 7+ years ago in the company's early days to handle both feature flagging and A/B testing. Uber has since significantly outgrown Morpheus in terms of scale, users, use cases, etc., so Morpheus became insufficient. In 2020, Uber took on the challenge of creating a new testing platform.
In this article, you will find out what the assumptions of the project were, how the implementation went, and what the conclusions were.
Lyft's results with automatic deployment: the number of commits per production deploy has decreased significantly, from around 3 to around 1.4. Fewer commits per deploy means changes in production are more predictable and easier to monitor.
An insight into analyzing and predicting “out of memory” or OOM kills on the Netflix App.
A journey through building a message classification model for Shopify's Inbox by applying the data-centric approach. The model's aim is to help merchants prioritize responses that would convert into sales and guide our product team on what functionality to build next.
Some highlights from the results:
Model accuracy:
version 1.0 ~70%
version 2.0 ~90%
High confidence coverage:
version 1.0 ~35%
version 2.0 ~80%
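To make the two metrics above concrete, here is a minimal sketch of how they could be computed. The summary does not define them, so this assumes that accuracy is the share of correct predictions and that high-confidence coverage is the share of predictions whose confidence score is at or above a threshold. The function names, the 0.9 threshold, and the sample labels are all hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of predictions that match the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted)


def high_confidence_coverage(confidences: Sequence[float], threshold: float = 0.9) -> float:
    """Share of predictions whose confidence is at or above the threshold.

    The 0.9 default is an assumption for illustration only.
    """
    if not confidences:
        return 0.0
    confident = sum(c >= threshold for c in confidences)
    return confident / len(confidences)


# Hypothetical sample: four messages with made-up labels and scores.
predicted = ["order", "shipping", "other", "order"]
actual = ["order", "shipping", "order", "order"]
confidences = [0.95, 0.92, 0.55, 0.70]

print(accuracy(predicted, actual))            # 3 of 4 correct
print(high_confidence_coverage(confidences))  # 2 of 4 at or above 0.9
```

Under this reading, a model can gain on both metrics at once: more predictions are right, and more of them are confident enough to act on.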
In this blog we will explore how to use Apache Flink to get insights from data at lightning-fast speed, using the Cloudera SQL Stream Builder GUI to easily create streaming jobs in SQL alone (no Java/Scala coding required). We will also use the information produced by the streaming analytics jobs to feed different downstream systems and dashboards.
BTW, here is a nice Flink-related position on a very interesting project.
Databricks SQL introduces Python UDFs.
Lineage extraction is now possible for Azure Databricks and Microsoft Purview users. Thanks to a robust OpenLineage Spark integration, users can both extract and visualize lineage from their Databricks notebooks and jobs inside Microsoft Purview.
BTW, a high five to our community members who contributed to this project:
The headline says it all.
What Datadog is, who uses it, what data it collects, and how data is used in their product.
Multi-cloud developer experience at Datadog (technology stack, cloud providers, open-source).
Future plans for the evolution of the data platform at Datadog.
Differences between Datadog and Spotify in the context of building the data platform, goals, and challenges.
Important patterns that one can notice when working with big data for 12 years.
Gaps and areas to watch for new tools/products in the data landscape.
Mark Chen is a Research Scientist at OpenAI and part of the team behind DALL·E 2, a new AI system that can create realistic images and art based on natural language descriptions. In the podcast:
Two data storage solutions that started in very different worlds are converging on the data platform.
Both want to be your one-stop shop.
Your data warehouse and data lake
Your data lakehouse...
But really they want to be your data operating system.
You will learn: