For several years, Uber used gradient-boosted decision tree ensembles to refine ETAs (estimated times of arrival). Eventually, Uber’s Apache Spark™ team reached a point where further increasing the dataset and model size with XGBoost became untenable. To continue scaling the model and improving accuracy, they decided to explore deep learning. To justify the switch, they needed to overcome three main challenges:
To meet these challenges, Uber AI partnered with Uber’s Maps team on a project called DeepETA, which is presented in this article.
How did Zalando enable versioning of its performance marketing pipeline, which is based on Apache Airflow?
Insights from applying Data Mesh principles in a prototype to share data at scale.
This post focuses on the large volume of high-quality data stored in Axion, Netflix’s fact store, which is leveraged to compute ML features offline. Netflix built Axion primarily to eliminate training-serving skew and to make offline experimentation faster. The post shows how its design has evolved over the years and the lessons learned from building it.
“In our analysis we decided to focus on the offline data ingestion with Airbyte. Why this one in particular? Mostly for its business model. Airbyte is an Open Source project backed by a company. It has a lot of available connectors, supports different data ingestion modes and integrates pretty well with Apache Airflow and dbt.”
Snowflake introduces Unistore, which will allow organizations to use a single, unified data set to develop and deploy applications, and to analyze both transactional and analytical data together in near real time.
Did you watch the Snowflake Summit? Unistore looks very promising!
Blake Lemoine, a Google engineer who was suspended after alleging that LaMDA (Language Model for Dialogue Applications) is sentient, has posted conversations between himself and Google’s system for building chatbots. You can read them here.
“MLOps is too tool-driven. Don't let FOMO drive you to pick the latest feature/model/evaluation store; instead, pay closer attention to what you actually need to release more safely and reliably.”
How can ML model training projects go from zero to production in much less time?
How can you achieve superior performance, high code quality, training repeatability and governance?
How can a platform mix best-of-breed cloud managed services with a small number of powerful open-source components (e.g. Kedro, MLflow, Seldon) to provide the extra functionality that data scientists and their ML models need?
All illustrated with case studies.
Talk with Alex Chircop, co-chair of the CNCF Storage Technical Advisory Group (TAG) and founder and CEO of Ondat (formerly StorageOS), on why no app is truly stateless, and how data is the new storage.
Interview with Glenn Hofmann, Chief Analytics Officer at New York Life Insurance.
How did he build New York Life Insurance’s 50-person data science and AI function?
How do they leverage different skill sets to offer distinct career paths for data scientists while building relationships across the organization?
Find out in the podcast.
Eventarc: asynchronous events in Google Cloud
“Eventarc is a product that helps to build event-driven architectures without having to implement, customize or maintain the underlying infrastructure. In this talk you will learn about Eventarc, how it can be used, and how the Warsaw engineering team builds UIs like this, going from the idea right through to the launch.”
Speakers: Maciej Szarliński and Sasha Sabov
The biggest Big Data, Data Science, Machine Learning and AI conference in northern Poland.