Ad-hoc solutions where different technologies are piled on top of each other are not only inefficient but also difficult to scale and operate. Picking the right framework and creating the right building blocks is crucial to ensuring success. Apache Kafka, Apache Flink, Confluent REST Proxy and Schema Registry have proven to be both scalable and reliable. Researching and leveraging the sweet spots of these frameworks dramatically reduced the time needed to develop and operate this large-scale event processing system (see the sketch below).
Data Mesh is now a general-purpose data movement and processing platform used to move data between Netflix systems at scale, with a growing number of use cases. This article provides an overview of the system.
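As a loose illustration of one of those building blocks (and not code from the article itself), here is a minimal Python sketch of a consumer that pairs Kafka with Confluent Schema Registry to decode Avro-encoded events. All names here (broker address, registry URL, topic, group id) are hypothetical:

```python
from confluent_kafka import DeserializingConsumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer

# Hypothetical endpoints; substitute your own cluster and registry URLs.
schema_registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})

# The Avro deserializer fetches the writer schema from the registry,
# so consumers stay in sync with producers as schemas evolve.
consumer = DeserializingConsumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "event-readers",
    "auto.offset.reset": "earliest",
    "value.deserializer": AvroDeserializer(schema_registry),
})
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(1.0)  # wait up to 1s for a record
        if msg is None or msg.error():
            continue
        print(msg.value())  # a dict decoded from the Avro payload
finally:
    consumer.close()
```

Centralizing schemas in the registry is one of the "sweet spots" the article alludes to: it keeps producers and consumers loosely coupled without sacrificing data contracts.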
PayPal is in the process of migrating its analytical workloads to the Google Cloud Platform (GCP). As part of this migration, the team designed a streaming application that consumes data from Kafka and streams it directly to BigQuery, reducing the time to readouts from 12 hours to a few seconds while handling approximately 30–35 billion events per day. This article describes an approach to testing these applications and how the team increased the application's performance by tuning a few parameters.
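The blurb above doesn't pin down the exact framework, but one common pattern for a Kafka-to-BigQuery streaming pipeline is sketched below with Apache Beam in Python. The topic, project, table, and schema names are hypothetical, and this is an illustration of the general shape rather than PayPal's implementation:

```python
import json

import apache_beam as beam
from apache_beam.io import WriteToBigQuery
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            # Read raw (key, value) byte pairs from a Kafka topic.
            | "ReadFromKafka" >> ReadFromKafka(
                consumer_config={"bootstrap.servers": "broker:9092"},
                topics=["payment-events"],
            )
            # Decode the message value; assumes JSON-encoded events.
            | "ParseJson" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
            # Append each event to a BigQuery table as it arrives.
            | "WriteToBigQuery" >> WriteToBigQuery(
                "my-project:analytics.payment_events",
                schema="event_id:STRING,amount:FLOAT,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                method=WriteToBigQuery.Method.STREAMING_INSERTS,
            )
        )


if __name__ == "__main__":
    run()
```

For pipelines like this, throughput typically hinges on knobs such as consumer parallelism and write batching, which matches the tuning theme of the article.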
There are some interesting facts to discover in this article, e.g.:
It's also interesting that other companies acquired by Microsoft (not only LinkedIn) still use their previous cloud provider; for example, GitHub, acquired by Microsoft, is still on AWS.
The moral of the story: Usually, cloud migration for the sake of migration doesn’t justify the costs.
Nowadays, AI/ML is visible everywhere, including advertising, healthcare, education and many other sectors.
Adam, as CEO, shares his conclusion, based on the data/ML-related projects that GetInData is running and internal market research, that Retail and eCommerce have become one of the hottest sectors for AI/ML.
Can we expect this Big Data trend to grow bigger?
Photon is now generally available on Databricks across all major cloud platforms.
Today, Delta Lake is the most comprehensive Lakehouse format, used by over 7,000 organizations and processing exabytes of data per day.
Delta Lake's story from day 1 and its genesis at Apple, up to Delta Lake 2.0 and bringing the Delta Lake APIs to open source.
How bias can produce harmful outcomes in machine learning systems, the different types of technical and non-technical solutions for tackling bias, the future of machine learning interpretability and much more.
One of the takeaways: The best way to assess risk is to view machine learning models as systems with different factors that interact with each other. This prioritizes experimentation, not just inference or prediction, to determine how different aspects of the model impact each other and the outcome.
Ben Taylor, Chief AI Strategist at DataRobot, shares his predictions about what we can expect from industry platforms 10 years from now.
Why AI has become such a high priority and how business leaders can think about developing and adopting AI solutions.
It's maybe not exactly a conference or meetup, but a free, 6-week training course about GCP with 5 role-based tracks (like Infrastructure or Data Management) at both business and technical levels. Seems worth considering.