Microservices and serverless components are tools that work well at high scale, but the decision whether to use them over a monolith has to be made on a case-by-case basis.
In the case of Amazon Prime Video, moving the service to a monolith reduced infrastructure costs by over 90% and also increased its scaling capabilities.
Instacart used Flink to meet a range of needs, such as:
- Real-time decision making, such as fraud/spam detection
- Real-time data augmentation, like Catalog data pipelines
- Machine Learning real-time feature generation
- OLAP event ingestion for their experimentation platform
They accomplished all of this by running Flink on AWS’ EMR, so why have they decided to build a new platform on top of Kubernetes? And what are the lessons learned?
One of the lessons learned: Flink service onboarding and operations should be streamlined without exposing Kubernetes details. Most of the platform's users have no Kubernetes knowledge, so the platform should abstract those details away as much as possible.
In recent years, Feature Stores have become an integral part of many ML projects, and their popularity is continuing to grow. This article will look at the most popular solutions available this year.
ChatGPT has become increasingly popular, yet Xebia still spotted some areas for improvement, such as privacy, flexibility and collaboration.
To address these issues, they developed an internal tool called SlackGPT.
SlackGPT not only tackles these limitations but also gives their colleagues a unique experience when working with and building modern LLM applications.
Datadog can now monitor streaming data pipelines with Kafka as a bus.
In less than one year, they managed to migrate siloed data pipelines from tools like Informatica, Spark, Talend and Oracle into dbt, powering close to 50 dashboards today.
The problem of label noise is unavoidable in machine learning practice. Fortunately, numerous methods exist that diminish the impact of label noise on prediction performance by increasing the robustness of machine learning models. In experiments carried out by Allegro, they implemented 7 of those methods and showed that they increase prediction accuracy in the presence of 20% synthetic noise when compared to the baseline (Cross-Entropy loss), most of them by a significant margin. The simple Clipped Cross-Entropy proved to be the best, with an accuracy score of 89.51% (an increase of 4.2 p.p. vs the baseline trained with noisy labels). This result is very close to the baseline trained with clean labels (90.26%). Thus, they showed that in the case of 20% synthetic label noise, it is possible to increase robustness so that the impact of label noise is negligible.
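For illustration, here is a minimal sketch of the clipping idea in PyTorch: capping the per-sample loss so that hard-to-fit (potentially mislabelled) examples cannot dominate the gradient. The threshold value and exact formulation used by Allegro may differ; this is an assumption for illustration only.

```python
# Sketch of a clipped cross-entropy loss: the per-sample loss is capped at a
# fixed threshold so noisy labels contribute a bounded gradient signal.
# The threshold (max_loss) is an illustrative assumption, not Allegro's value.
import torch
import torch.nn.functional as F


def clipped_cross_entropy(logits: torch.Tensor,
                          targets: torch.Tensor,
                          max_loss: float = 2.0) -> torch.Tensor:
    """Cross-entropy with the per-sample loss clipped at `max_loss`."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    return torch.clamp(per_sample, max=max_loss).mean()


# Usage: drop-in replacement for the standard criterion in a training loop.
logits = torch.randn(8, 10, requires_grad=True)   # batch of 8, 10 classes
targets = torch.randint(0, 10, (8,))               # (possibly noisy) labels
loss = clipped_cross_entropy(logits, targets)
loss.backward()
```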
Part 1: the challenges they faced in model monitoring and their strategy for addressing some of those problems, including a brief mention of using z-scores to identify anomalies (a minimal sketch follows below).
Part 2: a deeper dive into anomaly detection and building a culture of observability.
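For context, a minimal sketch of the z-score check mentioned in Part 1: flag a metric value as anomalous when it lies more than a chosen number of standard deviations from the mean. The window size and threshold here are illustrative assumptions, not values from the article.

```python
# Simple z-score anomaly flagging over a window of metric values.
import numpy as np


def zscore_anomalies(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking points whose |z-score| exceeds threshold."""
    mean = values.mean()
    std = values.std()
    if std == 0:
        return np.zeros(len(values), dtype=bool)
    z = (values - mean) / std
    return np.abs(z) > threshold


# Usage: a mostly stable metric with one obvious spike.
# A looser threshold is used here because the example window is tiny.
metric = np.array([10.1, 9.8, 10.3, 10.0, 25.0, 9.9, 10.2])
print(zscore_anomalies(metric, threshold=2.0))  # only the 25.0 spike is flagged
```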
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability of AI hardware and extensibility of AI models.
Microsoft announces the latest update to the Power BI library for Jupyter notebooks, which empowers users to create powerful reports based on their data directly in their notebooks, without leaving their workflow. With this new update, users can gain insights instantly without the hassle of switching between tools or dealing with cumbersome data exports.
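A rough sketch of what this workflow might look like, assuming the `powerbiclient` package and its `QuickVisualize` helper; the exact class and function names are recalled from the announcement and may differ from the actual release.

```python
# Hedged sketch: auto-generate a Power BI report from a pandas DataFrame
# inside a Jupyter notebook. Assumes the `powerbiclient` package; names here
# may differ slightly from the released API.
import pandas as pd
from powerbiclient import QuickVisualize, get_dataset_config
from powerbiclient.authentication import DeviceCodeLoginAuthentication

# Any tabular data already living in the notebook.
df = pd.DataFrame({
    "region": ["EU", "US", "APAC"],
    "revenue": [120, 340, 210],
})

auth = DeviceCodeLoginAuthentication()                 # interactive device-code login
report = QuickVisualize(get_dataset_config(df), auth=auth)
report  # as the last expression of a cell, this renders the report inline
```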
Mark Zuckerberg, CEO of Meta, has stated that generative AI will eventually be integrated into all of the company's products due to its potential impact on billions of users.
The name says it all. Need something to be done by AI? Check out the list of tools, which is getting longer and longer every minute.
- An overview of the solution: Enterprise Analytics Platform (EAP)
- Evolution of MLOps at Swedbank - How it all started and how the solution has evolved over time.
- Iterative development for ML models - How can one improve the iterative development process for ML models?
- The secret of success - What has led to this successful migration?
- Key takeaway points and lessons learned from the ML cloud transformation journey, and how one can start or improve in this area.
Let’s hear about Open Assistant - an ambitious project aiming to create a truly open-source AI language model. Yannic reveals the behind-the-scenes process of developing this revolutionary technology, addressing the critical role of community involvement and the importance of a diverse dataset.
Learn from Databricks, Fivetran and dbt Labs experts about how to:
- Automate data movement and transform raw data into analytics-ready tables using your favorite tools like Fivetran and dbt
- Unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI
- Reduce costs and get started in seconds with on-demand, elastic SQL serverless compute
- Use automated and real-time lineage to monitor end-to-end data flow
- What is a data strategy, and why do you need one?
- How to build a proper data strategy?
- How to use the latest tools to 10x the productivity of your employees?
The Summit is aimed at people who use the cloud in their daily work to solve Data Engineering, Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges. If you have something to share with the community in this area - submit your presentation!