ARTICLES
Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System | 7 min | ML | Roger Menezes, Rahul Jha, Gary Yeh, and Sudarshan Lamkhede | Netflix Tech Blog
The Netflix team demonstrates how they combined multiple machine learning models used in Netflix's large-scale search and recommendation systems into a single unified model, simplifying the architecture, improving performance and enabling faster system development. They also share the trade-offs of this approach and lessons that apply more broadly.
ELT is dead, and EtLT will be the end of modern data processing architecture | 11 min | Data Science | DevGenius
Let’s discuss the evolution of data processing architectures, from ETL to ELT and finally to the current EtLT architecture. The article explores the reasons behind these shifts and the strengths and weaknesses of each approach. It argues that EtLT is emerging as the dominant data processing architecture and highlights open-source implementations such as Apache SeaTunnel that aim to meet modern data infrastructure demands.
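To make the acronym concrete, here is a minimal sketch of the EtLT idea. It assumes the small "t" covers light, row-level cleanup applied while data is in flight (typing, normalization, stripping sensitive fields), and the big "T" covers heavier business modeling done inside the warehouse after loading. The record fields, function names and the in-memory `warehouse` dict are hypothetical stand-ins; this is not Apache SeaTunnel's API or the article's implementation.

```python
from collections import defaultdict

# Hypothetical raw rows, standing in for records pulled from an operational database.
SOURCE_ROWS = [
    {"order_id": 1, "email": "Ana@Example.com ", "amount": "19.99", "country": "pt"},
    {"order_id": 2, "email": "bo@example.com", "amount": "5.00", "country": "PT"},
    {"order_id": 3, "email": "cy@example.com", "amount": "12.50", "country": "us"},
]


def extract():
    """E: read raw records from the source system (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def light_transform(rows):
    """t: small, row-level fixes applied in flight.

    No joins or aggregations happen here; those are left to the warehouse.
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            "order_id": row["order_id"],
            # Keep only the domain so the full e-mail address never reaches the warehouse.
            "email_domain": row["email"].strip().lower().split("@")[1],
            "amount": float(row["amount"]),      # string -> number
            "country": row["country"].upper(),   # normalize country codes
        })
    return cleaned


def load(rows, warehouse):
    """L: append the lightly cleaned rows to a staging table."""
    warehouse.setdefault("stg_orders", []).extend(rows)


def warehouse_transform(warehouse):
    """T: heavier, business-level modeling done inside the warehouse."""
    revenue = defaultdict(float)
    for row in warehouse["stg_orders"]:
        revenue[row["country"]] += row["amount"]
    warehouse["revenue_by_country"] = dict(revenue)


warehouse = {}
load(light_transform(extract()), warehouse)
warehouse_transform(warehouse)
print(warehouse["revenue_by_country"])
```

The split is the point of the design: the in-flight step stays cheap and stateless, while anything that needs the whole dataset (here, an aggregation across rows) runs after loading.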
How to build an e-commerce shopping assistant (chatbot) with LLMs | 15 min | LLM | Michał Madej | Part of Xebia Blog
In today's e-commerce landscape, exceptional customer service is not a choice but a must. Online shopping's growth has increased the need for personalized experiences and 24/7 support. This blog will show how to build an efficient e-commerce assistant with Large Language Models (LLMs), highlighting their development complexities and capabilities.
Apache Flink and Kafka Streams: A Comparative Analysis | 11 min | Data Engineering | Bitrock
This article explores the differences between Apache Flink and Kafka Streams in terms of their capabilities, use cases, scalability and fault tolerance. It also discusses the learning curve and resources required for each framework, concluding that the choice depends on the specific needs of the application: Apache Flink is the more general-purpose framework, while Kafka Streams is more narrowly focused on stream processing within the Kafka ecosystem.
Spread Your Wings: Falcon 180B is here | 9 min | LLM | Philipp Schmid, Omar Sanseviero, Pedro Cuenca, Leandro von Werra & Julien Launay | Hugging Face
How can open source and open science advance and democratize artificial intelligence? What makes Falcon 180B so good? This post looks at some evaluation results and shows how you can use the model.
What motivated Ericsson’s big push into the cloud | 5 min | Cloud | Karin Lindstrom | CIO
Ericsson moved 80% of its applications to the cloud under the leadership of CIO Mats Hultin and VP Johan Sporre Lennberg. The transformation enhanced agility and innovation, fostered cultural alignment between IT and the business, streamlined operations and made it easier to adopt new technologies such as AI quickly.
Introducing Entity-Centric Data Modeling for Analytics | 7 min | Data Engineering | Maxime Beauchemin | Preset.io Blog
Let’s introduce entity-centric data modeling (ECM), a novel approach that puts "entities" (such as users, products and campaigns) at the forefront of analytics, merging ideas from dimensional modeling and feature engineering to enrich how data is represented.
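One way to picture the entity-centric idea is a single row per entity that carries feature-style aggregates alongside its identity, so analysts query the entity directly instead of re-aggregating event tables each time. The sketch below assumes a hypothetical `users` entity built from a hypothetical orders table; the field names and the 30-day window are illustrative choices, not taken from the article.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

# Hypothetical event-level fact rows: (user_id, order_date, order_amount).
ORDERS = [
    ("u1", date(2023, 9, 1), 30.0),
    ("u1", date(2023, 9, 20), 10.0),
    ("u2", date(2023, 6, 15), 50.0),
]


@dataclass
class UserEntity:
    """One row per user, with feature-like summary attributes attached."""
    user_id: str
    lifetime_orders: int = 0
    lifetime_spend: float = 0.0
    spend_last_30d: float = 0.0          # time-windowed metric, as in feature engineering
    first_order: Optional[date] = None


def build_user_entities(orders, as_of: date) -> Dict[str, UserEntity]:
    """Fold event rows into one UserEntity per user as of a given date."""
    entities: Dict[str, UserEntity] = {}
    window_start = as_of - timedelta(days=30)
    for user_id, order_date, amount in orders:
        ent = entities.setdefault(user_id, UserEntity(user_id))
        ent.lifetime_orders += 1
        ent.lifetime_spend += amount
        # Only orders in the (as_of - 30 days, as_of] window count toward the rolling metric.
        if window_start < order_date <= as_of:
            ent.spend_last_30d += amount
        if ent.first_order is None or order_date < ent.first_order:
            ent.first_order = order_date
    return entities


users = build_user_entities(ORDERS, as_of=date(2023, 9, 30))
for ent in users.values():
    print(ent)
```

Because the aggregates live on the entity row, a question like "how much did active users spend last month?" becomes a filter over `users` rather than a fresh join and group-by over raw events.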
Pokémon GO architecture to support millions of requests | 2 min | Software Engineering | David Mosyan | Personal Blog
This one offers insights into how Pokémon GO's infrastructure scales on Google Cloud services such as Spanner and Kubernetes to handle large volumes of user requests. It describes the request flow, which involves components such as a CDN, NGINX, game services, Bigtable storage and Pub/Sub for analysis.
Integrating Azure OpenAI with Snowflake: Architecture and Implementation Patterns | 6 min | AI | Shankar Narayanan | Snowflake Blog
This text explores various secure integration methods, such as Azure Machine Learning with Prompt Flow, Power Apps, Snowflake External Function and Snowpark External Access, as well as Streamlit's role in enabling interaction with Azure OpenAI for Data Apps on the Snowflake Data Cloud.
TUTORIAL
How to Write Simple and Efficient Flink SQL | 10 min | SQL | Xiaolin He | Alibaba Cloud
This article is compiled from a talk that Xiaolin He, Alibaba's Senior Technical Expert and an Apache Flink PMC Member and Committer, gave at the 2022 Flink Forward Asia (FFA) Conference. It is divided into three parts:
- Flink SQL Insight
- Best Practices
- Future Works
NEWS
The State of Serverless | 7 min | Data Engineering | Datadog
For this report, Datadog analyzed usage data from more than 20,000 customers who use the platform to monitor serverless workloads across major cloud providers. The report presents key insights into how these customers use serverless technologies in practice.
TOOLS
DevOpsGPT | AI
Welcome to the AI Driven Software Development Automation Solution, abbreviated as DevOpsGPT. It combines LLMs (Large Language Models) with DevOps tools to convert natural language requirements into working software. This innovative feature greatly improves development efficiency, shortens development cycles and reduces communication costs, resulting in higher-quality software delivery.
Manage Apache Kafka® Connect connectors with kcctl | 5 min | AI | Francesco Tisiot | Aiven blog
This blog explores kcctl, a new open source command line tool for Kafka Connect. You'll find out how to use it with an Apache Kafka Connect cluster and manage connectors to other systems.
DATA TUBE
Opportunities in AI - 2023 | 37 min | AI | Andrew Ng | Stanford Online
Dr. Andrew Ng leads a discussion on AI's potential and impact, emphasizing the significance of supervised and generative AI tools, the rise of low-code and no-code AI development, untapped opportunities across industries and the importance of responsible AI for addressing challenges like pandemics and climate change, while dispelling exaggerated fears of AI causing human extinction.
CONFS EVENTS AND MEETUPS
Generative AI Summit | On-site | 9th November 2023
Cut through the clutter, harness generative AI's potential for your industry. Join innovative engineers and leaders, master generative systems, build better models, find cost-effective infrastructure, and gain a strong support network for faster production.
Agenda & topics covered
- Breaking through the noise: how your organisation can innovate
- Quantifying uncertainty in generated models to create more reliable products
- Powering your GANs & VAEs with state-of-the-art compute for rapid output
- A fully monetised generative AI landscape: how to drive revenue in a new ecosystem
…and more.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Dig into previous editions of DataPill