
DATA Pill #063 - are Kubernetes days numbered? Hugging Face & data streaming in retail

ARTICLES

12 things I wish I knew before starting to work with Hugging Face LLM | 8 min | LLM | Fabio Matricardi | Artificial Corner
If you want to make faster progress with Hugging Face LLMs, this one is a must-read. It explores 12 things every beginner should know, grouped into four main topics: the training course, transformers and pipelines, which model to pick, and LangChain and Text2Text generation.
Are Kubernetes days numbered? | 5 min | Cloud | Alistair Grew | CTS Google Cloud Tech Blog
Dig deep into the 'guts' of Kubernetes to learn more about how it works. Read about managed Kubernetes and Google’s container offerings, compare GKE Standard with Autopilot and Cloud Run, and decide which one you should choose.
The Future of Observability | 6 min | ML | Laduram Vishnoi | Personal Blog
How should the new generation of observability tools respond in 2023? Here are seven things we will need to win the market. You're welcome!
Airflow vs. Prefect vs. Kestra — What is The Best Data Orchestration Platform in 2023? | 13 min | Data Engineering | Dario Radečić | Python in Plain English
Take a look at the realm of data orchestration platforms and compare three contenders: Airflow, Prefect and Kestra. The article analyzes their respective strengths and features as of 2023 to uncover which platform stands out as the best choice for managing and orchestrating data workflows.
The State of Data Streaming for Retail in 2023 | 9 min | Data Streaming | Kai Waehner | Personal Blog
In 2023, the retail industry is evolving with omnichannel experiences, hybrid shopping and hyper-personalized recommendations. Kai explores how data streaming enables real-time integration at scale, driving retail transformation with compelling examples from major players like Walmart and Albertsons.

TUTORIALS

Simplify Mission-Critical Workloads by Migrating to CockroachDB with AWS DMS | 7 min | Cloud | Oliver Tan, Ryan Kuo, and Pranav Deshmukh | AWS Blog
Explore what CockroachDB is, see how AWS DMS can help migrate data to it, and walk through an example migration. A good starting point if you are planning to move mission-critical workloads to CockroachDB using AWS DMS.
Terraform with YAML: Part 2 | 5 min | Data Engineering | Chris ter Beke | Xebia Tech Blog
Another part of the tutorial we shared with you in DATA Pill 47 has landed.

In part one of this series, we learned how to use YAML to simplify the configuration of Terraform resources. We mainly focused on reducing the syntax overhead of HCL and making the configuration accessible to non-infra engineers. In this second part, we will dive into some more advanced techniques and patterns.
Vodafone: A DevOps approach to AI/ML through cloud-native CI/CD pipelines | 7 min | AI/ML | Riccardo Carlesso, Ashish Vijayvargia | Google Cloud Blog
See how Vodafone achieved significant efficiency gains and improved data quality, flexibility and compliance processes by using its AI Booster. Read more about AI Booster at work, including churn prediction in two different markets.

DATA LIBRARY

How Is ChatGPT’s Behavior Changing over Time? | 26 min | AI | Lingjiao Chen, Matei Zaharia, James Zou | Arxiv Archive
This one really shakes up what we thought about GPT-4 and other LLMs. Some Stanford University people, including Matei Zaharia, checked whether the performance of GPT-4 and ChatGPT has been flip-flopping over time. They found some pretty big changes. For example, GPT-4 went from scoring a cool 97.6% to a measly 2.4% between March and June on tasks like identifying prime numbers. Other tasks saw less dramatic shifts, but there's no doubt that LLM performance is like a roller coaster ride. We're guessing that OpenAI is tweaking things on the regular, but the how and why are still a mystery. A heads up to everyone building stuff with GPT-4: unpredictable LLM behavior is a real headache.

DATA TUBE

MLOps London May - Driving ML Data Quality with Data Contracts | 44 min | ML | Andrew Jones | MLOps London
In this talk, Andrew introduces the concept of Data Contracts and explains how GoCardless uses them to improve the quality and reliability of data. The idea is to empower data consumers, including the company's data scientists, to work closely with data generators and get the data they really need to power highly effective ML models and other data-driven products.

PODCAST

Data Journey with Ola Sars (Soundtrack Your Brand) - Data and AI in the music streaming B2B industry | 46 min | AI | Host: Adam Kawa; Guest: Ola Sars | Radio DaTa Podcast
Meet Ola Sars, the visionary music tech guru hailing from Stockholm. As the brain behind Soundtrack Your Brand, he's revolutionized the B2B music streaming landscape with a cloud-based platform that harmonizes licensed tunes, powerful analytics and personalized playlists.

Topics included in this episode:

  • The importance of data at Soundtrack Your Brand
  • Data-driven use-cases implemented at Soundtrack Your Brand
  • Differences between B2B music streaming (e.g. Soundtrack Your Brand) and B2C music streaming (e.g. Apple, Spotify)
  • If and how data & AI help in achieving profitability at Soundtrack Your Brand
  • Current plans for investing in data & AI at Soundtrack Your Brand
  • Generative AI in the B2B music streaming industry

and more.

CONFS, EVENTS AND MEETUPS

DevOps Summit 2023 | Live Virtual Online | 22-23 August
Join DeveloperWeek CloudX, where one of the tracks is DevOps. This track covers the latest in automated testing, reporting and deployment, and explores and defines best practices for implementing security (DevSecOps), infrastructure as code and organizational change management to reap the full benefits of engineering automation.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill