DATA Pill feed

DATA Pill #115 - CI/CD at Amazon vs. Google, Building Churn Models, LLM Principles

ARTICLES

How Amazon and Google view CI/CD in an entirely different way | 9 min | Engineering | Carlos Arguelles | Personal Blog
Dive into different approaches to CI/CD at Amazon and Google. The article highlights the unique CI/CD philosophies of these companies, drawing from the author's experience as the Technical Lead for Integration Testing Infrastructure at both.
How to predict Subscription Churn: key elements of building a churn model | 11 min | ML | Adrian Dembek | GetInData | Part of Xebia Blog
This article guides you through building a churn model from a business perspective, covering key challenges, the importance of business input in feature creation, and translating business insights into data for a machine learning model.
The LLM Triangle Principles to Architect Reliable AI Apps | 16 min | LLM | Almog Baku | Towards Data Science
This article distills critical principles for building practical LLM applications, focusing on a structured approach with a clear SOP, a suitable model, strategic engineering techniques, and relevant contextual data.
How Can Organizations Think Differently to Get the Most Out of AI? | 7 min | Data Science | Cassie Kozyrkov | Personal Blog
What is the future of data scientists? As AI evolves, the role of data scientists is more critical than ever. They focus on enterprise-scale automation and maintaining robust, reliable systems. Learn why precision thinking and the ability to translate business needs into AI solutions will keep data scientists in high demand.

NEWS

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context | 6 min | LLM | Philipp Schmid, Omar Sanseviero, Alvaro Bartolome, Leandro von Werra, Daniel Vila, Vaibhav Srivastav, Marc Sun, Pedro Cuenca | Hugging Face Blog
Llama 3.1 is out with eight open-weight models (three base and five fine-tuned) in three sizes: 8B, 70B, and 405B, all available on Hugging Face. Meta also released Llama Guard 3 and Prompt Guard, models designed to classify LLM inputs and detect prompt injections and jailbreaks.

TUTORIALS

Star Schema Data Modeling Best Practices on Databricks SQL | 11 min | Data Modeling | Shyam Rao | DBSQL SME Engineering
In this blog post, you'll learn how to use the Star Schema with Databricks SQL to improve your data warehouse. Learn about boosting performance and scalability with Delta Live Tables for ETL, managed Delta Lake tables, and Liquid Clustering. Discover how the Databricks AI assistant can automate data model creation for seamless AI integration.
Use Rust to Write Spark Apps | 4 min | Data Engineering | Steve Russo | Personal Blog
Developing and deploying Spark applications was difficult and limited until Spark 3.4. Spark Connect's new architecture allows Spark applications to be written in various languages, including Rust, simplifying development and deployment.
Automate AWS Lambda Layer Creation for Python with terraform and GitHub Action | 8 min | DevOps | Akhilesh Mishra | KPMG UK Engineering
Dive into serverless architecture, AWS Lambda, and the benefits of Lambda Layers for managing dependencies. The tutorial also covers building and managing Lambda Layers using Terraform and GitHub Actions for efficient, automated deployments.

PODCAST

Are Spreadsheets Still Relevant For Data Analysis? | 34 min | Data Analysis | Host: Adel Nehme; Guest: Jordan Goldmeier | Data Camp Podcast
Adel and Jordan explore Excel in data science, the impact of GenAI on Excel, Power Query and data transformation, advanced Excel features, Excel for prototyping and generating buy-in, the limitations of Excel and what other tools might emerge in its place, and much more.
How Generative AI Is Impacting Data Engineering Teams | 55 min | Gen AI | Host: Tobias Macey; Guest: Lior Gavish | Data Engineering Podcast
Generative AI is rapidly gaining adoption, requiring data platforms to add new features and data teams to take on more responsibilities. In this episode, Monte Carlo co-founder Lior Gavish discusses how data teams are evolving to support AI-powered features and incorporate AI into their work.

CONFS, EVENTS AND MEETUPS

Data Expo 2024 | Utrecht | 11-12th September
Data Expo offers essential insights, connections, and technology for data-driven leaders, whether you are advanced or just starting your data journey. Attend to explore data, analytics, and cloud solutions, gain strategic knowledge, compare tools and technologies, discover trends, and network with over 4,000 professionals.
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill