DATA Pill feed

DATA Pill #089 - Looker, dbt, real-time streaming, Java and Kubernetes relationship

ARTICLES

A developer’s second brain: Reducing complexity through partnership with AI | 7 min | AI | Eirini Kalliamvakou | GitHub Engineering Blog
The article looks at how AI is changing the way software developers work. It draws on interviews that GitHub Next conducted with 25 developers to get a real-world perspective on AI's role in their daily tasks and how it shapes their jobs. This feedback helps in figuring out the future of AI in software development.
Data Modelling in Looker: PDT vs DBT | 15 min | Data Analytics | Anna Wnuczko | GetInData | Part of Xebia Blog
An accurate data model is essential for companies that want to become more data-driven. There are two approaches to data modelling in Looker: use Looker PDT, or model the data in dbt first. Which approach is better, and when? Read about both ways of modelling data, applied to the same use case.
The Scary Thing About Automating Deploys | 14 min | DevOps | Sean McIlroy | Slack Engineering Blog
The article explains Slack's deployment strategy, which relies on quick, frequent updates to iterate in response to users and reduce errors, and on managing those high-frequency updates efficiently despite large-scale inputs. It also covers the transition to automated processes with ReleaseBot, including anomaly detection, monitoring, and the benefits and challenges of automation.
Warm up the relationship between Java and Kubernetes | 14 min | Data Engineering | Tony Demol | BlaBlaCar Blog
This article details how BlaBlaCar faced cold JVM issues and implemented a warmup system that leverages Kubernetes-native features. It also explores other existing and emerging alternatives (because warmup is a “hot” topic!).
Real-time data processing using Change Data Capture and event-driven architecture | 8 min | Data Streaming | Ranjit Singh | Macquarie Engineering Blog
Macquarie Group's Banking and Financial Services division is adopting an event-streaming, microservices-based architecture to enable real-time event processing. Read how they use CDC to address integration challenges with existing systems and make strategic technology choices to ensure compatibility and system efficiency.

TUTORIALS

Unit testing with dbt | 7 min | Data Engineering | Matthieu Bonneviot | Teads Engineering
The article discusses Teads' shift from a Spark and Parquet-based BI system to a cutting-edge dbt and BigQuery framework. It highlights the author's journey in migrating a pipeline from the former system to the latter, emphasizing the critical role and methodology of unit testing within dbt.
Building real-time data views with Streamhouse | 7 min | Data Streaming | Alexey Novakov | Ververica Blog
This blog post explores building a real-time data view with Apache Paimon on Streamhouse, focusing on efficient data analytics pipelines and low-latency solutions for data engineers. It shows the use of Apache Flink for real-time processing and Apache Paimon for cost-effective storage, demonstrating their combined power in modern data management.
Towards AGI: Making LLMs better at Reasoning | 13 min | LLM | Manas Singh | MLOps Community Blog
The article discusses a proposed LLM structure that combines data processing, prompting, and Reinforcement Learning to develop a customer support bot. This bot is designed to handle customer questions involving mathematical, commonsense, and symbolic reasoning.
Design a data mesh on AWS that reflects the envisioned organization | 7 min | Data Mesh | Claudia Chitu, Spyridon Dosis, Srikant Das | AWS Blog
This tutorial discusses how Acast overcame the challenge of coupled dependencies between teams working with data at scale by employing the concept of a data mesh.

TOOLS

FOCUS™ | FinOps
The FinOps Open Cost and Usage Specification (FOCUS™) standardizes cloud cost data, making it easier for companies to understand and manage their cloud expenses. It converts complex cloud billing data into a straightforward, standardized format. This simplification aids consistent reporting across multiple cloud vendors and reduces the complexity of financial operations like allocation, chargeback, budgeting, and forecasting.
SQL Assistant: Text-to-SQL Application in Streamlit | 7 min | Data Science | Romy Mendez | Personal Blog
This article explores the application of Vanna.ai, a Python library designed for training a model that takes natural language questions and generates SQL queries in response. The implementation is integrated into a Streamlit application, creating a chatbot that lets users pose questions and explains the returned queries.

PODCAST

AI Roundtable | 51 min | AI | Kyle Polich, Pramit Choudhary, Frank Bell | Data Skeptic Podcast
Listen to a talk where Kyle, Pramit, and Frank discuss the impacts LLMs and machine learning have had on the industry in the past year and where things may go in the current year.

CONFS, EVENTS AND MEETUPS

Real-Time Data to Drive Business Growth and Innovation in 2024 | Data Streaming | Webinar | 31st January
During this webinar, you will explore practical examples and success stories that highlight the benefits realized by top companies through their implementation of data streaming strategies.
Big Data Technology Warsaw 2024 | On-site and Online event | 10-11th April
The Big Data Technology Warsaw Summit returns on April 10-11, 2024. This event is a prime gathering for data enthusiasts, experts, and innovators from across the globe. Take advantage of this opportunity to broaden your knowledge, connect with industry leaders, and shape your data strategy for success. Remember, the special promotional price is available for a limited time only!
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill