DATA Pill feed

DATA Pill #007 - learn DATA Mesh and Apache Kafka, take part in the Kaggle competition and be like Bond in the DATA world

Pill. DATA Pill 007. You know, like agent Bond. James Bond.

OK, let's get back on track.

It was a juicy week with lots of valuable content to share with you.
Selecting the ‘crème de la crème’ was a struggle.

So please grab a coffee (or a vodka martini, "shaken, not stirred" ;)) and enjoy DATA Pill 007 (like James enjoys fast cars and beautiful women…)


The State of Data Engineering 2022 | 10 min read | Data Engineering | Einat Orr | lakeFS

A categorized map of data engineering tools and technologies, with commentary and explanations for each category.

What drives your customer’s decisions? Find answers with Machine Learning Models! H&M’s Kaggle competition | 11 min read | ML | 👏 Adrian Dembek | GetInData 

GetInData recently took part in the Kaggle H&M Personalized Fashion Recommendations competition where they were challenged to build a recommendation engine that would predict which articles a customer would buy in a particular week.
In this blog post, Adrian presents:
  • the human decision-making process as if it were an algorithm that processes multidimensional input information and generates outputs in the form of decisions.
  • how to represent the decision-making process with numbers
  • how to match machine learning algorithms to real-life concepts.
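To make the task concrete (predict which articles each customer will buy next week), here is a minimal popularity baseline. This is not GetInData's approach: it simply recommends the articles bought most often in the latest week and scores a ranked list with average precision at k. The transaction tuple layout `(customer_id, article_id, week)`, the function names and the cutoff `k=12` are illustrative assumptions.

```python
from collections import Counter


def popular_last_week(transactions, k=12):
    """Return the k articles bought most often in the most recent week.

    `transactions` is assumed to be an iterable of (customer_id, article_id, week) tuples.
    """
    last_week = max(week for _, _, week in transactions)
    counts = Counter(article for _, article, week in transactions if week == last_week)
    return [article for article, _ in counts.most_common(k)]


def recommend(transactions, customers, k=12):
    """Give every customer the same popularity-based list: a naive baseline."""
    top = popular_last_week(transactions, k)
    return {customer: list(top) for customer in customers}


def average_precision_at_k(predicted, actual, k=12):
    """AP@k for one customer: mean of the precision values at each rank that is a hit."""
    if not actual:
        return 0.0
    hits, score = 0, 0.0
    for rank, article in enumerate(predicted[:k], start=1):
        # Count each article only once, even if it appears twice in the prediction.
        if article in actual and article not in predicted[:rank - 1]:
            hits += 1
            score += hits / rank
    return score / min(len(actual), k)
```

A real model would replace `recommend` with per-customer scores built from the kind of numeric features the post describes; the evaluation function stays the same.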

The CDP as we know it is dead: Introducing the Unbundled CDP | 7 min read | Data Warehouse | Tejas Manohar | Hightouch

"We’re predicting that most companies’ CDPs will be rebuilt on top of the data warehouse and look like this:"

Our journey towards an open data platform | 8 min read | Data Platform Engineering | Doran Parat

The journey of shaping Yotpo’s data platform architecture:
"Navigating the flooded data technologies market can be confusing at times. We find ourselves mixing managed, open-source and self-development solutions to build a balanced stack. So many decisions to make along the way — all made under one clear principle — keeping our data platform as open as possible."

dbt, BigQuery and Looker: Your Modern BI Tech Stack | 9 min read | dbt, BigQuery, Looker | Datatonic

dbt vs. Looker: it's not always clear where a given transformation belongs or how to model the final reporting layer.
This article offers some rules of thumb.

Presto® on Apache Kafka® At Uber Scale | 9 min read | Big Data | Uber Engineering

Presto® and Apache Kafka® play critical roles in Uber’s big data stack. In this article, you will discover how Uber connected these two services to enable lightweight, interactive SQL queries directly over Kafka via Presto at Uber scale.

Last but not least, something for DATA Dads & Moms ;):

Functional programming for kids? | 4 min read | Adam Warski | SoftwareMill 

Check out Shelly. The basic commands, such as going forward (fw) or turning right (right), remain the same as in the original Logo. However, the Shelly language contains some elements of functional programming:
  • everything in Shelly is an expression, and hence has a value,
  • functions are first-class values,
  • everything is immutable.
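Shelly is its own language, so the Python sketch below only mimics the three ideas listed above: state is immutable, commands are first-class values (functions) that can be combined, and applying a command is an expression that yields a new state. The names `Turtle`, `fw`, `right` and `repeat` are illustrative and do not reproduce Shelly's syntax.

```python
import math
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Turtle:
    """Immutable turtle state: commands return a new Turtle instead of mutating it."""
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0  # degrees; 0 points along the positive x axis


def fw(distance):
    """Return a command (a function Turtle -> Turtle) that moves forward."""
    def command(t):
        rad = math.radians(t.heading)
        return replace(t, x=t.x + distance * math.cos(rad), y=t.y + distance * math.sin(rad))
    return command


def right(angle):
    """Return a command that turns clockwise by `angle` degrees."""
    def command(t):
        return replace(t, heading=(t.heading - angle) % 360)
    return command


def repeat(times, *commands):
    """Commands are values, so they can be composed into a new command."""
    def command(t):
        for _ in range(times):
            t = reduce(lambda state, cmd: cmd(state), commands, t)
        return t
    return command


square = repeat(4, fw(10), right(90))
start = Turtle()
end = square(start)  # `start` is untouched; `end` is a new value
```

Because nothing is mutated, `start` can be reused freely, and `square` is just another command that can itself be passed to `repeat`.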


Terraform Cloud Adds Drift Detection for Infrastructure Management | 5 min read | HashiCorp Blog

Drift Detection continuously checks your infrastructure against its recorded state, detecting changes and notifying you when they occur.
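The idea behind drift detection can be illustrated with a toy comparison of recorded versus observed state. This is a minimal sketch under stated assumptions, not how Terraform Cloud implements the feature: both states are hypothetical dicts mapping a resource address to its attributes, and `detect_drift` and `run_checks` are illustrative names.

```python
import time


def detect_drift(recorded, actual):
    """Compare recorded state with observed infrastructure.

    Both arguments map a resource address to a dict of attributes (a hypothetical shape).
    Returns human-readable drift descriptions; an empty list means no drift.
    """
    drift = []
    for address in sorted(recorded.keys() | actual.keys()):
        if address not in actual:
            drift.append(f"{address}: deleted outside of the workflow")
        elif address not in recorded:
            drift.append(f"{address}: created outside of the workflow")
        else:
            for attr in sorted(recorded[address].keys() | actual[address].keys()):
                before = recorded[address].get(attr)
                after = actual[address].get(attr)
                if before != after:
                    drift.append(f"{address}.{attr}: {before!r} -> {after!r}")
    return drift


def run_checks(recorded, fetch_actual, notify, rounds, interval_s=0.0):
    """Re-check on a fixed schedule; `notify` is called only when drift is found."""
    for _ in range(rounds):
        changes = detect_drift(recorded, fetch_actual())
        if changes:
            notify(changes)
        time.sleep(interval_s)
```

In a real setup, `fetch_actual` would query the cloud provider's APIs and `notify` would send an alert; here both are plain callables so the loop stays testable.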

Databricks Terraform Provider Is Now Generally Available | 5 min read | Databricks Blog 

There are multiple areas where the Databricks provider can be used, such as:
  • Automating aspects of provisioning the Lakehouse components and implementing DataOps/DevOps/MLOps
  • Implementing an automated disaster recovery strategy

Introducing the dbt Certification Program | 2 min read | dbt

Introducing the new dbt Certification Program and the first dbt Analytics Engineering Certification exam.


Streaming Pseudonymization by tokenization | 10 min read | Streaming, Architecture | Robert Sahlin

Robert shared on LinkedIn the presentation he gave at the Heroes of Data meetup.



Data Analytics Democratization: How ING Data Analytics Platform Bootstraps New Data Driven Products | 49 min | Krzysztof Adamski from ING | The Linux Foundation

Three years ago, ING (banking industry) took on the challenge of bringing together a curated portfolio of internal data sources and a large-scale compute platform.
The core idea:
  • give internal projects access to a rich toolset of open-source and industry-standard frameworks
  • provide preprocessed data to validate business ideas in a secure exploration environment.
In this presentation, you will discover the results, the key elements of the strategy and what is still ahead for the ING Data Analytics Platform.



Building for Crypto with Lewis Tuff | 51 min | Blockchain | SE Daily

VP of Engineering Lewis Tuff takes us behind the scenes for a look at the architecture, programming languages and database choices required to build an open, accessible and fair financial future, one piece of software at a time.

Data Journey with Max Schultze (Zalando) - Data Mesh | 1 h | 👏 Radio DaTa by GetInData 

Adam Kawa talks to Max about how Zalando uses data and analytics and how its data platform has evolved over the last few years.
Data Mesh: what it is, what it is NOT, how it helps Zalando become an even more data-driven company, and whether, when and how to introduce it in your organization.


DATA + AI SUMMIT | 27-30 June | San Francisco (hybrid)
