DATA Pill feed

DATA Pill #028 - How data-driven is your company, really? Also: what is the future of AI?

Today on the menu we have an exquisite meal of data.
We have dished it up for you in every possible form:
data-driven organizations, data pipelines, data warehouses and data lakehouses.
Choose whatever you fancy.


Is my company data-driven? Here’s how you can find out | 12 min | Data Driven | Piotr Menclewicz | GetInData Blog

As data is becoming the primary source of wealth in the world, being data-driven is not optional anymore. So… is your company data-driven? Find out:
  • How to understand the current data maturity of your company
  • How Diagnosis helps you set realistic goals for the near future

You can also diagnose your company by filling out the survey to get a tailored summary report with insights from experts. It’s basically a free data-drivenness assessment ;)

The Near Future of AI is Action-Driven | 8 min | AI | John McDonnell | Personal Blog

Large Language Models got good in 2022, which means that in the near future we will finally see more action-driven applications. Find out how ReAct combines three steps: Thought (reasoning about what is needed), Act (choosing an action) and Observation (seeing the outcome of the action).
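The Thought → Act → Observation cycle can be pictured as a simple loop. The sketch below is a minimal illustration under stated assumptions, not the code from the post: `scripted_model` is a hypothetical stand-in for an LLM call, and `lookup` is a hypothetical toy tool. The key idea is that each tool result is appended to the transcript so the next "thought" can use it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    thought: str   # Thought: reasoning about what is needed
    action: str    # Act: name of the tool to call, or "finish"
    argument: str  # input for the tool, or the final answer


def lookup(key: str) -> str:
    # Hypothetical tool: a tiny in-memory "knowledge base".
    facts = {"newsletter": "DATA Pill", "issue": "28"}
    return facts.get(key, "unknown")


TOOLS: dict[str, Callable[[str], str]] = {"lookup": lookup}


def scripted_model(history: list[str]) -> Step:
    # Hypothetical stand-in for an LLM: picks the next step from the transcript.
    if not any(line.startswith("Observation:") for line in history):
        return Step("I need the issue number.", "lookup", "issue")
    last_observation = history[-1].removeprefix("Observation: ")
    return Step("I have what I need.", "finish", f"Issue #{last_observation}")


def react(question: str, model: Callable[[list[str]], Step], max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)                            # Thought + choice of action
        history.append(f"Thought: {step.thought}")
        if step.action == "finish":
            return step.argument
        history.append(f"Act: {step.action}[{step.argument}]")
        observation = TOOLS[step.action](step.argument)  # run the chosen tool
        history.append(f"Observation: {observation}")    # feed the outcome back
    raise RuntimeError("no answer within max_steps")


print(react("Which issue is this?", scripted_model))
```

The `max_steps` cap is a design choice worth keeping in any real agent loop: without it, a model that never decides to finish would call tools forever.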

Principles for data driven organizations | 8 min | Data | Xavier Amatriain | Personal Blog

Data-drivenness is a somewhat controversial term with many definitions and interpretations. In fact, many argue that being data-driven can harm organizations. So, what does it mean to be a data-driven organization? What is involved in making data-driven decisions? What does hypothesis-driven product development look like? Here are a few pillars of a data-driven decision culture:

  • A hypothesis-driven approach to data, in which the hypothesis and the metric used to test it are specified beforehand. Questions are formulated ahead of time, and data is then collected specifically to answer the question at hand.
  • Metrics used to make decisions are explicitly communicated, known and agreed upon by all stakeholders.
  • Data that feeds decision-making processes is expected to be trustworthy, so mechanisms that guarantee data quality need to be in place.

The rest is in the article.
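The first two pillars (declare the hypothesis and its metric before looking at the data, then judge only by that metric) can be made concrete in a few lines. This is a minimal sketch with hypothetical names (`Hypothesis`, `evaluate`, `signup_conversion`, the 5% uplift threshold); it is not taken from the article. Freezing the dataclass means the agreed metric and threshold cannot be quietly changed after the results come in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the pre-registered terms cannot be edited later
class Hypothesis:
    statement: str
    metric: str        # metric agreed on by stakeholders before the test
    baseline: float    # current value of that metric
    min_uplift: float  # relative uplift that counts as "confirmed"


def evaluate(h: Hypothesis, observed: dict[str, float]) -> bool:
    # Judge the hypothesis only by its pre-registered metric; other metrics are ignored.
    if h.metric not in observed:
        raise KeyError(f"pre-registered metric {h.metric!r} was not measured")
    uplift = (observed[h.metric] - h.baseline) / h.baseline
    return uplift >= h.min_uplift


h = Hypothesis(
    statement="A shorter sign-up form increases conversion",
    metric="signup_conversion",
    baseline=0.10,
    min_uplift=0.05,
)
print(evaluate(h, {"signup_conversion": 0.11, "page_views": 5400}))
```

Here `page_views` is measured but plays no role in the decision, which is exactly the point of fixing the metric in advance.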

Will Rust Take over Data Engineering? | 10 min | Data | Simon Späti | Airbyte Blog

The goal of Rust is to be a good programming language for building highly concurrent, safe and performant systems. Read why Rust is a good fit for data engineers and whether it is going to replace Python. You'll also find a list of interesting open-source Rust projects.

For your eyes only: improving Netflix video quality with neural networks | 6 min | Neural Networks | Christos G. Bampis, Li-Heng Chen & Zhi Li | Netflix Blog

Recently, Netflix added a powerful tool to its video encoding: neural networks for video downscaling. In this tech blog post, they share how they improved Netflix video quality with neural networks.

3 Reasons Product Analytics Should Work Directly on Your Data Warehouse | 5 min | Analytics + Data Warehouse | Abhishek Rai | NetSpring Blog

Despite the benefits of product analytics, many product management teams still haven’t adopted analytics tools, primarily because legacy architectures can’t address many real-world needs. Read why product analytics should work directly on the modern data warehouse/data lakehouse, and how this solves many of today’s challenges and limitations of analytics:
1. Avoid Data Duplication and ETL Pipelines
2. Eliminate Tunnel Vision
3. Provide Greater Extensibility
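Reason 1 (no duplication, no extra ETL) boils down to running product metrics as queries against the tables that already live in the warehouse. The sketch below assumes an in-memory `sqlite3` database as a stand-in for the warehouse and a hypothetical `events` table; a real setup would use the warehouse's own connector and SQL dialect. It computes a signup-to-purchase conversion in place, counting only purchases made on or after the user's signup.

```python
import sqlite3

# sqlite3 stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "signup", "2022-11-01"), ("u1", "purchase", "2022-11-02"),
        ("u2", "signup", "2022-11-01"),
        ("u3", "signup", "2022-11-03"), ("u3", "purchase", "2022-11-03"),
    ],
)

# The funnel is computed where the data lives: nothing is exported or copied
# into a separate analytics store.
FUNNEL_SQL = """
SELECT
  (SELECT COUNT(DISTINCT user_id) FROM events WHERE event = 'signup') AS signups,
  (SELECT COUNT(DISTINCT p.user_id)
     FROM events AS p
     JOIN events AS s
       ON s.user_id = p.user_id AND s.event = 'signup' AND s.ts <= p.ts
    WHERE p.event = 'purchase') AS buyers
"""
signups, buyers = conn.execute(FUNNEL_SQL).fetchone()
print(signups, buyers, buyers / signups)
```

Because the query reads the same tables the rest of the company uses, the product team sees the same numbers as everyone else, which is also what reason 2 ("tunnel vision") is about.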

Metadata-driven architecture for Data Lakehouse | 6 min | Architecture | Kaspar Haavajõe, Andreas Aadli, Newel Rice | Personal Blog

How a metadata-driven architecture is used to build out and automate a Data Lakehouse. In this case, the company streams a high volume of real-time, business-critical events using the Kafka ecosystem. For security reasons, it uses a “collector application” that relies on a JSON Schema document. Currently, the data is not available quickly enough for decision-making: it is batch-processed using Amazon S3 and Apache Spark on an Amazon EMR cluster, so latency ranges from hours to days. The article describes the architecture designed to fix this.
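In a metadata-driven setup, the schema document is the single source of truth and downstream artifacts are generated from it rather than written by hand. As a minimal sketch of that idea (not the authors' implementation), the code below derives a table definition from a JSON Schema document. The `order_created` schema and the JSON-Schema-to-SQL type mapping are hypothetical assumptions; the article's actual schemas and target table format are not shown here.

```python
import json

# Hypothetical event schema; stands in for the collector's JSON Schema document.
SCHEMA_DOC = """
{
  "title": "order_created",
  "type": "object",
  "properties": {
    "order_id": {"type": "string"},
    "amount": {"type": "number"},
    "quantity": {"type": "integer"},
    "is_gift": {"type": "boolean"}
  },
  "required": ["order_id", "amount"]
}
"""

# Assumed mapping from JSON Schema types to SQL column types.
TYPE_MAP = {"string": "STRING", "number": "DOUBLE", "integer": "BIGINT", "boolean": "BOOLEAN"}


def ddl_from_schema(schema: dict) -> str:
    """Generate a CREATE TABLE statement from the schema's metadata."""
    required = set(schema.get("required", []))
    columns = []
    for name, spec in schema["properties"].items():
        column = f"  {name} {TYPE_MAP[spec['type']]}"
        if name in required:  # required fields become NOT NULL columns
            column += " NOT NULL"
        columns.append(column)
    return f"CREATE TABLE {schema['title']} (\n" + ",\n".join(columns) + "\n)"


print(ddl_from_schema(json.loads(SCHEMA_DOC)))
```

When a new event type or field is added to the schema, the table definition follows automatically, which is what makes this style of architecture easy to automate.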

Dynamic Tables: Delivering Declarative Streaming Data Pipelines with Snowflake | 8 min | Streaming | Saras Nowak, Jeremiah Hansen | Snowflake Blog

What are dynamic tables and how do they work?
Find out which approach to choose when:
  •  You need to incorporate UDFs/UDTFs, Stored Procedures, External Functions and Snowpark transformations written in Python, Java, or Scala.
  •  You want to improve the performance of external tables.
  •  Your transformation requires complex SQL, including Joins, Aggregates, Window Functions and more.

You can also read about the background of modern data architectures and why data pipelines have become hard to manage and even harder to scale.
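The declarative idea behind dynamic tables is that you state *what* a table should contain (a query) and *how fresh* it must be (a target lag), and the system decides when to refresh it. The toy model below illustrates only that contract; it is not Snowflake's API or engine. `DynamicTable`, `maybe_refresh` and `target_lag_s` are hypothetical names, `sqlite3` stands in for the database, and the refresh recomputes the whole query, whereas a real engine can be smarter about incremental updates.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DynamicTable:
    """Toy model: a table declared as a query plus a freshness target."""
    name: str
    query: str
    target_lag_s: float
    last_refresh: float = float("-inf")  # never refreshed yet

    def maybe_refresh(self, conn: sqlite3.Connection, now: float) -> bool:
        # Refresh only when the data is staler than the declared target lag.
        if now - self.last_refresh < self.target_lag_s:
            return False
        conn.execute(f"DROP TABLE IF EXISTS {self.name}")
        conn.execute(f"CREATE TABLE {self.name} AS {self.query}")  # full recompute
        self.last_refresh = now
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES ('acme', 10.0)")

revenue = DynamicTable(
    name="revenue_by_customer",
    query="SELECT customer, SUM(amount) AS revenue FROM orders GROUP BY customer",
    target_lag_s=60,
)
revenue.maybe_refresh(conn, now=0)    # first run: builds the table
conn.execute("INSERT INTO orders VALUES ('acme', 5.0)")
revenue.maybe_refresh(conn, now=30)   # still within target lag: no refresh
revenue.maybe_refresh(conn, now=61)   # lag exceeded: table is recomputed
print(conn.execute("SELECT revenue FROM revenue_by_customer").fetchone())
```

Compared with hand-written streams and scheduled tasks, the user never writes the refresh logic; they only declare the query and the acceptable staleness.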




MLOps for Ad Platforms | 49 min | MLOps | hosts: Demetrios Brinkmann, Abi Aryan; guest: Andrew Yates |

Andrew Yates is a CEO at … He led the ads ranking, auction and marketplace engineering and research teams at Facebook and Pinterest. Listen to the episode about:
  • searching for and discovering teams in bigger tech companies
  • strategy around technical debt
  • drawbacks of doing real-time streaming
and more.

Data Journey with Michał Wróbel (RenoFi) - Doing more with less with Modern Data Platform and ML at a home renovation FinTech | 55 min | MDP & ML | host: Adam Kawa; guest: Michał Wróbel | Radio DaTa

Michał shares his experience from a FinTech that uses a home's after-renovation value instead of its current value, enabling homeowners to borrow the most money at the lowest rates. Topics:
  • Data that is used at RenoFi
  • Business use cases that are developed using this data (e.g. lead scoring)
  • Modern Data Platform on top of Google Cloud Platform at RenoFi
  • Building more (stuff) with less (people)
  • Good decisions made by the CTO when launching the company
  • Advanced ML/AI models or real-time analytics: to build or not to build at a startup?

Build Data Products Without A Data Team Using AgileData | 1 h 12 min | Data Engineering | host: Tobias Macey; guest: Shane Gibson | Data Engineering Podcast

With the growing use of cloud platforms and self-serve data technologies, building data products is becoming less complicated. In this episode, Shane Gibson, co-founder of AgileData, explains the design of the platform and how it builds on agile development principles to help teams focus on delivering value.



Data-driven Fast Track: introduction to data-drivenness | 23 November | Free online webinar

You will take a look at our 3-step framework for data-driven transformation. You will learn how to:

  • assess how data-driven your company is today
  • generate ideas for new initiatives 
  • implement these initiatives to increase your chances of success

AI & Big Data Global | 1-2 December | London & online

This is a free event where dozens of speakers discuss the latest developments in the world of AI & Big Data. It is a showcase of next-generation technologies and strategies.


Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
