DATA Pill feed

DATA Pill #005 - For the brainiacs: ML, Data Science and Feature Stores cheat sheets


Machine Learning Cheat Sheet | 5 min read | Machine Learning | DataCamp Blog

A handy guide describing the most widely used machine learning models, their advantages, disadvantages, and some key use-cases.

Because we all love cheat sheets, I want to show you one more. It’s a bit older, but still…

Data Science Cheat Sheet for Business Leaders | 7 min read | Data Science | DataCamp Blog

The basics of how data science can help businesses, including building a data science team and the common steps in the data science workflow.

4 Feature Stores - explained and compared | 5 min read | ML & MLOps | 👏 Jakub Jurczak | GetInData Blog

A comparison of 4 popular feature stores on one cheat sheet: Vertex AI Feature Store, FEAST, AWS SageMaker Feature Store and Databricks Feature Store.

The Future of the Modern Data Stack in 2022 | 13 min read | Data Science | Prukalpa | Towards Data Science Blog 

Maybe not the newest article, but definitely one of the hottest.
The 6 big ideas you should know from 2021 - presented, analyzed and sprinkled with a prediction of the future.
  1. Data Mesh - “I think we’ll see a ton of platforms rebrand and offer their services as the 'ultimate data mesh platform'. The thing is, data mesh isn’t a platform or a service that you can buy off the shelf. It’s a design concept with some wonderful concepts like distributed ownership, domain-based design, data discoverability and data product shipping standards — all of which are worth trying to operationalize in your organization.”
  2. Metrics Layer 
  3. Reverse ETL
  4. Active metadata & Third-Gen Data Catalogs
  5. Data Teams and Product Teams
  6. Data Observability 

Data Race Patterns in Go | 10 min read | Programming | Milind Chabbi and Murali Krishna Ramanathan | Uber Engineering Blog

Uber has adopted Golang (Go for short) as a primary programming language for developing microservices.

“In this blog, we will present the various data race patterns we found in our Go programs. This study was conducted by analyzing over 1,100 data races fixed by 210 unique developers over a six-month period. Overall, we noticed that Go makes it easier to introduce data races, due to certain language design choices.”


The basics of how data science can help businesses, including building a data science team and the common steps in the data science workflow | 10 min read | Product Management | Olga Dudzik | allegro Tech 

On average, 10-20% of an IT budget is ultimately consumed by tech debt management, and most CIOs interviewed say the problem has grown significantly in recent years, especially in enterprise-sized companies.

A juicy and very specific article on how to work with technology debt, whom to convince of the need to address it and how, and how to approach value mapping and roadmapping of technology debt.

Transitioning to Modern Testing: How Testers Can Stop Being the Training Wheels for Teams | 5 min read | Culture & Methods | Ben Linders | infoQ Blog

Conor Fitzgerald: “'Accelerate and the Modern Testing Principles' has shown us that teams that test their code and drive the automation efforts are amongst the best approaches to accelerate the achievement of shippable quality”

How to Deliver a Customer-Centric Banking and Insurance Experience with Data | 12 min read | Culture & Methods | Rinesh Patel & Jonathan Beaulier | Snowflake Blog

“Nearly 70% of consumers say they’d like their banking experience to be similar to the experiences they have with Netflix, Amazon, and other tech companies when it comes to offering personalized recommendations.”
However, this industry is subject to much more regulation. So how do you provide the deepest possible analytics and data security in the cloud for the financial sector? Look for the answer in this article.


Building a Machine Learning Pipeline With DBT | 15 min read | dbt | Joselito Balleta |

Setting up a proper data pipeline that performs feature engineering, trains a model and makes predictions on our data can get pretty complicated. Yet it doesn’t have to be. Check out this guide.



Orchestrating Machine Learning Applications | 47 min | The Data Exchange
  • What is Flyte?
  • Who uses Flyte?
  • Multi-modal models
  • Roadmap for Flyte and Union AI

FinOps with Joe Daly | 40 min | Google Cloud Podcast
FinOps principles and how they’re helping companies take advantage of the cloud while saving their bottom lines. FinOps - financial DevOps, making financial decisions in an effective and optimized way.



Snowflake Summit 2022 | 13-16 June | Las Vegas

10 different tracks about the Data Cloud, such as:
  • Modern Data Architectures
  • Data Engineering
  • Executive Insights
  • Data Science and Machine Learning
  • Accelerating Analytics

Data Science Summit: Machine Learning | 21-22 June | Online and ONSITE

We recommend the whole event, but especially two talks by our colleagues from GetInData 👏 :
  • Piotr Chaberski and Adrian Dembek: How NOT to win a Kaggle competition
  • Mariusz Strzelecki: 7 Jupyter architectures for 7 different organizations

CONFITURA 2022 | 25 June | Warsaw

One of the biggest Polish Java conferences. 35 talks in 5 parallel sessions.

DEVOXX Poland | 22-24 June | Krakow | Hybrid event

2,700 Devoxxians onsite and online from 20 different countries, 100+ speakers.
A lot of tracks, e.g.:
  • Big Data & AI
  • Development Practices and more