DATA Pill feed

DATA Pill #029 - What is the future of Apache Flink? And what do football and LoL have to do with DATA?


Apache Flink SQL: Past, Present, and Future | 12 min | Apache Flink | Becket Qin | Ververica Blog

This post summarizes the important milestones of Flink SQL over the past several years. It highlights the critical issues and challenges that have arisen to explain where it is today, and demonstrates the path Flink SQL has been through and where it might head in the future, based on personal observation and opinions.

Replacing Apache Druid with Snowflake Snowpipe | 8 min | Snowflake | Jitesh Mogre | GumGum Tech Blog

The battle between Apache Druid and Snowflake Snowpipe is over, at least in this case. The article walks through replacing Apache Druid with Snowflake Snowpipe. If you want to know what benefits the author sees, do not hesitate to read it.

Thinking Industry Solutions on Snowflake | Part 1 | 7 min | Snowflake | Chinmayee Lakkad | Snowflake Blog

Snowflake worked on an evolutionary solution that has revolutionized the healthcare industry. Read an overview of how technology helps the field of nursing cope with serious personnel shortages.

Deep Learning with Azure: PyTorch distributed training done right in Kedro | 5 min | Deep Learning | Marcin Zabłocki | GetInData Blog

A Deep Learning tutorial on how you can use PyTorch with Kedro to train neural networks and easily scale out the training using distributed computing in Azure ML.

(Re)building Threat Detection and Incident Response at LinkedIn | 15 min | Data | Sagar Shah & Jeff Bollinger | LinkedIn Blog

With Moonbase, a program that aimed to reimagine the capabilities and scale of its monitoring and response solutions, LinkedIn was able to reduce incident investigation times by 50%, expand threat detection coverage by 900% and cut the time needed to detect and contain security incidents from weeks or days to hours.

17 dbt Commands You Should Start Using Today | 12 min | dbt | Bruno Souza de Lima | Indicium Engineering Blog

A 1-page cheat sheet with the main commands and flags you should use in your transformations with dbt. It will help people discover features they didn’t know dbt provides, and give experienced professionals a quick place to look up commands instead of spending minutes searching for them in the documentation.

Using DBT for building a Medallion Lakehouse architecture (Azure Databricks + Delta + DBT) | 15 min | dbt | Piethein Strengholt | Personal Blog

Read about what exactly dbt is, where you can position it in your data landscape and how it works. How does it differ from other tools? Dive into this article and check out a use case. Be forewarned: there is a lot to read, but you won't regret it!

Emerging Architectures for Modern Data Infrastructure | 12 min | Data Infrastructure | Matt Bornstein, Jennifer Li, and Martin Casado | Andreessen Horowitz 

This one is aging like fine wine. Released two years ago, it's still current and touches on an important topic. Here you will discover more about the data platform hypothesis, Multimodal Data Processing, Modern Business Intelligence and data apps.




Realities of Being A Data Engineer — Migrations | 15 min | Migration | Ben Rogojan | SeattleDataGuy Blog

If you are a Data Engineer, you will take part in a migration project roughly every two years. Why invest what could end up being millions of dollars to switch from one system to another, or postpone other work for a project that doesn’t directly add to your company’s bottom line? And how do you migrate? Find out in this info-packed article.

How you roll out your migration should be decided before starting.
There are a lot of ways migrations can go wrong throughout the process and only a few ways they can go right.

How To Update Your Status During Standup Like a Senior Engineer | 9 min | Tips | Edward Huang | Better Programming Blog

A status update is where you can showcase how well you manage ambiguity, and it is an important way to build trust within your team. What should you talk about in a status update? How should you communicate blockers?

Since we are talking about being a Senior, here is an interesting Senior Software Engineer position

Top Tips for Data Engineers | 15 min | Tips | Hubert Liang | Personal Blog

Cloud Architecture, Data Mining and Open Source availability have transformed Data Engineering. The top traits to prep and be passionate about are the following:



Morning Coffee with Google Cloud - Epidemic Sound: On a growth journey to soundtrack the world | 23 min | Google Cloud | host: Maria Wiss, guest: Alexander Holzmann | Google Cloud Events

An interview with Alexander Holzmann of Epidemic Sound (a platform with soundtracks for creators) about how Google's tools support Epidemic Sound's work.




How Chelsea FC Uses Analytics to Drive Matchday Success | 46 min | Analytics | host: Richie Cotton; guest: Federico Bettuzzi | Data Framed

Podcast takeaways:

  • Chelsea’s data team uses two main sources: event data, which records every relevant action taken in a game, and tracking data, which provides information about player and ball positioning.
  • Chelsea’s data team chooses long-term projects based on what will result in regular usage by the team and how much time they have to ensure that the project is both effective and fully functional when it is put in place.

A Look At The Data Systems Behind The Gameplay For League Of Legends | 1 h | Data Engineering | host: Tobias Macey; guest: Ian Schweer | Data Engineering Podcast

In this episode, Ian Schweer shares his experiences at Riot Games supporting player-focused features such as machine learning models and recommender systems that are deployed as part of the game binary. He explains the constraints that he and his team are faced with and the various challenges that they have overcome to build useful data products on top of a legacy platform where they don’t control the end-to-end systems.

Cloudy with a chance of… the state of cloud in 2022 | 28 min | Cloud | hosts: Ryan Donovan & Ben Popper; guest: Drew Firment | The Stack Overflow Podcast

Ben and Ryan chat with Drew Firment, chief cloud strategist at Pluralsight, about the state of cloud today. They cover the skills gap that leads to delays in implementation, the inertia around infrastructure at a lot of organizations and the steps you can take to get (and prove) cloud literacy. 



Applying AI & Machine Learning to Finance & Technology | 23 November | Online free webinar

AI and machine learning applications in finance and technology.
Learn from practitioners, technical experts and executives about how to solve real-world problems by harnessing disruptions in data, artificial intelligence, machine learning and cutting-edge technologies.

Big Data Tech Warsaw Summit | 29-30 March 2023 | online and onsite | Call for Presentations until 30 November

A chance to speak in front of an audience of Big Data professionals. 
More than 500 professionals will attend the conference to hear dozens of technical presentations. One of them could be yours ;)
