DATA Pill feed

DATA Pill #010 - MLflow on GCP, the Modern Data Stack is dead and trends in software development.


An Open-Source Tool to Change Data Validation As You Know It | 8 min read | Data Validation | Madison Schott | Towards Data Science

If you’ve ever had to migrate your data warehouse to a new platform or account, then you know how time-consuming and painful this can be. Recently, I’ve had to migrate my data from one Snowflake account to another, reconfiguring all of the data ingestion pipelines and orchestration. You quickly realize how one little change can (and will) break everything in your data pipeline.

Is “Self-Service” Data’s Biggest Lie? | 8 min read | Self-Serve | Barr Moses | Monte Carlo

Is self-service a lie? Yes, it's another provocative title, but the article covers some of the challenges and pitfalls of a Modern Data Stack setup. Written by the CEO of Monte Carlo, it reports on an interesting discussion that recently took place in NY. Take a look.

Deploying MLflow on the Google Cloud Platform using App Engine | 12 min read | Cloud | 👏 Marcin Zabłocki | GetInData Blog

Read this step-by-step guide to deploying MLflow instances on the Google Cloud Platform using App Engine. In the article, Marcin Zabłocki describes how to:
  • Pre-configure OAuth 2.0 Client
  • Build the Docker image for MLflow on App Engine
  • Prepare the Terraform inputs

How Airbnb Safeguards Changes in Production | 8 min read | Software Engineering | Michael Lin | Airbnb Tech Blog

With the statistical methods in place to evaluate business metrics in near real-time, we can now detect problems that were invisible to Spinnaker, or required too much lead time to rely on traditional ERF experiments.

5 Principles You Need To Know About Continuous ML Data Intelligence | 9 min read | Machine Learning | Vikram Chatterji | ML Community

The five pillars of ML data intelligence are:
  • Inspection
  • Actionability
  • Continuity
  • Collaboration
  • Scalability


Ververica Platform 2.7 for Flink | 5 min read | Daisy Tsang | Ververica Blog

This new release includes full support for Apache Flink, a lot of improvements to user experience and a new visual brand identity for Ververica.

Trends in Software Development 2022 | 11 min read | Andrzej Frydryszak | ITMagination Blog

Some of the fifteen most impactful trends in 2022:
  • Observability is crucial
  • It’s good practice to use both the “serverless” & “serverful” approaches
  • Containerize everything! Kubernetes is a hot technology!
  • Everything can be done “as a Service”
  • A great long-term hiring strategy is to hire juniors and train them


Modern Data Stack is Dead? | 4 min read

Lauren Balik argues that the Modern Data Stack is already dying, that this is a flawed concept and should be replaced with the “Postmodern Data Stack” that she defines… Do you agree? We advise you to go through the comments.


Multicloud reporting and analytics using Google Cloud SQL and Power BI | 7 min read | Google Cloud | Matthew Smith | Google Cloud Blog

The following guide demonstrates the key steps to configuring Power BI reporting from Cloud SQL.


7 Jupyter architectures for 7 different organizations | 49 min | GetInData

As ML engineers, you often work on providing the Jupyter environment for Data Science teams, so you probably know that providing a platform that is both flexible and cost-effective is a challenge. In this video, you can learn about the different Jupyter setups, their pros and cons and listen to the lessons we learned.


Why and When to Use Kubeflow for MLOps | 58 min | MLOps | ML Community

Kubeflow is an excellent platform if your team is already leveraging Kubernetes, and it allows for a truly collaborative experience. In this episode, Ryan Russon talks about the pros and cons of using Kubeflow in your MLOps.

Charting the Path of Riskified’s Data Platform Journey | 40 min | Data Platform | Data Engineering Podcast

Inbar Yogev and Lior Winner share the data platform journey that Riskified has been on and talk about how they have established a guild system for training and supporting data professionals in the organization.


Today, we would like to share with you the second part of a review of the Big Data Technology Warsaw Summit 2022.
A Review of the Big Data Technology Warsaw Summit 2022! Part 2. Top 3 best-rated presentations | 11 min read | 👏 Michał Rudko & Mariusz Strzelecki | GetInData Blog

Furthermore, we'd like to invite you to some upcoming events:

Meetup #10: Service Mesh, GKE and Cloud Native applications | Google Cloud Warsaw | 23 July | Warsaw

PrestoCon Day | Linux Foundation | 21 July | Free Virtual Conference

The future of data lineage | Devs & Data | 27 July | Tel Aviv-Yafo
