Well, almost everything, because there are also some interesting articles on data analytics, real-time processing and deep learning, plus some shocking news.
How Capital One Operationalized Data Mesh on Snowflake | 9 min read | Data Mesh & Snowflake | Salim Syed & Patrick Barch | Capital One Blog
In 2020, Capital One moved its entire data workload to the cloud. That came with the challenge of managing the exponential growth of data that had previously been stored in different places, and of meeting the demands of users with a variety of use cases. They had to change a complicated data ecosystem that was hard to scale. In the article, they describe how they managed it with federated data management backed by centralized tooling and policy.
Revolutionizing Analytics in Insurance using Fivetran, dbt and Vertex AI | 12 min read | Data Analytics | Sara Landfors | Heroes of Data Blog
Takeaways from the presentation by Jacinta Waak, Filip Allard and Hannes Kindbom (Hedvig):
- This is not the first time we have seen companies consider replacing Fivetran with Airbyte to be more cost-effective.
- The ML world is still a separate next step after the data world, but the two can be well integrated. ML on Vertex AI steps in after data has been ingested with Fivetran and modeled with dbt.
A look inside pseudonymization of PII in Streaming Data with Mathem | 12 min read | Data Streaming | Sara Landfors | Heroes of Data Blog
Technology for avoiding high GDPR fines: how to comply with regulations while also maintaining data integrity. It was developed by Robert Sahlin, the Data Engineering Lead at Mathem.
They can simply delete the row for the member in the operational token vault, and then that person can’t be re-identified. They also have the option of deleting certain fields if only parts of the data need to be deleted.
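The token-vault idea can be illustrated with a minimal Python sketch. This is not Mathem's implementation: the `TokenVault` class, its method names and the in-memory dictionaries are all hypothetical stand-ins, assuming only what the blurb states (events carry a token instead of the member's identity, and deleting the vault row makes re-identification impossible).

```python
import secrets
from typing import Dict, Optional


class TokenVault:
    """Hypothetical in-memory stand-in for an operational token vault."""

    def __init__(self) -> None:
        self._token_by_member: Dict[str, str] = {}
        self._member_by_token: Dict[str, str] = {}

    def pseudonymize(self, member_id: str) -> str:
        """Return a stable random token for a member, creating it on first use."""
        token = self._token_by_member.get(member_id)
        if token is None:
            token = secrets.token_hex(16)  # random, so it reveals nothing by itself
            self._token_by_member[member_id] = token
            self._member_by_token[token] = member_id
        return token

    def reidentify(self, token: str) -> Optional[str]:
        """Look up the member behind a token, or None if the row is gone."""
        return self._member_by_token.get(token)

    def forget(self, member_id: str) -> None:
        """Delete the member's row; previously issued tokens become unlinkable."""
        token = self._token_by_member.pop(member_id, None)
        if token is not None:
            del self._member_by_token[token]


vault = TokenVault()
# The streamed event stores the token, never the member identifier itself.
event = {"member": vault.pseudonymize("member-42"), "order_total": 199.0}
vault.forget("member-42")
# The event still exists and stays usable for aggregates,
# but it can no longer be tied back to a person.
```

The design choice worth noting is that the event data itself is never rewritten: only the vault row is deleted, which is why data integrity downstream is preserved.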
Key Takeaways from Gartner Data & Analytics Summit 2022 | 8 min read | Data Analytics | Prukalpa | Towards Data Science Blog
A couple of interesting takeaways by Prukalpa from one of the recent data conferences:
- Concept of MVD (minimum viable dataset) - the shift from big data to small data, going on a "data diet", avoiding data FOMO and stopping ingesting everything
- The replacement of real by synthetic data for AI - addressing weaknesses of real data (quality, privacy, etc.)
- Active metadata - ferrying metadata back and forth among all the tools in the stack
The next generation of Data Platforms is the Data Mesh | 10 min | Data Mesh | Jean-Georges Perrin | PayPal Blog
PayPal is building a Data Mesh, the next generation of data platforms. This post details the evolution of data platforms, highlights their problems and why they decided to build a Data Mesh.
How Airbnb safeguards changes in production Part II: Near Real-time Experiments | 8 min read | Deep Learning | Mike Lin, Preeti Ramasamy, Toby Mao, Zack Loebel-Begelman | Airbnb Tech Blog
This article reveals the architecture and engineering choices behind the various components that Safe Deploys comprise.
Designing a near real-time experimentation system required making explicit tradeoffs among speed, precision, cost, and resiliency. An early decision was to limit near real-time results to only the first 24 hours of an experiment — enough time to catch any major issues and transition to using comprehensive results from the batch pipeline. The idea being once batch results were available, experimenters would no longer need real time results.
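The 24-hour cutoff described above can be sketched as a small routing rule. This is only an illustration under stated assumptions, not Airbnb's code: the function name `results_source`, its parameters and the returned labels are hypothetical; only the 24-hour window and the preference for batch results once they exist come from the article.

```python
from datetime import datetime, timedelta

# From the article: near real-time results cover only the first 24 hours.
NEAR_REAL_TIME_WINDOW = timedelta(hours=24)


def results_source(started_at: datetime, now: datetime, batch_ready: bool) -> str:
    """Pick which pipeline an experimenter should read results from.

    Hypothetical helper: batch results win as soon as they are available;
    otherwise near real-time results are served inside the 24-hour window.
    """
    if batch_ready:
        return "batch"
    if now - started_at <= NEAR_REAL_TIME_WINDOW:
        return "near_real_time"
    return "unavailable"


start = datetime(2022, 9, 1, 12, 0)
print(results_source(start, start + timedelta(hours=2), batch_ready=False))
print(results_source(start, start + timedelta(hours=30), batch_ready=True))
```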
Data Lakehouse vs. Data Mesh | 3 min | Data Mesh | Christian Lauer | CodeX
Data Warehouse, Data Lake, and now Data Lakehouse and Data Mesh: what is what, where are the differences, and, above all, how do they relate to each other?
Data Mesh Operability Pattern | 7 min | Data Mesh | Eric Broda | Towards Data Science
The Data Mesh Operability Pattern helps us understand the operating characteristics of an enterprise Data Mesh.
- How does this pattern work?
- Why is this so important for the stability, resilience, and performance of an enterprise Data Mesh?
Akka is moving away from Open Source | 8 min read | Open Source, Scala | Alexandru Nedelcu | alexn.org Blog
Lightbend is changing Akka’s licensing to “Business Source License (BSL)”.
A stab in the back for many projects.
Discussion on GitHub: https://github.com/akka/akka/pull/31561#issuecomment-1239473395
Data Journey with Alessandro Romano (FREE NOW) - Dynamic pricing in a real-time app, technology stack and pragmatism in data science | 1 h | Adam Kawa & Alessandro Romano | RadioDaTa
- Alessandro's journey to FREE NOW
- Techniques, signals, and KPIs used to develop the dynamic pricing ML model for a real-time mobile app
- Working with stakeholders to understand the changing priorities to adapt and optimize ML models
- The technology stack used by data scientists and ML engineers at FREE NOW
If you find this interesting, check out the Data Mass Gdańsk Summit (September 30th), where Alessandro will be a speaker. The agenda is also worth checking out.
Data Mesh 101 | 1 h | Data Mesh | Zhamak Dehghani | Data Talks Club
If you prefer (quite long) interviews, here is a high-quality one about Data Mesh with Zhamak Dehghani, who defined Data Mesh while working at Thoughtworks.
CONFS AND MEETUPS
PyData Trójmiasto x Thomson Reuters Labs #19 | 9 September | Online
- Practical application of language models in the legal domain
- AI Economics & delivering production-level machine learning components with ModelOps + Q&A
Become IAM Cloud Hero with GDG Cloud Bydgoszcz | 20 September | Online
Gamification is on its way to Google certification. Registration for the event will give you free access to Cloud Skills Boost for 3 calendar months.
Soon we will have some exciting news for our community.
BTW, did you know that there are already more than 1200 of us in the DATA Pill community?