In 2020, Capital One moved its entire data workload to the cloud. This brought the challenge of managing exponential data growth, with data previously stored in many different places, while meeting the demands of users with a wide variety of use cases. They had to replace a complicated data ecosystem that was hard to scale. In the article, they describe how they managed this through federated data management combined with centralized tooling and policy.
Takeaways from the presentation by Jacinta Waak, Filip Allard and Hannes Kindbom (Hedvig).
Key insights:
Technology to avoid high GDPR fines: how to comply with regulations while maintaining data integrity. The technology was developed by Robert Sahlin, Data Engineering Lead at Mathem.
They can simply delete the member's row in the operational token vault, after which that person can no longer be re-identified. They also have the option of deleting only certain fields if just part of the data needs to be removed.
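To make the idea concrete, here is a minimal sketch of a token vault, assuming analytics tables store only random tokens while the real personal data lives in the vault. The `TokenVault` class and its `tokenize`, `detokenize`, `forget_member` and `forget_field` methods are hypothetical illustrations, not Mathem's actual implementation. Deleting vault entries leaves the tokens in downstream data intact but unresolvable.

```python
from __future__ import annotations

import secrets
from collections import defaultdict


class TokenVault:
    """Hypothetical token vault: downstream data holds tokens, the vault holds PII."""

    def __init__(self) -> None:
        # token -> original PII value
        self._values: dict[str, str] = {}
        # member_id -> {field_name: token}, enabling per-member or per-field deletion
        self._index: dict[str, dict[str, str]] = defaultdict(dict)

    def tokenize(self, member_id: str, field: str, value: str) -> str:
        """Replace a PII value with a random token and store the mapping."""
        old = self._index[member_id].get(field)
        if old is not None:
            # Drop the previous mapping so no orphaned PII stays in the vault.
            self._values.pop(old, None)
        token = secrets.token_hex(16)
        self._values[token] = value
        self._index[member_id][field] = token
        return token

    def detokenize(self, token: str) -> str | None:
        """Re-identify a token; returns None once its mapping has been deleted."""
        return self._values.get(token)

    def forget_member(self, member_id: str) -> None:
        """Delete the member's whole row: all of their tokens become unresolvable."""
        for token in self._index.pop(member_id, {}).values():
            self._values.pop(token, None)

    def forget_field(self, member_id: str, field: str) -> None:
        """Delete a single field for a member, keeping their other fields."""
        token = self._index.get(member_id, {}).pop(field, None)
        if token is not None:
            self._values.pop(token, None)


vault = TokenVault()
email_token = vault.tokenize("member-42", "email", "anna@example.com")
phone_token = vault.tokenize("member-42", "phone", "+46 70 000 00 00")

vault.forget_field("member-42", "phone")
print(vault.detokenize(phone_token))  # None
print(vault.detokenize(email_token))  # anna@example.com

vault.forget_member("member-42")
print(vault.detokenize(email_token))  # None
```

The design choice is that erasure touches only the vault: the (potentially huge) analytics tables never need to be rewritten, because their tokens simply stop resolving to a person.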
A couple of interesting takeaways from Prukalpa's talk at one of the recent data conferences:
PayPal is building a Data Mesh, the next generation of data platforms. This post details the evolution of data platforms, highlights their problems, and explains why PayPal decided to build a Data Mesh.
This article reveals the architecture and engineering choices behind the various components that Safe Deploys comprise.
Designing a near real-time experimentation system required making explicit tradeoffs among speed, precision, cost, and resiliency. An early decision was to limit near real-time results to the first 24 hours of an experiment: enough time to catch any major issues and then transition to the comprehensive results from the batch pipeline. The reasoning was that once batch results became available, experimenters would no longer need real-time results.
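The 24-hour decision amounts to a simple routing rule for which pipeline serves an experiment's results. The sketch below is a hypothetical illustration, not the system's actual code: the `results_source` helper, the `batch_ready` flag and the "realtime"/"batch" labels are assumptions, and it assumes batch results take precedence as soon as they exist.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the near real-time window is exactly the first 24 hours,
# matching the design decision described above.
REALTIME_WINDOW = timedelta(hours=24)


def results_source(experiment_start: datetime, now: datetime, batch_ready: bool) -> str:
    """Return which (hypothetical) pipeline should serve an experiment's results."""
    age = now - experiment_start
    if age < timedelta(0):
        raise ValueError("experiment has not started yet")
    if batch_ready:
        # Comprehensive batch results supersede the near real-time ones.
        return "batch"
    if age <= REALTIME_WINDOW:
        # Early window: serve fast, less precise results to catch major issues.
        return "realtime"
    # Past the window, near real-time results are no longer computed,
    # so experimenters rely on the batch pipeline (which may still be running).
    return "batch"


start = datetime(2022, 9, 1, 9, 0, tzinfo=timezone.utc)
print(results_source(start, start + timedelta(hours=2), batch_ready=False))   # realtime
print(results_source(start, start + timedelta(hours=30), batch_ready=False))  # batch
```

Capping the streaming path at a fixed window bounds its cost: the expensive low-latency computation only ever covers a small, recent slice of each experiment.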
Data Warehouse, Data Lake, and now Data Lakehouse and Data Mesh: what is what, where do they differ, and, above all, how do they relate to each other?
The Data Mesh Operability Pattern helps us understand the operating characteristics of an enterprise Data Mesh.
Lightbend is changing Akka’s license to the Business Source License (BSL).
A stab in the back for many projects.
Discussion on GitHub: https://github.com/akka/akka/pull/31561#issuecomment-1239473395
Topic:
If you find this interesting, check out the Data Mass Gdańsk Summit (September 30th), where Alessandro will be a speaker. The rest of the agenda is also worth a look.
If you prefer (quite long) interviews, here is a high-quality interview about Data Mesh with Zhamak Dehghani, who defined Data Mesh while working at Thoughtworks.
Gamification is on its way to Google certification. Registering for the event gives you free access to Cloud Skills Boost for three calendar months.