Before the predictions for 2023, let's take a step back into the past and check the reflections on six major trends from 2022 that Prukalpa made at the beginning of this year. What did we get right? What didn't quite go as expected? What did we completely miss? Read more about where we started and where we are now with:
1. Data Mesh
2. Metrics Layer
3. Reverse ETL
4. Active Metadata & Third-Gen Data Catalogs
5. Data Teams as Product Teams
6. Data Observability
A short story about building a nationwide 5G network. Read how T-Mobile, which uses Power BI, built a centralized source of data while maintaining high levels of performance and functionality, using a data lakehouse supported by Microsoft Azure Data Factory, Azure Synapse Analytics, and Azure Databricks.
This post explains all the reasons why the company felt compelled to migrate over 30 of their models and sunset the old ones. They did the migration in three distinct steps, which you can read about. A great example of how big migrations of business-critical models do not have to be boring or stressful.
In this article you can explore a proof of concept written in Terraform, in which, for example, they create the front-end layer of a three-tier architecture.
We are starting 2023 soon, and you are probably thinking about your New Year's resolutions. If one of them is better productivity, then this article is for you. Here are some tips:
1. Validate your choice of no-code tools with business needs
Compared to full-scale ML, a multi-armed bandit is a lightweight solution that can help teams quickly optimize their product features without major commitments. However, bandits need a candidate selection step when they have too many items to choose from. Using A/B testing to optimize the candidate selection step causes new bandit bias and convergence selection bias. New bandit bias occurs when we try to compare new bandits with established ones in an experiment; convergence selection bias creeps in when we try to solve the new bandit bias by defining and selecting established bandits. The authors discuss their strategies to mitigate the impacts of these two biases.
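If bandits are new to you, here is a minimal epsilon-greedy sketch in Python to show the basic explore/exploit loop; the variant names and reward probabilities are invented for illustration and are not from the article:

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy multi-armed bandit (illustrative sketch)."""

    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in arms}    # pulls per arm
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, arm, reward):
        # Incremental update of the chosen arm's mean reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical usage: learn which banner variant earns the most clicks.
true_rates = {"variant_a": 0.05, "variant_b": 0.12, "variant_c": 0.08}
bandit = EpsilonGreedyBandit(list(true_rates))
for _ in range(10_000):
    arm = bandit.select_arm()
    reward = 1 if random.random() < true_rates[arm] else 0
    bandit.update(arm, reward)
print(bandit.values)  # estimates should approach the true rates
```

The candidate selection problem the article tackles arises exactly when the list of arms above gets too large for the bandit to explore efficiently.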
If you are thinking about making your company more data-driven, this blog post describes four enablers that will help you along the way:
What disadvantages do they have? Why should we use them? You can find the answers to these questions in the blog post.
Our engineering and security teams have done some incredible work in 2022. Let’s take a look at how we use GitHub to be more productive, build collaboratively, and shift security left.
Einride is rethinking every piece of the freight system, from trailers to local deliveries to the remote and autonomous platforms that operate them. If you want to see how they plan to create a sustainable, resilient delivery network using AI and tech, read this blog post.
Did you know that most of the videos on YouTube are in English, but less than 20% of the world's population speaks English as a first or second language? This is why voice dubbing is increasingly used to translate videos into other languages. In this blog post you can read about research on assessing voice dubbing quality using deep learning.
Read this step-by-step tutorial where you will explore design patterns for your BigQuery storage that you can use to increase the speed and performance of your queries. To optimize your workloads on BigQuery, you can optimize your storage by:
In this blog post, you will also read about BigQuery storage and compute costs, how to investigate BigQuery performance issues, and more.
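To give a concrete flavor of such storage optimizations, here is a minimal sketch (not from the tutorial) that uses the google-cloud-bigquery Python client to create a partitioned and clustered copy of a table; the project, dataset, table, and column names are all hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project

# Partitioning and clustering are two common BigQuery storage optimizations:
# partitioning lets queries prune whole date partitions, and clustering
# co-locates rows sharing the same keys, reducing bytes scanned.
ddl = """
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.events_optimized`
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_name AS
SELECT * FROM `my_project.my_dataset.events_raw`
"""

client.query(ddl).result()  # wait for the DDL job to finish
```

A query that filters on event_timestamp then scans only the matching partitions instead of the whole table, which is where the speed and cost wins come from.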
In this blog, Keshav establishes the ML life cycle leveraging MLflow, an open-source machine learning platform and framework for managing the ML life cycle. It is a short hands-on demonstration of MLOps standardization on a Mesh Platform.
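If you have not worked with MLflow before, its core tracking API looks roughly like this; the experiment name, parameter, metric, and artifact below are invented for illustration:

```python
import mlflow

# Each run records the parameters, metrics, and artifacts of one experiment.
mlflow.set_experiment("demo-experiment")

# A placeholder artifact file, just so the example runs end to end.
with open("model_summary.txt", "w") as f:
    f.write("hypothetical model summary")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)   # hyperparameter for this run
    mlflow.log_metric("rmse", 0.78)           # evaluation metric for this run
    mlflow.log_artifact("model_summary.txt")  # attach a file to the run
```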
After reading this one, you will know why the Dropbox team chose Apache Superset. They explain the problem they started with, how they evaluated data exploration tools, and the results of that evaluation. You can also find their Data Visualization Platform Comparison Matrix in this blog post.
A database transaction is a unit of work designed to handle changes to data in the database. It ensures that the data remains consistent and that failed changes do not leave the database in an erroneous, half-applied state. It also helps manage concurrent changes to the database and makes the database more scalable.
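As a minimal illustration of that all-or-nothing behavior (using Python's built-in sqlite3, not anything from the linked post), a transaction either commits all of its changes or rolls back all of them; the accounts table and transfer helper are made up for the example:

```python
import sqlite3

def transfer(conn, src, dst, amount):
    # Both UPDATEs run in one transaction: the `with` block commits on
    # success and rolls back if any exception escapes it.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        balance = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()[0]
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers rollback of the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    transfer(conn, "alice", "bob", 150)  # more than alice has
except ValueError:
    pass

# The failed transfer was rolled back, so both balances are unchanged.
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]
```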
In this post, Kinnar discusses the Node Decommissioning and Persistent Volume Claim (PVC) reuse features and their impact on increasing the fault tolerance of Spark jobs on Amazon EMR on EKS when optimizing with EC2 Spot Instances.
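To give a sense of the knobs involved, a Spark-on-Kubernetes job might enable graceful decommissioning and PVC reuse with settings along these lines; this is a hedged sketch of standard open-source Spark 3.2+ configuration, not the exact setup from the post:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spot-resilient-job")
    # Gracefully migrate work off executors that are about to be reclaimed
    # (e.g. on a Spot interruption) instead of failing their tasks.
    .config("spark.decommission.enabled", "true")
    .config("spark.storage.decommission.enabled", "true")
    .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
    .config("spark.storage.decommission.rddBlocks.enabled", "true")
    # Let the driver own executor PVCs and reuse them for replacement
    # executors, so data on the volume can outlive the reclaimed pod.
    .config("spark.kubernetes.driver.ownPersistentVolumeClaim", "true")
    .config("spark.kubernetes.driver.reusePersistentVolumeClaim", "true")
    .getOrCreate()
)
```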
In this blog post you can find:
The add-on makes it possible for developers to gain access to Snowflake from within VS Code. The extension connects the user to Snowflake and enables them to write and execute SQL queries and see the results without ever leaving VS Code. After successfully signing in, users can view and change their active database, schema, role, and warehouse.
Recently, Grafana announced two new additions to its suite of observability and monitoring tools.
"You might recently noticed that Debezium went a bit silent for the last few weeks. No, we are not going away. In fact the elves in Google worked furiously to bring you a present under a Christmas tree - Debezium Spanner connector."
After listening to this episode, you will get to know the details of how batching works, the replication protocol, how Kafka's networking stack dances with Linux's, and which is the most important Scala class to read if you're only going to read one.
Anna gives Kris the details about the bugs that she found and about some of the scariest, most surprising, and most enlightening corner cases.
In this video, Felipe Leite and Stephen Pastan from Miro unpack their shift to a Modern Data Stack and share the key technical changes they made to build a scalable, tech-forward data stack. Watch it to discover how to efficiently scale your analytics stack when your data and data team grow 10x in two years, and how to prioritize what gets done amid that much growth.