Take a step back and look at what Amazon shipped over the last year. Read about more than 40 features launched in Amazon Redshift to help customers with their top data warehousing use cases.
They are all described in more detail in the article. Enjoy!
a16z asked their partners to spotlight one big idea that startups in their fields would tackle in 2023. The ideas range from entertainment franchise games and the precise delivery of medicines to small modular reactors and loads of AI applications. The article lists 40+ builder-worthy pursuits for the year ahead.
P.S. More summaries and predictions for 2023 in this edition of DATA Pill.
How can you achieve better performance from your models? Read the story of how Teads built an internal SQL query executor tool to wrap the execution of BigQuery jobs, which is now part of their go-to solution.
Tips from Youssef on how to develop software and programming skills as a data scientist, aimed at people who already have strong programming skills and would like to take them to the next level. The article covers 10 areas to work on and how to approach each of them.
Read the review of the book Fundamentals of Data Engineering by Joe Reis and Matt Housley, published by O’Reilly in June 2022, along with some takeaway lessons.
What will you find here?
and a little about choosing technology, security and privacy, and the future of Data Engineering.
If you are building a data team, you have to decide how to structure it. Mikkel answers some of the most frequently asked questions.
Manually managing the lifecycle of Kubernetes nodes can become difficult as the cluster scales, especially if your clusters are multi-tenant and self-managed. You may need to replace nodes for various reasons, such as OS upgrades and security patches. One of the biggest challenges is terminating nodes without disturbing the tenants. This blog post explains how Yelp manages this.
How Canva used a CLIP-inspired model to suggest keywords for template labeling in multiple languages. In this blog post, Sachinthaka walks you through how they gathered the data, designed the model architecture, and trained it with the special loss function they chose, and finally discusses the results.
If you have already started with Terraform, this one may be of interest to you. The document provides guidelines and recommendations for effective development with Terraform across multiple team members and work streams.
An announcement about a useful subset of the new functions.
What does this mean for your data processing journeys? Read more and find examples of how they may prove useful.
The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge because of differences in the data you are working with. Building your own scripts to replicate data from production is time-consuming and error-prone. Listen to the episode where Adam Kamor explores:
In this Q&A session, Marcin answers the following:
Plus more MLOps-related questions.
The next Paper Talks meeting is coming! It is a meeting for anyone interested in Data Science or Machine Learning. If you join the event, you will be able to meet the Analytics team and talk with them about a paper called “Emergent Abilities of Large Language Models”. Participants are encouraged to be active and leave comments about the paper before the event. No registration needed!
In this talk, Dipankar will walk you through the various data and file optimization strategies that help to achieve robust performance in Apache Iceberg.
This talk will cover the importance of a solid foundation and what management should do to fix a weak one. To illustrate, Jesse will share a real-life analogy that shows how we can be misled and what this means for our success rates.
During this webinar you will learn about: