Dive into the story of how, over the past six months, Mathem transitioned to a data platform team, emphasizing data contracts and a unified architecture built on streaming Beam jobs, GCP Dataflow, and dbt for data warehousing. Their StreamProcessor consolidates data streams to optimize resource usage, while data contracts handle schema and metadata and enable customized data processing.
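To make that pattern concrete, below is a minimal sketch of a streaming Beam pipeline of the kind described, reading from Pub/Sub and writing to BigQuery. The topic, table, and schema names are hypothetical placeholders, not Mathem's actual StreamProcessor code:

```python
# Minimal streaming Beam sketch: Pub/Sub -> decode -> BigQuery.
# Topic, table, and schema are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner to run on Dataflow

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="event_id:STRING,payload:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```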
This blog describes how the team built a system to swiftly answer advertisers' questions without requiring deep technical expertise or context. Read more about the three goals they had:
- Improving advertiser satisfaction by reducing the time taken to resolve their issues
- Automating the data analysis and generating recommendations for the advertiser to improve the delivery rate
- Covering all system components (indexing, budgeting/pacing, candidate generation, ranking, ad funnel, auction, etc.)
Dive into the challenges they faced, such as data coverage, microservices, and accessibility for non-engineering teams, and the solutions they came up with.
Remember the first part of the article from the DATA Pill 56 edition, where Marcin, Marek and Michał presented their kedro-snowflake plugin that lets you run your Kedro pipelines on the Snowflake Data Cloud in 3 simple steps? The wait for the next part is over. Dive into the instructions on how to implement an end-to-end MLOps platform on top of Snowflake, powered by Kedro and MLflow.
If you are interested in getting started with the AWS CDK in combination with the Go language, then this text is for you. Compare the Python and Go implementations, identify the differences, and see how the same functionality can be implemented in these two programming languages.
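For reference, a minimal CDK v2 stack in Python looks roughly like the sketch below (one illustrative S3 bucket; the stack and bucket names are made up). The Go equivalent builds the same construct tree, using jsii pointer helpers in place of plain Python values:

```python
# Minimal AWS CDK v2 app in Python; the Go version mirrors this structure.
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct


class ExampleStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # One illustrative resource: a versioned S3 bucket.
        s3.Bucket(self, "ExampleBucket", versioned=True)


app = App()
ExampleStack(app, "ExampleStack")
app.synth()
```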
In this one, the AWS team demonstrates how to perform feature engineering using Athena with Apache Iceberg. You can also read about using a CTAS query to create an Apache Iceberg table on Athena from an existing dataset in Apache Parquet format, adding new features to an existing Apache Iceberg table on Athena using the ALTER query, and using UPDATE and MERGE query statements to update the feature values of existing columns.
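As a taste of what those steps look like, here is a hedged sketch of the statement types involved, submitted through boto3's Athena client; the database, tables, and S3 locations are hypothetical placeholders, not the post's exact queries:

```python
# Sketch of the Athena/Iceberg operations the post describes, via boto3.
# Database, tables, and S3 locations are hypothetical placeholders.
import boto3

athena = boto3.client("athena")

def run(query: str) -> None:
    athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "feature_db"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )

# CTAS: create an Iceberg table from an existing Parquet dataset.
run("""
    CREATE TABLE features
    WITH (table_type = 'ICEBERG', is_external = false,
          location = 's3://my-bucket/features/')
    AS SELECT * FROM raw_parquet_dataset
""")

# ALTER: add a new feature column.
run("ALTER TABLE features ADD COLUMNS (clicks_7d int)")

# UPDATE: backfill the new feature's values.
run("UPDATE features SET clicks_7d = 0 WHERE clicks_7d IS NULL")
```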
GPT Engineer generates an entire codebase based on a prompt: specify what you want it to build, the AI asks for clarification, and then builds it. GPT Engineer is made to be easy to adapt and extend, and to let your agent learn how you want your code to look.
Let’s explore what recommendation systems are, how they work, how you can implement them, what benefits recommendations can bring, and how you should measure the performance and business value of recommender systems. At the end, you can read about the GID ML Framework and how it can help you develop your own recommendation systems.
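As one example of the kind of offline measurement the article covers, here is a tiny generic sketch of precision@k, a common relevance metric; it is an illustration only, not the GID ML Framework's API:

```python
# Generic precision@k sketch; not tied to any particular framework.
def precision_at_k(recommended: list, relevant: set, k: int) -> float:
    """Fraction of the top-k recommended items that are actually relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Example: 2 of the top 3 recommendations are relevant -> ~0.67
print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=3))
```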
We are excited to announce the release of Delta Lake 2.4.0 on Apache Spark 3.4. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
The key features in this release include:
- Support for Apache Spark 3.4.
- Support writing Deletion Vectors for the DELETE command. Previously, when deleting rows from a Delta table, any file with at least one matching row would be rewritten. With Deletion Vectors, these expensive rewrites can be avoided. See 'What are deletion vectors?' for more details, and the sketch after this list.
- Support for all write operations on tables with Deletion Vectors enabled.
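For a quick feel of the feature, here is a minimal PySpark sketch; the table name and predicate are illustrative, while delta.enableDeletionVectors is the documented table property:

```python
# Minimal sketch: enable deletion vectors on a Delta table, then DELETE.
# Requires Delta Lake 2.4.0 on Spark 3.4; table and predicate are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("dv-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

spark.sql(
    "ALTER TABLE events SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')"
)

# With deletion vectors enabled, this DELETE marks rows as removed
# instead of rewriting every file that contains a matching row.
spark.sql("DELETE FROM events WHERE user_id = 42")
```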
Microsoft announces Azure OpenAI Service on your data in public preview. Harness the power of OpenAI models like ChatGPT and GPT-4 with your own data. Revolutionize data interaction and analysis, and gain valuable insights. Discover the features, use cases, data sources, and next steps for leveraging Azure OpenAI Service.
Mayo Clinic teams up with Google Cloud to boost patient care. AI-powered tools will make it simpler for doctors to find important info and improve clinical workflows. This collaboration ensures HIPAA compliance for secure data access and informed decision-making.
OpenLineage is a rapidly growing standard for metadata and lineage collection. Column-level lineage, one of the features most anticipated by the community, has been developed recently.
Listen to the talk where Paweł and Maciej:
- show the foundations of column lineage within the OpenLineage standard,
- give a real-life demo of how it is automatically extracted from Spark jobs,
- describe and demo column lineage extraction from SQL queries,
- show how the lineage can be consumed on the Marquez backend.
Their demos focus on the practical aspects of column-level lineage that are interesting to data practitioners all over the world.
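To give a feel for what this looks like on the wire, here is a hedged sketch of the columnLineage dataset facet from the OpenLineage spec, written as a plain Python dict; the dataset and field names are made up:

```python
# Sketch of an OpenLineage columnLineage dataset facet (names are made up).
# Each output field lists the input fields it was derived from.
column_lineage_facet = {
    "columnLineage": {
        "fields": {
            "full_name": {
                "inputFields": [
                    {"namespace": "warehouse", "name": "db.users", "field": "first_name"},
                    {"namespace": "warehouse", "name": "db.users", "field": "last_name"},
                ]
            }
        }
    }
}
```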
Let’s listen to a talk with Jakub, who has over 15 years of experience in the financial industry, especially in using data & analytics at banks. Before joining Commerzbank in 2019, he had been working at mBank and Alior Bank.
In this episode, you can find out:
- How banks use data & analytics
- Types of data that banks analyze, e.g., payments, clickstream, chatbot conversations
- Importance of a personalized approach to every customer, with real-world examples
- Use of AI in the banking industry, e.g., Doc AI, Personalized AI assistants & advisors
and much more.
Learn how trailblazers from some of Poland’s leading companies leverage data to solve their most demanding data challenges, and connect with fellow data leaders from across the country. You will have the opportunity to listen to Marek Wiewiórka’s presentation, From 0 to MLOps with ❄️Snowflake Data Cloud, which relates to the text you could read in the articles section.
Join and learn how to effectively manage your Snowflake costs using advanced techniques from the Data Cloud Deployment Framework. Gain insights into tracking, monitoring, reporting, and receiving alerts to allocate costs across teams and business entities. You will learn how to:
- Utilize DCDF concepts to capture and monitor usage data.
- Leverage chargebacks and showbacks for tracking and allocating cloud infrastructure and service costs.
- Proactively notify your team with native alerts and resource monitors (a minimal sketch follows this list).
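As a flavor of those native alerts and resource monitors, here is a hedged sketch using the Snowflake Python connector; the connection details, quota, and warehouse name are placeholders, and the DCDF-specific techniques are covered in the talk itself:

```python
# Sketch: create a resource monitor and attach it to a warehouse.
# Connection parameters, quota, and names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", role="ACCOUNTADMIN"
)
cur = conn.cursor()

# Notify at 80% of a monthly credit quota, suspend the warehouse at 100%.
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR team_monitor
    WITH CREDIT_QUOTA = 100
         FREQUENCY = MONTHLY
         START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 80 PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND
""")
cur.execute("ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = team_monitor")
```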