A business’s taxes can be difficult to manage, especially in the United States. The Shopify team launched the Tax Insights feature as part of Shopify Tax, which has helped merchants stay more on top of their tax compliance than ever before. The entire product entailed intensive data work behind the scenes. It included modifying several existing data models, creating four new ones, building in functionality to handle dynamically changing data, and publishing insights to a key-value store, from which they are surfaced to the end user. It is a nice case to explore in more depth.
Dive into an overview of some recent breakthroughs in deep learning-based language models in molecular biology. Read about how these advances will converge with the direct training of LLMs on large-scale biomolecular and population health data in the coming years and propel the field forward.
The article includes a brief overview of LLMs and a more detailed introduction to molecular biology, then describes a few recent LLM advances in molecular biology, and finally glances into the future.
This is a must-read if you want to know how to extract data from the Flink UI and plot a Flame Graph from it for offline analysis. This solves the problem of the Flink Flame Graph changing during job execution, or no longer being available after a job terminates.
At Pinterest, Closeup recommendations are an important feed of recommended content shown in pin closeups. They generate the highest number of impressions and play a crucial role in inspiring users. To provide high-quality recommendations, the Closeup relevance team uses advanced machine learning techniques. They have developed deep neural network models that predict user outcomes and incorporate sequential features and personalized blending to create real-time rankings. This blog post covers how the Pinterest team:
- got started on multi-task prediction
- further improved multi-task prediction in their DNN architecture using the Multi-gate Mixture of Experts (MMoE)
- introduced teacher-student regularization to stabilize ranking model predictions
and lots more.
Dataform is a tool that enables cross-team collaboration on SQL-based pipelines. By pairing SQL data transformations with configuration-as-code, data engineers can collectively create an end-to-end workflow within a single repository.
The purpose of this article is to demonstrate how to set up a repeatable and scalable ELT pipeline in Google Cloud using Dataform and Cloud Build. The overall architecture discussed here can be scaled across environments and developed collaboratively by teams, ensuring a streamlined, production-ready setup.
Read about how to use column selection, which is now available to the community on both Airbyte Open Source and Airbyte Cloud.
Databricks announces the preview of a Hive Metastore (HMS) interface for the Databricks Unity Catalog. This feature lets organizations centralize their data management, discovery and governance in the Unity Catalog and connect to it from a wide range of computing platforms. It also ensures consistent data governance across these platforms.
An announcement of Dataform, which lets data teams develop, version control, and deploy SQL pipelines in BigQuery. The tutorial section covers how to use it to set up a repeatable and scalable ELT pipeline in Google Cloud.
Mayo Clinic teams up with Google Cloud to boost patient care. AI-powered tools will make it simpler for doctors to find important information and improve clinical workflows. This collaboration ensures HIPAA compliance for secure data access and informed decision-making.
To grasp what lies ahead requires an understanding of the breakthroughs that have enabled the rise of generative AI, which were decades in the making. ChatGPT, GitHub Copilot, Stable Diffusion, and other generative AI tools that have captured current public attention are the result of significant levels of investment in recent years that have helped advance machine learning and deep learning.
McKinsey created a great report that dives deeper into the potential of generative AI.
What will you find here?
- Generative AI as a technology catalyst.
- Generative AI use cases across functions and industries.
- The generative AI future of work: Impacts on work activities, economic growth and productivity.
- Considerations for businesses and society.
In this podcast, Bob Muglia, an enterprise builder and the author of The Datapreneurs: The Promise of AI and the Creators Building Our Future, answers every question you may have about the current and future state of generative AI.
BTW - here is a nice Snowflake-related position in a very interesting project.
The Summit is aimed at people who use the cloud in their daily work to solve Data Engineering, Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges. If you have something to share with the community in this area - submit your presentation!
Join a deep dive into the revolutionary new world of LLMs, agents, auto-healing code, image generators, personalized tutors and more. Learn how to take advantage of these cutting-edge new tools and how to do it consistently and reliably.
These talks will help you embrace the age of ambient intelligence and start putting these powerful new programs to work for you today.