DATA Pill feed

DATA Pill #064 - Generative AI in Jupyter, dbt & Machine Learning


Innovative Recommendation Applications Using Two Tower Embeddings at Uber | 7 min | AI | Bo Ling, Melissa Barr, Dhruva Dixith Kurra, Chun Zhu, Nicholas Marcott | Uber Engineering Blog
This blog covers what embeddings are, Uber’s architecture, challenges and the thousands of city-wide models they developed before they reached their single, global contextual model.
dbt & Machine Learning? It is possible! | 9 min | ML | Magda Stawirowska | GetInData | Part of Xebia Blog
Learn how to streamline your Data Science process with BigQuery ML, eliminating the need for a dedicated team. Combining dbt and BQML allows you to create models and pipelines directly on the MDP platform, simplifying your workflow and avoiding infrastructure complexities. This tutorial will demonstrate how the dbt_ml package facilitates model creation within dbt, making it feasible and straightforward.
Costwiz: Saving cost for LinkedIn enterprise on Azure | 9 min | Cloud | Deven Walia, Vivek Subramaniam, Simon Desowza, and Karthik Subramanian | LinkedIn Engineering Blog
Cloud services revolutionized infrastructure management, but that ease of use can lead to cost inefficiencies. Costwiz is a unified tool for budget tracking and resource optimization that reduces operating expenses. This blog post shares the team's journey, highlighting the progress, challenges and lessons learned with Costwiz.


How to use Multiple Databricks Workspaces with one dbt Cloud Project | 12 min | Cloud | Lucas Ortiz | Xebia Tech Blog
This guide covers setting up a single dbt Cloud project that uses multiple Databricks Workspaces. The steps include creating and configuring two Databricks Workspaces (development and production), setting up an Azure Service Principal for dbt authentication and creating a single dbt Cloud project with multiple Environments. By leveraging Environment Variables, you can easily switch connections between the different Databricks Workspaces.
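The idea of switching workspaces through an environment variable can be sketched roughly as follows. This is a minimal illustration in Python, not the dbt Cloud configuration from the guide; the variable name `DBT_DATABRICKS_HOST` and both hostnames are hypothetical placeholders.

```python
import os

# Hypothetical variable name and hostnames, used only for illustration.
HOST_VAR = "DBT_DATABRICKS_HOST"
DEFAULT_HOST = "dev-workspace.example.com"


def resolve_workspace_host(env=None):
    """Return the workspace host for the current environment.

    Each environment (development, production) is assumed to set HOST_VAR
    to its own workspace. When the variable is unset, fall back to the
    development host.
    """
    env = os.environ if env is None else env
    return env.get(HOST_VAR, DEFAULT_HOST)


if __name__ == "__main__":
    # No variable set: the development workspace is used.
    print(resolve_workspace_host({}))
    # Production environment overrides the host.
    print(resolve_workspace_host({HOST_VAR: "prod-workspace.example.com"}))
```

The project code stays identical across environments; only the value of the variable differs.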
Use query queues | 7 min | SQL | Google Cloud Blog
Learn how BigQuery's dynamic concurrency automatically handles the number of concurrent queries, queuing them when the maximum target is reached. This guide outlines how to control the maximum concurrency and set queue timeouts for both interactive and batch queries.
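To make the queuing behaviour concrete, here is a toy simulation in Python. It is a sketch under simplifying assumptions, not the BigQuery API: the concurrency target is a fixed number here (BigQuery can compute it dynamically), and the function name `schedule` and its parameters are invented for illustration.

```python
import heapq


def schedule(queries, max_concurrency, queue_timeout):
    """Simulate FIFO admission of queries under a concurrency target.

    queries: list of (name, arrival_time, run_time), sorted by arrival_time.
    Returns {name: start_time}, with None for queries whose wait in the
    queue would exceed queue_timeout.
    """
    # Each heap entry is the time at which one execution slot becomes free.
    slots = [0] * max_concurrency
    heapq.heapify(slots)
    starts = {}
    for name, arrival, run_time in queries:
        free_at = heapq.heappop(slots)
        start = max(arrival, free_at)
        if start - arrival > queue_timeout:
            # The query times out in the queue; the slot stays available.
            starts[name] = None
            heapq.heappush(slots, free_at)
        else:
            starts[name] = start
            heapq.heappush(slots, start + run_time)
    return starts


if __name__ == "__main__":
    workload = [("a", 0, 4), ("b", 1, 10), ("c", 2, 3), ("d", 3, 1)]
    print(schedule(workload, max_concurrency=2, queue_timeout=2))
```

With two slots, the third query waits for a slot to free up, while the fourth waits longer than the timeout and is dropped.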
Generative AI in Jupyter | 8 min | AI | Jason Weill | Jupyter Blog
Read about Jupyter AI, an official subproject of Project Jupyter. It seamlessly connects to large language models from leading providers, offering code generation, error fixing, content summarization and natural language interactions, all while prioritizing responsible AI and data privacy.


Announcing OverflowAI | 5 min | AI | Prashanth Chandrasekar | The Overflow Blog
OverflowAI marks a new era for Stack Overflow, integrating generative AI into Stack Overflow for Teams and introducing exciting product areas like IDE integration. The focus remains on the developer community, emphasizing trust, attribution, and recognition of the contributors' knowledge. Learn about the latest features and products unveiled at WeAreDevelopers, aligned with Stack Overflow's core values and community principles.
Announcing the MLflow AI Gateway | 6 min | AI | Arpit Jasapara, Ben Wilson, Corey Zumar, Harutaka Kawamura, Mingyu Li, Vladimir Kolovski and Zhe Wang | Databricks Blog
Databricks introduces the MLflow 2.5 AI Gateway preview. This scalable API gateway facilitates LLM management for experimentation and production by offering centralized credential and deployment management and standardized interfaces for chat, completions and integrations with various SaaS and open-source LLMs. With the AI Gateway:

  • Organizations can secure their LLMs from development through production
  • Data analysts can safely query LLMs with cost management guardrails
  • Data scientists can seamlessly experiment with a variety of cutting-edge LLMs to build high-quality applications
  • ML Engineers can reuse LLMs across multiple deployments


Designing future factories requires two key elements: a cloud-native approach and a focus on sustainability. AWS enables innovative manufacturers like Northvolt to optimize operations, enhance product quality and support a decarbonized future. They use digital assets, real-time monitoring and machine learning to improve the efficiency of battery production. Serverless and open-source frameworks help them accelerate innovation while minimizing resource usage.


The LLM Battle Begins: Google Bard vs ChatGPT | 25 min | LLM | Francesco Gadaleta | Data Science at Home Podcast
Brace yourselves as we uncover the mind-blowing AI model, Google Bard, poised to challenge ChatGPT and other conversational AI systems. Join us as we explore the revolutionary features of Bard, its cutting-edge architecture, and its ability to generate human-like responses. Discover why AI enthusiasts are buzzing with excitement.
BloombergGPT – an LLM for Finance with David Rosenberg | 36 min | LLM | Host: Sam Charrington; Guest: David Rosenberg | The TWIML AI Podcast
In a conversation with David, the focus is on BloombergGPT, a specialized LLM designed for financial applications. They delve into its architecture, validation, benchmarks, and unique differentiators from other language models. Additionally, David covers evaluation, performance comparisons, progress, future directions and the ethical considerations taken into account during the model's development.


Azure Databricks Lakehouse Labs | Virtual Workshop | 9th August
Join Databricks and Microsoft to learn how to leverage best practices for implementing a complete data analytics, data engineering and data science lifecycle on the lakehouse with Azure Databricks.

This live, virtual hands-on lab will teach you how to:
  • Access all your data — structured, semi-structured, unstructured — with a lakehouse
  • Use Databricks SQL to query and visualize data in your lakehouse
  • Train models and create predictions with Azure Databricks
  • Track experiments and tune hyperparameters with MLflow
  • Deploy and serve models with MLflow and other Azure services
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill