DATA Pill #120 - Just use Postgres, How PyTorch Powers Training and Inference

ARTICLES

Just use Postgres | 7 min | Database Management | Ethan McCue | Personal Blog

When building an application that needs persistent data storage, Postgres should be the default choice thanks to its flexibility and reliability. Alternatives such as SQLite, NoSQL databases, Kafka, and Elasticsearch have their specific uses, but Postgres offers the best balance for most web applications.

How best to use Databricks, Fabric, and Snowflake? | 6 min | Data Engineering | Franco Patano | Personal Blog

Choosing the right tools in the fast-evolving world of cloud data platforms can be overwhelming. Databricks, Snowflake, and Microsoft Fabric each offer distinct advantages but also come with challenges. This post explains how to navigate these options and build a robust, cost-effective data architecture.

TUTORIALS

This blog post explores the concepts and definitions behind streaming databases, compares them with familiar technologies, and focuses on one specific implementation of streaming databases: Materialize.
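The core idea behind a streaming database can be illustrated with a toy sketch: a materialized view that is updated incrementally as each change arrives, instead of being recomputed from scratch on every query. This is not how Materialize is implemented (it is built on Differential Dataflow); the class and function names below are hypothetical and only illustrate the concept for a simple `COUNT(*) ... GROUP BY` view.

```python
from collections import defaultdict


class IncrementalCountView:
    """Toy materialized view for: SELECT key, COUNT(*) FROM events GROUP BY key.

    Hypothetical illustration: rather than rescanning every row on each read,
    the stored result is updated as each change (insert or delete) arrives.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, key, diff):
        # diff is +1 for an inserted row and -1 for a deleted row
        self.counts[key] += diff
        if self.counts[key] == 0:
            del self.counts[key]  # drop groups that no longer have any rows

    def read(self):
        # Reads are cheap because the result is already maintained
        return dict(self.counts)


def batch_count(rows):
    """Reference answer: recompute the whole aggregate from scratch."""
    result = defaultdict(int)
    for key in rows:
        result[key] += 1
    return dict(result)


if __name__ == "__main__":
    view = IncrementalCountView()
    rows = []
    changes = [("eu", +1), ("us", +1), ("eu", +1), ("us", -1)]
    for key, diff in changes:
        view.apply(key, diff)
        if diff > 0:
            rows.append(key)
        else:
            rows.remove(key)
    print(view.read())        # {'eu': 2}
    print(batch_count(rows))  # {'eu': 2}
```

Both paths produce the same answer; the difference is that the incremental view does a constant amount of work per change, while the batch query rescans all rows on every read.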

Run Large Language Models locally with Ollama and Open WebUI | 7 min | LLM | Timo Uelen | Xebia Blog

This tutorial explains how to run LLMs locally with Ollama and Open WebUI, without relying on cloud-based services. It covers the installation steps, how to interact with models such as Gemma 2, and the features of Open WebUI, which provides a browser-based chat interface for models running on your own machine.
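Besides Open WebUI, a locally running Ollama server can also be called over its REST API. The sketch below uses only the Python standard library and assumes Ollama is listening on its default port (11434) and that a model tagged `gemma2` has already been pulled; the helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama runs locally on its default port and the model
# was pulled beforehand (e.g. `ollama pull gemma2`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="gemma2"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gemma2"):
    """Send the prompt and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Explain what a vector database is in one sentence.")` returns the model's answer as a string. Setting `"stream": False` asks for a single JSON object instead of a stream of partial responses, which keeps the client code minimal.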

Implementing Model Versioning in dbt | 8 min | Data Governance | Andy Sawyer | Personal Blog

This article introduces model versioning in dbt, walks through a code-based example of implementing it, and explains how versioning helps align data engineering efforts with business objectives.

DATA TUBE

How PyTorch Powers Training and Inference | 23 min | LLM | Wanchao Liang, Kimish Patel, and Evan Smothers | @Scale

Let’s dive into the significance of memory-efficient fine-tuning and share key architectural and algorithmic strategies that make fine-tuning possible on consumer-grade hardware. You'll also hear about the latest PyTorch advancements for LLMs and developments that improve every stage of the LLM lifecycle.
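Some back-of-the-envelope arithmetic shows why memory efficiency decides whether fine-tuning fits on consumer hardware. The sketch below relies on common rules of thumb rather than figures from the talk: a parameter trained with Adam costs about 16 bytes (weights, gradients, and two optimizer moments at 4 bytes each), while a frozen parameter stored in bf16 costs 2 bytes. Training only a small fraction of parameters, as adapter methods such as LoRA do, is used here as one example technique; the function name and the 7B-parameter model are hypothetical.

```python
def training_memory_gb(n_params, trainable_fraction=1.0,
                       frozen_bytes=2, trainable_bytes=16):
    """Rough lower bound on memory for weights, gradients and optimizer state.

    Assumptions (rules of thumb, not measurements):
    - each trainable parameter costs ~16 bytes with Adam
      (weights, gradients and two moment estimates at 4 bytes each);
    - frozen parameters are only stored, e.g. 2 bytes each in bf16;
    - activations, KV-cache and framework overhead are ignored.
    """
    trainable = n_params * trainable_fraction
    frozen = n_params - trainable
    total_bytes = trainable * trainable_bytes + frozen * frozen_bytes
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical 7B-parameter model
    print(round(training_memory_gb(7e9), 1))                           # 112.0
    print(round(training_memory_gb(7e9, trainable_fraction=0.01), 1))  # 15.0
```

Full fine-tuning of such a model needs well over 100 GB before activations are even counted, whereas freezing the base weights and training about 1% of the parameters brings the estimate down to roughly 15 GB, within reach of a single high-end consumer GPU once further savings such as quantization are applied.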

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation | 5 h 46 min | LLM | Umar Jamil

An in-depth coding session focused on building the PaliGemma Vision Language Model from scratch using Python and PyTorch, covering key concepts such as Transformers, Vision Transformers, and Contrastive Learning. The session also delves into advanced topics like Multi-Head Attention, Rotary Positional Embedding, and KV-Cache, with detailed visual representations to aid comprehension.
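Of the topics listed, the KV-cache is the easiest to show in isolation. The pure-Python sketch below is not code from the video or from PaliGemma: it uses a single attention head, hypothetical hand-written projection matrices, and plain lists instead of PyTorch tensors. It compares autoregressive decoding that recomputes keys and values for the whole prefix at every step with decoding that projects only the newest token and reuses cached keys and values.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def decode_without_cache(xs, wq, wk, wv):
    """Re-project keys/values for every token in the prefix at each step."""
    outputs = []
    for t in range(len(xs)):
        keys = [matvec(wk, x) for x in xs[: t + 1]]
        values = [matvec(wv, x) for x in xs[: t + 1]]
        outputs.append(attend(matvec(wq, xs[t]), keys, values))
    return outputs


def decode_with_cache(xs, wq, wk, wv):
    """Project only the newest token; earlier keys/values come from the cache."""
    k_cache, v_cache, outputs = [], [], []
    for x in xs:
        k_cache.append(matvec(wk, x))
        v_cache.append(matvec(wv, x))
        outputs.append(attend(matvec(wq, x), k_cache, v_cache))
    return outputs
```

Both functions return the same outputs (causal attention means earlier keys and values never change), but the uncached version performs a number of key/value projections that grows quadratically with sequence length, while the cached version performs one per token at the cost of storing the cache in memory.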

CONFS, EVENTS AND MEETUPS

Open Source Real-Time Data Warehouse & Real-Time Analytics | Zürich | 5th September

Calling all data enthusiasts and ClickHouse fans: the next ClickHouse meetup is coming to Zürich! Expect fascinating data stories, great conversations, and a few surprises.

Airflow Summit 2024 | San Francisco | 10th-12th September

Airflow Summit is the annual conference for the worldwide community of Apache Airflow users and contributors. Get ready to celebrate the 10th anniversary of Airflow!