DATA Pill feed

DATA Pill #090 - Understanding unstructured data, taking RAG pipelines to the next level

ARTICLES

Part 1: Extract Structured Data from Unstructured Text using LLMs | 5 min | LLM | Ingrid Stevens | Personal Blog
Part 1 focuses on extracting structured data from unstructured text using LLMs.
Part 2: Analyze Structured Data (extracted from Unstructured Text) using LLM Agents | 3 min | LLM | Ingrid Stevens | Personal Blog
This part focuses on analyzing structured data extracted from unstructured text with a LangChain agent.
Building LLM Platform for Your Organisation – Step 2 Platforming | 7 min | LLM | Abiodun Ekundayo | MLOps Community Blog
Abiodun outlines several key factors crucial for successful enterprise-level deployment. These elements are essential for teams seeking to integrate Knowledge Assistants (KAs) effectively in enterprise environments.
Uber: GC Tuning for Improved Presto Reliability | 8 min | Software Engineering | Cristian Velazquez, Vineeth Karayil Sekharan | Uber Engineering Blog
Read how Uber made Presto more reliable by tuning garbage collection. The change reduced system crashes and made queries run more smoothly. The team outlines what they did and how it affected Presto's performance.
Building scalable data pipelines with Kedro and Ibis | 10 min | Data Engineering | Deepyaman Datta | Kedro Blog
The guide dives into using Kedro and Ibis for building data pipelines that easily adapt from development to production, combining Python and SQL for a seamless transition. It simplifies and streamlines the process of scaling data pipelines.

TUTORIAL

How to Fine-Tune LLMs in 2024 with Hugging Face | 12 min | LLM | Philipp Schmid | Personal Blog
This blog post walks you through how to fine-tune open LLMs using Hugging Face TRL, Transformers & datasets in 2024. In the blog, we are going to:
  1. Define our use case
  2. Set up the development environment
  3. Create and prepare the dataset
  4. Fine-tune LLM using trl and the SFTTrainer
  5. Test and evaluate the LLM
  6. Deploy the LLM for production

TOOLS

Taking your RAG pipelines to a next level ! LangGraphs | 5 min | LLM | Ahmed Abdullah | Tensor Labs
The article shows how LangGraph extends LangChain's capabilities, turning linear RAG pipelines into dynamic, iterative processes. This addition allows for more sophisticated decision-making and problem-solving, akin to adding multidimensional thinking to AI applications.
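
To make the difference between a linear and an iterative pipeline concrete, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph or LangChain APIs. The documents, the keyword-overlap `retrieve` function, and the `REWRITES` table (a stand-in for an LLM-driven query rewriter) are all hypothetical and exist only for illustration.

```python
# Toy illustration only: plain Python, not the LangGraph API.
DOCS = [
    "Presto garbage collection tuning reduces out-of-memory crashes",
    "Kedro and Ibis build pipelines in Python and SQL",
    "LangGraph adds cycles to LangChain pipelines",
]

# Hypothetical rewrite table standing in for an LLM that rephrases queries.
REWRITES = {"gc": "garbage collection"}


def retrieve(query):
    """Return documents that share at least one lowercase word with the query."""
    words = set(query.lower().split())
    return [doc for doc in DOCS if words & set(doc.lower().split())]


def rewrite(query):
    """Expand known abbreviations word by word; unknown words pass through."""
    return " ".join(REWRITES.get(word, word) for word in query.lower().split())


def linear_rag(query):
    """Linear pipeline: retrieve once and stop, even if nothing was found."""
    return retrieve(query)


def iterative_rag(query, max_steps=3):
    """Iterative pipeline: retrieve, check the result, rewrite and retry if empty."""
    steps = 0
    while steps < max_steps:
        steps += 1
        hits = retrieve(query)
        if hits:
            return hits, steps
        rewritten = rewrite(query)
        if rewritten == query:
            break  # the rewriter has nothing new to offer, so give up
        query = rewritten
    return [], steps


print(linear_rag("gc"))      # the single pass finds nothing
print(iterative_rag("gc"))   # the loop rewrites the query and retries
```

In this sketch the linear version returns an empty list for the query "gc", while the loop finds the Presto document on its second pass. A graph-based framework expresses the same idea as nodes (retrieve, grade, rewrite) connected by conditional edges rather than a hand-written `while` loop.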

DATA TUBE

The Secret to Getting Ahead with AI in 2024 | 29 min | AI | Harald Walden, Patrik Liu Tran | Brite Payments
Brite CTO Harald Walden interviews CEO and Co-Founder of Validio, Patrik Liu Tran, about the most impactful AI trends of 2024.

This webinar explores:

  • What generative AI requires to work effectively
  • How ensuring data quality is essential
  • Dark data – finding a use for all the data you already have
  • How companies can use AI in an optimal, resource-friendly way
  • Plus, much more…

PODCAST

Sam Partee on Retrieval Augmented Generation (RAG) | 32 min | RAG | Roland Meertens, Sam Partee | InfoQ Podcast
In this podcast, Sam shares his insights on Redis' vector database offering, different approaches to embeddings, how to enhance large language models by adding a search component for retrieval augmented generation, and the use of hybrid search in Redis.
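
As a rough illustration of what hybrid search means, the sketch below blends a vector-similarity score with a keyword-overlap score. It is plain Python and does not use the Redis API. The two-dimensional vectors, the `alpha` weight, and all function names are assumptions made for this example, and real systems typically use learned embeddings and a ranking function such as BM25 for the keyword side.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = [
        (
            alpha * cosine(query_vec, doc["vec"])
            + (1 - alpha) * keyword_score(query, doc["text"]),
            doc["text"],
        )
        for doc in docs
    ]
    return sorted(scored, reverse=True)


# Hand-made toy vectors; a real pipeline would get these from an embedding model.
DOCS = [
    {"text": "redis vector search", "vec": [1.0, 0.0]},
    {"text": "garbage collection tuning", "vec": [0.0, 1.0]},
]

results = hybrid_search("vector search", [1.0, 0.0], DOCS)
print(results[0][1])  # best-scoring document text
```

Setting `alpha` to 1.0 gives pure vector search and 0.0 gives pure keyword search; values in between trade off semantic similarity against exact term matches.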
Navigating the MLOps landscape: Insights from Valohai’s CEO | 30 min | MLOps | Simba Khadder, Eero Laaksonen | MLOps Weekly Podcast
Listen to a conversation about how MLOps platforms work. It covers the practical side of machine learning, the different types of tools available, and how large language models are changing MLOps.

CONFS, EVENTS AND MEETUPS

AI & LLMs for Engineering Operations | Online Event | 15th February
The webinar will cover LLM applications in natural language processing, machine translation, text summarization, sentiment analysis, and more.

Attending the webinar offers:

  1. Deep insights into LLMs for data challenges in key sectors.
  2. Knowledge of enhancing data analytics and efficiency with LLMs.
  3. Tips for LLM integration in data systems.
  4. Examples of LLM success in field services.
  5. Understanding the benefits of developing your own LLMs.
Certified Data Science with Python | Online Event or Amsterdam | Several dates
The Data Science with Python Foundation course covers training models with scikit-learn and best practices for transforming your data with pandas, with a perfect combination of theory and practice.

Key takeaways:
  • Data Wrangling with Pandas
  • Machine Learning with Scikit-Learn
  • Machine Learning Theory
________________________
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill