Retrieval Augmented Generation (RAG) is a powerful NLP technique that combines large language models with selective access to external knowledge. It helps reduce LLM hallucinations by supplying the relevant pieces of context from our own documents. The idea of this article is to show how you can build a RAG system using a locally running LLM, which techniques can be used to improve it, and, finally, how to track the experiments and compare results in W&B.
We will cover the following key aspects:
- Building a baseline local RAG system using Mistral-7b and LlamaIndex.
- Evaluating its performance in terms of faithfulness and relevancy.
- Tracking experiments end-to-end using Weights & Biases (W&B).
- Implementing advanced RAG techniques, such as hierarchical nodes and re-ranking.
The complete notebook, including detailed comments and the full code, is available on GitHub.
First, install the LlamaIndex library. We’ll start by setting up the environment and loading the documents for our experiments. LlamaIndex supports a variety of custom data loaders, allowing for flexible data integration.
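A typical install, assuming the legacy llama_index 0.x API used in the snippets below (the PDF loader also relies on pypdf under the hood):

# Install LlamaIndex and pypdf (used by the PDF loader)
pip install llama-index pypdf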
# Loading the PDFReader from llama_index
from pathlib import Path
from llama_index import VectorStoreIndex, download_loader

# Initialise the custom loader
PDFReader = download_loader("PDFReader")
loader = PDFReader()

# Read the PDF file
documents = loader.load_data(file=Path("./Mixtral.pdf"))
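To preview how the imported VectorStoreIndex comes into play, the loaded documents can already be turned into a queryable index. The snippet below is a minimal sketch that relies on LlamaIndex defaults (which call out to OpenAI for embeddings and generation); the following sections swap these defaults for a fully local setup.

# Minimal sketch: build an in-memory vector index over the loaded documents
index = VectorStoreIndex.from_documents(documents)

# Ask a question against the indexed PDF (uses default models unless configured otherwise)
query_engine = index.as_query_engine()
response = query_engine.query("What is Mixtral?")
print(response)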
Now we can set up our LLM. Since I am using a MacBook with an M1 chip, llama.cpp is extremely useful: it works natively with both Metal and CUDA and allows running LLMs with limited RAM. To install it, you can refer to the official repo or run the install command for the Python bindings.
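A typical command on Apple Silicon, assuming the llama-cpp-python bindings with Metal support (check the llama.cpp and llama-cpp-python repos for the flags that match your hardware and version):

# Build llama-cpp-python with Metal acceleration (Apple Silicon);
# on CUDA machines the flag differs, e.g. -DLLAMA_CUBLAS=on for older versions
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python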