The Power of Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) combines the strengths of LLMs and external knowledge bases. This allows for more accurate, factually grounded responses, reduces the risk of the LLM fabricating information (hallucinations), and enables the integration of domain-specific data.
However, standard RAG implementations have limitations. This tutorial introduces Agentic RAG, a method to overcome these drawbacks by using agents that reformulate queries and self-query to improve retrieval performance.
Limitations of Standard (Vanilla) RAG
Standard RAG often performs only a single retrieval step, limiting its ability to adapt if the initial retrieval results are suboptimal.
Semantic similarity is computed against the user's original query, which can be ineffective: the question's wording often differs from the wording in the knowledge base, so relevant documents are missed.
Introducing Agentic RAG
Agentic RAG addresses these limitations by introducing an agent equipped with a retriever tool.
The agent formulates its own queries to retrieve information, potentially matching documents more closely than the initial user query. The agent can also analyze retrieved snippets and re-retrieve if necessary.
Building Your Agentic RAG System
This section outlines the steps to build an Agentic RAG system. You will need to load a knowledge base (e.g., documentation pages stored as Markdown files) and process the data for storage in a vector database. This example uses LangChain and thenlper/gte-small for embeddings.
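The article's actual pipeline uses LangChain with thenlper/gte-small embeddings, but the processing steps can be sketched in a self-contained way. The toy bag-of-words "embedding" and in-memory store below are illustrative stand-ins for the real embedding model and vector database; only the structure (chunk, embed, store, search by similarity) mirrors the real setup.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": a word-count vector.
    # Stands in for a real model such as thenlper/gte-small.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class VectorStore:
    """Minimal in-memory vector store over text chunks."""
    def __init__(self, docs, chunk_size=200):
        # Split each document into fixed-size character chunks.
        self.chunks = [doc[i:i + chunk_size]
                       for doc in docs
                       for i in range(0, len(doc), chunk_size)]
        self.vectors = [embed(c) for c in self.chunks]

    def search(self, query, k=3):
        # Rank all chunks by similarity to the query and return the top k.
        q = embed(query)
        ranked = sorted(zip(self.chunks, self.vectors),
                        key=lambda cv: cosine(q, cv[1]), reverse=True)
        return [c for c, _ in ranked[:k]]

store = VectorStore(["Transformers models can be loaded with from_pretrained.",
                     "Tokenizers convert text into input ids for the model."])
print(store.search("How can a model be loaded?", k=1))
```

In the real system, a production splitter (e.g. a recursive character splitter) and a dense embedding model replace the character chunks and word counts, but the retrieval interface is the same.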
The core of the Agentic RAG system is the agent, which is initialized with tools (like a retriever) and an LLM (e.g., CohereForAI/c4ai-command-r-plus).
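A retriever tool gives the agent access to the knowledge base. The sketch below is illustrative, not the exact transformers.agents API: the tool's name, description, and keyword-overlap store are assumptions, but the shape (a named, described callable the agent can invoke with its own query) matches the article's setup.

```python
class KeywordStore:
    """Stand-in for a real vector store: ranks chunks by shared words."""
    def __init__(self, chunks):
        self.chunks = chunks

    def search(self, query, k=3):
        qwords = set(query.lower().split())
        return sorted(self.chunks,
                      key=lambda c: len(qwords & set(c.lower().split())),
                      reverse=True)[:k]

class RetrieverTool:
    """Wraps the store as a tool the agent can call.
    The name and description tell the LLM when to use it."""
    name = "retriever"
    description = ("Retrieves documents from the knowledge base that are "
                   "semantically closest to the query.")

    def __init__(self, store):
        self.store = store

    def __call__(self, query: str) -> str:
        # Return retrieved snippets as one formatted string for the LLM.
        docs = self.store.search(query, k=3)
        return "\n".join(f"=== Document {i} ===\n{d}"
                         for i, d in enumerate(docs))

tool = RetrieverTool(KeywordStore(["Use from_pretrained to load a model.",
                                   "Tokenizers split text."]))
print(tool("load a model"))
```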
“Agentic RAG significantly enhances RAG performance with a simple setup, leading to more accurate and reliable information retrieval.”
Aymeric Roucher
Key Takeaways
Explore the Benefits of Agentic RAG
Query Reformulation
Agentic RAG intelligently reformulates queries for improved retrieval accuracy.
Self-Querying
The agent analyzes and re-queries based on initial results.
Enhanced Accuracy
Agentic RAG leads to better factual grounding and reduces hallucinations.
Agent Setup & LLM Choice
The agent is initialized with tools and an LLM. A suitable LLM must accept a list of messages, return text, and support a 'stop' argument to halt generation at specified sequences. CohereForAI/c4ai-command-r-plus is a good choice thanks to its 128k-token context window and availability on Hugging Face's Inference API.
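The LLM engine contract can be shown with a stub. The real engine calls the Inference API (e.g. CohereForAI/c4ai-command-r-plus) and stopping is applied server-side; the DummyLLMEngine below is an assumption-laden illustration of the interface only: messages in, text out, stop sequences honored.

```python
def apply_stop(text, stop_sequences):
    # Truncate generated text at the first occurrence of any stop sequence.
    for s in stop_sequences:
        idx = text.find(s)
        if idx != -1:
            text = text[:idx]
    return text

class DummyLLMEngine:
    """Illustrative engine: takes a list of {role, content} messages,
    returns text, and honors a stop argument (the contract a real
    agent framework expects)."""
    def __init__(self, canned_reply):
        self.canned_reply = canned_reply

    def __call__(self, messages, stop_sequences=()):
        # A real engine would send `messages` to the model here.
        assert isinstance(messages, list) and all("role" in m for m in messages)
        return apply_stop(self.canned_reply, stop_sequences)

engine = DummyLLMEngine("Thought: I should search.\nObservation:")
print(engine([{"role": "user", "content": "hi"}],
             stop_sequences=["Observation:"]))
```

The stop sequences matter because the agent must cut the model off before it hallucinates its own tool observations.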
The agent uses a default system prompt to process the information step-by-step, including tool calls in JSON format. It calls the LLM, parses tool calls, and executes them in a loop.
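That call-parse-execute loop can be sketched minimally. This is not the transformers.agents implementation; it is a bare ReAct-style loop under assumed names (`run_agent`, a `final_answer` pseudo-tool) showing how JSON tool calls drive the iteration.

```python
import json

def run_agent(llm, tools, question, max_iterations=5):
    """Minimal agent loop: the LLM emits a JSON tool call, the agent
    executes it and feeds the observation back, until a final_answer
    call ends the loop."""
    messages = [{"role": "system",
                 "content": 'Answer using tools. Reply with JSON: '
                            '{"tool": ..., "input": ...}'},
                {"role": "user", "content": question}]
    for _ in range(max_iterations):
        reply = llm(messages)
        call = json.loads(reply)              # parse the JSON tool call
        if call["tool"] == "final_answer":    # terminal pseudo-tool
            return call["input"]
        observation = tools[call["tool"]](call["input"])  # execute it
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",
                         "content": f"Observation: {observation}"})
    return None

# Scripted replies stand in for a real LLM.
replies = iter(['{"tool": "retriever", "input": "load a model"}',
                '{"tool": "final_answer", "input": "Use from_pretrained."}'])
tools = {"retriever": lambda q: "Use from_pretrained to load a model."}
print(run_agent(lambda m: next(replies), tools, "How do I load a model?"))
# prints: Use from_pretrained.
```

Note that the first scripted call reformulates the user's question ("How do I load a model?") into a retrieval-friendly query ("load a model"), which is exactly the behavior Agentic RAG adds.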
Agentic RAG vs. Standard RAG
The agentic setup is compared against a standard RAG system using an LLM-as-a-judge evaluation, with Meta-Llama-3-70B-Instruct as the judge model.
Evaluation and Results
The evaluation follows LLM-judge best practices: a small Likert scale, clear grading criteria, and a description of what each score means.
Agentic RAG improves the score by 8.5 percentage points over standard RAG (from 70.0% to 78.5%).