RAG Explained: How to Give Your LLM a Brain Upgrade

TL;DR: RAG (Retrieval-Augmented Generation) stops LLMs from making stuff up. It works by first searching a private database for facts (Retrieval) and then pasting those facts into the prompt for the LLM to use (Augmented Generation). It's like giving your AI an open-book exam.
1. The Problem: Why LLMs Need Help
Standard LLMs have two major flaws:
- Hallucinations: They confidently invent facts because they are just predicting the next likely word, not querying a knowledge base.
- Knowledge Cutoff: Their knowledge is frozen at the time of training. They don't know about recent events or your company's private data.
Example (Without RAG):
You: "What were our company's Q3 earnings?" LLM: "As a large language model, I don't have access to your private financial data..." (Useless).
2. The Solution: Retrieval-Augmented Generation (RAG)
The "No-Jargon" Explanation: Imagine an Open-Book Exam.
- Standard LLM: A student taking a closed-book exam, relying only on what they memorized months ago.
- RAG LLM: A student who can bring the textbook to the exam. Before answering a question, they look up the relevant page.
RAG connects the LLM to a live, external knowledge source.
The RAG Workflow
It's a two-step process: Retrieval, then Generation.
Step 1: Retrieval (Find the Textbook Page)
- User asks a question: "How do I reset my password?"
- Embed the question: Turn the question into a vector (a list of numbers) that represents its meaning.
- Vector Search: Search your private database (e.g., company wiki, PDFs) to find text chunks with similar vectors.
- Retrieve: Pull the top 3-5 most relevant chunks; see the code sketch just below.
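Here is a minimal sketch of Step 1 in Python. It assumes the open-source sentence-transformers package as the embedding model (the model name is one common choice, not a requirement); any embedding API would slot in the same way:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding library

model = SentenceTransformer("all-MiniLM-L6-v2")  # one common embedding model

# Your private knowledge base, already split into chunks.
chunks = [
    "To reset your password, go to Settings > Security > Reset Password.",
    "The company holiday schedule is published every January.",
]
chunk_vecs = model.encode(chunks)              # embed every chunk once, up front

def retrieve(question: str, top_k: int = 3) -> list[str]:
    q_vec = model.encode([question])[0]        # embed the question the same way
    # Cosine similarity between the question and every stored chunk.
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    best = np.argsort(sims)[::-1][:top_k]      # indices of the most similar chunks
    return [chunks[i] for i in best]

print(retrieve("How do I reset my password?", top_k=1))
# Expected: the password-reset chunk ranks first.
```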
Step 2: Augmentation & Generation (Answer the Question)
- Build a new prompt: Combine the original question with the facts you just found. For example:

  Context: "To reset your password, go to Settings > Security > Reset Password."
  Question: "How do I reset my password?"
  Answer based only on the context above:

- Send to LLM: The LLM uses the provided context to generate a factual, grounded answer (sketched in code below).
3. Deep Dive: How Vector Search Works
How do we "search for meaning"? We use Vector Embeddings and Cosine Similarity.
The Concept
- Indexing: We use an embedding model (like text-embedding-ada-002) to convert every paragraph of our documents into a high-dimensional vector.
- Storage: We store these vectors in a specialized Vector Database (e.g., Pinecone, Chroma, Weaviate).
- Querying: When a user asks a question, we embed their query into the same vector space.
- Similarity Search: We find the vectors in the database that are "closest" to the query vector; a concrete sketch with Chroma follows.
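As a concrete example, here is the index-store-query loop with Chroma (one of the vector databases named above). This sketch assumes Chroma's bundled default embedding function rather than a separate embedding model:

```python
import chromadb

client = chromadb.Client()                     # in-memory vector database
collection = client.create_collection("company_wiki")

# Indexing + storage: Chroma embeds each document with its default
# embedding function and stores the resulting vectors.
collection.add(
    ids=["doc-a", "doc-b"],
    documents=[
        "Password reset guide: Settings > Security > Reset Password.",
        "Company holiday schedule for this year.",
    ],
)

# Querying + similarity search: the question is embedded into the same
# vector space, and the closest stored vectors are returned.
results = collection.query(query_texts=["How to reset password"], n_results=1)
print(results["documents"])                    # the password-reset chunk
```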
The Math: Cosine Similarity
Instead of measuring distance, we measure the angle between vectors. A smaller angle means a more similar meaning.
$$ \text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \|B\|} $$
- Result = 1: Vectors point in the same direction (Identical meaning).
- Result = 0: Vectors are perpendicular (Unrelated).
- Result = -1: Vectors point in opposite directions (Opposite meaning).
Toy Example:
| Text | Vector (Simplified 2D) |
|---|---|
| "How to reset password" (Query) | [0.9, 0.1] |
| "Password reset guide" (Doc A) | [0.8, 0.2] |
| "Company holiday schedule" (Doc B) | [-0.1, 0.9] |
The angle between the Query and Doc A is very small (high similarity). The angle between the Query and Doc B is large (low similarity). The system retrieves Doc A.
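You can check these numbers yourself; a few lines of Python reproduce the toy example:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product of the vectors divided by the product of their lengths.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1])   # "How to reset password"
doc_a = np.array([0.8, 0.2])   # "Password reset guide"
doc_b = np.array([-0.1, 0.9])  # "Company holiday schedule"

print(cosine_sim(query, doc_a))  # ~0.99 -> nearly identical meaning
print(cosine_sim(query, doc_b))  # 0.0  -> perpendicular, i.e. unrelated
```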
4. RAG vs. Fine-Tuning: What's the Difference?
| Feature | RAG | Fine-Tuning |
|---|---|---|
| Goal | Injecting knowledge | Teaching a skill/style |
| How | Adds a database | Updates model weights |
| Data | Easy to add/delete | Requires retraining |
| Cost | Cheap (API calls) | Expensive (GPU time) |
| Use Case | "Answer questions from this PDF" | "Act like a sarcastic pirate" |
Summary & Key Takeaways
- RAG = Retrieval (Search) + Generation (Answer).
- It solves Hallucinations and Knowledge Cutoff by grounding the LLM in facts.
- The core technology is Vector Search (using Embeddings and Cosine Similarity).
- Use RAG to inject knowledge. Use Fine-Tuning to change behavior.
Practice Quiz: Test Your Knowledge
Scenario 1: You want to build a chatbot that can answer questions about your company's internal, private documents. Which is the best approach?
- A) Hope the LLM already knows the data.
- B) Use RAG to connect the LLM to a vector database of your documents.
- C) Ask the user to copy-paste the documents into the chat.
Question 2: What is the primary mathematical operation used in the "Retrieval" step of RAG to find relevant documents?
- A) Matrix Multiplication
- B) Cosine Similarity
- C) Standard Deviation
Scenario 3: You want to make an LLM write in the style of Shakespeare. What is the best approach?
- A) RAG with a database of Shakespeare's plays.
- B) Fine-Tuning the model on a dataset of Shakespeare's works.
- C) Prompting "Please act like Shakespeare."
(Answers: 1-B, 2-B, 3-B)
