Retrieval-Augmented Generation (RAG)
An AI architecture that combines a language model with a retrieval system, grounding responses in specific documents rather than relying solely on training knowledge.
What Is RAG?
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances a language model's responses by first retrieving relevant information from an external knowledge source — a document library, database, or knowledge base — and then using that retrieved content to generate a more accurate, grounded response.
Instead of relying entirely on knowledge encoded during training, a RAG system retrieves relevant context dynamically at inference time and provides it to the LLM as additional input.
How RAG Works
- User query: A user asks a question
- Retrieval: A search component (often semantic search using vector embeddings) finds the most relevant documents from a knowledge base
- Augmentation: The retrieved documents are added to the LLM's context (the prompt)
- Generation: The LLM generates a response grounded in the retrieved content rather than guessing
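The four steps above can be sketched end to end. This is a minimal illustration using standard-library Python only: the "embedding" is a toy bag-of-words vector and the knowledge base, document texts, and function names are all invented for the example; a production system would use learned vector embeddings, a vector database, and a real LLM call in the final step.

```python
import math
from collections import Counter

# Toy knowledge base. In production these would be document chunks
# indexed in a vector store.
KNOWLEDGE_BASE = [
    "Employees accrue 25 days of annual leave per year.",
    "The VPN must be used when accessing internal systems remotely.",
    "Expense reports are due by the 5th of the following month.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts.
    Real systems use learned dense vector embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Step 2 (Retrieval): rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Step 3 (Augmentation): add retrieved documents to the prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Step 4 (Generation) would send this prompt to an LLM;
# here we just build the augmented prompt.
query = "How many days of annual leave do employees get?"
prompt = build_prompt(query, retrieve(query))
```

The grounding happens in `build_prompt`: the model is instructed to answer from the retrieved context, so the leave-policy document, not the model's training data, determines the answer.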
Why RAG Matters
Reduces hallucination: By grounding responses in retrieved documents, RAG significantly reduces the LLM's tendency to fabricate information.
Knowledge is updatable: Unlike fine-tuning, the knowledge base can be updated without retraining the model. Add a new policy document today; the AI answers questions about it tomorrow.
Auditability: Responses can be traced back to source documents, enabling verification and citations.
Domain specialisation: Organisations can build AI assistants grounded in their own internal knowledge — policies, contracts, support documentation, product manuals.
RAG Security Considerations
Document access control: The RAG knowledge base may contain sensitive documents. The retrieval system must enforce appropriate access controls — a user querying the AI should only retrieve documents they're authorised to access.
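One common pattern is to enforce the access check inside the retrieval step itself, so unauthorised documents never reach the LLM's context. A minimal sketch, with an invented group-based ACL on each document (the `Doc` structure, group names, and document texts are illustrative, not a standard API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    text: str
    allowed_groups: frozenset  # groups permitted to read this document

KB = [
    Doc("Q3 salary bands for the engineering team.", frozenset({"hr"})),
    Doc("How to reset your VPN password.", frozenset({"hr", "staff"})),
]

def retrieve_for_user(query: str, user_groups: set, kb=KB) -> list[str]:
    # Filter BEFORE ranking: a document the user cannot read never enters
    # the candidate set, so it cannot leak into the generated answer.
    visible = [d for d in kb if d.allowed_groups & user_groups]
    # (Similarity ranking over `visible` omitted for brevity.)
    return [d.text for d in visible]
```

Filtering before ranking matters: filtering the LLM's *output* instead would mean sensitive text had already entered the model's context, where it could be paraphrased or leaked.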
Indirect prompt injection via documents: Malicious instructions embedded in documents in the knowledge base can be retrieved and followed by the LLM, enabling indirect prompt injection attacks.
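A simple mitigation is to screen retrieved text for instruction-like phrasing before it is added to the prompt. The heuristic below is a sketch only: injection phrasing varies endlessly, so a pattern match like this flags obvious cases but is not a complete defence, and the specific patterns are illustrative.

```python
import re

# Phrases that suggest a document is trying to address the model directly.
# Illustrative patterns only -- real attacks will not match a fixed list.
SUSPICIOUS = re.compile(
    r"\b(ignore (all|previous|above) instructions|you are now|system prompt)\b",
    re.IGNORECASE,
)

def flag_suspicious(doc: str) -> bool:
    """Return True if the document contains instruction-like phrasing."""
    return bool(SUSPICIOUS.search(doc))
```

Flagged documents can be excluded from the context, quarantined for review, or wrapped in delimiters with an explicit instruction that the enclosed text is data, not commands.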
Data classification: Sensitive documents in the knowledge base must be classified and handled appropriately — an AI assistant shouldn't surface confidential HR records in response to general queries.
Document integrity: The knowledge base must be protected from unauthorised modification — an attacker who can inject documents could manipulate AI responses.
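One way to detect unauthorised modification is to record a content hash for each document at ingestion time and verify it before the document is served to the LLM. A minimal sketch using the standard-library `hashlib`; the manifest structure and document name are invented for the example:

```python
import hashlib

def sha256(text: str) -> str:
    """Hex digest of the document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# At ingestion time, record hashes from a trusted copy of each document.
MANIFEST = {
    "leave-policy.md": sha256("Employees accrue 25 days of annual leave per year."),
}

def verify(name: str, current_text: str) -> bool:
    """Reject documents whose content no longer matches the trusted hash."""
    return MANIFEST.get(name) == sha256(current_text)
```

This catches tampering with existing documents; preventing an attacker from *injecting* new documents still requires write access controls on the knowledge base itself.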