Retrieval-Augmented Generation (RAG)
An AI architecture that combines a language model with a retrieval system, grounding responses in specific documents rather than relying solely on training knowledge.
What Is RAG?
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances a language model's responses by first retrieving relevant information from an external knowledge source — a document library, database, or knowledge base — and then using that retrieved content to generate a more accurate, grounded response.
Instead of relying entirely on knowledge encoded during training, a RAG system retrieves relevant context dynamically at inference time and provides it to the LLM as additional input.
How RAG Works
- User query: A user asks a question
- Retrieval: A search component (often semantic search using vector embeddings) finds the most relevant documents from a knowledge base
- Augmentation: The retrieved documents are added to the LLM's context (the prompt)
- Generation: The LLM generates a response grounded in the retrieved content rather than guessing
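The four steps above can be sketched end to end. This is a minimal illustration using standard-library Python only: the "embedding" is a toy bag-of-words vector and the knowledge base, document texts, and function names are all invented for the example; a production system would use learned vector embeddings, a vector database, and a real LLM call in the final step.

```python
import math
from collections import Counter

# Toy knowledge base. In production these would be document chunks
# indexed in a vector store.
KNOWLEDGE_BASE = [
    "Employees accrue 25 days of annual leave per year.",
    "The VPN must be used when accessing internal systems remotely.",
    "Expense reports are due by the 5th of the following month.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts.
    Real systems use learned dense vector embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Step 2 (Retrieval): rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Step 3 (Augmentation): add retrieved documents to the prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Step 4 (Generation) would send this prompt to an LLM;
# here we just build the augmented prompt.
query = "How many days of annual leave do employees get?"
prompt = build_prompt(query, retrieve(query))
```

The grounding happens in `build_prompt`: the model is instructed to answer from the retrieved context, so the leave-policy document, not the model's training data, determines the answer.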
Why RAG Matters
Reduces hallucination: By grounding responses in retrieved documents, RAG significantly reduces the LLM's tendency to fabricate information.
Knowledge is updatable: Unlike fine-tuning, the knowledge base can be updated without retraining the model. Add a new policy document today; the AI answers questions about it tomorrow.
Auditability: Responses can be traced back to source documents, enabling verification and citations.
Domain specialisation: Organisations can build AI assistants grounded in their own internal knowledge — policies, contracts, support documentation, product manuals.
RAG Security Considerations
Document access control: The RAG knowledge base may contain sensitive documents. The retrieval system must enforce appropriate access controls — a user querying the AI should only retrieve documents they're authorised to access.
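One common pattern is to enforce the access check inside the retrieval step itself, so unauthorised documents never reach the LLM's context. A minimal sketch, with an invented group-based ACL on each document (the `Doc` structure, group names, and document texts are illustrative, not a standard API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    text: str
    allowed_groups: frozenset  # groups permitted to read this document

KB = [
    Doc("Q3 salary bands for the engineering team.", frozenset({"hr"})),
    Doc("How to reset your VPN password.", frozenset({"hr", "staff"})),
]

def retrieve_for_user(query: str, user_groups: set, kb=KB) -> list[str]:
    # Filter BEFORE ranking: a document the user cannot read never enters
    # the candidate set, so it cannot leak into the generated answer.
    visible = [d for d in kb if d.allowed_groups & user_groups]
    # (Similarity ranking over `visible` omitted for brevity.)
    return [d.text for d in visible]
```

Filtering before ranking matters: filtering the LLM's *output* instead would mean sensitive text had already entered the model's context, where it could be paraphrased or leaked.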
Indirect prompt injection via documents: Malicious instructions embedded in documents in the knowledge base can be retrieved and followed by the LLM, enabling indirect prompt injection attacks.
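A simple mitigation is to screen retrieved text for instruction-like phrasing before it is added to the prompt. The heuristic below is a sketch only: injection phrasing varies endlessly, so a pattern match like this flags obvious cases but is not a complete defence, and the specific patterns are illustrative.

```python
import re

# Phrases that suggest a document is trying to address the model directly.
# Illustrative patterns only -- real attacks will not match a fixed list.
SUSPICIOUS = re.compile(
    r"\b(ignore (all|previous|above) instructions|you are now|system prompt)\b",
    re.IGNORECASE,
)

def flag_suspicious(doc: str) -> bool:
    """Return True if the document contains instruction-like phrasing."""
    return bool(SUSPICIOUS.search(doc))
```

Flagged documents can be excluded from the context, quarantined for review, or wrapped in delimiters with an explicit instruction that the enclosed text is data, not commands.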
Data classification: Sensitive documents in the knowledge base must be classified and handled appropriately — an AI assistant shouldn't surface confidential HR records in response to general queries.
Document integrity: The knowledge base must be protected from unauthorised modification — an attacker who can inject documents could manipulate AI responses.
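One way to detect unauthorised modification is to record a content hash for each document at ingestion time and verify it before the document is served to the LLM. A minimal sketch using the standard-library `hashlib`; the manifest structure and document name are invented for the example:

```python
import hashlib

def sha256(text: str) -> str:
    """Hex digest of the document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# At ingestion time, record hashes from a trusted copy of each document.
MANIFEST = {
    "leave-policy.md": sha256("Employees accrue 25 days of annual leave per year."),
}

def verify(name: str, current_text: str) -> bool:
    """Reject documents whose content no longer matches the trusted hash."""
    return MANIFEST.get(name) == sha256(current_text)
```

This catches tampering with existing documents; preventing an attacker from *injecting* new documents still requires write access controls on the knowledge base itself.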