RAG (Retrieval-Augmented Generation)
An AI architecture that combines a language model with external knowledge retrieval to ground its responses in specific, verifiable sources. Instead of relying solely on what the model memorized during training, RAG retrieves relevant documents at query time and generates answers based on that retrieved context.
Why It Matters
RAG is one of the most practical mitigations for AI hallucination in enterprise deployments. By grounding responses in your organization's actual documents, policies, and data, RAG substantially reduces fabricated outputs and makes responses auditable — each answer can be traced back to the source passages it drew on.
Example
An internal HR chatbot using RAG retrieves the relevant section of the employee handbook before answering a question about parental leave policy, citing the specific policy document and page number in its response rather than generating an answer from its general training.
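The retrieve-then-generate flow in the example above can be sketched as a toy pipeline. This is illustrative only: the keyword-overlap retriever stands in for a real embedding-based search, and generate() is a stub where a production system would call an actual language model; all names here (retrieve, generate, the document store) are hypothetical, not from any specific library.

```python
def retrieve(query, documents, top_k=1):
    """Toy retriever: rank documents by keyword overlap with the query.
    Real systems typically use vector embeddings instead."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def generate(query, context_docs):
    """Stub for the generation step: build a grounded prompt that carries
    source metadata, so the answer can cite specific documents/pages."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in context_docs)
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"
    # A real system would send `prompt` to an LLM here; we just return
    # the citation trail to show how grounding stays auditable.
    sources = ", ".join(d["source"] for d in context_docs)
    return f"(answer grounded in: {sources})", prompt

docs = [
    {"source": "handbook p.12", "text": "Parental leave is 16 weeks paid for all employees."},
    {"source": "handbook p.30", "text": "Expense reports are due within 30 days."},
]
hits = retrieve("How long is parental leave?", docs)
answer, prompt = generate("How long is parental leave?", hits)
print(answer)
```

Because the retrieved passages carry their source labels into the prompt, the model's answer can cite "handbook p.12" directly — the property that makes RAG responses auditable.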
Think of it like...
RAG turns an AI from a student taking an exam from memory into a student taking an open-book exam — they still need to understand the material to give a good answer, but they can reference the actual sources rather than relying on potentially faulty recall.
Related Terms
Hallucination (AI)
When a generative AI model produces outputs that are factually incorrect, fabricated, or inconsistent with reality, while presenting them with apparent confidence. Hallucinations are an inherent property of how language models generate text — they produce statistically plausible sequences, not verified facts.
Large Language Model (LLM)
A type of foundation model trained on massive text datasets that can understand, generate, and reason about human language. LLMs like GPT-4, Claude, and Gemini use transformer architecture and typically have billions of parameters, enabling capabilities from summarization to coding to complex reasoning.
Grounding
The practice of connecting AI model outputs to verifiable sources of information, ensuring responses are based on factual data rather than the model's potentially unreliable internal knowledge.