What is RAG? Retrieval-Augmented Generation, explained in plain language and grounded in production work
The short answer
RAG – retrieval-augmented generation – is the standard pattern for getting an LLM to answer questions about your own documents without making things up. The system first retrieves the relevant passages from your knowledge base, then asks the LLM to write an answer using those passages as source material, with citations back to the originals. It is the architecture behind almost every production AI knowledge assistant we have shipped, including the GDV system serving 400+ insurance companies and the chatbot of a leading member network with 1,000+ HumHub members.
How RAG works in three steps
- Index – We chunk your documents (policies, manuals, member content, internal wikis), embed each chunk into a vector, and store the vectors in a vector database such as Qdrant or pgvector.
- Retrieve – When a user asks a question, the question is embedded the same way and the database returns the most relevant chunks – usually the top 5 to 20.
- Generate – Those chunks plus the original question are sent to an LLM such as GPT-4o via Microsoft AI Foundry. The LLM is instructed to answer using only the retrieved passages and to cite them. If the passages do not contain the answer, the LLM is told to say so.
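The index and retrieve steps can be sketched in a few lines. This is a toy illustration under loud assumptions, not our production code: a bag-of-words token count stands in for a real embedding model, an in-memory list stands in for a vector database such as Qdrant or pgvector, and the sample policy chunks are invented.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1, Index: chunk the documents and store (vector, text) pairs.
chunks = [
    "Policy A covers water damage up to 10,000 euros.",
    "Policy B excludes flood damage entirely.",
    "Members can cancel with three months notice.",
]
index = [(embed(chunk), chunk) for chunk in chunks]

# Step 2, Retrieve: embed the question the same way, return the top-k chunks.
def retrieve(question, k=5):
    query = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(query, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve("Does Policy A cover water damage?", k=1))
```

A real vector database replaces the sorted scan with an approximate-nearest-neighbour index so retrieval stays fast over millions of chunks.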
The third step is the part that prevents hallucination. The LLM is not "thinking" up an answer from its training data – it is reading the passages we just handed it.
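The grounding lives in the instruction itself. Here is a sketch of how the retrieved passages and that rule might be assembled into a chat request; the exact wording and message format vary by deployment, and this is not our production prompt.

```python
# Hypothetical system prompt; real deployments tune this wording carefully.
SYSTEM_PROMPT = (
    "You are a knowledge assistant. Answer the question using ONLY the "
    "numbered passages provided, and cite every claim like [2]. If the "
    "passages do not contain the answer, say that you could not find it."
)

def grounded_messages(passages, question):
    # Assemble OpenAI-style chat messages: the retrieved passages are
    # numbered so the model can cite them back.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Passages:\n{context}\n\nQuestion: {question}"},
    ]
```

The message list would then be sent to the model via whatever chat-completion API the deployment uses; nothing outside the numbered passages reaches the model as source material.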
When to use RAG
Use RAG when the answer lives in documents your team controls and you need answers grounded in those documents – not in whatever the LLM happened to absorb during training. Knowledge assistants over policy archives, internal wikis, product documentation, member portals and regulatory frameworks all fit.
Do not use RAG when the task is generative writing with no factual constraint (just use a plain LLM), or when the data is so structured that a normal database query would do the job better (an LLM is overkill for "list all members in Hamburg").
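For the structured-data case, a plain query really is the whole job. A sketch with a hypothetical members table (sqlite3 and the sample rows are purely for illustration):

```python
import sqlite3

# Hypothetical schema: a members table with a city column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("Anna", "Hamburg"), ("Ben", "Berlin"), ("Clara", "Hamburg")],
)

# "List all members in Hamburg" needs no LLM, no embeddings, no retrieval:
rows = conn.execute(
    "SELECT name FROM members WHERE city = ? ORDER BY name", ("Hamburg",)
).fetchall()
print([name for (name,) in rows])  # ['Anna', 'Clara']
```

The query is exact, auditable and cheap; an LLM would only add latency and a chance of error on top of it.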
Do not use fine-tuning instead of RAG for fact retrieval. Fine-tuning bakes facts into model weights, which makes them hard to update, expensive to verify and impossible to cite. RAG keeps the source-of-truth in your database where you can update, audit and govern it.
Why N3XTCODER
We bring a decade of impact-tech experience and over 160 AI projects since 2019. Through our free AI for Impact course, more than 100,000 people have learned to use AI for the common good. Our default stack: n8n in Berlin, Qdrant in the EU, Azure OpenAI via Microsoft EU Sovereignty.
Talk through your AI project
Tell us what you are trying to ship. We will reply with a proposal and a date, usually within a working day.

Simon Stegemann
Co-Founder and CEO