Question 1

What is RAG (Retrieval-Augmented Generation)?

Accepted Answer

RAG is an AI architecture that improves LLM outputs by retrieving relevant external documents at query time and including them in the model's context. Instead of relying solely on the model's training data, RAG grounds answers in a specific, up-to-date knowledge base. This reduces hallucination and lets AI products answer questions about proprietary or recent information.

Question 2

How does RAG work technically?

Accepted Answer

A RAG pipeline has two query-time stages: retrieval and generation. Both depend on an index built ahead of time, in which documents are split into chunks, embedded, and stored in a vector database. During retrieval, the user's query is embedded into a vector with the same embedding model and used to search the vector database for the most similar document chunks. During generation, those chunks are injected into the LLM's prompt context along with the question, so the model can answer based on real source material.
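The retrieval and prompt-assembly steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `CHUNKS` list stands in for a vector database, and the names `retrieve` and `build_prompt` are hypothetical. The final LLM call is left out because it depends on the provider.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval stage: rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Generation stage input: numbered sources followed by the question."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base; a real system would store embeddings in a vector database.
CHUNKS = [
    "Refunds are available within 30 days of purchase.",
    "The API rate limit is 100 requests per minute.",
    "Support is open Monday through Friday.",
]

question = "What is the API rate limit?"
top = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string is what would be sent to the LLM
```

Numbering the sources in the prompt is one simple way to let the model cite which chunk supports each claim.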

Question 3

When should a startup use RAG vs fine-tuning?

Accepted Answer

Use RAG when you need the model to answer questions based on a specific knowledge base (product docs, company data, customer records), when that knowledge changes frequently, or when you need citations. Use fine-tuning when you need to change the model's style, tone, or behavior consistently across all outputs, rather than just inject knowledge.

Question 4

What are the main challenges of building a RAG system?

Accepted Answer

The biggest challenges are chunking strategy (how you split documents affects retrieval quality), retrieval accuracy (finding the right chunks for ambiguous queries), context window limits (you can only inject so many chunks), and evaluation (measuring whether retrieved chunks actually helped the answer). Many RAG systems fail not at the LLM step but at the retrieval step.
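Three of the challenges above (chunking, context limits, and evaluation) can be made concrete with a short Python sketch. It rests on stated assumptions: chunk size and the context budget are measured in whitespace-separated words as a rough proxy for tokens, overlapping fixed-size windows are only one of several chunking strategies, and the function names `chunk_words`, `fit_to_budget`, and `hit_rate` are hypothetical.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; each shares `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def fit_to_budget(ranked_chunks: list[str], max_words: int) -> list[str]:
    """Keep ranked chunks in order until the word budget for the prompt is spent."""
    kept, used = [], 0
    for chunk in ranked_chunks:
        n = len(chunk.split())
        if used + n > max_words:
            break
        kept.append(chunk)
        used += n
    return kept


def hit_rate(retrieved_ids: list[list[str]], relevant_ids: list[str]) -> float:
    """Fraction of queries whose known-relevant chunk id appears among the retrieved ids."""
    hits = sum(1 for got, want in zip(retrieved_ids, relevant_ids) if want in got)
    return hits / len(relevant_ids) if relevant_ids else 0.0


# Ten placeholder words split into windows of 4 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
print(pieces)

# Two test queries: the first retrieval contains its relevant chunk, the second does not.
score = hit_rate([["c1", "c2"], ["c3"]], ["c2", "c9"])
print(score)
```

Measuring retrieval separately from answer quality, as `hit_rate` does, helps show whether a bad answer came from the retrieval step or from the LLM step.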

Dimension	RAG	Fine-Tuning
Best for	Injecting knowledge	Changing model behavior/style
Data freshness	Easy to update	Requires retraining
Cost	Low setup, per-query retrieval	High upfront training cost
Hallucination	Reduced (grounded in sources)	Still possible
Citations	Easy to provide	Not inherent
