Question 1

What is fine-tuning in AI?

Accepted Answer

Fine-tuning is the process of taking a pre-trained foundation model and continuing its training on a smaller, curated dataset of examples specific to your task or domain. The model's weights are updated to better match the desired behavior - output format, tone, domain vocabulary, or task-specific reasoning - without training from scratch. The result is a new model version that inherits the base model's general capabilities while excelling at your specific use case.

Question 2

When should you fine-tune an LLM instead of using RAG or prompt engineering?

Accepted Answer

Fine-tuning is the right choice when: (1) you need the model to consistently follow a specific output format or style that prompt engineering alone cannot enforce reliably, (2) you have hundreds or thousands of high-quality input-output examples, (3) you want to reduce prompt length and inference cost at scale, or (4) you need domain-specific reasoning patterns the base model lacks. RAG is better when the task is knowledge retrieval; prompt engineering is better when you're still exploring and iterating fast.

Question 3

How much does fine-tuning an LLM cost?

Accepted Answer

Costs vary widely by model and dataset size. Fine-tuning GPT-4o mini via OpenAI's API costs roughly $3 per million training tokens - a 10,000-example dataset of 500-token examples runs about $15. Fine-tuning a larger model like GPT-4o is significantly more expensive. Open-source fine-tuning (Llama 3 on your own GPU) can be done for $10–$200 on cloud GPU instances using LoRA, but requires ML engineering time. Expect to spend $50–$5,000 for a first fine-tune, depending on scale and model.

Question 4

What is LoRA and why does it matter for fine-tuning?

Accepted Answer

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that updates only a small set of added weight matrices rather than all of the model's parameters. This reduces GPU memory requirements by 10–50x and training time proportionally. LoRA makes it practical to fine-tune 7B–70B parameter open-source models on a single consumer GPU or a small cloud instance. It produces models nearly as capable as full fine-tunes at a fraction of the cost.

Question 5

Does fine-tuning eliminate hallucinations?

Accepted Answer

No - fine-tuning does not reliably eliminate hallucinations and can sometimes introduce new ones. Because knowledge is baked into model weights during training, the model may confidently generate plausible-sounding but incorrect facts, especially for information not well-represented in the fine-tuning data. For knowledge-intensive tasks where accuracy is critical, combine fine-tuning (for style and format) with RAG (for factual grounding).

Approach	Best Use Case	Data Required	Time to Deploy	Ongoing Update Cost
Prompt engineering	Format, tone, constraints	None	Minutes	Free
RAG	Knowledge retrieval, live data	Documents to index	Hours	Low (update the store)
Fine-tuning	Style, format, domain reasoning	100–10,000 examples	Days–weeks	High (retrain on new data)

Fine-Tuning

What Is Fine-Tuning?

How Fine-Tuning Works

Fine-Tuning vs. RAG vs. Prompt Engineering

When Fine-Tuning Makes Sense

LoRA: Making Fine-Tuning Accessible

Common Pitfalls

Key Takeaway

Frequently Asked Questions

Comments