Foundation Model
A foundation model is a large AI model trained on broad data at scale, designed to be adapted to many downstream tasks rather than one specific use case.
What Is a Foundation Model?
A foundation model is a large AI model trained on broad, diverse data at massive scale, designed not for a single narrow task but as a versatile base that can be adapted to many downstream applications. The term was coined by researchers at Stanford’s Center for Research on Foundation Models (CRFM) in a 2021 paper that described the emerging paradigm: instead of training a separate model for each task (translation, summarization, classification, code generation), one large model trained broadly can do all of them, often with just a prompt or light fine-tuning. GPT-4, Claude 3, Gemini 1.5, and Meta’s Llama 3 are among the most prominent examples. Frontier models typically contain hundreds of billions to over a trillion parameters and cost tens to hundreds of millions of dollars to train.
Why “Foundation”?
The Stanford researchers chose the word deliberately. Foundation models are:
- Broad: Trained on internet-scale text, code, images, audio, or combinations - not a narrow domain
- Transferable: Useful for tasks not explicitly present in training data
- Emergent: Capabilities appear that weren’t directly trained for (translation, arithmetic, analogy reasoning) as scale increases
- Adaptable: Can be steered toward specific behaviors via prompting, fine-tuning, or additional training
The analogy to a building foundation is apt: the same foundation supports many different structures built on top of it, and its quality determines the ceiling for everything above.
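The adaptability point above can be sketched in code. The snippet below shows the simplest form of adaptation, prompting: one base model serves several tasks through different prompt templates, with no new weights and no retraining. `call_model` is a hypothetical placeholder for whatever completion API or local model you use; the templates are illustrative.

```python
def call_model(prompt: str) -> str:
    """Placeholder for any provider's completion API (OpenAI, Anthropic,
    a self-hosted Llama, etc.). Hypothetical, for illustration only."""
    return f"<model output for: {prompt}>"

# One foundation model, many tasks -- adaptation is just the prompt.
TEMPLATES = {
    "translate": "Translate to French: {text}",
    "summarize": "Summarize in one sentence: {text}",
    "classify": "Label the sentiment (positive/negative): {text}",
}

def adapt(task: str, text: str) -> str:
    # No task-specific training: the template steers the base model.
    return call_model(TEMPLATES[task].format(text=text))
```

Fine-tuning and further training sit on the same spectrum; they change the weights instead of the prompt, but the base model is the same.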
Foundation Models vs. LLMs
| Type | Modality | Examples |
|---|---|---|
| LLM | Text only | GPT-4, Claude 3, Llama 3, Mistral |
| Multimodal | Text + image (+ audio) | GPT-4o, Gemini 1.5 Pro, Claude 3 Opus |
| Image generation | Text → image | DALL-E 3, Stable Diffusion XL, Midjourney |
| Audio | Speech → text, text → speech | Whisper, ElevenLabs, Voicebox |
| Code | Text + code | GitHub Copilot (OpenAI models), StarCoder2, DeepSeek-Coder |
All LLMs are foundation models, but foundation models extend beyond text.
Proprietary vs. Open-Source: Startup Tradeoffs
| Dimension | Proprietary (GPT-4, Claude) | Open-Source (Llama 3, Mistral) |
|---|---|---|
| Capability | Frontier quality | Approaching frontier at 70B+ |
| Cost | Per-token API pricing | Infrastructure + engineering |
| Data privacy | Data sent to provider | Runs on your infrastructure |
| Customizability | Fine-tuning via API | Full weight access, unconstrained |
| Time to first call | Minutes | Hours–days (infrastructure setup) |
| Break-even volume | ~1M–10M requests/day | Varies by model size and GPU cost |
For most early-stage startups, proprietary APIs are the right starting point. The engineering time to self-host, monitor, and scale an open-source model typically costs more than the API fees until you reach significant scale. The exception is data sensitivity: if your product handles highly confidential data (health records, legal documents, financial data) that legally or contractually cannot leave your infrastructure, open-source self-hosting is often necessary from day one.
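The break-even argument above is easy to sanity-check with back-of-envelope arithmetic. All numbers below are illustrative assumptions, not current price quotes, and the calculation ignores the (often dominant) engineering cost of running your own cluster:

```python
# Back-of-envelope: API per-token pricing vs. self-hosted GPU cost.
# Every constant here is an illustrative assumption, not a real quote.

def api_cost_per_day(requests_per_day: int, tokens_per_request: int,
                     usd_per_1k_tokens: float) -> float:
    return requests_per_day * tokens_per_request / 1000 * usd_per_1k_tokens

def selfhost_cost_per_day(num_gpus: int, usd_per_gpu_hour: float) -> float:
    return num_gpus * usd_per_gpu_hour * 24

# 1M requests/day at ~1k tokens each, at an assumed $0.01 per 1k tokens:
api = api_cost_per_day(1_000_000, 1_000, 0.01)      # -> $10,000/day
# An assumed 8-GPU inference cluster at $4/GPU-hour:
hosted = selfhost_cost_per_day(8, 4.0)              # -> $768/day
```

At low volume the inequality flips: at 10,000 requests/day the same API spend is $100/day, well under the cluster cost, which is why the table puts break-even in the millions of requests per day.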
The Scaling Laws Foundation
Foundation models work because of scaling laws - empirical relationships first characterized by OpenAI researchers in 2020, and refined by DeepMind’s 2022 “Chinchilla” work on compute-optimal training - showing that model performance improves predictably as you increase model parameters, training data, and compute. This predictability is what justified the massive investments in training frontier models: GPT-4’s training run is estimated to have cost over $100 million. The existence of these laws means the organizations with the most compute and data can build the most capable foundations - creating significant structural advantages for a small number of labs (OpenAI, Anthropic, Google DeepMind, Meta AI).
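A Chinchilla-style scaling law can be written down in a few lines. The functional form below is the one reported by DeepMind in 2022; the constants are roughly their published fit and should be treated as illustrative, not as predictions for any particular model:

```python
# Chinchilla-style scaling law: predicted loss falls as a power law in
# parameter count N and training tokens D. Constants approximate the
# Hoffmann et al. (2022) fit -- illustrative only.
def predicted_loss(N: float, D: float) -> float:
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / N**alpha + B / D**beta

# Scaling both model and data lowers the predicted loss monotonically:
small = predicted_loss(1e9, 20e9)     # 1B params, 20B tokens
big = predicted_loss(70e9, 1.4e12)    # 70B params, 1.4T tokens
```

The key property is the predictability: before spending $100M on a training run, a lab can fit this curve on small runs and extrapolate what the large run will achieve.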
Key Takeaway
Foundation models are the infrastructure layer of modern AI - the equivalent of cloud computing platforms in 2010. Startups don’t build their own AWS; they build on top of it. Similarly, almost no startup should build a foundation model from scratch. The strategic question is which foundation model to build on, how to adapt it to your use case (prompting, RAG, fine-tuning), and whether to use a proprietary API or self-host open-source weights. The choice matters for cost, data privacy, and long-term vendor dependency - not for whether you can ship a competitive AI product.