Beginner AI

Foundation Model

A foundation model is a large AI model trained on broad data at scale, designed to be adapted to many downstream tasks rather than one specific use case.

Published March 17, 2026

What Is a Foundation Model?

A foundation model is a large AI model trained on broad, diverse data at massive scale, designed not for a single narrow task but as a versatile base that can be adapted to many downstream applications. The term was coined by researchers at Stanford’s Center for Research on Foundation Models (CRFM) in a 2021 paper that described the emerging paradigm: instead of training a separate model for each task (translation, summarization, classification, code generation), one large model trained broadly can do all of them, often with just a prompt or light fine-tuning. GPT-4, Claude 3, Gemini 1.5, and Meta’s Llama 3 are the most prominent examples. These models typically contain hundreds of billions to over a trillion parameters and cost tens to hundreds of millions of dollars to train.

Why “Foundation”?

The Stanford researchers chose the word deliberately. Foundation models are:

  • Broad: Trained on internet-scale text, code, images, audio, or combinations - not a narrow domain
  • Transferable: Useful for tasks not explicitly present in training data
  • Emergent: Capabilities appear that weren’t directly trained for (translation, arithmetic, analogy reasoning) as scale increases
  • Adaptable: Can be steered toward specific behaviors via prompting, fine-tuning, or additional training

The analogy to a building foundation is apt: the same foundation supports many different structures built on top of it, and its quality determines the ceiling for everything above.

Foundation Models vs. LLMs

TypeModalityExamples
LLMText onlyGPT-4, Claude 3, Llama 3, Mistral
MultimodalText + image (+ audio)GPT-4o, Gemini 1.5 Pro, Claude 3 Opus
Image generationText → imageDALL-E 3, Stable Diffusion XL, Midjourney
AudioSpeech → text, text → speechWhisper, ElevenLabs, Voicebox
CodeText + codeGitHub Copilot (GPT-4), StarCoder2, DeepSeek-Coder

All LLMs are foundation models, but foundation models extend beyond text.

Proprietary vs. Open-Source: Startup Tradeoffs

DimensionProprietary (GPT-4, Claude)Open-Source (Llama 3, Mistral)
CapabilityFrontier qualityApproaching frontier at 70B+
CostPer-token API pricingInfrastructure + engineering
Data privacyData sent to providerRuns on your infrastructure
CustomizabilityFine-tuning via APIFull weight access, unconstrained
Time to first callMinutesHours–days (infrastructure setup)
Break-even volume~1M–10M requests/dayVaries by model size and GPU cost

For most early-stage startups, proprietary APIs are the right starting point. The engineering time to self-host, monitor, and scale an open-source model typically costs more than the API fees until you reach significant scale. The exception is data sensitivity: if your product handles highly confidential data (health records, legal documents, financial data) that legally or contractually cannot leave your infrastructure, open-source self-hosting is often necessary from day one.

The Scaling Laws Foundation

Foundation models work because of scaling laws - empirical relationships discovered by researchers at OpenAI in 2020 (the “Chinchilla scaling laws” were refined by DeepMind in 2022) showing that model performance improves predictably as you increase model parameters, training data, and compute. This predictability is what justified the massive investments in training frontier models: GPT-4’s training run is estimated to have cost over $100 million. The existence of these laws means the organizations with the most compute and data can build the most capable foundations - creating significant structural advantages for a small number of labs (OpenAI, Anthropic, Google DeepMind, Meta AI).

Key Takeaway

Foundation models are the infrastructure layer of modern AI - the equivalent of cloud computing platforms in 2010. Startups don’t build their own AWS; they build on top of it. Similarly, almost no startup should build a foundation model from scratch. The strategic question is which foundation model to build on, how to adapt it to your use case (prompting, RAG, fine-tuning), and whether to use a proprietary API or self-host open-source weights. The choice matters for cost, data privacy, and long-term vendor dependency - not for whether you can ship a competitive AI product.

Frequently Asked Questions

What is a foundation model?
A foundation model is a large AI model trained on broad, diverse data at massive scale - billions to trillions of data points - with the intent that it can be adapted to many different downstream tasks. The term was coined by Stanford researchers in 2021 to capture the idea that these models serve as a 'foundation' upon which many applications are built, either directly via API or after fine-tuning for a specific use case.
Is every LLM a foundation model?
Most modern LLMs are foundation models, but not all foundation models are LLMs. Foundation models also include large vision models (like DALL-E, Stable Diffusion), multimodal models (GPT-4o, Gemini), and audio models (Whisper). The term 'foundation model' is the broader category; 'LLM' refers specifically to the text-based subset.
What is the difference between proprietary and open-source foundation models?
Proprietary models (GPT-4, Claude, Gemini) are accessed via API, with the model weights kept private by the provider. Open-source models (Llama 3, Mistral, Falcon) release their weights publicly, allowing anyone to download, fine-tune, and self-host them. Proprietary models are typically more capable at the frontier but cost per token; open-source models are cheaper at scale, auditable, and run on your own infrastructure without sending data to third parties.
Which foundation model should a startup use?
Start with a proprietary API model (GPT-4o, Claude 3.5 Sonnet) for prototyping - they require no infrastructure and are fastest to iterate on. Once you have production traffic and understand your cost and performance requirements, evaluate whether an open-source model (Llama 3, Mistral) self-hosted on cloud GPUs makes economic sense. For most startups processing under 1 million requests per day, the API model remains cheaper when engineering and infrastructure costs are factored in.
What does it mean to 'adapt' a foundation model?
Adapting a foundation model means tailoring its behavior for a specific use case without training it from scratch. The three main adaptation techniques are: prompt engineering (no code changes, just better instructions), RAG (grounding responses in retrieved documents), and fine-tuning (updating a subset of the model's weights on task-specific examples). All three build on top of the foundation model rather than replacing it.

Share with your team

Create an account to track your progress across all lessons.

Comments

Log in to join the conversation.

Loading comments...