Context Window
The maximum amount of text an LLM can process in a single interaction - inputs plus outputs combined.
What Is a Context Window?
A context window is the maximum amount of text, measured in tokens, that a large language model can “see” and process in a single interaction. It includes everything: the system prompt, conversation history, any documents you’ve fed in, and the model’s own response. Once you reach this limit, something has to give: depending on the API or application, the request is typically rejected or the oldest content is truncated, and the model cannot access anything that was dropped.
Think of it as the model’s working memory. Anything outside the window is invisible to it, as if it never existed.
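Because every part of a request shares the same budget, a quick check before sending can catch an overflow early. The following is a minimal sketch, not any vendor's API: the `Request` class, its field names, and the token counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical breakdown of one model call, all values in tokens."""
    system_prompt_tokens: int
    history_tokens: int
    document_tokens: int
    max_output_tokens: int  # space reserved for the model's reply


def total_tokens(req: Request) -> int:
    # Inputs and the reserved output all count against the same window.
    return (
        req.system_prompt_tokens
        + req.history_tokens
        + req.document_tokens
        + req.max_output_tokens
    )


def fits_in_window(req: Request, context_window: int) -> bool:
    return total_tokens(req) <= context_window


req = Request(
    system_prompt_tokens=1_000,
    history_tokens=20_000,
    document_tokens=100_000,
    max_output_tokens=4_000,
)
print(total_tokens(req))             # 125000
print(fits_in_window(req, 128_000))  # True
```

Reserving room for the output up front matters: a prompt that fills the window exactly leaves no space for the response.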
How Large Are Modern Context Windows?
Context windows have grown dramatically since the first GPT models:
| Model | Context Window |
|---|---|
| GPT-4o | 128,000 tokens (~96,000 words) |
| Claude 3.5 Sonnet | 200,000 tokens (~150,000 words) |
| Gemini 1.5 Pro | 1,000,000 tokens (~750,000 words) |
| Llama 3.1 70B | 128,000 tokens (~96,000 words) |
As a rough rule: 1 token ≈ 4 characters ≈ 0.75 words in English.
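The rule of thumb translates directly into a quick estimator. This is a sketch for English text only; real tokenizers differ by model and language, so treat the results as ballpark figures. The function names are illustrative, not part of any library.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate for English text: about 4 characters per token."""
    return math.ceil(len(text) / 4)


def tokens_to_words(tokens: int) -> int:
    """Rough estimate for English text: about 0.75 words per token."""
    return round(tokens * 0.75)


print(tokens_to_words(128_000))  # 96000
print(tokens_to_words(200_000))  # 150000
print(estimate_tokens("a" * 400))  # 100
```

For billing or hard limits, count with the tokenizer that matches the model you are calling rather than relying on this heuristic.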
Why Context Windows Matter for Products
Long-document analysis: With a 200K token window, Claude can take in a long legal contract, a financial report, or a small codebase in one call, with no chunking required as long as the material fits within the limit.
Conversation memory: Larger windows mean chatbots can maintain coherent conversations for longer without “forgetting” earlier context.
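When a conversation outgrows the window, one common approach is to keep the most recent turns and drop the oldest. The sketch below assumes messages are plain strings and uses the 4-characters-per-token heuristic as a stand-in for a real tokenizer; `trim_history` and `approx_tokens` are hypothetical names.

```python
import math
from typing import Callable


def approx_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: about 4 characters per token.
    return math.ceil(len(text) / 4)


def trim_history(
    messages: list[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> list[str]:
    """Keep the newest messages that fit in the budget, in original order."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget_tokens:
            break  # everything older than this is dropped
        kept.append(msg)
        used += cost
    kept.reverse()
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print(len(trim_history(history, budget_tokens=25)))  # 2
```

Dropping old turns is the simplest policy. Summarizing them instead preserves more context at the cost of an extra model call.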
Cost implications: You pay per token - both input and output. A 100-page document stuffed into every API call gets expensive fast. Many startups use RAG (Retrieval-Augmented Generation) to selectively pull only the relevant chunks instead of sending the entire corpus every time.
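A little arithmetic shows why sending the full document on every call adds up. The prices and token counts below are hypothetical placeholders, not any provider's actual rates; substitute the figures from your provider's pricing page.

```python
def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one call in dollars, given per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
IN_PRICE, OUT_PRICE = 3.0, 15.0

# Assumed sizes: the full document is 50,000 tokens; RAG sends 2,000 tokens of chunks.
full_doc = call_cost(50_000, 500, IN_PRICE, OUT_PRICE)
rag_only = call_cost(2_000, 500, IN_PRICE, OUT_PRICE)

print(f"full document per call: ${full_doc:.4f}")
print(f"retrieved chunks per call: ${rag_only:.4f}")
print(f"ratio: {full_doc / rag_only:.1f}x")
```

Under these assumptions the per-call gap is more than tenfold, and it compounds across every call in a conversation.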
Attention degradation: Research shows LLMs perform worse at retrieving information from the middle of very long contexts (the “lost in the middle” problem). Critical information is better placed at the beginning or end of the context.
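One way to act on this is to reorder retrieved chunks so the most relevant ones sit at the edges of the prompt and the least relevant end up in the middle. This is a sketch of that idea under the assumption that the chunks arrive already sorted from most to least relevant; `reorder_for_edges` is a hypothetical helper, and how much it helps will vary by model.

```python
def reorder_for_edges(chunks_by_relevance: list[str]) -> list[str]:
    """Place the most relevant chunks at the start and end of the context.

    Input must be sorted from most to least relevant.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: ranks 1, 3, 5... go to the front; ranks 2, 4... to the back.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # "A" is the most relevant
print(reorder_for_edges(ranked))    # ['A', 'C', 'E', 'D', 'B']
```

In the output, the top two chunks ("A" and "B") occupy the first and last positions, while the weakest chunk ("E") lands in the middle.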
Context Window vs Long-Term Memory
The context window is not permanent memory. When a conversation ends, the context is discarded. For persistent memory across sessions, startups either store conversation summaries in a database and inject them at session start, or use external memory systems.
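The summary-in-a-database approach can be sketched with Python's built-in `sqlite3` module. The table layout, function names, and prompt wording are assumptions made for illustration. Producing the summary itself (usually with another model call) is left out.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small store with one summary per user."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summaries ("
        "user_id TEXT PRIMARY KEY, summary TEXT NOT NULL)"
    )
    return conn


def save_summary(conn: sqlite3.Connection, user_id: str, summary: str) -> None:
    # Overwrite any previous summary for this user.
    conn.execute(
        "INSERT OR REPLACE INTO summaries (user_id, summary) VALUES (?, ?)",
        (user_id, summary),
    )
    conn.commit()


def load_summary(conn: sqlite3.Connection, user_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT summary FROM summaries WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def build_system_prompt(
    conn: sqlite3.Connection, user_id: str, base_prompt: str
) -> str:
    """Inject the stored summary, if any, at the start of a new session."""
    summary = load_summary(conn, user_id)
    if summary is None:
        return base_prompt
    return f"{base_prompt}\n\nSummary of earlier sessions:\n{summary}"


conn = open_store()
save_summary(conn, "user-1", "Prefers concise answers; building a billing app.")
print(build_system_prompt(conn, "user-1", "You are a helpful assistant."))
```

The injected summary counts against the context window like any other input, so it pays to keep it short.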
Key Takeaway
The context window defines the boundaries of what an AI model can see in any given interaction, beyond what it learned in training. Larger windows enable richer, more capable products - but they also increase cost and latency. Smart product design uses context efficiently: inject only what the model needs, use RAG for large knowledge bases, and never assume the model remembers anything from previous sessions.
Frequently Asked Questions
What is a context window in AI?
How large are current AI context windows?
Does a larger context window mean better AI performance?
What happens when you exceed the context window limit?