Question 1

What is a token in AI and LLMs?

Accepted Answer

A token is the smallest unit of text that a large language model reads and generates. Tokens are not exactly words or characters - they are subword fragments produced by a tokenization algorithm. On average, one token equals roughly 3–4 characters or 0.75 words in English. The word 'startup' is one token; 'entrepreneurship' might be two or three. LLMs measure all inputs, outputs, context limits, and API pricing in tokens, not words or characters.
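To make the idea of subword fragments concrete, here is a minimal sketch of a toy tokenizer. It is not how production tokenizers work: real LLM tokenizers (for example BPE-based ones) learn their vocabulary from large corpora, whereas this sketch uses greedy longest-match against a small hand-written vocabulary. The names TOY_VOCAB and toy_tokenize are hypothetical and exist only for this illustration.

```python
# Toy illustration only. The vocabulary below is hand-written and hypothetical;
# real tokenizers learn tens of thousands of entries from training data.
TOY_VOCAB = {"startup", "pre", "-money", " val", "uation",
             "Hello", ",", " world", "!"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by greedy longest match, falling back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched here: emit one character as a token.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("startup"))
print(toy_tokenize("pre-money valuation"))
```

Note that the leading space is part of the token " val" in this sketch, which mirrors how many real tokenizers attach whitespace to the following fragment.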

Question 2

How many tokens is 1,000 words?

Accepted Answer

Approximately 1,333 tokens for typical English text, based on the 0.75 words-per-token rule of thumb. A 10-page business document (~5,000 words) is roughly 6,700 tokens. Code and languages with long compound words (German, Finnish) tend to use more tokens per word. Languages written without spaces between words, such as Mandarin, do not map cleanly onto a per-word ratio, and their token counts depend heavily on the tokenizer. For precise counts, use OpenAI's free Tokenizer tool at platform.openai.com/tokenizer.
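The rule of thumb can be written as a one-line estimate. This is a rough sketch that assumes typical English prose; the function name estimate_tokens is hypothetical, and a real tokenizer should be used whenever an exact count matters.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text; not exact


def estimate_tokens(word_count, words_per_token=WORDS_PER_TOKEN):
    """Estimate the token count for a given number of English words."""
    return round(word_count / words_per_token)


print(estimate_tokens(1_000))  # 1333
print(estimate_tokens(5_000))  # 6667
```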

Question 3

How does token pricing work for LLM APIs?

Accepted Answer

LLM APIs charge separately for input tokens (the prompt you send) and output tokens (the response the model generates). Output tokens typically cost 3–5x more than input tokens because generation is more compute-intensive than reading. For example, GPT-4o charges ~$2.50 per million input tokens and ~$10 per million output tokens. A single query with a 1,000-token system prompt, 200-token user message, and 500-token response costs roughly $0.008 - about a cent.
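The arithmetic behind that example can be sketched as follows. The function name request_cost is hypothetical, and the prices are the approximate GPT-4o figures quoted above, which may change over time.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Return the cost in dollars of one request, given per-million-token prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# 1,000-token system prompt + 200-token user message, 500-token response.
cost = request_cost(1_000 + 200, 500, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0080
```

Input contributes $0.003 and output contributes $0.005, so the 500 output tokens account for most of the cost even though the prompt is more than twice as long.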

Question 4

What is a context window in terms of tokens?

Accepted Answer

A context window is the maximum number of tokens an LLM can process in a single request - both input and output combined. GPT-4o has a 128,000-token context window; Gemini 1.5 Pro supports up to 1 million tokens. Exceeding the context window causes an error or forces truncation. In practice, the context window limits how much conversation history, document content, or instruction detail you can provide to the model at once.
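A minimal sketch of how an application might stay inside the window, assuming token counts per message are already known. Both function names are hypothetical, and dropping the oldest messages first is only one possible truncation strategy.

```python
def fits_context(input_tokens, max_output_tokens, context_window):
    """Check that input plus reserved output tokens fit in the window."""
    return input_tokens + max_output_tokens <= context_window


def trim_history(message_tokens, budget):
    """Keep the most recent messages whose combined tokens fit the budget.

    message_tokens is a list of per-message token counts, oldest first.
    """
    kept = []
    total = 0
    for count in reversed(message_tokens):
        if total + count > budget:
            break  # this message and everything older is dropped
        kept.append(count)
        total += count
    kept.reverse()  # restore oldest-first order
    return kept


print(fits_context(120_000, 4_000, 128_000))   # True
print(trim_history([500, 300, 200, 100], 650))  # [300, 200, 100]
```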

Question 5

Why do different languages use different numbers of tokens?

Accepted Answer

LLM tokenizers are typically trained on corpora dominated by English text, so English is usually among the most token-efficient languages. Non-Latin scripts (Arabic, Chinese, Korean) and languages with long compound words often require more tokens per semantic unit. A 1,000-word English document might use 1,300 tokens; the same content in a non-English language could use 2,000–4,000 tokens. This has direct cost implications for multilingual AI products - a product serving non-English users at scale should budget 1.5–3x higher token costs than an English-only equivalent.
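As a rough planning aid, the 1.5–3x range can be applied to an English-only cost baseline. This is only a sketch: the function name multilingual_budget and the $1,000 baseline are hypothetical, and the real multiplier depends on the language mix and the tokenizer.

```python
def multilingual_budget(english_cost, low_multiplier=1.5, high_multiplier=3.0):
    """Return a (low, high) cost range for a non-English workload."""
    return english_cost * low_multiplier, english_cost * high_multiplier


# Hypothetical baseline: $1,000 per month for the English-only product.
low, high = multilingual_budget(1_000.0)
print(f"${low:,.0f} to ${high:,.0f}")  # $1,500 to $3,000
```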

Text	Tokens	Count
"startup"	["startup"]	1
"pre-money valuation"	["pre", "-money", " val", "uation"]	4
"Hello, world!"	["Hello", ",", " world", "!"]	4
"GPT-4o"	["G", "PT", "-", "4", "o"]	5
"😀"	[emoji byte fragments]	3

Model	Context window
GPT-4o	128,000 tokens (~96,000 words)
Claude 3.5 Sonnet	200,000 tokens (~150,000 words)
Gemini 1.5 Pro	1,000,000 tokens (~750,000 words)
Llama 3 70B	8,000 tokens (~6,000 words)
GPT-4o mini	128,000 tokens (~96,000 words)

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-4o	$2.50	$10.00
Claude 3.5 Sonnet	$3.00	$15.00
GPT-4o mini	$0.15	$0.60
Gemini 1.5 Flash	$0.075	$0.30

Token (AI)

What Is a Token (AI)?

How Tokenization Works

Tokens in Context: The Three Numbers That Matter

1. Context window

2. Input vs. output tokens

3. Throughput (tokens per second)

Practical Cost Calculations

Optimizing Token Usage

Key Takeaway

Frequently Asked Questions
