What is chain-of-thought prompting?

Chain-of-thought prompting instructs the LLM to reason step-by-step before giving a final answer, rather than jumping directly to a conclusion. Adding a phrase like 'Think through this step by step' or showing an example with intermediate reasoning steps consistently improves accuracy on math, logic, and multi-step reasoning tasks. Google researchers demonstrated in 2022 that chain-of-thought prompting improved GPT-3's performance on grade-school math from 18% to 57%.

What is the difference between a system prompt and a user prompt?

A system prompt is a persistent set of instructions given to the LLM before the conversation begins - it defines the model's persona, rules, output format, and constraints. A user prompt is the individual message the end-user sends in a conversation. The system prompt is typically written by the developer and hidden from the user; it shapes how the model interprets and responds to every user message.

Can prompt engineering replace fine-tuning?

For many tasks, yes. Prompt engineering is faster, cheaper, and easier to iterate on than fine-tuning. It works well for defining output format, providing domain context, setting tone, and guiding reasoning. Fine-tuning becomes necessary when the desired behavior is very different from the model's defaults, when you have hundreds of examples of the ideal input-output pattern, or when you need to reduce prompt length significantly for cost reasons.

What are few-shot prompts?

Few-shot prompting means including 2–5 examples of the desired input-output pattern directly in the prompt, before the actual query. For example, showing the model three examples of 'customer complaint → professional reply' teaches it the expected tone and format far more reliably than describing it in words. Zero-shot prompting provides no examples; one-shot provides one; few-shot provides two or more.

Prompt Engineering - Startup Super School

Q: What is prompt engineering?

Prompt engineering is the practice of designing and refining the text inputs (prompts) given to a large language model to reliably produce the desired output. It includes writing system prompts, choosing the right instructions, providing examples, and structuring context - all without changing the model's weights. A well-engineered prompt can dramatically improve output quality at zero additional cost.

What Is Prompt Engineering?

Prompt engineering is the practice of crafting and optimizing the text inputs given to a large language model to reliably produce accurate, correctly formatted, and useful outputs. Because LLMs are highly sensitive to how instructions are phrased, a well-designed prompt can be the difference between a product that delights users and one that produces inconsistent, unreliable results. Prompt engineering requires no ML expertise or model training - it is a product and communication discipline applied to AI systems, making it accessible to anyone building on top of LLMs.

Core Techniques

System prompts

The system prompt is the developer-controlled instruction layer that runs before every user interaction. It defines the model’s role, constraints, output format, and any rules it must follow. A strong system prompt typically includes:

Role definition: “You are a senior financial analyst specializing in SaaS metrics.”
Task scope: What the model should and should not do
Output format: JSON, markdown, bullet points, specific field names
Constraints: “Never mention competitor products. Always cite sources.”
Fallback behavior: “If you don’t know the answer, say ‘I don’t have that information’ rather than guessing.”

Few-shot prompting

Providing 2–5 examples of the desired input-output pattern within the prompt. Examples outperform instructions for tasks that are hard to describe in words - complex classification, specific writing styles, or precise data extraction schemas.

Input: "Delivery was late and the box was damaged."
Output: {"sentiment": "negative", "issue_type": "logistics", "urgency": "high"}

Input: "Great product, arrived quickly!"
Output: {"sentiment": "positive", "issue_type": "none", "urgency": "low"}

Input: "{{customer_message}}"
Output:

Chain-of-thought (CoT)

Adding “Think step by step” or showing an example with intermediate reasoning steps improves accuracy on math, logic, and multi-step analysis tasks. A 2022 Google paper showed chain-of-thought prompting improved performance on grade-school math benchmarks from 18% to 57% for large models. CoT works because it forces the model to allocate more computation to reasoning before committing to an answer.

Structured output prompting

Requesting JSON or another structured format with a defined schema makes LLM output reliable enough to parse programmatically. Most frontier model APIs now support structured outputs or function calling - features that enforce a JSON schema at the sampling level, eliminating parsing failures entirely.

Prompt chaining

Breaking a complex task into sequential prompts where the output of one becomes the input of the next. A document summarization pipeline might use: (1) extract key facts → (2) identify the main argument → (3) write a 100-word summary using the facts and argument. Chaining improves reliability compared to asking a single prompt to do everything at once.

Why Prompt Engineering Matters for Startup Products

A startup’s system prompt is often a core IP asset - a carefully crafted set of instructions that determines product behavior. Changes to the system prompt can fix bugs, add features, and improve output quality without a code deploy. This means product managers and writers can directly improve AI behavior, not just engineers.

Practical impact:

Reducing hallucinations: Explicit grounding instructions (“Base your answer only on the context provided”) measurably reduce false information.
Output consistency: Format instructions ensure outputs are parseable downstream, reducing engineering overhead.
Cost reduction: Shorter, tighter prompts reduce token usage. Cutting 200 tokens from a system prompt called 1 million times per month saves roughly $500/month on GPT-4o.
Safety: Clear refusal instructions and topic restrictions reduce the surface area for misuse.

Common Pitfalls

Vague instructions: “Be helpful and concise” is harder for a model to follow than “Respond in 2–3 sentences. Never use bullet points.”

Prompt injection risk: If user-provided text is inserted directly into a prompt without sanitization, a malicious user can override your system instructions. Always separate system-controlled context from user input.

Overfitting to one model: Prompts that work perfectly on GPT-4o may produce different results on Claude or Gemini. Test prompts on any model you deploy to production.

Not versioning prompts: Treat system prompts like code - store them in version control, log which version produced each output, and A/B test changes before rolling them out.

Key Takeaway

Prompt engineering is the fastest, cheapest way to improve LLM-powered product quality. Before reaching for fine-tuning or a different model, invest in writing a precise system prompt, adding few-shot examples, and enabling chain-of-thought for complex tasks. Most LLM product quality problems are prompt problems, and the iteration cycle is measured in minutes rather than days.

Prompt Engineering