Reasoning Model
A class of language models that allocate extra compute at inference time to think step by step before answering, trading speed for accuracy on complex problems.
A reasoning model is an LLM that spends extra compute at inference time to decompose problems into intermediate steps before producing a final answer. It thinks before it speaks, at measurable cost.
The tradeoff is real: reasoning tokens run up your bill and add latency. On a simple question, a reasoning model is the wrong tool, because you're paying for computation you don't need. On multi-step problems where wrong answers are expensive (complex code generation, legal analysis, financial modeling, agentic planning), the accuracy improvement is often worth it.
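To make the cost side of that tradeoff concrete, here is a rough back-of-the-envelope sketch. It assumes reasoning tokens are billed at the output-token rate, which you should check against your provider's pricing. The prices and token counts are made-up placeholders, and request_cost is a hypothetical helper, not part of any vendor SDK.

```python
def request_cost(input_tokens, output_tokens, reasoning_tokens=0,
                 input_price_per_m=1.0, output_price_per_m=4.0):
    """Estimate the dollar cost of one request.

    Assumption: reasoning tokens are billed like output tokens.
    Prices are placeholder dollars per million tokens, not real rates.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Same prompt and same visible answer, with and without hidden reasoning.
plain = request_cost(input_tokens=500, output_tokens=300)
reasoned = request_cost(input_tokens=500, output_tokens=300, reasoning_tokens=4000)

print(f"standard:  ${plain:.4f}")     # standard:  $0.0017
print(f"reasoning: ${reasoned:.4f}")  # reasoning: $0.0177
```

Under these placeholder numbers the reasoning call costs roughly ten times as much for the same visible answer, which is the overhead you accept only when the accuracy gain justifies it.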
The practical pattern is selective deployment. Use reasoning models where mistakes matter. Route everything else to faster, cheaper models. This is increasingly how agentic systems work: a reasoning model handles planning and verification while standard models execute the routine steps.
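A minimal sketch of that routing idea, under stated assumptions: the model names are hypothetical placeholders, and the two boolean flags stand in for whatever signal your system actually uses to decide that a task is multi-step and costly to get wrong (a classifier, a task type, a human label).

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider offers.
REASONING_MODEL = "reasoning-model"
STANDARD_MODEL = "standard-model"


@dataclass
class Task:
    prompt: str
    multi_step: bool = False   # does the task need intermediate steps?
    high_stakes: bool = False  # is a wrong answer expensive?


def choose_model(task: Task) -> str:
    """Send only multi-step, high-stakes tasks to the reasoning model."""
    if task.multi_step and task.high_stakes:
        return REASONING_MODEL
    return STANDARD_MODEL


tasks = [
    Task("What is the capital of France?"),
    Task("Plan the migration of the billing service",
         multi_step=True, high_stakes=True),
]
for task in tasks:
    print(choose_model(task), "<-", task.prompt)
```

The rule here is deliberately crude. The point is the shape: the decision about which model to call is made per task, and the expensive model is the exception rather than the default.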
The category launched with OpenAI's o1 in late 2024 and expanded quickly to o3, Anthropic's Claude with extended thinking, and DeepSeek's R1. These models implement inference-time compute differently, but the core bet is the same: you can buy better answers by spending more at inference rather than only at training. Whether that's more efficient than simply training a bigger model is still an open question, and the answer probably depends on the task.
Pick the model for the job. Reasoning models are a specialized tool, not a universal upgrade.