Glossary Entry

Reasoning Model

A language model trained to produce an explicit intermediate chain of thought before its final answer, trading extra inference compute for accuracy on multi-step problems.

Tags: LLMs, Training, RL

Also called: reasoning models, reasoning LLM, reasoning LLMs, thinking model

A reasoning model is a large language model specialised for complex, multi-step problems such as competition mathematics, coding, and logic puzzles. Instead of emitting an answer in a single pass, it first generates a long intermediate “thinking” trace and then commits to a final answer, which lets it allocate more computation to harder questions.
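
The split between the thinking trace and the final answer can be illustrated with a small parser. This is a minimal sketch, assuming the model wraps its trace in `<think>...</think>` delimiters, as DeepSeek-R1 outputs do; other models use different markers or hide the trace entirely. The function name `split_reasoning` is hypothetical.

```python
import re
from typing import Tuple

# Assumption: the trace is delimited by <think>...</think>.
# Other reasoning models may use different markers or expose no trace at all.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> Tuple[str, str]:
    """Return (thinking_trace, final_answer) from a raw completion string."""
    match = THINK_RE.search(output)
    if match is None:
        # No trace found: treat the whole completion as the answer.
        return "", output.strip()
    trace = match.group(1).strip()
    # Everything after the closing delimiter is the committed answer.
    answer = output[match.end():].strip()
    return trace, answer


if __name__ == "__main__":
    raw = "<think>17 * 3 = 51, then 51 + 4 = 55.</think>The result is 55."
    trace, answer = split_reasoning(raw)
    print("trace: ", trace)
    print("answer:", answer)
```

In this sketch, a harder question would simply show up as a longer trace, which is where the extra inference compute is spent.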

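These models are usually produced by reinforcement learning against automatically checkable rewards, for example whether a final numeric answer matches a reference or whether generated code passes its tests, sometimes combined with supervised fine-tuning. The DeepSeek-R1 family popularised the recipe: its R1-Zero variant showed that reasoning behaviour can emerge from reward signals alone, without first being demonstrated in supervised data or hand-written into prompts.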
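
An automatically checkable reward can be as simple as comparing the model's final number with a known reference. The following is a minimal sketch under stated assumptions, not the reward used by any particular model: the function names are hypothetical, the answer is assumed to be the last number in the completion, and the reward is a plain 0/1 signal.

```python
import re
from typing import Optional


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last integer or decimal in the completion, or None."""
    # Assumption: the final answer is the last number the model writes.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference, else 0.0."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0  # no parseable answer earns no reward
    return 1.0 if float(predicted) == float(reference) else 0.0


if __name__ == "__main__":
    print(verifiable_reward("6 * 7 = 42, so the total is 42.", "42"))
    print(verifiable_reward("I believe the total is 41.", "42"))
```

Because the check needs no human grader, a signal like this can be computed for every sampled completion during reinforcement learning. Real pipelines typically use stricter answer formats and more careful matching than this sketch.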