A reasoning model is a large language model specialised for complex, multi-step problems such as competition mathematics, coding, and logic puzzles. Instead of emitting an answer in a single pass, it first generates a long intermediate “thinking” trace and then commits to a final answer, which lets it allocate more computation to harder questions.
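
The two-phase output described above can be illustrated with a small parser. This is only a sketch: it assumes the trace is wrapped in `<think>…</think>` tags, a delimiter convention adopted here for illustration (models differ, and some hide the trace entirely), and the names `ParsedOutput` and `split_output` are hypothetical.

```python
import re
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    thinking: str  # the intermediate trace
    answer: str    # the text the model commits to afterwards


# Assumed delimiter convention; not a universal standard.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_output(text: str) -> ParsedOutput:
    """Separate the thinking trace from the final answer."""
    match = _THINK_RE.search(text)
    if match is None:
        # No trace found: treat the whole output as the answer.
        return ParsedOutput(thinking="", answer=text.strip())
    return ParsedOutput(
        thinking=match.group(1).strip(),
        answer=text[match.end():].strip(),
    )
```

A harder question would typically produce a longer `thinking` field, which is where the extra computation is spent.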
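These models are usually produced by reinforcement learning against automatically checkable rewards, such as whether a mathematical answer matches a reference or whether generated code passes its tests, sometimes combined with supervised fine-tuning. The DeepSeek-R1 family popularised the recipe: its R1-Zero variant was trained with reinforcement learning and no preliminary supervised fine-tuning, showing that reasoning behaviour can emerge from reward signals rather than being hand-written into prompts.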
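
To make "automatically checkable reward" concrete, the following sketch scores a completion by exact match against a reference answer. It is not any particular lab's reward function. The `Answer: <value>` line format and the names `extract_answer` and `exact_match_reward` are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed output convention: the final answer sits on a line "Answer: <value>".
_ANSWER_RE = re.compile(r"^Answer:\s*(.+?)\s*$", re.MULTILINE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer:' value in the completion, or None if absent."""
    matches = _ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def exact_match_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if answer == reference.strip() else 0.0
```

Because the check looks only at the final answer, the reward places no direct constraint on the content of the intermediate trace. A reinforcement learning loop would sample completions, score them with a function like this, and update the model towards higher-scoring samples.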
