Glossary Entry

Test-Time Compute

Computation spent during inference rather than training (for example, longer reasoning traces or sampling many candidate answers) to raise quality without changing the weights.

LLMs Optimization

Also called: inference-time scaling, test-time scaling, inference-time compute

Seed source: Snell et al. 2024

Test-time compute is the idea that you can improve a model’s answers by letting it do more work at inference, instead of making the model bigger. Generating a longer chain of thought, sampling many solutions and taking a majority vote, or searching over candidate steps all spend more compute per question in exchange for higher accuracy.
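As a rough illustration of the "sample many solutions and take a majority vote" strategy, here is a minimal Python sketch. It is not taken from any of the cited sources: `majority_vote`, `make_noisy_sampler`, and the simulated sampler are hypothetical names invented for this example, and a real system would call a language model where the stub sampler sits. The point is only that raising `n_samples` spends more inference compute per question while the weights stay fixed.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def majority_vote(sample_answer: Callable[[str], str], question: str, n_samples: int) -> str:
    """Draw n_samples answers for one question and return the most frequent one."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Each call is one extra unit of inference compute; no weights change.
    answers = [sample_answer(question) for _ in range(n_samples)]
    # most_common(1) yields [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_noisy_sampler(
    correct: str, wrong: Sequence[str], p_correct: float, seed: int
) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: answers correctly with probability p_correct."""
    rng = random.Random(seed)

    def sample(question: str) -> str:
        if rng.random() < p_correct:
            return correct
        return rng.choice(list(wrong))

    return sample


if __name__ == "__main__":
    sampler = make_noisy_sampler("42", ["41", "43", "44"], p_correct=0.5, seed=0)
    for n in (1, 5, 25):
        print(n, majority_vote(sampler, "What is 6 * 7?", n))
```

In this toy setup the wrong answers are spread over several values, so the correct answer tends to win the vote more often as `n_samples` grows. Real models can also be consistently wrong, in which case voting will not help.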

This is the second scaling axis behind reasoning models. Pre-training scaling grows the weights; test-time scaling grows the amount of thinking per query. The two are complementary, and reasoning models are partly a way of training a model to use test-time compute well.

Useful background: Snell et al. (2024) show that optimally scaling test-time compute can beat simply using a much larger model; a Hugging Face overview surveys the main techniques; and Zeng et al. (2025) study how to revisit and refine reasoning at inference.