Reasoning Without Chain-of-Thought: What Implicit Reasoning Means for Efficiency

Chain-of-thought prompting has been one of the most reliable techniques in the LLM toolkit since Wei et al. demonstrated it in 2022. The principle is simple: ask the model to show its working, and it reasons more accurately. But a growing body of research is challenging the assumption that explicit reasoning tokens are always necessary — and the practical implications are substantial.

A paper from researchers at Stanford and Google DeepMind, published on arXiv, demonstrates that models trained with a technique they call 'internalised chain-of-thought' (iCoT) can maintain 92-96% of their explicit CoT accuracy on standard math benchmarks while generating 60-80% fewer tokens. The approach trains models to perform reasoning steps within their hidden states rather than in the output token stream.

How does implicit reasoning work?

The standard chain-of-thought approach works by converting internal reasoning into tokens. When a model writes 'First, I need to calculate X, then apply Y,' those tokens serve as a form of working memory — each step's output becomes part of the context for the next step. This is effective but expensive: a complex reasoning task might generate 500-2,000 tokens of reasoning before producing a 50-token answer.

Implicit reasoning takes a different path. During training, models are first taught to reason explicitly (standard CoT training), then progressively trained to compress that reasoning into their hidden state representations. The key insight is that the reasoning does not disappear — it migrates from the token stream to the latent space. The model still performs multi-step reasoning; it just does not externalise every step.
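One way to picture the progressive compression is as a curriculum that removes reasoning steps from the training target stage by stage. The sketch below is only an illustration under that assumption: the function name build_target, the number of stages, and the rule of dropping steps from the front are hypothetical and are not taken from the paper.

```python
def build_target(steps, answer, stage, total_stages):
    """Return the training target text for one curriculum stage.

    Assumption (not from the paper): at stage s of total_stages, the first
    round(len(steps) * s / total_stages) reasoning steps are removed, so
    stage 0 is full explicit CoT and the final stage is answer-only.
    """
    if total_stages <= 0:
        raise ValueError("total_stages must be positive")
    if not 0 <= stage <= total_stages:
        raise ValueError("stage must be between 0 and total_stages")
    n_removed = round(len(steps) * stage / total_stages)
    kept = steps[n_removed:]
    return " ".join(kept + [f"Answer: {answer}"])


# Made-up two-step example in the style of a grade-school word problem.
steps = ["16 - 3 - 4 = 9 eggs left.", "9 * 2 = 18 dollars."]
for stage in range(3):
    print(stage, build_target(steps, "18", stage, total_stages=2))
```

At stage 0 the target contains both steps, at stage 1 only the second, and at stage 2 only the answer. A model fine-tuned on each stage in turn has to carry the removed steps internally to keep producing the right answer.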

The researchers validated this through probing experiments. By examining the hidden states of iCoT-trained models during inference, they found structured intermediate representations that correspond to the reasoning steps that would have appeared in explicit CoT. The reasoning is happening — it is just not being printed.

What are the measured tradeoffs?

The accuracy-efficiency tradeoff follows a predictable curve. On GSM8K (grade-school math), iCoT models retain 96% of explicit CoT accuracy. On MATH (competition-level problems), retention drops to 92%. On the most challenging reasoning benchmarks — ARC-AGI and novel logic puzzles — retention falls to 85-88%.
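Retention is read here as the ratio of iCoT accuracy to explicit CoT accuracy on the same benchmark. The sketch below shows that arithmetic; the 80% baseline is a made-up illustration, not a figure from the paper, and the function names are hypothetical.

```python
def retention(icot_accuracy, cot_accuracy):
    """Fraction of explicit CoT accuracy that the iCoT model keeps."""
    if cot_accuracy <= 0:
        raise ValueError("cot_accuracy must be positive")
    return icot_accuracy / cot_accuracy


def implied_accuracy(cot_accuracy, retention_ratio):
    """Absolute iCoT accuracy implied by a baseline and a retention ratio."""
    return cot_accuracy * retention_ratio


# Hypothetical baseline: explicit CoT scores 80% on some benchmark.
baseline = 0.80
for name, ratio in [("GSM8K", 0.96), ("MATH", 0.92)]:
    print(name, round(implied_accuracy(baseline, ratio), 3))
```

With that baseline, 96% retention corresponds to about 76.8% absolute accuracy and 92% retention to about 73.6%. The same ratio costs more absolute points when the baseline is high than when it is low.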

The pattern is clear: implicit reasoning works well for problems within the model's training distribution, where the reasoning patterns are familiar enough to compress. For novel or highly complex problems that require genuinely creative reasoning steps, explicit CoT still wins. This maps to human cognition — experienced practitioners solve routine problems without conscious deliberation, but novel problems require explicit thinking.

The efficiency gains are substantial. On a benchmark suite of 1,000 mixed reasoning tasks, iCoT models consumed 73% fewer output tokens while maintaining 94% aggregate accuracy. At current API pricing, this translates to roughly a 3-4x reduction in output-token spend on reasoning tasks. Latency improves by a comparable margin, with 60-80% faster time-to-answer, because fewer tokens mean fewer autoregressive generation steps.
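The 3-4x figure follows from the 73% token reduction if only output tokens are counted. The sketch below works through that arithmetic and shows how input tokens, which are billed either way, dilute the saving. The token counts and relative prices in the second call are hypothetical.

```python
def output_cost_reduction(token_reduction):
    """Factor by which output-token spend shrinks for a fractional token cut."""
    if not 0 <= token_reduction < 1:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1 / (1 - token_reduction)


def total_cost_reduction(input_tokens, output_tokens, token_reduction,
                         input_price, output_price):
    """Same factor, but including input tokens that are billed either way."""
    before = input_tokens * input_price + output_tokens * output_price
    after = (input_tokens * input_price
             + output_tokens * (1 - token_reduction) * output_price)
    return before / after


# 73% fewer output tokens, counting output spend only.
print(round(output_cost_reduction(0.73), 2))  # 3.7

# Hypothetical request: 500 input tokens, 1,000 output tokens,
# output tokens priced at 4x input tokens (relative units).
print(round(total_cost_reduction(500, 1000, 0.73, 1.0, 4.0), 2))  # 2.85
```

The longer the prompt relative to the reasoning trace, the further the total saving falls below the output-only figure.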

What does this mean for practitioners?

The implication is a two-tier reasoning architecture. Route routine reasoning tasks — classification with justification, standard analysis, familiar problem types — to implicit reasoning mode for dramatic cost and latency savings. Reserve explicit chain-of-thought for genuinely novel or high-stakes reasoning where you need both maximum accuracy and an auditable reasoning trace.
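A minimal sketch of such a router is below. The Task fields and the choose_mode function are hypothetical; how a real system decides that a task is familiar or high-stakes is left open, and the two mode names stand in for whatever the serving stack actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    familiar: bool      # matches a problem type the model handles routinely
    high_stakes: bool   # e.g. medical, legal, or financial use
    needs_audit: bool   # a reasoning trace must be kept for review


def choose_mode(task: Task) -> str:
    """Return "explicit" or "implicit" for one task.

    Anything novel, high-stakes, or audited goes to explicit CoT;
    only routine, low-risk tasks take the cheaper implicit path.
    """
    if task.high_stakes or task.needs_audit or not task.familiar:
        return "explicit"
    return "implicit"


routine = Task("Classify this support ticket and justify the label.",
               familiar=True, high_stakes=False, needs_audit=False)
novel = Task("Solve this unfamiliar logic puzzle.",
             familiar=False, high_stakes=False, needs_audit=False)
print(choose_mode(routine), choose_mode(novel))
```

The rule defaults to explicit reasoning whenever any flag is in doubt, so a misclassified task costs extra tokens rather than accuracy or auditability.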

This is essentially a reasoning budget allocator. Not every problem deserves 2,000 tokens of deliberation. The research provides empirical support for what many practitioners have intuited: most production reasoning tasks are pattern matching on familiar problem types, and those do not need explicit step-by-step working.

The auditability tradeoff is real and important. Implicit reasoning is a black box. When the model reasons internally and produces only the answer, you lose the ability to inspect the reasoning chain for errors, biases, or hallucinations. For applications where explainability matters, such as medical diagnosis support, legal analysis, and financial recommendations, explicit CoT remains necessary regardless of efficiency gains. The EU AI Act's transparency requirements for high-risk AI systems may also, in practice, push certain application categories toward explicit reasoning traces.

Fine-tuning existing models for implicit reasoning is feasible. The iCoT training procedure is not restricted to pre-training. The researchers demonstrate that existing instruction-tuned models can be further fine-tuned with progressive CoT compression over approximately 10,000 training examples per task category. For organisations with specific high-volume reasoning workloads, this is an accessible optimisation.

What should you watch for?

The frontier model providers are almost certainly incorporating these findings. Expect to see 'reasoning efficiency' modes in commercial APIs within the next quarter — options to trade some accuracy for significantly lower cost and latency on reasoning tasks. Google's 'thinking mode' toggle in Gemini 2.5 Pro is arguably the first step in this direction, though it currently only adds reasoning rather than compressing it.

The research also has implications for the test-time compute scaling paradigm. If reasoning can be partially internalised, the compute scaling laws for inference may be more favourable than current projections suggest. A model that can solve 90% of problems cheaply and reserves expensive deliberation for the remaining 10% has a fundamentally different cost profile than one that deliberates on everything.
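The difference in cost profile can be made concrete with a blended-cost calculation. The sketch below assumes explicit deliberation costs 1.0 unit per task and the cheap path costs 0.27 units, reusing the article's 73% token reduction as a stand-in for cost. Both unit costs are assumptions for illustration.

```python
def blended_cost(cheap_share, cheap_cost, expensive_cost):
    """Expected cost per task when a share of tasks takes the cheap path."""
    if not 0 <= cheap_share <= 1:
        raise ValueError("cheap_share must be in [0, 1]")
    return cheap_share * cheap_cost + (1 - cheap_share) * expensive_cost


# 90% of tasks take the cheap path, 10% get full deliberation.
mixed = blended_cost(0.9, 0.27, 1.0)
# Baseline: every task gets full deliberation.
always_explicit = blended_cost(0.0, 0.27, 1.0)
print(round(mixed, 3), round(always_explicit, 3))  # 0.343 1.0
```

Under these assumptions the mixed system costs about a third of the always-deliberate baseline per task, and the expensive 10% of tasks account for roughly 29% of the remaining spend.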

The question is not whether implicit reasoning will be adopted — the economics are too compelling. The question is where the accuracy floor lies, and whether it is high enough for your specific use case.
