Search-Based Decoding for Language Models
Search-based decoding uses additional inference-time computation to choose or discover better continuations from an autoregressive language model. Instead of sampling a single token at each step, decoding can be treated as a search problem over partial sequences.
An autoregressive language model defines a policy over next-token actions:
and therefore a path probability for a complete sequence:
This makes text generation look like tree search: each node is a partial sequence, and each edge appends one token or higher-level action. The right search algorithm depends on what the decoding objective is.
Intuition
If the only goal is to find high-likelihood text under the model, ordinary decoding methods are usually enough. Search becomes more interesting when there is an external verifier, reward model, tool result, or environment feedback that is not captured by next-token probability alone.
Decoding as Tree Search
For a prompt
There are several possible objectives:
- Likelihood maximization: find a sequence with high probability under the language model.
- First-solution search: find any sequence that satisfies an external verifier.
- Expected-utility maximization: choose actions that maximize downstream reward.
- Diversity generation: produce multiple distinct plausible completions.
- Interactive planning: choose actions whose consequences affect future observations.
These objectives are not equivalent. A high-likelihood answer may fail a verifier, and a low-probability action may be valuable because it enables a later reward.
Beam Search
Beam search keeps the best
Beam search is a good match when the decoding objective is close to model likelihood. It is also hardware-friendly: all hypotheses in the beam usually have the same depth, so their next-token distributions can be computed in batched forward passes.
Beam search is less natural when the important signal is sparse, delayed, or external. If the model assigns low probability to a correct but unusual solution, beam search may prune it early and never recover it.
Use Beam Search When
The goal is to find a high-probability completion under the model, and there is no strong external verifier or reward signal that should override model likelihood.
Levin Tree Search
Levin Tree Search (LTS) treats the language model as a policy and prioritizes nodes by Levin cost:
where
For language models, the path probability is already available from the token probabilities. This makes LTS a natural fit for verifier-driven generation, where the goal is not to find the most likely string, but to find a valid solution with bounded search effort relative to the policy probability of that solution.
Examples include:
- Code generation with unit tests.
- Theorem proving with a proof checker.
- Program synthesis from input-output examples.
- Constrained generation where validity can be checked.
- Mathematical reasoning with a reliable final-answer verifier.
Compared with beam search, LTS does not require all candidates to stay at the same depth. Compared with MCTS, LTS does not need rollouts or value backups. It is therefore especially attractive when rewards are sparse and terminal, and the only reliable signal is whether a completed candidate passes a verifier.
Use Levin Tree Search When
The problem is best viewed as policy-guided first-solution search: the language model proposes likely paths, and an external verifier determines whether a path is a solution.
Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) incrementally builds a search tree by sampling trajectories, estimating action values, and backing up rewards. In an LLM setting, the language model can provide a prior policy over actions, while rollouts, reward models, tool results, or verifiers provide value information.
MCTS is most appropriate when the decoding problem is closer to planning than enumeration. The important question is not merely which sequence is likely under the model, but which action has the best expected downstream outcome after further search.
Examples include:
- Tool-use tasks where actions change the environment state.
- Agentic planning with intermediate observations.
- Code or proof search with informative partial rewards.
- Tasks using a learned value model to evaluate partial trajectories.
- Interactive environments where exploration can reveal useful information.
MCTS is less compelling when the reward is only a terminal pass/fail verifier and there is no useful value estimate for partial sequences. In that case, its value estimates can be noisy, and policy-guided first-solution methods such as LTS may use the language-model probabilities more directly.
Use MCTS When
The problem is best viewed as value-guided planning: partial actions can be evaluated by rewards, rollouts, learned value estimates, or environment feedback.
Comparing Search Methods
| Method | Main score | Best fit | Main weakness |
|---|---|---|---|
| Beam Search | Sequence likelihood or normalized log-likelihood | High-likelihood decoding | Prunes low-probability paths early, even if they would later pass a verifier |
| Levin Tree Search | Search effort proxy | First-solution search with an external verifier | Strongly depends on the model assigning enough probability to a valid solution |
| MCTS | Empirical action value plus exploration | Planning with rewards, rollouts, or value estimates | Can be expensive and noisy when rewards are sparse or only terminal |
The distinction can be summarized by what each method trusts:
- Beam search trusts the model likelihood as the objective.
- Levin tree search trusts the model likelihood as a search prior for finding a verified solution.
- MCTS treats the model as a prior or proposal mechanism, but relies on sampled returns or value estimates to decide where search should go.
Rule of Thumb
Use beam search when likelihood is the objective. Use Levin tree search when a verifier defines success. Use MCTS when rewards or value estimates can guide planning before a complete answer is known.
Inference-Time Reasoning Methods
Tree of Thought (ToT)
Future Extensions
Several inference-time reasoning methods can be understood as changing the search space, the scoring rule, or the feedback signal. This note can be extended with:
- Tree of Thought.
- Graph of Thought.
- Best-of-
sampling. - Self-consistency.
- Verifier-guided decoding.
- Process reward models.
- Tool-use and agent search.