Search-Based Decoding for Language Models

Search-based decoding uses additional inference-time computation to choose or discover better continuations from an autoregressive language model. Instead of sampling a single token at each step, decoding can be treated as a search problem over partial sequences.

An autoregressive language model defines a policy over next-token actions:

and therefore a path probability for a complete sequence:

This makes text generation look like tree search: each node is a partial sequence, and each edge appends one token or higher-level action. The right search algorithm depends on what the decoding objective is.

Intuition

If the only goal is to find high-likelihood text under the model, ordinary decoding methods are usually enough. Search becomes more interesting when there is an external verifier, reward model, tool result, or environment feedback that is not captured by next-token probability alone.

For a prompt , a partial completion can be viewed as a search node. The children of that node are possible next tokens, or in some systems, larger units such as thoughts, tool calls, proof steps, or program fragments.

There are several possible objectives:

  • Likelihood maximization: find a sequence with high probability under the language model.
  • First-solution search: find any sequence that satisfies an external verifier.
  • Expected-utility maximization: choose actions that maximize downstream reward.
  • Diversity generation: produce multiple distinct plausible completions.
  • Interactive planning: choose actions whose consequences affect future observations.

These objectives are not equivalent. A high-likelihood answer may fail a verifier, and a low-probability action may be valuable because it enables a later reward.

Beam search keeps the best partial sequences at each depth, usually scored by accumulated log-probability:

Beam search is a good match when the decoding objective is close to model likelihood. It is also hardware-friendly: all hypotheses in the beam usually have the same depth, so their next-token distributions can be computed in batched forward passes.

Beam search is less natural when the important signal is sparse, delayed, or external. If the model assigns low probability to a correct but unusual solution, beam search may prune it early and never recover it.

Use Beam Search When

The goal is to find a high-probability completion under the model, and there is no strong external verifier or reward signal that should override model likelihood.

Levin Tree Search (LTS) treats the language model as a policy and prioritizes nodes by Levin cost:

where is the depth of the node and is the probability of the path from the root to .

For language models, the path probability is already available from the token probabilities. This makes LTS a natural fit for verifier-driven generation, where the goal is not to find the most likely string, but to find a valid solution with bounded search effort relative to the policy probability of that solution.

Examples include:

  • Code generation with unit tests.
  • Theorem proving with a proof checker.
  • Program synthesis from input-output examples.
  • Constrained generation where validity can be checked.
  • Mathematical reasoning with a reliable final-answer verifier.

Compared with beam search, LTS does not require all candidates to stay at the same depth. Compared with MCTS, LTS does not need rollouts or value backups. It is therefore especially attractive when rewards are sparse and terminal, and the only reliable signal is whether a completed candidate passes a verifier.

Use Levin Tree Search When

The problem is best viewed as policy-guided first-solution search: the language model proposes likely paths, and an external verifier determines whether a path is a solution.

Monte Carlo Tree Search (MCTS) incrementally builds a search tree by sampling trajectories, estimating action values, and backing up rewards. In an LLM setting, the language model can provide a prior policy over actions, while rollouts, reward models, tool results, or verifiers provide value information.

MCTS is most appropriate when the decoding problem is closer to planning than enumeration. The important question is not merely which sequence is likely under the model, but which action has the best expected downstream outcome after further search.

Examples include:

  • Tool-use tasks where actions change the environment state.
  • Agentic planning with intermediate observations.
  • Code or proof search with informative partial rewards.
  • Tasks using a learned value model to evaluate partial trajectories.
  • Interactive environments where exploration can reveal useful information.

MCTS is less compelling when the reward is only a terminal pass/fail verifier and there is no useful value estimate for partial sequences. In that case, its value estimates can be noisy, and policy-guided first-solution methods such as LTS may use the language-model probabilities more directly.

Use MCTS When

The problem is best viewed as value-guided planning: partial actions can be evaluated by rewards, rollouts, learned value estimates, or environment feedback.

Comparing Search Methods

MethodMain scoreBest fitMain weakness
Beam SearchSequence likelihood or normalized log-likelihoodHigh-likelihood decodingPrunes low-probability paths early, even if they would later pass a verifier
Levin Tree SearchSearch effort proxy First-solution search with an external verifierStrongly depends on the model assigning enough probability to a valid solution
MCTSEmpirical action value plus explorationPlanning with rewards, rollouts, or value estimatesCan be expensive and noisy when rewards are sparse or only terminal

The distinction can be summarized by what each method trusts:

  • Beam search trusts the model likelihood as the objective.
  • Levin tree search trusts the model likelihood as a search prior for finding a verified solution.
  • MCTS treats the model as a prior or proposal mechanism, but relies on sampled returns or value estimates to decide where search should go.

Rule of Thumb

Use beam search when likelihood is the objective. Use Levin tree search when a verifier defines success. Use MCTS when rewards or value estimates can guide planning before a complete answer is known.

Inference-Time Reasoning Methods

Tree of Thought (ToT)

Future Extensions

Several inference-time reasoning methods can be understood as changing the search space, the scoring rule, or the feedback signal. This note can be extended with:

  • Tree of Thought.
  • Graph of Thought.
  • Best-of- sampling.
  • Self-consistency.
  • Verifier-guided decoding.
  • Process reward models.
  • Tool-use and agent search.