Policy-Guided Heuristic Search (PHS) is a best-first search policy tree search method that augments the Levin tree search cost function with an explicit heuristic factor.
Definition
The cumulative expansion cost along the path from the root to , including the expansion of itself, is given by
where is the expansion cost of node and are all ancestors of (including itself).
Definition
Policy-Guided Heuristic Search is obtained from best-first search by using
where plays the same role as in the Levin cost with generalizing to non-unit expansion losses, and uses heuristic information to rescale this quantity toward the search effort required to reach a descendant solution.
Analysis
While the LTS cost function is monotone, need not be, due to the heuristic factor . To help provide a bound on the search, we need to define a few things.
Define the monotone non-decreasing envelope
To write this in the same multiplicative form, define
with when , such that
We also define as the set of nodes of value at most , and as the set of leaf nodes in .
Theorem
The number of node expansions PHS requires before reaching a solution is bounded by
Practical Heuristics
In policy tree search, search effort depends not only on how much path cost remains, but also how much probability mass must be preserved along the remaining trajectory. The LTS cost function captures the search effort required to reach the given node, but does not account for the remaining effort required to reach a descending goal node.
Many problems in single-agent deterministic search already have a readily available heuristic function. For example, if is a heuristic for A*, then the heuristic factor can be defined as
which results in the evaluation function
The heuristic factor extends the LTS cost to by incorporating an estimate of the remaining path cost. However, this construction inherits only the additive part of the future solution cost. If is least-cost descendant solution of , and was an exact heuristic in the usual A* sense, then , and therefore . Thus, even with an exact heuristic for the remaining cost, still misses the ratio
which accounts for the additional decay in probability along the suffix from to .
Accurately Predicting the Solution Cost (PHS*)
The motivation for is precisely to compensate for this missing probability term by extrapolating the future policy decay from the prefix observed so far. The difficulty is to estimate . One can view
as an average conditional probability per unit loss along the prefix from the root to . If this average is assumed to remain roughly stable along the remaining suffix, then the probability of a descendant solution with total loss can be approximated by
This produces a more aggressive PHS* evaluation function.
Implementation
Numerical Stability
The path probabilities may decrease exponentially with depth, which can lead to numerical underflow. Since applying a monotone transformation preserves the expansion order induced by the evaluation function, it is often preferable to compute PHS* costs in log-space:
Moreover, if is a child of , then the log-path probability can be updated incrementally:
Avoiding Zero Probabilities
To avoid zero probabilities being assigned to various paths, one can mix the policy with a uniform policy: