Delete Relaxation

Source slides: D1, D2, D3, D4, D5, D6

Delete relaxation is the first major example of a planning heuristic built by changing the planning task itself. The guiding question is simple: if the real task is too hard to solve during search, can we solve an easier task whose solution cost still tells us something useful about the real one?

For classical planning, one natural simplification is to ignore delete effects. If an action makes a fact true, the relaxed task keeps that achievement forever. This is unrealistic, but it creates a monotone planning problem: facts can accumulate, and previously achieved facts cannot be lost. From this relaxation we get several important heuristics, including $h^{\max}$ , $h^+$ , $h^{\mathrm{add}}$ , and $h^{\mathrm{FF}}$ .

The Relaxation Recipe

A heuristic estimates the remaining cost from a state $s$ to a goal. If $h^*(s)$ denotes the true optimal goal distance, then a heuristic $h(s)$ is useful when it is cheaper to compute than $h^*(s)$ and still correlated with it.

Many planning heuristics follow the same three-step recipe:

simplify the planning problem;
solve, or approximately solve, the simplified problem;
use the simplified solution cost as the heuristic value for the original problem.

A familiar analogy is straight-line distance in route planning. It ignores the constraint that a vehicle must follow roads. The resulting distance is easy to compute and never larger than the shortest road distance, so it is an admissible lower bound.

In planning, we usually want to know which formal properties a heuristic has.

Definition: Heuristic properties

Let $h : S \to \mathbb{N}_0 \cup \{\infty\}$ be a heuristic.

$h$ is goal-aware if $h(s)=0$ for every goal state $s$ .

$h$ is safe if $h(s)=\infty$ implies $h^*(s)=\infty$ .

$h$ is admissible if $h(s) \le h^*(s)$ for every state $s$ .

$h$ is consistent if $h(s) \le \mathit{cost}(o) + h(s')$ for every transition $s \xrightarrow{o} s'$ .

Goal-awareness says that goals are recognized as finished. Safety says that infinite estimates are not false dead-end reports. Admissibility is the lower-bound property needed by optimal search algorithms such as A*. Consistency is a stronger local lower-bound condition along transitions.

Removing Delete Effects

Assume the task is in positive normal form, so positive effects make variables true and negative effects make variables false. Positive effects are useful achievements. Negative effects are the source of interference: an action may help one subgoal while undoing another.

The delete relaxation removes that interference.

Definition: Delete-relaxed operator and task

Let $o$ be an operator in positive normal form. Its delete relaxation $o^+$ is obtained by replacing every negative atomic effect $\neg a$ in $\mathit{eff}(o)$ with $\top$ , meaning “do nothing”.

For a planning task $\Pi = \langle V,I,O,\gamma\rangle$ , the delete-relaxed task is
$\Pi^+ = \langle V,I,\{o^+ \mid o \in O\},\gamma\rangle.$

The initial state and goal are unchanged. Only the effects are simplified. If an action originally added $p$ and deleted $q$ , its relaxed version adds $p$ and leaves $q$ alone.

The crucial consequence is monotonicity. It is helpful to view a state as the set of variables that are true. In the relaxed task, applying an action can only enlarge this set.

Monotonicity lemma

For any state $s$ and relaxed operator $o^+$ , the successor $s\llbracket o^+\rrbracket$ dominates $s$ : every variable true in $s$ is also true in $s\llbracket o^+\rrbracket$ .

This has an important second consequence. If a relaxed plan works from a state $s$ , it also works from any state $s'$ that already contains at least the facts of $s$ .

Relaxation lemma

If $s'$ dominates $s$ , and $\pi^+$ is a relaxed plan from $s$ , then $\pi^+$ is also a relaxed plan from $s'$ .

The reason is that relaxed actions never depend on facts being false. Having extra true facts cannot invalidate a precondition or undo a future achievement.

A Small Example

Suppose a robot has to deliver two packages, and carrying one package prevents carrying the other in the real task. A real action pick-up-A might make $\mathit{hasA}$ true and $\mathit{handempty}$ false. A later drop-A might restore $\mathit{handempty}$ but remove $\mathit{hasA}$ .

In the delete relaxation, once $\mathit{hasA}$ becomes true, it stays true. The robot can also regain or keep facts such as $\mathit{handempty}$ without losing previous achievements. The relaxed task may therefore believe that package achievements combine freely even when the real task requires careful ordering.

This is not a bug in the heuristic. It is the intended approximation: delete relaxation measures how hard it is to make the required facts true when negative interactions are ignored.

Solving Relaxed Tasks Greedily

Because relaxed states only grow, a simple greedy algorithm is sound for finding some relaxed plan when one exists.

Start with $s := I$ and an empty relaxed plan $\pi^+ := \langle\rangle$ .
If $s \models \gamma$ , return $\pi^+$ .
Choose an applicable relaxed operator $o^+$ such that $s\llbracket o^+\rrbracket \ne s$ .
Apply it, append it to $\pi^+$ , and repeat.
If no such operator exists, report the relaxed task unsolvable.

Each successful iteration makes at least one new variable true, so there can be at most $|V|$ successful iterations. The algorithm is polynomial and sound. However, the particular relaxed plan it returns may be far from cheapest, because greedy choices can add unnecessary actions.

This distinction matters: delete-relaxed planning is easier than ordinary planning in some respects, but finding an optimal relaxed plan is still hard.

The Optimal Delete-Relaxation Heuristic

The most direct delete-relaxation heuristic is $h^+$ .

Definition: The $h^+$ heuristic

For a state $s$ , let $\Pi_s^+$ be the delete-relaxed task with initial state $s$ . Then $h^+(s)$ is the cost of an optimal plan for $\Pi_s^+$ , or $\infty$ if no relaxed plan exists.

Because every real plan is also a valid relaxed plan after deletes are ignored, the relaxed optimum cannot be more expensive than the real optimum. Therefore $h^+$ is admissible. It is also consistent.

The problem is computation. Optimal planning for delete-relaxed tasks is NP-hard. So $h^+$ is theoretically important but not usually computed exactly inside a planner. The practical heuristics in this chapter approximate it with polynomial graph computations.

Relaxed Task Graphs

The relaxed task graph, or RTG, is the shared data structure behind $h^{\max}$ , $h^{\mathrm{add}}$ , and $h^{\mathrm{FF}}$ . It represents the delete-relaxed task as an AND/OR graph.

In an AND/OR graph, an OR node is achieved when at least one predecessor is achieved. An AND node is achieved only when all predecessors are achieved. This distinction matches planning structure: a fact may be achieved in several alternative ways, but an action precondition with several required facts needs all of them.

For a relaxed task $\Pi_s^+$ , the RTG contains:

an initial node $n_I$ , representing the facts true in $s$ ;
variable nodes $n_v$ , one for each state variable that may need to become true;
formula nodes for compound preconditions, conditions, and the goal;
effect nodes for conditional effects such as $\chi \triangleright v$ ;
a goal node $n_\gamma$ .

Variable and disjunction nodes behave like OR nodes. Conjunctions and effect nodes behave like AND nodes. Effect nodes also carry the cost of the operator that produces the effect.

The RTG lets us propagate achievement costs from the initial facts toward the goal. The only question is how to assign a cost to an AND node. Two different answers give two different heuristics.

The Max Heuristic

The $h^{\max}$ heuristic treats a conjunction as if its cost were the cost of the most expensive conjunct.

For an RTG node $n$ with predecessors $\mathit{pred}(n)$ , the recurrence is:

h^{\max}(n) = \begin{cases} 0 & \text{if } n = n_I,\\ \min_{n' \in \mathit{pred}(n)} h^{\max}(n') & \text{if } n \text{ is an OR node},\\ \max_{n' \in \mathit{pred}(n)} h^{\max}(n') + c(n) & \text{if } n \text{ is an AND node}. \end{cases}

The heuristic value for state $s$ is $h^{\max}(s)=h^{\max}(n_\gamma)$ .

The max rule is optimistic. If a goal requires both $p$ and $q$ , and $p$ costs $3$ while $q$ costs $5$ , then $h^{\max}$ estimates the conjunction as $5$ , not $8$ . This can be admissible because one action sequence may achieve both subgoals together, but it is often weak because it ignores much of the work outside the single most expensive branch.

The Additive Heuristic

The $h^{\mathrm{add}}$ heuristic uses the same OR rule but changes the AND rule. A conjunction is estimated by summing its predecessor costs:

h^{\mathrm{add}}(n) = \begin{cases} 0 & \text{if } n = n_I,\\ \min_{n' \in \mathit{pred}(n)} h^{\mathrm{add}}(n') & \text{if } n \text{ is an OR node},\\ \sum_{n' \in \mathit{pred}(n)} h^{\mathrm{add}}(n') + c(n) & \text{if } n \text{ is an AND node}. \end{cases}

The value for the state is $h^{\mathrm{add}}(s)=h^{\mathrm{add}}(n_\gamma)$ .

This usually gives stronger search guidance than $h^{\max}$ , because it counts all required subgoals rather than only the hardest one. The cost is that it may count the same helpful action more than once. If one action achieves both $p$ and $q$ , and the goal requires $p \land q$ , then summing separate costs for $p$ and $q$ double-counts shared work. For that reason, $h^{\mathrm{add}}$ is not admissible in general.

Both $h^{\max}$ and $h^{\mathrm{add}}$ can be computed efficiently by propagating values through the RTG, often using a priority-queue style algorithm. The graph computation is polynomial in the number of RTG nodes and arcs.

Best Achievers and the FF Heuristic

The FF heuristic keeps the practical strength of additive costs but tries to avoid the most obvious double counting.

First, compute additive costs in the RTG. Then choose, for each OR node, a best achiever: one predecessor that attains the minimum cost for that node. This choice function extracts a best-achiever graph from the RTG by keeping the selected incoming arc for each OR node and the necessary arcs through AND nodes.

The heuristic then identifies the effect nodes needed to support the goal in this best-achiever graph and counts each such effect node only once.

Definition: The $h^{\mathrm{FF}}$ heuristic

For a state $s$ :

construct the relaxed task graph for $\Pi_s^+$ ;

compute best achievers using additive costs;

trace backward from the goal through the best-achiever graph;

return the sum of the costs of the reachable effect nodes, counting each effect node once.

The result is the cost of a relaxed plan extracted from the graph. It is usually more informative than $h^{\max}$ and less prone to overcounting than $h^{\mathrm{add}}$ . However, it is still not admissible. The best-achiever choices can depend on tie-breaking, and the extracted relaxed plan is not guaranteed to be optimal.

How the Heuristics Relate

The four delete-relaxation heuristics form an important comparison chain:

h^{\max}(s) \le h^+(s) \le h^{\mathrm{FF}}(s) \le h^{\mathrm{add}}(s)

for all states $s$ , with the usual convention that unreachable relaxed goals have value $\infty$ .

They also agree on relaxed dead ends:

h^{\max}(s)=\infty \iff h^+(s)=\infty \iff h^{\mathrm{FF}}(s)=\infty \iff h^{\mathrm{add}}(s)=\infty.

The practical interpretation is:

$h^{\max}$ is admissible and consistent, but often too weak.
$h^+$ is admissible and consistent, but NP-hard to compute.
$h^{\mathrm{add}}$ is polynomial and often informative, but neither admissible nor consistent.
$h^{\mathrm{FF}}$ is polynomial and highly practical, but neither admissible nor consistent.

All four are safe and goal-aware. That is why even inadmissible delete-relaxation heuristics can be useful for satisficing planning: they may not preserve optimality guarantees, but they still provide meaningful guidance toward relaxed reachability.

Common Mistakes

A common mistake is to say that delete relaxation simply “removes negative preconditions.” In positive normal form, the relaxation is about effects: negative effects are replaced by no-ops. Preconditions and the goal remain part of the task.

Another mistake is to assume that a polynomial relaxed-plan algorithm computes $h^+$ . The greedy relaxed-plan algorithm finds some relaxed plan, not necessarily an optimal one. The optimal relaxed-plan cost is exactly what makes $h^+$ hard.

Finally, do not confuse $h^{\mathrm{add}}$ with a safe lower bound. Summing subgoal costs feels conservative, but shared achievers can make the sum larger than the true remaining cost.

Chapter Summary

Delete relaxation estimates planning difficulty by ignoring delete effects. The resulting task is monotone: once a fact is true, it remains true. The optimal relaxed-plan cost $h^+$ is admissible but NP-hard to compute, so planners often use polynomial approximations based on relaxed task graphs. $h^{\max}$ uses a maximum at AND nodes and is admissible but weak. $h^{\mathrm{add}}$ uses a sum and is stronger but can overcount. $h^{\mathrm{FF}}$ extracts a relaxed plan from best achievers and counts each selected effect once, making it a powerful practical heuristic for satisficing search.

Check Your Understanding

What exactly changes when an operator is delete-relaxed?
Why does monotonicity make relaxed reachability easier to reason about?
Why is $h^{\max}$ admissible but often weak?
Give a small example where $h^{\mathrm{add}}$ double-counts useful work.
Why can $h^{\mathrm{FF}}$ be useful even though it is not admissible?