Skip to content

Delete Relaxation

Source slides: D1, D2, D3, D4, D5, D6

Delete relaxation is the first major example of a planning heuristic built by changing the planning task itself. The guiding question is simple: if the real task is too hard to solve during search, can we solve an easier task whose solution cost still tells us something useful about the real one?

For classical planning, one natural simplification is to ignore delete effects. If an action makes a fact true, the relaxed task keeps that achievement forever. This is unrealistic, but it creates a monotone planning problem: facts can accumulate, and previously achieved facts cannot be lost. From this relaxation we get several important heuristics, including hmaxh^{\max}, h+h^+, haddh^{\mathrm{add}}, and hFFh^{\mathrm{FF}}.

A heuristic estimates the remaining cost from a state ss to a goal. If h(s)h^*(s) denotes the true optimal goal distance, then a heuristic h(s)h(s) is useful when it is cheaper to compute than h(s)h^*(s) and still correlated with it.

Many planning heuristics follow the same three-step recipe:

  1. simplify the planning problem;
  2. solve, or approximately solve, the simplified problem;
  3. use the simplified solution cost as the heuristic value for the original problem.

A familiar analogy is straight-line distance in route planning. It ignores the constraint that a vehicle must follow roads. The resulting distance is easy to compute and never larger than the shortest road distance, so it is an admissible lower bound.

In planning, we usually want to know which formal properties a heuristic has.

Definition: Heuristic properties

Let h:SN0{}h : S \to \mathbb{N}_0 \cup \{\infty\} be a heuristic.

  • hh is goal-aware if h(s)=0h(s)=0 for every goal state ss.
  • hh is safe if h(s)=h(s)=\infty implies h(s)=h^*(s)=\infty.
  • hh is admissible if h(s)h(s)h(s) \le h^*(s) for every state ss.
  • hh is consistent if h(s)cost(o)+h(s)h(s) \le \mathit{cost}(o) + h(s') for every transition soss \xrightarrow{o} s'.

Goal-awareness says that goals are recognized as finished. Safety says that infinite estimates are not false dead-end reports. Admissibility is the lower-bound property needed by optimal search algorithms such as A*. Consistency is a stronger local lower-bound condition along transitions.

Assume the task is in positive normal form, so positive effects make variables true and negative effects make variables false. Positive effects are useful achievements. Negative effects are the source of interference: an action may help one subgoal while undoing another.

The delete relaxation removes that interference.

Definition: Delete-relaxed operator and task

Let oo be an operator in positive normal form. Its delete relaxation o+o^+ is obtained by replacing every negative atomic effect ¬a\neg a in eff(o)\mathit{eff}(o) with \top, meaning “do nothing”.

For a planning task Π=V,I,O,γ\Pi = \langle V,I,O,\gamma\rangle, the delete-relaxed task is

Π+=V,I,{o+oO},γ.\Pi^+ = \langle V,I,\{o^+ \mid o \in O\},\gamma\rangle.

The initial state and goal are unchanged. Only the effects are simplified. If an action originally added pp and deleted qq, its relaxed version adds pp and leaves qq alone.

The crucial consequence is monotonicity. It is helpful to view a state as the set of variables that are true. In the relaxed task, applying an action can only enlarge this set.

Monotonicity lemma

For any state ss and relaxed operator o+o^+, the successor so+s\llbracket o^+\rrbracket dominates ss: every variable true in ss is also true in so+s\llbracket o^+\rrbracket.

This has an important second consequence. If a relaxed plan works from a state ss, it also works from any state ss' that already contains at least the facts of ss.

Relaxation lemma

If ss' dominates ss, and π+\pi^+ is a relaxed plan from ss, then π+\pi^+ is also a relaxed plan from ss'.

The reason is that relaxed actions never depend on facts being false. Having extra true facts cannot invalidate a precondition or undo a future achievement.

Suppose a robot has to deliver two packages, and carrying one package prevents carrying the other in the real task. A real action pick-up-A might make hasA\mathit{hasA} true and handempty\mathit{handempty} false. A later drop-A might restore handempty\mathit{handempty} but remove hasA\mathit{hasA}.

In the delete relaxation, once hasA\mathit{hasA} becomes true, it stays true. The robot can also regain or keep facts such as handempty\mathit{handempty} without losing previous achievements. The relaxed task may therefore believe that package achievements combine freely even when the real task requires careful ordering.

This is not a bug in the heuristic. It is the intended approximation: delete relaxation measures how hard it is to make the required facts true when negative interactions are ignored.

Because relaxed states only grow, a simple greedy algorithm is sound for finding some relaxed plan when one exists.

  1. Start with s:=Is := I and an empty relaxed plan π+:=\pi^+ := \langle\rangle.
  2. If sγs \models \gamma, return π+\pi^+.
  3. Choose an applicable relaxed operator o+o^+ such that so+ss\llbracket o^+\rrbracket \ne s.
  4. Apply it, append it to π+\pi^+, and repeat.
  5. If no such operator exists, report the relaxed task unsolvable.

Each successful iteration makes at least one new variable true, so there can be at most V|V| successful iterations. The algorithm is polynomial and sound. However, the particular relaxed plan it returns may be far from cheapest, because greedy choices can add unnecessary actions.

This distinction matters: delete-relaxed planning is easier than ordinary planning in some respects, but finding an optimal relaxed plan is still hard.

The most direct delete-relaxation heuristic is h+h^+.

Definition: The h+h^+ heuristic

For a state ss, let Πs+\Pi_s^+ be the delete-relaxed task with initial state ss. Then h+(s)h^+(s) is the cost of an optimal plan for Πs+\Pi_s^+, or \infty if no relaxed plan exists.

Because every real plan is also a valid relaxed plan after deletes are ignored, the relaxed optimum cannot be more expensive than the real optimum. Therefore h+h^+ is admissible. It is also consistent.

The problem is computation. Optimal planning for delete-relaxed tasks is NP-hard. So h+h^+ is theoretically important but not usually computed exactly inside a planner. The practical heuristics in this chapter approximate it with polynomial graph computations.

The relaxed task graph, or RTG, is the shared data structure behind hmaxh^{\max}, haddh^{\mathrm{add}}, and hFFh^{\mathrm{FF}}. It represents the delete-relaxed task as an AND/OR graph.

In an AND/OR graph, an OR node is achieved when at least one predecessor is achieved. An AND node is achieved only when all predecessors are achieved. This distinction matches planning structure: a fact may be achieved in several alternative ways, but an action precondition with several required facts needs all of them.

For a relaxed task Πs+\Pi_s^+, the RTG contains:

  • an initial node nIn_I, representing the facts true in ss;
  • variable nodes nvn_v, one for each state variable that may need to become true;
  • formula nodes for compound preconditions, conditions, and the goal;
  • effect nodes for conditional effects such as χv\chi \triangleright v;
  • a goal node nγn_\gamma.

Variable and disjunction nodes behave like OR nodes. Conjunctions and effect nodes behave like AND nodes. Effect nodes also carry the cost of the operator that produces the effect.

The RTG lets us propagate achievement costs from the initial facts toward the goal. The only question is how to assign a cost to an AND node. Two different answers give two different heuristics.

The hmaxh^{\max} heuristic treats a conjunction as if its cost were the cost of the most expensive conjunct.

For an RTG node nn with predecessors pred(n)\mathit{pred}(n), the recurrence is:

hmax(n)={0if n=nI,minnpred(n)hmax(n)if n is an OR node,maxnpred(n)hmax(n)+c(n)if n is an AND node.h^{\max}(n) = \begin{cases} 0 & \text{if } n = n_I,\\ \min_{n' \in \mathit{pred}(n)} h^{\max}(n') & \text{if } n \text{ is an OR node},\\ \max_{n' \in \mathit{pred}(n)} h^{\max}(n') + c(n) & \text{if } n \text{ is an AND node}. \end{cases}

The heuristic value for state ss is hmax(s)=hmax(nγ)h^{\max}(s)=h^{\max}(n_\gamma).

The max rule is optimistic. If a goal requires both pp and qq, and pp costs 33 while qq costs 55, then hmaxh^{\max} estimates the conjunction as 55, not 88. This can be admissible because one action sequence may achieve both subgoals together, but it is often weak because it ignores much of the work outside the single most expensive branch.

The haddh^{\mathrm{add}} heuristic uses the same OR rule but changes the AND rule. A conjunction is estimated by summing its predecessor costs:

hadd(n)={0if n=nI,minnpred(n)hadd(n)if n is an OR node,npred(n)hadd(n)+c(n)if n is an AND node.h^{\mathrm{add}}(n) = \begin{cases} 0 & \text{if } n = n_I,\\ \min_{n' \in \mathit{pred}(n)} h^{\mathrm{add}}(n') & \text{if } n \text{ is an OR node},\\ \sum_{n' \in \mathit{pred}(n)} h^{\mathrm{add}}(n') + c(n) & \text{if } n \text{ is an AND node}. \end{cases}

The value for the state is hadd(s)=hadd(nγ)h^{\mathrm{add}}(s)=h^{\mathrm{add}}(n_\gamma).

This usually gives stronger search guidance than hmaxh^{\max}, because it counts all required subgoals rather than only the hardest one. The cost is that it may count the same helpful action more than once. If one action achieves both pp and qq, and the goal requires pqp \land q, then summing separate costs for pp and qq double-counts shared work. For that reason, haddh^{\mathrm{add}} is not admissible in general.

Both hmaxh^{\max} and haddh^{\mathrm{add}} can be computed efficiently by propagating values through the RTG, often using a priority-queue style algorithm. The graph computation is polynomial in the number of RTG nodes and arcs.

The FF heuristic keeps the practical strength of additive costs but tries to avoid the most obvious double counting.

First, compute additive costs in the RTG. Then choose, for each OR node, a best achiever: one predecessor that attains the minimum cost for that node. This choice function extracts a best-achiever graph from the RTG by keeping the selected incoming arc for each OR node and the necessary arcs through AND nodes.

The heuristic then identifies the effect nodes needed to support the goal in this best-achiever graph and counts each such effect node only once.

Definition: The hFFh^{\mathrm{FF}} heuristic

For a state ss:

  1. construct the relaxed task graph for Πs+\Pi_s^+;
  2. compute best achievers using additive costs;
  3. trace backward from the goal through the best-achiever graph;
  4. return the sum of the costs of the reachable effect nodes, counting each effect node once.

The result is the cost of a relaxed plan extracted from the graph. It is usually more informative than hmaxh^{\max} and less prone to overcounting than haddh^{\mathrm{add}}. However, it is still not admissible. The best-achiever choices can depend on tie-breaking, and the extracted relaxed plan is not guaranteed to be optimal.

The four delete-relaxation heuristics form an important comparison chain:

hmax(s)h+(s)hFF(s)hadd(s)h^{\max}(s) \le h^+(s) \le h^{\mathrm{FF}}(s) \le h^{\mathrm{add}}(s)

for all states ss, with the usual convention that unreachable relaxed goals have value \infty.

They also agree on relaxed dead ends:

hmax(s)=    h+(s)=    hFF(s)=    hadd(s)=.h^{\max}(s)=\infty \iff h^+(s)=\infty \iff h^{\mathrm{FF}}(s)=\infty \iff h^{\mathrm{add}}(s)=\infty.

The practical interpretation is:

  • hmaxh^{\max} is admissible and consistent, but often too weak.
  • h+h^+ is admissible and consistent, but NP-hard to compute.
  • haddh^{\mathrm{add}} is polynomial and often informative, but neither admissible nor consistent.
  • hFFh^{\mathrm{FF}} is polynomial and highly practical, but neither admissible nor consistent.

All four are safe and goal-aware. That is why even inadmissible delete-relaxation heuristics can be useful for satisficing planning: they may not preserve optimality guarantees, but they still provide meaningful guidance toward relaxed reachability.

A common mistake is to say that delete relaxation simply “removes negative preconditions.” In positive normal form, the relaxation is about effects: negative effects are replaced by no-ops. Preconditions and the goal remain part of the task.

Another mistake is to assume that a polynomial relaxed-plan algorithm computes h+h^+. The greedy relaxed-plan algorithm finds some relaxed plan, not necessarily an optimal one. The optimal relaxed-plan cost is exactly what makes h+h^+ hard.

Finally, do not confuse haddh^{\mathrm{add}} with a safe lower bound. Summing subgoal costs feels conservative, but shared achievers can make the sum larger than the true remaining cost.

Delete relaxation estimates planning difficulty by ignoring delete effects. The resulting task is monotone: once a fact is true, it remains true. The optimal relaxed-plan cost h+h^+ is admissible but NP-hard to compute, so planners often use polynomial approximations based on relaxed task graphs. hmaxh^{\max} uses a maximum at AND nodes and is admissible but weak. haddh^{\mathrm{add}} uses a sum and is stronger but can overcount. hFFh^{\mathrm{FF}} extracts a relaxed plan from best achievers and counts each selected effect once, making it a powerful practical heuristic for satisficing search.

  1. What exactly changes when an operator is delete-relaxed?
  2. Why does monotonicity make relaxed reachability easier to reason about?
  3. Why is hmaxh^{\max} admissible but often weak?
  4. Give a small example where haddh^{\mathrm{add}} double-counts useful work.
  5. Why can hFFh^{\mathrm{FF}} be useful even though it is not admissible?