Merge-and-Shrink Heuristics

Source slides: E7 Factored Transition Systems, E8 Merge-and-Shrink Algorithm, E9 Strategies and Label Reduction

Merge-and-shrink heuristics start from a simple observation about pattern databases. A pattern database can be exact on the variables it keeps, but it ignores every variable outside the pattern. Merge-and-shrink reverses that emphasis: it begins with all variables present, then deliberately compresses the representation when it becomes too large.

The method is useful because planning tasks usually contain interacting variables. If we keep every variable and build the full product transition system, we recover exact goal distances but run into the exponential state space. If we keep only a few variables, the computation is cheap but may miss the interaction that makes the problem hard. Merge-and-shrink gives an algorithmic framework for navigating between these extremes.

The Question

The question for this chapter is:

How can we build an admissible heuristic that reflects all variables of a planning task without constructing the full transition system explicitly?

The answer has three ingredients:

factors, which are small transition systems representing parts of the task;
merge operations, which combine two factors by synchronized product;
shrink operations, which abstract a factor by merging states.

At the end, one abstract transition system remains. Its exact goal distances are used as heuristic values for the original task.

Starting From One Variable at a Time

Assume an FDR planning task with variables $\mathcal{V} = \{v_1,\ldots,v_n\}$ , where each variable $v_i$ has finite domain $\mathcal{D}(v_i)$ . A complete state assigns one value to every variable, so the full state space has size

\prod_{i=1}^{n} |\mathcal{D}(v_i)|.

Merge-and-shrink does not start with this full product. It starts with one transition system per variable.

Definition: Atomic projection

The atomic projection $\mathcal{T}^{v_i}$ of a planning task onto variable $v_i$ is the transition system whose states are the values in $\mathcal{D}(v_i)$ . Its transitions are induced by the original operators as seen through their effect on $v_i$ , and its goal states are the values allowed by the goal condition for $v_i$ . If the goal does not constrain $v_i$ , then every value of $v_i$ is a goal state in this projection.

An atomic projection is small because it tracks only one variable. It is also exact for that variable: it does not approximate the behavior of $v_i$ by itself. The approximation enters later, when factors are shrunk.

Factored Transition Systems

A factored transition system represents a larger transition system as a collection of smaller ones. The factors all use the same labels, normally the task operators.

Definition: Factored transition system

A factored transition system is a finite set $\{\mathcal{T}_1,\ldots,\mathcal{T}_k\}$ of transition systems over a common label set $L$ . It represents the synchronized product
$\mathcal{T}_1 \otimes \cdots \otimes \mathcal{T}_k.$

Initially, the factors are the atomic projections

\{\mathcal{T}^{v_1},\ldots,\mathcal{T}^{v_n}\}.

If we took the synchronized product of all of them without any abstraction, we would reconstruct the original transition system, up to the usual correspondence between states and tuples of variable values. This is why merge-and-shrink can be seen as a generalization of pattern databases: it begins from projections, but it can combine them instead of leaving the non-pattern variables behind.

The Synchronized Product

The merge operation is the synchronized product of two transition systems. It forms states that are pairs, and it allows a label only when that label can move both components consistently.

Definition: Synchronized product

Let
$\mathcal{T}_1 = \langle S_1,L,c,T_1,s_0^1,S_\star^1\rangle$
and
$\mathcal{T}_2 = \langle S_2,L,c,T_2,s_0^2,S_\star^2\rangle$
be transition systems over the same labels and costs. Their synchronized product is
$\mathcal{T}_1 \otimes \mathcal{T}_2 = \langle S_1 \times S_2, L, c, T, \langle s_0^1,s_0^2\rangle, S_\star^1 \times S_\star^2\rangle,$
where
$\langle \langle s_1,s_2\rangle,\ell,\langle s_1',s_2'\rangle\rangle \in T$
exactly when
$\langle s_1,\ell,s_1'\rangle \in T_1 \quad\text{and}\quad \langle s_2,\ell,s_2'\rangle \in T_2.$

The phrase “synchronized” is important. A product transition labelled $\ell$ exists only if $\ell$ has a matching transition in both factors. If one factor says that $\ell$ can move and the other does not, the combined state has no $\ell$ -transition.

As a small interpretation, suppose one factor tracks a robot’s location and another tracks whether a key is held. The action pick-up-key may change the key factor and leave the location factor unchanged through a self-loop. The synchronized product then records both facts: the action is possible only in combined states where all relevant factor transitions exist, and the result updates the components together.

The Basic Algorithm

The merge-and-shrink algorithm alternates between building more expressive factors and keeping those factors small enough to store.

Algorithm: Merge-and-shrink

Create one atomic projection $\mathcal{T}^{v_i}$ for each variable $v_i$ .

While more than one factor remains:

choose two factors;

replace them by their synchronized product;

optionally shrink one or more factors by applying an abstraction.

Compute exact goal distances in the final factor.

Use the distance of the abstract state representing the current state as the heuristic value.

The merge step increases information because it allows interactions between variables to be represented. The shrink step may lose information because several states become a single abstract state. The heuristic remains admissible as long as the shrinking step is a sound abstraction.

Without shrinking, the algorithm eventually constructs the full transition system and therefore gives the perfect heuristic $h^*$ . That is mathematically clean but usually computationally impossible. Shrinking is the price paid for tractability.

How Shrinking Preserves Admissibility

A shrinking step replaces a transition system by an abstract transition system. The abstraction map tells us which concrete states are represented by each abstract state.

Definition: Abstraction by homomorphism

Let $\mathcal{T} = \langle S,L,c,T,s_0,S_\star\rangle$ be a transition system. An abstraction uses a surjective map $\sigma : S \to S'$ and constructs
$\mathcal{T}' = \langle S',L,c,T',\sigma(s_0),S_\star'\rangle$
so that every concrete transition
$\langle s,\ell,s'\rangle \in T$
has a corresponding abstract transition
$\langle \sigma(s),\ell,\sigma(s')\rangle \in T',$
and concrete goal states map to abstract goal states.

This condition gives admissibility. Any concrete plan path is also visible as an abstract path, possibly with additional abstract paths created by merging states. Extra paths can make the abstract goal distance smaller, but not larger. Therefore the abstract distance is a lower bound on the real distance.

The common mistake is to think of shrinking as deleting states. It is better to think of it as forgetting distinctions. The abstract system may become more permissive, which is safe for admissible heuristics because optimistic estimates are allowed.

Bisimulation Shrinking

The most conservative useful shrinking method is based on bisimulation. It merges only states that behave the same with respect to goals and labelled transitions.

Definition: Bisimulation

An equivalence relation $\sim$ on the states of a transition system is a bisimulation if $s \sim t$ implies:

$s$ is a goal state exactly when $t$ is a goal state; and

for every label $\ell$ , every $\ell$ -successor of $s$ can be matched by an $\ell$ -successor of $t$ in the same equivalence class, and conversely.

Collapsing bisimilar states preserves goal distances in the factor. In that sense, bisimulation shrinking removes redundant distinctions without weakening the heuristic. Its limitation is practical: many factors remain too large even after exact bisimulation minimization.

Greedy Shrinking

When exact shrinking is not enough, planners use greedy shrinking strategies. These strategies deliberately merge states that are not fully bisimilar.

Two common ideas are:

merge states in a way that keeps abstract classes balanced or the total factor size under a bound;
prefer merging states with the same current goal distance, because this tends to preserve the heuristic value even if transition details are lost.

Greedy shrinking is still admissible if implemented as a proper abstraction, but it may reduce heuristic quality. The trade-off is deliberate: a weaker heuristic that can be computed may be more useful than a perfect heuristic that cannot be stored.

Merge Strategies

The merge strategy decides which two factors to combine next. This choice matters because shrinking is usually applied along the way. Different merge orders expose different interactions before information is compressed.

A linear merge strategy orders the variables and repeatedly merges the accumulated factor with one new atomic projection. This is simple and cheap to control, but it can force early shrinking of a large accumulated factor.

A DAG-based or tree-shaped strategy builds a binary merge tree. It may first combine closely related variables into small groups before merging those groups into larger factors. Such strategies can avoid losing important local structure too early, especially when they use causal information or estimates of product sizes.

There is no universally best merge order. The merge strategy, shrink strategy, and size limits interact. Reading a merge-and-shrink configuration therefore means asking not only “what is represented?” but also “when is it represented before being abstracted?”

Label Reduction

The labels in merge-and-shrink are normally operators. The synchronized product distinguishes labels exactly, so two labels with the same cost and same behavior can still prevent states from being recognized as equivalent if they remain separate names.

Label reduction merges labels that are indistinguishable for the current factored transition system.

Definition: Exact label reduction

Two labels $\ell$ and $\ell'$ can be combined into one label if:

$c(\ell) = c(\ell')$ ; and

in every current factor, $\ell$ and $\ell'$ induce exactly the same transitions.

After label reduction, bisimulation may find more states equivalent, because irrelevant naming differences have disappeared. Exact label reduction does not weaken the heuristic: it only removes distinctions between labels that have the same cost and same transition behavior in the factors being considered.

Why This Can Beat Pattern Databases

Pattern databases choose a subset of variables and solve the projected task exactly. Merge-and-shrink can represent information about all variables at once, although abstractly. This gives it a different kind of expressive power.

For some task families, suitable merge-and-shrink strategies compute strong heuristics with only linear time and space in the number of variables, while every small pattern database misses the long-range interaction and gives only a very weak bound. The lesson is not that merge-and-shrink is always better. The lesson is that abstraction structure matters: projecting away variables and merging abstract states lose different kinds of information.

Chapter Summary

Merge-and-shrink builds an admissible heuristic from a factored transition-system representation. It starts with atomic projections, repeatedly merges factors by synchronized product, and applies abstractions to keep factors small. If no shrinking were used, the final factor would be the full transition system and would yield $h^*$ . In practice, shrinking makes the method tractable. Bisimulation shrinking preserves goal-distance information, greedy shrinking trades precision for memory, merge strategies determine when interactions are exposed, and label reduction removes unnecessary distinctions between equivalent operator labels.

Check Your Understanding

What information does an atomic projection keep, and what information does it ignore?
In a synchronized product, why must both factors have a transition with the same label?
What happens if merge-and-shrink is run without any shrinking?
Why does abstraction by state merging preserve admissibility?
What is the difference between bisimulation shrinking and greedy shrinking?
How can label reduction make later shrinking more effective?