Skip to content

Merge-and-Shrink Heuristics

Source slides: E7 Factored Transition Systems, E8 Merge-and-Shrink Algorithm, E9 Strategies and Label Reduction

Merge-and-shrink heuristics start from a simple observation about pattern databases. A pattern database can be exact on the variables it keeps, but it ignores every variable outside the pattern. Merge-and-shrink reverses that emphasis: it begins with all variables present, then deliberately compresses the representation when it becomes too large.

The method is useful because planning tasks usually contain interacting variables. If we keep every variable and build the full product transition system, we recover exact goal distances but run into the exponential state space. If we keep only a few variables, the computation is cheap but may miss the interaction that makes the problem hard. Merge-and-shrink gives an algorithmic framework for navigating between these extremes.

The question for this chapter is:

How can we build an admissible heuristic that reflects all variables of a planning task without constructing the full transition system explicitly?

The answer has three ingredients:

  • factors, which are small transition systems representing parts of the task;
  • merge operations, which combine two factors by synchronized product;
  • shrink operations, which abstract a factor by merging states.

At the end, one abstract transition system remains. Its exact goal distances are used as heuristic values for the original task.

Assume an FDR planning task with variables V={v1,,vn}\mathcal{V} = \{v_1,\ldots,v_n\}, where each variable viv_i has finite domain D(vi)\mathcal{D}(v_i). A complete state assigns one value to every variable, so the full state space has size

i=1nD(vi).\prod_{i=1}^{n} |\mathcal{D}(v_i)|.

Merge-and-shrink does not start with this full product. It starts with one transition system per variable.

Definition: Atomic projection

The atomic projection Tvi\mathcal{T}^{v_i} of a planning task onto variable viv_i is the transition system whose states are the values in D(vi)\mathcal{D}(v_i). Its transitions are induced by the original operators as seen through their effect on viv_i, and its goal states are the values allowed by the goal condition for viv_i. If the goal does not constrain viv_i, then every value of viv_i is a goal state in this projection.

An atomic projection is small because it tracks only one variable. It is also exact for that variable: it does not approximate the behavior of viv_i by itself. The approximation enters later, when factors are shrunk.

A factored transition system represents a larger transition system as a collection of smaller ones. The factors all use the same labels, normally the task operators.

Definition: Factored transition system

A factored transition system is a finite set {T1,,Tk}\{\mathcal{T}_1,\ldots,\mathcal{T}_k\} of transition systems over a common label set LL. It represents the synchronized product

T1Tk.\mathcal{T}_1 \otimes \cdots \otimes \mathcal{T}_k.

Initially, the factors are the atomic projections

{Tv1,,Tvn}.\{\mathcal{T}^{v_1},\ldots,\mathcal{T}^{v_n}\}.

If we took the synchronized product of all of them without any abstraction, we would reconstruct the original transition system, up to the usual correspondence between states and tuples of variable values. This is why merge-and-shrink can be seen as a generalization of pattern databases: it begins from projections, but it can combine them instead of leaving the non-pattern variables behind.

The merge operation is the synchronized product of two transition systems. It forms states that are pairs, and it allows a label only when that label can move both components consistently.

Definition: Synchronized product

Let

T1=S1,L,c,T1,s01,S1\mathcal{T}_1 = \langle S_1,L,c,T_1,s_0^1,S_\star^1\rangle

and

T2=S2,L,c,T2,s02,S2\mathcal{T}_2 = \langle S_2,L,c,T_2,s_0^2,S_\star^2\rangle

be transition systems over the same labels and costs. Their synchronized product is

T1T2=S1×S2,L,c,T,s01,s02,S1×S2,\mathcal{T}_1 \otimes \mathcal{T}_2 = \langle S_1 \times S_2, L, c, T, \langle s_0^1,s_0^2\rangle, S_\star^1 \times S_\star^2\rangle,

where

s1,s2,,s1,s2T\langle \langle s_1,s_2\rangle,\ell,\langle s_1',s_2'\rangle\rangle \in T

exactly when

s1,,s1T1ands2,,s2T2.\langle s_1,\ell,s_1'\rangle \in T_1 \quad\text{and}\quad \langle s_2,\ell,s_2'\rangle \in T_2.

The phrase “synchronized” is important. A product transition labelled \ell exists only if \ell has a matching transition in both factors. If one factor says that \ell can move and the other does not, the combined state has no \ell-transition.

As a small interpretation, suppose one factor tracks a robot’s location and another tracks whether a key is held. The action pick-up-key may change the key factor and leave the location factor unchanged through a self-loop. The synchronized product then records both facts: the action is possible only in combined states where all relevant factor transitions exist, and the result updates the components together.

The merge-and-shrink algorithm alternates between building more expressive factors and keeping those factors small enough to store.

Algorithm: Merge-and-shrink

  1. Create one atomic projection Tvi\mathcal{T}^{v_i} for each variable viv_i.
  2. While more than one factor remains:
    1. choose two factors;
    2. replace them by their synchronized product;
    3. optionally shrink one or more factors by applying an abstraction.
  3. Compute exact goal distances in the final factor.
  4. Use the distance of the abstract state representing the current state as the heuristic value.

The merge step increases information because it allows interactions between variables to be represented. The shrink step may lose information because several states become a single abstract state. The heuristic remains admissible as long as the shrinking step is a sound abstraction.

Without shrinking, the algorithm eventually constructs the full transition system and therefore gives the perfect heuristic hh^*. That is mathematically clean but usually computationally impossible. Shrinking is the price paid for tractability.

A shrinking step replaces a transition system by an abstract transition system. The abstraction map tells us which concrete states are represented by each abstract state.

Definition: Abstraction by homomorphism

Let T=S,L,c,T,s0,S\mathcal{T} = \langle S,L,c,T,s_0,S_\star\rangle be a transition system. An abstraction uses a surjective map σ:SS\sigma : S \to S' and constructs

T=S,L,c,T,σ(s0),S\mathcal{T}' = \langle S',L,c,T',\sigma(s_0),S_\star'\rangle

so that every concrete transition

s,,sT\langle s,\ell,s'\rangle \in T

has a corresponding abstract transition

σ(s),,σ(s)T,\langle \sigma(s),\ell,\sigma(s')\rangle \in T',

and concrete goal states map to abstract goal states.

This condition gives admissibility. Any concrete plan path is also visible as an abstract path, possibly with additional abstract paths created by merging states. Extra paths can make the abstract goal distance smaller, but not larger. Therefore the abstract distance is a lower bound on the real distance.

The common mistake is to think of shrinking as deleting states. It is better to think of it as forgetting distinctions. The abstract system may become more permissive, which is safe for admissible heuristics because optimistic estimates are allowed.

The most conservative useful shrinking method is based on bisimulation. It merges only states that behave the same with respect to goals and labelled transitions.

Definition: Bisimulation

An equivalence relation \sim on the states of a transition system is a bisimulation if sts \sim t implies:

  1. ss is a goal state exactly when tt is a goal state; and
  2. for every label \ell, every \ell-successor of ss can be matched by an \ell-successor of tt in the same equivalence class, and conversely.

Collapsing bisimilar states preserves goal distances in the factor. In that sense, bisimulation shrinking removes redundant distinctions without weakening the heuristic. Its limitation is practical: many factors remain too large even after exact bisimulation minimization.

When exact shrinking is not enough, planners use greedy shrinking strategies. These strategies deliberately merge states that are not fully bisimilar.

Two common ideas are:

  • merge states in a way that keeps abstract classes balanced or the total factor size under a bound;
  • prefer merging states with the same current goal distance, because this tends to preserve the heuristic value even if transition details are lost.

Greedy shrinking is still admissible if implemented as a proper abstraction, but it may reduce heuristic quality. The trade-off is deliberate: a weaker heuristic that can be computed may be more useful than a perfect heuristic that cannot be stored.

The merge strategy decides which two factors to combine next. This choice matters because shrinking is usually applied along the way. Different merge orders expose different interactions before information is compressed.

A linear merge strategy orders the variables and repeatedly merges the accumulated factor with one new atomic projection. This is simple and cheap to control, but it can force early shrinking of a large accumulated factor.

A DAG-based or tree-shaped strategy builds a binary merge tree. It may first combine closely related variables into small groups before merging those groups into larger factors. Such strategies can avoid losing important local structure too early, especially when they use causal information or estimates of product sizes.

There is no universally best merge order. The merge strategy, shrink strategy, and size limits interact. Reading a merge-and-shrink configuration therefore means asking not only “what is represented?” but also “when is it represented before being abstracted?”

The labels in merge-and-shrink are normally operators. The synchronized product distinguishes labels exactly, so two labels with the same cost and same behavior can still prevent states from being recognized as equivalent if they remain separate names.

Label reduction merges labels that are indistinguishable for the current factored transition system.

Definition: Exact label reduction

Two labels \ell and \ell' can be combined into one label if:

  1. c()=c()c(\ell) = c(\ell'); and
  2. in every current factor, \ell and \ell' induce exactly the same transitions.

After label reduction, bisimulation may find more states equivalent, because irrelevant naming differences have disappeared. Exact label reduction does not weaken the heuristic: it only removes distinctions between labels that have the same cost and same transition behavior in the factors being considered.

Pattern databases choose a subset of variables and solve the projected task exactly. Merge-and-shrink can represent information about all variables at once, although abstractly. This gives it a different kind of expressive power.

For some task families, suitable merge-and-shrink strategies compute strong heuristics with only linear time and space in the number of variables, while every small pattern database misses the long-range interaction and gives only a very weak bound. The lesson is not that merge-and-shrink is always better. The lesson is that abstraction structure matters: projecting away variables and merging abstract states lose different kinds of information.

Merge-and-shrink builds an admissible heuristic from a factored transition-system representation. It starts with atomic projections, repeatedly merges factors by synchronized product, and applies abstractions to keep factors small. If no shrinking were used, the final factor would be the full transition system and would yield hh^*. In practice, shrinking makes the method tractable. Bisimulation shrinking preserves goal-distance information, greedy shrinking trades precision for memory, merge strategies determine when interactions are exposed, and label reduction removes unnecessary distinctions between equivalent operator labels.

  1. What information does an atomic projection keep, and what information does it ignore?
  2. In a synchronized product, why must both factors have a transition with the same label?
  3. What happens if merge-and-shrink is run without any shrinking?
  4. Why does abstraction by state merging preserve admissibility?
  5. What is the difference between bisimulation shrinking and greedy shrinking?
  6. How can label reduction make later shrinking more effective?