We design a novel approximate policy iteration (API) method suited to learning good domain-specific control knowledge in large relational planning domains. The learned knowledge takes the form of a control policy for a single Markov decision process representing all problem instances of the planning domain. Our learned policies can quickly solve most or all problems within the domains we evaluate. The API methods we adapt move from policy to policy using a combination of policy simulation and inductive policy selection. Previous methods represent policies implicitly, using cost functions combined with greedy look-ahead. We represent policies directly as compact state-action mappings, and thus avoid the often awkward problem of specifying a useful cost-function-based bias. Instead, we give a natural policy-space bias by specifying a general-purpose knowledge-representation language for policies. Our API method naturally incorporates heuristic functions, allowing us to exploit recent progress in domain-independent planning heuristics. In benchmark planning domains, we show that our technique can leverage the heuristic from FF-plan to generate fast and effective control policies for entire planning domains (not just individual instances). We also show iterative improvement upon previously published human-specified and learned initial policies.
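To make the policy-to-policy loop concrete, here is a minimal, hypothetical sketch of approximate policy iteration on a toy deterministic chain MDP. It is not the paper's method: there is no relational domain, no policy-language learner, and no heuristic function; the "inductive policy selection" step is replaced by a direct per-state action choice, and policies are stored explicitly as state-action mappings, as the abstract describes. All names and parameters below are illustrative assumptions.

```python
# Toy sketch of approximate policy iteration (API) on a 5-state chain.
# States 0..4; actions move left (-1) or right (+1); state 4 is the goal.
# Each iteration: simulate the current policy to evaluate actions
# (policy simulation), then build a new explicit state-action mapping.

GOAL, N, HORIZON = 4, 5, 20

def step(s, a):
    # Deterministic transition, clipped to the state space.
    return max(0, min(N - 1, s + a))

def rollout_cost(policy, s, horizon=HORIZON):
    # Simulate the current policy from s; return steps-to-goal (capped).
    cost = 0
    while s != GOAL and cost < horizon:
        s = step(s, policy[s])
        cost += 1
    return cost

def improve(policy):
    # One API step: for each state, pick the action whose successor has
    # the best simulated cost-to-go under the current policy.
    return {s: min((-1, +1),
                   key=lambda a: 1 + rollout_cost(policy, step(s, a)))
            for s in range(N)}

policy = {s: -1 for s in range(N)}   # poor initial policy: always move left
for _ in range(N):                   # iterate until the fix propagates
    policy = improve(policy)
```

After a few iterations the improvement propagates back from the goal, yielding the optimal "always move right" mapping; in the paper's setting, the per-state choice is instead generalized by inducing a compact policy in the knowledge-representation language.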

Date of this Version

May 2003