We study the problem of automatically selecting problem features for approximately representing the value function in Markov decision processes. We propose and evaluate a simple approach that reduces the problem of selecting a new feature to standard classification learning: we learn a classifier that predicts the sign of the Bellman error over a training set of states. By iteratively adding new classifiers as features with this method, training between iterations with TD-learning, we find a Tetris feature set that significantly outperforms randomly constructed features and obtains a score of about one-third of the highest score achieved with a carefully hand-constructed feature set. We also show that features learned with this method outperform those learned with the earlier method of Patrascu et al. [2] on the same SysAdmin domain used for evaluation there.
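The core reduction described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the MDP (a small deterministic chain), the current value estimates, and the decision-stump classifier are all hypothetical stand-ins for the domains and classification learner actually used. The one fixed idea it demonstrates is labeling each training state with the sign of its Bellman error and turning the fitted classifier's prediction into a new binary feature.

```python
import numpy as np

# Hypothetical 5-state chain MDP (illustrative only): deterministic step
# right, reward 1 in the terminal state, discount gamma.
n_states = 5
gamma = 0.9
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
next_state = np.minimum(np.arange(n_states) + 1, n_states - 1)

def bellman_error(v):
    # One-step Bellman error at each state: r(s) + gamma * V(s') - V(s)
    return rewards + gamma * v[next_state] - v

# Current approximate value function, standing in for the values produced
# by TD-learning over the existing feature set.
v = np.ones(n_states)

# Training set: label every state with the sign of its Bellman error.
labels = np.sign(bellman_error(v))

def fit_stump(xs, ys):
    # A trivial decision stump over the raw state index, standing in for
    # whatever off-the-shelf classification learner is used.
    best = None
    for t in range(len(xs)):
        for sign in (1, -1):
            preds = np.where(xs >= t, sign, -sign)
            acc = np.mean(preds == ys)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    return best[1], best[2]

xs = np.arange(n_states)
thr, sgn = fit_stump(xs, labels)

# The learned classifier's prediction becomes the new binary feature,
# which would then be appended to the feature set before the next round
# of TD-learning.
new_feature = np.where(xs >= thr, sgn, -sgn)
```

On this toy chain the value estimate overshoots everywhere except the rewarding state, so the Bellman-error signs split the states into two groups and the stump recovers that split exactly; in the iterative scheme, each such classifier is added as a feature and the weights are retrained before the next labeling pass.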

Date of this Version

November 2004