We study the problem of automatically selecting problem features for approximately representing value functions in Markov decision processes. We propose and evaluate a simple approach that reduces the selection of a new feature to standard classification learning: we learn a classifier that predicts the sign of the Bellman error over a training set of states. By iteratively adding the resulting classifiers as new features, retraining with TD-learning between iterations, we obtain a Tetris feature set that significantly outperforms randomly constructed features and achieves a score of about one-third of the best score achieved with a carefully hand-constructed feature set. We also show that features learned with this method outperform those learned with the previous method of Patrascu et al. on the SysAdmin domain used for evaluation in that work.
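To make the iterative scheme concrete, the following is a minimal sketch, not the paper's implementation: a toy chain MDP stands in for Tetris/SysAdmin, linear TD(0) fits the value weights on the current features, each state is labeled by the sign of its one-step Bellman error, and a decision stump (a stand-in for whatever standard classifier is used) is trained on those labels and added as the next feature. All names, dynamics, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: a deterministic cycle of states with random rewards
# (a placeholder for the Tetris/SysAdmin domains in the paper).
n_states, gamma = 20, 0.9
rewards = rng.normal(size=n_states)

def next_state(s):
    # Deterministic cycle dynamics, for illustration only.
    return (s + 1) % n_states

# Feature set starts with a bias term; each iteration appends one learned feature.
features = [lambda s: 1.0]

def phi(s):
    return np.array([f(s) for f in features])

w = np.zeros(len(features))

for iteration in range(5):
    # 1) Train a linear value function on the current features with TD(0).
    w = np.zeros(len(features))
    for _ in range(2000):
        s = int(rng.integers(n_states))
        s2 = next_state(s)
        delta = rewards[s] + gamma * (phi(s2) @ w) - (phi(s) @ w)
        w = w + 0.05 * delta * phi(s)

    # 2) Label every training state by the sign of its Bellman error.
    V = np.array([phi(s) @ w for s in range(n_states)])
    err = np.array([rewards[s] + gamma * V[next_state(s)] - V[s]
                    for s in range(n_states)])
    labels = np.where(err >= 0.0, 1.0, -1.0)

    # 3) Fit a classifier to predict that sign; here, a threshold stump on
    #    the state index (a stand-in for a standard classification learner).
    best = None
    for t in range(n_states):
        for sgn in (1.0, -1.0):
            pred = sgn * np.where(np.arange(n_states) >= t, 1.0, -1.0)
            acc = float(np.mean(pred == labels))
            if best is None or acc > best[0]:
                best = (acc, t, sgn)
    _, t, sgn = best

    # 4) Add the classifier's +/-1 output as a new feature.
    features.append(lambda s, t=t, sgn=sgn: sgn * (1.0 if s >= t else -1.0))
```

The key design point the sketch mirrors is that the classifier's own output becomes the new feature, so value-function error correction is driven entirely by the classification step rather than by hand-designed basis functions.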