It is often possible to greatly improve the performance of a hardware system through the use of predictive (speculative) techniques. For example, the performance of out-of-order microprocessors is greatly enhanced by predicting the outcomes of conditional branch instructions. Most hardware predictors are table-based (e.g., two-level branch predictors): they maintain predictive information for each combination of values of a set of features (e.g., a set of previous branch outcomes). Table-based approaches suffer from the fact that predictor size grows exponentially in the number of features considered, which severely limits the amount of predictive information such predictors can effectively utilize. For example, most table-based branch predictors use a feature set containing only a small number of recent branch outcomes, even though other bits of processor state (e.g., register values and target addresses) also contain predictive information. In this research we propose a dynamic feature selection framework for building hardware predictors. A predictor using dynamic feature selection selects the most predictive features from a potentially large feature set and uses only the selected features to make predictions. We design and evaluate a dynamic decision tree (DDT) predictor, an on-line adaptation of the decision trees widely used in machine learning research. Through feature selection, the predictor's storage requirement grows only linearly with the number of features considered, opening up the possibility of considering far more information for prediction than current table-based approaches allow. This possibility is particularly important for prediction domains where there is no known small set of useful features.
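The linear-versus-exponential storage contrast can be illustrated with a toy sketch. The class below is a hypothetical, depth-one simplification of the idea (a "dynamic decision stump," not the paper's actual DDT algorithm): it keeps four counters per candidate feature, so storage is linear in the number of features n rather than the 2^n entries a table indexed by all features would need, and it always predicts using whichever single feature has been most predictive so far.

```python
import random

class DynamicStumpPredictor:
    """Toy depth-one illustration of dynamic feature selection
    (a simplified sketch, not the paper's DDT algorithm).

    A table indexed by all n binary features needs 2**n entries;
    here, storage is four counters per feature: linear in n."""

    def __init__(self, num_features):
        # counts[f][v][o]: times feature f had value v when outcome was o
        self.counts = [[[0, 0], [0, 0]] for _ in range(num_features)]

    def _best_feature(self):
        # Score each feature by how often its majority-vote rule
        # would have matched the observed outcome.
        def score(c):
            return max(c[0]) + max(c[1])
        return max(range(len(self.counts)),
                   key=lambda f: score(self.counts[f]))

    def predict(self, features):
        # Predict using only the currently most predictive feature.
        f = self._best_feature()
        c = self.counts[f][features[f]]
        return 1 if c[1] >= c[0] else 0

    def update(self, features, outcome):
        # On-line update: revise every candidate feature's statistics.
        for f, v in enumerate(features):
            self.counts[f][v][outcome] += 1

# Usage: outcome equals feature 2; the other seven features are noise.
random.seed(0)
p = DynamicStumpPredictor(8)
for _ in range(200):
    feats = [random.randint(0, 1) for _ in range(8)]
    p.update(feats, feats[2])
```

After training on this stream, the predictor settles on feature 2 and predicts correctly from it alone, ignoring the noise features; a real DDT would extend this selection recursively to build a tree of chosen features.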
We evaluate our predictor in the branch prediction domain and show that dynamic feature selection can dynamically select useful features from a much larger feature set than current predictors use (without any specialized knowledge about which features are useful) and thereby achieve high prediction rates. Our goal in this paper is not to outperform the best-known branch predictor but to introduce and evaluate a new feature selection framework that enables predictors to consider large numbers of features in a systematic manner, adding to the design options for future hardware prediction schemes. Our results suggest that dynamic feature selection will be useful (and perhaps essential) in domains where the set of useful features is unknown, or where that set varies across prediction instances. Simulation results on SPECint95 indicate that, on average, our predictor performs comparably to conventional two-level interference-free predictors with similar storage requirements, and that it is generally more robust than familiar predictors: it performs close to the best of such predictors across applications that favor different familiar predictors. (This robustness is similar to that achievable by hybridizing familiar predictors, but is achieved here by a general-purpose technique that does not require designing multiple specialized predictors to combine.)

Date of this Version

July 2000