As progress in new sensor technology continues, increasingly high-resolution imaging sensors are being developed. HIRIS, the High Resolution Imaging Spectrometer, for example, will gather data simultaneously in 102 spectral bands in the 0.4 - 2.5 micrometer wavelength region at 30 m spatial resolution. AVIRIS, the Airborne Visible and Infrared Imaging Spectrometer, covers the 0.4 - 2.5 micrometer region in 224 spectral bands. These sensors provide more detailed and complex data for each picture element and greatly increase the dimensionality of data over past systems.

In applying pattern recognition methods to remote sensing problems, an inherent limitation is that there is almost always only a small number of training samples with which to design the classifier. Growth in both the dimensionality and the number of classes is likely to aggravate this already significant limitation on training samples. Thus, ways must be found for future data analysis to perform effectively in the face of large numbers of classes without unduly aggravating the limitations on training.

A valid list of classes for remote sensing data must satisfy two requirements: each class must be of informational value (i.e., useful in a pragmatic sense), and the classes must be spectrally or otherwise separable (i.e., distinguishable based on the available data). Therefore, a means of simultaneously reconciling a property of the data (separability) and a property of the application (informational value) is important in developing a new approach to classifier design. In this work we propose decision tree classifiers, which have the potential to be more efficient and accurate in this situation of high dimensionality and large numbers of classes. In particular, we discuss three methods for designing a decision tree classifier: a top-down approach, a bottom-up approach, and a hybrid approach.
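To illustrate the general idea behind a bottom-up tree design (not the specific algorithm of this work), the sketch below builds a binary class hierarchy by repeatedly merging the two most similar classes, using Euclidean distance between class mean vectors as a crude stand-in for spectral separability. All names and the toy class means are hypothetical.

```python
import numpy as np

def bottom_up_tree(class_means):
    """Build a binary tree over classes by repeatedly merging the two
    nodes whose mean vectors are closest (a simple separability proxy).
    Leaves are class indices; internal nodes are (left, right) pairs."""
    nodes = [(i, m) for i, m in enumerate(class_means)]  # (subtree, mean)
    while len(nodes) > 1:
        best = None  # (distance, index_a, index_b) of closest pair
        for a in range(len(nodes)):
            for b in range(a + 1, len(nodes)):
                d = np.linalg.norm(nodes[a][1] - nodes[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # merge the closest pair; represent the merged node by the mean
        merged = ((nodes[a][0], nodes[b][0]),
                  (nodes[a][1] + nodes[b][1]) / 2.0)
        nodes = [n for i, n in enumerate(nodes) if i not in (a, b)]
        nodes.append(merged)
    return nodes[0][0]

# four hypothetical class means in a 2-band space: classes 0/1 and 2/3
# are spectrally close, so they merge first
means = [np.array([0.0, 0.0]), np.array([0.5, 0.0]),
         np.array([5.0, 5.0]), np.array([5.5, 5.0])]
tree = bottom_up_tree(means)  # ((0, 1), (2, 3))
```

At classification time, such a hierarchy lets each internal node discriminate between only two groups of classes, which is the source of the potential efficiency gain when the number of classes is large.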
Remote sensing systems that perform pattern recognition tasks on high-dimensional data with small training sets also require efficient methods for feature extraction and for predicting the optimal number of features needed to achieve minimum classification error. Three feature extraction techniques are implemented. The canonical and extended canonical techniques depend mainly upon the mean difference between two classes, while the autocorrelation technique depends upon the correlation differences. The mathematical relationship between sample size, dimensionality, and risk value is derived. It is shown that the incremental error is simultaneously affected by two factors, dimensionality and separability. For predicting the optimal number of features, it is concluded that when only small numbers of samples are available, it is best to use only the single best feature in a transformed coordinate space. Empirical results indicate that a reasonable sample size is six to ten times the dimensionality.
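As a rough illustration of a mean-difference-driven extraction in the spirit of the canonical technique (a Fisher-style sketch under assumed details, not the exact formulation used here), the single best feature for two classes can be taken as the projection direction w = Sw^{-1}(m1 - m2), where Sw is a pooled within-class scatter estimate:

```python
import numpy as np

def canonical_direction(X1, X2):
    """One canonical-style feature for two classes: the projection
    direction solving Sw w = (m1 - m2), so it is driven by the mean
    difference between the classes. X1, X2: (n_samples, n_bands)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # pooled within-class scatter (sum of per-class sample covariances)
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)  # unit-length direction

# hypothetical two-class data: means differ only along band 0, so the
# recovered direction should lie mostly along that band
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
X2 = rng.normal([3.0, 0.0], 1.0, size=(50, 2))
w = canonical_direction(X1, X2)
```

Projecting all samples onto w reduces the data to one feature, which matches the conclusion above that, with very few training samples, using only the single best transformed feature is preferable to retaining many poorly estimated ones.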