Efficient Fast Computing for U-Statistics via A-Optimal Subsampling

Mingao Yuan, Purdue University


U-statistics have been widely studied and used in both statistics and machine learning. One challenge in applying U-statistics is their heavy computational cost. In this thesis, we propose subsampling methods for the fast computation of U-statistics. Our contribution is fourfold: (1) we formally adapt uniform subsampling to the fast computation of U-statistics; (2) we propose an A-optimal subsampling method, which outperforms uniform subsampling in terms of MSE; (3) we provide a method to approximate the A-optimal subsampling probabilities, since computing them exactly takes the same running time as computing the full-sample U-statistic; (4) we derive the limiting distribution of the subsampling estimator. We then run simulations and use two real datasets to assess the performance of the uniform and A-optimal subsampling methods. Our simulation and real-data results show that the MSE of the A-optimal subsampling estimator is significantly smaller than that of the uniform subsampling estimator, and that the A-optimal subsampling estimator takes much less computing time than the full-sample U-statistic when the subsample size is not too large compared to the full sample size.
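To make the uniform subsampling idea concrete, the following is a minimal Python sketch and not the thesis's implementation. It assumes a degree-2 U-statistic with the kernel h(x, y) = (x - y)^2 / 2, whose full-sample U-statistic is the unbiased sample variance; the kernel choice, the function names, and sampling pairs with replacement are all illustrative assumptions. The A-optimal probabilities are not sketched because the abstract does not state their form.

```python
import itertools
import random


def kernel(x, y):
    # Illustrative symmetric degree-2 kernel (an assumption, not from the thesis).
    # Its U-statistic equals the unbiased sample variance.
    return 0.5 * (x - y) ** 2


def full_u_statistic(data):
    # Average the kernel over all n*(n-1)/2 unordered pairs: O(n^2) kernel calls.
    pairs = list(itertools.combinations(range(len(data)), 2))
    return sum(kernel(data[i], data[j]) for i, j in pairs) / len(pairs)


def uniform_subsample_u_statistic(data, r, rng):
    # Draw r pairs of distinct indices uniformly at random (with replacement
    # across draws) and average the kernel over them: O(r) kernel calls.
    n = len(data)
    total = 0.0
    for _ in range(r):
        i, j = rng.sample(range(n), 2)
        total += kernel(data[i], data[j])
    return total / r


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(500)]
    print("full sample:", full_u_statistic(data))
    print("uniform subsample:", uniform_subsample_u_statistic(data, 5000, rng))
```

Because every pair is drawn with equal probability, the subsample average is an unbiased estimate of the full-sample U-statistic, and its cost grows with the subsample size r instead of with n^2.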




Peng, Purdue University.
