Abstract
U-statistics have been widely studied and used in both statistics and machine learning. One challenge in applying U-statistics is their heavy computational cost. In this thesis, we propose subsampling methods for the fast computation of U-statistics. Our contribution is fourfold: (1) we formally adapt uniform subsampling to the fast computation of U-statistics; (2) we propose an A-optimal subsampling method, which outperforms uniform subsampling in terms of mean squared error (MSE); (3) we provide a method to approximate the A-optimal subsampling probabilities, since computing them exactly takes as long as computing the full-sample U-statistic; (4) we derive the limiting distribution of the subsampling estimator. We then run simulations and use two real datasets to assess the performance of the uniform and A-optimal subsampling methods. The simulation and real-data results show that the MSE of the A-optimal subsampling estimator is significantly smaller than that of the uniform subsampling estimator. Moreover, the A-optimal subsampling estimator takes much less computing time than the full-sample U-statistic, provided the subsample size is not too large relative to the full sample size.
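As a rough illustration of the uniform subsampling idea in contribution (1), the following is a minimal sketch and not the thesis's implementation. It assumes a degree-2 U-statistic with the symmetric kernel h(x, y) = (x - y)^2 / 2, whose full-sample U-statistic equals the unbiased sample variance, and it assumes index pairs are drawn uniformly with replacement. All function names are hypothetical. The A-optimal probabilities of contribution (2) are not shown, since the abstract does not state their form.

```python
import itertools
import random


def variance_kernel(x, y):
    # Symmetric degree-2 kernel (an assumed example); its U-statistic
    # is the unbiased sample variance.
    return 0.5 * (x - y) ** 2


def full_u_statistic(data, kernel):
    # Average the kernel over all n*(n-1)/2 unordered pairs: O(n^2) work.
    values = [kernel(x, y) for x, y in itertools.combinations(data, 2)]
    return sum(values) / len(values)


def uniform_subsample_u_statistic(data, kernel, r, rng):
    # Average the kernel over r pairs of distinct indices, each pair drawn
    # uniformly at random (pairs may repeat across draws): O(r) work.
    n = len(data)
    total = 0.0
    for _ in range(r):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += kernel(data[i], data[j])
    return total / r


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(500)]
    print("full sample:", full_u_statistic(data, variance_kernel))
    print("subsampled: ", uniform_subsample_u_statistic(data, variance_kernel, 2000, rng))
```

Because every pair is equally likely to be drawn, the subsampled average is an unbiased estimate of the full-sample U-statistic. The trade-off is between the subsample size r and the extra variance introduced by subsampling.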
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Mathematics
Committee Chair
Hanxiang Peng
Date of Award
8-2018
Recommended Citation
Yuan, Mingao, "Efficient Fast Computing For U-statistics Via A-optimal Subsampling" (2018). Open Access Dissertations. 2112.
https://docs.lib.purdue.edu/open_access_dissertations/2112
Committee Member 1
Benzion Boukai
Committee Member 2
Guang Lin
Committee Member 3
Zhongmin Shen
Committee Member 4
Fei Tan