Date of Award
8-2018
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Mathematics
Committee Chair
Hanxiang Peng
Committee Member 1
Benzion Boukai
Committee Member 2
Guang Lin
Committee Member 3
Zhongmin Shen
Committee Member 4
Fei Tan
Abstract
U-statistics have been widely studied and used in both statistics and machine learning. One challenge in applying U-statistics is their heavy computational cost. In this thesis, we propose subsampling methods for the fast computation of U-statistics. Our contribution is fourfold: (1) we formally adapt uniform subsampling to the fast computation of U-statistics; (2) we propose an A-optimal subsampling method, which outperforms uniform subsampling in terms of MSE; (3) we provide a method to approximate the A-optimal subsampling probabilities, since computing them exactly takes as long as computing the full-sample U-statistic; (4) we derive the limiting distribution of the subsampling estimator. We then run simulations and analyze two real datasets to assess the performance of the uniform and A-optimal subsampling methods. Our simulation and real-data results show that the MSE of the A-optimal subsampling estimator is significantly smaller than that of the uniform subsampling estimator. Moreover, the A-optimal subsampling estimator takes much less computing time than the full-sample U-statistic when the subsample size is not too large relative to the full sample size.
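To make the cost gap concrete, the sketch below contrasts a full degree-2 U-statistic, which averages a kernel over all n(n-1)/2 pairs, with a uniform-subsampling estimator that averages the kernel over r randomly drawn pairs. This is an illustrative sketch under stated assumptions, not the dissertation's implementation: the kernel h(x, y) = (x - y)^2 / 2 (whose U-statistic is the unbiased sample variance), drawing pairs uniformly with replacement, and the function names are all hypothetical choices made here. The A-optimal subsampling probabilities are not reproduced, since the abstract does not state them.

```python
import itertools
import random
import statistics


def kernel(x, y):
    # Hypothetical example kernel: symmetric, degree 2; its U-statistic
    # equals the unbiased sample variance.
    return (x - y) ** 2 / 2


def full_u_statistic(data, h=kernel):
    # Average h over all n*(n-1)/2 unordered pairs: O(n^2) kernel evaluations.
    total = 0.0
    count = 0
    for x, y in itertools.combinations(data, 2):
        total += h(x, y)
        count += 1
    return total / count


def uniform_subsample_u_statistic(data, r, h=kernel, rng=None):
    # Assumed scheme: draw r pairs of distinct indices uniformly, with
    # replacement across draws, and average h over them: O(r) evaluations.
    # Each draw is uniform over pairs, so the estimator is unbiased for
    # the full-sample U-statistic.
    rng = rng or random.Random()
    n = len(data)
    total = 0.0
    for _ in range(r):
        i, j = rng.sample(range(n), 2)
        total += h(data[i], data[j])
    return total / r


if __name__ == "__main__":
    gen = random.Random(0)
    sample = [gen.gauss(0, 1) for _ in range(2000)]
    print("full U-statistic:   ", full_u_statistic(sample))
    print("statistics.variance:", statistics.variance(sample))
    print("uniform subsample:  ",
          uniform_subsample_u_statistic(sample, 20000, rng=random.Random(1)))
```

With n = 2000 the full statistic evaluates about two million pairs, while the subsample estimator evaluates only r = 20000; nonuniform schemes such as A-optimal subsampling replace the uniform pair draw with data-dependent probabilities and reweight accordingly to reduce MSE.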
Recommended Citation
Yuan, Mingao, "Efficient Fast Computing For U-statistics Via A-optimal Subsampling" (2018). Open Access Dissertations. 2112.
https://docs.lib.purdue.edu/open_access_dissertations/2112