Multiple Learning for Generalized Linear Models in Big Data

Xiang Liu, Purdue University

Abstract

Big data is an enabling technology in digital transformation. It complements ordinary linear models and generalized linear models well, since training well-performing models of either kind requires huge amounts of data. With the help of big data, ordinary and generalized linear models can be trained well and thus offer better services to human beings. However, many challenges remain when training ordinary linear models and generalized linear models on big data. One of the most prominent is computational: memory inflation and training inefficiency issues that occur when processing the data and training the models. Hundreds of algorithms have been proposed to alleviate or overcome the memory inflation issue, but the solutions they obtain are only locally optimal. Additionally, most of the proposed algorithms require loading the dataset into RAM many times while updating the model parameters. When multiple model hyper-parameters need to be computed and compared, e.g., in ridge regression, parallel computing techniques are applied in practice. Thus, multiple learning with sufficient statistics arrays is proposed to tackle the memory inflation and training inefficiency issues.
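The sufficient-statistics idea underlying the abstract can be illustrated for the linear-model case: the cross-products X'X and X'y summarize the data for least-squares and ridge fits, so they can be accumulated in one pass over chunks of the dataset, and every candidate penalty can then be solved from the same summary without rereading the data. The sketch below is a minimal illustration under these assumptions; the function names and chunking scheme are hypothetical, not the dissertation's actual algorithm or API.

```python
import numpy as np

def build_sufficient_stats(chunks):
    """Accumulate the sufficient statistics X'X and X'y in a single
    pass over (X, y) chunks, so the full dataset never needs to
    reside in RAM at once. (Illustrative helper, not from the source.)"""
    xtx, xty = None, None
    for X, y in chunks:
        if xtx is None:
            p = X.shape[1]
            xtx = np.zeros((p, p))
            xty = np.zeros(p)
        xtx += X.T @ X   # running sum of cross-products
        xty += X.T @ y
    return xtx, xty

def ridge_paths(xtx, xty, lambdas):
    """Solve (X'X + lambda*I) beta = X'y for many penalty values,
    reusing the same one-pass summary -- no further data passes."""
    p = xtx.shape[0]
    return [np.linalg.solve(xtx + lam * np.eye(p), xty) for lam in lambdas]
```

Because X'X and X'y are fixed once computed, comparing many ridge penalties costs only a small linear solve per value, rather than a fresh sweep over the data per hyper-parameter, which is the kind of training-efficiency gain the abstract alludes to.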

Degree

Ph.D.

Advisors

Zhang, Purdue University.

Subject Area

Statistics|Artificial intelligence|Computer science|Higher education|Information Technology|Marketing|Mathematics|Web Studies
