Abstract

This research proposes a new fitting algorithm for logistic regression based on iteratively reweighted least squares (IRWLS). The algorithm scans the data row by row and can obtain an exact result in only a few iterations. The research also implements a distributed, parallel version of the proposed method on Spark and conducts experiments to demonstrate its memory advantage over traditional methods such as the Spark MLlib package. The results show that the proposed method provides an exact result, rather than an approximate one, within 5 or 6 iterations; achieves satisfactory accuracy for flight delay prediction within 1 or 2 iterations; has better potential for parallelization and runs 3-4x faster than MLlib even without full optimization; and does not lose performance as the data-to-memory ratio increases.
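
To illustrate the general idea of a row-by-row IRWLS fit, the following is a minimal sketch of the textbook algorithm, not the implementation from the thesis. It assumes each pass only needs to accumulate the small matrices X'WX and X'Wz, so the full data set never has to be held in memory. The function names `solve` and `irwls_logistic`, the default of 6 iterations (chosen to echo the iteration counts quoted above), and the weight floor are all assumptions made for this example.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def irwls_logistic(rows, n_features, iterations=6):
    """Fit logistic regression by IRWLS (hypothetical helper for illustration).

    rows must be re-iterable and yield (x, y) pairs, where x is a list of
    n_features floats (include a leading 1.0 for an intercept) and y is 0 or 1.
    """
    beta = [0.0] * n_features
    for _ in range(iterations):
        xtwx = [[0.0] * n_features for _ in range(n_features)]
        xtwz = [0.0] * n_features
        for x, y in rows:  # one pass over the data, one row at a time
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = max(p * (1.0 - p), 1e-12)  # assumed floor to avoid division by zero
            z = eta + (y - p) / w  # working response
            for i in range(n_features):
                xtwz[i] += w * x[i] * z
                for j in range(n_features):
                    xtwx[i][j] += w * x[i] * x[j]
        beta = solve(xtwx, xtwz)  # weighted least-squares update
    return beta
```

Because the accumulators are plain sums over rows, partial sums computed on separate data partitions could be added together before the solve step, which is one reason this formulation lends itself to a distributed setting such as Spark.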

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer and Information Technology

Committee Chair

Baijian Yang

Date of Award

5-2018

Committee Member 1

John A. Springer

Committee Member 2

Tonglin Zhang
