Date of Award
Master of Science (MS)
Computer and Information Technology
Committee Member 1
John A. Springer
Committee Member 2
This research proposes a new fitting algorithm for logistic regression based on iteratively reweighted least squares (IRWLS) that scans the data row by row and obtains an exact result within only a few iterations. It also implements a distributed, parallel version of the proposed method on Spark and reports experiments that demonstrate its memory-wise advantage over traditional methods such as the Spark MLlib package. The results show that the proposed method provides an exact result, rather than an approximate one, within 5 or 6 iterations; achieves satisfactory accuracy for flight delay prediction within 1 or 2 iterations; has better potential for parallelization and performs better than MLlib, running 3-4x faster even without full optimization; and does not lose performance as the data-to-memory ratio increases.
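To make the row-by-row idea concrete, the following is a minimal single-machine sketch of textbook IRWLS for logistic regression. It is not the thesis's implementation, whose details this abstract does not give. The function names (`solve_linear`, `sigmoid`, `irwls_logistic`) and the toy data in the usage note are assumptions made for illustration. Each pass visits one row at a time and adds that row's contribution to the small matrices X^T W X and X^T W z, so the full data set never has to be held as a single matrix. Because those contributions are plain sums, partial results computed on separate partitions could in principle be added together, which is the property a distributed version would rely on.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def sigmoid(t):
    # Plain logistic function; adequate for the moderate values in this sketch.
    return 1.0 / (1.0 + math.exp(-t))


def irwls_logistic(rows, labels, n_iter=6):
    """Fit logistic regression coefficients by IRWLS, one row at a time.

    rows   -- list of feature lists (include a leading 1.0 for an intercept)
    labels -- list of 0/1 responses
    """
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        xtwx = [[0.0] * p for _ in range(p)]  # accumulates X^T W X
        xtwz = [0.0] * p                      # accumulates X^T W z
        for x, y in zip(rows, labels):
            eta = sum(xj * bj for xj, bj in zip(x, beta))
            mu = sigmoid(eta)
            w = mu * (1.0 - mu)
            # w * z with z = eta + (y - mu) / w, written without the division.
            wz = w * eta + (y - mu)
            for j in range(p):
                xtwz[j] += wz * x[j]
                for k in range(p):
                    xtwx[j][k] += w * x[j] * x[k]
        # Weighted least-squares update: solve (X^T W X) beta = X^T W z.
        beta = solve_linear(xtwx, xtwz)
    return beta
```

As a usage example on made-up data, `irwls_logistic([[1.0, v] for v in [-2, -1, -1, 0, 0, 1, 1, 2]], [0, 0, 1, 0, 1, 1, 0, 1], n_iter=8)` returns an intercept and a slope. The classes in this toy set overlap, so the maximum-likelihood estimate exists and the iteration settles quickly. On perfectly separable data the coefficients would grow without bound and this sketch would need safeguards.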
Wang, Mengyao, "Performance Enhancement of Logistic Regression for Big Data on Spark" (2018). Open Access Theses. 1471.