Date of Award
Fall 2014
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Statistics
First Advisor
William S. Cleveland
Second Advisor
Bowei Xi
Committee Chair
William S. Cleveland
Committee Co-Chair
Bowei Xi
Committee Member 1
Hao Zhang
Committee Member 2
Chuanhai Liu
Abstract
In this thesis, multiple methods are proposed and applied to the Akamai CIDR time series data. The Akamai network is one of the world's largest distributed-computing platforms, with more than 250,000 servers in more than 80 countries. It is responsible for 15-20 percent of all web traffic. We obtained 110 GB of raw CIDR data over an 18-month period, collected on the Akamai network from November 2011 to April 2013.

The Seasonal-Trend Decomposition procedure based on loess (STL+) is used to model the CIDR series. Motivated by the CIDR series analysis, we propose a general prediction-based model selection procedure, in which extensive visual diagnostics are part of selecting the best-performing model. Factorial experimental designs are used to explore the parameter space. We evaluate the performance of different models for the CIDR series using the proposed prediction-based model selection procedure. Furthermore, the analysis and modeling of the CIDR series are performed under the Divide and Recombine framework for large data. We conduct a theoretical Divide and Recombine time series estimation study.

We also study the performance of Divide and Recombine estimates for Gaussian autoregressive time series, Gaussian long-range-dependent series, and autoregressive series with tails heavier than Gaussian.
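As a rough illustration of the Divide and Recombine idea for a Gaussian autoregressive series, the sketch below splits a simulated AR(1) series into contiguous blocks, estimates the coefficient on each block by least squares, and averages the block estimates. The AR(1) model, the block count, the averaging rule, and all function names are assumptions made for this example; they are not taken from the dissertation and may differ from the estimators it studies.

```python
import random


def simulate_ar1(n, phi, seed=0):
    """Simulate a zero-mean Gaussian AR(1) series x[t] = phi * x[t-1] + e[t]."""
    rng = random.Random(seed)
    x = [0.0] * n
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.gauss(0.0, 1.0)
    return x


def fit_ar1(block):
    """Least-squares estimate of phi (no intercept) from one block."""
    num = sum(cur * prev for cur, prev in zip(block[1:], block[:-1]))
    den = sum(prev * prev for prev in block[:-1])
    return num / den


def divide_and_recombine_ar1(series, n_blocks):
    """Divide into contiguous blocks, fit each, recombine by simple averaging."""
    size = len(series) // n_blocks  # any leftover tail observations are dropped
    estimates = [
        fit_ar1(series[i * size:(i + 1) * size]) for i in range(n_blocks)
    ]
    return sum(estimates) / n_blocks


# Hypothetical settings chosen only for illustration.
series = simulate_ar1(20000, 0.6, seed=1)
dr_estimate = divide_and_recombine_ar1(series, 20)
full_estimate = fit_ar1(series)
print(f"D&R estimate: {dr_estimate:.4f}, all-data estimate: {full_estimate:.4f}")
```

Comparing `dr_estimate` with `full_estimate` gives an informal sense of how much accuracy is lost by fitting blocks separately, which is the kind of comparison a Divide and Recombine performance study is concerned with.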
Recommended Citation
Han, Xiang, "Divide and recombine: Autoregressive models and STL+" (2014). Open Access Dissertations. 281.
https://docs.lib.purdue.edu/open_access_dissertations/281