Local regression models: Advancements, applications, and new methods

Ryan P Hafen, Purdue University

Abstract

Local regression methods model the relationship between an independent variable and a dependent variable by fitting weighted polynomials in local neighborhoods of the design space. Loess, a popular local regression method, has favorable statistical and computational properties. Loess has been adapted to modeling time series with deterministic seasonal and trend components through the STL method (seasonal-trend decomposition using loess). The first part of this work presents enhancements to the STL method. The second part presents an application of STL to syndromic surveillance data, which motivated many of the improvements to STL. Finally, a new approach to nonparametric density estimation, called ed, is presented; it uses local regression to obtain density estimates.

Enhancements to the STL method presented in this work include support for local quadratic smoothing, missing values, and statistical inference. A new method for improving smoothing at the endpoints, called blending, is also presented: the endpoint fit is taken to be a weighted average of the original endpoint fit and another fit of smaller degree. Guidelines are given for choosing the blending parameters, and software implementing these enhancements is described.

Syndromic surveillance is the monitoring of public health data to detect and quantify unusual health events. Monitoring pre-diagnostic data, such as emergency department (ED) patient chief complaints, enables rapid detection of disease outbreaks. Such data have many sources of variation, and statistical methods must model them accurately as a basis for timely and accurate outbreak detection. The methods for modeling daily chief complaint counts presented in this work are based on STL and were developed using data from the 76 EDs of the Indiana surveillance program from 2004 to 2008.
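
The endpoint blending idea used in the enhanced STL smoother can be illustrated with a minimal Python sketch of loess-style local fitting. This is an illustration under stated assumptions, not the author's software: it assumes tricube weights over the q nearest points (the usual loess choice), applies blending only to the two endpoint values, and uses a degree-0 fit as the lower-degree fit by default. The names `local_fit`, `loess`, `blended_loess`, `blend`, and `low_degree` are hypothetical, and the dissertation's own blending parameters and guidelines are not reproduced here.

```python
def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the unit interval, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def solve(a, b):
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_fit(xs, ys, x0, q, degree):
    """Fitted value at x0 from a tricube-weighted polynomial fit over the q nearest points.

    q should comfortably exceed degree + 1 so the weighted system is nonsingular.
    """
    h = sorted(abs(x - x0) for x in xs)[q - 1]  # neighborhood radius
    w = [tricube((x - x0) / h) for x in xs]
    t = [x - x0 for x in xs]  # center at x0 so the intercept is the fit at x0
    p = degree + 1
    # Weighted least-squares normal equations for coefficients of 1, t, t^2, ...
    a = [[sum(wi * ti ** (j + k) for wi, ti in zip(w, t)) for k in range(p)] for j in range(p)]
    b = [sum(wi * ti ** j * yi for wi, ti, yi in zip(w, t, ys)) for j in range(p)]
    return solve(a, b)[0]


def loess(xs, ys, q, degree):
    """Local polynomial fit evaluated at every design point."""
    return [local_fit(xs, ys, x0, q, degree) for x0 in xs]


def blended_loess(xs, ys, q, degree, blend, low_degree=0):
    """Hypothetical blending: each endpoint value becomes
    (1 - blend) * (degree-`degree` fit) + blend * (degree-`low_degree` fit).
    Interior values are left unchanged (an assumption of this sketch)."""
    fit = loess(xs, ys, q, degree)
    for i in (0, len(xs) - 1):
        low = local_fit(xs, ys, xs[i], q, low_degree)
        fit[i] = (1.0 - blend) * fit[i] + blend * low
    return fit
```

For example, `blended_loess(xs, ys, q=7, degree=2, blend=0.5)` returns the local quadratic fit with each endpoint moved halfway toward a local constant fit; `blend=0` recovers the unblended fit and `blend=1` uses the lower-degree fit alone at the endpoints.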

Square roots of the daily counts are decomposed into inter-annual, yearly-seasonal, day-of-the-week, and random-error components. Using this decomposition, a new synoptic-scale (days to weeks) outbreak detection method was developed and evaluated in a simulation study that compared its detection performance with that of four well-known methods. The STL detection method performs very well and requires only 90 days of historical data to be put into operation. The visualization tools that accompany the decomposition and outbreak detection methods provide insight into patterns in the data.

The ed method of density estimation for a univariate x takes a model-building approach: it is an estimation method that can accurately fit many density patterns in data and that leads to diagnostic visual displays for checking fits at different values of the tuning parameters. The two-step estimator begins with a small-bandwidth balloon density estimate, which is inversely proportional to the distance from x to its κ-th nearest neighbor. Next, loess is used to smooth the log of the balloon estimate as a function of x. This turns nonparametric density estimation into nonparametric regression estimation, making available the full power of regression diagnostics, model selection, statistical inference, and computational methods. The ed method is straightforward. It handles problems encountered by other methods, such as misfitting peaks and valleys, and it allows simple identification and fitting of features such as discontinuities and boundary cut-offs, all within the same framework.
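
The two steps of ed can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation, and several choices are assumptions not stated in the abstract: the balloon estimate is evaluated on a user-supplied grid and normalized as the standard κ-nearest-neighbor estimate κ / (2 n d_κ(x)); the loess step is a local linear fit with tricube weights over the q nearest grid points; and the result is not renormalized to integrate to one. The names `balloon_estimate`, `local_linear`, and `ed_sketch` are hypothetical.

```python
import math


def balloon_estimate(data, grid, k):
    """Step 1 (assumed normalization): k / (2 n d_k(x)), with d_k(x) the
    distance from x to its k-th nearest data point."""
    n = len(data)
    est = []
    for x in grid:
        d = sorted(abs(v - x) for v in data)[k - 1]
        est.append(k / (2.0 * n * d))
    return est


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside the unit interval, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(xs, ys, x0, q):
    """Tricube-weighted local linear fit at x0 over the q nearest xs (q >= 3)."""
    h = sorted(abs(x - x0) for x in xs)[q - 1]
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def ed_sketch(data, grid, k, q):
    """Step 2: smooth log(balloon estimate) as a function of x, then exponentiate."""
    logs = [math.log(f) for f in balloon_estimate(data, grid, k)]
    return [math.exp(local_linear(grid, logs, x0, q)) for x0 in grid]
```

Because the second step is an ordinary regression of log-density values on x, the same residual plots and fit comparisons used in loess regression can be applied to choose κ and the smoothing parameters. For 1,000 evenly spaced points on [0, 1), the sketch gives values within about 5% of the uniform density of 1 at interior grid points.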

Degree

Ph.D.

Advisors

Cleveland, Purdue University.

Subject Area

Statistics
