Local likelihood modeling of the concept drift phenomenon
Abstract
Temporal text data is often generated by a time-changing process or distribution. Such drift in the underlying distribution cannot be captured by stationary likelihood techniques. We consider the application of local likelihood methods to generative and conditional modeling of categorical temporal data such as time-stamped document sequences. The resulting model corresponds to a local n-gram model in the generative case and to local logistic regression in the conditional case. We examine the asymptotic bias and variance of the local estimator and their implications for the optimal kernel bandwidth. We discuss various regularization schemes and methods to model periodicity. The proposed estimators are demonstrated in an experimental study on the Reuters RCV1 news data and AOL query logs.
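As a rough illustration of the generative case, the sketch below computes a kernel-weighted unigram (1-gram) estimate at a query time t. It is not taken from the dissertation: the Gaussian kernel, the function names gaussian_kernel and local_unigram, the bandwidth h = 0.5, and the toy documents are all assumptions chosen for illustration. The estimate itself is the standard weighted multinomial maximum likelihood solution, in which each word count is weighted by the kernel value of its document's distance in time from t.

```python
import math
from collections import Counter


def gaussian_kernel(u, h):
    """Unnormalized Gaussian kernel with bandwidth h (an assumed kernel choice)."""
    return math.exp(-0.5 * (u / h) ** 2)


def local_unigram(docs, t, h):
    """Kernel-weighted unigram estimate at time t.

    docs is a list of (timestamp, tokens) pairs. Each token contributes
    the kernel weight of its document, so documents near t dominate.
    """
    weighted = Counter()
    for tau, tokens in docs:
        w = gaussian_kernel(t - tau, h)
        for tok in tokens:
            weighted[tok] += w
    total = sum(weighted.values())
    return {tok: c / total for tok, c in weighted.items()}


# Hypothetical toy corpus whose vocabulary drifts over time.
docs = [
    (0.0, ["oil", "price", "oil"]),
    (1.0, ["oil", "election"]),
    (2.0, ["election", "vote", "election"]),
]
early = local_unigram(docs, t=0.0, h=0.5)
late = local_unigram(docs, t=2.0, h=0.5)
```

A small bandwidth tracks drift closely but uses few effective observations (higher variance), while a large bandwidth approaches the stationary estimate (higher bias). This is the trade-off that the bias and variance analysis of the bandwidth addresses.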
Degree
Ph.D.
Advisors
Lebanon, Purdue University.
Subject Area
Statistics