MAXIMUM LIKELIHOOD DISCRIMINATION AND LOGISTIC REGRESSION
Abstract
Suppose that a random vector X comes from one of two populations, H₀ and H₁. An optimal classification rule is one that minimizes the probability of misclassification. The optimal rule can be estimated either from a single random sample of size n or from separate random samples of sizes n₀ and n₁ drawn from H₀ and H₁, respectively. Let n₀ = n(1 − π*), n₁ = nπ*, 0 < π* < 1. Classification rules can be constructed using either the discriminant function or the logistic regression approach. The problems are (1) to compare the two sampling schemes and (2) to determine the optimal value of π*. Solving these problems requires theorems on the consistency and asymptotic normality of the estimator of the parameter vector β obtained by maximizing the product ∏ᵢ gᵢ(yᵢ, β) with respect to β. Here the gᵢ(yᵢ, β) are positive functionals, and the Yᵢ are independent but not necessarily identically distributed, with distribution functions Fᵢ(·, θₛ, ν). The special case of two normal populations differing in mean but not in covariance is considered in detail. Criteria are given for determining which sampling scheme is more efficient. Under both sampling schemes the discriminant function estimation procedure is preferred. Under the stratified sampling scheme, optimal choices of π* are found using several criteria.
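As a concrete illustration of the discriminant function approach described above, the following sketch (not taken from the dissertation; the population parameters and sample sizes are invented for the example) classifies points from two bivariate normal populations H₀ and H₁ that differ in mean but share a covariance matrix. The plug-in rule estimates β by substituting sample means and the pooled sample covariance into the optimal linear rule β = Σ⁻¹(μ₁ − μ₀):

```python
# Illustrative sketch: discriminant-function (plug-in) classification rule for
# two normal populations H0 = N(mu0, Sigma) and H1 = N(mu1, Sigma).
# All numerical values here are hypothetical, chosen only for demonstration.
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
n0, n1 = 200, 200          # stratified sample sizes; pi* = n1 / (n0 + n1)

X0 = rng.multivariate_normal(mu0, Sigma, n0)   # training sample from H0
X1 = rng.multivariate_normal(mu1, Sigma, n1)   # training sample from H1

# Discriminant-function estimation: plug in sample means and pooled covariance.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S = ((n0 - 1) * np.cov(X0, rowvar=False)
     + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
beta = np.linalg.solve(S, m1 - m0)             # estimate of Sigma^{-1}(mu1 - mu0)
beta0 = -0.5 * (m1 + m0) @ beta + np.log(n1 / n0)  # intercept with prior-odds term

def classify(x):
    """Assign x to H1 (return 1) when the linear score is positive, else H0."""
    return int(beta0 + x @ beta > 0)

# Estimate the misclassification rate on fresh samples from each population.
T0 = rng.multivariate_normal(mu0, Sigma, 1000)
T1 = rng.multivariate_normal(mu1, Sigma, 1000)
err = (np.mean([classify(x) for x in T0])
       + np.mean([1 - classify(x) for x in T1])) / 2
```

With equal priors, the error rate of this rule approaches the Bayes error Φ(−Δ/2), where Δ² = (μ₁ − μ₀)ᵀΣ⁻¹(μ₁ − μ₀); the logistic regression approach would instead estimate (β₀, β) by maximizing the conditional likelihood of the labels.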
Degree
Ph.D.
Subject Area
Statistics