Date of Award

12-2017

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering

Committee Chair

Charles A. Bouman

Committee Member 1

Jan P. Allebach

Committee Member 2

Mary L. Comer

Committee Member 3

Mireille Boutin

Abstract

In this dissertation, we introduce a set of algorithms for document image processing in the research areas of color clustering and binarization. Color quantization algorithms are used to select a small number of colors that can accurately represent the content of a particular image. In this research, we introduce a novel color quantization algorithm based on the minimization of a modified Lp norm rather than the more traditional L2 norm associated with mean square error (MSE) [1]. We demonstrate that the Lp optimization approach has two advantages. First, it produces results with more accurate perceived quality, especially for important colors in small regions; second, the norm’s value can be used as an effective criterion for selecting the minimum number of colors necessary to achieve an accurate representation of the image. Binarization algorithms are used to create a binary representation of a raster document image, typically with the intent of identifying text and separating it from background content. In this work, we propose a binarization algorithm via one-pass local classification [2]. The algorithm first generates initial binarization results by local thresholding, then corrects the results using a one-pass local classification strategy, followed by a component inversion step. The experimental results demonstrate that our algorithm achieves a much lower binarization error rate than other popular binarization/thresholding algorithms. It is also demonstrated that the proposed algorithm achieves a somewhat lower binarization error rate than the state-of-the-art algorithm COS [3], while requiring significantly less computation.
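As a rough illustration of why an Lp criterion can favor important colors in small regions, the following sketch compares two candidate palettes under a plain Lp-style error. It is not the dissertation's method: the abstract does not define the modified Lp norm or state the value of p, so the function `lp_error`, the choice p = 4, the toy image, and both palettes are hypothetical assumptions made only for this example.

```python
import math


def lp_error(pixels, palette, p):
    """Return (mean_i d_i^p)^(1/p), where d_i is the Euclidean RGB distance
    from pixel i to its nearest palette color.

    Assumption: this is an ordinary Lp mean, not the "modified Lp norm"
    of the dissertation, whose definition is not given in the abstract.
    """
    total = 0.0
    for pixel in pixels:
        nearest = min(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(pixel, color)))
            for color in palette
        )
        total += nearest ** p
    return (total / len(pixels)) ** (1.0 / p)


# Hypothetical toy image: two similar gray shades covering almost all
# pixels, plus a tiny red detail of only two pixels.
pixels = [(190, 190, 190)] * 499 + [(210, 210, 210)] * 499 + [(255, 0, 0)] * 2

# Palette A spends both colors on the grays and ignores the red detail.
palette_a = [(190, 190, 190), (210, 210, 210)]
# Palette B merges the grays into one color and keeps the red detail.
palette_b = [(200, 200, 200), (255, 0, 0)]

for p in (2, 4):
    err_a = lp_error(pixels, palette_a, p)
    err_b = lp_error(pixels, palette_b, p)
    print(f"p={p}: palette A error={err_a:.2f}, palette B error={err_b:.2f}")
```

In this toy case the p = 2 criterion (equivalent to ranking by MSE) prefers palette A, which discards the red detail, while p = 4 prefers palette B, which preserves it. Larger exponents weight a few large errors more heavily than many small ones, which is one plausible reading of the small-region advantage claimed above.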
