Date of Award
12-2017
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical and Computer Engineering
Committee Chair
Charles A. Bouman
Committee Member 1
Jan P. Allebach
Committee Member 2
Mary L. Comer
Committee Member 3
Mireille Boutin
Abstract
In this dissertation, we introduce a set of algorithms for document image processing in the areas of color clustering and binarization. Color quantization algorithms select a small number of colors that can accurately represent the content of a particular image. We introduce a novel color quantization algorithm based on the minimization of a modified Lp norm rather than the more traditional L2 norm associated with mean square error (MSE) [1]. We demonstrate that the Lp optimization approach has two advantages. First, it produces results of more accurate perceived quality, especially for important colors in small regions; second, the norm’s value can be used as an effective criterion for selecting the minimum number of colors necessary to achieve an accurate representation of the image. Binarization algorithms create a binary representation of a raster document image, typically with the intent of identifying text and separating it from background content. In this work, we propose a binarization algorithm based on one-pass local classification [2]. The algorithm first generates initial binarization results by local thresholding, then corrects those results using a one-pass local classification strategy, followed by a component inversion step. The experimental results demonstrate that our algorithm achieves a much lower binarization error rate than other popular binarization/thresholding algorithms. The proposed algorithm also achieves a somewhat lower binarization error rate than the state-of-the-art algorithm COS [3], while requiring significantly less computation.
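The contrast between an L2 criterion and an Lp criterion with larger p can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract does not specify how the Lp norm is modified, so the code below assumes a plain Lp power mean of per-pixel Euclidean RGB distances to the nearest palette color. The function names, the toy image, and the choice p = 8 are hypothetical, chosen only for illustration.

```python
import math

def nearest_distance(pixel, palette):
    # Euclidean RGB distance from a pixel to its closest palette color.
    return min(math.dist(pixel, color) for color in palette)

def lp_error(pixels, palette, p):
    # Plain Lp power mean of the per-pixel quantization errors.
    # Assumption: the dissertation's "modified" Lp norm may differ from this.
    total = sum(nearest_distance(px, palette) ** p for px in pixels)
    return (total / len(pixels)) ** (1.0 / p)

# Toy image: a large gray background plus a tiny red region (2% of pixels).
pixels = [(100, 100, 100)] * 98 + [(255, 0, 0)] * 2

# A one-color palette that represents the background and ignores the red region.
palette = [(100, 100, 100)]

err_l2 = lp_error(pixels, palette, 2)  # root-mean-square error
err_l8 = lp_error(pixels, palette, 8)  # larger p weights large errors more heavily

# The missing red color is penalized far more under p = 8 than under p = 2.
print(f"L2 error: {err_l2:.2f}, L8 error: {err_l8:.2f}")
```

In this sketch the small red region contributes little to the L2 error because it covers few pixels, while the higher-order norm is dominated by it. That matches the abstract's point that an Lp criterion is more sensitive to important colors in small regions.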
Recommended Citation
Xue, Haitao, "Clustering and Segmentation with Application in Document Image Processing" (2017). Open Access Dissertations. 1663.
https://docs.lib.purdue.edu/open_access_dissertations/1663