Decision support for information indexing and retrieval: Implications for hypertext systems

Wenli Zhu, Purdue University

Abstract

In this study we propose statistical models of how human indexers index textual documents in a hypertext environment. Previous research has shown that hypertext links connecting text indexed with the same human-assigned subject index term offer users an effective way to access information, owing to the structure, focus, and embedded intelligence of a manually assigned subject index. We propose that, for a document collection with an existing, well-developed subject index, the contingent relationship between subject index terms and words in the text can be used by a Bayesian inference rule to predict which subject index term is relevant given the words occurring in a document. We hypothesized that the indexing models would (1) predict the index terms a human indexer would use, (2) serve as decision aids, and (3) help users in information retrieval tasks. These hypotheses were tested in three ways: (1) by direct comparison between the index terms assigned by the indexing models and those assigned by a human indexer, with the overlap between the two sets measured by the proportion of each set's index terms that fall in the intersection; (2) by expert ratings of the relevance of index terms assigned by different means, with statistical analysis of the ratings and of index term quality (determined by the magnitude of the rating), and with estimates of the potential errors made by the human indexer and the indexing models; and (3) by a controlled laboratory experiment evaluating users' performance in information retrieval tasks using different indices.
The results revealed that (1) a significant overlap exists between the index terms assigned by the indexing models and those assigned by a human indexer; (2) index terms suggested by the indexing models received significantly higher ratings than randomly selected index terms, and the estimated numbers of errors made by the indexing models and by the human indexer were similar; and (3) users performed at least as well in information retrieval tasks with the subject index assigned by the indexing models as with the index prepared by the human indexer.
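
The abstract does not give the form of the Bayesian inference rule or of the overlap measure, so the following is only an illustrative sketch under stated assumptions, not the dissertation's method. It assumes a naive Bayes scoring rule with add-one smoothing, a toy collection of already-indexed documents, and hypothetical function names (`train`, `rank_terms`, `overlap`). It shows how word and index term co-occurrence in an existing index might be used to rank candidate index terms for a new document, and how the overlap with a human indexer's terms could be expressed as a proportion of each set.

```python
import math
from collections import Counter, defaultdict


def train(indexed_docs):
    """Count word/term co-occurrence in (words, index_terms) pairs.

    The pairs stand in for a collection with an existing subject index.
    """
    term_doc_counts = Counter()          # documents carrying each index term
    word_counts = defaultdict(Counter)   # word frequencies per index term
    vocabulary = set()
    for words, terms in indexed_docs:
        vocabulary.update(words)
        for term in terms:
            term_doc_counts[term] += 1
            word_counts[term].update(words)
    return term_doc_counts, word_counts, vocabulary, len(indexed_docs)


def rank_terms(words, model):
    """Rank index terms by an unnormalized naive Bayes log score (assumed rule)."""
    term_doc_counts, word_counts, vocabulary, n_docs = model
    scores = {}
    for term, doc_count in term_doc_counts.items():
        total = sum(word_counts[term].values())
        log_score = math.log(doc_count / n_docs)  # log prior for the term
        for word in words:
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen word/term pairs from zeroing the score.
            likelihood = (word_counts[term][word] + 1) / (total + len(vocabulary))
            log_score += math.log(likelihood)
        scores[term] = log_score
    return sorted(scores, key=scores.get, reverse=True)


def overlap(model_terms, human_terms):
    """Proportion of each set of index terms that lies in the intersection."""
    model_set, human_set = set(model_terms), set(human_terms)
    shared = model_set & human_set
    model_share = len(shared) / len(model_set) if model_set else 0.0
    human_share = len(shared) / len(human_set) if human_set else 0.0
    return model_share, human_share


# Toy, made-up collection used only to exercise the sketch.
docs = [
    (["bayes", "inference", "probability"], ["statistics"]),
    (["hypertext", "links", "navigation"], ["hypertext"]),
    (["index", "retrieval", "search"], ["information retrieval"]),
    (["hypertext", "index", "links"], ["hypertext", "information retrieval"]),
]
model = train(docs)
ranked = rank_terms(["hypertext", "links"], model)
print(ranked)
print(overlap(ranked[:2], ["hypertext", "statistics"]))
```

In a decision-aid setting of the kind the abstract describes, the top of such a ranked list could be presented to a human indexer as suggestions rather than assigned automatically.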

Degree

Ph.D.

Advisors

Lehto, Purdue University.

Subject Area

Industrial engineering; Library science; Information systems
