Biomedical literature mining with transitive closure and maximum network flow

Andrew P Hoblitzell, Purdue University

Abstract

The biological literature is a vast and constantly growing source of information that biologists consult about their field, but the sheer volume of data can be overwhelming. Medline, which makes a large amount of biological journal data available online, makes the development of automated text mining systems, and hence "data-driven discovery", possible. This thesis examines current work on text mining of the biological literature and then mines documents pertaining to bone biology. The documents are retrieved from PubMed, and direct associations between the terms are then computed. Potentially novel transitive associations among biological objects are then discovered using the transitive closure algorithm and the maximum flow algorithm. The thesis discusses in detail the extraction of biological objects from the collected documents, as well as the co-occurrence-based text mining algorithm, the transitive closure algorithm, and the maximum network flow algorithm that were run to extract the potentially novel biological associations. The generated hypotheses (novel associations) were assigned significance scores for further validation by an expert bone biologist. Extension of the work to hypergraphs for enhanced meaning and accuracy is also examined.
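
The pipeline described in the abstract (co-occurrence associations, then transitive closure, then maximum flow) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and everything in it is an assumption made for illustration: the placeholder terms A to D, the use of document co-occurrence counts as edge capacities, Warshall's algorithm for the closure, Edmonds-Karp for the flow, and the use of the raw flow value as a significance score.

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Count how many documents mention each unordered pair of terms."""
    weights = defaultdict(int)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def transitive_closure(nodes, edges):
    """Warshall's algorithm on an undirected graph: reach[a][b] is True
    if some path of direct associations connects a and b."""
    reach = {a: {b: False for b in nodes} for a in nodes}
    for a, b in edges:
        reach[a][b] = reach[b][a] = True
    for k in nodes:
        for i in nodes:
            if reach[i][k]:
                for j in nodes:
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow over an undirected capacity map {(a, b): c}."""
    residual = defaultdict(lambda: defaultdict(int))
    for (a, b), c in capacity.items():
        residual[a][b] += c
        residual[b][a] += c
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def novel_associations(docs):
    """Score term pairs that are connected transitively but never co-occur."""
    weights = cooccurrence_graph(docs)
    nodes = sorted({t for pair in weights for t in pair})
    reach = transitive_closure(nodes, weights)
    scores = {}
    for a, b in combinations(nodes, 2):
        if reach[a][b] and (a, b) not in weights:
            # Assumption: the max-flow value serves as the significance score.
            scores[(a, b)] = max_flow(weights, a, b)
    return scores


# Toy corpus with placeholder terms (hypothetical, not data from the thesis).
docs = [["A", "B"], ["A", "B"], ["B", "D"], ["A", "C"], ["C", "D"]]
print(novel_associations(docs))
```

In this toy corpus, A and D never appear in the same document but are linked through both B and C, so the pair surfaces as a candidate association. Its flow value reflects the combined strength of all connecting paths rather than only the strongest one.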

Degree

M.S.

Advisors

Mukhopadhyay, Purdue University.

Subject Area

Bioinformatics|Information science|Computer science
