Evaluation of a system for GLASS: Genomic literature area sequence search

Siddharth Pandey, Purdue University

Abstract

Scientists studying an organism face the challenge of identifying similar or prior work related to that organism. The difficulty stems from the sheer volume of biomedical literature and the absence of an automated mechanism for searching the published literature by organism. This study proposes the GLASS (Genomic Literature Area Sequence Search) tool to search the existing literature for relevant publications, facilitating information reuse and providing quicker access to information that would otherwise require time-intensive manual searching. The application searches the literature using an organism's DNA sequence: it indexes the literature content and then distributes the indexed data to make the search faster and more robust. The study also evaluates the performance of a commodity cluster for the crawling, indexing, and searching tasks that such applications require. The contributions of this work include plugins that parse and index webpage metadata and extract DNA sequences embedded in text. Finally, the study reports the indexed sequence length that leads to quicker search results.
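The abstract does not describe how the plugins recognize DNA in text or how the extracted sequences are indexed. As an illustration only, the Python sketch below shows one common approach: pull runs of A/C/G/T out of free text, then build a k-mer inverted index so a query sequence can be matched against documents. The names `extract_sequences`, `KmerIndex`, and `MIN_SEQ_LEN` are hypothetical, and the k-mer technique is a stand-in; it is not taken from GLASS itself.

```python
import re
from collections import defaultdict

# Hypothetical threshold: short runs of A/C/G/T (e.g. the word "CAT")
# are more likely ordinary text than a DNA sequence, so they are dropped.
MIN_SEQ_LEN = 20

# A run of uppercase nucleotide letters, optionally split by whitespace,
# since papers often print sequences in blocks like "ACGTACGTAC GTACGTACGT".
_SEQ_RE = re.compile(r"[ACGT]+(?:\s+[ACGT]+)*")


def extract_sequences(text, min_len=MIN_SEQ_LEN):
    """Return candidate DNA sequences embedded in free text (whitespace removed)."""
    seqs = []
    for match in _SEQ_RE.finditer(text):
        seq = re.sub(r"\s+", "", match.group(0))
        if len(seq) >= min_len:
            seqs.append(seq)
    return seqs


def kmers(seq, k):
    """Yield every overlapping substring of length k (none if seq is shorter than k)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerIndex:
    """Inverted index mapping each k-mer to the set of documents containing it."""

    def __init__(self, k):
        self.k = k
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index only the DNA-like runs found in the document text.
        for seq in extract_sequences(text):
            for km in kmers(seq, self.k):
                self.postings[km].add(doc_id)

    def search(self, query):
        # Rank documents by how many distinct query k-mers they share.
        counts = defaultdict(int)
        for km in set(kmers(query.upper(), self.k)):
            for doc_id in self.postings.get(km, ()):
                counts[doc_id] += 1
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = KmerIndex(k=8)
    index.add("paper1", "The primer ACGTACGTTAGCCGATAGGCTA was used.")
    index.add("paper2", "We report the fragment TTTTGGGGCCCCAAAATTTTGGGG in E. coli.")
    print(index.search("GCCGATAGGC"))  # [('paper1', 3)]
```

In this sketch, k plays the role of an indexed sequence length: a smaller k matches shorter queries but produces larger posting lists and more spurious hits, while a larger k is more selective but cannot match queries shorter than k. The abstract does not state which length GLASS settled on. On a cluster, the postings could be partitioned across nodes, for example by hashing each k-mer; that step is omitted here.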

Degree

M.S.

Advisors

Springer, Purdue University.

Subject Area

Bioinformatics|Computer science

