A Parallel Computing Approach for Identifying Retinitis Pigmentosa Modifiers in Drosophila Using Eye Size and Gene Expression Data

Chawin Metah, Purdue University

Abstract

For many years, researchers have developed ways to diagnose degenerative retinal disease using multiple gene analysis techniques. Retinitis pigmentosa (RP) can cause partial or total blindness in adults. For that reason, it is crucial to pinpoint its causes in order to develop proper medication or treatment. One common method is genome-wide analysis (GWA); however, it cannot fully identify genes that are only indirectly related to changes in eye size. In this research, RNA sequencing (RNA-seq) analysis is used to link phenotype to genotype, creating a pool of candidate genes that may be associated with RP. This pool is intended to support future research into therapies or treatments to cure the disease in human adults. Using the Drosophila Genetic Reference Panel (DGRP), a genetic reference panel of the fruit fly, two types of datasets are involved in this analysis: eye-size data and gene expression data, with two replicates for each strain. Together they allow us to build a phenotype-genotype map; in other words, we trace the genes (genotype) associated with RP by comparing eye size (phenotype) across strains. The basic idea of the algorithm is to find the replicate combination that maximizes the correlation between gene expression and eye size. Because there are 2^N possible replicate combinations, where N is the number of selected strains, the original sequential implementation was computationally intensive. The original approach of finding the best replicate combination was proposed by Nguyen et al. (2022). In this research, however, we restructured the algorithm so that the search for the best replicate combination is distributed across tasks that run in parallel. The implementation was written in the R programming language using the doParallel and foreach packages and runs on multicore machines.
The program was tested on both a laptop and a server, and the experimental results showed a substantial reduction in execution time. For instance, with 32 processes, execution time was reduced by up to 95% compared with the sequential version of the code. Furthermore, the added computational capacity allowed us to explore and analyze more extreme eye-size lines using three eye-size datasets representing different phenotype models. This further improved the accuracy of the results: the top candidate genes in all cases showed a connection to RP.
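The reported figure can be restated as a speedup S and a parallel efficiency E over p = 32 processes. This is only arithmetic on the number above, not a separate measurement, and it assumes the 95% reduction was observed at p = 32.

```latex
S = \frac{T_{\text{seq}}}{T_{\text{par}}} = \frac{1}{1 - 0.95} = 20,
\qquad
E = \frac{S}{p} = \frac{20}{32} = 0.625
```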

Degree

M.Sc.

Advisors

Khalifa, Purdue University.

Subject Area

Bioinformatics|Computer science|Genetics
