Exploring applications of suboptimal alignments in threading protein structure prediction

Hao Chen, Purdue University

Abstract

Protein structures are key to a deep understanding of biological problems and to applied research. When experimental structures are unavailable, threading can provide a rough structure prediction that helps biologists accelerate their research. However, current threading methods face two challenges: (1) estimating the error of predicted structures; (2) improving prediction accuracy. The objective of my Ph.D. project is therefore to develop new techniques that address these challenges. To this end, we introduce suboptimal alignments into threading. This idea has rarely been explored in previous studies, and it makes the following new techniques possible. (1) To predict the threading error: We propose the SPAD score to quantify the diversity of suboptimal alignments in threading. We then measure the SPAD scores and the errors of 5232 threading predictions made on the L-E dataset. These data show that the logarithms of SPAD scores are linearly correlated with the logarithms of threading errors at both global and local levels. Seven other error-indicating parameters are collected from the same set of predictions and compared head-to-head with SPAD scores. The comparison indicates that the SPAD score is the best of these parameters for predicting threading errors, since it has the highest correlation coefficient with prediction errors. We conduct a regression analysis to derive a quantitative relationship between SPAD scores and threading errors. With this relationship, we predict the errors of 383 CASP threading predictions. The predicted errors match the actual errors well at both global and local levels. (2) To improve the threading accuracy: (i) We propose a reranking strategy and a probabilistic contact strategy to incorporate two-body contact potentials into threading. Benchmarking on the SALIGN dataset and the L-E dataset shows that these two strategies improve both the template recognition accuracy and the alignment accuracy of threading. (ii) We use the optimal and suboptimal alignments together, rather than the optimal alignment alone, to build predicted 3D structures. This technique reduces the RMSD of predicted structures in tests on CASP7 targets. (iii) We combine SPAD scores and Z-scores for template recognition, which improves the recognition accuracy on the L-E dataset.
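
The log-log relationship described in part (1) can be illustrated with a minimal sketch. It assumes the regression is an ordinary least-squares fit of log(error) against log(SPAD score); the abstract does not specify the regression procedure or define how the SPAD score is computed, so the function names `fit_log_log` and `predict_error` are hypothetical and the numbers are made-up toy values, not data from the dissertation.

```python
import math

def fit_log_log(spad_scores, errors):
    """Least-squares fit of log(error) = slope * log(spad) + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation
    coefficient between the two logarithms.
    """
    xs = [math.log(s) for s in spad_scores]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def predict_error(spad_score, slope, intercept):
    """Map a SPAD score to a predicted error using the fitted line."""
    return math.exp(intercept + slope * math.log(spad_score))

# Toy values for illustration only (assumption): error = 0.5 * spad exactly.
toy_spad = [1.0, 2.0, 4.0, 8.0]
toy_error = [0.5, 1.0, 2.0, 4.0]

slope, intercept, r = fit_log_log(toy_spad, toy_error)
predicted = predict_error(16.0, slope, intercept)
```

In this sketch, a line fitted on one set of predictions is reused to estimate the error of a new prediction from its SPAD score alone, which mirrors the way the abstract describes applying the derived relationship to the CASP predictions.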

Degree

Ph.D.

Advisors

Kihara, Purdue University.

Subject Area

Bioinformatics
