Bayesian models and Markov chain Monte Carlo methods for protein motifs using secondary characteristics

Nak-Kyeong Kim, Purdue University


Statistical methods have been successfully used to analyze biological sequences. Identifying common local patterns, also called motifs, in multiple protein sequences plays an important role in establishing homology between proteins. Homology is easy to establish when sequences are similar (sharing more than 25% identity). However, for distantly related proteins, currently available methods often fail to align motifs. We develop new probability models that use secondary characteristics, such as amino acid polarity and predicted secondary structure, to profile protein motifs. Bayesian models and Markov chain Monte Carlo methods are employed to estimate the model parameters and thereby to identify protein motifs in multiple sequences. The extra information brought by the secondary characteristics greatly increases the sensitivity of detecting common local patterns in a group of distantly related proteins.




Xie, Purdue University.


