Model-based identification and quantification of metabolites in 1H NMR spectra

Cheng Zheng, Purdue University

Abstract

Identification and quantification of metabolites are key to NMR-based metabolomics studies. However, existing methods for these purposes, such as binning and target profiling, require substantial expertise with NMR spectra for identification and yield only crude approximations for quantification. In addition, replicate spectra from one treatment group are often processed individually, and failing to pool information across replicates may reduce the accuracy of identification and quantification. We introduce two automatic workflows: one for identification of metabolites and the other for both identification and quantification of metabolites. Because each method has specific advantages and limitations, the two are meant to be used under different problem settings and can be applied in combination with other existing procedures. The first is an untargeted, unsupervised, database-supported workflow for automatic identification of metabolites, called statistical total correlation spectroscopy guided clustering and curve-fitting (SGCC). It combines global clustering based on statistical total correlation spectroscopy (STOCSY) with local curve-fitting. On experimental NMR datasets, SGCC outperforms the three other automatic methods that we tested. SGCC can be regarded as a more detailed extension of STOCSY. Not only can it replace the original STOCSY normally used in conjunction with multivariate statistical analysis, but it may also serve as a precursor step before metabolite quantification by various curve-fitting based methods, such as target profiling. The second workflow, an inferential one, is called Bayesian quantification (BQuant). It is used for metabolite identification and quantification in local NMR spectrum regions, based on Bayesian model selection in linear mixed models. The workflow takes as input a set of NMR spectra from the same local region and an existing NMR spectral database. It outputs a list of identified metabolites with their corresponding relative abundances. We show, using simulated and experimental datasets, that BQuant outperforms the currently available automated alternatives in accuracy of both identification and quantification of metabolites.
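
As a rough illustration of the STOCSY idea that the first workflow builds on, the sketch below computes the Pearson correlation, across replicate spectra, between a chosen "driver" spectral point and every other point, then keeps the points whose correlation exceeds a threshold. This is only the generic STOCSY correlation step, not the SGCC workflow described in the dissertation. The function names, the 0.9 threshold, and the choice to report a correlation of 0 for constant points are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence has zero variance (an assumption
    made here so that flat baseline points are never grouped).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def stocsy_correlations(spectra, driver_index):
    """Correlate the driver point with every spectral point.

    `spectra` is a list of spectra (rows), each a list of intensities
    sampled at the same chemical-shift positions (columns).
    """
    columns = list(zip(*spectra))  # one tuple of intensities per spectral point
    driver = columns[driver_index]
    return [pearson(driver, column) for column in columns]


def correlated_points(spectra, driver_index, threshold=0.9):
    """Indices of points whose correlation with the driver is >= threshold."""
    correlations = stocsy_correlations(spectra, driver_index)
    return [i for i, r in enumerate(correlations) if r >= threshold]
```

Points that co-vary strongly across samples are likely to arise from the same molecule, which is why a grouping of this kind can serve as a starting point before peaks are matched against a spectral database.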

Degree

Ph.D.

Advisors

Vitek, Purdue University.

Subject Area

Statistics|Bioinformatics
