Signal enhancement and data mining for chemical and biological samples using mass spectrometry

Yuezhi Du, Purdue University

Abstract

Mass spectrometry is widely used in healthcare, pharmaceutics, environmental analysis, the food industry and forensics because it can provide molecular information at trace levels. In recent years, because chemical and biological samples are complex, computer-assisted mass spectra analysis, including signal enhancement, statistics and machine learning, has drawn increasing attention, especially in biomarker identification, sample classification and omics-related research, where large volumes of data are generated. Typically, mass spectra analysis follows two steps. First, signal enhancement is performed to systematically filter out background noise and enhance the detected signals. Second, data mining is used to extract meaningful signals from the mass spectra. Depending on the mass spectrometry mechanism and the nature of the samples, different signal enhancement and data mining methods have been developed to address specific needs.
Image current measurement followed by Fourier transform is a non-destructive mass analysis method that has been widely used in Fourier transform ion cyclotron resonance and Orbitrap mass spectrometers and, more recently, in quadrupole ion traps. The phase between ion excitation and image current measurement typically needs to be well controlled to obtain high-quality spectra. In this thesis, a data processing method based on the self-correlation (SC) function has been explored for signal enhancement of image current data recorded at random phases. The simple SC algorithm is introduced, and a series of demonstration data sets was simulated based on a previous study of non-destructive mass analysis using an ion trap. Significant improvements were achieved in both the signal-to-noise ratio (SNR) and the accuracy of peak ratios. The efficiency of using a mask data set for selected ion monitoring has also been demonstrated.
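To illustrate why a self-correlation step helps when the phase is not controlled, here is a minimal sketch. It assumes that the SC function is the ordinary autocorrelation of each recorded transient, averaged over transients and then Fourier transformed; the thesis's exact algorithm may differ. The signal model, frequencies, amplitudes, noise level and all function names (`simulate_transient`, `self_correlation`, `sc_spectrum`, `naive_spectrum`) are hypothetical. For each sinusoid, the dominant term of the autocorrelation does not depend on the initial phase, so averaging autocorrelations preserves the signal, whereas averaging raw transients with random phases largely cancels it.

```python
import numpy as np

def simulate_transient(n, freqs, amps, noise_sd, rng):
    """Hypothetical image-current transient: undamped sinusoids, each with a random
    initial phase (uncontrolled excitation/detection phase), plus white noise.
    freqs are in cycles per sample."""
    t = np.arange(n)
    x = rng.normal(0.0, noise_sd, n)
    for f, a in zip(freqs, amps):
        x += a * np.cos(2 * np.pi * f * t + rng.uniform(0.0, 2 * np.pi))
    return x

def self_correlation(x):
    """One-sided autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..len(x)-1.
    For a sinusoid of amplitude a, the dominant term (a**2/2)(n-k)cos(2*pi*f*k)
    does not depend on the initial phase."""
    full = np.correlate(x, x, mode="full")
    return full[len(x) - 1:]

def sc_spectrum(transients):
    """Average the self-correlations of all transients, then Fourier transform."""
    r = np.mean([self_correlation(x) for x in transients], axis=0)
    return np.abs(np.fft.rfft(r))

def naive_spectrum(transients):
    """Average the raw transients first; with random phases the signal largely cancels."""
    return np.abs(np.fft.rfft(np.mean(transients, axis=0)))

rng = np.random.default_rng(0)
n, freqs, amps = 1024, [100 / 1024, 150 / 1024], [1.0, 0.5]
transients = [simulate_transient(n, freqs, amps, 1.0, rng) for _ in range(200)]

sc = sc_spectrum(transients)
sc -= np.median(sc)                 # crude baseline correction for the zero-lag offset
ratio = np.sqrt(sc[100] / sc[150])  # SC peak heights scale with amplitude**2 here
print(f"SC amplitude ratio estimate: {ratio:.2f} (true value 2.00)")

naive = naive_spectrum(transients)
print(f"naive ratio: {naive[100] / naive[150]:.2f} (depends on the random phases)")
```

In this sketch the Fourier transform of the averaged autocorrelation behaves like an averaged power spectrum, which is why the square root is taken before comparing peak ratios.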
In recent chemical and biological research, biomarker profiling by mass spectrometry has played an essential role, and it depends heavily on the data analysis used for sample classification. In this thesis, power normalization of mass spectra has been proposed as a method for altering the relative weights of peaks at different intensity levels. In combination with the support vector machine (SVM) method, its impact on sample classification has been characterized using data from four previously reported studies that distinguished anomeric configurations of sugars, types of bacteria, stages of melanoma and types of breast cancer. A comprehensive analysis of the data normalized at different power normalization indices (PNIs) was developed, with analysis tools including error-PNI plots, reference profiles and error source profiles, to assess the analytical method and to find a proper approach for classifying the samples in each study.
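The idea of scanning the power normalization index can be sketched as follows. This assumes that power normalization raises each intensity to the power PNI and then scales each spectrum to unit total intensity; that form, the leave-one-out error measure, the synthetic data and the helper names (`power_normalize`, `train_linear_svm`, `loo_error`) are all assumptions for illustration. A small linear SVM trained with the Pegasos sub-gradient method stands in for whatever SVM implementation the thesis used, and no particular error trend is implied.

```python
import numpy as np

def power_normalize(spectra, pni):
    """Assumed power normalization: raise each non-negative intensity to `pni`,
    then scale every spectrum (row) to unit total intensity.
    pni < 1 raises the relative weight of weak peaks; pni > 1 favours strong peaks."""
    x = np.power(np.clip(spectra, 0.0, None), pni)
    return x / x.sum(axis=1, keepdims=True)

def train_linear_svm(Z, y, lam=0.01, epochs=50, seed=0):
    """Minimal linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularised hinge loss). Labels are -1/+1; the bias is a constant column of Z."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(Z.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Z[i] @ w)
            w *= 1.0 - eta * lam          # shrinkage from the regulariser
            if margin < 1.0:              # hinge-loss sub-gradient for a margin violation
                w += eta * y[i] * Z[i]
    return w

def loo_error(X, y, pni):
    """Leave-one-out error of the linear SVM on spectra normalized with one PNI value."""
    Xn = power_normalize(X, pni)
    wrong = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mu, sd = Xn[train].mean(axis=0), Xn[train].std(axis=0) + 1e-12

        def features(A):
            # z-score with training statistics, then append a constant bias column
            return np.column_stack([(A - mu) / sd, np.ones(len(A))])

        w = train_linear_svm(features(Xn[train]), y[train])
        wrong += int(np.sign(features(Xn[i:i + 1]) @ w)[0] != y[i])
    return wrong / len(y)

# Hypothetical two-class data: strong, uninformative peaks and a weak, informative one.
rng = np.random.default_rng(1)
y = np.repeat([1, -1], 15)
X = rng.uniform(0.0, 2.0, (len(y), 40))
X[:, 5] += 1000 * rng.lognormal(0.0, 0.5, len(y))
X[:, 12] += 800 * rng.lognormal(0.0, 0.5, len(y))
X[:, 25] += np.where(y == 1, 20.0, 8.0) * rng.lognormal(0.0, 0.2, len(y))

errors = {pni: loo_error(X, y, pni) for pni in (0.25, 0.5, 1.0, 2.0)}
for pni, err in errors.items():
    print(f"PNI = {pni:4.2f}  LOO error = {err:.2f}")  # one row of an error-PNI table
```

Plotting `errors` against PNI gives a simple error-PNI curve; with real data, the shape of that curve indicates whether weak or strong peaks carry the class information.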

Degree

Ph.D.

Advisors

Zheng, Purdue University.

Subject Area

Biomedical engineering; Computer science
