Study on Preprocessing and Classifying Mass Spectral Raw Data Concerning Human Normal and Disease Cases
Mass spectrometry is becoming an important tool in the biological sciences. Tissue samples or easily obtained biological fluids (serum, plasma, urine) are analysed by a variety of mass spectrometry methods, producing spectra characterized by very high dimensionality and a high level of noise. Here we present a feature extraction method for mass spectra that consists of two main steps. In the first step, an algorithm for low-level preprocessing of mass spectra is applied, including denoising with the Shift-Invariant Discrete Wavelet Transform (SIDWT), smoothing, baseline correction, peak detection and normalization of the resulting peak lists. After this step, we claim to have reduced the dimensionality and redundancy of the initial mass spectra representation while keeping all the meaningful features (potential biomarkers) required for disease-related proteomic patterns to be identified. In the second step, the peak lists are aligned and fed to a Support Vector Machine (SVM), which classifies the mass spectra. This procedure was applied to SELDI-QqTOF spectral data collected from normal and ovarian cancer serum samples. The classification performance was assessed for distinct values of the parameters involved in the feature extraction pipeline. The method described here for low-level preprocessing of mass spectra results in 98.3% sensitivity, 98.3% specificity and an AUC (Area Under Curve) of 0.981 in spectra classification.
Keywords: ovarian cancer, mass spectra, preprocessing, biomarkers, feature extraction, early diagnosis, classification
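The first step of the pipeline summarized in the abstract (smoothing, baseline correction, peak detection, normalization) and the reported evaluation measures (sensitivity, specificity) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names, window sizes and thresholds are hypothetical, a moving average stands in for SIDWT denoising, the baseline is assumed to be a sliding-window minimum, peaks are normalized to the largest peak, and peak alignment and the SVM classifier are omitted.

```python
def smooth(y, half_window=2):
    """Moving-average smoothing (assumed stand-in for SIDWT denoising)."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def subtract_baseline(y, half_window=10):
    """Estimate the baseline as a sliding-window minimum and subtract it."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(y[i] - min(y[lo:hi]))
    return out


def detect_peaks(mz, y, min_height):
    """Return (m/z, intensity) pairs at local maxima of at least min_height."""
    return [
        (mz[i], y[i])
        for i in range(1, len(y) - 1)
        if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height
    ]


def normalize(peaks):
    """Scale intensities so that the largest peak has intensity 1."""
    top = max((h for _, h in peaks), default=0.0)
    return [(m, h / top) for m, h in peaks] if top > 0 else []


def preprocess(mz, intensity, min_height=1.0):
    """Raw spectrum -> normalized peak list (hypothetical parameter values)."""
    y = subtract_baseline(smooth(intensity))
    return normalize(detect_peaks(mz, y, min_height))


def sensitivity_specificity(y_true, y_pred):
    """Labels: 1 = disease, 0 = normal. Returns (sensitivity, specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic spectrum: a flat offset of 5 plus two triangular peaks.
mz = list(range(100))
intensity = [
    5.0
    + 10.0 * max(0.0, 1 - abs(i - 30) / 5)
    + 20.0 * max(0.0, 1 - abs(i - 70) / 5)
    for i in mz
]
peaks = preprocess(mz, intensity)  # two peaks, located at m/z 30 and 70
```

The resulting peak list is far shorter than the raw spectrum, which is the dimensionality reduction the abstract refers to. In a full pipeline, peak lists from many spectra would then be aligned to common m/z positions before being passed to a classifier.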