GMM-Based Molecular Serum Profiling Framework

  • Małgorzata Plechawska-Wójcik
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 538)


This paper presents a GMM-based molecular serum profiling framework for the complete analysis of MALDI-ToF mass spectrometry data. The Matlab-based framework is a comprehensive, self-adapting solution applicable to different kinds of spectral datasets. The analysis consists of several procedures: data preparation and pre-processing, which includes baseline correction, outlier detection and noise removal. The mean spectrum is then calculated, modeled with a Gaussian mixture model (GMM) and decomposed using the Expectation-Maximization (EM) algorithm. During this process, the peaks of the mean spectrum are localized with a dedicated adaptive procedure. In the subsequent step, the results of the mean spectrum decomposition are applied to each individual spectrum in the dataset in the form of a Gaussian mask. The result is a dataset ready for further statistical analysis.
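The decomposition and masking steps described above can be illustrated with a minimal sketch. It is written in Python with NumPy rather than Matlab and is not the framework's actual code. It assumes that the mean spectrum is fitted by an intensity-weighted EM, in which each m/z bin counts in proportion to its intensity, and that the Gaussian mask is a normalized weighting of each spectrum by every fitted component. The function names, the synthetic two-peak data and the hand-picked initial means (standing in for the adaptive peak localization) are all hypothetical.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Normal density evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fit_gmm_to_spectrum(mz, intensity, mu_init, sigma_init, n_iter=100):
    """Intensity-weighted EM (assumed variant): bins act as weighted observations."""
    mu = np.asarray(mu_init, dtype=float).copy()
    sigma = np.full(mu.shape, float(sigma_init))
    alpha = np.full(mu.shape, 1.0 / mu.size)
    w = intensity / intensity.sum()
    for _ in range(n_iter):
        # E-step: responsibility of each component for each m/z bin
        dens = alpha[:, None] * gaussian(mz[None, :], mu[:, None], sigma[:, None])
        resp = dens / np.maximum(dens.sum(axis=0), 1e-300)
        # M-step: intensity-weighted updates of weights, means and widths
        rw = resp * w[None, :]
        nk = np.maximum(rw.sum(axis=1), 1e-300)
        mu = (rw * mz[None, :]).sum(axis=1) / nk
        var = (rw * (mz[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk
        sigma = np.sqrt(np.maximum(var, 1e-6))
        alpha = nk / nk.sum()
    return alpha, mu, sigma

def apply_gaussian_mask(mz, spectra, mu, sigma):
    """Weight every spectrum by each fitted component; returns (n_spectra, K)."""
    mask = gaussian(mz[None, :], mu[:, None], sigma[:, None])
    mask /= mask.sum(axis=1, keepdims=True)
    return spectra @ mask.T

# Synthetic dataset (hypothetical): 10 spectra sharing two peaks.
rng = np.random.default_rng(0)
mz = np.linspace(2000.0, 3000.0, 1001)
base = 5.0 * gaussian(mz, 2300.0, 20.0) + 3.0 * gaussian(mz, 2700.0, 30.0)
scales = rng.uniform(0.8, 1.2, size=(10, 1))
noise = rng.normal(0.0, 1e-6, size=(10, mz.size))  # small additive noise
spectra = np.clip(scales * base[None, :] + noise, 0.0, None)

# Mean spectrum -> GMM decomposition -> Gaussian mask on every spectrum.
mean_spectrum = spectra.mean(axis=0)
alpha, mu, sigma = fit_gmm_to_spectrum(
    mz, mean_spectrum, mu_init=[2250.0, 2750.0], sigma_init=50.0
)
features = apply_gaussian_mask(mz, spectra, mu, sigma)
```

In this sketch `features` holds one row per spectrum and one column per mixture component, which is the kind of table that can be passed on to further statistical analysis.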


Keywords: Biomedical signal processing; Gaussian mixture models; Spectrometry data analyzing; Biomedical data statistics



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Lublin University of Technology, Lublin, Poland
