Efficient Model-Based Clustering for LC-MS Data

  • Marta Łuksza
  • Bogusław Kluge
  • Jerzy Ostrowski
  • Jakub Karczmarski
  • Anna Gambin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)


Proteomic mass spectrometry plays an increasingly important role in diagnostics and in studies of protein complexes and biological systems. High-throughput data processing is therefore becoming more and more significant, and it has to cope with imperfect data, noise, and various errors introduced during experiments.

In this paper we focus on the peak alignment problem. As an alternative to heuristic-based approaches to aligning peaks from different mass spectra, we propose a mathematically sound method that exploits the model-based approach. In this framework, experimental errors are modeled as deviations from real values and mass spectra are regarded as finite Gaussian mixtures. The advantage of such an approach is that it provides convenient techniques for adjusting parameters and selecting solutions of the best quality. The method can be parameterized by assuming various constraints. We investigate different classes of models and select the most suitable one. We analyze the results in terms of statistically significant biomarkers that can be identified after alignment of spectra.
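The idea of treating aligned peaks as components of a finite Gaussian mixture can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic peak positions, the noise levels, the diagonal-covariance constraint (one possible model class), the farthest-first initialisation, and the use of BIC as the selection criterion are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def _log_joint(X, weights, means, variances):
    """log(weight_k * N(x_i | mean_k, diag(variance_k))) for every point i and component k."""
    diff = X[:, None, :] - means[None, :, :]                      # shape (n, k, d)
    log_pdf = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    return log_pdf + np.log(weights)

def _logsumexp_rows(a):
    """Numerically stable log(sum(exp(a))) along axis 1, kept as a column."""
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))

def _farthest_first(X, k):
    """Deterministic initial means: repeatedly take the point farthest from those chosen."""
    chosen = [0]
    for _ in range(k - 1):
        dist = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(np.argmax(dist)))
    return X[chosen].copy()

def fit_diag_gmm(X, k, n_iter=200, min_var=1e-6):
    """Fit a k-component diagonal-covariance Gaussian mixture by EM.

    Returns (bic, labels, means); lower BIC is better in this convention.
    """
    n, d = X.shape
    means = _farthest_first(X, k)
    variances = np.tile(X.var(axis=0) + min_var, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each peak
        log_p = _log_joint(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp_rows(log_p))
        # M-step: re-estimate weights, means and per-axis variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None]
        variances = np.maximum(variances, min_var)                # guard against collapse
    log_p = _log_joint(X, weights, means, variances)
    loglik = float(_logsumexp_rows(log_p).sum())
    n_params = (k - 1) + 2 * k * d                                # weights + means + variances
    bic = -2.0 * loglik + n_params * np.log(n)
    return bic, log_p.argmax(axis=1), means

# Synthetic example: three "true" peaks (m/z, retention time), each observed
# in 20 runs with small Gaussian measurement errors. All values are made up.
rng = np.random.default_rng(0)
true_peaks = np.array([[500.0, 10.0], [500.0, 14.0], [503.0, 10.0]])
noise_sd = np.array([0.05, 0.2])
X = np.vstack([c + rng.normal(0.0, noise_sd, size=(20, 2)) for c in true_peaks])

# Compare candidate numbers of components by BIC and keep the lowest.
bics = {k: fit_diag_gmm(X, k)[0] for k in (1, 2, 3, 4)}
best_k = min(bics, key=bics.get)
bic3, labels, means = fit_diag_gmm(X, 3)
```

Here `labels` assigns each observed peak to a mixture component, which plays the role of an aligned peak group, and `bics` lets candidate models be compared on a common scale. Other covariance constraints (spherical, shared, or full) would correspond to other model classes and change only the M-step and the parameter count.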


Keywords: Feature Selection; False Discovery Rate; Liquid Chromatography Mass Spectrometry; Monoisotopic Peak; DBSCAN Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marta Łuksza (1)
  • Bogusław Kluge (1)
  • Jerzy Ostrowski (2)
  • Jakub Karczmarski (2)
  • Anna Gambin (1)
  1. Institute of Informatics, Warsaw University, Warsaw, Poland
  2. Department of Gastroenterology, Medical Center for Postgraduate Education and Maria Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw, Poland
