Knowledge and Information Systems, Volume 25, Issue 2, pp 327–343

Evolving trees for the retrieval of mass spectrometry-based bacteria fingerprints

  • Stephan Simmuteit
  • Frank-Michael Schleif (corresponding author)
  • Thomas Villmann
  • Barbara Hammer
Regular Paper


In this paper, we investigate the application of Evolving Trees (ET) to the analysis of mass spectrometric data of bacteria. Evolving Trees are extensions of self-organizing maps (SOMs) developed for hierarchical classification systems and are therefore well suited for taxonomic problems such as the identification of bacteria. We focus on three topics: an appropriate pre-processing and encoding of the spectra, an adequate data model by means of a hierarchical Evolving Tree, and an interpretable visualization. First, the high dimensionality of the data is reduced by a compact representation; here, we employ sparse coding specifically tailored to the processing of mass spectra. Second, the topographic information expected in the fingerprints is used for advanced tree evaluation and analysis. We adapt the original topographic product for SOMs to ETs to judge how well topography is preserved. Additionally, we transfer the concept of the U-matrix, used to evaluate the separability of SOMs, to its analog in ETs. We demonstrate these extensions on two mass spectrometric data sets of bacteria fingerprints and show their classification and evaluation capabilities in comparison to state-of-the-art techniques.


Evolving tree, Sparse coding, Mass spectrometry, Bacteria identification, Prototype learning
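
The abstract does not spell out the Evolving Tree algorithm, so the following is only a minimal sketch of the general idea: prototypes arranged in a tree, a top-down search for the best-matching leaf, and leaves that split after enough hits. It is not the authors' implementation. The names `Node`, `find_bmu`, `train`, and `leaves`, as well as all parameter values, are hypothetical. The sketch also makes two simplifying assumptions: only the winning leaf is updated (no neighborhood function over tree distance), and new children are initialized as slightly perturbed copies of the parent prototype.

```python
import numpy as np


class Node:
    """Tree node holding one prototype vector (hypothetical helper class)."""

    def __init__(self, weight):
        self.weight = np.asarray(weight, dtype=float).copy()
        self.children = []
        self.hits = 0  # how often this node won while it was a leaf

    def is_leaf(self):
        return not self.children


def find_bmu(root, x):
    """Descend from the root, following the closest child, until a leaf is reached."""
    node = root
    while not node.is_leaf():
        node = min(node.children, key=lambda c: np.linalg.norm(c.weight - x))
    return node


def train(root, data, split_threshold=20, n_children=2, lr=0.1, epochs=5, seed=0):
    """Simplified training loop: winner-only update, split on hit count."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for x in rng.permutation(data):  # shuffle the samples each epoch
            leaf = find_bmu(root, x)
            leaf.weight += lr * (x - leaf.weight)  # move prototype toward sample
            leaf.hits += 1
            if leaf.hits >= split_threshold:
                # Assumption: children start as perturbed copies of the parent.
                leaf.children = [
                    Node(leaf.weight + 1e-3 * rng.standard_normal(x.shape))
                    for _ in range(n_children)
                ]
    return root


def leaves(node):
    """Collect all leaf nodes below (and including) the given node."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

As a usage example, one could start from a single root placed at the data mean, `root = train(Node(data.mean(axis=0)), data)`, where `data` is a two-dimensional array with one encoded spectrum per row, and then retrieve the best-matching leaf for a query spectrum with `find_bmu(root, query)`. Because the search follows one path from root to leaf, retrieval cost grows with tree depth rather than with the total number of prototypes, which is the property that makes tree-structured maps attractive for database retrieval.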





Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  • Stephan Simmuteit (1)
  • Frank-Michael Schleif (1, corresponding author)
  • Thomas Villmann (2)
  • Barbara Hammer (3)

  1. Medical Department, Computational Intelligence Group, University Leipzig, Leipzig, Germany
  2. Department of MPI, University of Applied Science Mittweida, Mittweida, Germany
  3. Department of Computer Science, Computational Intelligence Group, Clausthal University, Clausthal-Zellerfeld, Germany
