Evolving trees for the retrieval of mass spectrometry-based bacteria fingerprints

  • Regular Paper
Knowledge and Information Systems

Abstract

In this paper, we investigate the application of Evolving Trees (ET) to the analysis of mass spectrometric data of bacteria. Evolving Trees are extensions of self-organizing maps (SOMs) developed for hierarchical classification systems, which makes them well suited for taxonomic problems such as the identification of bacteria. We focus on three topics: an appropriate pre-processing and encoding of the spectra, an adequate data model by means of a hierarchical Evolving Tree, and an interpretable visualization. First, the high dimensionality of the data is reduced by a compact representation, for which we employ sparse coding specifically tailored to the processing of mass spectra. Second, the topographic information expected in the fingerprints is used for advanced tree evaluation and analysis. We adapt the original topographic product for SOMs to ETs in order to assess how well topography is preserved. Additionally, we transfer the U-matrix concept, used to evaluate the separability of SOMs, to its analog in ETs. We demonstrate these extensions on two mass spectrometric data sets of bacteria fingerprints and show their classification and evaluation capabilities in comparison to state-of-the-art techniques.
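
For orientation, the U-matrix idea mentioned in the abstract can be illustrated on an ordinary rectangular SOM. The following is a minimal sketch, not the tree-based variant developed in the paper: the function name `u_matrix`, the 4-connected grid neighborhood, the Euclidean distance, and the toy prototypes are all assumptions made for illustration.

```python
import math


def u_matrix(weights):
    """Return the U-matrix of a rectangular SOM (illustrative sketch).

    weights[r][c] is the prototype vector (a list of floats) at grid
    position (r, c). Each U-matrix entry is the mean Euclidean distance
    between a prototype and the prototypes of its 4-connected grid
    neighbors. Large values hint at a boundary between clusters.
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1x1 map has no neighbors; report 0.0 in that case.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


# Hypothetical 2x3 map: the two left columns are close in data space,
# the right column lies far away from them.
toy = [
    [[0.0, 0.0], [0.0, 1.0], [0.0, 10.0]],
    [[1.0, 0.0], [1.0, 1.0], [1.0, 10.0]],
]
um = u_matrix(toy)
```

In this toy map the entries in the right-hand columns are larger than those in the left column, which is the kind of separation cue a U-matrix is meant to expose. How the paper carries this over to tree nodes is not specified in the abstract and is not reproduced here.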

Author information

Corresponding author

Correspondence to Frank-Michael Schleif.

About this article

Cite this article

Simmuteit, S., Schleif, FM., Villmann, T. et al. Evolving trees for the retrieval of mass spectrometry-based bacteria fingerprints. Knowl Inf Syst 25, 327–343 (2010). https://doi.org/10.1007/s10115-009-0249-4

  • DOI: https://doi.org/10.1007/s10115-009-0249-4
