Abstract
In this paper, we investigate the application of Evolving Trees (ETs) to the analysis of mass spectrometric data of bacteria. Evolving Trees are extensions of self-organizing maps (SOMs) developed for hierarchical classification systems. Therefore, they are well suited for taxonomic problems such as the identification of bacteria. Here, we focus on three topics: an appropriate pre-processing and encoding of the spectra, an adequate data model in the form of a hierarchical Evolving Tree, and an interpretable visualization. First, the high dimensionality of the data is reduced by a compact representation. For this, we employ sparse coding, specifically tailored to the processing of mass spectra. In the second step, the topographic information that is expected in the fingerprints is used for advanced tree evaluation and analysis. We adapted the topographic product, originally defined for SOMs, to ETs in order to assess how well topography is preserved. Additionally, we transferred the concept of the U-matrix, used to evaluate the separability of SOMs, to its analog for ETs. We demonstrate these extensions on two mass spectrometric data sets of bacteria fingerprints and show their classification and evaluation capabilities in comparison to state-of-the-art techniques.
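To make the hierarchical idea concrete, the following is a minimal sketch of an Evolving Tree in Python with NumPy. It is not the implementation used in the paper. It assumes the commonly described scheme in which only leaf nodes act as prototypes, the best-matching unit (BMU) is found by a greedy top-down search, leaves are updated with a Gaussian neighborhood over the tree distance, and a leaf is split once it has been the BMU often enough. All names (`Node`, `find_bmu`, `tree_distance`, `train_step`) and all parameter values (`lr`, `sigma`, `split_threshold`, `n_children`) are hypothetical choices made for illustration.

```python
import numpy as np


class Node:
    """One tree node; only leaves act as prototypes for the data."""

    def __init__(self, weight, parent=None):
        self.weight = np.asarray(weight, dtype=float)
        self.parent = parent
        self.children = []
        self.hits = 0  # how often this node was the BMU

    def is_leaf(self):
        return not self.children


def find_bmu(root, x):
    """Greedy top-down search: follow the closest child until a leaf is reached."""
    node = root
    while not node.is_leaf():
        node = min(node.children, key=lambda c: np.linalg.norm(x - c.weight))
    return node


def tree_distance(a, b):
    """Number of edges on the path between two nodes of the same tree."""
    depth = {}
    d, n = 0, a
    while n is not None:          # record a and all of its ancestors
        depth[id(n)] = d
        n, d = n.parent, d + 1
    d, n = 0, b
    while id(n) not in depth:     # climb from b to the first common ancestor
        n, d = n.parent, d + 1
    return d + depth[id(n)]


def leaves(root):
    """Collect all leaf nodes below (and including) root."""
    stack, out = [root], []
    while stack:
        n = stack.pop()
        if n.is_leaf():
            out.append(n)
        else:
            stack.extend(n.children)
    return out


def train_step(root, x, lr=0.1, sigma=1.0, split_threshold=20, n_children=2):
    """One online update (assumed, simplified learning rule)."""
    x = np.asarray(x, dtype=float)
    bmu = find_bmu(root, x)
    for leaf in leaves(root):
        # Gaussian neighborhood measured along the tree, not in data space
        h = np.exp(-tree_distance(bmu, leaf) ** 2 / (2.0 * sigma ** 2))
        leaf.weight += lr * h * (x - leaf.weight)
    bmu.hits += 1
    if bmu.hits >= split_threshold:
        # Assumption: children start as copies of the parent's weight vector
        bmu.children = [Node(bmu.weight.copy(), parent=bmu) for _ in range(n_children)]
    return bmu
```

A tree would be grown by creating a single root, for example `Node(data.mean(axis=0))`, and calling `train_step(root, x)` for each encoded spectrum `x`. Because the BMU search only compares siblings at each level, retrieval cost grows with tree depth rather than with the total number of prototypes, which is what makes the structure attractive for taxonomic data.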
Cite this article
Simmuteit, S., Schleif, FM., Villmann, T. et al. Evolving trees for the retrieval of mass spectrometry-based bacteria fingerprints. Knowl Inf Syst 25, 327–343 (2010). https://doi.org/10.1007/s10115-009-0249-4
DOI: https://doi.org/10.1007/s10115-009-0249-4