Abstract
There is a growing need to exploit microbial wealth, of which 95% or more is still unexplored. The taxonomic placement of a new isolate based on phenotypic characteristics is now supported by information preserved in the 16S rRNA gene. However, the analysis of 16S rDNA sequences retrieved from metagenomes with the available bioinformatics tools is subject to limitations. In this study, the occurrences of nucleotide features in 16S rDNA sequences were used to ascertain the taxonomic placement of organisms. Tetra- and penta-nucleotide features were extracted from the training data set of 16S rDNA sequences and were subjected to an artificial neural network (ANN) based tool known as the self-organizing map (SOM), which helped in visualizing the unsupervised classification. To select significant features, principal component analysis (PCA) or curvilinear component analysis (CCA) was applied. The SOM, combined with these techniques, could discriminate the sample sequences with more than 90% accuracy, highlighting the relevance of the features. To ascertain the confidence level of the developed classification approach, the test data set was specifically evaluated for Thiobacillus together with Acidiphilium, Paracoccus and Starkeya, genera to which some of its members have been taxonomically reassigned. The evaluation demonstrated the excellent generalization capability of the developed tool. The topology of genera in the SOM supported the conventional chemo-biochemical classification reported in Bergey's Manual.
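The feature-extraction step described above can be illustrated with a minimal sketch. It assumes that a "nucleotide feature" is an overlapping k-mer (k = 4 or 5) and that counts are normalized to relative frequencies; the abstract specifies neither detail, so both are assumptions, as are the function name `kmer_features` and the example fragment.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=4):
    """Return relative frequencies of all 4**k DNA k-mers, in a fixed order.

    Assumption: overlapping windows, normalized by the number of valid
    windows. The original study may have counted or scaled differently.
    """
    seq = sequence.upper().replace("U", "T")
    alphabet = set("ACGT")
    # Fixed lexicographic ordering so every sequence maps to the same layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count overlapping windows; windows with ambiguous bases (e.g. N) are skipped.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical fragment, for illustration only (not a curated 16S rDNA record).
example = "AGAGTTTGATCCTGGCTCAGNATTGAACGCTGGCGGC"
tetra = kmer_features(example, k=4)   # 256 tetra-nucleotide features
penta = kmer_features(example, k=5)   # 1024 penta-nucleotide features
features = tetra + penta              # combined 1280-dimensional vector
```

A vector of this kind, one per sequence, is the sort of input that could then be reduced with PCA or CCA and presented to a SOM; those later steps are not sketched here.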
Abbreviations
- ANN: artificial neural network
- CCA: curvilinear component analysis
- ID: intrinsic dimensionality
- MDS: multidimensional scaling
- PCA: principal component analysis
- PE: processing element
- SOM: self-organizing map
- VQ: vector quantization
Cite this article
Raje, D.V., Purohit, H.J., Badhe, Y.P. et al. Self-organizing maps: A tool to ascertain taxonomic relatedness based on features derived from 16S rDNA sequence. J Biosci 35, 617–627 (2010). https://doi.org/10.1007/s12038-010-0070-y
DOI: https://doi.org/10.1007/s12038-010-0070-y