
Self-organizing maps: A tool to ascertain taxonomic relatedness based on features derived from 16S rDNA sequence

Journal of Biosciences

Abstract

Exploiting microbial wealth, an estimated 95% or more of which is still unexplored, is a growing need. The taxonomic placement of a new isolate based on phenotypic characteristics is now being supported by information preserved in the 16S rRNA gene. However, analysis of 16S rDNA sequences retrieved from metagenomes with the available bioinformatics tools is subject to limitations. In this study, the occurrences of nucleotide features in 16S rDNA sequences have been used to ascertain the taxonomic placement of organisms. Tetra- and penta-nucleotide features were extracted from the 16S rDNA sequences of the training data set and were subjected to an artificial neural network (ANN)-based tool known as the self-organizing map (SOM), which helped visualize the unsupervised classification. Principal component analysis (PCA) or curvilinear component analysis (CCA) was applied to select significant features. Combined with these techniques, the SOM discriminated the sample sequences with more than 90% accuracy, highlighting the relevance of the selected features. To ascertain the confidence level of the developed classification approach, the test data set was evaluated specifically for Thiobacillus together with Acidiphilium, Paracoccus and Starkeya, genera to which several former Thiobacillus species have been taxonomically reassigned. The evaluation demonstrated the excellent generalization capability of the developed tool. The topology of genera in the SOM supported the conventional chemo-biochemical classification reported in Bergey's Manual.
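The two core steps named in the abstract, turning each 16S rDNA sequence into a vector of tetra- or penta-nucleotide frequencies and mapping those vectors onto a self-organizing map, can be illustrated with a minimal sketch in plain Python (standard library only). This is not the authors' implementation: the function names (`kmer_profile`, `best_matching_unit`, `train_som`), the 3 x 3 grid, the linear decay of the learning rate and neighborhood radius, and the toy AT-rich and GC-rich sequences are all hypothetical choices made for illustration, and the PCA/CCA feature-selection step is omitted.

```python
# Illustrative sketch only: names, grid size and training schedule are
# hypothetical and are not taken from the paper.
import math
import random
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k overlapping k-mers (k=4: tetra-, k=5: penta-nucleotides).

    U is read as T so rRNA and rDNA strings give the same profile; any window
    containing a character other than A, C, G or T (e.g. N) is skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper().replace("U", "T")
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Grid coordinates (row, col) of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda unit: sq_dist(weights[unit], x))


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, seed=0):
    """Online (sample-by-sample) Kohonen SOM with a Gaussian neighborhood on a rectangular grid.

    Learning rate and neighborhood radius both decay linearly, with floors of 0.01 and 0.5.
    """
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialize each unit's weight vector with a copy of a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = max(0.01, lr0 * (1 - frac))
            sigma = max(0.5, sigma0 * (1 - frac))
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Neighborhood is measured on the map grid, not in feature space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


if __name__ == "__main__":
    # Toy stand-ins for two taxa: AT-rich vs GC-rich random sequences (not real 16S data).
    rng = random.Random(1)
    at_rich = ["".join(rng.choices("ACGT", weights=[45, 5, 5, 45], k=300)) for _ in range(10)]
    gc_rich = ["".join(rng.choices("ACGT", weights=[5, 45, 45, 5], k=300)) for _ in range(10)]
    profiles = [kmer_profile(s) for s in at_rich + gc_rich]
    som = train_som(profiles)
    for label, p in zip(["AT"] * 10 + ["GC"] * 10, profiles):
        print(label, best_matching_unit(som, p))
```

With real data, the profiles would instead come from 16S rDNA sequences of known genera (k=4 gives 256 tetranucleotide features, k=5 gives 1024 pentanucleotide features), and one would inspect which map units each genus occupies; the toy demo is only expected to place the two composition classes on different units.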


Abbreviations

ANN: artificial neural network

CCA: curvilinear component analysis

ID: intrinsic dimensionality

MDS: multidimensional scaling

PCA: principal component analysis

PE: processing element

SOM: self-organizing map

VQ: vector quantization

References

  • Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T and Ikemura T 2003 Informatics for unveiling hidden genome signatures; Genome Res. 13 693–702

  • Amann R, Ludwig W and Schleifer K 1995 Phylogenetic identification and in situ detection of individual microbial cells without cultivation; Microbiol. Rev. 59 143–169

  • Buchala S, Davey N, Frank R J, Gale T M, Loomes M J and Kanargard W 2004a Gender classification of face images: The role of global and feature-based information; International Conference on Neural Information Processing (ICONIP) pp 763–768

  • Buchala S, Davey N, Frank R J and Gale T M 2004b Dimensionality reduction of face images for gender classification; Proceedings of the 2nd International IEEE Conference on Intelligent Systems, 2004, Vol. 1 pp 88–93

  • Demartines P and Herault J 1997 Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets; IEEE Trans. Neural Networks 8 148–154

  • Durbin R, Eddy S, Krogh A and Mitchison G 1998 Biological sequence analysis (Cambridge: Cambridge University Press)

  • Garrity G M, Winters M and Searles D B 2001 Taxonomic outline of the prokaryotic genera: Bergey's Manual of Systematic Bacteriology, second edition (New York: Springer-Verlag)

  • Hugenholtz P, Goebel B M and Pace N R 1998 Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity; J. Bacteriol. 180 4765–4774

  • Kasturi J, Acharya R and Ramanathan M 2003 An information theoretic approach for analyzing temporal patterns of gene expression; Bioinformatics 19 449–458

  • Katayama Y, Hiraishi A and Kuraishi H 1995 Paracoccus thiocyanatus sp. nov., a new species of thiocyanate-utilizing facultative chemolithotroph, and transfer of Thiobacillus versutus to the genus Paracoccus as Paracoccus versutus comb. nov. with emendation of the genus; Microbiology 141 1469–1477

  • Kohonen T 1990 The self-organizing map; Proc. IEEE 78 1464–1480

  • Kruskal J B 1964 Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis; Psychometrika 29 1–27

  • Liskiewicz M, Purohit H J and Raje D V 2004 Relation of residues in the variable regions of 16S rDNA and their relevance to genus specificity; Lect. Notes Comp. Sci. 3240 362–373

  • Maidak B L, Larsen N, McCaughey M J, Overbeek R, Olsen G J, Fogel K, Blandy J and Woese C R 1994 The Ribosomal Database Project; Nucleic Acids Res. 22 3485–3487

  • Purohit H J, Raje D V and Kapley A 2003 Identification of signature and primers specific to genus Pseudomonas using mismatched patterns of 16S rDNA sequences; BMC Bioinformatics 4 19

  • Raje D V, Purohit H J and Singh R S 2002 Distinguishing features of 16S rDNA gene for five dominating bacterial genus observed in bioremediation; J. Comput. Biol. 9 819–829

  • Roweis S and Saul L 2000 Nonlinear dimensionality reduction by locally linear embedding; Science 290 2323–2326

  • Sammon J W 1969 A nonlinear mapping algorithm for data structure analysis; IEEE Trans. Comput. C-18 401–409

  • Schneider G 1999 How many potentially secreted proteins are contained in bacterial genome; Gene 237 113–121

  • Tenenbaum J, de Silva V and Langford J 2000 A global geometric framework for nonlinear dimensionality reduction; Science 290 2319–2323

  • Torsvik V, Ovreas L and Thingstad T F 2002 Prokaryotic diversity: magnitude, dynamics, and controlling factors; Science 296 1064–1066

  • Vesanto J, Alhoniemi E, Himberg J, Kiviluoto K and Parviainen J 1999 Self-organizing map for data mining in Matlab: the SOM Toolbox; Simulation News Europe 25 54; http://www.cis.hut.fi/project/somtoolbox

  • Wang H C, Badger J, Kearney P and Li M 2001 Analysis of codon usage patterns of bacterial genomes using the self-organizing map; Mol. Biol. Evol. 18 792–800

  • Ward D M, Weller R and Bateson M M 1990 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community; Nature (London) 345 63–65

  • Woese C R 1987 Bacterial evolution; Microbiol. Rev. 51 221–271

  • Wold S, Esbensen K and Geladi P 1987 Principal component analysis; Chemometr. Intell. Lab. Syst. 2 37–52


Author information


Correspondence to H. J. Purohit.


About this article

Cite this article

Raje, D.V., Purohit, H.J., Badhe, Y.P. et al. Self-organizing maps: A tool to ascertain taxonomic relatedness based on features derived from 16S rDNA sequence. J Biosci 35, 617–627 (2010). https://doi.org/10.1007/s12038-010-0070-y



  • DOI: https://doi.org/10.1007/s12038-010-0070-y
