A new type of unsupervised growing neural network for biological sequence classification that adopts the topology of a phylogenetic tree

  • Methodology for Data Analysis, Task Selection and Nets Design
  • Conference paper
Biological and Artificial Computation: From Neuroscience to Technology (IWANN 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1240))

Abstract

We propose a new type of unsupervised, growing, self-organizing neural network that expands itself following the taxonomic relationships among the sequences being classified. The binary tree topology of this network, in contrast to more classical neural network topologies, permits efficient classification of sequences. Because the network grows incrementally, the procedure can be stopped at the desired taxonomic level without waiting for a complete phylogenetic tree to be produced. This novel approach has a number of other interesting properties, such as a convergence time that is approximately a linear function of the number of sequences. Computer simulation and a real example show that the algorithm accurately finds the phylogenetic tree that relates the data. All this makes the neural network presented here an excellent tool for the phylogenetic analysis of large numbers of sequences.
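
The abstract does not give the network's update rules, so the following is only a toy sketch of the general idea: a binary tree of cells that grows by splitting its most heterogeneous leaf and can be halted at a chosen number of leaves. It assumes aligned, equal-length DNA sequences, one-hot encoding, and a 2-means split, which makes it closer to bisecting k-means than to the authors' actual algorithm. All names (`Node`, `grow`, `split`, `heterogeneity`, `max_leaves`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ALPHABET = "ACGT"  # assumption: aligned DNA sequences of equal length


def one_hot(seq: str) -> List[float]:
    """Encode a sequence as a flat one-hot vector (4 entries per position)."""
    return [1.0 if base == a else 0.0 for base in seq for a in ALPHABET]


def sq_dist(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors: List[List[float]]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


@dataclass
class Node:
    weight: List[float]                               # cell's reference vector
    members: List[int] = field(default_factory=list)  # indices of its sequences
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def leaves(node: Node) -> List[Node]:
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)


def heterogeneity(node: Node, data: List[List[float]]) -> float:
    """Mean squared distance of a leaf's members to its weight vector."""
    return sum(sq_dist(data[i], node.weight) for i in node.members) / len(node.members)


def split(node: Node, data: List[List[float]], iters: int = 10) -> None:
    """Turn a leaf with heterogeneity > 0 into an internal node with two leaves."""
    # Seed the children with two mutually distant members of the leaf.
    seed_b = max(node.members, key=lambda i: sq_dist(data[i], data[node.members[0]]))
    seed_a = max(node.members, key=lambda i: sq_dist(data[i], data[seed_b]))
    wa, wb = data[seed_a], data[seed_b]
    for _ in range(iters):
        # Assign each member to the nearer child, then move children to the means.
        ga = [i for i in node.members if sq_dist(data[i], wa) <= sq_dist(data[i], wb)]
        gb = [i for i in node.members if sq_dist(data[i], wa) > sq_dist(data[i], wb)]
        wa, wb = mean([data[i] for i in ga]), mean([data[i] for i in gb])
    node.left, node.right = Node(wa, ga), Node(wb, gb)


def grow(seqs: List[str], max_leaves: int) -> Node:
    """Grow the tree until max_leaves is reached or every leaf is homogeneous."""
    data = [one_hot(s) for s in seqs]
    root = Node(mean(data), list(range(len(data))))
    while len(leaves(root)) < max_leaves:
        worst = max(leaves(root), key=lambda n: heterogeneity(n, data))
        if heterogeneity(worst, data) == 0.0:
            break  # nothing left to resolve
        split(worst, data)
    return root


if __name__ == "__main__":
    seqs = ["AAAAAA", "AAAAAC", "AAAACA", "GGGGGG", "GGGGGT", "GGGGTG"]
    tree = grow(seqs, max_leaves=2)  # stop at a coarse "taxonomic level"
    for leaf in leaves(tree):
        print([seqs[i] for i in leaf.members])
```

Raising `max_leaves` resolves finer groupings without restarting from scratch, which is the early-stopping property the abstract highlights. The sketch makes no attempt to reproduce the approximately linear convergence time claimed for the original method.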

Editor information

José Mira, Roberto Moreno-Díaz, Joan Cabestany

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dopazo, J., Wang, H., Carazo, J.M. (1997). A new type of unsupervised growing neural network for biological sequence classification that adopts the topology of a phylogenetic tree. In: Mira, J., Moreno-Díaz, R., Cabestany, J. (eds) Biological and Artificial Computation: From Neuroscience to Technology. IWANN 1997. Lecture Notes in Computer Science, vol 1240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032553

  • DOI: https://doi.org/10.1007/BFb0032553

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63047-0

  • Online ISBN: 978-3-540-69074-0

  • eBook Packages: Springer Book Archive
