A new type of unsupervised growing neural network for biological sequence classification that adopts the topology of a phylogenetic tree

  • Joaquín Dopazo
  • Huaichun Wang
  • José María Carazo
Methodology for Data Analysis, Task Selection and Nets Design
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1240)


We propose a new type of unsupervised, growing, self-organizing neural network that expands itself following the taxonomic relationships among the sequences being classified. The binary tree topology of this network, in contrast to more classical neural network topologies, permits efficient classification of sequences. Because the network grows incrementally, the procedure can be stopped at the desired taxonomic level without waiting for a complete phylogenetic tree to be produced. This novel approach has a number of other interesting properties, such as a convergence time that is approximately a linear function of the number of sequences. Computer simulation and a real example show that the algorithm accurately finds the phylogenetic tree that relates the data. All this makes the neural network presented here an excellent tool for the phylogenetic analysis of large numbers of sequences.
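The abstract does not give the update rules, so the following is only a minimal sketch of the general idea of a growing, binary-tree-shaped self-organizing network, not the authors' algorithm. The one-hot encoding, the winner-only update, the splitting criterion, and every name (`encode`, `Node`, `grow_tree`, `assign`, `eta`, `epochs`) are assumptions made for illustration. The sketch also makes no attempt to reproduce the approximately linear convergence time reported above.

```python
from dataclasses import dataclass
from typing import List, Optional

ALPHABET = "ACGT"  # assumed alphabet for aligned DNA sequences


def encode(seq: str) -> List[float]:
    """One-hot encode a sequence (hypothetical encoding, 4 values per site)."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in ALPHABET]


def dist(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass(eq=False)  # compare nodes by identity, not by weights
class Node:
    weights: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def leaves(self) -> List["Node"]:
        """Return the terminal nodes (cells) below this node, left to right."""
        if self.left is None or self.right is None:
            return [self]
        return self.left.leaves() + self.right.leaves()


def winner(leaves: List[Node], x: List[float]) -> Node:
    """Pick the cell whose weight vector is closest to x."""
    return min(leaves, key=lambda n: dist(n.weights, x))


def grow_tree(seqs: List[str], n_leaves: int, epochs: int = 20, eta: float = 0.2) -> Node:
    """Grow a binary tree until it has at least n_leaves cells (assumed stop rule)."""
    data = [encode(s) for s in seqs]
    mean = [sum(col) / len(data) for col in zip(*data)]
    # Start from a root with two identical cells placed at the data mean.
    root = Node(mean, Node(list(mean)), Node(list(mean)))
    while True:
        leaves = root.leaves()
        for _ in range(epochs):
            for x in data:
                cell = winner(leaves, x)
                # Move only the winning cell toward the input (a simplification).
                cell.weights = [w + eta * (xi - w) for w, xi in zip(cell.weights, x)]
        if len(leaves) >= n_leaves:
            return root  # stop early, at the requested level of resolution
        # Split the cell whose assigned sequences lie farthest from it in total.
        spread = [0.0] * len(leaves)
        for x in data:
            cell = winner(leaves, x)
            spread[leaves.index(cell)] += dist(cell.weights, x)
        worst = leaves[spread.index(max(spread))]
        worst.left = Node(list(worst.weights))
        worst.right = Node(list(worst.weights))


def assign(root: Node, seqs: List[str]) -> List[int]:
    """Map each sequence to the index of its closest cell."""
    leaves = root.leaves()
    return [leaves.index(winner(leaves, encode(s))) for s in seqs]


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "GGGG", "GGGC"]  # made-up sequences
    print(assign(grow_tree(toy, n_leaves=2), toy))
```

Calling `grow_tree` with a small `n_leaves` mirrors the idea of stopping at a coarse taxonomic level: on the made-up toy input, two cells are enough to separate the A-rich sequences from the G-rich ones, and asking for more cells refines the tree by splitting one existing cell at a time.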


Neural Network, Binary Tree, Sequence Vector, Nucleic Acid Research, Ancestor Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Joaquín Dopazo (1)
  • Huaichun Wang (2)
  • José María Carazo (2)
  1. Glaxo Wellcome S.A., Tres Cantos (Madrid), Spain
  2. Centro Nacional de Biotecnología, CSIC, Universidad Autónoma, Cantoblanco, Madrid, Spain
