A new type of unsupervised growing neural network for biological sequence classification that adopts the topology of a phylogenetic tree

Dopazo, Joaquín; Wang, Huaichun; Carazo, José María

doi:10.1007/BFb0032553

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1240)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

We propose a new type of unsupervised, growing, self-organizing neural network that expands itself following the taxonomic relationships among the sequences being classified. In contrast to more classical neural network topologies, the binary tree topology of this network permits an efficient classification of sequences. Because the procedure grows incrementally, it can be stopped at the desired taxonomic level without waiting until a complete phylogenetic tree has been produced. This novel approach has a number of other interesting properties, such as a convergence time that is approximately a linear function of the number of sequences. A computer simulation and a real example show that the algorithm accurately recovers the phylogenetic tree that relates the data. All of this makes the neural network presented here an excellent tool for the phylogenetic analysis of large numbers of sequences.
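
The growing procedure summarized above can be illustrated with a short program. The following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes aligned DNA sequences of equal length encoded as one-hot vectors, Euclidean distance, a neighborhood made of the winning terminal node, its parent and its sister (when the sister is terminal), illustrative learning rates (`eta_w`, `eta_a`, `eta_s`), and a stopping rule based on a mean-distance "heterogeneity" threshold. All names (`grow_tree`, `train`, `split`, `heterogeneity`, `threshold`, `max_leaves`) are invented for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

ALPHABET = "ACGT"  # assumption: aligned DNA sequences of equal length


def one_hot(seq: str) -> List[float]:
    """Flatten an aligned sequence into per-position base indicators."""
    return [1.0 if ch == base else 0.0 for ch in seq for base in ALPHABET]


def dist(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Node:
    vector: List[float]
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    members: List[int] = field(default_factory=list)  # sequences assigned after training


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def split(node: Node) -> None:
    """Grow the tree: a terminal node gets two daughters that inherit its vector."""
    node.children = [Node(list(node.vector), parent=node) for _ in range(2)]


def move(node: Node, x: List[float], rate: float) -> None:
    node.vector = [v + rate * (xi - v) for v, xi in zip(node.vector, x)]


def train(root, data, eta_w=0.1, eta_a=0.05, eta_s=0.01, epochs=20):
    """Only terminal nodes compete; the winner, its parent and a terminal sister adapt."""
    for _ in range(epochs):
        for x in data:
            winner = min(leaves(root), key=lambda n: dist(n.vector, x))
            move(winner, x, eta_w)
            if winner.parent is not None:
                move(winner.parent, x, eta_a)
                sister = next(c for c in winner.parent.children if c is not winner)
                if not sister.children:
                    move(sister, x, eta_s)
    # After training, assign every sequence to its closest terminal node.
    terminal = leaves(root)
    for leaf in terminal:
        leaf.members = []
    for i, x in enumerate(data):
        min(terminal, key=lambda n: dist(n.vector, x)).members.append(i)


def heterogeneity(node, data):
    """Mean distance from a terminal node to the sequences assigned to it."""
    if not node.members:
        return 0.0
    return sum(dist(node.vector, data[i]) for i in node.members) / len(node.members)


def grow_tree(seqs, threshold=1.5, max_leaves=16):
    data = [one_hot(s) for s in seqs]
    root = Node([sum(col) / len(data) for col in zip(*data)])  # start at the centroid
    split(root)
    while True:
        train(root, data)
        terminal = leaves(root)
        worst = max(terminal, key=lambda n: heterogeneity(n, data))
        # Stop when the most heterogeneous terminal node is below threshold (the desired
        # taxonomic level), when the size cap is reached, or when it has too few members.
        if (heterogeneity(worst, data) <= threshold
                or len(terminal) >= max_leaves or len(worst.members) < 2):
            return root, data
        split(worst)  # otherwise expand only the most heterogeneous branch


if __name__ == "__main__":
    seqs = ["AAAAAAAA", "AAAAAAAC", "AAAAAACA", "GGGGGGGG", "GGGGGGGT", "GGGGGGTG"]
    root, data = grow_tree(seqs)
    print([leaf.members for leaf in leaves(root)])
```

With two clearly separated groups such as the example sequences, the first growth cycle should already place the A-rich and G-rich sequences in different terminal nodes; lowering `threshold` lets the tree keep splitting toward finer taxonomic levels. In this sketch each training epoch computes one distance per sequence per terminal node, so for a fixed tree size the cost of an epoch grows linearly with the number of sequences, which is in line with (though not a proof of) the scaling reported in the abstract.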

Author information

Huaichun Wang
Present address: Institute of Medical Information, 27 Taiping Road, Beijing, China

Authors and Affiliations

Glaxo Wellcome s.a., c/Severo Ochoa 2 (P.T.M.), 28760, Tres Cantos (Madrid), Spain
Joaquín Dopazo
Centro Nacional de Biotecnología, CSIC, Universidad Autónoma, 28049, Cantoblanco, Madrid, Spain
Huaichun Wang & José María Carazo

Editor information

José Mira, Roberto Moreno-Díaz, Joan Cabestany

About this paper

Cite this paper

Dopazo, J., Wang, H., Carazo, J.M. (1997). A new type of unsupervised growing neural network for biological sequence classification that adopts the topology of a phylogenetic tree. In: Mira, J., Moreno-Díaz, R., Cabestany, J. (eds) Biological and Artificial Computation: From Neuroscience to Technology. IWANN 1997. Lecture Notes in Computer Science, vol 1240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032553

Published: 18 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63047-0
Online ISBN: 978-3-540-69074-0
eBook Packages: Springer Book Archive
