Neural Networks for Molecular Sequence Classification

  • Cathy H. Wu

Abstract

Nucleic acid and protein sequences contain a wealth of information of interest to molecular biologists, since the genome forms the blueprint of the cell. Currently, a database search for sequence similarities is the most direct computational approach to deciphering the codes that connect molecular sequences with protein structure and function (Doolittle, 1990). If an unknown protein is related to one of known structure or function, inferences based on that known structure or function and on the degree of the relationship can provide the most reliable clues to the nature of the unknown protein. This technique has proved successful and has led to new understanding in a wide variety of biological studies (Boswell and Lesk, 1988).

Keywords

Neural Network; Input Vector; Training Pattern; Sequence Entry; Classification Neural Network

References

  1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990): Basic local alignment search tool. J Mol Biol 215:403–410
  2. Asoh H, Otsu N (1990): An approximation of nonlinear discriminant analysis by multilayer neural networks. Proc Int Joint Conf on Neural Networks (June) III:211–216
  3. Bairoch A (1992): PROSITE: A dictionary of sites and patterns in proteins. Nucl Acids Res (Suppl) 20:2013–2018
  4. Barker WC, George DG, Mewes H-W, Tsugita A (1992): The PIR-International protein sequence database. Nucl Acids Res (Suppl) 20:2023–2026
  5. Barton (1991): A matrix method for optimizing a neural network. Neural Computation 3:450–459
  6. Berry MW (1992): Large-scale sparse singular value computations. Int J Supercomputer Applications 6:13–49
  7. Bohr H, Bohr J, Brunak S, Cotterill RMJ, Fredholm H, Lautrup B, Petersen SB (1990): A novel approach to prediction of the 3-dimensional structures of protein backbones by neural networks. FEBS Letters 261:43–46
  8. Boswell DR, Lesk AM (1988): Sequence comparison and alignment: The measurement and interpretation of sequence similarity. In: Computational Molecular Biology: Sources and Methods for Sequence Analysis, pp. 161–178. Lesk AM, ed. New York: Oxford University Press
  9. Brutlag DL, Dautricourt J-P, Fier RDJ, Moxon B, Stamm R (1992): BLAZE: An implementation of the Smith-Waterman sequence comparison algorithm on a massively parallel computer. Extended Abstracts of the 2nd International Workshop of Open Problems in Computational Molecular Biology: 60–68
  10. Chen S (1993): Characterization and learning of protein conformations. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis: 391–399
  11. Cherkassky V, Vassilas N (1989): Performance of back propagation networks for associative database retrieval. Proc Int Joint Conf on Neural Networks 1:77–83
  12. Claverie J-M, Sauvaget I, Bougueleret L (1990): K-tuple frequency analysis: From intron/exon discrimination to T-cell epitope mapping. In: Molecular Evolution: Computer Analysis of Proteins and Nucleic Acid Sequences, Methods in Enzymology, Vol. 183. Doolittle RF, ed. New York: Academic Press, pp. 237–252
  13. Davison DB (1990): Sequence searching on supercomputers. In: Computers and DNA, Santa Fe Institute Studies in the Sciences of Complexity. Bell G, Marr TG, eds. Reading, MA: Addison-Wesley, pp. 93–97
  14. Dayhoff MO (ed.) (1972): Atlas of Protein Sequence and Structure, Volume 5. National Biomedical Research Foundation, Washington, D.C.
  15. Dayhoff J (1990): Neural Network Architectures, An Introduction. New York: Van Nostrand Reinhold
  16. Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990): Indexing by latent semantic analysis. J Amer Soc for Information Science 41:391–407
  17. Demeler B, Zhou G (1991): Neural network optimization for E. coli promoter prediction. Nucl Acids Res 19:1593–1599
  18. Devereux J (1988): A rapid method for identifying sequences in large nucleotide sequence databases. Ph.D. Thesis, University of Wisconsin
  19. Doolittle RF (1990): Searching through sequence databases. In: Molecular Evolution: Computer Analysis of Proteins and Nucleic Acid Sequences, Methods in Enzymology, Vol. 183. Doolittle RF, ed. New York: Academic Press, pp. 99–110
  20. Gallinari P, Thiria S, Soulie FF (1988): Multilayer perceptrons and data analysis. Proc Int Joint Conf on Neural Networks 1:391–399
  21. Gribskov M, Devereux J (eds.) (1991): Sequence Analysis Primer. New York: Stockton Press
  22. Harris N, Hunter L, States D (1992): Megaclassification: Discovering motifs in massive datastreams. Proceedings of 10th National Conference on Artificial Intelligence, AAAI Press
  23. Henikoff S, Henikoff JG (1991): Automated assembly of protein blocks for database searching. Nucl Acids Res 19:6565–6572
  24. Hirst JD, Sternberg MJE (1992): Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31:7211–7218
  25. Holley LH, Karplus M (1989): Protein secondary structure prediction with a neural network. Proc Natl Acad Sci USA 86:152–156
  26. Horton PB, Kanehisa M (1992): An assessment of neural network and statistical approaches for prediction of E. coli promoter sites. Nucl Acids Res 20:4331–4338
  27. Karlin S, Ost F, Blaisdell BE (1989): Patterns in DNA and amino acid sequences and their statistical significance. In: Mathematical Methods for DNA Sequences, Waterman MS, ed. Boca Raton, FL: CRC Press, Inc., pp. 133–157
  28. Kimoto T, Asakawa K, Yoda M, Takeoka M (1990): Stock market prediction system with modular neural networks. Proc Int Joint Conf on Neural Networks (June) 1:1–6
  29. Kneller DG, Cohen FE, Langridge R (1990): Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 214:171–182
  30. Konopka AK, Owens J (1990): Non-continuous patterns and compositional complexity of nucleic acid sequences. In: Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII. Bell G, Marr T, eds. Addison-Wesley, pp. 147–155
  31. Lapedes A, Barnes C, Burks C, Farber R, Sirotkin K (1990): Application of neural networks and other machine learning algorithms to DNA sequence analysis. In: Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII. Bell G, Marr T, eds. Addison-Wesley, pp. 157–182
  32. Le Cun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989): Backpropagation applied to handwritten zip code recognition. Neural Computation 1:541–551
  33. Lendaris GG, Harb LA (1990): Improved generalization in ANNs via use of conceptual graphs: A character recognition task as an example case. Proc Int Joint Conf on Neural Networks (June) I:551–556
  34. Liebman MN (1993): Application of neural networks to the analysis of structure and function in biologically active macromolecules. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis: 331–347
  35. Lipman DJ, Pearson WR (1985): Rapid and sensitive protein similarity searches. Science 227:1435–1441
  36. Needleman SB, Wunsch CD (1970): A general method applicable to the search for similarities in the amino acid sequences of two proteins. J Mol Biol 48:443–453
  37. Olsen GJ, Overbeek R, Larsen N, Marsh TL, McCaughey MJ, Maciukenas MA, Kuan W-M, Macke TJ, Xing Y, Woese CR (1992): The ribosomal RNA database project. Nucl Acids Res (Suppl) 20:2199–2200
  38. O’Neill MC (1992): Escherichia coli promoters: Neural networks develop distinct descriptions in learning to search for promoters of different spacing classes. Nucl Acids Res 20:3471–3477
  39. Pabo CO (1987): New generation databases for molecular biology. Nature 327:467
  40. Pearson WR, Lipman DJ (1988): Improved tools for biological sequence comparisons. Proc Natl Acad Sci USA 85:2444–2448
  41. Qian N, Sejnowski TJ (1988): Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202:865–884
  42. Rumelhart DE, McClelland JL (eds.) (1986): Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. MIT Press
  43. Smith TF, Waterman MS (1981): Identification of common molecular subsequences. J Mol Biol 147:195–197
  44. Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982): Use of the ‘Perceptron’ algorithm to distinguish translation initiation sites in E. coli. Nucl Acids Res 10:2997–3011
  45. Uberbacher EC, Mural RJ (1991): Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc Natl Acad Sci USA 88:11261–11265
  46. van Heel M (1991): A new family of powerful multivariant statistical sequence analysis techniques. J Mol Biol 220:877–887
  47. Webb AR, Lowe D (1990): The optimized internal representation of multilayered classifier networks performs nonlinear discriminant analysis. Neural Networks 3:367–375
  48. Woese CR (1987): Bacterial evolution. Microbiological Reviews 51:221–271
  49. Wu CH, Ermongkonchai A, Chang TC (1991): Protein classification using a neural network protein database (NNPDB) system. Proceedings of the Analysis of Neural Network Applications Conference: 29–41
  50. Wu CH, Whitson G, McLarty J, Ermongkonchai A, Chang T (1992): Protein classification artificial neural system. Protein Science 1:667–677
  51. Wu CH (1993): Classification neural networks for rapid sequence annotation and automated database organization. Computers & Chemistry 17:219–227
  52. Xin Y, Carmeli T, Liebman M, Wilcox GL (1993): Use of the backpropagation neural network algorithm for prediction of protein folding patterns. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis: 359–375

Copyright information

© Birkhäuser Boston 1994
