Abstract
Nucleic acid and protein sequences contain a wealth of information of interest to molecular biologists, since the genome forms the blueprint of the cell. Currently, a database search for sequence similarities represents the most direct computational approach to deciphering the codes connecting molecular sequences with protein structure and function (Doolittle, 1990). If the unknown protein is related to one of known structure/function, inferences based on the known structure/function and the degree of the relationship can provide the most reliable clues to the nature of the unknown protein. This technique has proved successful and has led to new understanding in a wide variety of biological studies (Boswell and Lesk, 1988).
References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990): Basic local alignment search tool. J Mol Biol 215:403–410
Asoh H, Otsu N (1990): An approximation of nonlinear discriminant analysis by multilayer neural networks. Proc Intn’l Joint Conf on Neural Networks (June) III:211–216
Bairoch A (1992): PROSITE: A dictionary of sites and patterns in proteins. Nuc Acids Res (Suppl) 20:2013–2018
Barker WC, George DG, Mewes H-W, Tsugita A (1992): The PIR-international protein sequence database. Nuc Acids Res (Suppl) 20:2023–2026
Barton (1991): A matrix method for optimizing a neural network. Neural Computation 3:450–459
Berry MW (1992): Large-scale sparse singular value computations. Int J Supercomputer Applications 6:13–49
Bohr H, Bohr J, Brunak S, Cotterill RMJ, Fredholm H, Lautrup B, Peterson SB (1990): A novel approach to prediction of the 3-dimensional structures of protein backbones by neural networks. FEBS Letters 261:43–46
Boswell DR, Lesk AM (1988): Sequence comparison and alignment: The measurement and interpretation of sequence similarity. In: Computational Molecular Biology: Sources and Methods for Sequence Analysis, pp. 161–178. Lesk AM, ed. New York: Oxford University Press
Brutlag DL, Dautricourt J-P, Fier RDJ, Moxon B, Stamm R (1992): BLAZE: An implementation of the Smith-Waterman sequence comparison algorithm on a massively parallel computer. Extended Abstracts of the 2nd International Workshop of Open Problems in Computational Molecular Biology:60–68
Chen S (1993): Characterization and learning of protein conformations. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis:391–399
Cherkassky V, Vassilas N (1989): Performance of back propagation networks for associative database retrieval. Proc Int Joint Conf on Neural Networks 1:77–83
Claverie J-M, Sauvaget I, Bougueleret L (1990): K-tuple frequency analysis: From intron/exon discrimination to T-cell epitope mapping. In: Molecular Evolution: Computer Analysis of Proteins and Nucleic Acid Sequences, Methods in Enzymology, Vol. 183. Doolittle RF, ed. New York: Academic Press, pp. 237–252
Davison DB (1990): Sequence searching on supercomputers. In: Computers and DNA, Santa Fe Institute Studies in the Sciences of Complexity. Bell G, Marr TG, eds. Reading, MA: Addison-Wesley, pp. 93–97
Dayhoff MO (ed.) (1972): Atlas of Protein Sequence and Structure, Volume 5. National Biomedical Research Foundation, Washington, D.C.
Dayhoff J (1990): Neural Network Architectures, An Introduction. New York: Van Nostrand Reinhold
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990): Indexing by latent semantic analysis. J Amer Soc for Information Science 41:391–407
Demeler B, Zhou G (1991): Neural network optimization for E. coli promoter prediction. Nuc Acids Res 19:1593–1599
Devereux J (1988): A rapid method for identifying sequences in large nucleotide sequence databases. Ph.D. Thesis, University of Wisconsin
Doolittle RF (1990): Searching through sequence databases. In: Molecular Evolution: Computer Analysis of Proteins and Nucleic Acid Sequences, Methods in Enzymology, Vol. 183. Doolittle RF, ed. New York: Academic Press, pp. 99–110
Gallinari P, Thiria S, Soulie FF (1988): Multilayer perceptrons and data analysis. Proc Intn’l Joint Conf on Neural Networks 1:391–399
Gribskov M, Devereux J (eds.) (1991): Sequence Analysis Primer. New York: Stockton Press
Harris N, Hunter L, States D (1992): Megaclassification: Discovering motifs in massive datastreams. Proceedings of 10th National Conference on Artificial Intelligence, AAAI Press
Henikoff S, Henikoff JG (1991): Automated assembly of protein blocks for database searching. Nuc Acids Res 19:6565–6572
Hirst JD, Sternberg MJE (1992): Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31:7211–7218
Holley LH, Karplus M (1989): Protein secondary structure prediction with a neural network. Proc Natl Acad Sci USA 86:152–156
Horton PB, Kanehisa M (1992): An assessment of neural network and statistical approaches for prediction of E. coli promoter sites. Nuc Acids Res 20:4331–4338
Karlin S, Ost F, Blaisdell BE (1989): Patterns in DNA and amino acid sequences and their statistical significance. In: Mathematical Methods for DNA Sequences, Waterman MS, ed. Boca Raton, FL: CRC Press, Inc. pp. 133–157
Kimoto T, Asakawa K, Yoda M, Takeoka M (1990): Stock market prediction system with modular neural networks. Proc Int Joint Conf on Neural Networks (June) 1:1–6
Kneller DG, Cohen FE, Langridge R (1990): Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 214:171–182
Konopka AK, Owens J (1990): Non-continuous patterns and compositional complexity of nucleic acid sequences. In: Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII. Bell G, Marr T, eds. Addison-Wesley. pp. 147–155
Lapedes A, Barnes C, Burks C, Farber R, Sirotkin K (1990): Application of neural networks and other machine learning algorithms to DNA sequence analysis. In: Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII. Bell G, Marr T, eds. Addison-Wesley, pp. 157–182
Le Cun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989): Backpropagation applied to handwritten zip code recognition. Neural Computation 1:541–551
Lendaris GG, Harb LA (1990): Improved generalization in ANNs via use of conceptual graphs: A character recognition task as an example case. Proc Intn’l Joint Conf on Neural Networks (June) I:551–556
Liebman MN (1993): Application of neural networks to the analysis of structure and function in biologically active macromolecules. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis:331–347
Lipman DJ, Pearson WR (1985): Rapid and sensitive protein similarity searches. Science 227:1435–1441
Needleman SB, Wunsch CD (1970): A general method applicable to the search for similarities in the amino acid sequences of two proteins. J Mol Biol 48:443–453
Olsen GJ, Overbeek R, Larsen N, Marsh TL, McCaughey MJ, Maciukenas MA, Kuan W-M, Macke TJ, Xing Y, Woese CR (1992): The ribosomal RNA database project. Nuc Acids Res (Suppl) 20:2199–2200
O’Neill MC (1992): Escherichia coli promoters: Neural networks develop distinct descriptions in learning to search for promoters of different spacing classes. Nuc Acids Res 20:3471–3477
Pabo CO (1987): New generation databases for molecular biology. Nature 327:467
Pearson WR, Lipman DJ (1988): Improved tools for biological sequence comparisons. Proc Natl Acad Sci USA 85:2444–2448
Qian N, Sejnowski TJ (1988): Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202:865–884
Rumelhart DE, McClelland JL (eds.) (1986): Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. MIT Press.
Smith TF, Waterman M (1981): Identification of common molecular subsequences. J Mol Biol 147:195–197
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982): Use of the ‘Perceptron’ algorithm to distinguish translation initiation sites in E. coli. Nuc Acids Res 10:2997–3011
Uberbacher EC, Mural RJ (1991): Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc Natl Acad Sci USA 88:11261–11265
van Heel M (1991): A new family of powerful multivariant statistical sequence analysis techniques. J Mol Biol 220:877–887
Webb AR, Lowe D (1990): The optimized internal representation of multilayered classifier networks performs nonlinear discriminant analysis. Neural Networks 3:367–375
Woese CR (1987): Bacterial evolution. Microbiological Reviews 51:221–271
Wu CH, Ermongkonchai A, Chang TC (1991): Protein classification using a neural network protein database (NNPDB) system. Proceedings of the Analysis of Neural Network Applications Conference:29–41
Wu CH, Whitson G, McLarty J, Ermongkonchai A, Chang T (1992): Protein classification artificial neural system. Protein Science 1:667–677
Wu CH (1993): Classification neural networks for rapid sequence annotation and automated database organization. Computers & Chemistry 17:219–227
Xin Y, Carmeli T, Liebman M, Wilcox GL (1993): Use of the backpropagation neural network algorithm for prediction of protein folding patterns. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis:359–375
Copyright information
© 1994 Birkhäuser Boston
About this chapter
Cite this chapter
Wu, C.H. (1994). Neural Networks for Molecular Sequence Classification. In: Merz, K.M., Le Grand, S.M. (eds) The Protein Folding Problem and Tertiary Structure Prediction. Birkhäuser Boston. https://doi.org/10.1007/978-1-4684-6831-1_9
DOI: https://doi.org/10.1007/978-1-4684-6831-1_9
Publisher Name: Birkhäuser Boston
Print ISBN: 978-1-4684-6833-5
Online ISBN: 978-1-4684-6831-1
eBook Packages: Springer Book Archive