Neural Networks for Molecular Sequence Classification

Wu, Cathy H.

doi:10.1007/978-1-4684-6831-1_9

Abstract

Nucleic acid and protein sequences contain a wealth of information of interest to molecular biologists, since the genome forms the blueprint of the cell. Currently, a database search for sequence similarities is the most direct computational approach to deciphering the codes that connect molecular sequences with protein structure and function (Doolittle, 1990). If an unknown protein is related to one of known structure or function, inferences based on that known structure or function and on the degree of the relationship can provide the most reliable clues to the nature of the unknown protein. This technique has proved successful and has led to new understanding in a wide variety of biological studies (Boswell and Lesk, 1988).

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990): Basic local alignment search tool. J Mol Biol 215:403–410
Asoh H, Otsu N (1990): An approximation of nonlinear discriminant analysis by multilayer neural networks. Proc Intn’l Joint Conf on Neural Networks (June) III:211–216
Bairoch A (1992): PROSITE: A dictionary of sites and patterns in proteins. Nucleic Acids Res (Suppl) 20:2013–2018
Barker WC, George DG, Mewes H-W, Tsugita A (1992): The PIR-international protein sequence database. Nucleic Acids Res (Suppl) 20:2023–2026
Barton (1991): A matrix method for optimizing a neural network. Neural Computation 3:450–459
Berry MW (1992): Large-scale sparse singular value computations. Int J Supercomputer Applications 6:13–49
Bohr H, Bohr J, Brunak S, Cotterill RMJ, Fredholm H, Lautrup B, Petersen SB (1990): A novel approach to prediction of the 3-dimensional structures of protein backbones by neural networks. FEBS Letters 261:43–46
Boswell DR, Lesk AM (1988): Sequence comparison and alignment: The measurement and interpretation of sequence similarity. In: Computational Molecular Biology: Sources and Methods for Sequence Analysis, pp. 161–178. Lesk AM, ed. New York: Oxford University Press
Brutlag DL, Dautricourt J-P, Fier RDJ, Moxon B, Stamm R (1992): BLAZE: An implementation of the Smith-Waterman sequence comparison algorithm on a massively parallel computer. Extended Abstracts of the 2nd International Workshop of Open Problems in Computational Molecular Biology:60–68
Chen S (1993): Characterization and learning of protein conformations. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis:391–399
Cherkassky V, Vassilas N (1989): Performance of back propagation networks for associative database retrieval. Proc Int Joint Conf on Neural Networks 1:77–83
Claverie J-M, Sauvaget I, Bougueleret L (1990): K-tuple frequency analysis: From intron/exon discrimination to T-cell epitope mapping. In: Molecular Evolution: Computer Analysis of Proteins and Nucleic Acid Sequences, Methods in Enzymology, Vol. 183. Doolittle RF, ed. New York: Academic Press, pp. 237–252
Davison DB (1990): Sequence searching on supercomputers. In: Computers and DNA, Santa Fe Institute Studies in the Sciences of Complexity. Bell G, Marr TG, eds. Reading, MA: Addison-Wesley, pp. 93–97
Dayhoff MO (ed.) (1972): Atlas of Protein Sequence and Structure, Volume 5. National Biomedical Research Foundation, Washington, D.C.
Dayhoff J (1990): Neural Network Architectures, An Introduction. New York: Van Nostrand Reinhold
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990): Indexing by latent semantic analysis. J Amer Soc for Information Science 41:391–407
Demeler B, Zhou G (1991): Neural network optimization for E. coli promoter prediction. Nucleic Acids Res 19:1593–1599
Devereux J (1988): A rapid method for identifying sequences in large nucleotide sequence databases. Ph.D. Thesis, University of Wisconsin
Doolittle RF (1990): Searching through sequence databases. In: Molecular Evolution: Computer Analysis of Proteins and Nucleic Acid Sequences, Methods in Enzymology, Vol. 183. Doolittle RF, ed. New York: Academic Press, pp. 99–110
Gallinari P, Thiria S, Soulie FF (1988): Multilayer perceptrons and data analysis. Proc Intn’l Joint Conf on Neural Networks 1:391–399
Gribskov M, Devereux J (eds.) (1991): Sequence Analysis Primer. New York: Stockton Press
Harris N, Hunter L, States D (1992): Megaclassification: Discovering motifs in massive datastreams. Proceedings of 10th National Conference on Artificial Intelligence, AAAI Press
Henikoff S, Henikoff JG (1991): Automated assembly of protein blocks for database searching. Nucleic Acids Res 19:6565–6572
Hirst JD, Sternberg MJE (1992): Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31:7211–7218
Holley LH, Karplus M (1989): Protein secondary structure prediction with a neural network. Proc Natl Acad Sci USA 86:152–156
Horton PB, Kanehisa M (1992): An assessment of neural network and statistical approaches for prediction of E. coli promoter sites. Nucleic Acids Res 20:4331–4338
Karlin S, Ost F, Blaisdell BE (1989): Patterns in DNA and amino acid sequences and their statistical significance. In: Mathematical Methods for DNA Sequences, Waterman MS, ed. Boca Raton, FL: CRC Press, Inc. pp. 133–157
Kimoto T, Asakawa K, Yoda M, Takeoka M (1990): Stock market prediction system with modular neural networks. Proc Int Joint Conf on Neural Networks (June) 1:1–6
Kneller DG, Cohen FE, Langridge R (1990): Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 214:171–182
Konopka AK, Owens J (1990): Non-continuous patterns and compositional complexity of nucleic acid sequences. In: Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII. Bell G, Marr T, eds. Addison-Wesley. pp. 147–155
Lapedes A, Barnes C, Burks C, Farber R, Sirotkin K (1990): Application of neural networks and other machine learning algorithms to DNA sequence analysis. In: Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII. Bell G, Marr T, eds. Addison-Wesley, pp. 157–182
Le Cun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989): Backpropagation applied to handwritten zip code recognition. Neural Computation 1:541–551
Lendaris GG, Harb LA (1990): Improved generalization in ANNs via use of conceptual graphs: A character recognition task as an example case. Proc Intn’l Joint Conf on Neural Networks (June) I:551–556
Liebman MN (1993): Application of neural networks to the analysis of structure and function in biologically active macromolecules. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis:331–347
Lipman DJ, Pearson WR (1985): Rapid and sensitive protein similarity searches. Science 227:1435–1441
Needleman SB, Wunsch CD (1970): A general method applicable to the search for similarities in the amino acid sequences of two proteins. J Mol Biol 48:443–453
Olsen GJ, Overbeek R, Larsen N, Marsh TL, McCaughey MJ, Maciukenas MA, Kuan W-M, Macke TJ, Xing Y, Woese CR (1992): The ribosomal RNA database project. Nucleic Acids Res (Suppl) 20:2199–2200
O’Neill MC (1992): Escherichia coli promoters: Neural networks develop distinct descriptions in learning to search for promoters of different spacing classes. Nucleic Acids Res 20:3471–3477
Pabo CO (1987): New generation databases for molecular biology. Nature 327:467
Pearson WR, Lipman DJ (1988): Improved tools for biological sequence comparisons. Proc Natl Acad Sci USA 85:2444–2448
Qian N, Sejnowski TJ (1988): Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202:865–884
Rumelhart DE, McClelland JL (eds.) (1986): Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. MIT Press.
Smith TF, Waterman MS (1981): Identification of common molecular subsequences. J Mol Biol 147:195–197
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A (1982): Use of the ‘Perceptron’ algorithm to distinguish translation initiation sites in E. coli. Nucleic Acids Res 10:2997–3011
Uberbacher EC, Mural RJ (1991): Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc Natl Acad Sci USA 88:11261–11265
van Heel M (1991): A new family of powerful multivariant statistical sequence analysis techniques. J Mol Biol 220:877–887
Webb AR, Lowe D (1990): The optimized internal representation of multilayered classifier networks performs nonlinear discriminant analysis. Neural Networks 3:367–375
Woese CR (1987): Bacterial evolution. Microbiological Reviews 51:221–271
Wu CH, Ermongkonchai A, Chang TC (1991): Protein classification using a neural network protein database (NNPDB) system. Proceedings of the Analysis of Neural Network Applications Conference:29–41
Wu CH, Whitson G, McLarty J, Ermongkonchai A, Chang T (1992): Protein classification artificial neural system. Protein Science 1:667–677
Wu CH (1993): Classification neural networks for rapid sequence annotation and automated database organization. Computers & Chemistry 17:219–227
Xin Y, Carmeli T, Liebman M, Wilcox GL (1993): Use of the backpropagation neural network algorithm for prediction of protein folding patterns. Proc 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis:359–375

Editor information

Editors and Affiliations

Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, PA, 16802-6300, USA
Kenneth M. Merz Jr. & Scott M. Le Grand

Cite this chapter

Wu, C.H. (1994). Neural Networks for Molecular Sequence Classification. In: Merz, K.M., Le Grand, S.M. (eds) The Protein Folding Problem and Tertiary Structure Prediction. Birkhäuser Boston. https://doi.org/10.1007/978-1-4684-6831-1_9

DOI: https://doi.org/10.1007/978-1-4684-6831-1_9
Publisher Name: Birkhäuser Boston
Print ISBN: 978-1-4684-6833-5
Online ISBN: 978-1-4684-6831-1
eBook Packages: Springer Book Archive
