Annals of Biomedical Engineering

Volume 31, Issue 4, pp 462–470

Recognition of Adenosine Triphosphate Binding Sites Using Parallel Cascade System Identification

  • James R. Green
  • Michael J. Korenberg
  • Robert David
  • Ian W. Hunter

Abstract

Parallel cascade identification (PCI) is a method for approximating the behavior of a nonlinear system, from input/output training data, by constructing a parallel array of cascaded dynamic linear and static nonlinear elements. PCI has previously been shown to provide an effective means for classifying protein sequences into structure/function families. In the present study, PCI is used to distinguish proteins that bind adenosine triphosphate or guanosine triphosphate molecules from those that do not. Classification accuracies of 87.1% using the hydrophobicity scale of Rose et al. (Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985) and 88.8% using Korenberg's SARAH1 scale are obtained, as measured by tenfold cross-validation testing. Nearest-neighbor and K-nearest-neighbor (KNN) classifiers are also constructed; their accuracies are 88.0% and 90.8%, respectively, on the SARAH1-encoded test data set, as measured by the same testing protocol. Significantly improved classification accuracy is achieved by combining the PCI and KNN classifiers using quadratic discriminant analysis: accuracy rises from 87.9% (PCI) and 87.4% (KNN) to 96.5% for the combination, as measured by twofold cross-validation testing on the SARAH1-encoded test data set. © 2003 Biomedical Engineering Society.

PACS 2003: 87.14.Ee, 87.15.Cc, 87.15.Aa

Nonlinear system identification; Parallel cascade identification; ATP-binding sites; SARAH codes; Protein sequence analysis
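
The structure named in the abstract, a parallel array of cascades in which each cascade is a dynamic linear element followed by a static nonlinearity, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes FIR linear elements, polynomial nonlinearities, and a simplified rule that takes each impulse response from the first-order cross-correlation between the input and the current residual; the function names and the parameters `memory`, `degree`, and `n_cascades` are hypothetical. For sequence classification, the input `x` would be a numerically encoded protein sequence (for example, a hydrophobicity profile), which is not shown here.

```python
import numpy as np

def fir_filter(x, h):
    """Dynamic linear element: causal FIR filter, u[n] = sum_j h[j] * x[n - j]."""
    return np.convolve(x, h)[: len(x)]

def fit_parallel_cascade(x, y, memory=5, degree=2, n_cascades=3):
    """Greedy fit: each new cascade approximates the residual left by the earlier ones.

    Assumption (simplification): the impulse response of every cascade is the
    first-order input/residual cross-correlation, normalised to unit length.
    """
    x = np.asarray(x, dtype=float)
    residual = np.asarray(y, dtype=float).copy()
    n = len(x)
    cascades = []
    for _ in range(n_cascades):
        # Cross-correlation estimate at lags 0 .. memory-1
        h = np.array([np.mean(residual[j:] * x[: n - j]) for j in range(memory)])
        norm = np.linalg.norm(h)
        if norm < 1e-12:
            break  # nothing left that this rule can pick up
        h = h / norm
        u = fir_filter(x, h)
        # Static nonlinearity: least-squares polynomial from u to the residual
        coeffs = np.polyfit(u, residual, degree)
        residual = residual - np.polyval(coeffs, u)
        cascades.append((h, coeffs))
    return cascades

def predict(x, cascades):
    """Model output: sum of the outputs of all cascades."""
    x = np.asarray(x, dtype=float)
    out = np.zeros(len(x))
    for h, coeffs in cascades:
        out += np.polyval(coeffs, fir_filter(x, h))
    return out

# Usage on synthetic data: a linear filter followed by a quadratic nonlinearity
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
u_true = fir_filter(x, np.array([1.0, 0.5, -0.2]))
y = u_true + 0.5 * u_true ** 2
model = fit_parallel_cascade(x, y)
relative_mse = np.mean((y - predict(x, model)) ** 2) / np.var(y)
print(f"relative MSE: {relative_mse:.4f}")
```

Because each polynomial is fitted by least squares and includes a constant term, adding a cascade cannot increase the training mean-square error in this sketch.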


REFERENCES

  1. Adeney, K. M., and M. J. Korenberg. Iterative fast orthogonal search algorithm for MDL-based training of generalized single-layer networks. Neural Networks 13:787–799, 2000.
  2. Bairoch, A., and R. Apweiler. The SWISS-PROT protein sequence data bank and its supplement the TrEMBL in 1999. Nucleic Acids Res. 27:49–54, 1999; http://www.expasy.ch/sprot
  3. Bairoch, A., P. Bucher, and K. Hofmann. The PROSITE database, its status in 1997. Nucleic Acids Res. 24:217–221, 1997; http://www.expasy.ch/prosite/
  4. Baldi, P., Y. Chauvin, T. Hunkapiller, and M. McClure. Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. U.S.A. 91:1059–1063, 1994.
  5. Cornette, J. L., K. B. Cease, H. Margalit, J. L. Spouge, J. A. Berzofsky, and C. DeLisi. Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J. Mol. Biol. 195:659–685, 1987.
  6. David, R. Applications of nonlinear system identification to protein structural prediction. MSc thesis, MIT, Cambridge, MA (2000).
  7. Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure 5, edited by M. O. Dayhoff. Washington, DC: National Biomedical Research Foundation, 1978, Suppl. 3.
  8. Dill, K. A. Dominant forces in protein folding. Biochemistry 29:7133–7155, 1990.
  9. Fickett, J. W., and C.-S. Tung. Assessment of protein coding measures. Nucleic Acids Res. 20:6441–6450, 1992.
  10. Henikoff, S., and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915–10919, 1992.
  11. Hirst, J. D., and M. J. E. Sternberg. Prediction of ATP-binding motifs: A comparison of a perceptron-type neural network and a consensus sequence method. Protein Eng. 4:615–623, 1991; 6:549–554, 1993.
  12. Hirst, J. D., and M. J. E. Sternberg. Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31:7211–7218, 1992.
  13. Kirkpatrick, P. Look into the future. Nature Rev. Drug Discovery 1:334, 2002.
  14. Koonin, E. V. A superfamily of ATPases with diverse functions containing either classical or deviant ATP-binding motif. J. Mol. Biol. 229:1165–1174, 1993.
  15. Korenberg, M. J. Statistical identification of parallel cascades of linear and nonlinear systems. Proceedings of the 6th IFAC Symposium on Identification and System Parameter Estimation 1:580–585, 1982.
  16. Korenberg, M. J. Parallel cascade identification and kernel estimation for nonlinear systems. Ann. Biomed. Eng. 19:429–455, 1991.
  17. Korenberg, M. J. Prediction of treatment response using gene expression profiles. J. Proteome Res. 1:55–61, 2002.
  18. Korenberg, M. J., J. E. Solomon, and M. E. Regelson. Parallel cascade identification as a means for automatically classifying protein sequences into structure/function groups. Biol. Cybern. 82:15–21, 2000.
  19. Korenberg, M. J., R. David, I. W. Hunter, and J. E. Solomon. Automatic classification of protein sequences into structure/function groups via parallel cascade identification: A feasibility study. Ann. Biomed. Eng. 28:803–811, 2000.
  20. Krogh, A., M. Brown, I. S. Mian, K. Sjölander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol. 235:1501–1531, 1994.
  21. McLachlan, A. D. Multichannel Fourier analysis of patterns in protein sequences. J. Phys. Chem. 97:3000–3006, 1993.
  22. Palm, G. On representation and approximation of nonlinear systems. Part II. Discrete time. Biol. Cybern. 34:49–52, 1979.
  23. Regelson, M. E. Protein structure/function classification using hidden Markov models. PhD thesis, The Beckman Institute, California Institute of Technology, Pasadena, 1997.
  24. Rose, G. D., A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus. Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985.
  25. Saraste, M., P. R. Sibbald, and A. Wittinghofer. The P-loop—A common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15:430–435, 1990.
  26. Sternberg, M. J. E., and S. A. Islam. Protein sequences—Homologies and motifs. Trends Biotechnol. 9:300–302, 1991.
  27. Stultz, C. M., J. V. White, and T. F. Smith. Structural analysis based on state-space modeling. Protein Sci. 2:305–314, 1993.
  28. Walker, J. E., M. Saraste, M. J. Runswick, and N. J. Gay. Distantly related sequences in the α and β subunits of ATP synthase, myosin, kinases, and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1:945–951, 1982.
  29. Weiss, S. W., and C. A. Kulikowski. Computer Systems That Learn. San Francisco: Morgan Kaufmann, 1991, 223 pp.
  30. White, J. V., C. M. Stultz, and T. F. Smith. Protein classification by stochastic modeling and optimal filtering of amino-acid sequences. Math. Biosci. 119:35–75, 1994.

Copyright information

© Biomedical Engineering Society 2003

Authors and Affiliations

  • James R. Green (1)
  • Michael J. Korenberg (1)
  • Robert David (2)
  • Ian W. Hunter (2)
  1. Department of Electrical and Computer Engineering, Queen's University, Kingston, Canada
  2. Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge
