Recognition of Adenosine Triphosphate Binding Sites Using Parallel Cascade System Identification

Abstract

Parallel cascade identification (PCI) is a method for approximating the behavior of a nonlinear system, from input/output training data, by constructing a parallel array of cascaded dynamic linear and static nonlinear elements. PCI has previously been shown to provide an effective means for classifying protein sequences into structure/function families. In the present study, PCI is used to distinguish proteins that bind adenosine triphosphate or guanosine triphosphate molecules from those that do not. Classification accuracies of 87.1% using the hydrophobicity scale of Rose et al. (Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985) and 88.8% using Korenberg's SARAH1 scale are obtained, as measured by tenfold cross-validation testing. Nearest-neighbor and K-nearest-neighbor (KNN) classifiers are also constructed; their accuracies are 88.0% and 90.8%, respectively, on the SARAH1-encoded test data set under the same testing protocol. Significantly improved classification accuracy is achieved by combining the PCI and KNN classifiers using quadratic discriminant analysis: accuracy rises from 87.9% (PCI) and 87.4% (KNN) to 96.5% for the combination, as measured by twofold cross-validation testing on the SARAH1-encoded test data set. © 2003 Biomedical Engineering Society.
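The model structure named in the abstract, a parallel array of cascades in which each path is a dynamic linear element followed by a static nonlinearity, can be illustrated with a minimal sketch. It only evaluates the output of an already-specified model; it does not perform the identification (fitting) step. The function names, the FIR form of the linear element, and the polynomial form of the nonlinearity are assumptions made for illustration and are not taken from the article.

```python
from typing import List, Sequence, Tuple

# One cascade path (illustrative representation, not from the article):
#   h      - impulse response of the dynamic linear element (assumed FIR)
#   coeffs - coefficients of the static nonlinearity (assumed polynomial),
#            lowest order first: a_0 + a_1*u + a_2*u^2 + ...
Cascade = Tuple[Sequence[float], Sequence[float]]


def fir_filter(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Causal convolution u[n] = sum_j h[j] * x[n - j], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, hj in enumerate(h):
            if n - j >= 0:
                acc += hj * x[n - j]
        out.append(acc)
    return out


def polynomial(coeffs: Sequence[float], u: float) -> float:
    """Evaluate the static polynomial nonlinearity at a single value u."""
    return sum(a * u ** k for k, a in enumerate(coeffs))


def parallel_cascade_output(cascades: Sequence[Cascade],
                            x: Sequence[float]) -> List[float]:
    """Sum the outputs of all cascade paths driven by the same input x."""
    y = [0.0] * len(x)
    for h, coeffs in cascades:
        u = fir_filter(h, x)              # dynamic linear element
        for n, un in enumerate(u):
            y[n] += polynomial(coeffs, un)  # static nonlinear element
    return y


# Two hypothetical paths: a two-tap filter with identity nonlinearity,
# and a pass-through filter with a squaring nonlinearity.
example_model = [([1.0, 0.5], [0.0, 1.0]), ([1.0], [0.0, 0.0, 1.0])]
example_output = parallel_cascade_output(example_model, [1.0, 2.0])
```

In a sequence-classification setting, the input `x` would be a numerically encoded protein sequence (for example, one hydrophobicity value per residue), and the model output would be compared against a threshold; those details are outside this sketch.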

PACS 2003: 87.14.Ee, 87.15.Cc, 87.15.Aa

REFERENCES

  1. Adeney, K. M., and M. J. Korenberg. Iterative fast orthogonal search algorithm for MDL-based training of generalized single-layer networks. Neural Networks 13:787–799, 2000.

  2. Bairoch, A., and R. Apweiler. The SWISS-PROT protein sequence data bank and its supplement the TrEMBL in 1999. Nucleic Acids Res. 27:49–54, 1999; http://www.expasy.ch/sprot

  3. Bairoch, A., P. Bucher, and K. Hofmann. The PROSITE database, its status in 1997. Nucleic Acids Res. 24:217–221, 1997; http://www.expasy.ch/prosite/

  4. Baldi, P., Y. Chauvin, T. Hunkapiller, and M. McClure. Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. U.S.A. 91:1059–1063, 1994.

  5. Cornette, J. L., K. B. Cease, H. Margalit, J. L. Spouge, J. A. Berzofsky, and C. DeLisi. Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J. Mol. Biol. 195:659–685, 1987.

  6. David, R. Applications of nonlinear system identification to protein structural prediction. MSc thesis, MIT, Cambridge, MA (2000).

  7. Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure 5, edited by M. O. Dayhoff. Washington, DC: National Biomedical Research Foundation, 1978, Suppl. 3.

  8. Dill, K. A. Dominant forces in protein folding. Biochemistry 29:7133–7155, 1990.

  9. Fickett, J. W., and C.-S. Tung. Assessment of protein coding measures. Nucleic Acids Res. 20:6441–6450, 1992.

  10. Henikoff, S., and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915–10919, 1992.

  11. Hirst, J. D., and M. J. E. Sternberg. Prediction of ATP-binding motifs: A comparison of a perceptron-type neural network and a consensus sequence method. Protein Eng. 4:615–623, 1991; 6:549–554, 1993.

  12. Hirst, J. D., and M. J. E. Sternberg. Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31:7211–7218, 1992.

  13. Kirkpatrick, P. Look into the future. Nature Rev. Drug Discovery 1:334, 2002.

  14. Koonin, E. V. A superfamily of ATPases with diverse functions containing either classical or deviant ATP-binding motif. J. Mol. Biol. 229:1165–1174, 1993.

  15. Korenberg, M. J. Statistical identification of parallel cascades of linear and nonlinear systems. Proceedings of the 6th IFAC Symposium on Identification and System Parameter Estimation 1:580–585, 1982.

  16. Korenberg, M. J. Parallel cascade identification and kernel estimation for nonlinear systems. Ann. Biomed. Eng. 19:429–455, 1991.

  17. Korenberg, M. J. Prediction of treatment response using gene expression profiles. J. Proteome Res. 1:55–61, 2002.

  18. Korenberg, M. J., J. E. Solomon, and M. E. Regelson. Parallel cascade identification as a means for automatically classifying protein sequences into structure/function groups. Biol. Cybern. 82:15–21, 2000.

  19. Korenberg, M. J., R. David, I. W. Hunter, and J. E. Solomon. Automatic classification of protein sequences into structure/function groups via parallel cascade identification: A feasibility study. Ann. Biomed. Eng. 28:803–811, 2000.

  20. Krogh, A., M. Brown, I. S. Mian, K. Sjölander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol. 235:1501–1531, 1994.

  21. McLachlan, A. D. Multichannel Fourier analysis of patterns in protein sequences. J. Phys. Chem. 97:3000–3006, 1993.

  22. Palm, G. On representation and approximation of nonlinear systems. Part II. Discrete time. Biol. Cybern. 34:49–52, 1979.

  23. Regelson, M. E. Protein structure/function classification using hidden Markov models. PhD thesis, The Beckman Institute, California Institute of Technology, Pasadena, 1997.

  24. Rose, G. D., A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus. Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985.

  25. Saraste, M., P. R. Sibbald, and A. Wittinghofer. The P-loop—A common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15:430–435, 1990.

  26. Sternberg, M. J. E., and S. A. Islam. Protein sequences—Homologies and motifs. Trends Biotechnol. 9:300–302, 1991.

  27. Stultz, C. M., J. V. White, and T. F. Smith. Structural analysis based on state-space modeling. Protein Sci. 2:305–314, 1993.

  28. Walker, J. E., M. Saraste, M. J. Runswick, and N. J. Gay. Distantly related sequences in the α and β subunits of ATP synthase, myosin, kinases, and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1:945–951, 1982.

  29. Weiss, S. W., and C. A. Kulikowski. Computer Systems That Learn. San Francisco: Morgan Kaufmann, 1991, 223 pp.

  30. White, J. V., C. M. Stultz, and T. F. Smith. Protein classification by stochastic modeling and optimal filtering of amino-acid sequences. Math. Biosci. 119:35–75, 1994.

Author information

Correspondence to James R. Green.

About this article

Cite this article

Green, J.R., Korenberg, M.J., David, R. et al. Recognition of Adenosine Triphosphate Binding Sites Using Parallel Cascade System Identification. Annals of Biomedical Engineering 31, 462–470 (2003). https://doi.org/10.1114/1.1561293

  • Nonlinear system identification
  • Parallel cascade identification
  • ATP-binding sites
  • SARAH codes
  • Protein sequence analysis