Parallel cascade identification (PCI) is a method for approximating the behavior of a nonlinear system, from input/output training data, by constructing a parallel array of cascaded dynamic linear and static nonlinear elements. PCI has previously been shown to provide an effective means for classifying protein sequences into structure/function families. In the present study, PCI is used to distinguish proteins that bind adenosine triphosphate (ATP) or guanosine triphosphate (GTP) from those that do not. Classification accuracies of 87.1% using the hydrophobicity scale of Rose et al. (Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985) and 88.8% using Korenberg's SARAH1 scale are obtained, as measured by tenfold cross-validation testing. Nearest-neighbor and K-nearest-neighbor (KNN) classifiers are also constructed; their accuracies are 88.0% and 90.8%, respectively, on the SARAH1-encoded test data set under the same tenfold cross-validation protocol. Significantly improved classification accuracy is achieved by combining the PCI and KNN classifiers using quadratic discriminant analysis: accuracy rises from 87.9% (PCI) and 87.4% (KNN) to 96.5% for the combination, as measured by twofold cross-validation testing on the SARAH1-encoded test data set. © 2003 Biomedical Engineering Society.
PACS 2003: 87.14.Ee, 87.15.Cc, 87.15.Aa
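The structure named in the abstract, a parallel array of cascades in which each path is a dynamic linear element followed by a static nonlinearity, can be sketched in a few lines. The sketch below only evaluates a model that has already been specified; it does not show how the cascades are fitted to training data, and it is not the authors' implementation. The function names, the choice of an FIR filter for the linear element, the polynomial form of the nonlinearity, and every numeric value are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One cascade = (impulse response of the dynamic linear element,
#                polynomial coefficients of the static nonlinearity).
# Both representations are illustrative assumptions.
Cascade = Tuple[Sequence[float], Sequence[float]]


def linear_element(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filter: u[n] = sum_j h[j] * x[n - j], with x[n] = 0 for n < 0."""
    return [
        sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
        for n in range(len(x))
    ]


def static_nonlinearity(u: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Memoryless polynomial: z[n] = sum_k coeffs[k] * u[n] ** k."""
    return [sum(c * v ** k for k, c in enumerate(coeffs)) for v in u]


def parallel_cascade_output(x: Sequence[float], cascades: Sequence[Cascade]) -> List[float]:
    """Sum the outputs of all cascade paths, each driven by the same input."""
    total = [0.0] * len(x)
    for h, coeffs in cascades:
        z = static_nonlinearity(linear_element(x, h), coeffs)
        total = [t + v for t, v in zip(total, z)]
    return total


# Toy input standing in for a numerically encoded sequence (made-up values).
x = [1.0, 2.0, 0.0, -1.0]
cascades = [
    ([1.0, 0.5], [0.0, 1.0, 1.0]),  # path 1: z = u + u**2
    ([0.0, 1.0], [1.0, -1.0]),      # path 2: z = 1 - u
]
y = parallel_cascade_output(x, cascades)
print(y)  # [3.0, 8.75, 1.0, 1.0]
```

In a classification setting such as the one described above, the input would be a protein sequence mapped to numbers through a scale (for example a hydrophobicity scale), and a decision rule would be applied to the model output; both steps are omitted here because the abstract does not specify them.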
Adeney, K. M., and M. J. Korenberg. Iterative fast orthogonal search algorithm for MDL-based training of generalized single-layer networks. Neural Networks 13:787–799, 2000.
Bairoch, A., and R. Apweiler. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 27:49–54, 1999; http://www.expasy.ch/sprot
Bairoch, A., P. Bucher, and K. Hofmann. The PROSITE database, its status in 1997. Nucleic Acids Res. 24:217–221, 1997; http://www.expasy.ch/prosite/
Baldi, P., Y. Chauvin, T. Hunkapiller, and M. McClure. Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. U.S.A. 91:1059–1063, 1994.
Cornette, J. L., K. B. Cease, H. Margalit, J. L. Spouge, J. A. Berzofsky, and C. DeLisi. Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. J. Mol. Biol. 195:659–685, 1987.
David, R. Applications of nonlinear system identification to protein structural prediction. MSc thesis, MIT, Cambridge, MA (2000).
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. A model of evolutionary change in proteins. In: Atlas of Protein Sequence and Structure 5, edited by M. O. Dayhoff. Washington, DC: National Biomedical Research Foundation, 1978, Suppl. 3.
Dill, K. A. Dominant forces in protein folding. Biochemistry 29:7133–7155, 1990.
Fickett, J. W., and C.-S. Tung. Assessment of protein coding measures. Nucleic Acids Res. 20:6441–6450, 1992.
Henikoff, S., and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915–10919, 1992.
Hirst, J. D., and M. J. E. Sternberg. Prediction of ATP-binding motifs: A comparison of a perceptron-type neural network and a consensus sequence method. Protein Eng. 4:615–623, 1991; 6:549–554, 1993.
Hirst, J. D., and M. J. E. Sternberg. Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31:7211–7218, 1992.
Kirkpatrick, P. Look into the future. Nature Rev. Drug Discovery 1:334, 2002.
Koonin, E. V. A superfamily of ATPases with diverse functions containing either classical or deviant ATP-binding motif. J. Mol. Biol. 229:1165–1174, 1993.
Korenberg, M. J. Statistical identification of parallel cascades of linear and nonlinear systems. Proceedings of the 6th IFAC Symposium on Identification and System Parameter Estimation 1:580–585, 1982.
Korenberg, M. J. Parallel cascade identification and kernel estimation for nonlinear systems. Ann. Biomed. Eng. 19:429–455, 1991.
Korenberg, M. J. Prediction of treatment response using gene expression profiles. J. Proteome Res. 1:55–61, 2002.
Korenberg, M. J., J. E. Solomon, and M. E. Regelson. Parallel cascade identification as a means for automatically classifying protein sequences into structure/function groups. Biol. Cybern. 82:15–21, 2000.
Korenberg, M. J., R. David, I. W. Hunter, and J. E. Solomon. Automatic classification of protein sequences into structure/function groups via parallel cascade identification: A feasibility study. Ann. Biomed. Eng. 28:803–811, 2000.
Krogh, A., M. Brown, I. S. Mian, K. Sjölander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol. 235:1501–1531, 1994.
McLachlan, A. D. Multichannel Fourier analysis of patterns in protein sequences. J. Phys. Chem. 97:3000–3006, 1993.
Palm, G. On representation and approximation of nonlinear systems. Part II. Discrete time. Biol. Cybern. 34:49–52, 1979.
Regelson, M. E. Protein structure/function classification using hidden Markov models. PhD thesis, The Beckman Institute, California Institute of Technology, Pasadena, 1997.
Rose, G. D., A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus. Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985.
Saraste, M., P. R. Sibbald, and A. Wittinghofer. The P-loop—A common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15:430–435, 1990.
Sternberg, M. J. E., and S. A. Islam. Protein sequences—Homologies and motifs. Trends Biotechnol. 9:300–302, 1991.
Stultz, C. M., J. V. White, and T. F. Smith. Structural analysis based on state-space modeling. Protein Sci. 2:305–314, 1993.
Walker, J. E., M. Saraste, M. J. Runswick, and N. J. Gay. Distantly related sequences in the α and β subunits of ATP synthase, myosin, kinases, and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1:945–951, 1982.
Weiss, S. M., and C. A. Kulikowski. Computer Systems That Learn. San Francisco: Morgan Kaufmann, 1991, 223 pp.
White, J. V., C. M. Stultz, and T. F. Smith. Protein classification by stochastic modeling and optimal filtering of amino-acid sequences. Math. Biosci. 119:35–75, 1994.
Cite this article
Green, J.R., Korenberg, M.J., David, R. et al. Recognition of Adenosine Triphosphate Binding Sites Using Parallel Cascade System Identification. Annals of Biomedical Engineering 31, 462–470 (2003). https://doi.org/10.1114/1.1561293
- Nonlinear system identification
- Parallel cascade identification
- ATP-binding sites
- SARAH codes
- Protein sequence analysis