Protein Fold Recognition Based Upon the Amino Acid Occurrence

  • Y.-h. Taguchi
  • M. Michael Gromiha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4774)


We investigated how well amino acid occurrence recognizes protein folds relative to other features: predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity, polarizability, and real or predicted residue contact information. We observed that these other features yield only a marginal improvement over amino acid occurrence. This is because amino acid occurrence indirectly reflects a variety of physical properties that are useful for discriminating protein folds. If we consider only proteins that align well structurally with one another, the discrimination accuracy improves drastically. To discriminate protein folds more accurately, we need to consider factors other than structural alignment.
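
As an illustration of the amino acid occurrence feature discussed above, the following minimal sketch turns a sequence into a 20-dimensional vector. It assumes that "occurrence" means the raw count of each of the 20 standard residues and that "composition" means those counts divided by the number of counted residues; the abstract does not spell out either definition. The function names and the toy sequence are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(sequence):
    """Count each standard amino acid (assumed meaning of 'occurrence').

    Non-standard symbols such as 'X' are ignored.
    """
    counts = Counter(sequence.upper())
    return [counts[aa] for aa in AMINO_ACIDS]


def composition(sequence):
    """Occurrence divided by the number of counted residues (assumption)."""
    occ = occurrence(sequence)
    total = sum(occ)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [count / total for count in occ]


# Hypothetical toy sequence, used only to show the call pattern.
example = "MKTAYIAKQR"
print(occurrence(example))
print(composition(example))
```

A vector like this could then be passed to any standard classifier; unlike composition, the unnormalized counts also carry information about chain length.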


Keywords: Linear Discriminant Analysis; Structural Alignment; Solvent Accessible Surface Area; Gibbs Free Energy Change; Fold Recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Y.-h. Taguchi (1)
  • M. Michael Gromiha (2)
  1. Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
  2. Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
