Protein Data Condensation for Effective Quaternary Structure Classification

  • Fabrizio Angiulli
  • Valeria Fionda
  • Simona E. Rombo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4881)


Many proteins are composed of two or more subunits, each associated with a distinct polypeptide chain. The number and arrangement of the subunits forming a protein are referred to as its quaternary structure. Quaternary structure is important because it characterizes the biological function of a protein when the protein is involved in specific biological processes. Unfortunately, quaternary structures cannot be trivially deduced from protein amino acid sequences. In this work, we propose a protein quaternary structure classification method that exploits the functional domain composition of proteins. It is based on a nearest neighbor condensation technique, which reduces both the portion of the dataset to be stored and the number of comparisons to be carried out. The approach appears promising: it achieves high classification accuracy without requiring the entire dataset to be analyzed. Indeed, experimental evaluations show that the proposed method selects only a small portion of the dataset for classification (about 6.43%) while remaining highly accurate (97.74%).
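The abstract does not spell out the condensation algorithm itself. As a rough illustration only, the Python sketch below encodes each protein as a binary functional-domain-composition vector, condenses the training set with Hart's classic condensed nearest neighbor (CNN) rule, and classifies a query with the 1-NN rule against the retained prototypes. The choice of CNN, the Hamming distance, the placeholder domain identifiers `D1` to `D5`, the class labels, and all helper names (`domain_composition`, `condense`, `nearest_label`) are assumptions made for this example; the method evaluated in the paper may use a different condensation algorithm and distance.

```python
from typing import List, Sequence, Set, Tuple

Vector = Tuple[int, ...]
Example = Tuple[Vector, str]  # (domain-composition vector, quaternary class)


def domain_composition(domains: Set[str], vocabulary: Sequence[str]) -> Vector:
    """Binary vector: position k is 1 iff the protein contains vocabulary[k]."""
    return tuple(1 if d in domains else 0 for d in vocabulary)


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions at which two binary vectors differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def nearest_label(x: Vector, prototypes: List[Example]) -> str:
    """1-NN rule; on distance ties, min() keeps the earliest prototype."""
    return min(prototypes, key=lambda p: hamming(x, p[0]))[1]


def condense(data: List[Example]) -> List[Example]:
    """Hart's condensed nearest neighbor (CNN) rule, not necessarily the paper's.

    Starts from the first example and repeatedly adds every training example
    that the current subset misclassifies, until a full pass adds nothing.
    Skipping already-kept indices guarantees termination even if identical
    vectors carry conflicting labels.
    """
    kept = [0]
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(data):
            if i in kept:
                continue
            if nearest_label(x, [data[j] for j in kept]) != y:
                kept.append(i)
                changed = True
    return [data[j] for j in kept]


# Hypothetical toy data: "D1".."D5" stand in for functional domain IDs.
VOCAB = ["D1", "D2", "D3", "D4", "D5"]
TRAINING = [
    (domain_composition({"D1"}, VOCAB), "monomer"),
    (domain_composition({"D1", "D2"}, VOCAB), "monomer"),
    (domain_composition({"D3"}, VOCAB), "homodimer"),
    (domain_composition({"D3", "D4"}, VOCAB), "homodimer"),
    (domain_composition({"D1", "D2", "D5"}, VOCAB), "monomer"),
]

prototypes = condense(TRAINING)
print(f"kept {len(prototypes)} of {len(TRAINING)} examples "
      f"({100 * len(prototypes) / len(TRAINING):.2f}%)")  # kept 2 of 5 examples (40.00%)
print(nearest_label(domain_composition({"D3", "D5"}, VOCAB), prototypes))  # homodimer
```

On this toy set the condensed subset keeps 2 of the 5 training examples while still classifying all five correctly, which mirrors, on a tiny scale, the trade-off between stored data and accuracy that the abstract reports.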


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Fabrizio Angiulli (1)
  • Valeria Fionda (2)
  • Simona E. Rombo (1)
  1. DEIS - Università della Calabria, Via P. Bucci 41C, 87036 Rende (CS), Italy
  2. Dept. of Mathematics, Via P. Bucci 31B, 87036 Rende (CS), Italy
