Multi-domain Protein Family Classification Using Isomorphic Inter-property Relationships

  • Harpreet Singh
  • Pradeep Chowriappa
  • Sumeet Dua
Part of the Communications in Computer and Information Science book series (CCIS, volume 40)


Multi-domain proteins result from the duplication and combination of a complex but limited set of domains. The ability to distinguish multi-domain homologs from unrelated pairs that share a domain is essential to genomic analysis. Heuristics based on sequence similarity and alignment coverage have been proposed to screen out domain insertions, but they have met with limited success. In this paper, we propose a unique protein classification schema for multi-domain protein superfamilies. Segmented profiles of physico-chemical properties and amino acid composition are created and then reduced in dimensionality by vector quantization, yielding a feature profile for rule discovery and classification. Association rules are mined to identify isomorphic relationships that govern the formation of domains between proteins, so that homologous pairs are correctly predicted and unrelated pairs, including those that share domains, are rejected. Our results demonstrate that conserved domain classes can be classified effectively using these feature profiles, and that the classifier is not susceptible to the class imbalances frequently encountered in these databases.
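The feature-profile step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the segment count, the property scales, or the codebook, so the Kyte-Doolittle hydropathy scale, the equal-length segmentation, the nearest-codeword quantizer, and all function names below are assumptions chosen for illustration. Sequences are assumed to contain only the 20 standard residues. The association-rule mining stage is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as an assumed stand-in
# for the physico-chemical scales of the paper (not given in the abstract).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment(seq, n_segments):
    """Split seq into n_segments contiguous pieces of near-equal length."""
    if n_segments < 1 or len(seq) < n_segments:
        raise ValueError("sequence must have at least one residue per segment")
    bounds = [i * len(seq) // n_segments for i in range(n_segments + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def segment_features(seg):
    """Return 20 composition fractions followed by the mean hydropathy."""
    counts = Counter(seg)
    n = len(seg)
    composition = [counts[a] / n for a in AMINO_ACIDS]
    mean_hydropathy = sum(KD[a] for a in seg) / n
    return composition + [mean_hydropathy]


def quantize(vec, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, cw)) for cw in codebook]
    return dists.index(min(dists))


def feature_profile(seq, n_segments, codebook):
    """Map each segment of seq to a codeword index."""
    return [quantize(segment_features(s), codebook)
            for s in segment(seq, n_segments)]


# Toy codebook (hypothetical): one hydrophobic and one hydrophilic codeword.
toy_codebook = [segment_features("IIII"), segment_features("KKKK")]
profile = feature_profile("IIIIKKKKLLVV", 3, toy_codebook)
print(profile)
```

In this sketch each protein is reduced to a short sequence of codeword indices, one per segment. Discrete symbols of this kind are the sort of items over which association rules could then be mined; in practice the codebook would be learned from training data (for example by k-means) rather than fixed by hand.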


Keywords: Multi-domain proteins, supervised classification, association rules





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Harpreet Singh (1)
  • Pradeep Chowriappa (1)
  • Sumeet Dua (1, 2)
  1. Data Mining Research Laboratory (DMRL), Department of Computer Science, Louisiana Tech University, Ruston, U.S.A.
  2. School of Medicine, Louisiana State University Health Sciences Center, New Orleans, U.S.A.
