Domain Architecture in Homolog Identification

  • N. Song
  • R. D. Sedgewick
  • D. Durand
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4205)


Homology identification is the first step for many genomic studies. Current methods, based on sequence comparison, can result in a substantial number of mis-assignments due to the alignment of homologous domains in otherwise unrelated sequences. Here we propose methods to detect homologs through explicit comparison of domain architecture. We developed several schemes for scoring the similarity of a pair of protein sequences by exploiting an analogy between comparing proteins using their domain content and comparing documents based on their word content. We evaluate the proposed methods using a benchmark of fifteen sequence families of known evolutionary history. The results of these studies demonstrate the effectiveness of comparing domain architectures using these similarity measures. We also demonstrate the importance both of weighting critical domains and of compensating for proteins with large numbers of domains.
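The document analogy in the abstract can be illustrated with a minimal sketch, which is not the authors' scoring scheme. It treats each protein as a bag of domains and compares pairs with Jaccard and cosine similarity, the two measures named in the keywords. The inverse-document-frequency weighting is an assumed stand-in for "weighting critical domains", and the length normalization in the cosine is one plausible reading of "compensating for proteins with large numbers of domains". The protein names and domain lists are hypothetical toy data.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two domain lists: |A & B| / |A | B| on the sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def idf_weights(proteins):
    """IDF-style weight per domain: log(N / number of proteins containing it).

    Assumption: rare domains are treated as more informative, by analogy
    with rare words in text retrieval.
    """
    n = len(proteins)
    df = Counter(d for doms in proteins.values() for d in set(doms))
    return {d: math.log(n / count) for d, count in df.items()}


def cosine(a, b, weights=None):
    """Cosine similarity of domain-count vectors, optionally weighted.

    Domains missing from `weights` default to a weight of 1.0.
    Dividing by the vector norms keeps proteins with many domains
    from dominating the score.
    """
    w = weights or {}
    va = {d: c * w.get(d, 1.0) for d, c in Counter(a).items()}
    vb = {d: c * w.get(d, 1.0) for d, c in Counter(b).items()}
    dot = sum(va[d] * vb[d] for d in va.keys() & vb.keys())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


# Hypothetical toy data: protein name -> ordered list of domain names.
proteins = {
    "kinase_A": ["Pkinase", "SH2", "SH3"],
    "kinase_B": ["Pkinase", "SH2"],
    "adaptor_C": ["SH2", "SH3", "PH"],
    "other_D": ["PH"],
}

weights = idf_weights(proteins)
score_ab = cosine(proteins["kinase_A"], proteins["kinase_B"], weights)
score_ac = cosine(proteins["kinase_A"], proteins["adaptor_C"], weights)
```

In this toy set, SH2 occurs in three of the four proteins and so receives a low weight. The pair sharing the rarer Pkinase domain (`score_ab`) therefore scores higher than the pair sharing only SH2 and SH3 (`score_ac`).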


Keywords: Kinase Family, Cosine Similarity, Domain Architecture, Jaccard Similarity, Homologous Pair





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • N. Song
    • 1
  • R. D. Sedgewick
    • 1
  • D. Durand
    • 1
  1. Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, USA
