Touring Protein Space with Matt

  • Noah Daniels
  • Anoop Kumar
  • Lenore Cowen
  • Matt Menke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6053)

Abstract

Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely geometric, distance-based metrics of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it remained an open question whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level. Implications for the debate over the organization of protein fold space are discussed.
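To make the idea concrete, the following is a minimal sketch of distance-based clustering followed by a comparison against a reference classification. It is not the procedure used in the paper: average-linkage agglomerative clustering with a fixed cutoff is swapped in as a simple stand-in, the agreement score is a pairwise Jaccard index over co-clustered domain pairs, and the domain names, distance values, cutoffs, and function names are all hypothetical.

```python
from itertools import combinations


def average_linkage(dist, names, cutoff):
    """Greedily merge the two closest clusters until the closest pair
    is farther apart than `cutoff` (average-linkage distance)."""
    clusters = [[n] for n in names]

    def d(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: d(clusters[p[0]], clusters[p[1]]))
        if d(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def co_clustered_pairs(clusters):
    """All unordered pairs of items that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def jaccard(clusters_a, clusters_b):
    """Jaccard index between the co-clustered pair sets of two partitions."""
    a, b = co_clustered_pairs(clusters_a), co_clustered_pairs(clusters_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy data: five domains with made-up structural distances.
names = ["d1", "d2", "d3", "d4", "d5"]
dist = {frozenset(p): 5.0 for p in combinations(names, 2)}
dist[frozenset(("d1", "d2"))] = 1.0
dist[frozenset(("d1", "d3"))] = 1.5
dist[frozenset(("d2", "d3"))] = 1.2
dist[frozenset(("d4", "d5"))] = 0.8

# Hypothetical reference grouping, standing in for a curated classification.
reference = [["d1", "d2", "d3"], ["d4", "d5"]]

loose = average_linkage(dist, names, cutoff=2.0)
tight = average_linkage(dist, names, cutoff=1.1)
print(jaccard(loose, reference), jaccard(tight, reference))
```

In this toy setting the looser cutoff recovers the reference grouping exactly, while the tighter cutoff splits one group and lowers the score. Choosing the cutoff is the analogue of choosing the level (family, superfamily, fold) at which the automatic clustering is compared with a curated hierarchy.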

Keywords

Jaccard Index Matt Family Fold Level Protein Space Protein Structural Domain 

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Noah Daniels (1)
  • Anoop Kumar (1)
  • Lenore Cowen (1)
  • Matt Menke (1)
  1. Department of Computer Science, Tufts University, Medford
