Geometric Sieving: Automated Distributed Optimization of 3D Motifs for Protein Function Prediction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


Determining the function of all proteins is a recurring theme in modern biology and medicine, but the sheer number of proteins makes a purely experimental approach impractical. For this reason, current efforts use in silico function prediction to guide and accelerate function determination. One approach is to search functionally uncharacterized protein structures (targets) for substructures (matches) that are geometrically and chemically similar to known active sites (motifs). Finding a match suggests that the target has an active site similar to the motif, and therefore that the two proteins may be functionally homologous.
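The motif-to-target search described above can be illustrated with a minimal sketch. It is not the comparison algorithm used in the paper, which this abstract does not specify. As an assumption for illustration, chemical similarity is reduced to identical residue labels and geometric similarity to a distance-matrix RMSD (dRMSD); the names `drmsd` and `best_match` are hypothetical.

```python
from itertools import permutations
from math import dist, sqrt


def drmsd(a, b):
    """Root-mean-square difference between corresponding pairwise distances.

    a, b: equal-length lists of (x, y, z) points, with at least two points.
    Assumption: dRMSD stands in for whatever geometric measure a real
    comparison algorithm uses; it needs no superposition step.
    """
    n = len(a)
    diffs = [(dist(a[i], a[j]) - dist(b[i], b[j])) ** 2
             for i in range(n) for j in range(i + 1, n)]
    return sqrt(sum(diffs) / len(diffs))


def best_match(motif, target):
    """Brute-force search for the target substructure closest to the motif.

    motif, target: lists of (label, (x, y, z)).
    A candidate match must pair each motif point with a target point that
    carries the same label (the chemical constraint, simplified).
    Returns (dRMSD, tuple of target indices), or None if no labels fit.
    """
    best = None
    for idx in permutations(range(len(target)), len(motif)):
        if any(target[t][0] != m[0] for t, m in zip(idx, motif)):
            continue
        score = drmsd([m[1] for m in motif], [target[t][1] for t in idx])
        if best is None or score < best[0]:
            best = (score, idx)
    return best
```

The exhaustive search is exponential in motif size and is only meant to make the definition of a match concrete; practical matchers prune this search heavily.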

An effective function predictor requires effective motifs: motifs whose geometric and chemical characteristics are detected by comparison algorithms within functionally homologous targets (sensitive motifs) and are not detected within functionally unrelated targets (specific motifs). Designing effective motifs is a difficult open problem. Current approaches select and combine structural, physical, and evolutionary properties to design motifs that mirror the functional characteristics of active sites.
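The two properties above can be made concrete with the standard statistical definitions of sensitivity and specificity. Mapping the abstract's wording onto these formulas is an assumption, and the function name and arguments below are hypothetical.

```python
def sensitivity_specificity(matched, homologs, universe):
    """Score a motif from the set of targets in which it was detected.

    matched:  ids of targets where the motif found a match
    homologs: ids of targets known to be functionally homologous
    universe: ids of every target that was searched
    Returns (sensitivity, specificity); a value is NaN when undefined.
    """
    non_homologs = universe - homologs
    tp = len(matched & homologs)        # homologs that were detected
    fn = len(homologs - matched)        # homologs that were missed
    fp = len(matched & non_homologs)    # unrelated targets that matched
    tn = len(non_homologs - matched)    # unrelated targets correctly skipped
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

A motif that matches every target has perfect sensitivity and zero specificity, which is why both values need to be reported together.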

We present a new approach, Geometric Sieving (GS), which refines candidate motifs into optimized motifs with maximal geometric and chemical dissimilarity from all known protein structures. The paper discusses both the usefulness and the efficiency of GS. We show that candidate motifs from six well-studied proteins, including α-Chymotrypsin, Dihydrofolate Reductase, and Lysozyme, can be optimized with GS into motifs that are among the most sensitive and specific obtainable from those candidates. For the same proteins, we also report results relating evolutionarily important motifs to motifs that exhibit maximal geometric and chemical dissimilarity from all known protein structures. Our observations to date show that GS is a powerful tool that can complement existing work on motif design and protein function prediction.
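The refinement idea can be sketched as a subset search. This is a hedged illustration only, not the authors' GS procedure, whose details are not given in this abstract. It assumes a fixed optimized motif size `k`, a caller-supplied `distance(subset, structure)` function, and the median as the summary of dissimilarity across the database; all three are hypothetical choices.

```python
from itertools import combinations
from statistics import median


def sieve(candidate, k, database, distance):
    """Pick the k-point subset of a candidate motif least like the database.

    candidate: sequence of motif points (any representation)
    k:         size of the optimized motif (assumed fixed here)
    database:  iterable of known structures, reusable across subsets
    distance:  hypothetical callable (subset, structure) -> float, larger
               meaning more geometrically and chemically dissimilar
    Returns (best_subset, best_score), where best_score is the median
    distance from best_subset to the database structures.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(candidate, k):
        # Each subset is scored independently of the others, so these
        # evaluations could be spread across workers.
        score = median(distance(subset, s) for s in database)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Ties are resolved in favor of the first subset enumerated; a real implementation would need a deliberate tie-breaking rule.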


Keywords (machine-generated, not supplied by the authors): Message Passing Interface; Functional Homolog; Motif Point; Motif Design; Protein Function Prediction





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. Department of Computer Science, Rice University, Houston, USA
  2. Department of Statistics, Rice University
  3. Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, USA
  4. Department of Molecular and Human Genetics, Baylor College of Medicine
  5. Department of Bioengineering, Rice University
