Ballast: A Ball-Based Algorithm for Structural Motifs

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 7262)

Abstract

Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem as to how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (Ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif matching algorithms.
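
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a ball-based compositional filter might look like, not the authors' implementation. It assumes that a motif and a structure are each a list of labeled 3D points, that the first motif point serves as an anchor, and that a candidate is kept when a ball around a structure residue of the anchor's type contains at least the residue types the motif requires. The names `Residue`, `ball_filter`, and the tolerance `eps` are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, NamedTuple, Tuple


class Residue(NamedTuple):
    kind: str                        # residue type label, e.g. "HIS"
    xyz: Tuple[float, float, float]  # one representative coordinate


def ball_filter(motif: List[Residue], structure: List[Residue],
                eps: float = 1.0) -> List[int]:
    """Return indices of structure residues that could anchor a motif match.

    Assumption: the ball radius is the largest anchor-to-point distance
    in the motif, padded by a tolerance eps.
    """
    anchor = motif[0]
    radius = max(dist(anchor.xyz, m.xyz) for m in motif) + eps
    needed = Counter(m.kind for m in motif)  # composition the ball must cover

    candidates = []
    for i, center in enumerate(structure):
        if center.kind != anchor.kind:
            continue  # the center must match the anchor's residue type
        # Count residue types inside the ball around this center.
        in_ball = Counter(r.kind for r in structure
                          if dist(center.xyz, r.xyz) <= radius)
        # Keep the center only if the ball holds every required type.
        if all(in_ball[k] >= n for k, n in needed.items()):
            candidates.append(i)
    return candidates
```

Centers that survive this filter would still need a geometric check (for example, a least-squares superposition) before being reported as matches; the sketch covers only the compositional pre-filter, and it scans all residues per ball where a practical version would presumably use a spatial index.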

Keywords

  • protein structure
  • structural motif
  • sequence-structure-function relationship
  • geometric matching
  • motif matching algorithm
  • probabilistic analysis

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

He, L., Vandin, F., Pandurangan, G., Bailey-Kellogg, C. (2012). Ballast: A Ball-Based Algorithm for Structural Motifs. In: Chor, B. (ed.) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_9

  • DOI: https://doi.org/10.1007/978-3-642-29627-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29626-0

  • Online ISBN: 978-3-642-29627-7

  • eBook Packages: Computer Science, Computer Science (R0)