A Branch-and-Reduce Algorithm for the Contact Map Overlap Problem

  • Wei Xie
  • Nikolaos V. Sahinidis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


A fundamental problem in molecular biology is the comparison of 3-dimensional protein folds in order to develop similarity measures and exploit them for protein clustering, database searches, and drug design. Contact map overlap (CMO) is one of the most reliable and robust measures of protein structure similarity. Fold comparison can be done by aligning the amino acid residues of two proteins in a way that maximizes the number of common residue contacts. CMO maximization is gaining increasing attention because it results in protein clusterings in good agreement with classification by experts. However, CMO maximization is an \({\mathcal{NP}}\)-hard problem and few exact algorithms exist for solving this problem to global optimality.
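To make the objective concrete, the following is a minimal sketch of the overlap count for tiny inputs. It assumes the standard formulation in which an alignment pairs residues of the two proteins in an order-preserving, one-to-one way, and a contact of the first protein counts as common when its two aligned residues are also in contact in the second protein. The function names `overlap` and `brute_force_cmo` and the toy contact lists are hypothetical. The exhaustive search is only an illustration of the definition and is not the branch-and-reduce algorithm of the paper.

```python
from itertools import combinations


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A whose aligned images are contacts of protein B.

    alignment is a sequence of (residue_in_a, residue_in_b) pairs.
    """
    mapping = dict(alignment)
    # Contacts are undirected, so store B's contacts as unordered pairs.
    b_pairs = {frozenset(c) for c in contacts_b}
    count = 0
    for i, j in contacts_a:
        if i in mapping and j in mapping:
            if frozenset((mapping[i], mapping[j])) in b_pairs:
                count += 1
    return count


def brute_force_cmo(n_a, contacts_a, n_b, contacts_b):
    """Enumerate every order-preserving alignment (assumed formulation).

    Exponential in the protein lengths; usable only on toy instances.
    """
    best_score, best_alignment = 0, ()
    for k in range(1, min(n_a, n_b) + 1):
        # Pairing two sorted k-subsets position by position yields exactly
        # the order-preserving one-to-one alignments of size k.
        for rows in combinations(range(n_a), k):
            for cols in combinations(range(n_b), k):
                alignment = tuple(zip(rows, cols))
                score = overlap(contacts_a, contacts_b, alignment)
                if score > best_score:
                    best_score, best_alignment = score, alignment
    return best_score, best_alignment


# Hypothetical toy data: residues are numbered from 0 along each chain.
toy_a = [(0, 2), (1, 3)]   # contacts in a 4-residue protein
toy_b = [(0, 3), (2, 4)]   # contacts in a 5-residue protein
score, alignment = brute_force_cmo(4, toy_a, 5, toy_b)
```

On this toy instance both contacts of the first protein can be matched at once, so the returned score is 2. The number of alignments grows exponentially with protein length, which is why exact methods rely on bounding and problem reduction rather than enumeration.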

In this paper, we propose an exact branch-and-reduce algorithm for the CMO problem. In contrast to previous approaches, we do not transform CMO into another combinatorial optimization problem in order to solve it. Instead, we address the problem directly in its natural form. By exploiting the problem’s mathematical structure, we develop bounding and reduction procedures that lead to a very efficient algorithm. We present extensive computational results for over 36,000 test problems from the literature. These results demonstrate that our algorithm is significantly faster than the best previous algorithms for CMO and solves many more of the challenging test sets. Furthermore, the algorithm yields protein clusters that are in excellent agreement with the SCOP database.


Keywords: Exact Algorithm; Node Versus; SCOP Database; Maximum Clique Problem; Contact Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wei Xie (1)
  • Nikolaos V. Sahinidis (1)
  1. Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana
