Bioinformatics: A Challenge to Constraint Programming

  • Pedro Barahona
  • Ludwig Krippahl
  • Olivier Perriquet
Part of the Springer Optimization and Its Applications book series (SOIA, volume 45)


Bioinformatics is a rapidly growing field at the intersection of biology and computer science. As such, it poses a wealth of problems, opportunities, and challenges for both areas. This paper overviews some of these issues, with an emphasis on those that seem most amenable to constraint programming (CP) approaches and where CP has made some progress. Since bioinformatics is tightly focused on real-life applications, this paper does not expand on theoretical principles but, rather, tries to give an idea of the practical issues. At this light, the paper briefly presents the selected problems together with the solutions found so far, that illustrate the versatility of CP techniques that have been used in this area and the need to integrate them with other complementary techniques to handle realistic applications.


Local Search Protein Data Bank Structure Prediction Constraint Programming Protein Structure Prediction 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197CrossRefGoogle Scholar
  2. 2.
    Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 45:810–825zbMATHCrossRefMathSciNetGoogle Scholar
  3. 3.
    Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227(4693):1435–1441CrossRefGoogle Scholar
  4. 4.
    Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410Google Scholar
  5. 5.
    Roland HC Yap (2001) Parametric sequence alignment with constraints. Constraints 6(2–3):157–172zbMATHCrossRefMathSciNetGoogle Scholar
  6. 6.
    Will S, Busch A, Backofen R (2008) Efficient sequence alignment with side-constraints by cluster tree elimination. Constraints 13(1–2):110–129zbMATHCrossRefMathSciNetGoogle Scholar
  7. 7.
    Carlsson M, Beldiceanu N (2004) Multiplex dispensation order generation for pyrosequencing. In: CP’2004 workshop on CSP techniques with immediate application, Toronto, Canada, 27 September 2004Google Scholar
  8. 8.
    Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2004) GenBank: update. Nucleic Acids Res 32(Database issue):D23–D26Google Scholar
  9. 9.
    Gent IP, Prosser P, Smith BM, Wei W (2003) Supertree construction using constraint programming. In: Proc CP2003. Lecture notes in computer science, vol 2833. Springer, BerlinGoogle Scholar
  10. 10.
    Moore NC, Prosser P (2008) The ultrametric constraint and its application to phylogenetics. J Artif Intell Res 32:901–938zbMATHMathSciNetGoogle Scholar
  11. 11.
    Brooks DR, Erdem E, Erdogan ST, Minett JW, Ringe D (2007) Inferring phylogenetic trees using answer set programming. J Automat Reason 39(4):471–511zbMATHCrossRefMathSciNetGoogle Scholar
  12. 12.
    Wu G, You JH, Lin G (2007) Quartet-based phylogeny reconstruction with answer set programming. IEE/ACM Trans Comput Biol Bioinform 4(1):139–152CrossRefGoogle Scholar
  13. 13.
    Clark AG (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 77:111–122Google Scholar
  14. 14.
    Gusfield D (2003) Haplotype inference by pure parsimony. In: 14th Annual symposium on combinatorial pattern matching (CPM03). Springer, Heidelberg, pp 144–155Google Scholar
  15. 15.
    Huang Y-T et al (2005) An approximation algorithm for haplotype inference by maximum parsimony. J Comput Biol 12:1261–1274CrossRefGoogle Scholar
  16. 16.
    Lancia G et al (2004) Haplotyping populations by pure parsimony: complexity of exact and approximation algorithms. INFORMS J Comput 16:348–359CrossRefMathSciNetGoogle Scholar
  17. 17.
    Cilibrasi R et al (2005) On the complexity of several haplotyping problems. In: 5th Workshop on algorithms in bioinformatics (WABI 2005). Springer, Mallorca, pp 128–139Google Scholar
  18. 18.
    Sharan R et al (2006) Islands of tractability for parsimony haplotyping. IEEE/ACM Trans Comput Biol Bioinform 3:303–311CrossRefGoogle Scholar
  19. 19.
    Brown D, Harrower I (2006) Integer programming approaches to haplotype inference by pure parsimony. IEEE/ACM Trans Comput Biol Bioinform 3(2):141–154CrossRefGoogle Scholar
  20. 20.
    Wang L, Xu Y (2003) Haplotype inference by maximum parsimony. Bioinformatics 19:1773–1780CrossRefGoogle Scholar
  21. 21.
    Lynce I, Marques-Silva J (2006) Efficient haplotype inference with Boolean satisfiability. In: AAAI conference on artificial intelligence, pages 104109, July 2006Google Scholar
  22. 22.
    Lynce I, Marques-Silva J, Prestwich S (2008) Boosting haplotype inference with local search. Constraints 13(1):155–179zbMATHCrossRefMathSciNetGoogle Scholar
  23. 23.
    Lynce I, Graa A, Marques-Silva J, Oliveira AL (2008) Haplotype inference with boolean constraint solving: an overview. In: Proceedings of 20th IEEE international conference on tools with artificial intelligence (ICTAI 08), Dayton, OH, 2008Google Scholar
  24. 24.
    Erdem E, Erdem O, Türe F (2009)In: HAplo-ASP: haplotype inference using answer set programming, LPNMR09. Lecture notes in computer science, vol 5753. Springer, Berlin, pp 573–578Google Scholar
  25. 25.
    Benedettini S, Roli A, Di Gaspero L (2008) Two-level ACO for haplotype inference under pure parsimony. In: ANTS conference, 2008, pp 179–190Google Scholar
  26. 26.
    Climer S, Jäger G, Templeton AR, Zhang W (2009) How frugal is mother nature with haplotypes? Bioinformatics 25(1):68–74CrossRefGoogle Scholar
  27. 27.
    Eddy SR (2001) Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2(12):919–929CrossRefGoogle Scholar
  28. 28.
    Tinoco I, Bustamante C (1999) How RNA folds. J Mol Biol 293:271–281CrossRefGoogle Scholar
  29. 29.
    Moore PB (1999) The RNA folding problem. In: The RNA world, 2nd edn. CSHL Press, Cold Spring Harbor, pp 381–401Google Scholar
  30. 30.
    Waterman MS (1995) RNA secondary structure. In: Introduction to computational biology. Chapman and Hall, London, pp 327–343Google Scholar
  31. 31.
    Wu M, Tinoco I (1998) RNA folding causes secondary structure rearrangement. Proc Natl Acad Sci USA 95:11555–11560CrossRefGoogle Scholar
  32. 32.
    Capriotti E, Marti-Renom MA (2008) Computational RNA structure prediction. Curr Bioinform 3(1):32–45CrossRefGoogle Scholar
  33. 33.
    Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77:6309–6313CrossRefGoogle Scholar
  34. 34.
    Jaeger JA, Turner DH, Zuker M (1989) Improved predictions of secondary structures for RNA. Proc Natl Acad Sci USA 86:7706–7710CrossRefGoogle Scholar
  35. 35.
    Knudsen B, Hein J (1999) RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15(6):446–454CrossRefGoogle Scholar
  36. 36.
    Gaspin C, Westhof E (1995) An interactive framework for RNA secondary structure prediction with a dynamical treatment of constraints. J Mol Biol 254(2):163–174CrossRefGoogle Scholar
  37. 37.
    Gaspin C (2001) RNA secondary structure determination and representation based on constraints satisfaction. Constraints 6(2–3):201–221zbMATHCrossRefMathSciNetGoogle Scholar
  38. 38.
    Thebault P, de Givry S, Schiex T, Gaspin C (2006) Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics 22(17):2074–2080CrossRefGoogle Scholar
  39. 39.
    Zytnicki M, Gaspin C, Schiex T (2008) Darn! a weighted constraint solver for RNA motif localization. Constraints 13(1–2):91–109zbMATHCrossRefMathSciNetGoogle Scholar
  40. 40.
    Billoud B, Kontic M, Viari A (1996) Palingol: declarative programming language to describe nucleic acids secondary structures and to scan sequence databases. Nucleic Acids Res 24(8):1395–1404CrossRefGoogle Scholar
  41. 41.
    Harmanci AO, Sharma G, Mathews DH (2007) Efficient pairwise RNA structure prediction using probabilistic alignment constraints in dynalign. BMC Bioinformatics 8:130CrossRefGoogle Scholar
  42. 42.
    Dowell RD, Eddy SR (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7:400CrossRefGoogle Scholar
  43. 43.
    Major F, Turcotte M, Gautheret D, Lapalme G, Fillion E, Cedergren R (1991) The combination of symbolic and numerical computation for three-dimensional modeling of RNA. Science 253:1255–1260CrossRefGoogle Scholar
  44. 44.
    Shapiro BA, Yingling YG, Kasprzak W, Bindewald E (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol 17(2):157–165-2pc]Reference [44] is not cited in text. Please provide appropriate text citation or remove from list.Google Scholar
  45. 45.
    Gautheret D, Major F, Cedergren R (1993) Modeling the threedimensional structure of RNA using discrete nucleotide conformational sets. J Mol Biol 229:1049–1064CrossRefGoogle Scholar
  46. 46.
    Shapiro BA, Yingling YG, Kasprzak W, Bindewald E (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol 17(2):157–165CrossRefGoogle Scholar
  47. 47.
    Gutell RR, Power A, Hertz GZ, Putz EJ, Stormo GD (1992) Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Res 20:5785–5795CrossRefGoogle Scholar
  48. 48.
    Leontis NB, Lescoute A, Westhof E (2006) The building blocks and motifs of RNA architecture. Curr Opin Struct Biol 16(3):279–287CrossRefGoogle Scholar
  49. 49.
    Lazaridis T, Karplus M (2000) Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10(2):139–145CrossRefGoogle Scholar
  50. 50.
    Shirts MR, Pande VS (2000) Screen savers of the world, unite! Science 290:1903–1904CrossRefGoogle Scholar
  51. 51.
    Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS (1995) Principles of protein folding – a perspective of simple exact models. Protein Sci 4:561–602CrossRefGoogle Scholar
  52. 52.
    Lau KF, Dill KA (1989) A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22:3986–3997CrossRefGoogle Scholar
  53. 53.
    Crescenzi P, Goldman D, Papadimitriou C, Piccolboni A, Yannakakis M ( 1998) On the complexity of protein folding. J Comput Biol 5(3):423–466CrossRefGoogle Scholar
  54. 54.
    Berger B, Leighton T (1998) Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. J Comput Biol 5(3):27–40CrossRefGoogle Scholar
  55. 55.
    Yue K, Dill KA (1996) Folding proteins with a simple energy function and extensive conformational search. Protein Sci 5(2):254–261CrossRefGoogle Scholar
  56. 56.
    Abkevitch VI, Gutin AM, Shakhnovich EI (1995) Impact of local and non-local interactions oin thermodynamics and kinetics of protein folding. J Mol Biol 252:460–471CrossRefGoogle Scholar
  57. 57.
    Unger R, Moult J (1996) Local interactions dominate folding in a simple protein model. J Mol Biol 259:988–994CrossRefGoogle Scholar
  58. 58.
    Hinds DA, Levitt M (1996) From structure to sequence and back again. J Mol Biol 258:201–209CrossRefGoogle Scholar
  59. 59.
    Bornberg-Bauer E (1997) Chain growth algorithms for HP-type lattice proteins. In: Proceedings of RECOMB97. 1st International conference on Research in computational molecular biology, pp 47–55Google Scholar
  60. 60.
    Backofen R (1998) Constraint techniques for solving the protein structure prediction problem. In: Proceedings of CP98. Lecture notes in computer science, vol 1520, pp 72–86Google Scholar
  61. 61.
    Backofen R, Will S (2002) Excluding symmetries in constraint-based search. Constraints 7(3):333–349zbMATHCrossRefMathSciNetGoogle Scholar
  62. 62.
    Backofen R, Will S, Bornberg-Bauer E (1999) Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets. Bioinformatics 15(3):234–242CrossRefGoogle Scholar
  63. 63.
    Bagci Z, Jernigan RL, Bahar I (2002) Residue coordination in proteins conforms to the closest packing of spheres. Polymer 43:451–459CrossRefGoogle Scholar
  64. 64.
    Cipra B (1998) Packing challenge mastered at last. Science 281:1267Google Scholar
  65. 65.
    Park BH, Levitt M (1995) The complexity and accuracy of discrete state models of protein structure. J Mol Biol 249:493–507CrossRefGoogle Scholar
  66. 66.
    Cooperativity in protein-folding kinetics. Proc Natl Acad Sci USA 90:1942–1946 (1993)Google Scholar
  67. 67.
    Unger R, Moult J (1993) Genetic algorithms for protein folding simulations. J Mol Biol 231:75–81CrossRefGoogle Scholar
  68. 68.
    Agarwala R, Batzoglou S, Dancik V, Decatur SE, Farach M, Hannenhalli S, Muthukrishnan S, Skiena S (1997) Local rules for protein folding on a triangular lattice and generalized hydrophobicity in the HP-model. J Comput Biol 4(2):275–296CrossRefGoogle Scholar
  69. 69.
    Backofen R, Will S (2006) A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Constraints 11(1):5–30zbMATHCrossRefMathSciNetGoogle Scholar
  70. 70.
    Backofen R, Will S (2001) Fast, constraint-based threading of HP-sequences to hydrophobic cores. In: Proceedings of CP01. Lecture notes in computer science, vol 2239, pp 494–508Google Scholar
  71. 71.
    Cebrian M, Dotu I, Van Hentenryck P, Clote P (2008) Protein structure prediction on the face centered cubic lattice by local search. In: Proceedings of AAAI08, pp 241–245Google Scholar
  72. 72.
    Dot I, Cebrin M, Van Hentenryck P, Clote P (2008) Protein structure prediction with large neighborhood constraint programming search. In: Proceedings of CP08. Lecture notes in computer science, vol 5202, pp 82–96Google Scholar
  73. 73.
    Dal Pal A, Dovier A, Fogolari F (2004) Constraint logic programming approach to protein structure prediction. BMC Bioinformatics 5:186CrossRefGoogle Scholar
  74. 74.
    Dal Pal A, Dovier A, Pontelli E (2007) A constraint solver for discrete lattices, its paralelization and application to protein structure prediction. Software Pract Ex 37(13):1405–1449CrossRefGoogle Scholar
  75. 75.
    Cipriano R, Pal AD, Dovier A (2008) A hybrid approach mixing local search and constraint programming applied to the protein structure prediction problem. In: Proceedings of WCB08, Paris, May 2008Google Scholar
  76. 76.
    Zhang Y (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9:40CrossRefGoogle Scholar
  77. 77.
    Fischer D (2006) Servers for protein structure prediction. Curr Opin Struct Biol 16:178–182CrossRefGoogle Scholar
  78. 78.
    Bonneau R, Tsai J, Ruczinski I, Chivian D, Rohl C, Strauss CE, Baker D (2001) Rosetta in CASP4: progress in ab initio protein structure prediction. Proteins 45(S5)119–126Google Scholar
  79. 79.
    Moult J (2005) A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 15:285–289CrossRefGoogle Scholar
  80. 80.
    Skolnick J, Kolinski A, Kihara D, Betancourt M, Rotkiewicz P, Boniecki M (2001) Ab initio protein structure prediction via a combination of threading, lattice folding, clustering, and structure refinement. PROTEINS Suppl 5:149–156CrossRefGoogle Scholar
  81. 81.
    Gntert P, Mumenthaler C, Wthrich K (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 273:283–298CrossRefGoogle Scholar
  82. 82.
    Krippahl L, Barahona P (2002) PSICO: solving protein structures with constraint programming and optimisation. Constraints 7:317–331zbMATHCrossRefMathSciNetGoogle Scholar
  83. 83.
    Krippahl L, Barahona P (2003) Propagating N-ary rigid-body constraints. In: Francesca Rossi (ed) CP’2003: principles and practice of constraint programming, October 2003. Lecture notes in computer science, vol 2833. Springer, pp 452–465Google Scholar
  84. 84.
    Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992) Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc Natl Acad Sci USA 89(6):2195–2199CrossRefGoogle Scholar
  85. 85.
    Krippahl L, Barahona P (2005) Applying constraint programming to rigid body protein docking. In: van Beek P (ed) CP’2005: principles and practice of constraint programming. Lecture notes in computer science, vol 3709. Springer, Berlin, pp 373–387Google Scholar
  86. 86.
    Krippahl L, Moura JJ, Palma PN (2003) Modeling protein complexes with bigger. Proteins 52(1):19–23CrossRefGoogle Scholar
  87. 87.
    Dominguez C, Boelens R, Bonvin AMJJ (2003) HADDOCK: a protein–protein docking approach based on biochemical and/or biophysical information. J Am Chem Soc 125:1731–1737CrossRefGoogle Scholar
  88. 88.
    de Vries SJ, van Dijk ADJ, Krzeminski M, van Dijk M, Thureau A, Hsu V, Wassenaar T, Bonvin AMJJ (2007) HADDOCK versus HADDOCK: New features and performance of HADDOCK2.0 on the CAPRI targets. Proteins 69:726–733CrossRefGoogle Scholar
  89. 89.
    Kitano H (ed) (2001) Foundations of system biology. MIT Press, CamdridgeGoogle Scholar
  90. 90.
    Bower JM, Bolouri H (eds) (2001) Computational modeling of genetic and biochemical networks. MIT Press, CamdridgeGoogle Scholar
  91. 91.
    Kauffman SA (1993) The origins of order. Oxford University Press, New YorkGoogle Scholar
  92. 92.
    Thieffry D, Thomas R (1998) Qualitative analysis of gene networks. Pac Symp Biocomput 3:77–88Google Scholar
  93. 93.
    Reddy VN, Mavrovouniotis ML, Liebman ML (1993) Petri net representation in metabolic pathways. Proc Int Conf Intell Syst Mol Biol 1:328–336Google Scholar
  94. 94.
    Regev A, Silverman W, Shapiro E (2001) Representation and simulation of bio-chemical processes using the pcalculus process algebra. Pac Symp Biocomput 6:459–470Google Scholar
  95. 95.
    Cardelli L (2005) Abstract machines of systems biology. Trans Comput Syst Biol 3737: 145–168Google Scholar
  96. 96.
    Calzonne L, Fages F, Soliman S (2006) BIOCHAM. An environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics 22(14):1805–1807Google Scholar
  97. 97.
    Cruz J, Barahona P (2003) Constraint satisfaction differential problems. In: Proceedings of CP03. Lecture notes in computer science, vol 2833, pp 259–273Google Scholar
  98. 98.
    Cruz J, Barahona P (2005) Constraint reasoning in deep biomedical models. Artif Intell Med 34:77–88CrossRefGoogle Scholar
  99. 99.
    Bockmayr A, Courtois A (2002) Using hybrid concurrent constraint programming to model dynamic biological systems. In: ICLP02. Lecture notes in computer science, vol 2401, pp 85–99Google Scholar
  100. 100.
    Dooms G, Deville Y, Dupont P (2005) CP (Graph): introducing a graph computation domain in constraint programming. In: Proceedings of CP05. Lecture notes in computer science, vol 3709, pp 211–225Google Scholar
  101. 101.
    Gebser M, Schaub T, Thiele S, Usadel B, Veber P (2008) Detecting inconsistencies in large influence networks with answer set programming. In: International conference on logic programming, 2008Google Scholar
  102. 102.
    Dworschak S, Grell S, Nikiforova VJ, Schaub T, Selbig J (2008) Modeling biological networks by action languages via answer set programming. Constraints 13(1–2):21–65zbMATHCrossRefMathSciNetGoogle Scholar
  103. 103.
    Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media LLC 2011

Authors and Affiliations

  • Pedro Barahona
    • 1
  • Ludwig Krippahl
  • Olivier Perriquet
  1. 1.Centro de Inteligência Artificial, Dep. de InformáticaUniversidade Nova de LisboaLisbonPortugal

Personalised recommendations