Evolutionary Algorithms for the Inverse Protein Folding Problem

  • Sune S. Nielsen
  • Grégoire Danoy
  • Wiktor Jurkowski
  • Roland Krause
  • Reinhard Schneider
  • El-Ghazali Talbi
  • Pascal Bouvry
Living reference work entry


Protein structure prediction is an essential step in understanding the molecular mechanisms of living cells, with widespread applications in biotechnology and health. The inverse folding problem (IFP), which consists of finding sequences that fold into a defined structure, is itself an important research problem at the heart of rational protein design. In this chapter, a multi-objective genetic algorithm (MOGA) using the diversity-as-objective (DAO) variant of multi-objectivization is presented. It optimizes secondary structure similarity and sequence diversity at the same time and hence searches deeper in the sequence solution space. To validate the final optimization results, a subset of the best sequences was selected for tertiary structure prediction. Comparing the secondary structure annotation and the tertiary structure of the predicted models with the original protein structure demonstrates that relying on fast approximation during the optimization process still yields meaningful sequences.
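The two objectives named in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the per-position match score, the mean Hamming distance used as the diversity measure, the function names, and the `toy_predict_ss` lookup (a placeholder standing in for a real secondary structure predictor) are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def ss_similarity(predicted: str, target: str) -> float:
    """Fraction of positions where two secondary-structure strings agree."""
    if len(predicted) != len(target):
        raise ValueError("annotations must have equal length")
    if not target:
        return 0.0
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def diversity(index: int, population: Sequence[str]) -> float:
    """Mean normalized Hamming distance from one sequence to all others (assumed measure)."""
    seq = population[index]
    others = [s for i, s in enumerate(population) if i != index]
    if not others or not seq:
        return 0.0
    return sum(hamming(seq, s) for s in others) / (len(others) * len(seq))


def evaluate(population: Sequence[str],
             predict_ss: Callable[[str], str],
             target_ss: str) -> List[Tuple[float, float]]:
    """Score each sequence on (similarity, diversity); both are maximized."""
    return [(ss_similarity(predict_ss(s), target_ss), diversity(i, population))
            for i, s in enumerate(population)]


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance for maximization: no worse everywhere, better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of solutions that no other solution dominates."""
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]


def toy_predict_ss(seq: str) -> str:
    """Hypothetical placeholder predictor: a fixed residue lookup, not a real method."""
    helix, strand = set("ALEK"), set("VIY")
    return "".join("H" if r in helix else "E" if r in strand else "C" for r in seq)


population = ["AAAA", "AAAV", "GGGG"]
scores = evaluate(population, toy_predict_ss, "HHHH")
front = non_dominated(scores)
```

In this toy population the third sequence matches the target poorly but stays on the non-dominated front because it is far from the others, which is the mechanism by which treating diversity as an objective keeps dissimilar candidates in the search.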


Genetic algorithm; Diversity preservation; Inverse folding problem



This work was funded by the National Research Fund of Luxembourg (FNR) as part of the EVOPERF project at the University of Luxembourg, under AFR contract no. 1356145. Experiments were carried out using the HPC facility of the University of Luxembourg [31].


  1. Alba E, Dorronsoro B (2005) The exploration/exploitation tradeoff in dynamic cellular genetic algorithms. IEEE Trans Evol Comput 9(2):126–142
  2. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P (2002) Molecular biology of the cell. Garland Science, New York
  3. Bellows ML, Fung HK, Taylor MS, Floudas CA, Lopez de Victoria A, Morikis D (2010) New compstatin variants through two de novo protein design frameworks. Biophys J 98(10):2337–2346
  4. Bellows ML, Taylor MS, Cole PA, Shen L, Siliciano RF, Fung HK, Floudas CA (2010) Discovery of entry inhibitors for HIV-1 via a new de novo protein design framework. Biophys J 99(10):3445–3453
  5. Bowie JU, Lüthy R, Eisenberg D (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science (New York, N.Y.) 253(5016):164–170
  6. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM – a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4(2):187–217
  7. Chen W, Brühlmann F, Richins RD, Mulchandani A (1999) Engineering of improved microbes and enzymes for bioremediation. Curr Opin Biotechnol 10(2):137–141
  8. De Jong AK (1975) Analysis of the behavior of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor. Dissertation Abstracts International 36(10):5140B, University Microfilms Number 76–9381
  9. Deb K, Saha A (2010) Finding multiple solutions for multimodal optimization problems using a multi-objective evolutionary approach. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, pp 447–454
  10. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
  11. Drexler KE (1981) Molecular engineering: an approach to the development of general capabilities for molecular manipulation. Proc Natl Acad Sci 78(9):5275–5278
  12. Fung HK, Floudas CA, Taylor MS, Zhang L, Morikis D (2008) Toward full-sequence de novo protein design with flexible templates for human beta-defensin-2. Biophys J 94(2):584–599
  13. Goldberg DE, Richardson J (1987) Genetic algorithms with sharing for multimodal function optimization. In: Grefenstette JJ (ed) Genetic algorithms and their applications: proceedings of the second international conference on genetic algorithms. Lawrence Erlbaum, Hillsdale, pp 41–49
  14. Gutte B, Däumigen M, Wittschieber E (1979) Design, synthesis and characterisation of a 34-residue polypeptide that interacts with nucleic acids. Nature 281(5733):650–655
  15. Harbury PB, Plecs JJ, Tidor B, Alber T, Kim PS (1998) High-resolution protein design with backbone freedom. Science 282(5393):1462–1467
  16. Isogai Y, Ota M, Fujisawa T, Izuno H, Mukai M, Nakamura H, Iizuka T, Nishikawa K (1999) Design and synthesis of a globin fold. Biochemistry 38(23):7431–7443
  17. Jones DT (1994) De novo protein design using pairwise potentials and a genetic algorithm. Protein Sci 3:567–574
  18. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12):2577–2637
  19. Klein F, Mouquet H, Dosenovic P, Scheid JF, Scharf L, Nussenzweig CM (2013) Antibodies in HIV-1 vaccine development and therapy. Science (New York, N.Y.) 341(6151):1199–1204
  20. Klepeis JL, Floudas CA, Morikis D, Tsokos CG, Lambris JD (2004) Design of peptide analogues with improved activity using a novel de novo protein design approach. Ind Eng Chem Res 43(14):3817–3826
  21. Kuhlman B, Baker D (2000) Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci 97(19):10383–10388
  22. Laredo JLJ, Nielsen SS, Danoy G, Bouvry P, Fernandes CM (2014) Cooperative selection: improving tournament selection via altruism. Accepted for publication in EvoCOP14 – 14th European conference on evolutionary computation in combinatorial optimisation
  23. Mitra P, Shultis D, Brender JR, Czajka J, Marsh D, Gray F, Cierpicki T, Zhang Y (2013) An evolution-based approach to de novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol 9(10):e1003298
  24. Pabo C (1983) Molecular technology. Designing proteins and peptides. Nature 301(5897):200
  25. Ponder JW, Richards FM (1987) Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 193(4):775–791
  26. Rost B, Sander C (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19(1):55–72
  27. Shimodaira H (1997) DCGA: a diversity control oriented genetic algorithm. In: ICTAI, pp 367–374
  28. Smadbeck J, Peterson MB, Khoury GA, Taylor MS, Floudas CA (2013) Protein WISDOM: a workbench for in silico de novo design of biomolecules. J Vis Exp (77):50476
  29. Su A, Mayo SL (1997) Coupling backbone flexibility and amino acid sequence selection in protein design. Protein Sci 6(8):1701–1707
  30. Toffolo A, Benini E (2003) Genetic diversity as an objective in multi-objective evolutionary algorithms. Evol Comput 11(2):151–167
  31. Varrette S, Bouvry P, Cartiaux H, Georgatos F (2014) Management of an academic HPC cluster: the UL experience. In: Proceedings of the 2014 international conference on high performance computing & simulation (HPCS 2014), Bologna
  32. Voigt CA, Mayo SL, Arnold FH, Wang Z-G (2001) Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci USA 98(7):3778–3783
  33. Wernisch L, Hery S, Wodak S (2000) Automatic protein design with all atom force-fields by exact and heuristic optimization. J Mol Biol 301(3):713–736
  34. Wessing S, Preuss M, Rudolph G (2013) Niching by multiobjectivization with neighbor information: trade-offs and benefits. In: 2013 IEEE congress on evolutionary computation (CEC), pp 103–110
  35. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
  36. Xu J, Zhang Y (2010) How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26(7):889–895
  37. Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y (2015) The I-TASSER suite: protein structure and function prediction. Nat Methods 12(1):7–8
  38. Zemla A (2003) LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 31(13):3370–3374
  39. Zhang Y, Skolnick J (2004) Scoring function for automated assessment of protein structure template quality. Proteins Struct Funct Bioinf 57(4):702–710

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Sune S. Nielsen (1)
  • Grégoire Danoy (1)
  • Wiktor Jurkowski (2)
  • Roland Krause (3)
  • Reinhard Schneider (3)
  • El-Ghazali Talbi (4)
  • Pascal Bouvry (1)

  1. Computer Science and Communications (CSC) Research Unit, FSTC, University of Luxembourg, 6 avenue de la Fonte, L-4364 Esch-sur-Alzette, Luxembourg
  2. The Genome Analysis Centre (TGAC), Norwich Research Park, Norwich, UK
  3. Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6, avenue du Swing, L-4367 Belvaux, Luxembourg
  4. Université des sciences et technologies de Lille, INRIA Lille Nord Europe, Villeneuve d’Ascq, France
