Soft Computing, Volume 18, Issue 4, pp 773–795

MOIRAE: A computational strategy to extract and represent structural information from experimental protein templates

  • Márcio Dorn
  • Luciana S. Buriol
  • Luis C. Lamb
Methodologies and Application


The prediction and analysis of the three-dimensional (3D) structure of proteins is a key research problem in Structural Bioinformatics. The genome projects of the 1990s produced a large increase in the number of available protein sequences. However, the number of identified 3D protein structures has not followed the same growth trend, and the number of available protein sequences now greatly exceeds the number of known 3D structures. Many computational methodologies, systems and algorithms have been proposed to address the protein structure prediction problem. The problem nevertheless remains challenging because of the complexity and high dimensionality of the protein conformational search space. The most significant progress in the last Critical Assessment of protein Structure Prediction (CASP) was achieved by methods that use database information. Even so, a major challenge remains in the development of better strategies for template identification and representation. This article describes a computational strategy to acquire and represent structural information from experimentally determined 3D protein structures. A clustering strategy is combined with artificial neural networks to extract structural information from experimental protein structure templates. The main effort focuses on acquiring useful and accurate structural information from 3D protein templates stored in the Protein Data Bank (PDB). The proposed method was tested on twenty protein sequences whose sizes vary from 14 to 70 amino acid residues. The results show that the proposed method is an effective way to extract and represent valuable information obtained from the PDB, and that it significantly reduces the 3D protein conformational search space.
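
The abstract does not specify which clustering algorithm or which structural features are used. As a minimal sketch only, assuming that each template residue is summarized by its backbone torsion angles (phi, psi) in degrees and that a Lloyd-style k-means groups them, the clustering step might look like the following. All function names, the deterministic farthest-point initialization, and the sample angles are illustrative assumptions, not the authors' implementation.

```python
import math


def angular_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(p, q):
    """Euclidean distance between two (phi, psi) pairs, respecting periodicity."""
    return math.hypot(angular_diff(p[0], q[0]), angular_diff(p[1], q[1]))


def circular_mean(angles):
    """Mean of angles in degrees, computed through their unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def farthest_point_init(points, k):
    """Assumed initialization: start at points[0], then repeatedly pick the
    point farthest from all centers chosen so far (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(torsion_distance(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda i: torsion_distance(p, centers[i]))
        for p in points
    ]


def kmeans_torsions(points, k, max_iter=100):
    """Lloyd-style k-means on (phi, psi) pairs; returns (centers, labels)."""
    centers = farthest_point_init(points, k)
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append((
                    circular_mean([m[0] for m in members]),
                    circular_mean([m[1] for m in members]),
                ))
            else:
                # Keep the old center if a cluster becomes empty.
                new_centers.append(centers[i])
        new_labels = assign(points, new_centers)
        centers = new_centers
        if new_labels == labels:
            break  # assignments are stable, so the centers are final
        labels = new_labels
    return centers, labels


if __name__ == "__main__":
    # Made-up angles: four in the typical helical region, four in the
    # typical beta-strand region of the Ramachandran map.
    sample = [(-60, -45), (-65, -40), (-57, -47), (-62, -41),
              (-120, 130), (-125, 135), (-115, 125), (-130, 140)]
    centers, labels = kmeans_torsions(sample, k=2)
    print(centers, labels)
```

In a sketch like this, each cluster center stands for a recurring local conformation, so a search procedure could sample around a handful of centers instead of the full range of both angles. The periodic distance and the circular mean matter because torsion angles wrap around at ±180 degrees.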


  • Three-dimensional protein structure prediction
  • Structural Bioinformatics
  • Hybrid methods in Bioinformatics
  • Artificial Neural Networks



The authors thank MCT/CNPq, CAPES and FAPERGS (Brazil) for financial support.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Márcio Dorn (1), corresponding author
  • Luciana S. Buriol (1)
  • Luis C. Lamb (1)
  1. Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
