Cloud Services for Efficient Ab Initio Predictions of 3D Protein Structures

Part of the book series: Computational Biology ((COBO,volume 28))

Abstract

Computational methods for protein structure prediction determine the three-dimensional structure of a protein from its amino acid sequence alone. However, conventional structure calculations can be time-consuming and can demand substantial computational resources, especially when ab initio methods are used. This chapter shows how cloud services can help solve these problems by scaling the computations in Cloud4PSP, a role-based and queue-based system deployed in the Microsoft Azure cloud. It presents the system architecture, the Cloud4PSP processing model, and the results of various scalability tests that speak in favor of this architecture.

Cloud computing is the third wave of the digital revolution

Lowell McAdam, Verizon Communications, 2013

Notes

  1. GUID: Globally Unique Identifier.

Author information

Corresponding author

Correspondence to Dariusz Mrozek.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mrozek, D. (2018). Cloud Services for Efficient Ab Initio Predictions of 3D Protein Structures. In: Scalable Big Data Analytics for Protein Bioinformatics. Computational Biology, vol 28. Springer, Cham. https://doi.org/10.1007/978-3-319-98839-9_5

  • DOI: https://doi.org/10.1007/978-3-319-98839-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98838-2

  • Online ISBN: 978-3-319-98839-9

  • eBook Packages: Computer Science, Computer Science (R0)
