Exploiting the Parallel Execution of Homology Workflow Alternatives in HPC Compute Clouds

  • Kary A. C. S. Ocaña
  • Daniel de Oliveira
  • Vítor Silva
  • Silvia Benza
  • Marta Mattoso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8954)


Homology modeling (HM) plays an important role in drug discovery. HM analysis aims at predicting a 3D model from a biological sequence in order to discover new drugs. Executing HM analyses at large scale raises several problems, such as the multiple software packages to be evaluated, the management of the parallel execution, and the analysis of results, e.g. manually browsing all results to find which structure was derived from which program and whether it has good quality. Scientific Workflow Management Systems (SWfMS) with parallelism and provenance support can aid large-scale HM executions by addressing the result analysis. However, before the HM workflow is submitted for execution, it has to be specified along with its several alternatives (also called variants), as considered in this paper. Managing HM workflow variations is a complex task, even with the help of a SWfMS. In this paper, we propose SciSamma (Structural Approach and Molecular Modeling Analyses), an abstract representation of HM workflows inspired by the concept of software product lines (SPL). SciSamma models HM workflow variants to be executed in parallel in the cloud using the SciCumulus SWfMS. We evaluated SciSamma with two common variants using 100 protease enzymes from protozoan genomes. Both variants scaled well, with substantial performance improvements (execution time dropped from 8 h to 27 min using 32 Amazon large virtual machines). Provenance queries over the two workflow variants showed that they produce biological results of the same quality, but their execution times differed by around 40 %.
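
The abstract does not describe SciSamma's concrete representation, so the following is only a hypothetical sketch of the general SPL idea it draws on: an abstract workflow whose variation points can be bound to alternative programs, with each combination of bindings yielding one concrete workflow variant. The activity names and the alternatives `program_A` and `program_B` are invented for illustration and do not correspond to the programs used in the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VariationPoint:
    """An abstract activity that can be bound to one of several programs."""
    name: str
    alternatives: tuple[str, ...]


# Hypothetical abstract HM workflow: an ordered list of abstract activities.
ABSTRACT_WORKFLOW = ["search_templates", "build_model", "validate_model"]

# Hypothetical variation point: the model-building step has two alternatives.
VARIATION_POINTS = {
    "build_model": VariationPoint("build_model", ("program_A", "program_B")),
}


def derive_variants(workflow, variation_points):
    """Enumerate every concrete workflow obtained by binding each variation
    point to one of its alternatives; other activities are kept unchanged."""
    choices = [
        variation_points[a].alternatives if a in variation_points else (a,)
        for a in workflow
    ]
    return [list(combo) for combo in product(*choices)]


variants = derive_variants(ABSTRACT_WORKFLOW, VARIATION_POINTS)
for v in variants:
    print(" -> ".join(v))
```

With one variation point and two alternatives this yields two variants, mirroring the two-variant evaluation described above; adding further variation points multiplies the number of derived workflows, which is why managing them by hand quickly becomes impractical.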

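The reported execution times can be turned into the usual speedup and parallel-efficiency figures. This is a back-of-the-envelope sketch that assumes the 8 h figure is a single-VM baseline for the same workload and that the 27 min figure uses all 32 VMs; the abstract does not state the baseline configuration explicitly.

```python
def speedup(t_baseline_min: float, t_parallel_min: float) -> float:
    """Ratio of baseline wall-clock time to parallel wall-clock time."""
    return t_baseline_min / t_parallel_min


def efficiency(s: float, n_workers: int) -> float:
    """Parallel efficiency: speedup divided by the number of workers."""
    return s / n_workers


T_BASELINE_MIN = 8 * 60  # 8 h from the abstract, ASSUMED to be the 1-VM run
T_PARALLEL_MIN = 27      # 27 min from the abstract, with 32 large VMs
N_VMS = 32

s = speedup(T_BASELINE_MIN, T_PARALLEL_MIN)
e = efficiency(s, N_VMS)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.0%}")
```

Under that assumption the run is roughly 17.8 times faster, i.e. about 56 % parallel efficiency on 32 VMs.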

Keywords: Cloud · Workflow · Homology modeling · Provenance data


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Kary A. C. S. Ocaña (1, corresponding author)
  • Daniel de Oliveira (2)
  • Vítor Silva (1)
  • Silvia Benza (1)
  • Marta Mattoso (1)

  1. Federal University of Rio de Janeiro (COPPE/UFRJ), Rio de Janeiro, Brazil
  2. Computing Institute, Fluminense Federal University (UFF), Niterói, Brazil