Journal of Grid Computing, Volume 14, Issue 3, pp 443–461

A Dynamic Cloud Dimensioning Approach for Parallel Scientific Workflows: a Case Study in the Comparative Genomics Domain

  • Rafaelli Coutinho (corresponding author)
  • Yuri Frota
  • Kary Ocaña
  • Daniel de Oliveira
  • Lúcia M. A. Drummond


Scientists often need to execute experiments that demand high-performance computing environments and parallel techniques. This is the case for many bioinformatics experiments modeled as scientific workflows, such as phylogenetic and phylogenomic analyses. To execute these experiments, scientists have adopted virtual machines (VMs) instantiated in clouds. Estimating the number of VMs to instantiate is crucial, since under- or overestimation harms both execution performance and financial cost. In previous work, the number of VMs needed to execute bioinformatics workflows was estimated by a GRASP heuristic coupled to a Cloud-based Parallel Scientific Workflow Management System. Although that work was a step forward, it provided only a static dimensioning. If the characteristics of the environment change (for example, processing capacity or network speed), the static dimensioning may no longer be suitable. It is therefore desirable to adjust the dimensioning at runtime. To this end, we developed a novel framework for monitoring and dynamically dimensioning resources during the execution of parallel scientific workflows in clouds, called Dynamic Dimensioning of Cloud Computing Framework (DDC-F). We evaluated DDC-F in real executions of bioinformatics workflows. The experiments showed that DDC-F efficiently calculates the number of VMs necessary to execute Comparative Genomics (CG) bioinformatics workflows and also reduces financial costs compared with other approaches in the related literature.


Cloud computing; Virtual machine allocation; Scientific workflows





Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Rafaelli Coutinho (1, corresponding author)
  • Yuri Frota (2)
  • Kary Ocaña (3)
  • Daniel de Oliveira (2)
  • Lúcia M. A. Drummond (2)

  1. Federal Center of Technological Education (CEFET), Rio de Janeiro, Brazil
  2. Institute of Computing, Fluminense Federal University, Niterói, Brazil
  3. National Laboratory of Scientific Computing (LNCC), Petrópolis, Brazil
