An Efficient Algorithm for Runtime Minimum Cost Data Storage and Regeneration for Business Process Management in Multiple Clouds

  • Junhua Zhang
  • Dong Yuan
  • Lizhen Cui
  • Bing Bing Zhou
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 342)


The proliferation of cloud computing provides flexible ways for users to utilize cloud resources to cope with data-complex applications, such as Business Process Management (BPM) systems. In a BPM system, users may use the system in various ways, such as uploading, generating, processing, transferring, storing, sharing or accessing various kinds of data, and these data may be complex and very large. Due to the pay-as-you-go pricing model of cloud computing, improper usage of cloud resources incurs high costs for users. Because data in typical BPM usage can be regenerated, transferred and stored across multiple clouds, a data storage, transfer and regeneration strategy is needed to reduce the cost of resource usage. The current state-of-the-art algorithm can find a strategy that achieves the minimum data storage, transfer and computation cost; however, it has very high computational complexity and is neither efficient nor practical to apply at runtime. In this paper, by thoroughly investigating the trade-off problem of resource utilization, we propose a Provenance Candidates Elimination algorithm, which can efficiently find the minimum cost strategy for data storage, transfer and regeneration. Through comprehensive experimental evaluation, we demonstrate that our approach can calculate the minimum cost strategy in milliseconds, outperforming the existing algorithm by 2 to 4 orders of magnitude.
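The store-versus-regenerate trade-off mentioned in the abstract can be illustrated with a minimal sketch for a single dataset in a single cloud. This is not the Provenance Candidates Elimination algorithm, whose details are not given on this page. The cost model (storage priced per GB per day, regeneration priced per use) and every name below (`Dataset`, `storage_cost_rate`, `regeneration_cost_rate`, `cheaper_option`) are illustrative assumptions, and the sketch ignores dependencies between datasets and transfers between clouds.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of one generated dataset (assumed fields)."""
    size_gb: float       # size of the dataset in GB
    regen_cost: float    # assumed cost of one regeneration (compute plus transfer)
    uses_per_day: float  # assumed average number of accesses per day


def storage_cost_rate(d: Dataset, price_per_gb_day: float) -> float:
    """Cost per day of keeping the dataset stored."""
    return d.size_gb * price_per_gb_day


def regeneration_cost_rate(d: Dataset) -> float:
    """Cost per day of deleting the dataset and regenerating it on every use."""
    return d.regen_cost * d.uses_per_day


def cheaper_option(d: Dataset, price_per_gb_day: float) -> tuple[str, float]:
    """Return the cheaper of the two options and its daily cost rate."""
    store = storage_cost_rate(d, price_per_gb_day)
    regen = regeneration_cost_rate(d)
    if store <= regen:
        return ("store", store)
    return ("regenerate", regen)


if __name__ == "__main__":
    # Large, rarely used dataset: regenerating on demand is cheaper here.
    rarely_used = Dataset(size_gb=100, regen_cost=0.5, uses_per_day=0.1)
    print(cheaper_option(rarely_used, price_per_gb_day=0.001))

    # Small, frequently used dataset: keeping it stored is cheaper here.
    often_used = Dataset(size_gb=10, regen_cost=2.0, uses_per_day=1.0)
    print(cheaper_option(often_used, price_per_gb_day=0.001))
```

A minimum cost strategy as described in the abstract has to make this kind of decision jointly for many interdependent datasets across several clouds, which is what makes the search expensive and an efficient runtime algorithm valuable.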


Keywords: Cloud computing · Business Process Management · Datasets storage and regeneration



The research work was supported by the National Key R&D Program (2017YFB1400102, 2016YFB1000602), NSFC (61572295), SDNSFC (No. ZR2017ZB0420), and Shandong Major scientific and technological innovation projects (2018YFJH0506).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Junhua Zhang (1)
  • Dong Yuan (2)
  • Lizhen Cui (1)
  • Bing Bing Zhou (2)

  1. School of Computer Science and Technology, Shandong University, Jinan, China
  2. School of Information Technology, The University of Sydney, Sydney, Australia
