Volume 97, Issue 4, pp 337–355

Reasoning task dependencies for robust service selection in data intensive workflows

  • Mingzhong Wang
  • Liehuang Zhu
  • Kotagiri Ramamohanarao


Selecting appropriate services for task execution in workflows should not only satisfy budget and deadline constraints, but also maximize the probability that the workflow succeeds and minimize the potential loss when exceptions occur. This requirement is especially critical for data-intensive applications in grids or clouds, where any failure is costly. We therefore design a fine-grained risk evaluation model, customized for workflows, that precisely computes the cost of failure for the selected services. In contrast to the current coarse-grained model, ours takes task dependencies into account and assigns a higher impact factor to tasks at the end of the workflow. We then build a utility function on this model and apply a genetic algorithm to find optimized service allocations that maximize the robustness of the workflow while minimizing its risk of failure. Experiments and analysis show that applying the customized risk evaluation model to service selection generally improves the success probability of a workflow while reducing its exposure to risk.
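The intuition that a failure late in a workflow costs more than an early one can be illustrated with a small sketch. This is not the model defined in the paper. It assumes, purely for illustration, that a failed task wastes its own cost plus the cost of every task it transitively depends on, and that service failures are independent. The names `ancestors`, `workflow_success`, `expected_loss`, and the example costs and reliabilities are all hypothetical.

```python
from math import prod


def ancestors(task, parents):
    """Return every task that `task` transitively depends on."""
    seen = set()
    stack = list(parents.get(task, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def workflow_success(reliability):
    """Probability that every task succeeds (independence assumed)."""
    return prod(reliability.values())


def expected_loss(parents, cost, reliability):
    """Expected wasted cost under the illustrative assumption that a
    failing task wastes its own cost plus that of all its ancestors."""
    total = 0.0
    for task in cost:
        anc = ancestors(task, parents)
        p_reach = prod(reliability[a] for a in anc)  # all ancestors succeeded
        wasted = cost[task] + sum(cost[a] for a in anc)
        total += p_reach * (1.0 - reliability[task]) * wasted
    return total


# Hypothetical three-task chain a -> b -> c, each task costing 10 units.
parents = {"b": ["a"], "c": ["b"]}
cost = {"a": 10.0, "b": 10.0, "c": 10.0}

# The same unreliable service (0.9) is placed first, then last.
weak_first = {"a": 0.9, "b": 1.0, "c": 1.0}
weak_last = {"a": 1.0, "b": 1.0, "c": 0.9}

loss_first = expected_loss(parents, cost, weak_first)
loss_last = expected_loss(parents, cost, weak_last)
print(workflow_success(weak_first), workflow_success(weak_last))
print(loss_first, loss_last)
```

Both allocations give the workflow the same success probability, yet under these assumptions the expected loss is roughly three times higher when the unreliable service runs the last task. A coarse-grained measure that looks only at overall success probability cannot tell the two apart. A utility function for a genetic algorithm could combine both quantities, but the weighting is left out here because the abstract does not specify it.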


  • Risk evaluation
  • Robust service selection
  • Workflows
  • Task dependency

Mathematics Subject Classification (2010)

68M14 Distributed systems 



The research reported in this paper is supported by the National Science Foundation of China under Grants No. 61100172 and No. 61272512. A preliminary version of this paper appeared in the 2012 IPDPS Workshop of Large Scale Distributed Service-oriented Systems.



Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  • Mingzhong Wang (1)
  • Liehuang Zhu (1)
  • Kotagiri Ramamohanarao (2)

  1. School of Computer Science, Beijing Institute of Technology, Beijing, China
  2. Department of Computing and Information Systems, The University of Melbourne, Victoria, Australia
