Journal of Grid Computing, Volume 10, Issue 3, pp 367–393

A Distributed Workflow Management System with Case Study of Real-life Scientific Applications on Grids

  • Qishi Wu
  • Mengxia Zhu
  • Yi Gu
  • Patrick Brown
  • Xukang Lu
  • Wuyin Lin
  • Yangang Liu

Abstract

Next-generation scientific applications feature complex workflows composed of many computing modules with intricate inter-module dependencies. Supporting such scientific workflows in wide-area networks, especially Grids, and optimizing their performance are crucial to the success of collaborative scientific discovery. We develop a Scientific Workflow Automation and Management Platform (SWAMP), which enables scientists to conveniently assemble, execute, monitor, control, and steer computing workflows in distributed environments via a unified web-based user interface. The SWAMP architecture is built entirely on a seamless composition of web services: both its own functionalities and its interactions with other tools or systems are provided through web services, which allows easy access over standard Internet protocols independent of platform and programming language. SWAMP also incorporates a class of efficient workflow mapping schemes to achieve optimal end-to-end performance based on rigorous performance modeling and algorithm design. Extensive simulations demonstrate the performance superiority of SWAMP over existing workflow mapping schemes, and large-scale experiments on real-life scientific workflows for climate modeling, implemented, deployed, and tested on the Open Science Grid, illustrate the efficacy of the system.

Keywords

Distributed computing; Scientific workflow; Climate modeling; Open Science Grid

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Qishi Wu
    • 1
  • Mengxia Zhu
    • 2
  • Yi Gu
    • 3
  • Patrick Brown
    • 2
  • Xukang Lu
    • 1
  • Wuyin Lin
    • 4
  • Yangang Liu
    • 4
  1. Department of Computer Science, University of Memphis, Memphis, USA
  2. Department of Computer Science, Southern Illinois University, Carbondale, USA
  3. Department of Management, Marketing, Computer Science, and Information Systems, University of Tennessee at Martin, Martin, USA
  4. Atmospheric Science Division, Brookhaven National Laboratory, Upton, USA
