Journal of Grid Computing, Volume 1, Issue 1, pp. 25–39

Mapping Abstract Complex Workflows onto Grid Environments

  • Ewa Deelman
  • James Blythe
  • Yolanda Gil
  • Carl Kesselman
  • Gaurang Mehta
  • Karan Vahi
  • Kent Blackburn
  • Albert Lazzarini
  • Adam Arbree
  • Richard Cavanaugh
  • Scott Koranda

Abstract

In this paper we address the problem of automatically generating job workflows for the Grid. These workflows describe the execution of a complex application built from individual application components. In our work we have developed two workflow generators: the first, the Concrete Workflow Generator (CWG), maps an abstract workflow, defined in terms of application-level components, onto the set of available Grid resources. The second, the Abstract and Concrete Workflow Generator (ACWG), takes a wider perspective: it not only performs the abstract-to-concrete mapping but also constructs the abstract workflow itself from the available components. ACWG operates in the application domain and chooses application components based on their metadata attributes. We describe our current ACWG, which is based on AI planning technologies, and outline how these technologies can play a crucial role in developing complex application workflows in Grid environments. Although our work is preliminary, CWG has already been used to map high-energy physics applications onto the Grid. In one experiment, a set of production runs lasted 7 days and generated 167,500 events across 678 jobs. Additionally, ACWG was used to map gravitational physics workflows with hundreds of nodes onto the available resources, resulting in 975 tasks, 1,365 data transfers, and 975 output files.

Keywords: complex applications, planning, reliability, workflow management
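To picture the abstract-to-concrete mapping step the abstract describes, the sketch below binds a workflow of application-level components to Grid sites and inserts a data-transfer step for any input not yet staged at the chosen site. This is a minimal illustration only, not the authors' CWG code; the names (AbstractJob, SITE_CATALOG, REPLICA_CATALOG, map_workflow), the catalog contents, and the first-match site-selection rule are all assumptions of this sketch.

```python
# Illustrative sketch of abstract-to-concrete workflow mapping.
# All identifiers here are hypothetical, not the authors' actual API.
from dataclasses import dataclass, field

@dataclass
class AbstractJob:
    """An application-level component, not yet bound to any Grid site."""
    name: str                                   # logical transformation name
    inputs: list = field(default_factory=list)  # logical file names consumed
    outputs: list = field(default_factory=list) # logical file names produced

# Hypothetical catalogs: which sites can run which component, and where
# pre-existing logical files are already replicated.
SITE_CATALOG = {"extract": ["uwm-pool"], "analyze": ["uwm-pool"]}
REPLICA_CATALOG = {"raw.dat": "isi-cluster"}

def map_workflow(abstract_jobs):
    """Bind each abstract job to a concrete site, inserting a transfer
    step for any input file not already present at that site."""
    concrete, produced_at = [], {}
    for job in abstract_jobs:
        sites = SITE_CATALOG.get(job.name)
        if not sites:
            raise ValueError(f"no site provides component {job.name!r}")
        site = sites[0]  # a real planner would rank candidate sites
        for lfn in job.inputs:
            location = produced_at.get(lfn) or REPLICA_CATALOG.get(lfn)
            if location and location != site:
                concrete.append(("transfer", lfn, location, site))
        concrete.append(("execute", job.name, site))
        for lfn in job.outputs:
            produced_at[lfn] = site
    return concrete

wf = [AbstractJob("extract", inputs=["raw.dat"], outputs=["seg.dat"]),
      AbstractJob("analyze", inputs=["seg.dat"], outputs=["result.dat"])]
for step in map_workflow(wf):
    print(step)
```

Running the sketch yields one transfer of raw.dat to the execution site followed by the two execute steps. A production mapper would additionally consult live resource and replica catalogs and rank candidate sites; the first-match rule above merely stands in for that decision.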


Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Ewa Deelman (1)
  • James Blythe (2)
  • Yolanda Gil (2)
  • Carl Kesselman (2)
  • Gaurang Mehta (2)
  • Karan Vahi (2)
  • Kent Blackburn (3)
  • Albert Lazzarini (3)
  • Adam Arbree (4)
  • Richard Cavanaugh (4)
  • Scott Koranda (5)

  1. Information Sciences Institute, University of Southern California, USA
  2. Information Sciences Institute, University of Southern California, USA
  3. California Institute of Technology, Pasadena, USA
  4. Department of Physics, University of Florida, Gainesville, USA
  5. Department of Physics, University of Wisconsin, Milwaukee, USA
