A Trace-Based Investigation Of The Characteristics Of Grid Workflows

  • Simon Ostermann
  • Radu Prodan
  • Thomas Fahringer
  • Alexandru Iosup
  • Dick Epema

Grid computing promises to enable a reliable and easy-to-use computational infrastructure for e-Science. To materialize this promise, grids need to provide full automation from the experiment design to the final result. Often, this automation relies on the execution of workflows, that is, of jobs comprising many inter-related computing and data transfer tasks. While several grid workflow execution tools already exist, not much is known about their workload. This lack of knowledge hampers the development of new workflow scheduling algorithms, and slows the tuning of existing ones. To address this situation, in this work we present an analysis of two workflow-based workload traces from the Austrian Grid. We introduce a method for analyzing such traces, focused on the intrinsic and on the environment-related characteristics of the workflows. Then, we analyze the workflows executed in the Austrian Grid over the last two years. Finally, we identify six categories of workflows based on their intrinsic workflow characteristics. We show that the six categories exhibit distinctive environment-related characteristics, and identify the categories that are difficult to execute for common workflow schedulers.

Keywords

  • grid
  • workload traces
  • workflow execution
  • workflow characteristics
  • statistic analysis

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Simon Ostermann (1)
  • Radu Prodan (1)
  • Thomas Fahringer (1)
  • Alexandru Iosup (2)
  • Dick Epema (2)

  1. University of Innsbruck
  2. Delft University of Technology
