Analyzing the EGEE Production Grid Workload: Application to Jobs Submission Optimization

  • Diane Lingrand
  • Johan Montagnat
  • Janusz Martyniak
  • David Colling
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5798)

Abstract

Grid reliability remains an order of magnitude below that of clusters on production infrastructures. This work aims at improving grid application performance by improving the job submission system. A stochastic model capturing the behavior of a complex grid workload management system is proposed. To instantiate the model, detailed statistics are extracted from dense grid activity traces. The model is then exploited in a simple job resubmission strategy. It provides quantitative inputs to improve job submission performance and enables quantifying the impact of faults and outliers on grid operations.

Keywords

Grid Infrastructure; Computing Element; Production Grid; Total Latency; Real Time Monitor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Diane Lingrand (1)
  • Johan Montagnat (1)
  • Janusz Martyniak (2)
  • David Colling (2)
  1. University of Nice - Sophia Antipolis / CNRS, France
  2. The Blackett Lab, Imperial College London, UK
