Abstract
It is commonly observed that production Grids are inherently unreliable. This work aims to improve Grid application performance by tuning the job submission system. A stochastic model capturing the behavior of a complex Grid workload management system is proposed. To instantiate the model, detailed statistics are extracted from dense Grid activity traces. The model is then used to optimize a simple job resubmission strategy. It provides quantitative inputs for improving job submission performance and makes it possible to quantify the impact of faults and outliers on Grid operations.
About this article
Cite this article
Lingrand, D., Montagnat, J., Martyniak, J. et al. Optimization of Jobs Submission on the EGEE Production Grid: Modeling Faults Using Workload. J Grid Computing 8, 305–321 (2010). https://doi.org/10.1007/s10723-010-9151-2
DOI: https://doi.org/10.1007/s10723-010-9151-2