Benefits and Drawbacks of Redundant Batch Requests

Journal of Grid Computing

Abstract

Most parallel computing platforms are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick among multiple batch-scheduled platforms the one that will produce the shortest turnaround time. As a result, an increasing number of users resort to “redundant requests”: several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler we evaluate the impact of redundant requests on (1) average job performance, (2) schedule fairness, (3) system load, and (4) system predictability. We find that some of the popularly held beliefs about the harmfulness of redundant batch requests are unfounded. We also find that the two most critical issues with redundant requests are the additional load on current middleware infrastructures and unfairness towards users who do not use redundant requests. Using our experimental results we quantify both impacts in terms of the number of users who use redundant requests and of the amount of request redundancy these users employ.
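The redundant-request mechanism described above can be illustrated with a minimal sketch. It is not the simulator or the scheduler interface used in the article. The names `Request`, `submit_redundant`, and `resolve`, the platform labels, and the waiting times are all hypothetical. The sketch also assumes the waiting times are known up front, which is exactly what users cannot do in practice; here they only stand in for the order in which the schedulers would grant access.

```python
from dataclasses import dataclass


@dataclass
class Request:
    job_id: str
    platform: str
    wait_time: float  # hypothetical queue waiting time, in seconds
    canceled: bool = False


def submit_redundant(job_id, wait_times):
    """Create one request per platform, all on behalf of the same job."""
    return [Request(job_id, platform, wait)
            for platform, wait in wait_times.items()]


def resolve(requests):
    """Keep the request that is granted first and cancel all the others."""
    winner = min(requests, key=lambda r: r.wait_time)
    for r in requests:
        r.canceled = r is not winner
    return winner


# Illustrative values only: three platforms with made-up waiting times.
requests = submit_redundant("job-1", {"A": 3600.0, "B": 900.0, "C": 5400.0})
winner = resolve(requests)
print(winner.platform, [r.platform for r in requests if r.canceled])
```

With these made-up values the job runs on the platform with the shortest wait, while the two canceled requests have still occupied queue slots and generated submission and cancellation traffic, which is the kind of extra load the abstract refers to.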

Author information

Corresponding author

Correspondence to Henri Casanova.

Additional information

This work was supported by the NSF under Award 0546688.

About this article

Cite this article

Casanova, H. Benefits and Drawbacks of Redundant Batch Requests. J Grid Computing 5, 235–250 (2007). https://doi.org/10.1007/s10723-007-9068-6

  • DOI: https://doi.org/10.1007/s10723-007-9068-6
