Journal of Grid Computing, Volume 5, Issue 2, pp 235–250

Benefits and Drawbacks of Redundant Batch Requests

  • Henri Casanova

Abstract

Most parallel computing platforms are controlled by batch schedulers that place requests for computation in a queue until access to compute nodes is granted. Queue waiting times are notoriously hard to predict, making it difficult for users not only to estimate when their applications may start, but also to pick among multiple batch-scheduled platforms the one that will produce the shortest turnaround time. As a result, an increasing number of users resort to “redundant requests”: several requests are simultaneously submitted to multiple batch schedulers on behalf of a single job; once one of these requests is granted access to compute nodes, the others are canceled. Using simulation as well as experiments with a production batch scheduler we evaluate the impact of redundant requests on (1) average job performance, (2) schedule fairness, (3) system load, and (4) system predictability. We find that some of the popularly held beliefs about the harmfulness of redundant batch requests are unfounded. We also find that the two most critical issues with redundant requests are the additional load on current middleware infrastructures and unfairness towards users who do not use redundant requests. Using our experimental results we quantify both impacts in terms of the number of users who use redundant requests and of the amount of request redundancy these users employ.
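The redundant-request mechanism the abstract describes can be sketched in a few lines: submit the same job request to several batch schedulers at once, poll until one of the requests is granted access to compute nodes, then cancel the rest. The sketch below is a minimal illustration against a toy in-memory scheduler; the class `MockScheduler` and its `submit`/`started`/`cancel` methods are hypothetical stand-ins, not the API of any real batch system (a real client would drive, e.g., `qsub`/`qstat`/`qdel` or a Grid middleware interface).

```python
class MockScheduler:
    """Toy stand-in for a batch scheduler (hypothetical API).
    Each submitted request waits `wait_steps` ticks in the queue
    before it would start running."""

    def __init__(self, name, wait_steps):
        self.name = name
        self.wait_steps = wait_steps
        self.queue = {}        # request id -> remaining queue wait
        self.next_id = 0

    def submit(self, job):
        rid = self.next_id
        self.next_id += 1
        self.queue[rid] = self.wait_steps
        return rid

    def started(self, rid):
        # True once the request has reached the head of the queue
        return self.queue[rid] <= 0

    def cancel(self, rid):
        # Withdraw a pending request (no-op if already gone)
        self.queue.pop(rid, None)

    def tick(self):
        # Advance simulated time by one step
        for rid in self.queue:
            self.queue[rid] -= 1


def run_redundant(job, schedulers, max_steps=100):
    """Submit `job` redundantly to every scheduler; as soon as one
    request starts, cancel all the others and return the name of
    the scheduler that won."""
    requests = {s: s.submit(job) for s in schedulers}
    for _ in range(max_steps):
        for s, rid in requests.items():
            if s.started(rid):
                # One request was granted: cancel the redundant ones
                for other, orid in requests.items():
                    if other is not s:
                        other.cancel(orid)
                return s.name
        for s in schedulers:
            s.tick()
    return None  # no request started within the horizon
```

The cancellation step is exactly where the costs discussed in the paper arise: until the winning request starts, every other scheduler holds a phantom request in its queue, and each cancellation is an extra middleware operation.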

Key words

job scheduling · batch scheduling · redundant requests



Copyright information

© Springer Science + Business Media B.V. 2007

Authors and Affiliations

  1. Department of Information and Computer Sciences, University of Hawai‘i at Manoa, Honolulu, USA
