Abstract
Production parallel systems are space-shared and therefore use batch queues, in which submitted jobs wait before they execute. Jobs submitted to parallel batch systems thus incur queue waiting times in addition to their execution times. Predicting these queue waiting times is important for giving users overall estimates and can also help metaschedulers make scheduling decisions. Analyses of supercomputer job traces reveal that about 56% to 99% of jobs incur queue waiting times of less than an hour. Hence, identifying these quick starters, that is, jobs with short queue waiting times, is essential for improving queue waiting time predictions overall. Existing strategies substantially overestimate the upper bounds of queue waiting times, which makes the bounds less useful for jobs with short queue waiting times. In this work, we have developed an integrated framework that uses job characteristics together with the states of the queue and processor occupancy to identify quick starters and predict their queue waiting times, and that uses existing strategies to predict the waiting times of jobs with long queue waits. Our experiments with job traces from different production supercomputers show that our prediction strategies can correctly identify up to 20 times more quick starters and provide tighter bounds for these jobs, resulting in up to 64% higher overall prediction accuracy than existing methods.
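The trace statistic quoted above (the share of jobs that wait less than an hour) can be illustrated with a minimal sketch. It assumes the trace is in the Standard Workload Format used by the Parallel Workloads Archive, where comment lines start with `;`, the third whitespace-separated field is the wait time in seconds, and `-1` marks an unknown value. The function name `quick_starter_fraction` and the constant are hypothetical names chosen for illustration. This only counts quick starters in a historical trace; it is not the prediction framework described in the paper.

```python
from typing import Iterable

# One hour: the cutoff the abstract uses for a short queue wait.
QUICK_START_THRESHOLD_S = 3600


def quick_starter_fraction(swf_lines: Iterable[str],
                           threshold_s: int = QUICK_START_THRESHOLD_S) -> float:
    """Return the fraction of jobs whose queue wait is below threshold_s."""
    total = 0
    quick = 0
    for line in swf_lines:
        line = line.strip()
        # Skip blank lines and SWF header/comment lines (they start with ';').
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        wait_s = float(fields[2])  # assumed: SWF field 3 is wait time in seconds
        if wait_s < 0:  # -1 marks an unknown value in SWF
            continue
        total += 1
        if wait_s < threshold_s:
            quick += 1
    return quick / total if total else 0.0
```

Running this over each trace separately would give one percentage per system, which is how a range such as the one reported above might be obtained.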
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, R., Vadhiyar, S. (2013). Identifying Quick Starters: Towards an Integrated Framework for Efficient Predictions of Queue Waiting Times of Batch Parallel Jobs. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2012. Lecture Notes in Computer Science, vol 7698. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35867-8_11
DOI: https://doi.org/10.1007/978-3-642-35867-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35866-1
Online ISBN: 978-3-642-35867-8
eBook Packages: Computer Science (R0)