Prediction of Queue Waiting Times for Metascheduling on Parallel Batch Systems

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8828)

Abstract

Predicting the queue waiting times of jobs submitted to production parallel batch systems is important both for giving users overall estimates and for helping meta-schedulers make scheduling decisions. In this work, we have developed a framework that predicts ranges of queue waiting times for jobs by applying multi-class classification to similar jobs in the history. Our hierarchical prediction strategy first predicts the point wait time of a job using a dynamic k-nearest neighbor (kNN) method. It then performs a multi-class classification using support vector machines (SVMs) over all the job classes. The probabilities given by the SVM for the class predicted using kNN and for its neighboring classes are used to provide a set of predicted wait-time ranges with associated probabilities. We have used these predictions and probabilities in a meta-scheduling strategy that distributes jobs to different queues or sites in a multi-queue or grid environment in order to minimize the jobs' wait times. Experiments with different production supercomputer job traces show that our prediction strategies give correct predictions for about 77–87% of the jobs and achieve about 12% higher accuracy than the next best existing method. Experiments with our meta-scheduling strategy, using different production and synthetic job traces for various system sizes, partitioning schemes, and workloads, show that it reduces the overall average queue waiting time of the jobs by about 47% compared with existing scheduling policies.
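The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the class boundaries in CLASS_BOUNDS, the two-element job features, a fixed k with plain Euclidean distance in place of the dynamic kNN method, and class probabilities passed in as a list where the paper obtains them from an SVM. The queue-selection rule (smallest probability-weighted midpoint) is also hypothetical, since the abstract does not state how the probabilities enter the meta-scheduling decision.

```python
from math import sqrt

# Hypothetical wait-time classes as (low, high) ranges in seconds.
CLASS_BOUNDS = [(0, 600), (600, 3600), (3600, 14400), (14400, 86400)]


def knn_point_wait(history, job, k=3):
    """Point wait time: mean wait of the k most similar past jobs.

    history is a list of (features, wait_seconds); job is a feature tuple.
    Fixed k and Euclidean distance are simplifying assumptions.
    """
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(history, key=lambda rec: dist(rec[0], job))[:k]
    return sum(wait for _, wait in nearest) / len(nearest)


def wait_class(wait):
    """Index of the class whose range contains the wait time."""
    for i, (low, high) in enumerate(CLASS_BOUNDS):
        if low <= wait < high:
            return i
    return len(CLASS_BOUNDS) - 1  # waits beyond the last bound fall in the last class


def ranges_with_probabilities(point_wait, class_probs):
    """Keep the kNN-predicted class and its neighboring classes.

    class_probs stands in for per-class probabilities from a classifier.
    """
    c = wait_class(point_wait)
    picked = [i for i in (c - 1, c, c + 1) if 0 <= i < len(CLASS_BOUNDS)]
    return [(CLASS_BOUNDS[i], class_probs[i]) for i in picked]


def expected_wait(ranges):
    """Probability-weighted midpoint of the ranges, normalised over the kept classes."""
    total = sum(p for _, p in ranges)
    if total == 0:
        raise ValueError("all kept classes have zero probability")
    return sum(((low + high) / 2) * p for (low, high), p in ranges) / total


def pick_queue(predictions):
    """predictions maps queue name to ranges; return the queue with the smallest expected wait."""
    return min(predictions, key=lambda q: expected_wait(predictions[q]))


if __name__ == "__main__":
    # Made-up history: features are (requested processors, requested run time in seconds).
    history = [((4, 600), 120), ((8, 3600), 1800), ((8, 3000), 2400), ((64, 7200), 9000)]
    point = knn_point_wait(history, (8, 3300))
    ranges = ranges_with_probabilities(point, [0.1, 0.6, 0.25, 0.05])
    print(point, ranges)
```

Calling pick_queue with one such list of ranges per candidate queue gives a simple way to compare queues. A real system would derive the features, class boundaries, and probabilities from its own job traces.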

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Supercomputer Education and Research Center, Indian Institute of Science, Bangalore, India
