Risk Aware Overbooking for Commercial Grids

  • Georg Birkenheuer
  • André Brinkmann
  • Holger Karl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6253)


The commercial exploitation of the emerging Grid and Cloud markets requires service level agreements (SLAs) to sell computing run times. Job traces show that users have a limited ability to estimate the resource needs of their applications. This opens the possibility of applying overbooking during negotiation, but overbooking increases the risk of SLA violations. This work presents an overbooking approach with an integrated risk assessment model. Simulations of this model, based on real-world job traces, show that overbooking offers significant opportunities for Grid and Cloud providers.


Execution Time; Probability Density Function; Time Slice; Joint Probability Density Function; Cumulative Distribution Function
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Georg Birkenheuer (1)
  • André Brinkmann (1)
  • Holger Karl (1)
  1. Paderborn Center for Parallel Computing (PC2), Universität Paderborn, Germany
