A Periodic Portfolio Scheduler for Scientific Computing in the Data Center

  • Kefeng Deng
  • Ruben Verboon
  • Kaijun Ren
  • Alexandru Iosup
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8429)

Abstract

The popularity of data centers in scientific computing has led to new architectures, new workload structures, and growing customer bases. As a consequence, selecting efficient scheduling algorithms for the data center is an increasingly costly and difficult challenge. To address this challenge, and in contrast to previous work on scheduling for scientific workloads, we focus in this work on portfolio scheduling, which here means the dynamic selection and use of a scheduling policy from a portfolio of multiple policies, depending on the current system and workload conditions. We design a periodic portfolio scheduler for the workload of the entire data center and equip it with a portfolio of resource provisioning and allocation policies. Through simulation based on real and synthetic workload traces, we show evidence that portfolio scheduling can automatically select the scheduling policy to match both user and data center objectives, and that portfolio scheduling can perform well in the data center relative to its constituent policies.
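The abstract describes portfolio scheduling only at a high level. The following is a minimal Python sketch under assumptions of our own, not the authors' implementation: each hypothetical `Policy` pairs a provisioning decision (a fixed number of leased VMs) with an allocation rule (a queue ordering such as first-come, first-served (FCFS) or shortest job first (SJF)); every candidate is evaluated by simulating the current queue; and a hypothetical utility that trades mean wait time against VM-hours picks the winner. The names `Job`, `Policy`, `simulate`, `utility`, `select_policy`, `run_periodic`, `PORTFOLIO`, and the `cost_weight` parameter are all illustrative.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Job:
    job_id: int
    arrival: float  # submission time, in hours
    runtime: float  # runtime estimate, in hours


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a provisioning choice plus an allocation rule."""
    name: str
    num_vms: int                       # provisioning: VMs leased for the batch
    order_key: Callable[[Job], float]  # allocation: order in which jobs are placed


def simulate(queue: List[Job], policy: Policy, now: float) -> Tuple[float, float]:
    """Greedy list scheduling of `queue` from time `now`; returns (mean wait, VM-hours)."""
    if not queue:
        return 0.0, 0.0
    free_at = [now] * policy.num_vms  # per-VM free time; equal values form a valid min-heap
    total_wait, finish = 0.0, now
    for job in sorted(queue, key=policy.order_key):
        start = heapq.heappop(free_at)  # earliest-available VM
        end = start + job.runtime
        heapq.heappush(free_at, end)
        total_wait += start - job.arrival
        finish = max(finish, end)
    return total_wait / len(queue), policy.num_vms * (finish - now)


def utility(mean_wait: float, vm_hours: float, cost_weight: float) -> float:
    """Hypothetical objective: lower wait and lower cost are both better."""
    return -(mean_wait + cost_weight * vm_hours)


def select_policy(queue: List[Job], portfolio: List[Policy], now: float,
                  cost_weight: float = 0.1) -> Policy:
    """Simulate every policy on the current queue and keep the highest-utility one."""
    return max(portfolio, key=lambda p: utility(*simulate(queue, p, now), cost_weight))


def run_periodic(jobs: List[Job], portfolio: List[Policy], period: float,
                 horizon: float, cost_weight: float = 0.1) -> List[Tuple[float, str]]:
    """At each tick, pick a policy for the jobs that arrived during the last period."""
    decisions, tick = [], period
    while tick <= horizon:
        batch = [j for j in jobs if tick - period <= j.arrival < tick]
        if batch:
            chosen = select_policy(batch, portfolio, tick, cost_weight)
            decisions.append((tick, chosen.name))
        tick += period
    return decisions


PORTFOLIO = [
    Policy("FCFS-2VM", 2, lambda j: j.arrival),
    Policy("SJF-2VM", 2, lambda j: j.runtime),
    Policy("SJF-8VM", 8, lambda j: j.runtime),
]

if __name__ == "__main__":
    queue = [Job(0, 0.0, 4.0), Job(1, 0.5, 1.0), Job(2, 0.5, 1.0)]
    print(select_policy(queue, PORTFOLIO, now=1.0, cost_weight=0.1).name)  # FCFS-2VM
    print(select_policy(queue, PORTFOLIO, now=1.0, cost_weight=0.0).name)  # SJF-8VM
```

On this small queue, a cost-aware objective (`cost_weight=0.1`) selects the cheap two-VM FCFS policy, while a wait-time-only objective (`cost_weight=0.0`) selects the eight-VM policy, illustrating how the chosen policy follows the objective. Treating each period's arrivals as an independent batch is a deliberate simplification; a data-center scheduler would also carry running jobs and leased VMs across periods.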

Keywords

Portfolio scheduling · Data center · Provisioning and allocation · Scheduling policies · Scientific workloads

Notes

Acknowledgments

Supported by the STW/NWO Veni grant 11881, the Dutch national research program COMMIT, the Commission of the European Union (Project No. 320013, FP7 REGIONS Programme, PEDCA), the National Natural Science Foundation of China (Grant No. 60903042 and 61272483), and the R&D Special Fund for Public Welfare Industry (Meteorology) GYHY201306003.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Kefeng Deng (1, 2)
  • Ruben Verboon (2)
  • Kaijun Ren (1)
  • Alexandru Iosup (2)

  1. National University of Defense Technology, Changsha, China
  2. Delft University of Technology, Delft, The Netherlands