Evolving Cut-Off Mechanisms and Other Work-Stealing Parameters for Parallel Programs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10199)

Abstract

Optimizing parallel programs is a complex task because of the interference among many different parameters. Work-stealing runtimes, used to dynamically balance load among different processor cores, are no exception. This work explores the automatic configuration of the following runtime parameters: dynamic granularity control algorithms, granularity control cache, work-stealing algorithm, lazy binary splitting parameter, maximum queue size and unparking interval. The performance of a program is highly sensitive to the granularity control algorithm, which can itself be a combination of other granularity algorithms. In this work, we address two search-based problems: finding a globally efficient work-stealing configuration, and finding the best configuration for an individual program. For both problems, we propose the use of a Genetic Algorithm (GA). The genotype of the GA is able to represent combinations of up to three cut-off algorithms, as well as other work-stealing parameters.

The proposed GA has been evaluated on three criteria: its ability to obtain a more efficient solution across a set of programs, its ability to generalize that solution to a larger set of programs, and its ability to evolve a configuration for single programs individually.

The GA was able to improve the performance of the programs in the training set, but the obtained configurations did not generalize to a larger benchmark set. However, it was able to successfully improve the performance of each program individually.
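
The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cut-off and work-stealing names, the value ranges, the tournament selection with elitism, and the use of a summed runtime as the cost are all assumptions made for illustration, and `measure` is a hypothetical callback standing in for actually running a program under a given configuration. Passing several programs corresponds to the global problem; passing a single program corresponds to the per-program problem.

```python
import random
from dataclasses import dataclass, fields, replace

# Placeholder option names (assumptions; the abstract does not list them).
CUTOFFS = ["cutoff_a", "cutoff_b", "cutoff_c", "cutoff_d"]
STEALERS = ["stealer_a", "stealer_b"]

@dataclass(frozen=True)
class Genotype:
    cutoffs: tuple           # combination of 1 to 3 cut-off algorithms
    cutoff_cache: bool       # granularity control cache on/off
    stealing: str            # work-stealing algorithm
    lbs_param: int           # lazy binary splitting parameter
    max_queue_size: int      # maximum queue size
    unparking_interval: int  # unparking interval (unit is an assumption)

FIELDS = [f.name for f in fields(Genotype)]

def random_genotype(rng):
    """Draw a random configuration; all ranges are illustrative guesses."""
    k = rng.randint(1, 3)
    return Genotype(
        cutoffs=tuple(rng.choice(CUTOFFS) for _ in range(k)),
        cutoff_cache=rng.random() < 0.5,
        stealing=rng.choice(STEALERS),
        lbs_param=rng.randint(1, 64),
        max_queue_size=rng.randint(1, 1024),
        unparking_interval=rng.randint(1, 100),
    )

def mutate(g, rng):
    """Resample one randomly chosen gene."""
    name = rng.choice(FIELDS)
    return replace(g, **{name: getattr(random_genotype(rng), name)})

def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return Genotype(**{n: getattr(rng.choice((a, b)), n) for n in FIELDS})

def evolve(programs, measure, rng, pop_size=20, generations=30):
    """Minimise the total measured runtime over `programs`."""
    def cost(g):
        return sum(measure(p, g) for p in programs)

    pop = [random_genotype(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = min(pop, key=cost)  # elitism: keep the best individual

        def pick():
            # Tournament of size 3.
            return min(rng.sample(pop, 3), key=cost)

        children = [mutate(crossover(pick(), pick(), rng), rng)
                    for _ in range(pop_size - 1)]
        pop = [elite] + children
    return min(pop, key=cost)

def toy_measure(program, g):
    """Synthetic stand-in for a real timing run (not real data)."""
    return abs(g.lbs_param - 8) + abs(g.max_queue_size - 100) / 100 + len(g.cutoffs)

if __name__ == "__main__":
    best = evolve(["prog1", "prog2"], toy_measure, random.Random(0))
    print(best)
```

In a real setting `measure` would execute the benchmark and return a wall-clock time, so evaluations would be noisy and expensive; caching results and repeating runs are design choices this sketch leaves out.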

Keywords

Granularity, Cut-off mechanism, Parallel programming, Multicore, Genetic Algorithm

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal