Efficient Dynamic Pinning of Parallelized Applications by Distributed Reinforcement Learning

Article
Part of the following topical collections:
  1. Special Issue on High-Level Programming for Heterogeneous Parallel Systems

Abstract

This paper introduces a resource allocation framework tailored to the problem of dynamic placement (or pinning) of parallelized applications to processing units. Under the proposed setup, each thread of the parallelized application constitutes an independent decision maker (or agent), which decides on which processing unit to run next based on its own prior performance measurements and its own prior CPU affinities. Decisions are updated recursively for each thread by a resource manager/scheduler, which runs in parallel to the application’s threads, periodically records their performance, and assigns them new CPU affinities. To update the CPU affinities, the scheduler uses a distributed reinforcement-learning algorithm, with one branch per thread responsible for assigning that thread a new placement strategy. The proposed framework is flexible enough to address alternative optimization criteria, such as maximum average processing speed and minimum speed variance among threads. We demonstrate analytically that convergence to locally optimal placements is achieved asymptotically. Finally, we validate these results through experiments on Linux platforms.
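The abstract does not state the update rule itself, so the following is only a minimal sketch of the general idea, assuming a linear reward-inaction update from the learning-automata literature in place of the paper's actual algorithm. Each thread keeps a probability vector over processing units, samples a placement from it, and shifts probability toward the chosen unit in proportion to a reward normalized to [0, 1]. The names `update_strategy`, `choose_cpu`, `run_scheduler`, and `toy_speed`, the step size, and the toy reward model are all hypothetical.

```python
import random


def update_strategy(strategy, chosen, reward, step=0.1):
    """Linear reward-inaction update (an assumed stand-in, not the paper's rule).

    Moves probability mass toward `chosen` in proportion to `reward`,
    which must be normalized to [0, 1]. The result still sums to 1.
    """
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must be normalized to [0, 1]")
    return [
        p + step * reward * ((1.0 if i == chosen else 0.0) - p)
        for i, p in enumerate(strategy)
    ]


def choose_cpu(strategy, rng):
    """Sample a processing-unit index according to the thread's strategy."""
    return rng.choices(range(len(strategy)), weights=strategy, k=1)[0]


def toy_speed(thread, placement):
    """Hypothetical reward: a thread's speed drops when it shares a unit."""
    return 1.0 / placement.count(placement[thread])


def run_scheduler(num_threads, num_cpus, measure_speed, rounds=200, seed=0):
    """Scheduler loop: sample placements, measure, update each thread's strategy."""
    rng = random.Random(seed)
    # Every thread starts with a uniform strategy over the processing units.
    strategies = [[1.0 / num_cpus] * num_cpus for _ in range(num_threads)]
    for _ in range(rounds):
        placement = [choose_cpu(s, rng) for s in strategies]
        for t in range(num_threads):
            # Each update uses only thread t's own choice and measurement.
            reward = measure_speed(t, placement)
            strategies[t] = update_strategy(strategies[t], placement[t], reward)
    return strategies


if __name__ == "__main__":
    for t, s in enumerate(run_scheduler(2, 2, toy_speed)):
        print(t, [round(p, 3) for p in s])
```

In a real Linux deployment, the sampled index would be applied as a CPU affinity (for instance with `os.sched_setaffinity`), and the reward would come from measured performance rather than the toy model used here.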

Keywords

Dynamic pinning · Reinforcement learning · Parallel applications

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Software Competence Center Hagenberg GmbH, Hagenberg, Austria
