International Journal of Parallel Programming, Volume 47, Issue 5–6, pp 1014–1044

Adaptive Thread Scheduling in Chip Multiprocessors

  • Ismail Akturk
  • Ozcan Ozturk

Abstract

The full potential of chip multiprocessors remains unexploited because the thread schedulers employed in operating systems are architecture-oblivious. We introduce an adaptive, cache-hierarchy-aware scheduler that aims to schedule threads so that inter-thread contention is minimized. A novel multi-metric scoring scheme characterizes the L1 cache access behavior of each thread, and scheduling decisions are made based on these per-thread scores.
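
To make the idea of score-based scheduling concrete, the following is a minimal sketch and not the authors' method. The abstract does not name the metrics, the weights, or the placement policy, so everything here is an assumption chosen for illustration: the `ThreadStats` fields, the two metrics (L1 miss ratio and relative access intensity), the equal weights, and the rule of pairing the most cache-demanding thread with the least demanding one on cores assumed to share a cache.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    """Hypothetical per-thread L1 counters (not taken from the paper)."""
    tid: int
    l1_accesses: int
    l1_misses: int


def score(t, max_accesses, w_miss=0.5, w_access=0.5):
    """Combine two assumed metrics into one score; higher means more cache demand."""
    miss_ratio = t.l1_misses / t.l1_accesses if t.l1_accesses else 0.0
    intensity = t.l1_accesses / max_accesses if max_accesses else 0.0
    return w_miss * miss_ratio + w_access * intensity


def pair_threads(threads):
    """Pair the lowest-scoring thread with the highest-scoring one, and so on.

    Assumed heuristic: co-locating a demanding thread with an undemanding one
    reduces contention on the cache they share.
    """
    max_acc = max((t.l1_accesses for t in threads), default=0)
    ranked = sorted(threads, key=lambda t: score(t, max_acc))
    pairs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        pairs.append((ranked[lo].tid, ranked[hi].tid))
        lo += 1
        hi -= 1
    if lo == hi:
        # Odd thread count: the median thread runs without a partner.
        pairs.append((ranked[lo].tid,))
    return pairs


if __name__ == "__main__":
    sample = [
        ThreadStats(0, 1000, 100),
        ThreadStats(1, 200, 10),
        ThreadStats(2, 800, 400),
        ThreadStats(3, 100, 1),
    ]
    print(pair_threads(sample))
```

An adaptive scheduler in this spirit would recompute the scores periodically from fresh counters and regroup threads when the ranking changes; how often, and with which metrics, is left open by the abstract.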

Keywords

Adaptive scheduling; Chip multiprocessors; Inter-thread contention; Multi-metric scoring

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, USA
  2. Department of Computer Engineering, Bilkent University, Ankara, Turkey