
The Journal of Supercomputing, Volume 75, Issue 1, pp 425–446

Dynamic directory table with victim cache: on-demand allocation of directory entries for active shared cache blocks

  • Han Jun Bae
  • Lynn Choi
Article

Abstract

In this paper, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. Because we do not maintain coherence for private blocks, the number of directory entries is substantially reduced. Even for shared blocks, we allocate a directory entry only while the block is actively shared, further reducing the number of directory entries at runtime. To this end, we propose a new directory architecture called the dynamic directory table (DDT), a directory storage that is decoupled from the shared cache and dynamically maintains directory entries only for actively shared blocks. We also add a small victim cache to the original DDT to reduce the invalidation broadcasts caused by DDT evictions. Through detailed simulation on the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 16.09% of the directory area across a variety of workloads. This is achieved by the faster access and high hit rates of the small directory. In addition, we demonstrate that even smaller DDTs can deliver comparable or higher performance than recent directory optimization schemes such as SPACE and DGD with considerably less area.
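The allocation policy summarized in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation: the class name `DynamicDirectoryTable`, the `private_owner` map (a stand-in for however the hardware remembers the single owner of a private block), the LRU replacement in both structures, and the `broadcasts` counter are all assumptions made for illustration. It only shows the idea that a directory entry is created when a second core touches a block, and that an entry evicted from the DDT lands in the victim cache before an invalidation broadcast becomes necessary.

```python
from collections import OrderedDict


class DynamicDirectoryTable:
    """Toy model: directory entries exist only for blocks touched by >1 core."""

    def __init__(self, ddt_entries, victim_entries):
        self.ddt_entries = ddt_entries
        self.victim_entries = victim_entries
        self.private_owner = {}      # block -> sole owner core (assumed bookkeeping)
        self.ddt = OrderedDict()     # block -> set of sharers, oldest first (LRU assumed)
        self.victim = OrderedDict()  # entries evicted from the DDT, oldest first
        self.broadcasts = 0          # invalidation broadcasts caused by lost entries

    def access(self, core, block):
        if block in self.ddt:                    # block is already tracked
            self.ddt.move_to_end(block)
            self.ddt[block].add(core)
            return "ddt_hit"
        if block in self.victim:                 # recover the entry, no broadcast
            sharers = self.victim.pop(block)
            sharers.add(core)
            self._insert(block, sharers)
            return "victim_hit"
        owner = self.private_owner.get(block)
        if owner is None or owner == core:       # private block: no directory entry
            self.private_owner[block] = core
            return "private"
        del self.private_owner[block]            # second core: allocate on demand
        self._insert(block, {owner, core})
        return "allocated"

    def _insert(self, block, sharers):
        if len(self.ddt) >= self.ddt_entries:    # DDT full: evict the oldest entry
            old_block, old_sharers = self.ddt.popitem(last=False)
            self._to_victim(old_block, old_sharers)
        self.ddt[block] = sharers

    def _to_victim(self, block, sharers):
        if self.victim_entries == 0:             # no victim cache: sharer info is lost
            self.broadcasts += 1
            return
        if len(self.victim) >= self.victim_entries:
            self.victim.popitem(last=False)      # entry dropped for good
            self.broadcasts += 1
        self.victim[block] = sharers


def run(victim_entries):
    d = DynamicDirectoryTable(ddt_entries=1, victim_entries=victim_entries)
    trace = [(0, "A"), (1, "A"), (0, "B"), (1, "B"), (2, "A")]
    return [d.access(c, b) for c, b in trace], d.broadcasts
```

Calling `run(1)` and `run(0)` replays the same five accesses with and without a one-entry victim cache. In this toy model the victim cache lets the final access to block `A` recover its sharer set instead of forcing a broadcast when `A` was evicted.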

Keywords

Cache coherence; Directory; Parallel processing; Simulation; Scalable computing; Multi-core architectures

Notes

Acknowledgements

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) and funded by the Ministry of Science, ICT and Future Planning (NRF-2017R1A2B2009641). This research was also supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-2015-0-00363) supervised by the IITP (Institute for Information and Communications Technology Promotion). This research was supported by Korea University.

References

  1. Zhao H, Shriraman A, Dwarkadas S (2010) SPACE: sharing pattern-based directory coherence for multicore scalability. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pp 135–146
  2. Zebchuk J, Qureshi MK, Srinivasan V, Moshovos A (2009) A Tagless coherence directory. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 423–434
  3. Zhao H, Shriraman A, Dwarkadas S, Srinivasan V (2011) Spatl: honey, I shrunk the coherence directory. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pp 33–44
  4. Sanchez D, Kozyrakis C (2012) SCD: a scalable coherence directory with flexible sharer set encoding. In: Proceedings of the IEEE 18th International Symposium on High Performance Computer Architecture (HPCA), pp 1–12
  5. Zebchuk J, Falsafi B, Moshovos A (2013) Multi-grain coherence directories. In: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pp 359–370
  6. Alisafaee M (2012) Spatiotemporal coherence tracking. In: Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp 341–350
  7. Zhao H, Shriraman A, Kumar S, Dwarkadas S (2013) Protozoa: adaptive granularity cache coherence. ACM SIGARCH Comput Archit News 41(3):547–558
  8. Zhang G, Horn W, Sanchez D (2015) Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems. In: Proceedings of the 48th Annual IEEE/ACM International Symposium on Microarchitecture, pp 13–25
  9. Manivannan M, Negi A, Stenström P (2013) Efficient forwarding of producer–consumer data in task-based programs. In: 2013 42nd International Conference on Parallel Processing, pp 517–522
  10. Censier LM, Feautrier P (1978) A new solution to coherence problems in multi-cache systems. IEEE Trans Comput 100(12):1112–1118
  11. Agarwal A, Simoni R, Hennessy J, Horowitz M (1988) An evaluation of directory schemes for cache coherence. ACM SIGARCH Comput Archit News 16(2):280–298
  12. Cuesta BA, Ros A, Gómez ME, Robles A, Duato JF (2011) Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. ACM SIGARCH Comput Archit News 39(3):93–104
  13. Gupta A, Weber WD, Mowry T (1992) Reducing memory and traffic requirements for scalable directory-based cache coherence schemes. In: Scalable shared memory multiprocessors, pp 167–192
  14. Titos-Gil R, Flores A, Fernández-Pascual R, Ros A, Acacio ME (2017) Way-combining directory: an adaptive and scalable low-cost coherence directory. In: Proceedings of the International Conference on Supercomputing
  15. Intel I (2013) Intel 64 and IA-32 architectures software developer’s manual. Syst Program Guide Part 1 3A:64
  16. Hackenberg D, Molka D, Nagel WE (2009) Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pp 413–422
  17. Conway P, Hughes B (2007) The AMD opteron northbridge architecture. IEEE Micro 27(2):10–21
  18. Conway P, Kalyanasundharam N, Donley G, Lepak K, Hughes B (2010) Cache hierarchy and memory subsystem of the AMD opteron processor. IEEE Micro 30(2):16–29
  19. Papamarcos MS, Patel JH (1984) A low-overhead coherence solution for multiprocessors with private cache memories. ACM SIGARCH Comput Archit News 12(3):348–354
  20. Rudolph L, Segall Z (1984) Dynamic decentralized cache schemes for MIMD parallel processors. ACM SIGARCH Comput Archit News 12(3):340–347
  21. Sweazey P, Smith AJ (1986) A class of compatible cache consistency protocols and their support by the IEEE futurebus. ACM SIGARCH Comput Archit News 14(2):414–423
  22. Bae HJ, Choi L (2017) Dynamic directory table: on-demand allocation of directory entries for active shared cache blocks. J Korean Inst Inf Sci Eng (KIISE) 44(12):1245–1251
  23. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The Gem5 simulator. ACM SIGARCH Comput Archit News 39(2):1–7
  24. Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp 72–81
  25. Thoziyoor S, Muralimanohar N, Ahn JH, Jouppi N (2008) Cacti 5.3. HP Laboratories, Palo Alto

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Electrical Engineering, Korea University, Seoul, Korea
