A Novel Cache Organization for Tiled Chip Multiprocessor

  • Xi Zhang
  • Dongsheng Wang
  • Yibo Xue
  • Haixia Wang
  • Jinglei Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5737)


Increased device density and growing working set sizes are driving a rise in cache capacity, which comes at the cost of high access latency. Shared data is accessed frequently yet occupies little capacity. Exploiting this characteristic, this paper proposes a novel two-level directory organization to minimize cache access time. In this scheme, a small Fast Directory provides fast hits for a large fraction of memory accesses. Detailed simulation results show that, on a 16-core tiled chip multiprocessor, this approach reduces average access latency by 17.9% compared to the general cache organization and improves overall performance by 13.3% on average.
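The idea of placing a small, fast directory in front of a larger one can be illustrated with a toy model. The sketch below is not the authors' design: the class name `TwoLevelDirectory`, the LRU replacement policy, the promote-on-miss behavior, and the cycle counts are all assumptions chosen only to show why frequently accessed shared blocks tend to hit in the small structure.

```python
from collections import OrderedDict


class TwoLevelDirectory:
    """Toy model: a small LRU fast directory in front of a full directory.

    All policies and latencies here are illustrative assumptions,
    not taken from the paper.
    """

    def __init__(self, fast_entries, fast_latency, full_latency):
        self.fast = OrderedDict()         # block address -> set of sharer cores
        self.full = {}                    # backing directory (unbounded in this toy)
        self.fast_entries = fast_entries  # capacity of the fast directory
        self.fast_latency = fast_latency  # hypothetical lookup cost in cycles
        self.full_latency = full_latency  # hypothetical lookup cost in cycles

    def access(self, addr, core):
        """Record that `core` touches `addr`; return the modeled lookup latency."""
        if addr in self.fast:
            # Fast hit: refresh the LRU position and record the sharer.
            self.fast.move_to_end(addr)
            self.fast[addr].add(core)
            return self.fast_latency

        # Fast miss: consult the full directory, then promote the entry.
        sharers = self.full.pop(addr, set())
        sharers.add(core)
        self.fast[addr] = sharers
        if len(self.fast) > self.fast_entries:
            # Demote the least recently used entry to the full directory.
            victim, victim_sharers = self.fast.popitem(last=False)
            self.full[victim] = victim_sharers
        return self.fast_latency + self.full_latency
```

In this model, a block that several cores touch repeatedly stays near the most recently used end of the fast directory, so most of its lookups pay only the short latency, while rarely reused blocks are demoted to the full directory.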


Keywords: Chip Multiprocessor (CMP), Tiled Architecture, Multi-level Directory, Cache Organization





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xi Zhang (1)
  • Dongsheng Wang (1)
  • Yibo Xue (1)
  • Haixia Wang (1)
  • Jinglei Wang (1)

  1. Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing, China
