The market is moving toward chip multiprocessors (CMPs), which place multiple cores on the same chip. CMPs with 8 cores are already available, and future CMPs will have more. Typical implementations use a multi-sliced L2 cache in which each slice is shared by 2 cores. We investigate how the degree of L2 sharing interacts with L2 cache size. To do so, we model two CMP organizations: (i) tiles with private L1 and private L2, interconnected through a router; and (ii) tiles with private L1 and an L2 shared among processors. Organization (ii) is evaluated with different numbers of cores (2 or 4) sharing the same L2 slice and with different shared L2 slice sizes (1 MB, 2 MB, and 4 MB). With a total of 32 cores, the proposed configurations of organization (ii) are evaluated in full-system simulation running SPLASH-2 benchmarks. By applying both techniques, results show that execution time improves by about 18.9% for Ocean, 88.8% for Raytrace, and 31.8% for Volrend.
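The design space of organization (ii) can be enumerated with a short sketch. It is a minimal illustration, not the authors' simulation setup: the function name `shared_l2_configs` and the dictionary keys are hypothetical, and the derived slice count and aggregate L2 capacity assume that the 32 cores are divided evenly among slices and that every slice has the stated size, which the abstract does not say explicitly.

```python
from itertools import product

# Parameters taken from the abstract.
TOTAL_CORES = 32
CORES_PER_SLICE = (2, 4)      # cores sharing one L2 slice
SLICE_SIZES_MB = (1, 2, 4)    # size of each shared L2 slice


def shared_l2_configs(total_cores=TOTAL_CORES,
                      sharers=CORES_PER_SLICE,
                      sizes_mb=SLICE_SIZES_MB):
    """Enumerate (sharing degree, slice size) pairs for organization (ii).

    The slice count and aggregate L2 capacity are derived values that
    assume an even split of cores across slices (an assumption, not a
    figure reported in the abstract).
    """
    configs = []
    for cores_per_slice, slice_mb in product(sharers, sizes_mb):
        if total_cores % cores_per_slice != 0:
            raise ValueError("cores must divide evenly among L2 slices")
        slices = total_cores // cores_per_slice
        configs.append({
            "cores_per_slice": cores_per_slice,
            "slice_mb": slice_mb,
            "slices": slices,
            "total_l2_mb": slices * slice_mb,
        })
    return configs


if __name__ == "__main__":
    for cfg in shared_l2_configs():
        print(cfg)
```

Under these assumptions, doubling the sharing degree at a fixed slice size halves both the number of slices and the aggregate L2 capacity, so the two parameters are best compared as pairs rather than one at a time.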







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mario Donato Marino (1)
  1. Computing Engineering Department, Polytechnic School, University of Sao Paulo
