Abstract
This paper presents a novel simulation-based approach for estimating the performance of cache coherence protocol implementations. The approach makes it possible to model a cache coherence protocol in which coherence transactions take zero cycles and generate no communication traffic, with the aim of providing a close lower bound on latency and traffic. The protocol modeling relies on cycle-accurate simulation models in which components can instantaneously and transparently access the internal state of other components. With this strategy, the access time and traffic due to cache misses are accounted for as they would be on a multiprocessor system without cache coherence. The proposed approach nevertheless ensures that processors receive coherent data.
We detail the implementation of this approach in a cycle-accurate multiprocessor simulation environment. To show its effectiveness, we implement cache and memory models for two coherence protocols, both with and without our omniscient cache coherence (OCC) proposal. We show formally that the approach preserves the consistency models implied by the cache coherence protocols, and experimentally that the OCC strategy gives a close lower bound on latency and traffic.
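The idea of an omniscient coherence action can be illustrated with a minimal sketch. It is not the simulator described in the paper: the class names, the write-through policy, the latency constants, and the choice to refresh remote copies in place (rather than invalidate them) are all assumptions made for illustration. The point it shows is that ordinary misses and write-throughs are charged cycles and traffic, while the coherence step reads and modifies other caches' internal state directly at no cost.

```python
from dataclasses import dataclass

# Hypothetical costs, chosen only for illustration.
MISS_LATENCY = 10   # cycles charged for any access that reaches memory
HIT_LATENCY = 1     # cycles charged for a cache hit


@dataclass
class Stats:
    cycles: int = 0
    memory_accesses: int = 0


class Memory:
    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class OmniscientCache:
    """Write-through cache model whose coherence actions are free (assumed design)."""

    def __init__(self, memory, peers, stats):
        self.memory = memory
        self.peers = peers      # list shared by all caches in the system
        self.stats = stats
        self.lines = {}         # addr -> cached value
        peers.append(self)

    def load(self, addr):
        if addr in self.lines:
            self.stats.cycles += HIT_LATENCY
        else:
            # Ordinary miss: charged exactly as in a system without coherence.
            self.stats.cycles += MISS_LATENCY
            self.stats.memory_accesses += 1
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        # Write-through to memory: charged as a normal memory access.
        self.stats.cycles += MISS_LATENCY
        self.stats.memory_accesses += 1
        self.memory.write(addr, value)
        self.lines[addr] = value
        # Omniscient coherence action: reach into the other caches' internal
        # state and refresh their copies. No cycles, no traffic are charged.
        for peer in self.peers:
            if peer is not self and addr in peer.lines:
                peer.lines[addr] = value


# Usage: two caches sharing one memory and one statistics record.
stats = Stats()
memory = Memory()
peers = []
c0 = OmniscientCache(memory, peers, stats)
c1 = OmniscientCache(memory, peers, stats)

c0.load(0x40)            # miss on c0
c1.store(0x40, 7)        # write-through on c1, free refresh of c0's copy
value = c0.load(0x40)    # hit on c0, and the data is coherent
```

In this sketch the second load on `c0` is a hit that returns the value written by `c1`, so the hit and miss pattern matches a system without coherence while the data stays coherent. A real protocol would pay cycles and messages for the step that the loop over `peers` performs for free.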
Cite this article
Guironnet de Massas, P., Pétrot, F. Evaluation of the implementation cost of cache coherence protocols using omniscient actions. Des Autom Embed Syst 14, 21–42 (2010). https://doi.org/10.1007/s10617-010-9050-6
DOI: https://doi.org/10.1007/s10617-010-9050-6