Evaluation of the implementation cost of cache coherence protocols using omniscient actions

Guironnet de Massas, Pierre; Pétrot, Frédéric

doi:10.1007/s10617-010-9050-6

Published: 29 January 2010

Volume 14, pages 21–42, (2010)

Design Automation for Embedded Systems

Abstract

This paper presents a novel simulation-based approach for estimating the performance of cache coherence protocol implementations. The approach makes it possible to model a cache coherence protocol in which coherence transactions take zero cycles and generate no communication accesses, in the hope of providing a close lower bound on latency and traffic. The protocol modeling relies on cycle-accurate simulation models in which components can instantaneously and transparently access the internal states of other components. With this strategy, the access time and the traffic due to cache misses are accounted for as they would be on a multiprocessor system without cache coherence, yet the approach still ensures that processors receive coherent data.
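
The idea of an omniscient action can be illustrated with a small sketch. This is not the authors' simulator code: it assumes a write-through, write-invalidate policy, and the names `Memory`, `Cache`, `make_system` and the two latency constants are hypothetical choices made for illustration. Misses and write-throughs are charged cycles and traffic as on a non-coherent system, while the invalidation of stale copies reaches directly into the other caches and is charged nothing.

```python
MISS_LATENCY = 10  # assumed cost in cycles of a memory access (hypothetical)
HIT_LATENCY = 1    # assumed cost in cycles of a cache hit (hypothetical)


class Memory:
    def __init__(self):
        self.data = {}  # addr -> value; unwritten addresses read as 0

    def read(self, addr):
        return self.data.get(addr, 0)


class Cache:
    def __init__(self, memory):
        self.memory = memory
        self.peers = []    # other caches, reachable "omnisciently"
        self.lines = {}    # addr -> cached value
        self.cycles = 0    # accumulated access time
        self.traffic = 0   # interconnect messages issued

    def read(self, addr):
        if addr in self.lines:
            self.cycles += HIT_LATENCY
        else:
            # Miss: charged exactly as on a system without coherence.
            self.cycles += MISS_LATENCY
            self.traffic += 1
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through: the memory update is ordinary, charged traffic.
        self.cycles += MISS_LATENCY
        self.traffic += 1
        self.memory.data[addr] = value
        self.lines[addr] = value
        # Omniscient action: drop stale copies by modifying the peers'
        # internal state directly. No cycles or traffic are charged here.
        for peer in self.peers:
            peer.lines.pop(addr, None)


def make_system(n):
    """Build n caches over one memory, each aware of all the others."""
    memory = Memory()
    caches = [Cache(memory) for _ in range(n)]
    for cache in caches:
        cache.peers = [p for p in caches if p is not cache]
    return memory, caches


memory, (c0, c1) = make_system(2)
c1.read(0x40)          # c1 caches the initial value
c0.write(0x40, 7)      # c1's copy is invalidated at zero cost
value = c1.read(0x40)  # c1 misses and fetches the coherent value
```

In this sketch the counters of `c1` only reflect its own two misses, so the coherence action itself contributes nothing to latency or traffic, which is the sense in which such a model aims at a lower bound.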

We detail the implementation of this approach in a cycle-accurate multiprocessor simulation environment. To show its effectiveness, we implement cache and memory models for two coherence protocols, both with and without our omniscient cache coherence (OCC) proposal. We show with a formal method that the approach preserves the consistency models implied by the cache coherence protocols, and we show experimentally that the OCC strategy gives a close lower bound on latency and traffic.

Author information

Authors and Affiliations

TIMA Laboratory, 46 Av. Félix Viallet, 38031, Grenoble, France
Pierre Guironnet de Massas & Frédéric Pétrot

Corresponding author

Correspondence to Pierre Guironnet de Massas.

About this article

Cite this article

Guironnet de Massas, P., Pétrot, F. Evaluation of the implementation cost of cache coherence protocols using omniscient actions. Des Autom Embed Syst 14, 21–42 (2010). https://doi.org/10.1007/s10617-010-9050-6

Received: 30 October 2008
Accepted: 08 January 2010
Published: 29 January 2010
Issue Date: March 2010
DOI: https://doi.org/10.1007/s10617-010-9050-6
