Evaluation of the implementation cost of cache coherence protocols using omniscient actions

Design Automation for Embedded Systems

Abstract

This paper presents a novel simulation-based approach for estimating the performance of cache coherence protocol implementations. The approach makes it possible to model a cache coherence protocol in which coherence transactions take zero cycles and generate no communication accesses, in the hope that the resulting model provides a close lower bound on latency and traffic. It relies on cycle-accurate simulation models in which components can instantaneously and transparently access the internal states of other components. With this strategy, the access time and the traffic due to cache misses are accounted for as they would be on a multiprocessor system without cache coherence. Nevertheless, the approach still ensures that processors receive coherent data.
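The Python sketch below illustrates the idea on a toy model. It is not the authors' simulator. The assumptions are our own: private caches, a write-through write-invalidate protocol, and hypothetical latency constants `MISS_LATENCY` and `INVAL_LATENCY`. With `omniscient=True`, a write removes stale copies by editing the other caches' internal state directly, which costs zero cycles and zero messages. Misses are still charged in both variants, including the re-fetch that the invalidation causes.

```python
from dataclasses import dataclass, field

MISS_LATENCY = 10   # hypothetical cycles charged for a read miss (request + response)
INVAL_LATENCY = 4   # hypothetical cycles charged per explicit invalidation round trip


@dataclass
class Cache:
    lines: dict = field(default_factory=dict)  # address -> cached value


@dataclass
class System:
    n_caches: int
    omniscient: bool                      # True: OCC-style zero-cost invalidation
    memory: dict = field(default_factory=dict)
    caches: list = field(default_factory=list)
    cycles: int = 0                       # accumulated stall cycles
    messages: int = 0                     # accumulated interconnect messages

    def __post_init__(self):
        self.caches = [Cache() for _ in range(self.n_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache.lines:
            # Misses are charged in both variants: latency plus request/response traffic.
            self.cycles += MISS_LATENCY
            self.messages += 2
            cache.lines[addr] = self.memory.get(addr, 0)
        return cache.lines[addr]

    def write(self, cpu, addr, value):
        # Write-through: one write message to memory in both variants
        # (write latency is assumed hidden by a write buffer and not charged).
        self.messages += 1
        self.memory[addr] = value
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache.lines:
                continue
            if not self.omniscient:
                # Explicit protocol: invalidation request + acknowledgement.
                self.cycles += INVAL_LATENCY
                self.messages += 2
            # In both variants the stale copy disappears; with omniscient=True
            # this is a direct, zero-cost edit of another cache's internal state.
            del cache.lines[addr]
        own = self.caches[cpu].lines
        if addr in own:
            own[addr] = value   # no-write-allocate: update only if already cached


def run(omniscient):
    s = System(n_caches=2, omniscient=omniscient)
    s.read(0, 0x40)          # CPU 0 caches the line
    s.read(1, 0x40)          # CPU 1 caches the line
    s.write(0, 0x40, 7)      # CPU 1's copy must be invalidated
    value = s.read(1, 0x40)  # CPU 1 misses and fetches the new value
    return value, s.cycles, s.messages


print(run(omniscient=False))  # (7, 34, 9)
print(run(omniscient=True))   # (7, 30, 7)
```

Both runs return the same, up-to-date value to CPU 1. The omniscient run omits only the invalidation round trip. For that reason, in this toy model it can only underestimate the latency and traffic of the explicit protocol.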

We detail the implementation of this approach in a cycle-accurate multiprocessor simulation environment. To show its effectiveness, we implement cache and memory models for two coherence protocols, each in a version with and a version without our omniscient cache coherence (OCC) proposal. We show formally that this approach preserves the consistency models implied by the cache coherence protocols, and we show experimentally that the OCC strategy gives a close lower bound on latency and traffic.
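The formal argument is not reproduced here. A simpler, purely experimental sanity check is sketched below; it is our own illustration, not the paper's method. It scans a simulator's log of memory operations, taken in the order the simulator commits them, and reports every read that did not return the most recent write to its address. The `Op` record and the `check_coherent` function are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class Op(NamedTuple):
    cpu: int
    kind: str   # "R" for a read, "W" for a write
    addr: int
    value: int  # value returned by a read, or value stored by a write


def check_coherent(log: Iterable[Op], initial: int = 0) -> List[int]:
    """Return the indices of reads that did not return the most recent
    write to their address in the log's order (empty list: no violation)."""
    last_write = {}   # addr -> most recently written value
    violations = []
    for i, op in enumerate(log):
        if op.kind == "W":
            last_write[op.addr] = op.value
        elif op.kind == "R":
            if op.value != last_write.get(op.addr, initial):
                violations.append(i)
        else:
            raise ValueError(f"unknown operation kind {op.kind!r}")
    return violations


log = [
    Op(0, "R", 0x40, 0),
    Op(1, "R", 0x40, 0),
    Op(0, "W", 0x40, 7),
    Op(1, "R", 0x40, 7),   # CPU 1 sees the new value
]
stale = log[:3] + [Op(1, "R", 0x40, 0)]   # CPU 1 reads a stale copy

print(check_coherent(log))    # []
print(check_coherent(stale))  # [3]
```

Assuming the log preserves each processor's program order, passing this check against a single global order is a sufficient condition for coherence, but not a necessary one. It cannot, by itself, validate weaker consistency models.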

Author information

Corresponding author

Correspondence to Pierre Guironnet de Massas.

About this article

Cite this article

Guironnet de Massas, P., Pétrot, F. Evaluation of the implementation cost of cache coherence protocols using omniscient actions. Des Autom Embed Syst 14, 21–42 (2010). https://doi.org/10.1007/s10617-010-9050-6

  • DOI: https://doi.org/10.1007/s10617-010-9050-6
