Fencing Programs with Self-Invalidation and Self-Downgrade

  • Parosh Aziz Abdulla
  • Mohamed Faouzi Atig
  • Stefanos Kaxiras
  • Carl Leonardsson
  • Alberto Ros
  • Yunyun Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9688)

Abstract

Cache coherence protocols using self-invalidation and self-downgrade have recently seen increased popularity due to their simplicity, potential performance efficiency, and low energy consumption. However, such protocols result in memory instruction reordering, thus causing extra program behaviors that are often not intended by the programmer. We propose a novel formal model that captures the semantics of programs running under such protocols, and employs a set of fences that interact with the coherence layer. Using the model, we perfform a reachability analysis that can check whether a program satisfies a given safety property with the current set of fences. Based on an algorithm in [19], we describe a method for insertion of optimal sets of fences that ensure correctness of the program under such protocols. The method relies on a counter-example guided fence insertion procedure. One feature of our method is that it can handle a variety of fences (with different costs). This diversity makes optimization more difficult since one has to optimize the total cost of the inserted fences, rather than just their number. To demonstrate the strength of our approach, we have implemented a prototype and run it on a wide range of examples and benchmarks. We have also, using simulation, evaluated the performance of the resulting fenced programs.

Notes

Acknowledgment

This work was supported by the Uppsala Programming for Multicore Architectures Research Center (UPMARC), the Swedish Board of Science project, “Rethinking the Memory System”, the “Fundación Seneca-Agencia de Ciencia y Tecnología de la Región de Murcia” under the project “Jóvenes Líderes en Investigación” and European Commission FEDER funds.

References

  1. 1.
    Abdulla, P.A., Atig, M.F., Chen, Y.-F., Leonardsson, C., Rezine, A.: Counter-example guided fence insertion under TSO. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 204–219. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  2. 2.
    Adve, S.V., Hill, M.D.: Weak ordering - a new definition. In: ISCA, pp. 2–14 (1990)Google Scholar
  3. 3.
    Alglave, J., Kroening, D., Nimal, V., Poetzl, D.: Don’t sit on the fence. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 508–524. Springer, Heidelberg (2014)Google Scholar
  4. 4.
    Bouajjani, A., Derevenetc, E., Meyer, R.: Checking and enforcing robustness against TSO. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 533–553. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  5. 5.
    Bouajjani, A., Meyer, R., Möhlmann, E.: Deciding robustness against total store ordering. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011, Part II. LNCS, vol. 6756, pp. 428–440. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  6. 6.
    Chase, D., Lev, Y.: Dynamic circular work-stealing deque. In: SPAA, pp. 21–28 (2005)Google Scholar
  7. 7.
    Choi, B., Komuravelli, R., Sung, H., Smolinski, R., Honarmand, N., Adve, S.V., Adve, V.S., Carter, N.P., Chou, C.T.: DeNovo: rethinking the memory hierarchy for disciplined parallelism. In: PACT, pp. 155–166 (2011)Google Scholar
  8. 8.
    Davari, M., Ros, A., Hagersten, E., Kaxiras, S.: An efficient, self-contained, on-chip, directory: DIR\(_1\)-SISD. In: PACT, pp. 317–330 (2015)Google Scholar
  9. 9.
    Dice, D., Shalev, O., Shavit, N.N.: Transactional locking II. In: Dolev, S. (ed.) DISC 2006. LNCS, vol. 4167, pp. 194–208. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  10. 10.
    Dijkstra, E.W.: Cooperating sequential processes (2002)Google Scholar
  11. 11.
    Herlihy, M., Shavit, N.: The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco (2008)Google Scholar
  12. 12.
    Hower, D.R., Hechtman, B.A., Beckmann, B.M., Gaster, B.R., Hill, M.D., Reinhardt, S.K., Wood, D.A.: Heterogeneous-race-free memory models. In: ASPLOS, pp. 427–440 (2014)Google Scholar
  13. 13.
    Kaxiras, S., Keramidas, G.: SARC coherence: scaling directory cache coherence in performance and power. IEEE Micro 30(5), 54–65 (2011)CrossRefGoogle Scholar
  14. 14.
    Kaxiras, S., Ros, A.: A new perspective for efficient virtual-cache coherence. In: ISCA, pp. 535–547 (2013)Google Scholar
  15. 15.
    Koukos, K., Ros, A., Hagersten, E., Kaxiras, S.: Building heterogeneous unified virtual memories (UVMS) without the overhead. ACM TACO 13(1), 1:1–1:22 (2016)Google Scholar
  16. 16.
    Kuperstein, M., Vechev, M., Yahav, E.: Automatic inference of memory fences. In: FMCAD, pp. 111–119. IEEE (2010)Google Scholar
  17. 17.
    Lamport, L.: A new solution of dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)MathSciNetCrossRefMATHGoogle Scholar
  18. 18.
    Lebeck, A.R., Wood, D.A.: Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors. In: ISCA, pp. 48–59 (1995)Google Scholar
  19. 19.
    Liu, F., Nedev, N., Prisadnikov, N., Vechev, M.T., Yahav, E.: Dynamic synthesis for relaxed memory models. In: PLDI, pp. 429–440 (2012)Google Scholar
  20. 20.
    Magnusson, P., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of Eighth International Parallel Processing Symposium, pp. 165–171. IEEE (1994)Google Scholar
  21. 21.
    Martin, M.M., Sorin, D.J., Beckmann, B.M., Marty, M.R., Xu, M., Alameldeen, A.R., Moore, K.E., Hill, M.D., Wood, D.A.: Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. Comput. Archit. News 33(4), 92–99 (2005)CrossRefGoogle Scholar
  22. 22.
    Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. (TOCS) 9(1), 21–65 (1991)CrossRefGoogle Scholar
  23. 23.
    Ros, A., Davari, M., Kaxiras, S.: Hierarchical private/shared classification: the key to simple and efficient coherence for clustered cache hierarchies. In: HPCA, pp. 186–197 (2015)Google Scholar
  24. 24.
    Ros, A., Kaxiras, S.: Complexity-effective multicore coherence. In: PACT, pp. 241–252 (2012)Google Scholar
  25. 25.
    Ros, A., Kaxiras, S.: Callback: efficient synchronization without invalidation with a directory just for spin-waiting. In: ISCA, pp. 427–438 (2015)Google Scholar
  26. 26.
    Ros, A., Kaxiras, S.: Fast & furious: a tool for detecting covert racing. In: PARMA and DITAM, pp. 1–6 (2015)Google Scholar
  27. 27.
    Sakalis, C., Leonardsson, C., Kaxiras, S., Ros, A.: Splash-3: a properly synchronized benchmark suite for contemporary research. In: ISPASS (2016)Google Scholar
  28. 28.
    Schmidt, D.C., Harrison, T.: Double-checked locking - an optimization pattern for efficiently initializing and accessing thread-safe objects. In: PLoP (1996)Google Scholar
  29. 29.
    Scott, M.L.: Shared-Memory Synchronization. Morgan & Claypool, San Rafael (2013)Google Scholar
  30. 30.
    Shasha, D., Snir, M.: Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst. (TOPLAS) 10(2), 282–312 (1988)CrossRefGoogle Scholar
  31. 31.
    Sung, H., Adve, S.V.: DeNovoSync: efficient support for arbitrary synchronization without writer-initiated invalidations. In: ASPLOS, pp. 545–559 (2015)Google Scholar
  32. 32.
    Sung, H., Komuravelli, R., Adve, S.V.: DeNovoND: efficient hardware support for disciplined non-determinism. In: ASPLOS, pp. 13–26 (2013)Google Scholar
  33. 33.
    Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24–36 (1995)Google Scholar

Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Parosh Aziz Abdulla
    • 1
  • Mohamed Faouzi Atig
    • 1
  • Stefanos Kaxiras
    • 1
  • Carl Leonardsson
    • 1
  • Alberto Ros
    • 2
  • Yunyun Zhu
    • 1
  1. 1.Uppsala UniversityUppsalaSweden
  2. 2.Universidad de MurciaMurciaSpain

Personalised recommendations