A Formal Model of Parallel Execution on Multicore Architectures with Multilevel Caches

  • Shiji Bijo
  • Einar Broch Johnsen
  • Ka I Pun
  • Silvia Lizeth Tapia Tarifa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10487)

Abstract

The performance of software running on parallel or distributed architectures can be severely affected by the location of data. On shared memory multicore architectures, data movement between caches and main memory is driven by tasks executing in parallel on different cores and by a protocol to ensure cache coherence, such as MSI. This paper integrates MSI in a formal model to capture such data movement from an application perspective. We develop an executable model which integrates cache coherent data movement between different cache levels and main memory, for software described by task-level data access patterns. The proposed model is generic in the number of cache levels and cores, and abstracts from the concrete communication medium. We show that the model guarantees expected correctness properties for the MSI protocol, in particular data consistency. This paper further presents a proof of concept implementation of the proposed model in rewriting logic, which allows different choices for a program’s underlying hardware architecture to be specified and compared.
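To make the coherence protocol named in the abstract concrete, here is a minimal sketch of MSI for a single memory block shared by several cores with one cache level each. It is not the paper's model, which is written in rewriting logic and covers multilevel caches. The names `State`, `Line`, `read`, `write`, and `consistent` are hypothetical and chosen only for this illustration, and the communication medium is abstracted into direct method calls.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # the only valid copy; main memory may be stale
    SHARED = "S"     # a clean copy that agrees with main memory
    INVALID = "I"    # no usable copy in this cache


class Line:
    """One memory block as seen by the private caches of several cores (illustrative only)."""

    def __init__(self, cores):
        self.states = [State.INVALID] * cores
        self.cached = [None] * cores   # value held by each core's cache
        self.memory = 0                # value held in main memory

    def _flush(self, owner):
        # A modified copy is written back to memory and downgraded to shared.
        self.memory = self.cached[owner]
        self.states[owner] = State.SHARED

    def read(self, core):
        if self.states[core] is State.INVALID:
            # Read miss: any modified copy elsewhere must be flushed first.
            for other, st in enumerate(self.states):
                if st is State.MODIFIED:
                    self._flush(other)
            self.cached[core] = self.memory
            self.states[core] = State.SHARED
        return self.cached[core]

    def write(self, core, value):
        # Gain exclusive access: flush a modified copy, invalidate all others.
        for other, st in enumerate(self.states):
            if other == core:
                continue
            if st is State.MODIFIED:
                self._flush(other)
            self.states[other] = State.INVALID
            self.cached[other] = None
        self.cached[core] = value
        self.states[core] = State.MODIFIED

    def consistent(self):
        # At most one modified copy, never alongside shared copies,
        # and every shared copy agrees with main memory.
        modified = self.states.count(State.MODIFIED)
        exclusive = modified == 0 or State.SHARED not in self.states
        shared_ok = all(
            self.cached[c] == self.memory
            for c, st in enumerate(self.states)
            if st is State.SHARED
        )
        return modified <= 1 and exclusive and shared_ok
```

In this sketch, a write on one core followed by a read on another forces the modified copy to be written back before the reader obtains a shared copy. The `consistent` check is a simplified stand-in for the kind of data consistency property the abstract refers to.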

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Shiji Bijo (1)
  • Einar Broch Johnsen (1)
  • Ka I Pun (1)
  • Silvia Lizeth Tapia Tarifa (1)
  1. Department of Informatics, University of Oslo, Oslo, Norway
