Message Passing or Shared Memory: Evaluating the Delegation Abstraction for Multicores

  • Irina Calciu
  • Dave Dice
  • Tim Harris
  • Maurice Herlihy
  • Alex Kogan
  • Virendra Marathe
  • Mark Moir
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8304)

Abstract

Even for small multicore systems, it has become increasingly difficult to support a simple shared memory abstraction: processors access some memory regions more quickly than others, a phenomenon called non-uniform memory access (NUMA). These trends have prompted researchers to investigate alternative programming abstractions based on message passing rather than cache-coherent shared memory. To advance a pragmatic understanding of these models’ strengths and weaknesses, we have explored a range of different message passing and shared memory designs, for a variety of concurrent data structures, running on different multicore architectures. Our goal was to evaluate which combinations perform best, and where simple software or hardware optimizations might have the most impact. We observe that different approaches perform best in different circumstances, and that the communication overhead of message passing can often outweigh its benefits. Nonetheless, we discuss ways in which this balance may shift in the future. Overall, we conclude that, by emphasizing high-level shared data abstractions, software should be designed to be largely independent of the choice of low-level communication mechanism.
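The delegation abstraction the paper evaluates can be sketched roughly as follows. In this illustrative example (the `DelegatedCounterMap` class, its message format, and all names are this sketch's assumptions, not code from the paper), a single server thread owns a data structure outright; client threads never lock it, but instead send operation requests over a queue and wait for a reply, so all mutation happens sequentially in one thread:

```python
import queue
import threading

class DelegatedCounterMap:
    """Illustrative delegation pattern: one server thread owns the map;
    clients communicate by message passing instead of locking."""

    _STOP = object()  # sentinel telling the server thread to exit

    def __init__(self):
        self._requests = queue.Queue()
        self._data = {}  # owned exclusively by the server thread
        self._server = threading.Thread(target=self._serve, daemon=True)
        self._server.start()

    def _serve(self):
        # The server applies one operation at a time, so the underlying
        # dict needs no lock and stays in this core's cache.
        while True:
            msg = self._requests.get()
            if msg is self._STOP:
                return
            op, key, reply = msg
            if op == "incr":
                self._data[key] = self._data.get(key, 0) + 1
                reply.put(self._data[key])
            elif op == "get":
                reply.put(self._data.get(key, 0))

    def incr(self, key):
        # Each client request carries its own reply channel.
        reply = queue.Queue(maxsize=1)
        self._requests.put(("incr", key, reply))
        return reply.get()  # block until the server has applied the op

    def get(self, key):
        reply = queue.Queue(maxsize=1)
        self._requests.put(("get", key, reply))
        return reply.get()

    def close(self):
        self._requests.put(self._STOP)
        self._server.join()

if __name__ == "__main__":
    m = DelegatedCounterMap()
    workers = [threading.Thread(target=lambda: [m.incr("x") for _ in range(100)])
               for _ in range(4)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(m.get("x"))  # all 400 increments were serialized by the server
    m.close()
```

The tradeoff the abstract describes is visible even in this toy: the data structure itself is contention-free, but every operation pays a round-trip message cost, which on current hardware can outweigh the savings from avoiding cache-line ping-ponging on a shared lock.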

Keywords

NUMA · message passing · shared memory · delegation · locks · concurrent data structures

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Irina Calciu, Brown University, USA
  • Dave Dice, Oracle Labs, USA
  • Tim Harris, Oracle Labs, USA
  • Maurice Herlihy, Brown University and Oracle Labs, USA
  • Alex Kogan, Oracle Labs, USA
  • Virendra Marathe, Oracle Labs, USA
  • Mark Moir, Oracle Labs, USA
