Concurrent Data Structures in Architectures with Limited Shared Memory Support

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8805)


The Single-chip Cloud Computer (SCC) is an experimental multicore processor created by Intel Labs for the many-core research community, to study many-core processors, their programmability, and their scalability in connection with communication models. It is based on a distributed memory architecture that combines fast-access, small on-chip memory with large off-chip private and shared memory. Additionally, its design is meant to favour message passing over traditional shared-memory programming. To this end, the platform deliberately provides neither hardware-supported cache coherence nor atomic memory read/write operations across cores. Because of these hardware limitations, existing algorithmic designs for concurrent data structures in the literature are not directly applicable.

In this paper, we delve into the problem of designing concurrent data structures on such systems. By utilising the platform's very efficient message passing together with the limited shared memory available, we provide two techniques that use the concept of a coordinator and one that combines local locks with message passing. All three achieve high concurrency and resiliency. These techniques allow us to design three efficient algorithms for concurrent FIFO queues. Our techniques are general and can be used to implement other concurrent abstract data types. We also provide an experimental study of the proposed queues on the SCC platform, analysing the throughput behaviour of our algorithms under different memory placement policies.
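The paper's own implementations target the SCC and are not reproduced here; as a rough illustration of the coordinator concept mentioned above (one core owns the queue state and serialises all operations, while the other cores interact with it purely through message passing), the following is a minimal Python sketch. Thread-safe channels stand in for the SCC's on-chip message buffers, and all names are hypothetical:

```python
import threading
import queue

class CoordinatorQueue:
    """Coordinator-based FIFO queue: a single coordinator thread owns the
    queue state, so no shared-memory atomics or cache coherence are needed.
    Clients communicate with it only via message channels."""

    def __init__(self):
        self._requests = queue.Queue()   # stands in for on-chip message buffers
        self._fifo = []                  # queue state, touched only by the coordinator
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Coordinator loop: receive a request message, apply it, send a reply.
        while True:
            op, payload, reply = self._requests.get()
            if op == "enq":
                self._fifo.append(payload)
                reply.put(True)
            elif op == "deq":
                reply.put(self._fifo.pop(0) if self._fifo else None)

    def enqueue(self, item):
        reply = queue.Queue(maxsize=1)   # per-request reply channel
        self._requests.put(("enq", item, reply))
        return reply.get()

    def dequeue(self):
        reply = queue.Queue(maxsize=1)
        self._requests.put(("deq", None, reply))
        return reply.get()               # None when the queue is empty
```

Because every operation is funnelled through one coordinator, FIFO order is trivially maintained; the paper's contribution lies in making such schemes concurrent and resilient, which this sketch does not attempt.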


Keywords: Shared Memory · Message Passing · Critical Section · Liveness Property · FIFO Queue



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Computer Science and Engineering, Chalmers University of Technology, Sweden
