Concurrent Data Structures in Architectures with Limited Shared Memory Support
The Single-chip Cloud Computer (SCC) is an experimental multicore processor created by Intel Labs for the many-core research community to study many-core processors, their programmability, and their scalability in relation to communication models. It is based on a distributed memory architecture that combines small, fast-access on-chip memory with large off-chip private and shared memory. Its design is meant to favour message passing over traditional shared-memory programming. To this end, the platform deliberately provides neither hardware-supported cache coherence nor atomic memory read/write operations across cores. Because of these hardware limitations, the algorithmic designs of concurrent data structures in the literature are not suitable for this platform.
In this paper, we study the problem of designing concurrent data structures on such systems. By utilising their very efficient message passing together with the limited shared memory available, we provide three techniques: two that use the concept of a coordinator and one that combines local locks with message passing. All three achieve high concurrency and resiliency. These techniques allow us to design three efficient algorithms for concurrent FIFO queues. Our techniques are general and can be used to implement other concurrent abstract data types. We also provide an experimental study of the proposed queues on the SCC platform, analysing how the throughput of our algorithms varies under different memory placement policies.
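As a rough illustration of the coordinator idea mentioned above, the following minimal sketch lets a single coordinator own the FIFO state while clients interact with it only through messages. It is not the paper's algorithm: Python threads and `queue.Queue` mailboxes stand in for SCC cores and their message-passing buffers, and all names (`coordinator`, `QueueClient`, `demo`, the `ENQ`/`DEQ`/`STOP` tags) are hypothetical.

```python
import queue
import threading
from collections import deque

# Hypothetical message tags; the paper does not prescribe these names.
ENQ, DEQ, STOP = "enq", "deq", "stop"


def coordinator(mailbox):
    """Sole owner of the FIFO state; serves requests in arrival order."""
    items = deque()
    while True:
        op, value, reply = mailbox.get()  # blocks until a request arrives
        if op == STOP:
            break
        if op == ENQ:
            items.append(value)
            reply.put(True)  # acknowledge the enqueue
        elif op == DEQ:
            # None signals an empty queue in this sketch.
            reply.put(items.popleft() if items else None)


class QueueClient:
    """Client-side stub: one request message out, one reply message back."""

    def __init__(self, mailbox):
        self.mailbox = mailbox
        self.reply = queue.Queue(maxsize=1)  # private reply channel

    def enqueue(self, value):
        self.mailbox.put((ENQ, value, self.reply))
        return self.reply.get()

    def dequeue(self):
        self.mailbox.put((DEQ, None, self.reply))
        return self.reply.get()


def demo():
    mailbox = queue.Queue()
    worker = threading.Thread(target=coordinator, args=(mailbox,))
    worker.start()
    client = QueueClient(mailbox)
    for i in range(3):
        client.enqueue(i)
    results = [client.dequeue() for _ in range(4)]  # fourth call finds it empty
    mailbox.put((STOP, None, None))
    worker.join()
    return results
```

Because only the coordinator touches the underlying container, clients need no locks or atomic operations; the FIFO order is simply the order in which the coordinator receives requests. The trade-off in this sketch is that every operation costs a message round trip.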
Keywords: Shared Memory, Message Passing, Critical Section, Liveness Property, FIFO Queue