Abstract
The advent of manycore architectures raises new scalability challenges for concurrent applications. Implementing scalable data structures is one of them. Several manycore architectures provide hardware message passing as a means to efficiently exchange data between cores. In this paper, we study the implementation of high-throughput concurrent maps in message-passing manycores. Partitioning and replication are the two approaches to achieve high throughput in a message-passing system. Our paper presents and compares different strongly-consistent map algorithms based on partitioning and replication. To assess the performance of these algorithms independently of architecture-specific features, we propose a communication model of message-passing manycores to express the throughput of each algorithm. The model is validated through experiments on a 36-core TILE-Gx8036 processor. Evaluations show that replication outperforms partitioning only in a narrow domain.
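To make the two approaches concrete, here is a minimal single-threaded sketch in Python. It is not the paper's algorithms: the class names, the `messages` counter, and the cost of two messages per partitioned operation and one message per replica per update are all illustrative assumptions. The loop in `ReplicatedMap.put` merely stands in for an ordered broadcast of updates.

```python
from typing import Any, Dict, Hashable, List


class PartitionedMap:
    """Hypothetical sketch: each key is owned by exactly one server core."""

    def __init__(self, num_servers: int) -> None:
        self.partitions: List[Dict[Hashable, Any]] = [{} for _ in range(num_servers)]
        self.messages = 0  # assumed cost: one request plus one reply per operation

    def _owner(self, key: Hashable) -> Dict[Hashable, Any]:
        # Route the key to its owning partition by hashing.
        return self.partitions[hash(key) % len(self.partitions)]

    def put(self, key: Hashable, value: Any) -> None:
        self.messages += 2
        self._owner(key)[key] = value

    def get(self, key: Hashable) -> Any:
        self.messages += 2
        return self._owner(key).get(key)


class ReplicatedMap:
    """Hypothetical sketch: every core holds a full copy of the map."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas: List[Dict[Hashable, Any]] = [{} for _ in range(num_replicas)]
        self.messages = 0  # assumed cost: one message per replica per update

    def put(self, key: Hashable, value: Any) -> None:
        # Stand-in for an ordered broadcast: every replica applies the
        # updates in the same order because this loop is sequential.
        for replica in self.replicas:
            replica[key] = value
        self.messages += len(self.replicas)

    def get(self, key: Hashable, replica_id: int = 0) -> Any:
        # Reads are served from the local replica and send no messages.
        return self.replicas[replica_id].get(key)
```

Under these assumed costs, reads are free with replication while every update is paid once per replica, whereas partitioning charges the same fixed cost for reads and updates. The sketch ignores concurrency entirely, so it says nothing about how consistency is actually enforced.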
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Shahmirzadi, O., Ropars, T., Schiper, A. (2014). High-Throughput Maps on Message-Passing Manycore Architectures: Partitioning versus Replication. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_45
DOI: https://doi.org/10.1007/978-3-319-09873-9_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)