High-Throughput Maps on Message-Passing Manycore Architectures: Partitioning versus Replication

Shahmirzadi, Omid; Ropars, Thomas; Schiper, André

doi:10.1007/978-3-319-09873-9_45

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8632)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

The advent of manycore architectures raises new scalability challenges for concurrent applications. Implementing scalable data structures is one of them. Several manycore architectures provide hardware message passing as a means to efficiently exchange data between cores. In this paper, we study the implementation of high-throughput concurrent maps in message-passing manycores. Partitioning and replication are the two approaches to achieve high throughput in a message-passing system. Our paper presents and compares different strongly-consistent map algorithms based on partitioning and replication. To assess the performance of these algorithms independently of architecture-specific features, we propose a communication model of message-passing manycores to express the throughput of each algorithm. The model is validated through experiments on a 36-core TILE-Gx8036 processor. Evaluations show that replication outperforms partitioning only in a narrow domain.
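The two approaches named in the abstract can be illustrated with a toy, single-threaded sketch. It is not one of the paper's algorithms: the class names, the `server_ops` counter, and the round-robin read policy are all assumptions made for illustration, and real message passing, ordering protocols, and the paper's communication model are left out. The sketch only shows the basic trade-off: in a partitioned map each operation touches one server, while in a replicated map a read touches one replica but an update must be applied to every replica.

```python
class PartitionedMap:
    """Toy model: each server owns a disjoint slice of the key space."""

    def __init__(self, num_servers):
        self.stores = [dict() for _ in range(num_servers)]
        self.server_ops = 0  # hypothetical counter: operations executed by servers

    def _owner(self, key):
        # The key's hash selects the single server responsible for it.
        return self.stores[hash(key) % len(self.stores)]

    def put(self, key, value):
        self._owner(key)[key] = value
        self.server_ops += 1

    def get(self, key):
        self.server_ops += 1
        return self._owner(key).get(key)


class ReplicatedMap:
    """Toy model: every server holds a full copy of the map."""

    def __init__(self, num_servers):
        self.replicas = [dict() for _ in range(num_servers)]
        self.server_ops = 0
        self._next = 0  # round-robin pointer for reads (an assumption)

    def put(self, key, value):
        # Every replica applies the update. In this single-threaded sketch all
        # replicas trivially see updates in the same order; a real system
        # would need an ordering protocol to guarantee that.
        for replica in self.replicas:
            replica[key] = value
        self.server_ops += len(self.replicas)

    def get(self, key):
        # Any one replica can answer a read.
        replica = self.replicas[self._next]
        self._next = (self._next + 1) % len(self.replicas)
        self.server_ops += 1
        return replica.get(key)


def run(map_obj, num_keys):
    """Insert num_keys entries, then read them all back."""
    for k in range(num_keys):
        map_obj.put(k, k * k)
    return [map_obj.get(k) for k in range(num_keys)]


partitioned = PartitionedMap(4)
replicated = ReplicatedMap(4)
assert run(partitioned, 8) == run(replicated, 8)
print(partitioned.server_ops, replicated.server_ops)  # prints "16 40"
```

With 4 servers, 8 updates and 8 reads, the partitioned map performs 16 server operations and the replicated map 40, because each update is repeated on all 4 replicas. This count says nothing about the communication costs that the paper's model captures.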

Author information

Authors and Affiliations

Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Omid Shahmirzadi, Thomas Ropars & André Schiper

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Fernando Silva, Inês Dutra & Vítor Santos Costa

About this paper

Cite this paper

Shahmirzadi, O., Ropars, T., Schiper, A. (2014). High-Throughput Maps on Message-Passing Manycore Architectures: Partitioning versus Replication. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_45

DOI: https://doi.org/10.1007/978-3-319-09873-9_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)
