Diamond Rings: Acknowledged Event Propagation in Many-Core Processors

  • Stefan Nürnberger
  • Randolf Rotta
  • Gabor Drescher
  • Daniel Danner
  • Jörg Nolte
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)

Abstract

Hardware and software consistency protocols rely on global observability of consistency events. Acknowledged broadcast is an obvious choice to propagate these events. This paper presents a generalized ring topology for parallel event propagation with acknowledged delivery. Implementations for various many-core architectures show increased performance over conventional approaches. Therefore, diamond rings are a prime candidate for implementations of distributed memory models.
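As a point of reference for the idea of acknowledged delivery, the following is a minimal sketch of event propagation on a plain unidirectional ring. It is not the diamond ring topology presented in the paper. The `Node` class, the `propagate_on_ring` function, and the event string are hypothetical names chosen for illustration. The sketch assumes that the event's return to its origin serves as the acknowledgement that every node has observed it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One core on a unidirectional ring (hypothetical model)."""
    node_id: int
    observed: list = field(default_factory=list)


def propagate_on_ring(nodes, origin, event):
    """Forward `event` hop by hop from `origin` around the ring.

    Returns the number of hops until the event arrives back at the
    origin. That arrival is treated as the acknowledgement that every
    node has observed the event (an assumption of this sketch).
    """
    n = len(nodes)
    nodes[origin].observed.append(event)  # the origin observes its own event
    hops = 0
    current = (origin + 1) % n
    while current != origin:
        nodes[current].observed.append(event)  # successor observes the event
        hops += 1
        current = (current + 1) % n
    hops += 1  # final hop back to the origin closes the ring
    return hops


ring = [Node(i) for i in range(8)]
hops = propagate_on_ring(ring, origin=3, event="invalidate:0x40")
all_observed = all("invalidate:0x40" in node.observed for node in ring)
```

In this plain ring the event visits the nodes one after another, so the hop count equals the number of nodes. The abstract describes the diamond ring as a generalization aimed at parallel propagation; how it achieves that is not modeled here.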

Notes

Acknowledgments

This work was supported by the German Research Foundation (DFG) under grants no. NO 625/7-1 and SCHR 603/10-1. The evaluation on the Intel Xeon Phi was supported by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IH13003C.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Stefan Nürnberger (1)
  • Randolf Rotta (1)
  • Gabor Drescher (2)
  • Daniel Danner (2)
  • Jörg Nolte (1)

  1. Brandenburg University of Technology, Cottbus-Senftenberg, Cottbus, Germany
  2. Friedrich-Alexander University Erlangen-Nuremberg, Erlangen, Germany
