HyBar: high efficient barrier synchronization based on a hybrid packet-circuit switching Network-on-Chip

Abstract

Realizing efficient barrier synchronization in multi-/many-core processors becomes increasingly challenging as the number of cores integrated on a single chip keeps growing. Quite a few barrier solutions have been proposed, but they provide limited improvement when synchronizing large numbers of cores or impose unfavorable restrictions on concurrent barriers. This paper presents HyBar, a hardware barrier based on a hybrid switching NoC that adopts packet switching and circuit switching in two separate sub-networks. Dedicated channels in the circuit-switching sub-network are dynamically built and removed as barrier requests traverse the packet-switching sub-network according to a modified dimension-order routing algorithm. The efficiency of inter-core communication for concurrent barriers is improved by merging barrier arrival requests and broadcasting release requests along the circuit channels. The execution times of synthetic cases, benchmark kernels and parallel applications using various barrier solutions are evaluated on an RTL-based simulation platform. Experimental results show that our proposal provides about 15%–50% performance improvement over previous solutions, while the hardware overhead is marginal under SMIC 40 nm technology. Moreover, HyBar introduces only a minor efficiency loss for concurrent barriers and places no limitation on the layout of their participating cores in the on-chip network.
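To make the idea of merging arrival requests concrete, the following is a minimal sketch and not the HyBar design itself. It assumes a 2D mesh, plain XY dimension-order routing (the abstract says HyBar uses a modified variant whose details are not given here), and a hypothetical master node that collects arrivals. The names `xy_route`, `unmerged_hops` and `merged_links` are illustrative only. The sketch compares the hops travelled when every arrival is sent separately with the number of distinct links used when arrivals sharing a link are merged into one.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in a 2D mesh (assumed topology)


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Nodes visited from src to dst under plain XY dimension-order routing.

    This is the textbook algorithm, not HyBar's modified version.
    """
    x, y = src
    path = [src]
    # Travel along the X dimension first until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the Y dimension.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def unmerged_hops(participants: List[Node], master: Node) -> int:
    """Total hops if every arrival request travels to the master on its own."""
    return sum(len(xy_route(p, master)) - 1 for p in participants)


def merged_links(participants: List[Node], master: Node) -> int:
    """Distinct links used if arrivals sharing a link are merged into one."""
    links: Set[Tuple[Node, Node]] = set()
    for p in participants:
        path = xy_route(p, master)
        links.update(zip(path, path[1:]))  # consecutive node pairs are links
    return len(links)


if __name__ == "__main__":
    cores = [(0, 0), (1, 0), (2, 0), (2, 1)]  # hypothetical participants
    master = (2, 2)                           # hypothetical collection point
    print(unmerged_hops(cores, master), merged_links(cores, master))
```

Because deterministic dimension-order paths to a common destination form a tree, the set of merged links could also serve as a broadcast tree for the release request in the reverse direction; whether HyBar builds its circuit channels exactly this way is not stated in the abstract.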


Acknowledgments

This work was partially supported by the Equipment Pre-Research Foundation of China (Grant No. 9140A08010414JW03025).

Author information

Correspondence to Zhenqi Wei.

About this article

Cite this article

Wei, Z., Liu, P. & Sun, R. HyBar: high efficient barrier synchronization based on a hybrid packet-circuit switching Network-on-Chip. Sci. China Inf. Sci. 60, 062402 (2017). https://doi.org/10.1007/s11432-016-0306-y

Keywords

  • NoC
  • barrier synchronization
  • packet-circuit switching
  • concurrent barriers
  • routing algorithm