HyBar: high efficient barrier synchronization based on a hybrid packet-circuit switching Network-on-Chip

Abstract

Realizing efficient barrier synchronization in multi-/many-core processors becomes increasingly challenging as the number of cores integrated on a single chip keeps growing. Many barrier solutions have been proposed, but they either provide limited improvement when synchronizing large numbers of cores or impose unfavorable restrictions on performing concurrent barriers. This paper presents HyBar, a hardware barrier built on a hybrid switching NoC that uses packet switching and circuit switching in two separate sub-networks. Dedicated channels in the circuit-switching sub-network are dynamically built and removed as barrier requests traverse the packet-switching sub-network according to a modified dimension-order routing algorithm. Inter-core communication for concurrent barriers is made more efficient by merging barrier arrival requests and broadcasting release requests along the circuit channels. The execution times of synthetic cases, benchmark kernels, and parallel applications under various barrier solutions are evaluated on an RTL-based simulation platform. Experimental results show that HyBar provides about 15%–50% performance improvement over previous solutions, while its hardware overhead in SMIC 40 nm technology is marginal. Moreover, HyBar incurs only a minor efficiency loss for concurrent barriers and places no restriction on the layout of their participating cores in the on-chip network.
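The merge-and-broadcast idea can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not HyBar's hardware design. The abstract says HyBar uses a *modified* dimension-order routing algorithm but does not describe it, so plain XY routing stands in for it here. The model also assumes a single root core per barrier. All function names (`build_barrier_tree`, `pending_inputs`, `arrival_messages`, `release_hops`) are hypothetical. It models one barrier only; channel teardown, the packet/circuit sub-network split, and concurrent barriers are left out. The links that arrival requests traverse toward the root stand in for the circuit channels. Each router forwards a single merged arrival after collecting all of its inputs, and the release is broadcast back down the same tree.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinate of a router in a 2D mesh


def xy_next_hop(cur: Node, dst: Node) -> Node:
    """Plain dimension-order (XY) routing: resolve the X offset first, then Y.
    Assumption: stands in for HyBar's modified routing, which is not given here."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return cur


def xy_path(src: Node, dst: Node) -> List[Node]:
    """Routers visited from src to dst, both ends included."""
    path = [src]
    while path[-1] != dst:
        path.append(xy_next_hop(path[-1], dst))
    return path


def build_barrier_tree(participants: Iterable[Node], root: Node) -> Dict[Node, Node]:
    """Map each router to its upstream neighbour toward the (hypothetical) root.

    These links stand in for circuit channels set up as arrival requests travel
    toward the root. The XY next hop depends only on (current router, root), so
    paths that meet at a router share the rest of the route: the union is a tree.
    """
    parent: Dict[Node, Node] = {}
    for p in participants:
        path = xy_path(p, root)
        for a, b in zip(path, path[1:]):
            parent[a] = b
    return parent


def pending_inputs(participants: Iterable[Node], root: Node) -> Dict[Node, int]:
    """Arrivals each router must collect before it forwards one merged arrival."""
    participants = list(participants)
    tree = build_barrier_tree(participants, root)
    need = Counter(tree.values())  # one input per downstream child link
    for p in participants:
        need[p] += 1               # plus the local core, if it participates
    return dict(need)


def arrival_messages(participants: Iterable[Node], root: Node) -> Tuple[int, int]:
    """Link traversals in the arrival phase: (with merging, one packet per core)."""
    participants = list(participants)
    merged = len(build_barrier_tree(participants, root))  # one message per tree link
    unmerged = sum(len(xy_path(p, root)) - 1 for p in participants)
    return merged, unmerged


def release_hops(participants: Iterable[Node], root: Node) -> int:
    """Hops until a release broadcast from the root reaches the farthest core."""
    return max(len(xy_path(p, root)) - 1 for p in participants)


if __name__ == "__main__":
    cores = [(x, y) for x in range(4) for y in range(4)]  # every core of a 4x4 mesh
    print(arrival_messages(cores, (0, 0)))                # (15, 48)
    print(release_hops(cores, (0, 0)))                    # 6
```

For an all-core barrier on a 4×4 mesh rooted at (0, 0), merging reduces arrival traffic in this model from 48 link traversals to 15, one per tree link. When participants' paths do not overlap, merging saves nothing. This toy model does not reproduce the paper's reported 15%–50% improvement, which comes from RTL simulation.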

Acknowledgments

This work was partially supported by the Equipment Pre-Research Foundation of China (Grant No. 9140A08010414JW03025).

Author information

Corresponding author

Correspondence to Zhenqi Wei.

About this article

Cite this article

Wei, Z., Liu, P. & Sun, R. HyBar: high efficient barrier synchronization based on a hybrid packet-circuit switching Network-on-Chip. Sci. China Inf. Sci. 60, 062402 (2017). https://doi.org/10.1007/s11432-016-0306-y

Keywords

  • NoC
  • barrier synchronization
  • packet-circuit switching
  • concurrent barriers
  • routing algorithm