Stealth-ACK: stealth transmissions of NoC acknowledgements

Stealth-ACK: stealth transmission of ACK packets in networks-on-chip

Abstract

Network-on-Chip (NoC) is a promising replacement for bus architectures because it scales better. In state-of-the-art NoCs, each packet consists of several fixed-length flits, which simplifies the allocation of network resources but leaves many bits unused. In this paper, we propose a novel technique called Stealth-ACK to address this problem. Stealth-ACK uses the unused bits in the head flits of non-ACK packets to carry ACK information stealthily. These stealth transmissions reduce both the number of dedicated ACK packets on the NoC and the number of unused bits in the head flits of non-ACK packets, which significantly reduces wasted NoC bandwidth. Experimental results show that Stealth-ACK increases the throughput of a 16 × 16 2-D mesh NoC by 11.9% on average, and reduces NoC latency by 34.8% on average on SPLASH-2 application traces. Moreover, Stealth-ACK requires only trivial hardware modifications to basic router architectures, incurring negligible power and area overhead.
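The core idea, carrying ACK information in the otherwise unused bits of a head flit, can be illustrated with a small bit-packing sketch. This page does not give the paper's actual flit format, so every field width below (`FLIT_BITS`, `HEADER_BITS`, `ACK_ID_BITS`) and both function names are hypothetical choices made only for illustration.

```python
from typing import Optional, Tuple

# Hypothetical head-flit layout; none of these widths come from the paper.
FLIT_BITS = 128       # assumed fixed flit width
HEADER_BITS = 40      # assumed bits used by routing and control fields
ACK_VALID_BITS = 1    # flag: a piggybacked ACK is present
ACK_ID_BITS = 16      # assumed identifier of the acknowledged transaction
ACK_SLOT_BITS = ACK_VALID_BITS + ACK_ID_BITS
ACK_SHIFT = HEADER_BITS  # the ACK slot sits directly above the header bits


def pack_head_flit(header: int, ack_id: Optional[int] = None) -> int:
    """Build a head flit, optionally placing an ACK in its unused bits."""
    if not 0 <= header < (1 << HEADER_BITS):
        raise ValueError("header does not fit in HEADER_BITS")
    flit = header
    if ack_id is not None:
        if not 0 <= ack_id < (1 << ACK_ID_BITS):
            raise ValueError("ack_id does not fit in ACK_ID_BITS")
        slot = (ack_id << ACK_VALID_BITS) | 1  # lowest slot bit is the valid flag
        flit |= slot << ACK_SHIFT
    return flit


def unpack_head_flit(flit: int) -> Tuple[int, Optional[int]]:
    """Split a head flit into its header and the piggybacked ACK, if any."""
    header = flit & ((1 << HEADER_BITS) - 1)
    slot = (flit >> ACK_SHIFT) & ((1 << ACK_SLOT_BITS) - 1)
    if slot & 1:
        return header, slot >> ACK_VALID_BITS
    return header, None
```

Under these assumed widths, a head flit that would otherwise leave 88 of its 128 bits empty spends 17 of them on the ACK, and the receiver recovers it with `unpack_head_flit` without any dedicated ACK packet being sent.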

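Highlights

First, our method of using the unused bits in the head flits of non-ACK packets to carry ACK information can be combined seamlessly with a variety of cache coherence protocols to reduce wasted bandwidth. Second, Stealth-ACK provides flexible modes (hidden mode and exposed mode) for transmitting ACK information, which significantly reduces the average latency of both ACK information and non-ACK packets. Finally, adopting Stealth-ACK requires only simple modifications to the basic router architecture, and the resulting power and area overheads are negligible.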
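The two transmission modes named above (hidden mode and exposed mode) can be sketched as a queue of pending ACKs at a network interface. This page does not describe how the paper chooses between the modes, so the policy below is an assumption: an ACK rides along with a non-ACK packet bound for the same destination when one is available (hidden), and is otherwise sent as a dedicated ACK packet after waiting a hypothetical `MAX_WAIT_CYCLES` (exposed). The class and method names are likewise illustrative.

```python
from typing import List, Optional, Tuple

MAX_WAIT_CYCLES = 8  # hypothetical wait before an ACK is sent exposed


class AckQueue:
    """Pending ACKs, each stored as [destination, ack_id, age_in_cycles]."""

    def __init__(self) -> None:
        self._pending: List[List[int]] = []

    def add(self, dest: int, ack_id: int) -> None:
        """Register a new ACK that still has to reach `dest`."""
        self._pending.append([dest, ack_id, 0])

    def take_for(self, dest: int) -> Optional[int]:
        """Hidden mode: pop the oldest ACK that can ride with a packet to `dest`."""
        for i, (d, ack_id, _age) in enumerate(self._pending):
            if d == dest:
                del self._pending[i]
                return ack_id
        return None

    def tick(self) -> List[Tuple[int, int]]:
        """Advance one cycle; return (dest, ack_id) pairs to send exposed."""
        exposed: List[Tuple[int, int]] = []
        keep: List[List[int]] = []
        for entry in self._pending:
            entry[2] += 1
            if entry[2] >= MAX_WAIT_CYCLES:
                exposed.append((entry[0], entry[1]))
            else:
                keep.append(entry)
        self._pending = keep
        return exposed
```

In this sketch the sender calls `take_for(dest)` whenever it builds a head flit for a non-ACK packet, and calls `tick()` once per cycle to collect any ACKs that have waited too long and must travel as dedicated packets.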

Author information

Correspondence to Rui Mao.

About this article

Cite this article

Tao, J., Qiu, S., Liu, S. et al. Stealth-ACK: stealth transmissions of NoC acknowledgements. Sci. China Inf. Sci. 60, 092102 (2017). https://doi.org/10.1007/s11432-015-0328-y

Keywords

  • networks-on-chip
  • acknowledgement packet
  • router architecture
  • optimisation
  • routing algorithm
