NPGPU: Network Processing on Graphics Processing Units

  • Yangdong Deng
  • Xiaomemg Jiao
  • Shuai Mu
  • Kang Kang
  • Yuhao Zhu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 164)


The Internet continues to expand despite its already unprecedented complexity. To meet ever-increasing bandwidth demands from rapidly emerging services and applications, today’s Internet routers and other key network devices must satisfy two conflicting requirements: high performance and good programmability. In this work, we propose a series of data-parallel algorithms that can be efficiently implemented on modern graphics processing units (GPUs). Experimental results show that the GPU can serve as an excellent packet processing platform, significantly outperforming the CPU on typical router applications. On this basis, we propose a hybrid microarchitecture that integrates a CPU and a GPU. Besides dramatically enhancing packet throughput, the integrated microarchitecture can also improve quality-of-service metrics, which are of key importance for network applications. Our work suggests that an integrated CPU/GPU architecture is a promising solution for implementing future network processing hardware.


GPU, router, table lookup, packet classification, meta-programming, deep packet inspection, Bloom filter, DFA, software router, QoS





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yangdong Deng
    • 1
  • Xiaomemg Jiao
    • 1
  • Shuai Mu
    • 1
  • Kang Kang
    • 1
  • Yuhao Zhu
    • 2
  1. Institute of Microelectronics, Tsinghua University, Beijing, China
  2. Electrical and Computer Engineering Department, University of Texas at Austin, Austin, USA
