A calibrated asymptotic framework for analyzing packet classification algorithms on GPUs

The Journal of Supercomputing

Abstract

Packet classification is a computationally intensive, highly parallelizable task in advanced network systems such as high-speed routers and firewalls. Recently, graphics processing units (GPUs) have been used as efficient accelerators for parallel implementations of software packet classifiers. However, for lack of a comprehensive analysis framework, none of the studies conducted to date has efficiently exploited the complex memory subsystem of these highly threaded machines. In this work, we combine asymptotic and calibrated analysis frameworks into a more efficient framework that both simplifies the design of efficient parallel algorithms for different GPU architectures and serves as a powerful analysis tool for predicting empirical results. Comparing the analytical results with our own experimental findings and with those of other researchers who have implemented and evaluated packet classification algorithms on a variety of GPUs demonstrates the efficiency of the proposed analysis framework.

Author information

Corresponding author

Correspondence to M. Abbasi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abbasi, M., Rafiee, M. A calibrated asymptotic framework for analyzing packet classification algorithms on GPUs. J Supercomput 75, 6574–6611 (2019). https://doi.org/10.1007/s11227-019-02861-2

  • DOI: https://doi.org/10.1007/s11227-019-02861-2
