Multi-stream Parallel String Matching on Kepler Architecture

  • Nhat-Phuong Tran
  • Myungho Lee
  • Sugwon Hong
  • Dong Hoon Choi
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 274)


The Aho-Corasick (AC) algorithm is a commonly used string matching algorithm. It performs multi-pattern matching in computer and network security, bioinformatics, and many other applications. These applications impose high computational requirements, so efficient parallelization of the AC algorithm is crucial. In this paper, we present a multi-stream parallelization approach for string matching with the AC algorithm on the latest Nvidia Kepler architecture. Our approach uses the HyperQ feature of the Kepler GPU so that multiple streams, generated from a number of OpenMP threads running on the host multicore processor, can be executed efficiently on a large number of fine-grain processing cores. Experimental results show that our approach delivers a throughput of up to 420 Gbps on an Nvidia Tesla K20 GPU.
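For readers unfamiliar with the AC algorithm, the following is a minimal CPU-only Python sketch of multi-pattern matching with an Aho-Corasick automaton. It is not the authors' CUDA/OpenMP implementation. The function names (`build_automaton`, `search`, `search_chunked`) are hypothetical, and the chunking scheme, which splits the input into pieces that overlap by the longest pattern length minus one, is an assumption used here only to illustrate how an input might be divided into independent pieces of work, such as one per stream.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states derive theirs from their parent's link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton, base=0):
    """Return (start_offset, pattern) pairs; base shifts the offsets."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((base + i - len(pat) + 1, pat))
    return matches


def search_chunked(text, patterns, n_chunks):
    """Search overlapping chunks independently and merge the results.

    Assumption for illustration: each chunk could be handled by a
    separate worker; here the chunks are simply processed in a loop.
    """
    automaton = build_automaton(patterns)
    overlap = max(len(p) for p in patterns) - 1
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    found = set()  # a set removes matches seen in two overlapping chunks
    for start in range(0, len(text), size):
        piece = text[start:start + size + overlap]
        found.update(search(piece, automaton, base=start))
    return sorted(found)


if __name__ == "__main__":
    print(search_chunked("ushers", ["he", "she", "his", "hers"], 3))
```

The overlap ensures that a match straddling a chunk boundary is still seen in full by the chunk in which it starts, at the cost of scanning a few characters twice.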


string matching, Kepler GPU, multi-stream, HyperQ, multithreading





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Nhat-Phuong Tran (1)
  • Myungho Lee (1)
  • Sugwon Hong (1)
  • Dong Hoon Choi (2)
  1. Department of Computer Science and Engineering, Myongji University, Kyung Ki Do, Korea
  2. Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea
