
GPU-Friendly Parallel Genome Matching with Tiled Access and Reduced State Transition Table


In this paper, we propose a new parallel genome matching algorithm for graphics processing units (GPUs). The approach is based on the Aho–Corasick algorithm and was designed around the architectural features of existing GPUs with a hundred or more cores. We provide a task partitioning method suited to execution on multiple threads, and we fully utilize the cache memory and shared memory structures available in GPUs. In particular, we propose a tiled access method for rapid data transfer from global memory to shared memory. We also provide new models of a cache-friendly state transition table to improve the performance of pattern matching operations on GPUs. The maximum throughput achieved across our experiments was 15.3 Gbps, and the proposed design outperformed an earlier approach by 15.4%.
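To make the ideas in the abstract concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It assumes a four-symbol DNA alphabet (A, C, G, T), so each state needs only four next-state entries; this is one plausible reading of a "reduced" state transition table, not a description of the paper's actual table models. It also assumes a simple partitioning scheme in which each chunk of the input is scanned independently after re-reading a short overlap from the previous chunk. The names `build_dfa`, `scan`, and `chunked_scan` are hypothetical, and the staging of tiles into GPU shared memory is not modeled.

```python
from collections import deque

ALPHABET = "ACGT"  # assumption: input contains only these four symbols
CODE = {ch: i for i, ch in enumerate(ALPHABET)}


def build_dfa(patterns):
    """Build a dense Aho-Corasick DFA with one 4-entry row per state."""
    table = [[0, 0, 0, 0]]  # table[state][symbol] -> next state; 0 is the root
    out = [[]]              # pattern indices recognised on entering each state
    # Phase 1: insert every pattern into a trie (0 doubles as "no edge yet").
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            c = CODE[ch]
            if table[s][c] == 0:
                table.append([0, 0, 0, 0])
                out.append([])
                table[s][c] = len(table) - 1
            s = table[s][c]
        out[s].append(idx)
    # Phase 2: breadth-first pass that fills missing edges via failure links,
    # so matching needs exactly one table lookup per input symbol.
    fail = [0] * len(table)
    queue = deque(t for t in table[0] if t)
    while queue:
        s = queue.popleft()
        for c in range(4):
            t = table[s][c]
            if t:
                fail[t] = table[fail[s]][c]
                out[t] = out[t] + out[fail[t]]
                queue.append(t)
            else:
                table[s][c] = table[fail[s]][c]
    return table, out


def scan(table, out, text, start, end, min_end=0):
    """Scan text[start:end] from the root state.

    Returns (end_index, pattern_index) pairs for matches whose last
    character lies at an index >= min_end.
    """
    hits, s = [], 0
    for i in range(start, end):
        s = table[s][CODE[text[i]]]
        if i >= min_end:
            hits.extend((i, idx) for idx in out[s])
    return hits


def chunked_scan(table, out, patterns, text, chunk):
    """Scan fixed-size chunks independently, as parallel threads could.

    Each chunk re-reads (longest pattern length - 1) preceding symbols so
    that matches straddling a chunk boundary are still found, and reports
    only matches ending inside its own chunk to avoid duplicates.
    """
    overlap = max(len(p) for p in patterns) - 1
    hits = []
    for base in range(0, len(text), chunk):
        start = max(0, base - overlap)
        end = min(base + chunk, len(text))
        hits += scan(table, out, text, start, end, min_end=base)
    return hits


if __name__ == "__main__":
    patterns = ["ACG", "CGT", "GT"]
    table, out = build_dfa(patterns)
    print(chunked_scan(table, out, patterns, "ACGTACGT", chunk=3))
```

Because every chunk starts from the root state and depends only on its own slice of the input plus the overlap, the chunks can be processed in any order. That independence is the property a multi-threaded partitioning relies on.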



This work was supported by the Basic Science Research Program through the National Research Foundation of Korea, which is funded by the Ministry of Education, Science and Technology [2009-0070364].

Author information

Correspondence to Won W. Ro.

About this article

Cite this article

Oh, Y., Oh, D. & Ro, W.W. GPU-Friendly Parallel Genome Matching with Tiled Access and Reduced State Transition Table. Int J Parallel Prog 41, 526–551 (2013). https://doi.org/10.1007/s10766-012-0234-5


  • Concurrent programming
  • Pattern matching
  • Graphics processors
  • Parallel processing