International Journal of Parallel Programming, Volume 41, Issue 4, pp 526–551

GPU-Friendly Parallel Genome Matching with Tiled Access and Reduced State Transition Table

  • Yunho Oh
  • Doohwan Oh
  • Won W. Ro


In this paper, we propose a new parallel genome matching algorithm for graphics processing units (GPUs). The approach is based on the Aho–Corasick algorithm and is designed around the architectural features of existing GPUs with a hundred or more cores. We provide a task partitioning method suited to execution on multiple threads, and we make full use of the cache memory and shared memory structures available in GPUs. In particular, we propose a tiled access method for rapid data transfer from global memory to shared memory. We also provide new models for a cache-friendly state transition table to improve the performance of pattern matching operations on GPUs. The maximum throughput achieved in our experiments was 15.3 Gbps. Moreover, the proposed design outperformed an earlier approach, with a 15.4 % performance improvement.
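To make the ingredients named in the abstract concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It builds a textbook Aho–Corasick automaton as a dense state transition table with one column per DNA base, then scans the input in fixed-size chunks that overlap by the longest pattern length minus one, which is a common way to split such a scan into independent per-thread tasks. The names `build_table`, `scan`, `match_chunked`, and the `chunk` parameter are hypothetical, and the paper's reduced table layout and tiled global-to-shared memory access are not reproduced here.

```python
from collections import deque

ALPHABET = "ACGT"                       # assumption: input contains only these bases
SYM = {c: i for i, c in enumerate(ALPHABET)}


def build_table(patterns):
    """Build a dense Aho-Corasick table: one row per state, one column per base."""
    goto = [[-1] * 4]                   # -1 marks a missing trie edge
    out = [set()]                       # pattern indices recognized at each state
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            c = SYM[ch]
            if goto[s][c] == -1:
                goto.append([-1] * 4)
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(idx)

    # Breadth-first pass: compute failure links and fill every missing edge,
    # so matching needs exactly one table lookup per input symbol.
    fail = [0] * len(goto)
    queue = deque()
    for c in range(4):
        if goto[0][c] == -1:
            goto[0][c] = 0
        else:
            queue.append(goto[0][c])
    while queue:
        s = queue.popleft()
        for c in range(4):
            t = goto[s][c]
            if t == -1:
                goto[s][c] = goto[fail[s]][c]
            else:
                fail[t] = goto[fail[s]][c]
                out[t] |= out[fail[t]]
                queue.append(t)
    return goto, out


def scan(text, start, end, goto, out, patterns):
    """Scan text[start:end] from the root state; return (position, pattern) pairs."""
    hits = []
    s = 0
    for i in range(start, end):
        s = goto[s][SYM[text[i]]]
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


def match_chunked(text, patterns, chunk=8):
    """Match all patterns; each chunk stands in for the work of one thread."""
    goto, out = build_table(patterns)
    overlap = max(len(p) for p in patterns) - 1
    hits = set()
    for start in range(0, len(text), chunk):
        end = min(len(text), start + chunk + overlap)
        for pos, pat in scan(text, start, end, goto, out, patterns):
            if pos < start + chunk:     # keep only matches that begin in this chunk
                hits.add((pos, pat))
    return sorted(hits)
```

Calling `match_chunked("ACGTACGTTAGACGT", ["ACGT", "GTT", "TAG"], chunk=4)` returns a sorted list of `(position, pattern)` pairs. Because every chunk starts from the root state and reads only its own slice plus the overlap, the chunks are independent of one another, which is the property a GPU mapping of this kind would rely on.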


Concurrent programming · Pattern matching · Graphics processors · Parallel processing



This work was supported by the Basic Science Research Program through the National Research Foundation of Korea, which is funded by the Ministry of Education, Science and Technology [2009-0070364].



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Mobile Communication Business, Samsung Electronics, Suwon, Korea
  2. School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
