Fast Multi-Keyword Range Search Using GPGPU

  • Amirul Abdullah
  • Amril Nazir
  • Mohanavelu Senapan
  • Soo Saw Meng
  • Ettikan Karuppiah


Large organisations are constantly challenged by the need to handle big data. What counts as big data is a moving target; as of 2013 it ranges from a few dozen terabytes to many petabytes. The data is usually stored in very large databases that are often indexed off-line to accelerate on-line searches. More recently, the p-ary algorithm has been proposed to exploit the massively parallel architecture of graphics processors (GPUs) and substantially accelerate search operations on such large databases. In this chapter we present a multi-keyword range search technique that efficiently exploits index data structures to search for multiple text keywords in large databases. The technique extends the p-ary algorithm, originally developed by Kaldewey et al., to support multi-keyword search on GPGPU. We compare CPU and GPGPU implementations in terms of response time, throughput and speed-up. The benchmarks show that our algorithm achieves speed-ups of up to 25× and 6× on a Tesla K20c GPU card when compared with single-core and multicore CPU implementations, respectively.
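The idea behind p-ary search over a sorted index can be illustrated with a minimal sequential sketch. It is not the chapter's GPU kernel: the function names `p_ary_bound` and `multi_keyword_range_search`, the parameter `p`, and the half-open range convention are assumptions made for illustration. Each pass of the inner loop stands in for the probes that p GPU threads would perform at the same time, one per partition of the current search range.

```python
def p_ary_bound(sorted_keys, key, p=4, upper=False):
    """Return the first index whose entry is >= key (or > key if upper=True).

    Sequential simulation of p-ary search: the current range is cut into at
    most p partitions and the last entry of each partition is probed. On a
    GPU the probes would run in parallel, one per thread (an assumption of
    this sketch, not a detail taken from the chapter).
    """
    def at_or_past(entry):
        return entry > key if upper else entry >= key

    lo, hi = 0, len(sorted_keys)          # the answer lies in [lo, hi]
    while lo < hi:
        step = -(-(hi - lo) // p)         # ceil((hi - lo) / p), partition width
        start = lo
        while start < hi:                 # at most p probes per round
            end = min(start + step, hi)   # partition is [start, end)
            if at_or_past(sorted_keys[end - 1]):
                hi = end - 1              # answer is inside this partition
                break
            start = end                   # whole partition lies before the answer
        lo = start
    return lo


def multi_keyword_range_search(sorted_keys, keywords, p=4):
    """Map each keyword to the half-open index range [first, last) it occupies."""
    return {
        kw: (p_ary_bound(sorted_keys, kw, p),
             p_ary_bound(sorted_keys, kw, p, upper=True))
        for kw in keywords
    }


if __name__ == "__main__":
    index = ["apple", "banana", "cherry", "cherry", "date", "fig"]
    print(multi_keyword_range_search(index, ["cherry", "kiwi", "apple"]))
```

An empty range such as `(6, 6)` means the keyword is absent from the index. With `p = 2` the routine degenerates to ordinary binary search; larger values of `p` trade more probes per round for fewer rounds, which is the trade-off a massively parallel device is meant to exploit.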


GPGPU, CUDA, GPU, P-ary, Multi-keyword search, Binary search



The research was carried out at the Joint Lab, NVIDIA-HP-MIMOS GPU R&D and Solution Center, the first GPU solution centre in South East Asia. Funding for the work came from MOSTI, Malaysia. The authors would like to thank Prof. Simon See and Pradeep Gupta from NVIDIA for their support. We would also like to acknowledge the assistance provided by Zakiah Zulkefli from Universiti Sains Malaysia during her internship at MIMOS Berhad.


  1. Kaldewey, T., Hagen, J., Blas, A.D., Sedlar, E.: Parallel search on video cards. In: HotPar’09: Proceedings of the First USENIX Conference on Hot Topics in Parallelism, p. 9 (2009)
  2. Souders, S.: High-performance web sites. Commun. ACM, pp. 36–41 (2008)
  3. Boyer, R.S., Strother Moore, J.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977). doi: 10.1145/359842.359859
  4. Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
  5. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 249–260 (1987). doi: 10.1147/rd.312.0249
  6. Kouzinopoulos, C.S., Margaritis, K.G.: String matching on a multicore GPU using CUDA. In: Proceedings of the 2009 13th Panhellenic Conference on Informatics, pp. 14–18 (2009). doi: 10.1109/pci.2009.47
  7. Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 182–188 (2011)
  8. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975). doi: 10.1145/361002.361007
  9. Nievergelt, J.: Binary search trees and file organization. In: ACM Computing Surveys (CSUR), pp. 195–207 (1974)
  10. Yao, A.C-C.: Should tables be sorted. J. ACM 28(3), 615–628 (1981). doi: 10.1145/322261.322274
  11. Hwu, W.W.: GPU Computing Gems Jade Edition. Elsevier Science (2011)
  12. He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., Sander, P.: Relational joins on graphics processors. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD ’08), pp. 511–524. ACM, New York, NY (2008). doi: 10.1145/1376616.1376670
  13. Gregg, C., Hazelwood, K.: Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. In: 2011 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 134–144 (2011)
  14. Fang, W., Lau, K.K., Lu, M., Xiao, X., Lam, C.K., Yang, P.Y., He, B., Luo, Q., Sander, P.V., Yang, K.: Parallel Data Mining on Graphics Processors. Technical Report HKUST-CS08-07, Hong Kong University of Science and Technology (HKUST) (2008)
  15. Lustig, D., Martonosi, M.: Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization. In: HPCA, pp. 354–365 (2013)
  16. Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A performance study of general-purpose applications on graphics processors using CUDA. J. Parallel Distrib. Comput. 1370–1380 (2008). General-Purpose Processing using Graphics Processing Units

Copyright information

© Springer Science+Business Media Singapore 2015

Authors and Affiliations

  • Amirul Abdullah (1)
  • Amril Nazir (1)
  • Mohanavelu Senapan (1)
  • Soo Saw Meng (1)
  • Ettikan Karuppiah (1)

  1. MIMOS Berhad, Kuala Lumpur, Malaysia
