Hunting the pertinency of hash and bloom filter combinations on GPU for fast pattern matching

Bhat, Radhakrishna; Thilak, Reddy Kanala; Vaibhav, Reddy Panyala

doi:10.1007/s41870-022-00964-3

Hunting the pertinency of hash and bloom filter combinations on GPU for fast pattern matching

Original Research
Open access
Published: 17 May 2022

Volume 14, pages 2667–2679, (2022)
Cite this article

Download PDF

You have full access to this open access article

International Journal of Information Technology Aims and scope Submit manuscript

Hunting the pertinency of hash and bloom filter combinations on GPU for fast pattern matching

Download PDF

Radhakrishna Bhat¹,
Reddy Kanala Thilak¹ &
Reddy Panyala Vaibhav¹

1863 Accesses
1 Citation
Explore all metrics

Abstract

There has been rapid growth in the field of graphical processing unit (GPU) programming due to the drastic increase in the computing hardware manufacturing. The technology used in these devices is now more affordable and accessible to the general public. With this growth, many serial programming applications that are now being transformed into more efficient parallel programming applications with significant improvement in the performance. The best example for this is parallel implementation of the probabilistic data structure Bloom filter in set membership queries. However, despite of it’s remarkable performance in speed and memory usage, there is a computational overhead in the calculation of hashes in Bloom filter. In this paper, the impact of the choice of hash functions on the qualitative properties of the Bloom filter has been experimentally recorded and the results show that there is a possibility of large performance gap among various hash functions. We have implemented the Bloom filter based pattern matching technique on GPU using compute unified device architecture (CUDA) and benchmark the performance of several cryptographic and non-cryptographic hash functions.

Efficient Parallel Knuth-Morris-Pratt Algorithm for Multi-GPUs with CUDA

GPU-accelerated string matching for database applications

Article 12 November 2015

Efficient Pattern Matching on CPU-GPU Heterogeneous Systems

Find the latest articles, discoveries, and news in related topics.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The Bloom filter [2] is a fast and memory-efficient probabilistic data structure that can be used to test set membership. The Bloom filter with m bits and k hash functions has a time complexity of ${\mathcal {O}}(k)$ for both insertion and query operations. This complexity is entirely independent of the number of elements that are already inserted in the set. The Bloom filter is significantly better in terms of space complexity when compared to standard storage data structures because it doesn’t need to store the actual element but it stores the hash of each element. The only significant computational overhead involved in the functioning of a Bloom filter is hashing. The selection of the appropriate hash functions and an optimal number of hash functions affects the probability of false positives. The selected hash functions must be independent, uniformly distributed, and fast. The simple non-cryptographic hash functions will satisfy these criteria and provide a significant performance improvement. In contrast, cryptographic hash functions provide additional stability at the cost of computation and therefore they are computationally expensive.

But, most of the research works involving the implementation of Bloom filters only focus on developing extensions on Bloom filter instead of focusing on the hash function used in it. Furthermore, the majority of these implementations have been developed on CPUs. This paper focuses on two different dimensions of standard Bloom filter to improve it’s overall performance—choosing the correct hash functions and selecting appropriate hardware. We have clearly shown the non-intuitiveness of best performance of Bloom filter without considering the underlying hash functions. The insertion of an element in Bloom filter requires k hashes to be computed, which occurs sequentially on CPUs. The calculation of these hash functions can be done in parallel without impacting the Bloom filter’s qualitative properties.

We have used a modified version of the One-Hashing algorithm [19] to generate k different hashes using only one hash function to make it possible to compare various hashes independently. We have mainly focused implementing standard Bloom filter by parallelizing the hash function computation and compared various cryptographic and non-cryptographic hash functions, and documented the impact on different parameters of the Bloom filter. In this work, our goal is to create a standard hash function reference that can be used to choose the appropriate hash and the hardware environment in which the Bloom filter is deployed.

2 Related work

This section describes the existing schemes on Bloom filter applications in variety of domain such as bioinformatics, pattern matching, database storage, networking and security etc.

2.1 Bioinformatics

Liu et al. [18] introduced the first use of hybrid mode of CPU and GPU computations in DNS sequencing error detection using high throughput short read (HTSR) dataset. Ma et al. [21] have proposed a parallel bloom filter pattern search method for genome biosequence alignment applications using a standard bit vector bloom filter. Heo et al. [34] introduced the use of hybrid filters at different phases of the DNA sequencing. Refer Tables 1 and 2 for summary of all the above mentioned schemes.

Table 1 The performance comparison of existing schemes in bioinformatics-1

Full size table

Table 2 The performance comparison of existing schemes in bioinformatics-2

Full size table

Table 3 The performance comparison of existing schemes in Patter Matching-1

Full size table

Table 4 The performance comparison of existing schemes in Patter Matching-2

Full size table

2.2 Pattern matching

Moraru and Andersen [22] improved Rabin–Karp pattern search method by introducing two types of Bloom filters—(i) cache-partitioned (ii) feed-forward. Ong et al. [24] compare the computational improvement of CUDA implementation versus a multi-core implementation of a string searching algorithm using Bloom filters. Lin et al. [17] introduces a new GPU-assisted string matching method called parallel failureless-Aho-Corasick Algorithm (PFAC) and its variants which exhibit better execution performance over Aho-Corasick algorithm. Zhang et al. [37] has extended the basic operation of Bloom filter and exploited the parallel capabilities of graphical processing unit to get improved performance over existing methods. Hayashikawa et al. [12] have proposed slightly modified form of classic Bloom filter called folded Bloom filter to overcome the high false positive rate of blocked Bloom filter [27] for pattern matching applications.

Sachendra and Shalini [3] showcase the superior performance of a similarity search using standard bloom filter on GPU. Even though the proposed technique claims better running time performance over standard Dice coefficient similarity search [6, 29] method, there are three fundamental problems with this technique (i) entire input will be pre-stored on GPU global memory, (ii) the input text document must be converted to shingles for feature extraction, (iii) the input query must be represented as an integer array. Wada et al. [31] have constructed a circuit-level pattern matching Bloom filter using Block RAM and Ulter RAM combinations in restricted memory devices such as in field programmable gate array (FPGA). In this work, multiple rolling hash functions [15] are used to restrict the false-probability rate and to obtain speedy pattern matching results. Refer Tables 3 and 4 for summary of all the above mentioned schemes.

2.3 Database storage

Gubner et al. [10] propose a heterogenous query processing framework based on fluid-coprocessing where tasks of different sizes that can fit into memory are dynamically processed together. Tim et al. [11] have demonstrated the use of Bloom filter in accelerating the join operation in the database to achieve better early pruning in join queries.

2.4 Networking and security

Gholami et al. [9] utilize three standard Bloom filters along with GPU programming in order to increase the network packet classification speed and decrease memory consumption. The open-source modular router called Click [23] is utilized to perform packet classification. Dyumin et al. [7] focus on the implementation of a modified counter Bloom filter and showcases the results of statistical analysis with different varying parameters like number of hash functions, length of the counter, filter size, number of input elements. Hung et al. [13] propose a solution for accelerating traditional network intrusion detection systems by using Bloom filters through the implementation of a GPU-based multiple pattern matching algorithm for packet filtering. Xiong et al. [32] showcase the implementation of a probabilistic Bloom filter on GPU to analyze the frequency of traffic flow in a network.

2.5 Miscellaneous

Dharmapurikar et al. [5] have identified the inability of software-based network monitoring systems to cope up with the line speed of the network in detecting special type of patterns (or attacks) in network packets. To overcome this, authors have proposed a field programmable gate array (FPGA)-based network monitoring system using counting bloom filter using universal hash functions [28]. Costa et al. [4] primarily focused to offload computation load of a classic Bloom filter on a separate co-processor like GPU. Zhang et al. [36] focus on minimizing the time required to process the intersection of sorted inverted lists using GPU programming. The algorithm assigns every unique document in a list to a unique GPU thread that uses a parallel binary search algorithm that takes $\mathcal {O}$(log(n)) memory accesses. Iacob et al. [14] propose a solution for information retrieval which is a logical extension for pattern matching using GPUs. Yao et al. [35] proposed the first probabilistic Bloom filter (PBF) which can probabilistically flip the filter bits with some probability. Sisi et al. [33] have proposed first GPU-assisted probabilistic counting Bloom filter (PBF) over classic Bloom filter using the method of probabilistic flipping for monitoring network traffic for suspicious activities such as denial-of-service (Dos) attack. Xiong et al. [33] To augment with dynamic traffic authors have proposed the modified version of PBF over [35] with careful analysis for optimal parameter selection using game theory techniques. Tripathy et al. [30] highlights on the improved performance in larger similarity search concept trees using a GPU co-processor.

3 Background

This section describes the prerequisites and background knowledge to understand the proposed scheme.

3.1 Bloom filter

The standard Bloom filter is a probabilistic data structure that allows for the testing of set membership. The false positives are possible but not false negatives. It is also an extremely fast and space-efficient data structure. It is implemented as a bit array of m bits, all of which are set to 0 when it is empty. When an element needs to be added to the Bloom filter, it passes through k different hash functions to produce k unique addresses by performing the module operation with m to produce a random address corresponding to one of the m array positions. To query for an element, it is passed through the same k hash functions to obtain k array positions. If any of the bits in one of these positions is set to 0, it can be concluded that the element is definitely not in the set. But, if all the bits are set to 1, there is a chance that the element is actually in the set. If the element is not in the set, yet all the bits are set to 1, it is a false positive. The Bloom filter basic insertion process with $k=3$ is shown in Fig. 1. The Eq. (1) can be used to calculate the optimal values for the different parameters of the bloom filter:

$$\begin{aligned} n= & {} \Bigg \lceil \frac{m}{\left( \frac{-k}{\text {log}(1-e^{\frac{\text {log}(p)}{k}})}\right) }\Bigg \rceil \nonumber \\ p= & {} {\Bigg (1-e^{(\frac{-kn}{m})}}\Bigg )^k\nonumber \\ m= & {} \Bigg \lceil \frac{n\cdot \text {log}(p)}{\text {log} \Bigg (\frac{1}{2^{\text {log}(2)}}\Bigg )}\Bigg \rceil \nonumber \\ k= & {} \text {ROUND}\left( \frac{m}{n}\cdot \text {log}(2)\right) \end{aligned}$$

(1)

where m is the number of bits in the Bloom filter, n is the approximate number of items in the Bloom filter, p is the false positive probability, k is the number of hash functions and ROUND() returns the nearest integer multiple of 10.

Several variations of Bloom filters have been introduced to improve their functionality. Patgiri et al. [26] and Luo et al. [20] showcase a good survey of the various types of Bloom filters and the domains in which they are deployed. The counting Bloom filters introduced by Fan et al. [8] provide a way to allow deletion of elements without resetting the filter but they use 3–4 times more space. The scalable Bloom filers proposed in Almeida et al. [1] have the capability to adapt to the amount of data stored dynamically by implementing a sequence of standard bloom filters with increasing capacity. The spatial Bloom filters proposed by Palmieri et al. [25] are able to store multiple sets in one data structure and allow for the prioritization among these sets which lets them preserve the important elements.

3.2 Evolution of GPU computing and CUDA framework

The CPU contains a few cores which are optimized with a large amount of cache memory and can handle several software threads by processing them in a sequential or serial manner. In contrast, a GPU consists of massively parallel architecture which contains thousands of small, efficient cores designed to handle multiple tasks simultaneously. Historically, GPUs have been used as special-purpose processing units for very particular use-cases like graphics processing which involves computing information of millions of pixels. Recently, GPU computing is being expanded into the domain of general-purpose computing making it easier for developers to make use of the additional processing power in their applications.

3.3 Cloud-based GPU framework

Cloud providers like Microsoft Azure, Amazon Web Services, Google Colab, and IBM SoftLayer have collaborated with Nvidia to provide on-demand GPU access over the internet. Several challenges do exist when providing access to high performance computing over the cloud. Some of the biggest challenges are latency, storage speed, and cloud virtualization. Different high performance applications required varying amounts of computer power, I/O bottlenecked applications can utilize a high latency connection but more demanding applications require low-latency high throughput connections. Overall, the growing cloud infrastructure today makes it possible to provide on-demand GPU access which has the potential to enable many technological innovations without requiring heavy capital investment. Th parallel computing is now more affordable and accessible than ever before and developers have the capability to significantly enhance their software performance by exploiting this infrastructure.

3.4 CUDA framework

This framework allows developers to write programs for the Nvidia family of GPUs with ease. It provides abstractions over threads groups, shared memories, and barrier synchronization by exposing them to the programmer in a simple manner. The GPU consists of many streaming multiprocessors (SMs) and a multithreaded program is partitioned into a block of threads that execute independently and are allocated to these SMs.

The global memory is the largest memory space on the GPU and also has the highest latency. The data that needs to remain persistent throughout the program execution is stored here. The shared memory is fast, small (a few kilobytes), and is allocated to each multiprocessor intended to be used as an application cache. The other types of memory are: texture, constant, and register memories. The threads running on GPU cannot access the memory of the host hence all data required for computation must to transferred from the host main memory to the GPU’s global memory. Then, a kernel function is called which launches a large number of parallel threads to perform the computation whose results are stored in the global memory. Finally, these results are copied back to the host memory for post-processing.

To maximize performance, the following strategies must be used: (i) maximize parallel execution to achieve maximum utilization, (ii) optimize memory usage to achieve maximum memory throughput, (iii) optimize instruction usage to achieve maximum instruction throughput, (iv) minimize memory thrashing

4 Proposed scheme

This section describes the proposed system model and it’s implementation on the graphical processing unit.

Our implementation consists of four phases as shown in Fig. 2:

Phase-1: Random data generation.
Phase-2: Preprocessing.
Phase-3: Bloom filter insert operation and performance benchmarking.
Phase-4: Bloom filter query operation and performance benchmarking.

Phase-1 and Phase-2 are common for the CPU and GPU implementations. In the first phase, a list of randomized strings between 16 and 128 bytes are generated in which each string serves as input data for both the insertion and query operations on Bloom filter.

In the second phase, the generated random data is concatenated to form a long string of strings separated by a single space character to facilitate the direct transfer of this concatenated string to the GPU global memory and stored in a contiguous array without the need to process each input string independently. The pseudocodes for Phase-1 and Phase-2 are described in Algorithm 1 and Algorithm 2. We have implemented the first and second phases of our proposed scheme using the Python scripting language. This allows us to measure and document the performance of just the Bloom filter operations without the need to include the data generation and preprocessing phases.

Phase-3 and Phase-4 are implemented using CUDA programming framework as shown in Fig. 3 on Nvidia GPU in order to optimize the insertion and query operations of the Bloom filter data structure. Specifically, the large number of streaming multiprocessors available on the GPU are utilized to reduce the time taken to complete the hashing phase of the Bloom filter operations. For the CPU version, Phase-3 is implemented using the C language without the use of any multi-threading paradigms in order to serve as a serial implementation benchmark to be compared with the parallel GPU version. The pseudocode for Phase-3 and Phase-4 is described in Algorithm 3. The pseudocodes for pattern insert and parallel pattern search are described in Algorithm 4 and Algorithm 5.

4.1 Implementation of modified one-hashing algorithm

We have implemented a slightly modified version of the one-hashing algorithm wherein rather than using consecutive prime numbers as modulo operands, we use the current iteration of the hash function as modulo operands. Furthermore, we utilize the technique proposed in Kirsch et al. [16], but rather than using two independent hash functions, we split a given hash function into two parts. Essentially, our hash computation is a hybrid approach of both one-hashing bloom filter (OHBF) [19] and less-hashing bloom filter (LHBF) Kirsch et al. [16]. The difference between the traditional hashing implementation and the proposed hashing implementation is given in Fig. 4a, b.

4.2 Performance trade-off in hash functions

The cryptographic hash functions display a good quality of randomness and uniform distribution but are computationally expensive. The popular choices for cryptographic hash functions are SHA1, MD5 and BLAKE3. The cost of these functions is usually heavily dependant on the size of the input data, which essentially rules them out for applications where long strings need to be hashed to generate unique addresses for the Bloom filter. The non-cryptographic hash functions are computationally inexpensive and often used as part of Bloom filter implementation. These non-cryptographic hash functions display relatively low amount of randomness which often leads to an increase in the false-positive probability rate. Nevertheless, they still remain a popular choice, especially the hash functions XXHASH32, DJB2, Jenkins, and APHash etc. We have implemented and recoded the average performance of various cryptographic and non-cryptographic hash functions as shown in Figs. 5 and 6 to find the suitability for the proposed scheme.

5 Results and discussions

The proposed parallel implementation and the traditional serial implementation of the Bloom filter are fed with up to 1 million randomly generated strings (or words) where each string ranges from 16 to 128 bytes, hence the maximum amount of input data fed into the system ranges from 16 to 120 megabytes. The x-axis and y-axis in Figs. 7, 8 represent the number of strings and the execution time on CPU and GPU respectively.

The CPU version is faster than the GPU version when the input contains less than 15–25 thousand strings as shown in Figs. 7 and 8. This is justified due to the memory transfer latency in the GPU version wherein the input data and the index array have to be copied from the host memory to the global memory. But, when the amount of data is larger, the parallel implementation outperforms the serial implementation as shown in Figs. 7 and 8. Further optimizations are possible which would make the parallel implementation even more efficient. We have also implemented the same in OpenMP and the results are benchmarked as shown in Fig 9. The same set of input strings is fed into GPU implementation with variety of hash functions and their performance is benchmarked in Figs. 10, 11, 12, 13 and 14. As expected, non-cryptographic hash functions outperform cryptographic hash functions in terms of execution speed. This benchmark result along with the domain of the application in which a Bloom filter is deployed are two major factors that must be deliberated before choosing the appropriate hash function. In both CPU and GPU benchmarking scenarios, the number of input strings is incremented by 10,000 for every iteration up to 1 million strings. With careful result analysis, it is intuitive that the query time along with insertion time provides a good metric for real-world simulation because the Bloom filter data structure stays in the global memory as it would in a model deployed in a live project.

6 Conclusion and future work

We have implemented standard Bloom filter on GPU using the CUDA framework. We have also utilized the slightly modified version of one-hashing algorithm to compute multiple hash functions by using single hash value. Even though this one-hashing algorithm allowed us to benchmark various hash functions without combining several varieties of hash functions we have explicitly compared several cryptographic and non-cryptographic hash functions on CPU and GPU. From the implementation results, it is intuitive that the GPU implementation of the Bloom filter is more efficient than the CPU version when the data transfer overhead is large. Furthermore, it is also observed that non-cryptographic hash functions are significantly faster than cryptographic hash functions on the GPU implementation. Further, the domain-specific GPU optimizations such as usage of shared memory, memory coalescing, etc. can be adopted in the proposed scheme to take the performance to the next level.

References

Almeida PS, Baquero C, Preguiça N, Hutchison D (2007) Scalable bloom filters. Inf Process Lett 101(6):255–261
Article MathSciNet Google Scholar
Bloom Burton H (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426
Article Google Scholar
Chauhan SS, Batra S (2018) A parallel computational approach for similarity search using bloom filters. Comput Intell 34(2):713–733
Article MathSciNet Google Scholar
Costa LB, Al-Kiswany S, Ripeanu M (2009) Gpu support for batch oriented workloads. In: 2009 IEEE 28th international performance computing and communications conference, pp 231–238
Dharmapurikar S, Krishnamurthy P, Sproull TS, Lockwood JW (2004) Deep packet inspection using parallel bloom filters. IEEE Micro 24(1):52–61
Article Google Scholar
Dice LR (1945) Measures of the amount of ecologic association between species. Ecology 26(3):297–302
Article Google Scholar
Dyumin AA, Kuznetsov AA, Rovnyagin MM (2015) Evaluation of statistical properties of a modified bloom filter for heterogeneous gpgpu-systems. In: 2015 IEEE NW Russia young researchers in electrical and electronic engineering conference (EIConRusNW), pp 71–74
Fan L, Cao P, Almeida J, Broder AZ (2000) Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans Netw 8(3):281–293
Article Google Scholar
Gholami M, Sookhtezari Y, Haghighi M, Bahram-Beigy B, Ahmadi M (2015) A high-performance click-based packet classifier on gpu. In: 2015 5th international conference on computer and knowledge engineering (ICCKE), pp 42–47
Gubner T, Tomé D, Lang H, Boncz P (2019) Fluid co-processing: Gpu bloom-filters for cpu joins. In: DaMoN’19. Association for Computing Machinery, New York
Gubner T, Tomé D, Lang H, Boncz P (2019) Fluid co-processing: Gpu bloom-filters for cpu joins. In: Proceedings of the 15th international workshop on data management on new hardware, DaMoN’19. Association for Computing Machinery, New York
Hayashikawa M, Nakano K, Ito Y, Yasudo R (2019) Folded bloom filter for high bandwidth memory, with gpu implementations. In: 2019 7th international symposium on computing and networking (CANDAR). IEEE Computer Society, Los Alamitos, pp 18–27
Hung C-L, Lin C-Y, Po-Chang W (2017) An efficient gpu-based multiple pattern matching algorithm for packet filtering. J Signal Process Syst 86(2–3):347–358
Article Google Scholar
Iacob A, Itu L, Sasu L, Moldoveanu F, Suciu C (2015) Gpu accelerated information retrieval using bloom filters. In: 2015 19th international conference on system theory, control and computing (ICSTCC), pp 872–876
Karp RM, Rabin MO (1987) Efficient randomized pattern-matching algorithms. IBM J Res Dev 31(2):249–260
Article MathSciNet Google Scholar
Kirsch A, Mitzenmacher M (2008) Less hashing, same performance: building a better bloom filter. Random Struct Algorithms 33(2):187–218
Article MathSciNet Google Scholar
Lin C-H, Liu C-H, Chien L-S, Chang S-C (2013) Accelerating pattern matching using a novel parallel algorithm on gpus. IEEE Trans Comput 62(10):1906–1916
Article MathSciNet Google Scholar
Liu Y, Schmidt B, Maskell DL (2011) Decgpu: distributed error correction on massively parallel graphics processing units using cuda and mpi. In: BMC bioinformatics, vol 12
Lu J, Yang T, Wang Y, Dai H, Jin L, Song H, Liu B (2015) One-hashing bloom filter. In: 2015 IEEE 23rd international symposium on quality of service (IWQoS), pp 289–298
Luo L, Guo D, Ma RTB, Rottenstreich O, Luo X (2018) Optimizing bloom filter: challenges, solutions, and comparisons. arXiv:1804.04777 [CoRR abs]
Ma L, Chamberlain RD, Buhler JD, Franklin MA (2011) Bloom filter performance on graphics engines. In: 2011 international conference on parallel processing, pp 522–531
Moraru I, Andersen DG (2012) Exact pattern matching with feed-forward bloom filters. ACM J Exp Algorithmics 17:3–1
Article MathSciNet Google Scholar
Morris R, Kohler E, Jannotti J, Frans Kaashoek M (1999) The click modular router. SIGOPS Oper Syst Rev 33(5):217–231
Article Google Scholar
Ong WM, Baskaran VM, Chong PK, Ettikan KK, Yong KK (2013) A parallel bloom filter string searching algorithm on a many-core processor. In: 2013 IEEE conference on open systems (ICOS), pp 1–6
Palmieri P, Calderoni L, Maio D (2015) Spatial bloom filters: enabling privacy in location-aware applications. In: Proceedings—10th international conference on information security and cryptology. Springer, pp 16–36
Patgiri R, Nayak S, Borgohain SK (2019) Hunting the pertinency of bloom filter in computer networking and beyond: a survey. J Comput Netw Commun
Putze P, Sanders P, Singler J (2010) Cache-, hash-, and space-efficient bloom filters. ACM J Exp Algorithmics 14
Ramakrishna MV, Fu E, Bahcekapili E (1997) Efficient hardware hashing functions for high performance computers. IEEE Trans Comput 46(12):1378–1381
Article Google Scholar
Sørensen T (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on danish commons. Kongelige Danske Videnskabernes Selskab 5(4):1–34
Google Scholar
Tripathy A, Mohan S, Mahapatra R (2011) Optimizing a semantic comparator using cuda-enabled graphics hardware. In: 2011 IEEE fifth international conference on semantic computing, pp 125–132
Wada T, Matsumura N, Nakano K, Ito Y (2018) Efficient byte stream pattern test using bloom filter with rolling hash functions on the fpga. In: 2018 sixth international symposium on computing and networking (CANDAR), pp 66–75
Xiong S, Yao Y, Berry M, Qi H, Cao Q (2017) Frequent traffic flow identification through probabilistic bloom filter and its gpu-based acceleration. J Netw Comput Appl 87(C):60–72
Article Google Scholar
Xiong S, Yao Y, Berry M, Qi H, Cao Q (2017) Frequent traffic flow identification through probabilistic bloom filter and its gpu-based acceleration. J Netw Comput Appl 87(C):60–72
Article Google Scholar
Yun H, Wu XL, Chen D, Ma J, Hwu WM (2014) Bless: bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics 30(10):1354–1362
Article Google Scholar
Yao Y, Xiong S, Liao J, Berry M, Qi H, Cao Q (2015) Identifying frequent flows in large datasets through probabilistic bloom filters. In: 2015 IEEE 23rd international symposium on quality of service (IWQoS), pp 279–288
Zhang F, Wu D, Ao N, Wang G, Liu X, Liu J (2011) Fast lists intersection with bloom filter using graphics processing units. In: SAC ’11. Association for Computing Machinery, New York, pp 825–826
Zhang H, Xu D, Tian Z, Fan Y (2015) An efficient parallel algorithm for exact multi-pattern matching. Secure Commun Netw 8(9):1688–1697
Article Google Scholar

Download references

Funding

Open access funding provided by Manipal Academy of Higher Education, Manipal.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
Radhakrishna Bhat, Reddy Kanala Thilak & Reddy Panyala Vaibhav

Authors

Radhakrishna Bhat
View author publications
You can also search for this author in PubMed Google Scholar
Reddy Kanala Thilak
View author publications
You can also search for this author in PubMed Google Scholar
Reddy Panyala Vaibhav
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

RB contributes to conceptualization, methodology, modeling, implementation, interpretation and validation of results, manuscript writing and review. RKT contributes to implementation, interpretation and validation of results, manuscript review. RPV contributes to implementation, interpretation and validation of results, manuscript review.

Corresponding author

Correspondence to Radhakrishna Bhat.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Bhat, R., Thilak, R.K. & Vaibhav, R.P. Hunting the pertinency of hash and bloom filter combinations on GPU for fast pattern matching. Int. j. inf. tecnol. 14, 2667–2679 (2022). https://doi.org/10.1007/s41870-022-00964-3

Download citation

Received: 28 December 2021
Accepted: 19 April 2022
Published: 17 May 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s41870-022-00964-3

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Hunting the pertinency of hash and bloom filter combinations on GPU for fast pattern matching

Abstract

Similar content being viewed by others

Efficient Parallel Knuth-Morris-Pratt Algorithm for Multi-GPUs with CUDA

GPU-accelerated string matching for database applications

Efficient Pattern Matching on CPU-GPU Heterogeneous Systems

1 Introduction