Autoscaling Bloom filter: controlling tradeoff between true and false positives
Abstract
A Bloom filter is a special case of an artificial neural network with two layers. Traditionally, it is seen as a simple data structure supporting membership queries on a set. The standard Bloom filter does not support the delete operation, and therefore, many applications use a counting Bloom filter to enable deletion. This paper proposes a generalization of the counting Bloom filter approach, called “autoscaling Bloom filters”, which allows adjustment of its capacity with probabilistic bounds on false positives and true positives. Thus, by relaxing the requirement on perfect true positive rate, the proposed autoscaling Bloom filter addresses the major difficulty of Bloom filters with respect to their scalability. In essence, the autoscaling Bloom filter is a binarized counting Bloom filter with an adjustable binarization threshold. We present the mathematical analysis of its performance and provide a procedure for minimizing its false positive rate.
Keywords
Bloom filter, Counting Bloom filter, Autoscaling Bloom filter, True positive rate, False positive rate
1 Introduction
Many applications require fast and memory-efficient querying of an item’s membership in a set. A Bloom filter (BF) is a simple binary data structure that supports approximate set membership queries.
From a neural processing point of view, BFs are a special case of an artificial neural network with two layers (input and output), where each position in a filter is implemented as a binary neuron (see more details in [1]). Such a network does not have interneuronal connections. That is, output neurons (positions of the filter) have only individual connections with themselves and the corresponding input neurons. BFs are also related to a neural network architecture called distributed connectionist production system [2].
The standard BF (SBF) allows adding new elements to the filter and is characterized by a perfect true positive rate (i.e., 1), but nonzero false positive rate. The false positive rate depends on the number of elements to be stored in the filter, and the filter’s parameters, including the number of hash functions and the size of the filter. However, SBF lacks the functionality of deleting an element. Therefore, a counting Bloom filter (CBF) [3], providing the delete operation, is commonly used. When the size of CBF and the number of elements to be stored are known, the number of hash functions can be optimized to minimize the false positive rate.
Another practical issue is that the parameters of a BF (i.e., size of filter and number of hash functions) cannot be altered once it is constructed. If the current filter does not satisfy the performance requirements (e.g., false positive rate), it is necessary to rebuild the entire filter, which is computationally expensive. Therefore, the optimization of a BF is problematic and costly when the number of elements to be stored is unknown or varies dynamically. In fact, this is one of the major scalability difficulties of BFs. This paper presents a solution to this problem: the autoscaling Bloom filter (ABF).
The ABF belongs to a class of binary BFs and is constructed by binarization of a CBF with the binarization threshold (\(\varTheta\)) as a parameter. Querying the ABF also uses a decision threshold (T) to determine whether there is sufficient evidence to respond that the query item is an element of the stored set. Both parameters, \(\varTheta\) and T, can be varied while the ABF is in use, without requiring the filter data structure to be rebuilt. Figure 1 illustrates the main idea behind the ABF. Figure 1a shows an example CBF of size 20, which stores four elements (\(x_1\) to \(x_4\)). Each element is mapped to three different positions of the filter, one position for each of the three hash functions. The value at each position is the number of elements mapped to that position by the three hash functions and varies between 0 and 4 (highlighted by different colors). The SBF (Fig. 1b) is formed by setting all nonzero positions of the CBF to one^{1}. The two lower parts of the figure, denoted as (c) and (d), present two examples of the ABF for two different sets of parameters: \(\varTheta =1\) and \(T=2\); \(\varTheta =3\) and \(T=1\), respectively. In all four examples, the filter is queried with the unstored element y, testing for membership of the set of stored elements. The correct answer in every case is that y is not a member of the stored set. In the SBF example, all nonzero positions of y are set to one in the filter, which is interpreted by the SBF algorithm as indicating that the query element is a member of the stored set, thus generating a false positive response. In contrast, in Fig. 1c, y has only one position in common with the ABF, while all elements \(x_i\) have at least two positions. Thus, if the decision threshold T (for the number of activated positions) is set to two, then y will be correctly rejected by the ABF while all the stored elements are correctly reported as present. On the other hand, for the ABF in Fig. 1d, the binarization threshold (\(\varTheta =3\)) is too high: it suppresses so many positions that no decision threshold T (not even the smallest possible \(T=1\)) allows all stored elements \(x_i\) to be reported as present.
Mathematically, the ABF has its roots in the theory of sparse distributed data representations [6]. The ABF can also be interpreted in terms of hyperdimensional computing [7], where everything is represented as high-dimensional vectors and computation is implemented by arithmetic operations on the vectors. Both sparse distributed representations and hyperdimensional computing can be conceptualized as weightless artificial neural networks.
This paper presents a theoretical generalization of CBFs by exploring a direct correspondence between BFs and hyperdimensional representations along with the practical implications. BFs are treated as a special case application of distributed representations where each element stored in the BF is represented as a hyperdimensional binary vector constructed by the hash functions. The mathematics of sparse hyperdimensional computing [6] (SHC) is used for describing the behavior of the proposed ABF. The construction of the filter itself corresponds to the bundling operation [6] of binary vectors.

The contributions of this paper are as follows:

It proposes the ABF, which is a generalization of the CBF with probabilistic bounds on false positives and true positives;

It presents the mathematical analysis and experimental evaluation of the ABF properties;

It gives a procedure for automatic minimization of the false positive rate that adapts to the number of elements stored in the filter;

For the first time, it shows that BFs are a special case of hyperdimensional computing.
2 Related work
A recent probabilistic analysis of the SBF is presented in [8]. Detailed surveys on BFs and their applications are provided in [9] and [10]. BFs are often applied in the area of pattern recognition [11, 12]. For example, recent applications of BFs and their modifications include certificate revocation for smart grids [13], classification of text strings [14], and detection of Transmission Control Protocol (TCP) network worms [15]. An important aspect for the applicability of BFs in modern networking applications is the processing speed of a filter. In order to improve the speed of the membership check, the authors in [16] proposed a novel filter type called ultrafast BFs. In [17], it was shown that BFs can be accelerated (in terms of processing speed) by using particular types of hashing functions.
This section overviews the approaches most relevant to the presented ABF approach. One direction of research is to propose new types of data structures supporting approximate membership queries. For example, recently proposed invertible Bloom lookup tables [18], quotient filters [19], counting quotient filters [20], TinySet [21], and cuckoo filters [22] support dynamic deletion. Another popular research topic is to improve the performance of the SBF via modifications of the original approach. The ternary BF [23] improves the performance of the CBF as it only allows three possible values of each position. The deletable BF [24] uses additional positions in the filter, which are used to support the deletion of elements from the filter without introducing false negatives. The complement Bloom filter [25] uses an additional BF in order to identify the trueness of BF positives. The yes–no BF [26] reduces false positives by including in the filter additional information about those elements that generate false positives. The fingerprint counting BF [27] is a modification improving the CBF with the usage of fingerprints on the filter elements. In [13], the authors propose to use two BFs and an external mechanism in order to resolve cases when the membership is confirmed by both filters. In a similar fashion, the cross-checking BF [28] constructs several additional BFs, which are used to cross-check the main BF if it issues a positive result. The scalable Bloom filter [29] can maintain the desired false positive rate even when the number of stored elements is unknown. However, it has to maintain a series of BFs in order to do so. Another related approach, called the variable-increment CBF (VICBF), was presented in [30]. Similar to the CBF, the VICBF supports the delete operation; however, it requires less memory for achieving the same false positive rate. The improvement is due to the usage of a hashed variable increment rather than a unit counter increment as in the CBF.
In comparison with the VICBF, the ABF could fully operate with the binary components; however, it would lose the ability to delete elements. Nevertheless, once the VICBF is designed, it does not have built-in functionality to tolerate large variations in the number of stored elements. While the VICBF is a generalization of the CBF, it would not be trivial to apply the ABF to a given VICBF, as different variable increment values are used to obtain the final values in each position of a filter. The retouched BF (RBF) [4] is conceptually the most relevant approach to the ABF since it allows some false negatives as a tradeoff for decreasing the false positive rate. The major difference from the proposed approach is that the RBF eliminates false positives that are known in advance. When the potential false positives are not known in advance, the RBF could randomly erase several nonzero positions of the filter.
In contrast to the previous work, the ABF is suitable for reducing the false positive rate even when the whole universe of elements is either unknown or is too large to use additional mechanisms for encoding the elements not included in the filter.
3 Autoscaling Bloom filter
3.1 Preliminaries: BFs
At the initialization phase, a BF can be seen as a vector of length m where all positions are set to zero. The value of m determines the size of the filter. In order to store in the filter an element q from the universe of elements, the element should be mapped into the filter’s space. This process is usually seen as the application of k different hash functions to the element. The result of each hash function is an integer between 1 and m. This value indicates the index of the position of the filter which should be updated. In the case of the SBF, an update corresponds to setting the value of the corresponding position of the SBF to 1. If the position already has value 1, it stays unchanged. In the case of the CBF, an update corresponds to incrementing the value of the corresponding position of the CBF by 1. Thus, when storing a new element in the filter, at most k positions of the filter update their values. Note that there is a possibility that two or more hash functions return the same result. In this case, there would be fewer than k updated positions. However, it is usually recommended to choose hash functions such that they have a negligible probability of returning the same index value. Therefore, without loss of generality, suppose that the k results of k hash functions applied to q never coincide. That is, all k indices pointing to positions in the filter are unique.
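The update rules described above can be illustrated with a minimal sketch. The salted SHA-256 hashing and all function names (`positions`, `cbf_add`, `cbf_delete`, `sbf_add`, `sbf_query`) are assumptions made for illustration and are not taken from the paper; indices are 0-based here rather than running from 1 to m.

```python
import hashlib


def positions(item: str, k: int, m: int) -> list[int]:
    """Return k distinct 0-based filter indices for item.

    Hypothetical hashing scheme: salted SHA-256, re-salted on collisions so
    that the k indices never coincide (the assumption made in the text).
    Requires k <= m.
    """
    found: list[int] = []
    salt = 0
    while len(found) < k:
        digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % m
        if index not in found:
            found.append(index)
        salt += 1
    return found


def cbf_add(cbf: list[int], item: str, k: int) -> None:
    """CBF update: increment each of the k positions by 1."""
    for i in positions(item, k, len(cbf)):
        cbf[i] += 1


def cbf_delete(cbf: list[int], item: str, k: int) -> None:
    """CBF delete: decrement each of the k positions by 1."""
    for i in positions(item, k, len(cbf)):
        cbf[i] -= 1


def sbf_add(sbf: list[int], item: str, k: int) -> None:
    """SBF update: set each of the k positions to 1."""
    for i in positions(item, k, len(sbf)):
        sbf[i] = 1


def sbf_query(sbf: list[int], item: str, k: int) -> bool:
    """SBF query: report membership only if all k positions are 1."""
    return all(sbf[i] == 1 for i in positions(item, k, len(sbf)))
```

Deleting only elements that were previously added keeps all counters nonnegative; the sketch does not guard against deleting an element that was never stored.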
3.2 Preliminaries: probability theory
Two probability distributions are useful for the analysis presented here: the binomial and the hypergeometric distributions. Both are discrete. They describe the probability of s successes (draws for which the drawn entities are defined as successful) in g random draws from a finite population of size G that contains exactly S successful entities. The difference between them is that the binomial distribution describes the probability of s successes in g draws with replacement, while the hypergeometric distribution describes the probability of s successes in g draws without replacement. Binomial and hypergeometric distributions are the most natural choice for modeling BFs since they correspond to the discrete nature of values in BFs. It is worth mentioning that when the number of random draws g is large, both distributions could be approximated by normal or Poisson distributions depending on relations between g, s, and G. These approximations are not used in this paper, which avoids the errors they would introduce.
The difference is that for the binomial distribution positions are independent while for the hypergeometric distribution they are not. For example, if the actual values of some positions are known for the realization of a hypergeometric experiment, then the probability of a success for the rest of the positions should be updated accordingly. This is because draws from the population are done without replacement.
As the probability mass function for the hypergeometric distribution is not used below, it is omitted here.
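For reference, the binomial probability mass function can be evaluated exactly with integer binomial coefficients, which is in line with the choice not to use approximations. The function name `binomial_pmf` and the argument `p` (the success probability of a single draw, S/G for draws with replacement) are illustrative assumptions.

```python
from math import comb


def binomial_pmf(s: int, g: int, p: float) -> float:
    """Probability of exactly s successes in g independent draws,
    each succeeding with probability p."""
    return comb(g, s) * p**s * (1 - p) ** (g - s)


# Example: population of G = 10 entities, S = 3 of them successful,
# g = 5 draws with replacement.
G, S, g = 10, 3, 5
distribution = [binomial_pmf(s, g, S / G) for s in range(g + 1)]
```

The entries of `distribution` sum to one up to floating-point rounding.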
3.3 Preliminaries: relation between BFs and probability theory
The value in the ith position of \(\mathbf{CBF }\) [see (1)] can be seen as a discrete random variable (denoted as I) in the range \(\{I \in {\mathbb {Z}} \mid 0 \le I \le n\}\), where n denotes the number of elements stored in the filter. Because the representations \(\mathbf{x }_i\) stored in \(\mathbf{CBF }\) are independent realizations of the hypergeometric experiment, I follows the binomial distribution: \(I \sim \text {B}(g,p_s)\), where \(g=n\) and \(p_s=p_1\).
In fact, (9) differs from the standard expression (10) for \(p_0\). However, the two expressions produce noticeably different results only for small lengths of the filter (\(m<50\)), which are not of practical importance.
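The size of this discrepancy can be checked numerically. Equations (9) and (10) are not reproduced in this text, so the sketch below rests on two assumptions: that (9) is the binomial zero-count probability \((1-p_1)^n\) with \(p_1=k/m\), and that (10) is the textbook expression \((1-1/m)^{kn}\). The function names are hypothetical.

```python
def p0_binomial(m: int, n: int, k: int) -> float:
    """Assumed form of (9): Pr(I = 0) for I ~ B(n, k/m)."""
    return (1 - k / m) ** n


def p0_standard(m: int, n: int, k: int) -> float:
    """Assumed form of (10): textbook probability that a position stays zero."""
    return (1 - 1 / m) ** (k * n)


# Compare a very short filter with a longer one at the same load (n = m / 5).
for m in (20, 1000):
    n, k = m // 5, 3
    gap = abs(p0_binomial(m, n, k) - p0_standard(m, n, k))
    print(f"m={m}: absolute difference = {gap:.5f}")
```

Under these assumptions the gap shrinks as m grows for a fixed small k, which matches the qualitative statement above.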
3.4 Definition of autoscaling Bloom filter
Note that when \(\varTheta =0\), the ABF is equivalent to the SBF.
In general, the expected dot product (denoted \({\bar{d}}_x\)) between the ABF and an element x included in the filter is less than or equal to k.^{2} As the binarization threshold \(\varTheta\) increases, more of the nonzero positions in the CBF are mapped to zero values in the corresponding ABF. This necessarily reduces the dot product of the ABF vector with the query vector. Therefore, there is a need for the second parameter of the ABF, which determines the lowest value of dot product indicating the presence of an element in the filter. Denote this decision threshold parameter as T (\(0 \le T \le k\)), then an element of the universe q is judged to be a member of the ABF if and only if the dot product between \(\mathbf{ABF }\) and \(\mathbf{q }\) is greater than or equal to T.
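The construction and the query rule can be sketched as follows. A position of the ABF is taken to be one when the corresponding CBF counter exceeds \(\varTheta\), which agrees with the statement that \(\varTheta =0\) yields the SBF. The function names and the toy counter values are illustrative assumptions.

```python
def binarize(cbf: list[int], theta: int) -> list[int]:
    """Derive an ABF from a CBF: keep positions whose counter exceeds theta."""
    return [1 if count > theta else 0 for count in cbf]


def abf_query(abf: list[int], query_positions: list[int], T: int) -> bool:
    """Report membership if the dot product with the query vector is >= T.

    query_positions holds the k indices at which the query vector is one,
    so the dot product is the number of those positions set in the ABF.
    """
    dot_product = sum(abf[i] for i in query_positions)
    return dot_product >= T


# Toy CBF (hypothetical counter values) and two ABFs derived from it.
cbf = [0, 1, 2, 3, 1, 0]
sbf_equivalent = binarize(cbf, theta=0)  # [0, 1, 1, 1, 1, 0]
abf = binarize(cbf, theta=1)             # [0, 0, 1, 1, 0, 0]
```

Because the CBF itself is left untouched, both \(\varTheta\) and T can be changed at any time by calling `binarize` again with a new threshold.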
3.5 Probabilistic characterization of the autoscaling Bloom filter
Note that when \(\varTheta =0\), \({\bar{d}}_x(\mathbf{ABF }, \mathbf{x })=k\) which corresponds to the SBF [see (4)]. In other words, the SBF can be seen as a special case of the ABF. The calculations in (15) when \(\varTheta >0\) can be interpreted in the following way. The dot product between \(\mathbf{SBF }\) and \(\mathbf{x }\) is k. A position in \(\mathbf{CBF }\) with value \(v>0\) contributes 1 to the values of dot products of v stored elements. Thus, if this position is set to zero in the SBF, there will be v elements with the dot product equal to \(k-1\) while the dot products for the rest of the elements still equal k. Then, the expected dot product between the filter and an element is decremented by v / n. In fact, the number of positions with value v is unknown, but it is possible to calculate the probability \(\hbox {Pr}(I=v)\) of such a position in \(\mathbf{CBF }\) using (8). Then the expected number of such positions in \(\mathbf{CBF }\) is determined via (11) and equals \(m\hbox {Pr}(I=v)\). When the ABF suppresses all such positions, each of them decrements the expected dot product by v / n. Then, the total decrement of the expected dot product by the suppressed positions with value v is expected to be \(mv\hbox {Pr}(I=v)/n\). Because the ABF suppresses all positions with values less than or equal to \(\varTheta\), the decrements of the expected dot product introduced by each value v should be summed up.
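The argument above amounts to subtracting \(mv\hbox {Pr}(I=v)/n\) from k for every suppressed value \(1 \le v \le \varTheta\). The sketch below follows that description; it assumes \(I \sim \text {B}(n, p_1)\) with \(p_1 = k/m\), which is an assumption about the definition of \(p_1\), and the function names are hypothetical.

```python
from math import comb


def counter_pmf(v: int, n: int, p1: float) -> float:
    """Pr(I = v) for a CBF counter I ~ B(n, p1)."""
    return comb(n, v) * p1**v * (1 - p1) ** (n - v)


def expected_dot_product(m: int, n: int, k: int, theta: int) -> float:
    """Expected dot product between the ABF and a stored element.

    Starts from k (the SBF case) and subtracts m * v * Pr(I = v) / n
    for every suppressed counter value v = 1..theta.
    """
    p1 = k / m  # assumed probability that a given position of x_i is one
    decrement = sum(
        m * v * counter_pmf(v, n, p1) / n for v in range(1, theta + 1)
    )
    return k - decrement
```

Two sanity checks follow from the text: for \(\varTheta =0\) the result is exactly k, and for \(\varTheta =n\) every position is suppressed, so the result is zero up to rounding.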
Both dot products \(d_x\) and \(d_y\) are characterized by discrete random variables (denoted as X and Y, respectively) which in turn are described by binomial distributions: \(X \sim \text {B}(k,p_x)\) and \(Y \sim \text {B}(k,p_y)\).
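Given these two binomial models, the true and false positive rates for a decision threshold T are the upper tails \(\hbox {Pr}(X \ge T)\) and \(\hbox {Pr}(Y \ge T)\). The definitions of \(p_x\) and \(p_y\) are not reproduced in this text, so the sketch assumes \(p_x={\bar{d}}_x/k\) and \(p_y=\hbox {Pr}(I>\varTheta )\) with \(I \sim \text {B}(n, k/m)\); all names are hypothetical.

```python
from math import comb


def binom_pmf(s: int, g: int, p: float) -> float:
    """Pr(Z = s) for Z ~ B(g, p)."""
    return comb(g, s) * p**s * (1 - p) ** (g - s)


def binom_tail(t: int, g: int, p: float) -> float:
    """Pr(Z >= t) for Z ~ B(g, p)."""
    return sum(binom_pmf(s, g, p) for s in range(t, g + 1))


def abf_rates(m: int, n: int, k: int, theta: int, T: int) -> tuple[float, float]:
    """Return (TPR, FPR) of an ABF under the assumptions stated above."""
    p1 = k / m
    pmf = [binom_pmf(v, n, p1) for v in range(theta + 1)]
    p_y = 1.0 - sum(pmf)  # assumed: probability that an ABF position is one
    d_x = k - (m / n) * sum(v * pv for v, pv in enumerate(pmf))
    p_x = min(1.0, max(0.0, d_x / k))  # assumed: expected dot product over k
    return binom_tail(T, k, p_x), binom_tail(T, k, p_y)
```

With the parameters of the example in Section 4.2 (m = 10,000, n = 500, k = 100, \(\varTheta =0\), T = k), this sketch evaluates to a TPR of 1 and an FPR of about 0.52, in line with the SBF values reported there.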
3.6 Performance properties of ABF
4 Evaluation of ABF
4.1 Optimization of ABF’s parameters
In order to choose the best value of T (or even both \(\varTheta\) and T), an optimization criterion is needed. It is proposed to optimize the accuracy (\(\text {ACC}\)) of the filter. This is defined as the average value of true positive rate and true negative rate: \(\text {ACC}=(\text {TPR}+(1-\text {FPR}))/2\). Note that this definition of accuracy is also known as unweighted average recall. Note also that the accuracy does not have to be the only choice for the optimization criterion. The choice of \(\text {ACC}\) implies that false positives and false negatives are treated as equally costly. However, in a practical application this may not be true. Instead, each of the four possible outcomes (true positive, false positive, true negative, false negative) will have an associated domain-dependent cost. The designer would then optimize the design parameters so as to minimize the cost in the application scenario. For example, if the total number of elements and the number of elements stored in the filter are known, then such performance metrics as F1 score and Matthews correlation coefficient [34] can be used for optimization. In the absence of a specific application, we are forced to use a general performance summary. We have chosen to use accuracy as a general summary because it is simple and well understood.
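A straightforward way to apply this criterion is an exhaustive search over the k + 1 possible values of T. The sketch below is one possible realization and is not necessarily the procedure used in the paper; it takes the success probabilities of the two binomial models as inputs, and the optional `min_tpr` argument stands in for a lower bound on the acceptable TPR. All names are hypothetical.

```python
from math import comb


def tail_probabilities(k: int, p: float) -> list[float]:
    """Return a list whose entry T is Pr(Z >= T) for Z ~ B(k, p), T = 0..k."""
    pmf = [comb(k, s) * p**s * (1 - p) ** (k - s) for s in range(k + 1)]
    tails = [0.0] * (k + 1)
    running = 0.0
    for s in range(k, -1, -1):  # accumulate from the upper end
        running += pmf[s]
        tails[s] = running
    return tails


def best_threshold(k: int, p_x: float, p_y: float, min_tpr: float = 0.0):
    """Pick T maximizing ACC = (TPR + (1 - FPR)) / 2 subject to TPR >= min_tpr.

    Returns (T, ACC, TPR, FPR), or None if no T satisfies the constraint.
    """
    tpr = tail_probabilities(k, p_x)
    fpr = tail_probabilities(k, p_y)
    best = None
    for T in range(k + 1):
        if tpr[T] < min_tpr:
            continue
        acc = (tpr[T] + (1 - fpr[T])) / 2
        if best is None or acc > best[1]:
            best = (T, acc, tpr[T], fpr[T])
    return best
```

A different cost model, such as unequal penalties for false positives and false negatives, can be substituted by changing the single line that computes `acc`.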
4.2 An example: ABF in action
The behavior of ABF for different \(\varTheta\) is illustrated in Figure 2. The length of the CBF (and all derived ABFs) is \(m=10{,}000\). It stores \(n=500\) unique elements, and each element is mapped to an individual BF with \(k=100\) nonzero positions. Note that the value of k in this example is intentionally not optimized for the given m and n. The particular value of k is chosen for demonstration purposes to clearly illustrate the situation when the SBF has a high false positive rate which can be significantly decreased by the ABF. Similar effects can be seen for other values of k, m, and n.
The plot for \(\varTheta =0\) corresponds to the SBF. In this case, X is deterministic and located at \(k=100\) as expected given \(k=100\) nonzero positions for the SBF. Hence, the optimal value of T is trivially equal to k and \(\text {TPR}=100\%\). A large portion of the distribution for Y is also concentrated at \(k=100\), which leads to high \(\text {FPR}=52\%\). On the other hand, the ABFs with \(\varTheta >0\) have better separation of the two distributions. Much lower FPR can be achieved by reducing the \(\text {TPR}\) below 100%. The optimal values of T (indicated by black vertical bars) were found for each value of \(\varTheta\) according to (21). The lowest acceptable value of \(\text {TPR}\), \(L_{\text {TPR}}\) was set to 0.97. This particular value was chosen to demonstrate that, in principle, a large reduction of the FPR can be achieved via a small reduction in the TPR. The best values of \(\text {TPR}\), \(\text {FPR}\), and \(\text {ACC}\) for each plot are depicted in the figure. For example, even changing \(\varTheta\) from 0 to 1 allows \(\text {FPR}\) to be reduced from 0.52 to 0.24 at the cost of reducing \(\text {TPR}\) by only 3%. Overall, the accuracy is improved by 0.13. The best performance among the considered range is achieved for \(\varTheta =4\), resulting in \(\text {TPR}=0.98\), \(\text {FPR}=0.04\), \(\text {ACC}=0.97\), thus improving the accuracy of the SBF by 31%. It should be noted that the presented example considered only a narrow range of \(\varTheta\). In principle, \(\varTheta\) could be chosen between 0 and n, and therefore, it is important to observe the performance of the ABF for larger \(\varTheta\). Figure 3 demonstrates the dependency between \(\varTheta\) and \(\text {ACC}\), where for each \(\varTheta\) in the range \(0 \le \varTheta \le 20\), T was optimized according to (21) without limiting \(L_{\text {TPR}}\). The first six values of \(\text {ACC}\) in Figure 3 correspond to the values depicted in Figure 2. 
These values lie in the region where the \(\text {ACC}\) was increasing for each new value of \(\varTheta\). However, for values of \(\varTheta >5\), we observe that \(\text {ACC}\) is constantly decreasing until it reaches 0.5. This decrease happens because with the increased \(\varTheta\) the sparsity of the ABF is increasing until all positions in the filter are set to zero. This moment corresponds to \(\text {ACC}=0.5\) because an empty filter has no information about the stored elements, and thus, its \(\text {TPR}\) is zero, but it also has no false positives (i.e., \(\text {FPR}=0\)), which results in \(\text {ACC}=0.5\). Therefore, we observed that the dependency between \(\varTheta\) and \(\text {ACC}\) is nonlinear and that there is a peak value of \(\text {ACC}\), which in the considered example was achieved for \(\varTheta =4\).
4.3 Comparison with the optimized BF
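Table 1 Summary of the experimental setup
Parameter | RBF | ABF | Nonoptimized BF | Optimized BF
Number of hash functions, k | 100 | 100 | 100 | Calculated for n using (3); ranged between 1 and 139
Number of stored elements, n | \(50 \le n \le 5000\), step 50 | \(50 \le n \le 5000\), step 50 | \(50 \le n \le 5000\), step 50 | \(50 \le n \le 5000\), step 50
Filter length, m | 10,000 | 10,000 | 10,000 | 10,000
Number of times rebuilt | 1 | 1 | 1 | 23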
The \(\text {TPR}\) of the optimized and nonoptimized BFs is always 1, while for the ABF and nonoptimized RBF it can be less. In particular, the \(\text {TPR}\) of the ABF varies in the allowed range between \(L_{\text {TPR}}\) and 1. For large values of n (>1000), the \(\text {TPR}\) of the ABF is approximately equal to \(L_{\text {TPR}}\). In the case of nonoptimized RBF the \(\text {TPR}\) was around 0.9 over the whole range of n. The \(\text {FPR}\) of all the filters grows with increasing n. As anticipated, the nonoptimized BF soon (at \(n \approx 1000\)) achieves \(\text {FPR}=1\) and stays there until the end. A similar behavior is demonstrated by the nonoptimized RBF with the exception that the highest value of \(\text {FPR}\) is 0.9. Note that with RBF, the price one has to pay for the lower \(\text {FPR}\) is the decreased \(\text {TPR}\). Two other filters, the ABF and the optimized BF, demonstrate a smooth increase in \(\text {FPR}\). The \(\text {FPR}\) is lower than 1 for both filters even when \(n=5000\) (approximately 0.6 and 0.4, respectively). The accuracy curves aggregate the behavior for \(\text {TPR}\) and \(\text {FPR}\). For most values of n, the nonoptimized BF and RBF reach \(\text {ACC}=0.5\) as their \(\text {FPR}\)s reach the maximal values. Their accuracies for large values of n are the same because the gain in \(\text {FPR}\) equals the loss in \(\text {TPR}\) for the nonoptimized RBF. The accuracies of the ABF and the optimized BF smoothly decay with the growth of n, being 0.66 and 0.8 when \(n=5000\). Thus, the ABF significantly outperforms the nonoptimized BF and RBF when their \(\text {FPR}\)s are increasing. In general, the performance of the ABF follows that of the optimized BF with some constant loss. The increase in accuracy from ABF to optimized BF can be understood as the value delivered by being able to specify in advance precisely the number of elements to be stored in the filter. 
The best tradeoff between \(\text {TPR}\) and \(\text {FPR}\) is in the region of n where \(\text {FPR}\) of the nonoptimized BF is steeply increasing from 0 to 1.
It is important to re-emphasize the advantages of the ABF over the optimized BF. In the experiments above, the ABF addressed the major difficulty of the SBF, which is its limited scalability, since the ABF does not require the recalculation of the whole filter as the number of the stored elements is increasing. Thus, the ABF allows adapting the performance of the filter even when the number of elements to be stored simultaneously is not known in advance. On the contrary, the SBF (i.e., the optimized BF in the experiments) is not scalable as it must be rebuilt if a new value of k is chosen. In the experiments reported in Fig. 4, k varied between 1 and 139 and the optimized BF was rebuilt 23 times (cf. Table 1). The fact that the optimized BF has to be rebuilt every time k changes limits its use cases for situations with dynamic ranges of elements such as in Fig. 4. Another very important advantage of the ABF is that due to its adaptiveness, the number of hash functions k can be fixed for a wide range of stored elements. A fixed k significantly simplifies hardware implementations since there would be no need to account for increased area and power of a chip [5] when k grows. Since the optimized BF has to work in a dynamic range of k, it does not have this advantage.
5 Conclusion
This paper introduced the autoscaling Bloom filter. The ABF is a generalization of the standard binary BF, derived from the counting BF, with procedures for achieving probabilistic bounds on false positives and true positives. It was shown that the ABF can significantly decrease the false positive rate at a cost of allowing a nonzero false negative rate. The evaluation revealed that the accuracy of the ABF follows the standard BF with the optimized number of hash functions with some constant loss. As opposed to the optimized BF, the ABF provides means for optimization of the filter’s performance without requiring the entire filter to be rebuilt when the number of stored elements in the filter is changing dynamically. This optimization can be achieved while the number of hash functions remains fixed.
There are several limitations to this study. First, since the paper focused on presenting and characterizing the algorithm rather than on a solution to any particular problem, no particular attention has been paid to studying the effect of the optimization criterion on the ABF’s performance. Instead, we simply adopted the accuracy. Second, the analysis of the ABF presented in this paper used counting BFs with an unlimited range of counters. In practice, however, the size of counters is limited to several bits [35]. In future work, we will focus on analyzing the effect of restricted counters in counting BFs on the ABF.
Footnotes
1. Note that the SBF is a special case of the ABF, arising when the binarization threshold is set to zero and the decision threshold is set to the number of used hash functions.
2. It should be noted that the calculation of expected similarity (e.g., dot product) between two vectors, one of which may store the other, is a general problem formulation in hyperdimensional computing and can be seen as the “detection” type of retrieval (see [33] for details).
Notes
Acknowledgements
Open access funding provided by Lulea University of Technology. This work was supported by the Swedish Research Council under Grant 2015-04677.
Compliance with ethical standards
Conflict of Interest
The authors declare that they have no conflict of interest.
References
1. Gritsenko V, Rachkovskij D, Frolov A, Gayler R, Kleyko D, Osipov E (2017) Neural distributed autoassociative memories: a survey. Cybern Comput Eng 2(188):5–35
2. Touretzky D, Hinton G (1988) A distributed connectionist production system. Cognit Sci 12(3):423–466
3. Fan L, Cao P, Almeida J, Broder A (2000) Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans Netw 8(3):281–293
4. Donnet B, Baynat B, Friedman T (2006) Retouched Bloom filters: allowing networked applications to trade off selected false positives against false negatives. In: ACM CoNEXT conference, pp 1–12
5. Akhlaghi V, Rahimi A, Gupta RK (2016) Resistive Bloom filters: from approximate membership to approximate computing with bounded errors. In: Conference on Design, Automation and Test in Europe (DATE), pp 1–4
6. Rachkovskij DA (2001) Representation and processing of structures with binary sparse distributed codes. IEEE Trans Knowl Data Eng 3(2):261–276
7. Kanerva P (2009) Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors. Cognit Comput 1(2):139–159
8. Grandi F (2018) On the analysis of Bloom filters. Inf Process Lett 129:35–39
9. Tarkoma S, Rothenberg CE, Lagerspetz E (2012) Theory and practice of Bloom filters for distributed systems. IEEE Commun Surv Tutor 14(1):131–155
10. Broder A, Mitzenmacher M (2004) Network applications of Bloom filters: a survey. Internet Math 1(4):485–509
11. Kazemi SMR, Bidgoli BM, Shamshirband S, Karimi SM, Ghorbani MA, Chau KW, Pour RK (2018) Novel genetic-based negative correlation learning for estimating soil temperature. Eng Appl Comput Fluid Mech 12(1):506–516
12. Wu CL, Chau KW (2011) Rainfall-runoff modeling using artificial neural network coupled with singular spectrum analysis. J Hydrol 399:394–409
13. Rabieh K, Mahmoud M, Akkaya K, Tonyali S (2017) Scalable certificate revocation schemes for smart grid AMI networks using Bloom filters. IEEE Trans Dependable Secure Comput 14(4):420–432
14. Ma H, Tseng YC, Chen LI (2016) A CMAC-based scheme for determining membership with classification of text strings. Neural Comput Appl 27(7):1959–1967
15. Anbar M, Abdullah R, Munther A, Al-Betar MA, Saad RMA (2017) NADTW: new approach for detecting TCP worm. Neural Comput Appl 28(1):525–538
16. Lu J, Wan Y, Li Y, Zhang C, Dai H, Wang Y, Zhang G, Liu B (2017) Ultra-fast Bloom filters using SIMD techniques. In: 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), pp 1–6
17. Zhang Y, Zheng Z, Zhang X (2017) Efficient Bloom filter for network protocols using AES instruction set. IET Commun 11(11):1815–1821
18. Pontarelli S, Reviriego P, Mitzenmacher M (2014) Improving the performance of invertible Bloom lookup tables. Inf Process Lett 114(4):185–191
19. Bender M, Farach-Colton M, Johnson R, Kraner R, Kuszmaul B, Medjedovic D, Montes P, Shetty P, Spillane RP, Zadok E (2012) Don’t thrash: how to cache your hash on flash. Proc VLDB Endow 5(11):1627–1637
20. Pandey P, Bender M, Johnson R, Patro R (2017) A general-purpose counting filter: making every bit count. In: SIGMOD’17 Proceedings of the 2017 ACM international conference on management of data, pp 775–787
21. Einziger G, Friedman R (2017) TinySet—an access efficient self adjusting Bloom filter construction. IEEE/ACM Trans Netw 25(4):2295–2307
22. Fan B, Andersen D, Kaminsky M, Mitzenmacher M (2014) Cuckoo filter: practically better than Bloom. In: CoNEXT’14 Proceedings of the 10th ACM international on conference on emerging networking experiments and technologies, pp 75–88
23. Lim H, Lee J, Byun H, Yim C (2017) Ternary Bloom filter replacing counting Bloom filter. IEEE Commun Lett 21(2):278–281
24. Rothenberg CE, Macapuna CAB, Verdi FL, Magalhaes MF (2010) The deletable Bloom filter: a new member of the Bloom family. IEEE Commun Lett 14(6):557–559
25. Lim H, Lee J, Yim C (2015) Complement Bloom filter for identifying true positiveness of a Bloom filter. IEEE Commun Lett 19(11):1905–1908
26. Carrea L, Vernitski A, Reed M (2016) Yes-no Bloom filter: a way of representing sets with fewer false positives. ArXiv 160301060:1–28
27. Pontarelli S, Reviriego P, Maestro J (2016) Improving counting Bloom filter performance with fingerprints. Inf Process Lett 116(4):304–309
28. Lim H, Lee N, Lee J, Yim C (2014) Reducing false positives of a Bloom filter using cross-checking Bloom filters. Appl Math Inf Sci 8(4):1865–1877
29. Almeida P, Baquero C, Preguica N, Hutchison D (2007) Scalable Bloom filters. Inf Process Lett 101(6):255–261
30. Rottenstreich O, Kanizo Y, Keslassy I (2014) The variable-increment counting Bloom filter. IEEE/ACM Trans Netw 22(4):1092–1105
31. Bose P, Guo H, Kranakis E, Maheshwari A, Morin P, Morrison J, Smid M, Tang Y (2008) On the false-positive rate of Bloom filters. Inf Process Lett 108(4):210–213
32. Christensen K, Roginsky A, Jimeno M (2010) A new analysis of the false positive rate of a Bloom filter. Inf Process Lett 110(21):944–949
33. Frady EP, Kleyko D, Sommer FT (2018) A theory of sequence indexing and working memory in recurrent neural networks. Neural Comput 30:1449–1513
34. Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27:861–874
35. Bonomi F, Mitzenmacher M, Panigrahy R, Singh S, Varghese G (2006) An improved construction for counting Bloom filters. In: 14th Annual European Symposium on Algorithms, LNCS 4168, pp 684–695
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.