Abstract
In many network applications, Bloom filters are used to answer exact-match membership queries because they are randomized, space-efficient data structures with a small probability of false answers. We extend the standard Bloom filter into the Locality-Sensitive Bloom Filter (LSBF) to provide an Approximate Membership Query (AMQ) service. We achieve this by replacing the uniform and independent hash functions with locality-sensitive hash functions, which makes the storage in LSBF locality sensitive. Because it retains the Bloom filter design, LSBF remains space efficient and responsive to queries. In the LSBF structure, we propose a bit vector to reduce False Positives (FP); the bit vector can verify multiple attributes belonging to one member. We also use an active overflowed scheme to significantly decrease False Negatives (FN). Rigorous theoretical analysis of FP, FN, and space overhead shows that LSBF is space compact and can respond accurately to approximate membership queries. We have implemented LSBF in a real distributed system and performed extensive experiments using real-world traces. Experimental results show that LSBF, compared with a baseline approach and other state-of-the-art work in the literature (SmartStore and LSB-tree), takes less time to respond to AMQ and consumes much less storage space. (© 2012 IEEE. Reprinted, with permission, from Ref. [1].)
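The core idea of swapping uniform hash functions for locality-sensitive ones can be illustrated with a minimal sketch. It is not the chapter's implementation: the class name `LSBFSketch`, its parameters, and the choice of p-stable (Gaussian) LSH functions of the form floor((a · v + b) / w) are assumptions made for illustration, and the sketch omits both the verification bit vector and the overflowed scheme described in the abstract.

```python
import math
import random


class LSBFSketch:
    """Bloom-filter bit array indexed by LSH functions (illustrative only)."""

    def __init__(self, dim, num_bits=1024, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.num_bits = num_bits
        self.bucket_width = bucket_width
        self.bits = [False] * num_bits
        # Each function is h(v) = floor((a . v + b) / w), where a has
        # Gaussian entries and b is drawn uniformly from [0, w].
        self.funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)],
             rng.uniform(0.0, bucket_width))
            for _ in range(num_hashes)
        ]

    def _positions(self, vec):
        # Map the vector to one bit position per LSH function.
        for a, b in self.funcs:
            proj = sum(ai * vi for ai, vi in zip(a, vec))
            yield math.floor((proj + b) / self.bucket_width) % self.num_bits

    def insert(self, vec):
        for pos in self._positions(vec):
            self.bits[pos] = True

    def query(self, vec):
        # True means "some inserted item probably lies close to vec".
        return all(self.bits[pos] for pos in self._positions(vec))


f = LSBFSketch(dim=3)
f.insert([1.0, 2.0, 3.0])
print(f.query([1.0, 2.0, 3.0]))
```

In this sketch, a query point close to an inserted point is likely, but not guaranteed, to hit the same bit positions, since nearby points can still fall on opposite sides of a bucket boundary. That boundary effect is one source of the false negatives that the abstract says the overflowed scheme is designed to reduce.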
References
Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. (TC) 61(6), 817–830 (2012)
L. Carter, R. Floyd, J. Gill, G. Markowsky, M. Wegman, Exact and approximate membership testers, in Proceedings of STOC (1978), pp. 59–65
Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of VLDB (2007), pp. 950–961
F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, G. Varghese, Beyond bloom filters: from approximate membership checks to approximate state machines, in Proceedings of ACM SIGCOMM (2006)
Y. Zhu, H. Jiang, False rate analysis of Bloom filter replicas in distributed systems, in Proceedings of ICPP (2006), pp. 255–262
W. Feng, D.D. Kandlur, D. Saha, K.G. Shin, Stochastic fair blue: a queue management algorithm for enforcing fairness, in Proceedings of INFOCOM (2001)
F.M. Cuenca-Acuna, C. Peery, R.P. Martin, T.D. Nguyen, PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities, in Proceedings of IEEE HPDC (2003)
A. Pagh, R. Pagh, S. Rao, An optimal bloom filter replacement, in Proceedings of SODA (2005), pp. 823–829
S. Dharmapurikar, P. Krishnamurthy, D.E. Taylor, Longest prefix matching using bloom filters, in Proceedings of ACM SIGCOMM (2003), pp. 201–212
A. Broder, M. Mitzenmacher, Using multiple hash functions to improve IP lookups, in Proceedings of INFOCOM (2001), pp. 1454–1463
F. Baboescu, G. Varghese, Scalable packet classification. IEEE/ACM Trans. Netw. 13(1), 2–14 (2005)
P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of STOC (1998), pp. 604–613
A. Kirsch, M. Mitzenmacher, Distance-sensitive bloom filters, in Proceedings of Algorithm Engineering and Experiments (ALENEX) (2006)
A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)
M. Mitzenmacher, Compressed bloom filters. IEEE/ACM Trans. Netw. 10(5), 604–612 (2002)
Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Scalable and adaptive metadata management in ultra large-scale file systems, in Proceedings of ICDCS (2008), pp. 403–410
A. Kumar, J.J. Xu, J. Wang, O. Spatschek, L.E. Li, Space-code bloom filter for efficient per-flow traffic measurement, in Proceedings of INFOCOM (2004), pp. 1762–1773
S. Cohen, Y. Matias, Spectral bloom filters, in Proceedings of ACM SIGMOD (2003), pp. 241–252
D. Guo, J. Wu, H. Chen, X. Luo, Theory and network application of dynamic bloom filters, in Proceedings of INFOCOM (2006)
B. Xiao, Y. Hua, Using parallel bloom filters for multi-attribute representation on network services. IEEE Trans. Parallel Distrib. Syst. 21(1), 20–32 (2010)
H. Song, F. Hao, M. Kodialam, T.V. Lakshman, IPv6 lookups using distributed and load balanced bloom filters for 100Gbps core router line cards, in Proceedings of INFOCOM (2009)
F. Hao, M. Kodialam, T.V. Lakshman, H. Song, Fast multiset membership testing using combinatorial bloom filters, in Proceedings of INFOCOM (2009)
F. Hao, M. Kodialam, T.V. Lakshman, Incremental bloom filters, in Proceedings of INFOCOM (2008), pp. 1741–1749
A. Broder, M. Mitzenmacher, Network applications of bloom filters: a survey. Internet Math. 1, 485–509 (2005)
A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of ACM Multimedia (2008)
Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of ICPP (2008), pp. 644–651
M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Proceedings of the Annual Symposium on Computational Geometry (2004), pp. 253–262
A. Andoni, M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing using stable distributions, in Nearest Neighbor Methods in Learning and Vision: Theory and Practice, ed. by T. Darrell, P. Indyk, G. Shakhnarovich (MIT Press, 2006)
M. Charikar, Similarity estimation techniques from rounding algorithms, in Proceedings of STOC (2002), pp. 380–388
N. Agrawal, W. Bolosky, J. Douceur, J. Lorch, A five-year study of file-system metadata, in Proceedings of FAST (2007)
The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype
Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems, in Proceedings of ACM/IEEE Supercomputing Conference (SC) (2009)
Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of SIGMOD (2009)
A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of ACM SIGMOD (1984), pp. 47–57
A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB (1999), pp. 518–529
A. Leung, I. Adams, E.L. Miller, Magellan: a searchable metadata architecture for large-scale file systems. Technical Report UCSC-SSRC-09-07, University of California, Santa Cruz (2009)
V. Athitsos, M. Potamias, P. Papapetrou, G. Kollios, Nearest neighbor retrieval using distance-based hashing, in Proceedings of ICDE (2008)
Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Supporting scalable and adaptive metadata management in ultra large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 22(4), 580–593 (2011)
J. Bruck, J. Gao, A. Jiang, Weighted bloom filter, in Proceedings of the 2006 IEEE International Symposium on Information Theory (ISIT 2006) (2006), pp. 2304–2308
M. Zhong, P. Lu, K. Shen, J. Seiferas, Optimizing data popularity conscious bloom filters, in Proceedings of PODC (2008)
F. Hao, M. Kodialam, T.V. Lakshman, Building high accuracy Bloom filters using partitioned hashing, in Proceedings of SIGMETRICS (2007), pp. 277–288
B. Donnet, B. Baynat, T. Friedman, Retouched bloom filters: allowing networked applications to trade off selected false positives against false negatives, in Proceedings of ACM CoNEXT (2006)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hua, Y., Liu, X. (2019). Locality-Sensitive Bloom Filter for Approximate Membership Query. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_5
DOI: https://doi.org/10.1007/978-981-13-2721-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2720-9
Online ISBN: 978-981-13-2721-6
eBook Packages: Computer Science, Computer Science (R0)