Abstract
Locality-Sensitive Hashing (LSH) is a popular method for answering c-approximate nearest neighbor queries in high-dimensional spaces. Existing LSH methods are either DRAM-based, consuming large amounts of expensive DRAM and requiring a time-consuming index rebuild after a program restarts, or disk-based, such as the state-of-the-art QALSH (Query-Aware LSH), which suffers from high disk I/O latency. In this paper, we find that emerging non-volatile memory (NVM) can be leveraged to solve these problems. Its cost-effectiveness and data durability motivate us to persist most of the LSH index in NVM to reduce DRAM occupancy, while its byte-addressability and low latency contribute to fast query processing. Since QALSH uses B+-Trees as its index data structure and LB-Tree is the state-of-the-art NVM-optimized B+-Tree, we first directly combine QALSH with LB-Tree to obtain LB-QALSH. However, LB-QALSH shows poor query performance on NVM. To fully exploit the advantages of NVM, we propose an NVM-optimized implementation of QALSH, named NV-QALSH, which is the first NVM-optimized LSH. NV-QALSH adopts three optimization designs to achieve high query performance. Experiments show that NV-QALSH achieves a 1.5-4.7x speedup over LB-QALSH. Furthermore, compared with the state-of-the-art DRAM-based LSH, NV-QALSH greatly reduces DRAM occupancy and index rebuild time.
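To make the query-aware idea concrete, the following is a minimal in-memory sketch of query-aware LSH with collision counting, not the NV-QALSH implementation. It assumes Gaussian random projections, a sorted list standing in for each B+-Tree, and a fixed bucket width and collision threshold; the names `build_index`, `query`, `width`, and `threshold` are hypothetical, and the full scheme's parameter derivation and radius expansion are omitted.

```python
import bisect
import math
import random


def build_index(points, num_tables, seed=0):
    """Project all points onto num_tables random Gaussian lines (assumed hash family)."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(num_tables):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Sorted (projection, point id) pairs stand in for one B+-Tree.
        proj = sorted(
            (sum(x * y for x, y in zip(a, p)), i) for i, p in enumerate(points)
        )
        tables.append((a, proj))
    return tables


def query(tables, points, q, width, threshold):
    """Return the id of the nearest point that collides with q in at least
    `threshold` tables, or None if no point collides that often."""
    counts = {}
    for a, proj in tables:
        hq = sum(x * y for x, y in zip(a, q))
        # Query-aware bucket: an interval of the given width centred on h(q).
        lo = bisect.bisect_left(proj, (hq - width / 2, -1))
        hi = bisect.bisect_right(proj, (hq + width / 2, len(points)))
        for _, i in proj[lo:hi]:
            counts[i] = counts.get(i, 0) + 1
    candidates = [i for i, c in counts.items() if c >= threshold]
    if not candidates:
        return None
    # Verify the candidates with exact Euclidean distances.
    return min(candidates, key=lambda i: math.dist(points[i], q))


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(16)] for _ in range(200)]
    index = build_index(data, num_tables=20)
    q = [v + 0.01 for v in data[7]]  # a query close to point 7
    print(query(index, data, q, width=1.0, threshold=10))
```

Because the bucket is centred on the query's own projection rather than fixed in advance, each table lookup is a range scan around one position in a sorted structure, which is why a B+-Tree (and hence an NVM-optimized B+-Tree) is a natural fit for the index.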
Acknowledgements
The corresponding author of this work is Jianlin Feng. This work is partially supported by China NSFC under Grant No. 61772563.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Yao, Z., Zhang, J., Feng, J. (2021). NV-QALSH: An NVM-Optimized Implementation of Query-Aware Locality-Sensitive Hashing. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12924. Springer, Cham. https://doi.org/10.1007/978-3-030-86475-0_6
DOI: https://doi.org/10.1007/978-3-030-86475-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86474-3
Online ISBN: 978-3-030-86475-0
eBook Packages: Computer Science, Computer Science (R0)