
NV-QALSH: An NVM-Optimized Implementation of Query-Aware Locality-Sensitive Hashing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12924)

Abstract

Locality-Sensitive Hashing (LSH) is a popular method for answering c-approximate nearest neighbor queries in high-dimensional spaces. Existing LSH methods are either DRAM-based, consuming large amounts of expensive DRAM and requiring a time-consuming index rebuild whenever the program restarts, or disk-based, such as the state-of-the-art QALSH (Query-Aware LSH), which suffers from the high latency of disk I/O. In this paper, we find that emerging non-volatile memory (NVM) can be leveraged to solve both problems. Its cost-effectiveness and data durability motivate persisting most of the LSH index in NVM to reduce DRAM occupancy, while its byte-addressability and low latency enable fast query processing. Since QALSH uses B+-Trees as its index data structure and LB-Tree is the state-of-the-art NVM-optimized B+-Tree, we first combine QALSH directly with LB-Tree to obtain LB-QALSH. However, LB-QALSH shows poor query performance on NVM. To fully exploit the advantages of NVM, we propose an NVM-optimized implementation of QALSH, named NV-QALSH, which is the first NVM-optimized LSH. NV-QALSH adopts three optimization designs to achieve high query performance. Experiments show that NV-QALSH achieves a 1.5–4.7× speedup over LB-QALSH. Furthermore, compared with the state-of-the-art DRAM-based LSH, NV-QALSH greatly reduces DRAM occupancy and index rebuild time.
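For readers unfamiliar with the query-aware idea behind QALSH, the following is a minimal illustrative sketch and not the NV-QALSH implementation. It assumes the scheme as described in the QALSH literature: each hash table stores random projections of the data in sorted order, and at query time a bucket of width w is centered on the query's own projection. The sorted Python list stands in for a B+-Tree, and the names `build_index`, `candidates`, `w`, and `threshold` are hypothetical, chosen only for illustration.

```python
import bisect
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_index(points, m, seed=0):
    """Build m tables of sorted (projection, point_id) pairs.

    Each sorted list is a stand-in for one B+-Tree (assumption for this sketch).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(m):
        # Random projection vector with Gaussian entries.
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj = sorted((dot(a, p), i) for i, p in enumerate(points))
        tables.append((a, proj))
    return tables


def candidates(tables, query, w, threshold):
    """Return ids of points that share the query's bucket in >= threshold tables.

    The bucket [h(q) - w/2, h(q) + w/2] is anchored on the query's projection,
    which is what makes the scheme query-aware.
    """
    counts = {}
    for a, proj in tables:
        hq = dot(a, query)
        lo = bisect.bisect_left(proj, (hq - w / 2, -1))
        hi = bisect.bisect_right(proj, (hq + w / 2, float("inf")))
        for _, i in proj[lo:hi]:
            counts[i] = counts.get(i, 0) + 1
    return sorted(i for i, c in counts.items() if c >= threshold)
```

A point identical to the query always falls inside the query-centered bucket in every table, so it is returned for any threshold up to m. A full scheme would additionally widen the bucket over successive search radii and verify candidates by their true distance, which this sketch omits.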


Notes

  1. https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html

  2. http://yann.lecun.com/exdb/mnist/

  3. http://corpus-texmex.irisa.fr/

  4. http://www.cs.toronto.edu/~kriz/cifar.html

  5. http://archive.ics.uci.edu/ml/datasets/p53+Mutants

  6. https://github.com/HuangQiang/QALSH

  7. https://github.com/HuangQiang/QALSH_Mem

  8. https://github.com/DBWangGroupUNSW/SRS

  9. https://github.com/1flei/lccs-lsh


Acknowledgements

The corresponding author of this work is Jianlin Feng. This work is partially supported by China NSFC under Grant No. 61772563.


Corresponding author

Correspondence to Jianlin Feng.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Yao, Z., Zhang, J., Feng, J. (2021). NV-QALSH: An NVM-Optimized Implementation of Query-Aware Locality-Sensitive Hashing. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12924. Springer, Cham. https://doi.org/10.1007/978-3-030-86475-0_6


  • DOI: https://doi.org/10.1007/978-3-030-86475-0_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86474-3

  • Online ISBN: 978-3-030-86475-0

  • eBook Packages: Computer Science, Computer Science (R0)
