
Efficient kNN Join over Dynamic High-Dimensional Data


Part of the Lecture Notes in Computer Science book series (LNCS, volume 13459)

Abstract

Given a user dataset U and an object dataset I in a high-dimensional space, a kNN join query retrieves, for each object in U, its k nearest neighbors from I. The kNN join is a fundamental operation in applications across many domains, such as databases, computer vision, multimedia, machine learning, and recommendation systems. Real-world datasets are often updated dynamically through insertions and deletions of objects. However, existing dynamic kNN join algorithms lack support for deletions and batch updates, both of which are important in real-life applications. In this paper, we propose a new method for kNN join over dynamic high-dimensional data. Specifically, our method features lazy updates, batch operations, and optimised deletions. Experiments on real-world datasets show that our method outperforms the existing naive RkNN join and HDR Tree algorithms by up to 5 and 4 times, respectively.
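To make the query definition concrete, the following is a minimal brute-force sketch of a kNN join, not the method proposed in the paper. It assumes Euclidean distance and in-memory data; the function name `knn_join`, the id-keyed dictionaries, and the toy points are all hypothetical choices made for illustration. A deletion is handled here by recomputing the join from scratch, the kind of costly baseline that a dynamic method aims to avoid.

```python
import heapq
import math


def knn_join(users, items, k):
    """Brute-force kNN join (illustrative baseline only).

    users, items: dicts mapping an id to a point (tuple of floats).
    Returns a dict mapping each user id to the ids of its k nearest
    items under Euclidean distance, closest first. Ties are broken
    by item id so the output is deterministic.
    """
    result = {}
    for u_id, u_point in users.items():
        result[u_id] = heapq.nsmallest(
            k,
            items,  # iterates over item ids
            key=lambda i_id: (math.dist(u_point, items[i_id]), i_id),
        )
    return result


# Toy 2-D data; real workloads would be high-dimensional.
U = {0: (0.0, 0.0), 1: (5.0, 5.0)}
I = {0: (1.0, 0.0), 1: (0.0, 2.0), 2: (4.0, 5.0), 3: (6.0, 6.0), 4: (10.0, 10.0)}

joined = knn_join(U, I, k=2)

# Naive handling of a deletion: drop the item and recompute everything.
I_after = {i_id: p for i_id, p in I.items() if i_id != 2}
joined_after = knn_join(U, I_after, k=2)
```

In this toy example, deleting item 2 changes the result only for user 1, yet the naive approach recomputes the neighbors of every user. Each full recomputation costs on the order of |U| · |I| distance evaluations, which is why update-aware processing matters for dynamic data.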

Keywords

  • kNN join
  • Dynamic data
  • High-dimensional data



Author information

Corresponding author

Correspondence to Zhengyi Yang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ukey, N., Yang, Z., Zhang, G., Liu, B., Li, B., Zhang, W. (2022). Efficient kNN Join over Dynamic High-Dimensional Data. In: Hua, W., Wang, H., Li, L. (eds) Databases Theory and Applications. ADC 2022. Lecture Notes in Computer Science, vol 13459. Springer, Cham. https://doi.org/10.1007/978-3-031-15512-3_5


  • DOI: https://doi.org/10.1007/978-3-031-15512-3_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15511-6

  • Online ISBN: 978-3-031-15512-3

  • eBook Packages: Computer Science, Computer Science (R0)