GeoInformatica, 14:55

High-dimensional kNN joins with incremental updates

  • Cui Yu
  • Rui Zhang
  • Yaochun Huang
  • Hui Xiong
Article

Abstract

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin+, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also answers reverse kNN queries with very little overhead. An extensive experimental study shows the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.
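
To make the definitions above concrete, the following is a minimal brute-force sketch of a kNN join and of how reverse kNN answers can be read off its result. It is not the kNNJoin+ method, which this page does not describe in detail; the function names `knn_join` and `reverse_knn`, the use of Euclidean distance, and the tie-breaking by index are all assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force kNN join (illustrative only, not kNNJoin+).

    For each point in R, return the indices of its k nearest points in S
    under Euclidean distance. Ties are broken by the smaller S index.
    """
    result = {}
    for i, r in enumerate(R):
        result[i] = heapq.nsmallest(
            k, range(len(S)), key=lambda j: (math.dist(r, S[j]), j)
        )
    return result


def reverse_knn(join_result, num_s):
    """Invert a kNN join result.

    For each index j in S, list the R indices that have S[j] among
    their k nearest neighbors.
    """
    reverse = {j: [] for j in range(num_s)}
    for i, neighbors in join_result.items():
        for j in neighbors:
            reverse[j].append(i)
    return reverse


# Hypothetical toy data in two dimensions.
R = [(0, 0), (10, 10)]
S = [(1, 0), (0, 2), (9, 9), (20, 20)]

joined = knn_join(R, S, k=2)
reversed_view = reverse_knn(joined, len(S))
```

The sketch compares every pair of objects, so its cost grows with the product of the two data set sizes, and any update would require recomputation from scratch. That is the kind of cost the abstract says an incremental method is meant to avoid.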

Keywords

  • Query optimization
  • Storage & access
  • Optimization and performance

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Monmouth University, West Long Branch, USA
  2. University of Melbourne, Carlton, Australia
  3. University of Texas - Dallas, Dallas, USA
  4. Rutgers, the State University of New Jersey, Newark, USA
