
Journal of Combinatorial Optimization, Volume 32, Issue 4, pp 1107–1132

Enabling high-dimensional range queries using kNN indexing techniques: approaches and empirical results

  • Tim Wylie
  • Michael A. Schuh
  • Rafal A. Angryk

Abstract

Many modern search applications operate on high-dimensional data and depend on efficient orthogonal range queries. These applications span web-based and scientific needs as well as data mining. Although k-nearest neighbor (kNN) queries are becoming increasingly common due to mobile and geospatial applications, orthogonal range queries in high-dimensional data remain extremely important and relevant. For efficient querying, data is typically stored in an index optimized for either kNN or range queries. This can be problematic when data is optimized for kNN retrieval and a user needs a range query, or vice versa. Here, we address the issue of using a kNN-based index for range queries and outline the general computational geometry problem of adapting these systems to range queries. We refer to these methods as space-based decompositions and provide a straightforward heuristic for this problem. Using iDistance as our applied kNN indexing technique, we also develop an optimal (data-based) algorithm designed specifically for its indexing scheme. We compare this method to the suggested naïve approach using real-world datasets. The data-based algorithm consistently performs better.
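
To make the idea of answering a range query through a distance-based index concrete, the sketch below covers an orthogonal query box with one enclosing ball, runs a ball (radius) query, and then filters out candidates that fall outside the box. It is an illustration only and is not taken from the paper: the function names are hypothetical, the brute-force `radius_query` merely stands in for a real kNN index such as iDistance, and the single-ball cover is one simple choice that may differ from the heuristic the authors describe.

```python
import math

def enclosing_ball(lo, hi, pad=1e-9):
    """Smallest ball containing the axis-aligned box [lo, hi].

    The radius is half the box diagonal, padded slightly so that
    floating-point rounding cannot exclude the box corners.
    """
    center = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    radius = math.sqrt(sum((b - a) ** 2 for a, b in zip(lo, hi))) / 2.0
    return center, radius * (1.0 + pad)

def radius_query(points, center, radius):
    """Hypothetical stand-in for an index's ball query (brute force here)."""
    return [p for p in points if math.dist(p, center) <= radius]

def in_box(p, lo, hi):
    """True if p lies inside the closed box [lo, hi] in every dimension."""
    return all(a <= x <= b for x, a, b in zip(p, lo, hi))

def range_query_via_ball(points, lo, hi):
    """Answer an orthogonal range query using only ball queries plus a filter.

    Returns the exact result set and the number of candidates the ball
    query retrieved, which shows how many false positives were examined.
    """
    center, radius = enclosing_ball(lo, hi)
    candidates = radius_query(points, center, radius)
    results = [p for p in candidates if in_box(p, lo, hi)]
    return results, len(candidates)

points = [(0.5, 0.5), (0.5, 0.63), (0.9, 0.1), (0.1, 0.1)]
results, n_candidates = range_query_via_ball(points, (0.4, 0.4), (0.6, 0.6))
```

Because the ball contains the whole box, the filter step never misses a true result. The cost is the extra candidates retrieved from the part of the ball that lies outside the box, and that overhead is what motivates comparing different decompositions.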

Keywords

Indexing, Nearest neighbor, kNN, Range queries, High-dimensional data, iDistance, Wildcard search, Sphere cover

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Tim Wylie (1)
  • Michael A. Schuh (2)
  • Rafal A. Angryk (3)
  1. University of Texas - Rio Grande Valley, Edinburg, USA
  2. Montana State University, Bozeman, USA
  3. Georgia State University, Atlanta, USA
