Enabling high-dimensional range queries using kNN indexing techniques: approaches and empirical results

Journal of Combinatorial Optimization

Abstract

Many modern search applications, spanning web-based services, scientific computing, and data mining, operate on high-dimensional data and depend on efficient orthogonal range queries. Although k-nearest neighbor (kNN) queries are becoming increasingly common due to mobile and geospatial applications, orthogonal range queries in high-dimensional data remain extremely important and relevant. For efficient querying, data is typically stored in an index optimized for either kNN or range queries. This can be problematic when data is optimized for kNN retrieval and a user needs a range query, or vice versa. Here, we address the issue of using a kNN-based index for range queries, and we outline the general computational geometry problem of adapting these systems to range queries. We refer to these methods as space-based decompositions and provide a straightforward heuristic for this problem. Using iDistance as our applied kNN indexing technique, we also develop an optimal (data-based) algorithm designed specifically for its indexing scheme. We compare this method to the suggested naïve approach using real-world datasets, and the data-based algorithm consistently performs better.
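To make the underlying problem concrete, the sketch below shows one simple way an orthogonal range query could be answered through a distance-based (sphere) search: enclose the query box in its circumscribing sphere, retrieve everything inside the sphere, and filter out the false positives. This is an illustrative assumption only. It is not the heuristic or the data-based algorithm developed in the article, the linear scan in `sphere_query` merely stands in for a radius search on an index such as iDistance, and all function names are hypothetical.

```python
import math


def bounding_sphere(lo, hi):
    """Return the center and radius of the smallest sphere enclosing the box [lo, hi]."""
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    # The radius is half the length of the box diagonal.
    radius = math.sqrt(sum(((b - a) / 2) ** 2 for a, b in zip(lo, hi)))
    return center, radius


def sphere_query(points, center, radius):
    """Hypothetical stand-in for a radius search on a kNN index (here a linear scan)."""
    return [p for p in points if math.dist(p, center) <= radius]


def range_query(points, lo, hi):
    """Answer an orthogonal range query with one sphere query plus a filter step.

    Returns the matching points and the number of candidates the sphere retrieved.
    """
    center, radius = bounding_sphere(lo, hi)
    # Pad the radius slightly so box corners are not lost to floating-point rounding.
    candidates = sphere_query(points, center, radius + 1e-9)
    inside = [
        p for p in candidates
        if all(a <= x <= b for x, a, b in zip(p, lo, hi))
    ]
    return inside, len(candidates)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.5, 0.63), (0.9, 0.9)]
    matches, n_candidates = range_query(pts, (0.4, 0.4), (0.6, 0.6))
    print(matches, n_candidates)
```

In the small example, the point `(0.5, 0.63)` lies inside the circumscribing sphere but outside the box, so it is retrieved and then discarded. The share of such wasted candidates grows quickly with dimensionality, because the volume of the circumscribing sphere grows far faster than the volume of the box it encloses.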

Notes

  1. http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features.

  2. http://archive.ics.uci.edu/ml/datasets/YearPredictionMSD.

  3. http://corpus-texmex.irisa.fr/.

Author information

Corresponding author

Correspondence to Tim Wylie.

About this article

Cite this article

Wylie, T., Schuh, M.A. & Angryk, R.A. Enabling high-dimensional range queries using kNN indexing techniques: approaches and empirical results. J Comb Optim 32, 1107–1132 (2016). https://doi.org/10.1007/s10878-015-9927-1

  • DOI: https://doi.org/10.1007/s10878-015-9927-1
