High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning

  • Sachin Kulkarni
  • Ratko Orlandic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4080)


Nearest neighbor search has a wide variety of applications. Unfortunately, most search methods do not scale well with dimensionality. Recent efforts have focused on finding better approximate solutions that improve the locality of data using dimensionality reduction. However, it is possible to preserve the locality of data and find exact nearest neighbors in high dimensions without dimensionality reduction. This paper introduces a novel high-performance technique for finding exact k-nearest neighbors in both low- and high-dimensional spaces. It relies on a new method for data-sensitive space partitioning based on explicit data clustering, introduced here for the first time. This organization supports effective reduction of the search space before secondary storage is accessed. Costly Euclidean distance calculations are reduced through efficient processing of a lightweight memory-based filter. The algorithm outperforms sequential scan and the VA-File in high-dimensional settings. Moreover, results with dynamic loading of data show that the technique also works well on dynamic datasets.
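
The abstract does not give the details of the partitioning scheme or the memory-based filter, so the following is only a generic filter-and-refine sketch of exact k-nearest-neighbor search over clustered data, not the authors' algorithm. It assumes that points are grouped around a set of given seed points (a stand-in for explicit clustering), that each group's bounding box is small enough to keep in memory, and that a box's minimum distance to the query serves as the filter. The names `build_partitions`, `min_dist`, and `knn` are hypothetical.

```python
import heapq
import math

def build_partitions(points, seeds):
    """Assign each point to its nearest seed and record each group's bounding box.

    The seeds are an assumption standing in for an explicit clustering step.
    """
    groups = [[] for _ in seeds]
    for p in points:
        nearest = min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))
        groups[nearest].append(p)
    partitions = []
    for members in groups:
        if members:
            lo = [min(c) for c in zip(*members)]  # per-dimension minimum
            hi = [max(c) for c in zip(*members)]  # per-dimension maximum
            partitions.append((lo, hi, members))
    return partitions

def min_dist(q, lo, hi):
    """Smallest possible Euclidean distance from q to any point inside the box."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def knn(partitions, q, k):
    """Exact k-NN: visit groups by ascending lower bound and stop early."""
    bounds = [(min_dist(q, lo, hi), members) for lo, hi, members in partitions]
    bounds.sort(key=lambda b: b[0])
    best = []  # max-heap via negated distances, holds at most k entries
    for bound, members in bounds:
        if len(best) == k and bound > -best[0][0]:
            break  # no remaining group can contain a closer point
        for p in members:
            d = math.dist(q, p)  # the costly step the filter tries to avoid
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best)
```

Because the box distance never overestimates the true distance to any point in the group, skipping a group whose bound exceeds the current k-th best distance cannot discard a true neighbor, so the result stays exact. In a disk-based setting, `members` would correspond to data fetched from secondary storage only for groups that survive the filter.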


Keywords: Dimensionality Reduction, Live Region, Similarity Search, Neighbor Search, Query Point
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sachin Kulkarni (1)
  • Ratko Orlandic (2)
  1. Department of Computer Science, Illinois Institute of Technology, Chicago, USA
  2. Computer Science Department, University of Illinois at Springfield, Springfield, USA
