Approximation Techniques to Enable Dimensionality Reduction for Voronoi-Based Nearest Neighbor Search

  • Christoph Brochhaus
  • Marc Wichterich
  • Thomas Seidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Utilizing spatial index structures on secondary memory for nearest neighbor search in high-dimensional data spaces has been the subject of much research. With the potential to host larger indexes in main memory, applications demanding a high query throughput stand to benefit from index structures tailored for that environment. “Index once, query at very high frequency” scenarios on semi-static data require particularly fast responses while allowing for more extensive precalculations. One such precalculation consists of indexing the solution space for nearest neighbor queries as used by the approximate Voronoi cell-based method. A major deficiency of this promising approach is the lack of a way to incorporate effective dimensionality reduction techniques. We propose methods to overcome the difficulties faced for normalized data and present a second reduction step that improves response times by limiting the dimensionality of the Voronoi cell approximations. In addition, we evaluate the suitability of our approach for main memory indexing, where speedup factors of up to five can be observed for real-world data sets.
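
To make the idea of "indexing the solution space" concrete, the following is a minimal two-dimensional sketch, not the authors' implementation. It assumes distinct data points inside the unit square, computes each point's Voronoi cell by clipping the square against the bisector half-planes (a stand-in chosen for brevity; the function names `clip`, `cell_boxes`, and `nearest` are hypothetical), and stores only the bounding box of each cell as its approximation. A query then becomes a point-in-box lookup followed by an exact distance check on the few candidates. The dimensionality reduction steps described in the abstract are not shown.

```python
from math import dist


def clip(poly, a, b, c):
    """Clip a convex polygon to the half-plane a*x + b*y <= c."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)  # p lies inside (or on) the half-plane
        if (fp < 0 < fq) or (fq < 0 < fp):
            # The edge crosses the boundary: add the intersection point.
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def cell_boxes(points):
    """Bounding box (x0, y0, x1, y1) of each point's Voronoi cell.

    Assumption: points are distinct and lie in the unit square.
    """
    boxes = []
    for p in points:
        poly = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
        for s in points:
            if s != p:
                # x is at least as close to p as to s
                # iff 2*(s - p) . x <= |s|^2 - |p|^2
                poly = clip(
                    poly,
                    2 * (s[0] - p[0]),
                    2 * (s[1] - p[1]),
                    s[0] ** 2 + s[1] ** 2 - p[0] ** 2 - p[1] ** 2,
                )
        xs = [v[0] for v in poly]
        ys = [v[1] for v in poly]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def nearest(points, boxes, q, eps=1e-9):
    """Nearest neighbor of q via point-in-box lookup plus refinement.

    The boxes are supersets of the true cells, so the true nearest
    neighbor is always among the candidates; eps absorbs rounding.
    """
    cands = [
        p
        for p, (x0, y0, x1, y1) in zip(points, boxes)
        if x0 - eps <= q[0] <= x1 + eps and y0 - eps <= q[1] <= y1 + eps
    ]
    return min(cands, key=lambda p: dist(p, q))
```

In this sketch the boxes are scanned linearly. In a setting like the one the abstract describes, they would instead be stored in a spatial index, and looser boxes would trade more candidates per query for cheaper precalculation.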

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Christoph Brochhaus (1)
  • Marc Wichterich (1)
  • Thomas Seidl (1)
  1. Data Management and Exploration Group, RWTH Aachen University, Germany
