NNH: Improving Performance of Nearest-Neighbor Searches Using Histograms

  • Liang Jin
  • Nick Koudas
  • Chen Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

Efficient search for nearest neighbors (NN) is a fundamental problem arising in a wide variety of practical applications. In this paper we propose a novel technique, called NNH (“Nearest Neighbor Histograms”), which uses specific histogram structures to improve the performance of NN search algorithms. A primary feature of our proposal is that such histogram structures can coexist with a wide range of NN search algorithms without the need to substantially modify them. The main idea is to choose a small number of pivot objects in the space and pre-calculate the distances from each pivot to its nearest neighbors. We provide a complete specification of such histogram structures and show how to use the information they provide for more effective searching. In particular, we show how to construct them, how to decide the number of pivots, how to choose pivot objects, how to maintain them incrementally under dynamic updates, and how to use them with a variety of NN search algorithms to improve search performance. Our extensive experiments show that nearest neighbor histograms can be constructed and maintained efficiently and that, when used with a variety of NN search algorithms, they can improve performance dramatically.
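
To make the pivot idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes Euclidean distance, a brute-force scan to build the histogram, and a triangle-inequality bound as one way to use the stored distances. The names `build_nnh` and `knn_radius_upper_bound` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
# One histogram entry: a pivot and the sorted distances to its k nearest objects.
Entry = Tuple[Point, List[float]]


def build_nnh(pivots: Sequence[Point], data: Sequence[Point], k: int) -> List[Entry]:
    """Pre-calculate, for each pivot, the distances to its k nearest data objects.

    Assumption: a brute-force scan is used here for clarity; k must not
    exceed len(data).
    """
    histogram: List[Entry] = []
    for pivot in pivots:
        distances = sorted(math.dist(pivot, obj) for obj in data)
        histogram.append((pivot, distances[:k]))
    return histogram


def knn_radius_upper_bound(histogram: Sequence[Entry], query: Point, k: int) -> float:
    """Upper-bound the distance from `query` to its k-th nearest neighbor.

    By the triangle inequality, the k nearest objects of a pivot p all lie
    within dist(query, p) + r_k(p) of the query, so that sum bounds the
    query's k-NN radius. The tightest bound over all pivots is returned.
    """
    return min(math.dist(query, pivot) + radii[k - 1] for pivot, radii in histogram)


def brute_force_knn_radius(data: Sequence[Point], query: Point, k: int) -> float:
    """Exact k-NN radius by scanning every object (used only as a reference)."""
    return sorted(math.dist(query, obj) for obj in data)[k - 1]


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (2, 2)]
    pivots = [(0, 0), (5, 5)]
    nnh = build_nnh(pivots, data, k=3)
    bound = knn_radius_upper_bound(nnh, (1, 1), k=3)
    exact = brute_force_knn_radius(data, (1, 1), k=3)
    print(f"upper bound = {bound:.3f}, exact radius = {exact:.3f}")
```

An NN search algorithm could use such a bound as an initial pruning radius, which is one way the histogram can sit alongside an existing algorithm without changing its core logic. How tight the bound is depends on how many pivots are chosen and where they are placed.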

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Liang Jin (1)
  • Nick Koudas (2)
  • Chen Li (1)
  1. Information and Computer Science, University of California, Irvine, USA
  2. AT&T Labs Research, Florham Park, USA
