EDBT 2004: Advances in Database Technology - EDBT 2004, pp 385-402
NNH: Improving Performance of Nearest-Neighbor Searches Using Histograms
Abstract
Efficient search for nearest neighbors (NN) is a fundamental problem arising in a wide variety of practical applications. In this paper we propose a novel technique, called NNH (“Nearest Neighbor Histograms”), which uses specific histogram structures to improve the performance of NN search algorithms. A primary feature of our proposal is that such histogram structures can be used alongside many existing NN search algorithms without substantially modifying them. The main idea is to choose a small number of pivot objects in the space and pre-calculate the distances to their nearest neighbors. We provide a complete specification of these histogram structures and show how to use the information they provide for more effective searching. In particular, we show how to construct them, how to decide the number of pivots, how to choose pivot objects, how to maintain them incrementally under dynamic updates, and how to use them with a variety of NN search algorithms. Our extensive experiments show that nearest neighbor histograms can be constructed and maintained efficiently and that, when used in conjunction with a variety of NN search algorithms, they can improve performance dramatically.
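The pivot idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`build_nnh`, `nn_upper_bound`, `knn_distance`), the use of Euclidean distance, and the random choice of pivots are all assumptions made for illustration. The sketch relies only on the triangle inequality: if a pivot p has its k-th nearest neighbor at distance r, then a query q has at least k objects within distance d(q, p) + r, so that sum is an upper bound on the k-NN distance of q and could be used by a search algorithm to prune candidates.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_nnh(data, num_pivots, k, seed=0):
    """Build a toy nearest-neighbor histogram (hypothetical helper).

    Picks `num_pivots` pivots at random from `data` (an assumption; the
    paper discusses how to choose pivots) and stores, for each pivot,
    the distance to its k-th nearest object in `data`. A pivot drawn
    from `data` counts itself as a neighbor at distance 0.
    """
    rng = random.Random(seed)
    pivots = rng.sample(data, num_pivots)
    hist = []
    for p in pivots:
        dists = sorted(dist(p, o) for o in data)
        hist.append((p, dists[k - 1]))
    return hist


def nn_upper_bound(hist, q):
    """Upper bound on the k-NN distance of q via the triangle inequality.

    For each pivot p with k-th NN distance r, the k objects within r of p
    lie within dist(q, p) + r of q; the smallest such sum is the bound.
    """
    return min(dist(q, p) + r for p, r in hist)


def knn_distance(data, q, k):
    """Exact k-NN distance of q by linear scan, used only for comparison."""
    return sorted(dist(q, o) for o in data)[k - 1]
```

For example, after `hist = build_nnh(data, num_pivots=10, k=3)`, the value `nn_upper_bound(hist, q)` is never smaller than `knn_distance(data, q, 3)`, so any object or index region farther from `q` than the bound can be skipped without affecting the 3-NN result.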
Keywords
Near Neighbor, Priority Queue, Query Point, Queue Size, Pivot Point