Trading Quality for Time with Nearest-Neighbor Search
In many situations, users would readily accept an approximate query result if the query is evaluated faster. In this article, we investigate approximate evaluation techniques for Nearest-Neighbor Search (NN-Search) based on the VA-File. The VA-File contains approximations of feature points. These approximations frequently suffice to eliminate the vast majority of points in a first phase. A second phase then identifies the NN by computing the exact distances of all remaining points. To develop approximate query-evaluation techniques, we proceed in two steps. First, we derive an analytic model for VA-File-based NN-search in order to investigate the relationship between approximation granularity, effectiveness of the filtering step, and search performance. In more detail, we develop formulae for the distribution of the error of the bounds and for the duration of the different phases of query evaluation. Second, based on these results, we develop different approximate query-evaluation techniques. The first one adapts the bounds to obtain stricter filtering; the second one skips the computation of the exact distances. Experiments show that these techniques have the desired effect: for instance, when allowing for a small but specific reduction of result quality, we observed a speedup by a factor of 7 in 50-NN search.
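The two-phase evaluation summarized in the abstract can be illustrated with a minimal Python sketch. It assumes feature points in the unit hypercube, a uniform grid with `bits` bits per dimension, and Euclidean distance. The names `build_va_file`, `cell_bounds`, and `va_knn` are hypothetical. The parameter `alpha`, which inflates the lower bounds so that the second phase stops earlier, is only a stand-in for the idea of stricter filtering; it is not the bound adaptation derived in the paper. With `alpha=0.0` the search is exact.

```python
import heapq
import math
import random


def build_va_file(points, bits=3):
    """Approximate each coordinate in [0, 1] by the index of its grid cell."""
    cells = 1 << bits
    return [[min(int(x * cells), cells - 1) for x in p] for p in points]


def cell_bounds(query, approx, bits=3):
    """Lower and upper bound on the distance from query to any point in the cell."""
    cells = 1 << bits
    lo2 = hi2 = 0.0
    for q, c in zip(query, approx):
        left, right = c / cells, (c + 1) / cells
        if q < left:
            near = left - q
        elif q > right:
            near = q - right
        else:
            near = 0.0
        far = max(abs(q - left), abs(q - right))
        lo2 += near * near
        hi2 += far * far
    return math.sqrt(lo2), math.sqrt(hi2)


def va_knn(points, va, query, k=1, bits=3, alpha=0.0):
    """Return (sorted list of (distance, index), number of exact distances computed).

    alpha is an illustrative assumption: alpha > 0 inflates lower bounds,
    trading result quality for fewer exact distance computations.
    """
    all_bounds = [cell_bounds(query, a, bits) for a in va]
    # Phase 1: drop every point whose lower bound exceeds the k-th smallest upper bound.
    kth_upper = sorted(hi for _, hi in all_bounds)[k - 1]
    candidates = sorted(
        (lo, i) for i, (lo, _) in enumerate(all_bounds) if lo <= kth_upper
    )
    # Phase 2: visit candidates by increasing lower bound and compute exact distances.
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    visited = 0
    for lo, i in candidates:
        if len(best) == k and lo * (1.0 + alpha) > -best[0][0]:
            break  # no remaining candidate can improve the result (exactly so for alpha = 0)
        d = math.dist(query, points[i])
        visited += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), visited
```

Calling `va_knn(points, va, query, k=50, bits=4, alpha=0.2)` on data prepared with `build_va_file(points, bits=4)` would compute at most as many exact distances as the same call with `alpha=0.0`, at the price of possibly missing some true neighbors. How large the savings and the quality loss are depends on the data and on `alpha`.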
Keywords: Result Quality, Approximation Quality, Quality Constraint, Query Evaluation, Approximate Result