Abstract
In many situations, users would readily accept an approximate query result if that made query evaluation faster. In this article, we investigate approximate evaluation techniques for Nearest-Neighbor Search (NN-Search) based on the VA-File. The VA-File contains approximations of the feature points. In a first phase, these approximations frequently suffice to eliminate the vast majority of points. A second phase then identifies the NN by computing the exact distances of all remaining points. We develop approximate query-evaluation techniques in two steps. First, we derive an analytic model for VA-File-based NN-Search in order to investigate the relationship between approximation granularity, the effectiveness of the filtering step, and search performance. In more detail, we develop formulae for the distribution of the error of the bounds and for the duration of the different phases of query evaluation. Second, based on these results, we develop different approximate query-evaluation techniques. The first one adapts the bounds to obtain more rigid filtering; the second one skips the computation of exact distances. Experiments show that these techniques have the desired effect: for instance, when allowing for a small but specific reduction of result quality, we observed a speedup by a factor of 7 in 50-NN search.
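The two-phase search described above can be illustrated with a minimal sketch. It assumes feature vectors in [0, 1)^d and a uniform grid of 2^bits equal-width cells per dimension; the actual VA-File may place its partition points differently. For simplicity, the sketch computes all bounds before filtering instead of maintaining the threshold during a single scan. The parameters `alpha` and `skip_exact`, as well as the midpoint ranking, are hypothetical stand-ins for the two approximate techniques. They are not the authors' exact formulation: `alpha < 1` shrinks each upper bound toward its lower bound, which makes filtering more rigid, and `skip_exact` ranks candidates by bound midpoints without computing exact distances.

```python
import heapq
import math
import random


def random_dataset(n, dim, seed=0):
    """Uniform random feature vectors in [0, 1)^dim, for demonstration only."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def make_approximations(points, bits):
    """Map each coordinate in [0, 1) to one of 2**bits equal-width cells (assumed grid)."""
    cells = 2 ** bits
    return [[min(int(x * cells), cells - 1) for x in p] for p in points]


def exact_distance(p, q):
    """Euclidean distance, computed in the same way as the bounds below."""
    s = 0.0
    for a, b in zip(p, q):
        d = a - b
        s += d * d
    return math.sqrt(s)


def bounds(query, approx, bits):
    """Lower and upper bound on the distance from query to any point in the cell."""
    width = 1.0 / (2 ** bits)
    lo_sq = hi_sq = 0.0
    for q, c in zip(query, approx):
        left, right = c * width, (c + 1) * width
        lo = left - q if q < left else (q - right if q > right else 0.0)
        hi = max(abs(q - left), abs(q - right))
        lo_sq += lo * lo
        hi_sq += hi * hi
    return math.sqrt(lo_sq), math.sqrt(hi_sq)


def va_knn(query, points, approxs, bits, k, alpha=1.0, skip_exact=False):
    """Two-phase k-NN search over VA-File-style approximations.

    alpha < 1 (hypothetical) tightens the upper bounds; skip_exact
    (hypothetical) returns candidates ranked by bound midpoints.
    """
    lbs, ubs = [], []
    for a in approxs:
        lb, ub = bounds(query, a, bits)
        if alpha < 1.0:
            ub = lb + alpha * (ub - lb)  # still >= lb, so >= k candidates survive
        lbs.append(lb)
        ubs.append(ub)
    # Phase 1 (filtering): the k-th smallest upper bound is the pruning threshold.
    delta = heapq.nsmallest(k, ubs)[-1]
    candidates = sorted((lbs[i], i) for i in range(len(points)) if lbs[i] <= delta)
    if skip_exact:
        estimates = sorted(((lbs[i] + ubs[i]) / 2, i) for _, i in candidates)
        return [i for _, i in estimates[:k]]
    # Phase 2: visit candidates by increasing lower bound and compute exact distances.
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    for lb, i in candidates:
        if len(best) == k and lb > -best[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        d = exact_distance(query, points[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return [i for _, i in sorted((-nd, i) for nd, i in best)]
```

With the defaults (`alpha=1.0`, `skip_exact=False`), the sketch should return the same neighbors as a linear scan. Smaller `alpha` values prune more points in phase 1 at the risk of missing true neighbors. `skip_exact=True` avoids reading the full vectors altogether.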
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Weber, R., Böhm, K. (2000). Trading Quality for Time with Nearest-Neighbor Search. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds) Advances in Database Technology — EDBT 2000. EDBT 2000. Lecture Notes in Computer Science, vol 1777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46439-5_2
DOI: https://doi.org/10.1007/3-540-46439-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67227-2
Online ISBN: 978-3-540-46439-6
eBook Packages: Springer Book Archive