Optimizing the Distance Computation Order of Multi-Feature Similarity Search Indexing
Multi-feature search is an effective approach to similarity search. Unfortunately, search efficiency decreases as the number of features grows. Several indexing approaches aim to achieve efficiency by incrementally reducing the approximation error of aggregated distance bounds. They apply heuristics to determine the distance computation order and update the object's aggregated bounds after each computation. However, the existing indexing approaches suffer from several drawbacks: they use the same computation order for all objects, do not support important types of aggregation functions, and do not take the varying CPU and I/O costs of different distance computations into account. To resolve these problems, we introduce a new heuristic that determines an efficient distance computation order for each individual object. Our heuristic supports various important aggregation functions and calculates cost-benefit ratios to incorporate the varying computation costs of different distance functions. The experimental evaluation reveals that our heuristic outperforms state-of-the-art approaches in terms of both the number of distance computations and search time.
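The idea of ordering distance computations per object by a cost-benefit ratio can be illustrated with a minimal sketch. It is not the heuristic from the paper, whose details are not given in this abstract. It assumes a weighted-sum aggregation, defines the benefit of a computation as the amount by which it shrinks the gap between the aggregated bounds, and uses hypothetical names (Feature, aggregated_bounds, refine) and a hypothetical range-query threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feature:
    lower: float                # current lower bound on the partial distance
    upper: float                # current upper bound on the partial distance
    weight: float               # weight in the assumed weighted-sum aggregation
    cost: float                 # assumed CPU + I/O cost of the exact computation
    exact: Callable[[], float]  # computes the exact partial distance on demand


def aggregated_bounds(features: List[Feature]) -> Tuple[float, float]:
    """Aggregate per-feature bounds with a weighted sum (an assumption)."""
    lo = sum(f.weight * f.lower for f in features)
    hi = sum(f.weight * f.upper for f in features)
    return lo, hi


def refine(features: List[Feature], threshold: float) -> Tuple[str, List[int], float]:
    """Refine one object's bounds until it can be pruned or accepted.

    Returns the decision, the order in which features were computed,
    and the total cost spent on exact distance computations.
    """
    order: List[int] = []
    spent = 0.0
    while True:
        lo, hi = aggregated_bounds(features)
        if lo > threshold:
            return "pruned", order, spent
        if hi <= threshold:
            return "accepted", order, spent
        # Features whose partial distance is not yet known exactly.
        pending = [i for i, f in enumerate(features) if f.upper > f.lower]
        if not pending:
            # Bounds are tight, so one of the checks above must have fired;
            # this guard only protects against unexpected input.
            return "undecided", order, spent
        # Benefit: reduction of the aggregated bound gap; ratio: benefit per cost.
        best = max(
            pending,
            key=lambda i: features[i].weight
            * (features[i].upper - features[i].lower)
            / features[i].cost,
        )
        d = features[best].exact()
        features[best].lower = features[best].upper = d
        order.append(best)
        spent += features[best].cost
```

In this sketch each object receives its own computation order, because the ratio depends on that object's current bounds. An object whose cheap feature already pushes the aggregated lower bound above the threshold is pruned without the expensive computation ever being performed.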
Keywords: Distance Function, Partial Distance, Search Time, Distance Computation, Aggregation Function