Easing the Dimensionality Curse by Stretching Metric Spaces
Queries over sets of complex elements are performed by extracting features from each element, which are then used in place of the real elements during processing. Extracting a large number of significant features increases the representative power of the feature vector and improves query precision. However, each feature is a dimension in the representation space, so handling more features worsens the dimensionality curse. The problem derives from the fact that the elements tend to be distributed all over the space, and a large dimensionality allows them to spread over much broader spaces. Therefore, in high-dimensional spaces, elements are frequently farther from each other, so the differences in distance among pairs of elements tend to homogenize. When searching for nearest neighbors, the first one is usually not close, but once one is found, small increases in the query radius tend to include several others. This effect increases the overlap between nodes in access methods indexing the dataset. Both spatial and metric access methods are sensitive to the problem. This paper presents a general strategy, applicable to metric access methods, that improves the performance of similarity queries in high-dimensional spaces. Our technique applies a function that “stretches” the distances, so that close objects become closer and far ones become even farther. Experiments using the metric access method Slim-tree show that similarity queries performed in the transformed spaces demand up to 70% fewer distance calculations, 52% fewer disk accesses, and up to 57% less total time compared with the original spaces.
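The homogenization of distances described above, and the effect of stretching them, can be illustrated with a small sketch. The abstract does not state the stretching function the paper uses, so the `stretch` function below (a power function around a reference distance `scale`) is a hypothetical stand-in, and the uniform random data, the choice of the median as `scale`, and the `relative_contrast` measure are all assumptions made for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_contrast(dists):
    """(farthest - nearest) / nearest: small values mean homogenized distances."""
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


def stretch(d, scale, power=3.0):
    """Hypothetical stretching function (an assumption, not the paper's).

    Distances below `scale` shrink and distances above it grow,
    while the ordering of distances is preserved.
    """
    return scale * (d / scale) ** power


def distances_to_query(dim, n_points, rng):
    """Distances from one random query to n_points uniform random points."""
    query = [rng.random() for _ in range(dim)]
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    return [euclidean(query, p) for p in points]


rng = random.Random(0)
for dim in (2, 64):
    dists = distances_to_query(dim, 500, rng)
    scale = sorted(dists)[len(dists) // 2]  # median distance as reference
    stretched = [stretch(d, scale) for d in dists]
    print(dim, relative_contrast(dists), relative_contrast(stretched))
```

Because `stretch` is monotonically increasing, the nearest-neighbor ranking is unchanged. A convex function of this kind does not in general preserve the triangle inequality, however, so a metric access method could not use it for pruning without further care; how the paper addresses this is not stated in the abstract.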
Keywords: Distance Calculation, Range Query, Synthetic Dataset, Access Method, Original Space