Subspace Nearest Neighbor Search - Problem Statement, Approaches, and Discussion
Computing the similarity between objects is a central task for many applications in information retrieval and data mining. To find k-nearest neighbors, a ranking is typically computed from a predetermined set of data dimensions and a distance function, both held constant over all possible queries. However, many high-dimensional feature spaces contain a large number of dimensions, many of which may carry noisy, irrelevant, redundant, or contradictory information. More specifically, the relevance of dimensions may depend on the query object itself, and in general, different dimension sets (subspaces) may be appropriate for a query. Approaches for feature selection or feature weighting typically provide a global subspace selection, which may not be suitable for all possible queries. In this position paper, we frame a new research problem, called subspace nearest neighbor search, which aims at multiple query-dependent subspaces for nearest neighbor search. We describe relevant problem characteristics, relate the problem to existing approaches, and outline potential research directions.
Keywords: Nearest neighbor search, Subspace analysis and search, Subspace clustering, Subspace outlier detection
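As a rough illustration of why the choice of subspace can matter for a given query, the sketch below ranks neighbors by Euclidean distance restricted to a chosen set of dimensions. It is not a method from the paper: the function name `knn_in_subspace`, the toy data, and the assumption that the third dimension is noise are all hypothetical, and the subspace is simply passed in by hand, whereas choosing it per query is the open problem framed above.

```python
import math


def knn_in_subspace(data, query, dims, k):
    """Return indices of the k points closest to `query`, using only `dims`.

    Hypothetical helper: plain Euclidean distance over the selected dimensions.
    """
    def dist(point):
        return math.sqrt(sum((point[d] - query[d]) ** 2 for d in dims))

    ranked = sorted(range(len(data)), key=lambda i: dist(data[i]))
    return ranked[:k]


# Toy data (assumed): dimensions 0 and 1 carry signal, dimension 2 is noise.
data = [
    (1.0, 1.0, 90.0),
    (1.1, 0.9, 5.0),
    (5.0, 5.0, 10.0),
    (9.0, 9.0, 10.5),
]
query = (1.0, 1.0, 10.0)

# Same query, same distance function, different dimension sets.
full_space = knn_in_subspace(data, query, dims=(0, 1, 2), k=2)
subspace = knn_in_subspace(data, query, dims=(0, 1), k=2)
```

With these toy values the two rankings differ: in the full space the noisy third dimension pushes the point that matches the query on dimensions 0 and 1 out of the top two, while restricting the distance to dimensions 0 and 1 brings it back.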