A Nearest Neighbor Method Using Bisectors
We propose a novel algorithm for finding the nearest neighbor. With the progress of modern technology, demand is growing for methods that handle large-scale datasets with many samples and many features. However, almost all sophisticated algorithms proposed so far are effective only for a small number of features, say, up to 10. This is because in a high-dimensional space many pairs of samples lie at almost the same distance, in which case the naive algorithm outperforms the others. In this study, we exploit the sequence of distances obtained from the training samples examined so far. Specifically, combinatorial information about the examined samples is used in the form of bisectors between possible pairs of them. With this algorithm, a query is processed in O(αβnd) time for n samples in a d-dimensional space, with α, β < 1, at the expense of O(n^2) preprocessing time and space. We examined the performance of the algorithm.
Keywords: Training Sample, Distance Calculation, Query Point, Query Time, Ball Test
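The abstract gives only the outline of the method, so the following is a minimal sketch of one way bisectors between already-examined samples could be used to skip distance calculations. It is not the authors' algorithm. It assumes Euclidean distance and uses a standard projection bound: for examined samples a and b, the signed distance of any point p from the bisector of (a, b) is (d(p,a)^2 - d(p,b)^2) / (2 d(a,b)), and the difference of the signed distances of the query and a candidate is a lower bound on the distance between them. The names `preprocess`, `nn_bisector`, and `max_pivots` are hypothetical.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points, O(d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def preprocess(train):
    """Pairwise distances among training samples: O(n^2) time and space."""
    n = len(train)
    return [[dist(train[i], train[j]) for j in range(n)] for i in range(n)]


def nn_naive(train, query):
    """Baseline: examine every sample, O(nd) per query."""
    return min(range(len(train)), key=lambda i: dist(train[i], query))


def nn_bisector(train, pair_d, query, max_pivots=8):
    """Nearest neighbor search that skips candidates using bisector bounds.

    Assumption (not from the source): the first `max_pivots` examined samples
    serve as pivots, and every pair of pivots defines one bisector.
    Returns (index, distance, number of full distance calculations).
    """
    examined = []                      # (sample index, distance to query)
    best_i, best_d = -1, math.inf
    n_dist = 0
    for x in range(len(train)):
        pruned = False
        for i in range(len(examined)):
            a, da = examined[i]
            for j in range(i + 1, len(examined)):
                b, db = examined[j]
                dab = pair_d[a][b]
                if dab == 0.0:         # duplicate pivots define no bisector
                    continue
                # Signed distances of the query and of x from the bisector.
                s_q = (da * da - db * db) / (2.0 * dab)
                s_x = (pair_d[x][a] ** 2 - pair_d[x][b] ** 2) / (2.0 * dab)
                # |s_q - s_x| <= d(query, x); small slack guards rounding.
                if abs(s_q - s_x) > best_d + 1e-12:
                    pruned = True
                    break
            if pruned:
                break
        if pruned:
            continue
        d = dist(train[x], query)      # full O(d) distance calculation
        n_dist += 1
        if d < best_d:
            best_i, best_d = x, d
        if len(examined) < max_pivots:
            examined.append((x, d))
    return best_i, best_d, n_dist


if __name__ == "__main__":
    random.seed(0)
    train = [[random.random() for _ in range(5)] for _ in range(200)]
    pair_d = preprocess(train)
    query = [random.random() for _ in range(5)]
    idx, d, n_dist = nn_bisector(train, pair_d, query)
    print(idx, d, n_dist, "of", len(train), "distances computed")
```

Each bound check costs O(1) because it reuses the precomputed pairwise distances, but with many pivot pairs the checks themselves add up, so in this sketch the saving only pays off when d is large relative to the number of bisectors tested per candidate.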