Refinements to nearest-neighbor searching ink-dimensional trees
This note presents a simplification and generalization of an algorithm for searchingk-dimensional trees for nearest neighbors reported by Friedmanet al . If the distance between records is measured usingL2, the Euclidean norm, the data structure used by the algorithm to determine the bounds of the search space can be simplified to a single number. Moreover, because distance measurements inL2 are rotationally invariant, the algorithm can be generalized to allow a partition plane to have an arbitrary orientation, rather than insisting that it be perpendicular to a coordinate axis, as in the original algorithm. When ak-dimensional tree is built, this plane can be found from the principal eigenvector of the covariance matrix of the records to be partitioned. These techniques and others yield variants ofk-dimensional trees customized for specific applications.
It is wrong to assume thatk-dimensional trees guarantee that a nearest-neighbor query completes in logarithmic expected time. For smallk, logarithmic behavior is observed on all but tiny trees. However, for largerk, logarithmic behavior is achievable only with extremely large numbers of records. Fork = 16, a search of ak-dimensional tree of 76,000 records examines almost every record.
Key wordsk-dimensional tree Searching Nearest-neighbor search
Unable to display preview. Download preview PDF.