Synonyms
Clustering; Karhunen-Loève transform (KLT); Multi-dimensional indexing; Nearest neighbors query; Principal component analysis (PCA); Singular value decomposition (SVD)
Definition
Representing objects such as images by feature vectors, and measuring similarity by the distances between the corresponding points in high-dimensional space, is a popular paradigm; similarity search is then carried out via k-nearest-neighbor (k-NN) queries against a target image. Applying singular value decomposition (SVD) to the individual clusters of a dataset yields a greater reduction in dimensionality for the same normalized mean square error (NMSE) than applying SVD to the whole dataset. The cost of processing k-NN queries is further reduced by suitable indexing structures such as the ordered partition (OP)-tree and the stepwise dimensionality increasing (SDI)-tree.
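The claim about clustered versus global SVD can be illustrated with a small NumPy sketch. It is not the method of the entry or of any cited paper: the helper `dims_for_nmse`, the synthetic two-cluster data, the known cluster labels (standing in for the output of a clustering step such as k-means), and the choice to apply the NMSE target to each cluster separately are all assumptions made for illustration. NMSE is taken here as the fraction of variance about the centroid that is discarded when only the leading principal components are kept.

```python
import numpy as np

def dims_for_nmse(points, target_nmse):
    """Smallest number of principal components to keep so that the
    discarded share of variance (used here as the NMSE) is <= target_nmse."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    energy = singular_values ** 2  # variance captured per component
    # discarded[p - 1] = share of energy lost when the top p components are kept
    discarded = np.append(np.cumsum(energy[::-1])[::-1][1:], 0.0) / energy.sum()
    return int(np.argmax(discarded <= target_nmse)) + 1

# Hypothetical data: two clusters in 6-D, each elongated along a different axis
# and separated along a third axis, plus a little isotropic noise.
rng = np.random.default_rng(0)
n, d = 200, 6
cluster_a = rng.normal(0.0, 0.1, (n, d))
cluster_a[:, 0] += rng.normal(0.0, 3.0, n)   # spread along axis 0
cluster_b = rng.normal(0.0, 0.1, (n, d))
cluster_b[:, 1] += rng.normal(0.0, 3.0, n)   # spread along axis 1
cluster_b[:, 2] += 10.0                      # centroid offset along axis 2

data = np.vstack([cluster_a, cluster_b])
labels = np.repeat([0, 1], n)  # assumed known; normally produced by clustering

target = 0.05
global_dims = dims_for_nmse(data, target)
local_dims = [dims_for_nmse(data[labels == c], target) for c in np.unique(labels)]
print("global SVD dimensions:", global_dims)
print("per-cluster SVD dimensions:", local_dims)
```

On this synthetic data the global SVD has to retain three dimensions (the two elongation axes and the offset between centroids), whereas each cluster needs only one. Real datasets will not separate this cleanly, and the size of the gain depends on how well the clustering matches the local correlation structure.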
Historical Background
IBM’s Query by Image Content (QBIC) project, which utilized content-based image retrieval...
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Thomasian, A. (2018). Dimensionality Reduction Techniques for Nearest-Neighbor Computations. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80771
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80771
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering