Dimensionality Reduction Techniques for Nearest-Neighbor Computations

Thomasian, Alexander

doi:10.1007/978-1-4614-8265-9_80771

Reference work entry
First Online: 01 January 2018

Synonyms

Clustering; Karhunen-Loève transform (KLT); Multi-dimensional indexing; Nearest neighbors query; Principal component analysis (PCA); Singular value decomposition (SVD)

Definition

Representing objects such as images by their feature vectors, and measuring similarity by the distances between the corresponding points in a high-dimensional space, is a popular paradigm; similarity search then takes the form of k-nearest-neighbor (k-NN) queries with respect to a target image. Applying singular value decomposition (SVD) to individual clusters of a dataset yields greater dimensionality reduction for the same normalized mean square error (NMSE) than applying SVD to the whole dataset. The cost of processing k-NN queries is further reduced by suitable indexing structures such as the ordered partition (OP)-tree and the stepwise dimensionality increasing (SDI)-tree.
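
The comparison between per-cluster and whole-dataset SVD can be illustrated with a minimal NumPy sketch. It is not the entry's own algorithm. It assumes that NMSE is the squared reconstruction error divided by the total squared distance of the points from their centroid, that the cluster membership is already known, and that the data are synthetic: two clusters in 8 dimensions, each spread along a different pair of axes. The names dims_for_nmse and make_cluster are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def dims_for_nmse(points, target_nmse):
    """Return the fewest principal components that keep NMSE <= target_nmse.

    Assumption: NMSE is the squared reconstruction error divided by the
    total squared distance of the points from their centroid.
    """
    centered = points - points.mean(axis=0)
    # Squared singular values give the scatter captured along each principal axis.
    energy = np.linalg.svd(centered, compute_uv=False) ** 2
    # lost[k-1] = fraction of scatter discarded when the top k axes are kept.
    lost = 1.0 - np.cumsum(energy) / energy.sum()
    for kept, fraction in enumerate(lost, start=1):
        if fraction <= target_nmse:
            return kept
    return len(energy)

rng = np.random.default_rng(0)
n_points, n_dims, noise = 500, 8, 0.1

def make_cluster(active_axes, centroid):
    """Hypothetical generator: large spread on the active axes, small noise elsewhere."""
    data = rng.normal(scale=noise, size=(n_points, n_dims))
    data[:, active_axes] += rng.normal(scale=5.0, size=(n_points, len(active_axes)))
    return data + centroid

# Two synthetic clusters with different local subspaces and shifted centroids.
offset = np.zeros(n_dims)
offset[7] = 3.0
cluster_a = make_cluster([0, 1], np.zeros(n_dims))
cluster_b = make_cluster([2, 3], offset)
dataset = np.vstack([cluster_a, cluster_b])

target = 0.01
global_dims = dims_for_nmse(dataset, target)
cluster_dims = [dims_for_nmse(c, target) for c in (cluster_a, cluster_b)]

print("dimensions retained, SVD on whole dataset:", global_dims)
print("dimensions retained, SVD per cluster:", cluster_dims)
```

In this sketch the target is applied inside each cluster separately. That is a conservative choice: the scatter within clusters never exceeds the scatter about the global centroid, so meeting the target in every cluster also meets it over the whole dataset. In practice the cluster labels would come from a clustering step rather than being known in advance.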

Historical Background

IBM’s Query by Image Content (QBIC) project, which utilized content-based image retrieval...

Author information

Authors and Affiliations

Thomasian and Associates, Pleasantville, NY, USA
Alexander Thomasian

Corresponding author

Correspondence to Alexander Thomasian.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

About this entry

Cite this entry

Thomasian, A. (2018). Dimensionality Reduction Techniques for Nearest-Neighbor Computations. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80771

DOI: https://doi.org/10.1007/978-1-4614-8265-9_80771
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
