Search and Classification of High Dimensional Data

Rabani, Yuval

doi:10.1007/3-540-45753-4_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2462)

Included in the following conference series:

International Workshop on Approximation Algorithms for Combinatorial Optimization

Abstract

Modeling data sets as points in a high dimensional vector space is a trendy theme in modern information retrieval and data mining. Among the numerous drawbacks of this approach is the fact that many of the required processing tasks are computationally hard in high dimension. We survey several algorithmic ideas that have applications to the design and analysis of polynomial time approximation schemes for nearest neighbor search and clustering of high dimensional data. The main lesson from this line of research is that if one is willing to settle for approximate solutions, then high dimensional geometry is easy. Examples are included in the reference list below.

References

N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proc. of the 41st Ann. IEEE Symp. on Foundations of Computer Science, 2000, pages 240–250.
M. Bădoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In Proc. of the 34th Ann. ACM Symp. on Theory of Computing, 2002.
P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. In Proc. of the 10th Ann. ACM-SIAM Symp. on Discrete Algorithms, 1999, pages 291–299.
W. Fernandez de la Vega, M. Karpinski, C. Kenyon, and Y. Rabani. Polynomial time approximation schemes for metric min-sum clustering. Electronic Colloquium on Computational Complexity report number TR02-025. Available at ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/2002/TR02-025/index.html
S. Har-Peled and K.R. Varadarajan. Projective clustering in high dimensions using core-sets. In Proc. of the 18th Ann. ACM Symp. on Computational Geometry, 2002, pages 312–318.
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. of the 30th Ann. ACM Symp. on Theory of Computing, 1998, pages 604–613.
J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proc. of the 29th Ann. ACM Symp. on Theory of Computing, 1997, pages 599–608.
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457–474, 2000. Preliminary version appeared in STOC '98.
N. Mishra, D. Oblinger, and L. Pitt. Sublinear time approximate clustering. In Proc. of the 12th Ann. ACM-SIAM Symp. on Discrete Algorithms, January 2001, pages 439–447.
R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric clustering problems. J. of the ACM, 49(2):139–156, March 2002. Preliminary version appeared in FOCS '00.
L.J. Schulman. Clustering for edge-cost minimization. In Proc. of the 32nd Ann. ACM Symp. on Theory of Computing, 2000, pages 547–555.

Author information

Authors and Affiliations

Computer Science Department, Technion - IIT, 32000, Haifa, Israel
Yuval Rabani

Editor information

Editors and Affiliations

Institut für Informatik und praktische Mathematik, Universität Kiel, Olshausenstr. 40, 24098, Kiel, Germany
Klaus Jansen
Dipartimento di Informatica e Sistemistica, Università di Roma La Sapienza, Via Salaria 113, 00198, Roma, Italy
Stefano Leonardi
College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, 30332-0280, Georgia, USA
Vijay Vazirani

Cite this paper

Rabani, Y. (2002). Search and Classification of High Dimensional Data. In: Jansen, K., Leonardi, S., Vazirani, V. (eds) Approximation Algorithms for Combinatorial Optimization. APPROX 2002. Lecture Notes in Computer Science, vol 2462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45753-4_1

DOI: https://doi.org/10.1007/3-540-45753-4_1
Published: 04 October 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44186-1
Online ISBN: 978-3-540-45753-4
eBook Packages: Springer Book Archive
