Abstract
Modeling data sets as points in a high dimensional vector space is a trendy theme in modern information retrieval and data mining. Among the numerous drawbacks of this approach is the fact that many of the required processing tasks are computationally hard in high dimension. We survey several algorithmic ideas that have applications to the design and analysis of polynomial time approximation schemes for nearest neighbor search and clustering of high dimensional data. The main lesson from this line of research is that if one is willing to settle for approximate solutions, then high dimensional geometry is easy. Examples are included in the reference list below.
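To make the closing claim concrete, here is a minimal sketch of one standard idea in this area: map the points to a much lower dimension with a random linear projection, then search there. It is not taken from the chapter, and the algorithms in the cited papers are considerably more refined. All names (`build_index`, `approximate_nearest`, the target dimension `k`, the seeds) are hypothetical choices made for illustration.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. Gaussian entries of variance 1/k."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, point):
    """Apply the linear map to one point (a list of d floats)."""
    return [sum(r * x for r, x in zip(row, point)) for row in matrix]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(points, query):
    """Exact nearest neighbor by linear scan; returns an index into points."""
    return min(range(len(points)),
               key=lambda i: squared_distance(points[i], query))


def build_index(points, k, seed=0):
    """Project every data point from d dimensions down to k dimensions."""
    rng = random.Random(seed)
    matrix = random_projection_matrix(len(points[0]), k, rng)
    return matrix, [project(matrix, p) for p in points]


def approximate_nearest(index, query):
    """Search in the projected space; the answer is only approximate."""
    matrix, projected = index
    return nearest_index(projected, project(matrix, query))


# Small demonstration on synthetic data (assumed sizes: 50 points, d = 200).
data_rng = random.Random(1)
points = [[data_rng.uniform(-1.0, 1.0) for _ in range(200)] for _ in range(50)]
index = build_index(points, k=20)
query = list(points[7])  # a query that coincides with a stored point
print(approximate_nearest(index, query))  # prints 7
```

Each distance comparison in the projected space costs `k` operations rather than `d`. The price is that, for a query that is not a stored point, the returned point may be only an approximate nearest neighbor, with quality depending on `k`.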
References
N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proc. of the 41st Ann. IEEE Symp. on Foundations of Computer Science, 2000, pages 240–250.
M. Bădoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In Proc. of the 34th Ann. ACM Symp. on Theory of Computing, 2002.
P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. In Proc. of the 10th Ann. ACM-SIAM Symp. on Discrete Algorithms, 1999, pages 291–299.
W. Fernandez de la Vega, M. Karpinski, C. Kenyon, and Y. Rabani. Polynomial time approximation schemes for metric min-sum clustering. Electronic Colloquium on Computational Complexity report number TR02-025. Available at ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/2002/TR02-025/index.html
S. Har-Peled and K.R. Varadarajan. Projective clustering in high dimensions using core-sets. In Proc. of the 18th Ann. ACM Symp. on Computational Geometry, 2002, pages 312–318.
P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. of the 30th Ann. ACM Symp. on Theory of Computing, 1998, pages 604–613.
J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In Proc. of the 29th Ann. ACM Symp. on Theory of Computing, 1997, pages 599–608.
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput., 30(2):457–474, 2000. Preliminary version appeared in STOC '98.
N. Mishra, D. Oblinger, and L. Pitt. Sublinear time approximate clustering. In Proc. of the 12th Ann. ACM-SIAM Symp. on Discrete Algorithms, January 2001, pages 439–447.
R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric clustering problems. J. of the ACM, 49(2):139–156, March 2002. Preliminary version appeared in FOCS '00.
L.J. Schulman. Clustering for edge-cost minimization. In Proc. of the 32nd Ann. ACM Symp. on Theory of Computing, 2000, pages 547–555.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rabani, Y. (2002). Search and Classification of High Dimensional Data. In: Jansen, K., Leonardi, S., Vazirani, V. (eds) Approximation Algorithms for Combinatorial Optimization. APPROX 2002. Lecture Notes in Computer Science, vol 2462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45753-4_1
DOI: https://doi.org/10.1007/3-540-45753-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44186-1
Online ISBN: 978-3-540-45753-4
eBook Packages: Springer Book Archive