Encyclopedia of Machine Learning

2010 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Sublinear Clustering

  • Artur Czumaj
  • Christian Sohler
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_798

Definition

Sublinear clustering describes the process of clustering a given set of input objects using only a small subset of the input set, which is typically selected by a random process. A solution computed by a sublinear clustering algorithm is an implicit description of the clustering (rather than a partition of the input objects), for example in the form of cluster centers. Sublinear clustering is usually applied when the input set is too large to be processed with standard clustering algorithms.
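The idea in the definition can be illustrated with a minimal sketch: draw a uniform random sample, cluster only the sample, and return the cluster centers as the implicit description of the clustering. The sketch assumes that points are tuples of floats and that a plain Lloyd-style k-means heuristic is run on the sample. The function names (`sublinear_cluster_centers`, `lloyd_kmeans`, `assign`) and parameters are illustrative and not taken from the entry, and the sketch does not establish the approximation guarantees that algorithms in the literature provide.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iterations=20, rng=None):
    """Plain Lloyd-style k-means heuristic (an assumed stand-in clustering routine)."""
    if rng is None:
        rng = random.Random(0)
    # Initialise the centers with k distinct points drawn from the input.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers


def sublinear_cluster_centers(points, k, sample_size, seed=0):
    """Cluster a uniform random sample only and return the centers.

    The clustering cost depends on sample_size rather than on len(points).
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return lloyd_kmeans(sample, k, rng=rng)


def assign(point, centers):
    """Recover the cluster of any input object on demand from the centers."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))
```

The output is a list of `k` centers, not a partition: the cluster of an individual object is computed only when needed, by calling `assign(point, centers)`.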

Motivation and Background

Clustering is the process of partitioning a set of objects into subsets of similar objects. In machine learning it is used, for example, in unsupervised learning to fit input data to a density model. In many modern applications of clustering, the input sets consist of billions of objects to be clustered. Typical examples include web search, analysis of web traffic, and spam detection. Therefore, even though many relatively efficient clustering algorithms are...

Recommended Reading

  1. Alon, N., Dar, S., Parnas, M., & Ron, D. (2003). Testing of clustering. SIAM Journal on Discrete Mathematics, 16(3), 393–417.
  2. Bădoiu, M., Har-Peled, S., & Indyk, P. (2002). Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), (pp. 250–257).
  3. Ben-David, S. (2004). A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In Proceedings of the 17th Annual Conference on Learning Theory (COLT), (pp. 415–426).
  4. Chen, K. (2006). On k-median clustering in high dimensions. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (pp. 1177–1185).
  5. Czumaj, A., & Sohler, C. (2007). Sublinear-time approximation for clustering via random sampling. Random Structures & Algorithms, 30(1–2), 226–256.
  6. Feldman, D., Monemizadeh, M., & Sohler, C. (2007). A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd Annual ACM Symposium on Computational Geometry (SoCG), (pp. 11–18).
  7. Frahling, G., & Sohler, C. (2006). A fast k-means implementation using coresets. In Proceedings of the 22nd Annual ACM Symposium on Computational Geometry (SoCG), (pp. 135–143).
  8. Har-Peled, S., & Kushal, A. (2005). Smaller coresets for k-median and k-means clustering. In Proceedings of the 21st Annual ACM Symposium on Computational Geometry (SoCG), (pp. 126–134).
  9. Har-Peled, S., & Mazumdar, S. (2004). On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), (pp. 291–300).
  10. Meyerson, A., O’Callaghan, L., & Plotkin, S. (2004). A k-median algorithm with running time independent of data size. Machine Learning, 56(1–3), 61–87.
  11. Mishra, N., Oblinger, D., & Pitt, L. (2001). Sublinear time approximate clustering. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), (pp. 439–447).

Copyright information

© Springer Science+Business Media, LLC 2011
