Encyclopedia of Machine Learning and Data Mining

2017 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Sublinear Clustering

  • Artur Czumaj
  • Christian Sohler
Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_798

Definition

Sublinear clustering describes the process of clustering a given set of input objects using only a small subset of the input set, which is typically selected by a random process. A solution computed by a sublinear clustering algorithm is an implicit description of the clustering (rather than a partition of the input objects), for example in the form of cluster centers. Sublinear clustering is usually applied when the input set is too large to be processed with standard clustering algorithms.
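The definition can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from this entry: the input is a list of points in Euclidean space with constant-time random access, the objective is k-means (squared distances), and the sample is clustered with Lloyd's algorithm seeded by farthest-first traversal. The sample size and all function names (sublinear_kmeans, estimated_cost, nearest_center, and so on) are hypothetical. This is a generic sample-then-cluster scheme, not a specific published sublinear clustering algorithm, and it carries none of their approximation guarantees. The returned centers play the role of the implicit description: nearest_center recovers the cluster of any single input point on demand, and the cost of the solution is also estimated from a sample.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p.

    The centers are the implicit description of the clustering: the
    cluster of any single input point is recovered on demand.
    """
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def uniform_sample(points: Sequence[Point], s: int, rng: random.Random) -> List[Point]:
    """Draw s points uniformly at random, with replacement."""
    return [points[rng.randrange(len(points))] for _ in range(s)]


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def lloyd(points: Sequence[Point], k: int, iters: int) -> List[Point]:
    """Plain Lloyd iterations for k-means on a small point set."""
    centers = farthest_first(points, k)
    dim = len(points[0])
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(p[d] for p in cl) / len(cl) for d in range(dim)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers


def sublinear_kmeans(points: Sequence[Point], k: int, sample_size: int,
                     iters: int = 10, seed: int = 0) -> List[Point]:
    """Cluster only a random sample of the input and return k centers."""
    rng = random.Random(seed)
    return lloyd(uniform_sample(points, sample_size, rng), k, iters)


def estimated_cost(points: Sequence[Point], centers: Sequence[Point],
                   sample_size: int, seed: int = 1) -> float:
    """Unbiased estimate of the average squared distance from an input
    point to its nearest center, computed from a random sample."""
    rng = random.Random(seed)
    sample = uniform_sample(points, sample_size, rng)
    return sum(sq_dist(p, centers[nearest_center(p, centers)]) for p in sample) / sample_size


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative data: two well-separated unit squares, 20,000 points each.
    data = [(rng.random(), rng.random()) for _ in range(20_000)]
    data += [(10 + rng.random(), 10 + rng.random()) for _ in range(20_000)]
    centers = sublinear_kmeans(data, k=2, sample_size=500)
    print(sorted(centers))
    print(estimated_cost(data, centers, sample_size=1000))
```

Apart from the random index draws, the work done by sublinear_kmeans depends on sample_size, k and iters rather than on len(points). How large the sample must be for its clustering to approximate a good clustering of the full input depends on the objective and on properties of the data; that is the question theoretical analyses of sublinear clustering address.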

Motivation and Background

Clustering is the process of partitioning a set of objects into subsets of similar objects. In machine learning it is used, for example, in unsupervised learning to fit input data to a density model. In many modern applications of clustering, the input sets consist of billions of objects; typical examples include web search, web traffic analysis, and spam detection. Therefore, even though many relatively efficient clustering algorithms are...


Recommended Reading

  1. Alon N, Dar S, Parnas M, Ron D (2003) Testing of clustering. SIAM J Discret Math 16(3):393–417
  2. Bǎdoiu M, Har-Peled S, Indyk P (2002) Approximate clustering via core-sets. In: Proceedings of the 34th annual ACM symposium on theory of computing (STOC), Montreal, pp 250–257
  3. Ben-David S (2004) A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In: Proceedings of the 17th annual conference on learning theory (COLT), Banff, pp 415–426
  4. Chen K (2006) On k-median clustering in high dimensions. In: Proceedings of the 17th annual ACM-SIAM symposium on discrete algorithms (SODA), Miami, pp 1177–1185
  5. Czumaj A, Sohler C (2007) Sublinear-time approximation for clustering via random sampling. Random Struct Algorithms 30(1–2):226–256
  6. Feldman D, Monemizadeh M, Sohler C (2007) A PTAS for k-means clustering based on weak coresets. In: Proceedings of the 23rd annual ACM symposium on computational geometry (SoCG), Gyeongju, pp 11–18
  7. Frahling G, Sohler C (2006) A fast k-means implementation using coresets. In: Proceedings of the 22nd annual ACM symposium on computational geometry (SoCG), Sedona, pp 135–143
  8. Har-Peled S, Kushal A (2005) Smaller coresets for k-median and k-means clustering. In: Proceedings of the 21st annual ACM symposium on computational geometry (SoCG), Pisa, pp 126–134
  9. Har-Peled S, Mazumdar S (2004) On coresets for k-means and k-median clustering. In: Proceedings of the 36th annual ACM symposium on theory of computing (STOC), Chicago, pp 291–300
  10. Meyerson A, O’Callaghan L, Plotkin S (2004) A k-median algorithm with running time independent of data size. Mach Learn 56(1–3):61–87
  11. Mishra N, Oblinger D, Pitt L (2001) Sublinear time approximate clustering. In: Proceedings of the 12th annual ACM-SIAM symposium on discrete algorithms (SODA), Washington, DC, pp 439–447

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Artur Czumaj, University of Warwick, Coventry, UK
  • Christian Sohler, University of Paderborn, Paderborn, Germany