Sublinear clustering describes the process of clustering a given set of input objects using only a small subset of the input set, which is typically selected by a random process. A solution computed by a sublinear clustering algorithm is an implicit description of the clustering (rather than a partition of the input objects), for example in the form of cluster centers. Sublinear clustering is usually applied when the input set is too large to be processed with standard clustering algorithms.
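The idea can be illustrated with a minimal sketch: draw a uniform random sample, cluster only the sample, and return the centers as the implicit description of the clustering. The entry does not prescribe a particular algorithm, so two choices here are assumptions made for illustration: uniform sampling, and Lloyd's k-means heuristic on the sample. The function names (`sublinear_cluster`, `lloyd_kmeans`, `nearest_center`) are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def lloyd_kmeans(points, k, iterations, rng):
    """Plain Lloyd iterations; an illustrative choice, not the entry's method."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                dim = len(group[0])
                centers[i] = tuple(
                    sum(p[d] for p in group) / len(group) for d in range(dim)
                )
    return centers


def sublinear_cluster(points, k, sample_size, iterations=20, seed=0):
    """Cluster a uniform random sample and return only the k centers.

    The work depends on `sample_size`, not on len(points); the full input
    is never partitioned explicitly.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return lloyd_kmeans(sample, k, iterations, rng)


# Usage: two well-separated synthetic groups, clustered from 200 samples.
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(10000)]
data += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(10000)]
centers = sublinear_cluster(data, k=2, sample_size=200)
print(centers)
```

Any input object can later be assigned to a cluster on demand with `nearest_center(point, centers)`, which is what makes the returned centers an implicit clustering rather than an explicit partition.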
Motivation and Background
Clustering is the process of partitioning a set of objects into subsets of similar objects. In machine learning, it is used, for example, in unsupervised learning to fit input data to a density model. In many modern applications of clustering, the input sets consist of billions of objects to be clustered. Typical examples include web search, analysis of web traffic, and spam detection. Therefore, even though many relatively efficient clustering algorithms are...
- Bădoiu, M., Har-Peled, S., & Indyk, P. (2002). Approximate clustering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC) (pp. 250–257).
- Ben-David, S. (2004). A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In Proceedings of the 17th Annual Conference on Learning Theory (COLT) (pp. 415–426).
- Feldman, D., Monemizadeh, M., & Sohler, C. (2007). A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd Annual ACM Symposium on Computational Geometry (SoCG) (pp. 11–18).
- Frahling, G., & Sohler, C. (2006). A fast k-means implementation using coresets. In Proceedings of the 22nd Annual ACM Symposium on Computational Geometry (SoCG) (pp. 135–143).
- Har-Peled, S., & Kushal, A. (2005). Smaller coresets for k-median and k-means clustering. In Proceedings of the 21st Annual ACM Symposium on Computational Geometry (SoCG) (pp. 126–134).
- Har-Peled, S., & Mazumdar, S. (2004). On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC) (pp. 291–300).
- Mishra, N., Oblinger, D., & Pitt, L. (2001). Sublinear time approximate clustering. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (pp. 439–447).