Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Subspace Clustering Techniques

  • Peer Kröger
  • Arthur Zimek
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_607

Synonyms

Bi-clustering; Co-clustering; Correlation clustering; Oriented clustering; Pattern-based clustering; Projected clustering

Definition

Cluster analysis aims at finding a set of subsets (i.e., a clustering) of objects in a data set. A meaningful clustering reflects a natural grouping of the data. In high-dimensional data, irrelevant attributes and correlated attributes make any natural grouping hard to detect. Specialized techniques therefore aim at finding clusters in subspaces of a high-dimensional data space.
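The effect of irrelevant attributes can be illustrated with a small synthetic sketch. It is not taken from the entry: the data generator, the parameter values, and the helper names `make_data` and `contrast` are assumptions chosen for illustration only. Two groups are separated in a single attribute, and a number of uniformly random attributes are appended. The ratio of mean between-group distance to mean within-group distance is then compared in the relevant one-dimensional subspace and in the full space.

```python
import math
import random


def euclidean(p, q, dims):
    """Euclidean distance between p and q restricted to the attributes in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def make_data(n_per_cluster=50, n_noise_dims=50, seed=0):
    """Hypothetical data: attribute 0 separates two groups, the rest is noise."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate((0.2, 0.8)):
        for _ in range(n_per_cluster):
            relevant = [rng.gauss(center, 0.03)]                 # clustered attribute
            noise = [rng.random() for _ in range(n_noise_dims)]  # irrelevant attributes
            points.append(relevant + noise)
            labels.append(label)
    return points, labels


def contrast(points, labels, dims):
    """Mean between-group distance divided by mean within-group distance."""
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = euclidean(points[i], points[j], dims)
            (within if labels[i] == labels[j] else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


if __name__ == "__main__":
    points, labels = make_data()
    all_dims = range(len(points[0]))
    print("contrast in subspace {0}:", contrast(points, labels, [0]))
    print("contrast in full space: ", contrast(points, labels, all_dims))
```

Under these assumptions, the contrast is large in the subspace spanned by attribute 0 and close to 1 in the full space, which is one way to see why a grouping that exists in a subspace can be hard to detect with full-dimensional distances.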

Historical Background

While different weighting of attributes has been in use since clusters were derived by hand, the problem of finding a cluster based on a subset of attributes, together with a specialized solution, was first described in 1972 by Hartigan [1]. Research did not focus on the problem until 1998, however, when it was triggered by modern capabilities for the massive acquisition of high-dimensional data in many scientific and economic domains and by the first general approaches to the problem [2, 3, 4]. The...


Recommended Reading

  1. Hartigan JA. Direct clustering of a data matrix. J Am Stat Assoc. 1972;67(337):123–29.
  2. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 94–105.
  3. Aggarwal CC, Procopiuc CM, Wolf JL, Yu PS, Park JS. Fast algorithms for projected clustering. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1999. p. 61–72.
  4. Aggarwal CC, Yu PS. Finding generalized projected clusters in high dimensional space. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 70–81.
  5. Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform. 2004;1(1):24–45.
  6. Kriegel HP, Kröger P, Zimek A. Clustering high dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data (TKDD). 2009;3(1):1–58.
  7. Kriegel HP, Kröger P, Zimek A. Subspace clustering. Wiley Interdiscip Rev Data Min Knowl Disc. 2012;2(4):351–64.
  8. Bellman R. Adaptive control processes. A guided tour. Princeton: Princeton University Press; 1961.
  9. Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is “Nearest Neighbor” meaningful? In: Proceedings of the 7th International Conference on Database Theory; 1999. p. 217–35.
  10. Houle ME, Kriegel HP, Kröger P, Schubert E, Zimek A. Can shared-neighbor distances defeat the curse of dimensionality? In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management; 2010. p. 482–500.
  11. Achtert E, Böhm C, David J, Kröger P, Zimek A. Global correlation clustering based on the Hough transform. Stat Anal Data Min. 2008;1(3):111–27.
  12. Achtert E, Böhm C, Kriegel HP, Kröger P, Zimek A. Deriving quantitative models for correlation clusters. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining; 2006. p. 4–13.
  13. Zimek A, Vreeken J. The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives. Mach Learn. 2013;98(1–2):121–55.
  14. Sim K, Gopalkrishnan V, Zimek A, Cong G. A survey on enhanced subspace clustering. Data Min Knowl Disc. 2013;26(2):332–97.
  15. Achtert E, Kriegel HP, Schubert E, Zimek A. Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 1009–12.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Ludwig-Maximilians-Universität München, Munich, Germany
  2. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark