Encyclopedia of Machine Learning

2010 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Categorical Data Clustering

  • Periklis Andritsos
  • Panayiotis Tsaparas
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_99



Data clustering is informally defined as the problem of partitioning a set of objects into groups such that objects in the same group are similar, while objects in different groups are dissimilar. Categorical data clustering refers to the case where the data objects are defined over categorical attributes. A categorical attribute is an attribute whose domain is a set of discrete values that are not inherently comparable. That is, there is no single ordering or inherent distance function for the categorical values, and there is no mapping from categorical to numerical values that is semantically meaningful.
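
The absence of an inherent distance can be illustrated with a minimal Python sketch. The objects, attribute values, and integer codings below are hypothetical, and the simple matching (overlap) measure is only one common way to compare categorical tuples; it is assumed here for illustration and is not a measure prescribed by this entry.

```python
from typing import Dict, Sequence


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical tuples disagree."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def coded_gap(coding: Dict[str, int], v1: str, v2: str) -> int:
    """Numeric gap between two values under an arbitrary integer coding."""
    return abs(coding[v1] - coding[v2])


# Hypothetical objects described by (colour, shape, material).
red_cube = ("red", "cube", "wood")
blue_cube = ("blue", "cube", "wood")
green_ball = ("green", "ball", "steel")

# Two equally arbitrary integer codings of the colour attribute.
coding_a = {"red": 0, "green": 1, "blue": 2}
coding_b = {"red": 0, "blue": 1, "green": 2}

# The "distance" between red and blue depends entirely on the coding chosen,
# which is why such a mapping carries no semantic meaning.
gap_a = coded_gap(coding_a, "red", "blue")
gap_b = coded_gap(coding_b, "red", "blue")

# Simple matching only asks whether values are equal, so it needs no ordering.
d_close = matching_dissimilarity(red_cube, blue_cube)
d_far = matching_dissimilarity(red_cube, green_ball)
```

Under the two codings the same pair of colours receives different numeric gaps, whereas the matching count depends only on equality of values and is unaffected by how the values are labelled.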

Motivation and Background

Clustering is a problem of great practical importance that has been the focus of substantial research in several domains for decades. As storage capacities grow, ever larger amounts of data become available for analysis and mining. Clustering plays an instrumental role in this...

Copyright information

© Springer Science+Business Media, LLC 2011
