GENCCS: A Correlated Group Difference Approach to Contrast Set Mining
Contrast set mining has emerged as a data mining task that aims to discern differences among groups. These groups can be patients, organizations, molecules, or even timelines, and are defined by a selected property that distinguishes one from another. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. The search for contrast sets can be prohibitively expensive on relatively large datasets because every combination of attribute-value pairs must be examined, so the search space can grow exponentially. In this paper, we introduce the notion of a correlated group difference (CGD) and propose a contrast set mining technique that uses mutual information and all-confidence to select the attribute-value pairs that are most highly correlated, in order to mine CGDs. Our experiments on real datasets demonstrate the efficiency of our approach and the interestingness of the CGDs discovered.
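To make the quantities named in the abstract concrete, the following is a minimal Python sketch of support difference across groups, all-confidence, and mutual information for a pair of attribute-value pairs. It uses the textbook definitions of these measures, not necessarily the exact formulations or thresholds used by GENCCS, and it computes mutual information over presence indicators of attribute-value pairs, which is an assumption. The toy data and all function names (`support`, `all_confidence`, `mutual_information`, `support_difference`) are hypothetical.

```python
import math
from collections import Counter

# Toy dataset (hypothetical): each row is a dict of attribute -> value.
groups = {
    "A": [
        {"smoker": "yes", "exercise": "low"},
        {"smoker": "yes", "exercise": "low"},
        {"smoker": "yes", "exercise": "low"},
        {"smoker": "no", "exercise": "high"},
    ],
    "B": [
        {"smoker": "no", "exercise": "high"},
        {"smoker": "no", "exercise": "high"},
        {"smoker": "yes", "exercise": "high"},
        {"smoker": "no", "exercise": "low"},
    ],
}
rows = [r for g in groups.values() for r in g]
candidate = [("smoker", "yes"), ("exercise", "low")]


def support(data, items):
    """Fraction of rows that contain every (attribute, value) pair in items."""
    if not data:
        return 0.0
    hits = sum(1 for r in data if all(r.get(a) == v for a, v in items))
    return hits / len(data)


def support_difference(grouped, items):
    """Largest gap in support of the conjunction between any two groups."""
    sups = [support(data, items) for data in grouped.values()]
    return max(sups) - min(sups)


def all_confidence(data, items):
    """Standard all-confidence: supp(items) / max support of a single item."""
    max_single = max(support(data, [it]) for it in items)
    return support(data, items) / max_single if max_single > 0 else 0.0


def mutual_information(data, item_a, item_b):
    """Mutual information in bits between the presence indicators of two pairs."""
    n = len(data)
    joint = Counter(
        (r.get(item_a[0]) == item_a[1], r.get(item_b[0]) == item_b[1])
        for r in data
    )
    marg_a, marg_b = Counter(), Counter()
    for (x, y), c in joint.items():
        marg_a[x] += c
        marg_b[y] += c
    # Only observed cells are summed, so every count in the log is nonzero.
    return sum(
        (c / n) * math.log2(c * n / (marg_a[x] * marg_b[y]))
        for (x, y), c in joint.items()
    )


if __name__ == "__main__":
    print("support difference:", support_difference(groups, candidate))
    print("all-confidence:", all_confidence(rows, candidate))
    print("mutual information:", mutual_information(rows, *candidate))
```

In a sketch like this, a candidate conjunction would be kept only if its attribute-value pairs are sufficiently correlated and its support differs enough between groups. The cut-off values, and the statistical significance test a full method would apply, are not given in the abstract and are left out here.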
Keywords: Search Space, Mutual Information, Search Tree, Minimum Support, Round Robin