GENCCS: A Correlated Group Difference Approach to Contrast Set Mining

  • Mondelle Simeon
  • Robert Hilderman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6871)

Abstract

Contrast set mining is a data mining task that aims to discern differences among groups. These groups can be patients, organizations, molecules, or even timelines, and are defined by a selected property that distinguishes one from another. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. The search for contrast sets can be prohibitively expensive on relatively large datasets because every combination of attribute-value pairs must be examined, so the search space can grow exponentially. In this paper, we introduce the notion of a correlated group difference (CGD) and propose a contrast set mining technique that uses mutual information and all-confidence to select the most highly correlated attribute-value pairs, in order to mine CGDs. Our experiments on real datasets demonstrate the efficiency of our approach and the interestingness of the CGDs discovered.
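
The quantities named in the abstract can be illustrated with a small sketch. It is not the GENCCS algorithm: the abstract does not give the paper's exact definition of a CGD, its thresholds, or its search strategy, so the code below assumes the standard textbook definitions of support, all-confidence (support of the set divided by the largest single-item support), and mutual information between the presence indicators of two attribute-value pairs. The data in `ROWS` and all function names are hypothetical.

```python
from math import log2

# Toy records: (group label, set of attribute-value pairs). Made-up data.
ROWS = [
    ("A", {"sex=F", "smoker=yes"}),
    ("A", {"sex=F", "smoker=yes"}),
    ("A", {"sex=M", "smoker=no"}),
    ("A", {"sex=F", "smoker=no"}),
    ("B", {"sex=M", "smoker=no"}),
    ("B", {"sex=F", "smoker=no"}),
    ("B", {"sex=M", "smoker=yes"}),
    ("B", {"sex=M", "smoker=no"}),
]


def support(itemset, rows):
    """Fraction of rows that contain every attribute-value pair in itemset."""
    if not rows:
        return 0.0
    return sum(1 for _, items in rows if itemset <= items) / len(rows)


def group_supports(itemset, rows):
    """Support of itemset computed separately within each group."""
    groups = sorted({g for g, _ in rows})
    return {g: support(itemset, [r for r in rows if r[0] == g]) for g in groups}


def all_confidence(itemset, rows):
    """Assumed definition: supp(itemset) / max over items of supp({item})."""
    max_single = max(support({item}, rows) for item in itemset)
    return support(itemset, rows) / max_single if max_single else 0.0


def mutual_information(x, y, rows):
    """Mutual information (bits) between the presence indicators of x and y."""
    n = len(rows)
    mi = 0.0
    for xv in (True, False):
        for yv in (True, False):
            pxy = sum(1 for _, it in rows
                      if (x in it) == xv and (y in it) == yv) / n
            px = sum(1 for _, it in rows if (x in it) == xv) / n
            py = sum(1 for _, it in rows if (y in it) == yv) / n
            if pxy > 0:
                mi += pxy * log2(pxy / (px * py))
    return mi


candidate = {"sex=F", "smoker=yes"}
per_group = group_supports(candidate, ROWS)
print("support per group:", per_group)
print("support difference:", max(per_group.values()) - min(per_group.values()))
print("all-confidence:", all_confidence(candidate, ROWS))
print("mutual information:", mutual_information("sex=F", "smoker=yes", ROWS))
```

In this toy data the candidate set occurs in half of group A and never in group B, which is the kind of distributional difference a contrast set captures. The correlation measures are computed over all rows here; whether the paper computes them globally or per group is not stated in the abstract.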

Keywords

Search Space, Mutual Information, Search Tree, Minimum Support, Round Robin
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mondelle Simeon (1)
  • Robert Hilderman (1)

  1. Department of Computer Science, University of Regina, Regina, Canada
