Mining Interesting Correlated Contrast Sets

  • Conference paper

Abstract

Contrast set mining is a data mining task that aims to discern differences across groups. These groups can be patients, organizations, molecules, or even timelines. A valid correlated contrast set is a conjunction of attribute-value pairs that are highly correlated with each other and whose distribution differs significantly across groups. Although the search for valid correlated contrast sets produces fewer results than the search for valid contrast sets, these results must still be filtered further before a domain expert can examine them and act on them. In this paper, we apply a minimum support ratio threshold, where the support ratio is the ratio of maximum to minimum support across groups. We propose a contrast set mining technique that uses this threshold to discover maximal valid correlated contrast sets. We also demonstrate how four probability-based objective measures developed for association rules can be used to rank contrast sets. Our experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
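
The support ratio mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the toy data, the threshold value of 2.0, and the handling of zero supports are all assumptions made for illustration. The sketch only computes the support of a candidate contrast set in each group and then takes the ratio of the largest to the smallest support.

```python
from typing import Dict, List, Set, Tuple

# An item is an attribute-value pair, e.g. ("smoker", "yes").
Item = Tuple[str, str]


def support(contrast_set: Set[Item], rows: List[Set[Item]]) -> float:
    """Fraction of rows in one group that contain every pair in contrast_set."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if contrast_set <= row)
    return hits / len(rows)


def support_ratio(contrast_set: Set[Item],
                  groups: Dict[str, List[Set[Item]]]) -> float:
    """Ratio of maximum to minimum support across groups.

    Assumption (not from the paper): if the set occurs in no group the
    ratio is reported as 0.0, and if it is absent from only some groups
    the ratio is reported as infinity.
    """
    supports = [support(contrast_set, rows) for rows in groups.values()]
    lowest, highest = min(supports), max(supports)
    if highest == 0:
        return 0.0
    if lowest == 0:
        return float("inf")
    return highest / lowest


# Hypothetical toy data: two groups of four records each.
groups = {
    "A": [
        {("smoker", "yes"), ("age", "old")},
        {("smoker", "yes"), ("age", "young")},
        {("smoker", "yes"), ("age", "old")},
        {("smoker", "no"), ("age", "old")},
    ],
    "B": [
        {("smoker", "yes"), ("age", "young")},
        {("smoker", "no"), ("age", "young")},
        {("smoker", "no"), ("age", "old")},
        {("smoker", "no"), ("age", "young")},
    ],
}

candidate = {("smoker", "yes")}
ratio = support_ratio(candidate, groups)  # supports are 0.75 and 0.25
MIN_SUPPORT_RATIO = 2.0                   # illustrative threshold only
keep = ratio >= MIN_SUPPORT_RATIO
print(ratio, keep)
```

In this toy example the candidate has support 0.75 in group A and 0.25 in group B, so its support ratio is 3.0 and it would pass the illustrative threshold. A candidate whose support is nearly equal across groups would have a ratio close to 1 and would be filtered out.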

Author information

Corresponding author

Correspondence to Mondelle Simeon.

Copyright information

© 2012 Springer-Verlag London

About this paper

Cite this paper

Simeon, M., Hilderman, R.J., Hamilton, H.J. (2012). Mining Interesting Correlated Contrast Sets. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_4

  • DOI: https://doi.org/10.1007/978-1-4471-4739-8_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4738-1

  • Online ISBN: 978-1-4471-4739-8

  • eBook Packages: Computer Science (R0)
