Mining Association Rules from Database Tables with the Instances of Simpson’s Paradox

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 186)

Abstract

This paper investigates the problem of mining association rules (ARs) from database tables in which Simpson’s paradox occurs. First, the paper reports that reliable association rules cannot be mined using solely objective, data-based evaluation measures. The problem is important because in non-experimental settings, e.g. in medicine or economics, Simpson’s paradox is likely to occur and is difficult to overcome through controlled data acquisition. The paper proposes a new approach that exploits supplementary knowledge during the selection of ARs and thus overcomes the presence of Simpson’s paradox. In the experimental part, the paper identifies the problem in example real-world data and shows how the proposed approach can be used in practice.
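
To make the phenomenon concrete, the following minimal sketch shows how the confidence of a rule can point one way on the whole table and the opposite way within every stratum of a third attribute. It only illustrates the paradox and is not the approach proposed in the paper. The attribute names (treatment, severity, recovered), the helper `confidence`, and all counts are hypothetical, invented purely for illustration.

```python
from collections import Counter

# Hypothetical counts, invented for illustration only:
# (treatment, severity, recovered) -> number of table rows
counts = Counter({
    ("A", "mild", True): 90,    ("A", "mild", False): 10,
    ("B", "mild", True): 800,   ("B", "mild", False): 200,
    ("A", "severe", True): 300, ("A", "severe", False): 700,
    ("B", "severe", True): 20,  ("B", "severe", False): 80,
})


def confidence(treatment, severity=None):
    """Confidence of the rule {treatment[, severity]} -> {recovered}.

    Confidence is support(antecedent and consequent) / support(antecedent).
    """
    matching = {
        key: n for key, n in counts.items()
        if key[0] == treatment and (severity is None or key[1] == severity)
    }
    antecedent = sum(matching.values())
    both = sum(n for key, n in matching.items() if key[2])
    return both / antecedent


# Rule confidence on the whole table (severity ignored).
overall = {t: confidence(t) for t in ("A", "B")}

# Rule confidence inside each severity stratum.
by_stratum = {
    s: {t: confidence(t, s) for t in ("A", "B")}
    for s in ("mild", "severe")
}

print("overall:", overall)
print("by stratum:", by_stratum)
```

On these made-up counts, the rule for treatment B has the higher confidence on the aggregated table, while the rule for treatment A has the higher confidence in both the mild and the severe stratum. A purely data-based measure cannot decide which of the two readings is the reliable one, which is the difficulty the abstract describes.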

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Computer Science, University of Silesia, Sosnowiec, Poland
