Mining Association Rules from Database Tables with the Instances of Simpson’s Paradox
This paper investigates the problem of mining association rules (ARs) from database tables that contain instances of Simpson’s paradox. First, the paper shows that reliable association rules cannot be mined using solely objective, data-based evaluation measures. The problem matters because in non-experimental settings, e.g. medicine or economics, Simpson’s paradox is likely to occur and is difficult to avoid through controlled data acquisition. The paper then proposes a new approach that exploits supplementary knowledge during the selection of ARs and thereby overcomes the presence of Simpson’s paradox. In the experimental part, the paper identifies the problem in sample real-world data and demonstrates how the proposed approach can be applied in practice.
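The claim that objective measures alone cannot yield reliable rules can be illustrated with a small sketch. It is not the method proposed in the paper: the counts are hypothetical, the attribute names (`treatment`, `size`, `success`) and helper functions are illustrative, and the supplementary knowledge is modelled, as an assumption, by telling the miner which attribute to control for and then applying Pearl's back-door adjustment. Plain rule confidence favours treatment B on the whole table, the preference reverses inside every stone-size stratum, and the knowledge-guided measure selects A.

```python
from collections import Counter

# Hypothetical aggregated counts (illustrative only):
# (treatment, stone_size) -> (number of successes, number of records)
COUNTS = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def build_table(counts):
    """Expand the aggregated counts into one row (a dict) per record."""
    rows = []
    for (treatment, size), (successes, total) in counts.items():
        for i in range(total):
            rows.append({"treatment": treatment, "size": size, "success": i < successes})
    return rows


def matches(row, items):
    """True if the row contains every attribute=value pair in items."""
    return all(row[attr] == value for attr, value in items.items())


def confidence(rows, antecedent, consequent):
    """Objective measure: conf(X -> Y) = supp(X ∪ Y) / supp(X)."""
    covered = [r for r in rows if matches(r, antecedent)]
    if not covered:
        return float("nan")
    return sum(matches(r, consequent) for r in covered) / len(covered)


def adjusted_confidence(rows, antecedent, consequent, adjust_for):
    """Confidence averaged over the strata of the attributes in adjust_for,
    weighted by stratum frequency (Pearl-style back-door adjustment).
    adjust_for stands in for supplementary knowledge the data cannot supply;
    with adjust_for=() this reduces to plain confidence."""
    n = len(rows)
    strata = Counter(tuple(r[attr] for attr in adjust_for) for r in rows)
    total = 0.0
    for values, count in strata.items():
        stratum = {**antecedent, **dict(zip(adjust_for, values))}
        total += confidence(rows, stratum, consequent) * count / n
    return total


rows = build_table(COUNTS)
goal = {"success": True}

# On the whole table, the rule for B has the higher confidence ...
for t in ("A", "B"):
    c = confidence(rows, {"treatment": t}, goal)
    print(f"conf(treatment={t} -> success) = {c:.3f}")

# ... yet within each stone-size stratum, A has the higher confidence.
for size in ("small", "large"):
    for t in ("A", "B"):
        c = confidence(rows, {"treatment": t, "size": size}, goal)
        print(f"conf(treatment={t}, size={size} -> success) = {c:.3f}")

# Knowledge that stone size confounds the choice of treatment selects A.
for t in ("A", "B"):
    c = adjusted_confidence(rows, {"treatment": t}, goal, ("size",))
    print(f"size-adjusted conf(treatment={t} -> success) = {c:.3f}")
```

Nothing in the rows themselves tells the miner whether `size` should be adjusted for; that decision has to come from outside the data, which is why a purely data-based ranking of these rules is ambiguous.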