Abstract
Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method depends heavily on the evaluation function used to assess the quality of the rules. In this work, we propose a new rule evaluation score: the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. Experiments show that our method covers and explains the data with a much smaller rule set than existing methods.
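The abstract does not give the PNSR formula, so the following is only a rough sketch of the general idea it describes: score a rule with a Bayesian model comparison, and penalize rules whose apparent strength is already explained by a simpler sub-rule. Everything here is an assumption made for illustration, including the Beta(1, 1) prior, the equal model priors, the "rate differs" hypothesis (which ignores the direction of the difference), and the function names `posterior_differs` and `min_posterior_score`. It should not be read as the authors' definition of the score.

```python
from itertools import combinations
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of k positives among n instances,
    with the Bernoulli rate integrated out under a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def posterior_differs(k_in, n_in, k_out, n_out):
    """Posterior probability that the class rate inside a group differs
    from the rate outside it, assuming equal prior weight on the
    'same rate' and 'different rates' models (an illustrative choice)."""
    log_same = log_marginal(k_in + k_out, n_in + n_out)
    log_diff = log_marginal(k_in, n_in) + log_marginal(k_out, n_out)
    return 1.0 / (1.0 + exp(log_same - log_diff))


def min_posterior_score(data, items, target):
    """Hypothetical score for the rule `items -> target`.

    For every proper sub-rule (including the empty rule), restrict the
    data to the instances that sub-rule covers and ask whether the full
    rule changes the class rate within that population. The minimum
    over sub-rules is returned, so a rule that adds nothing to one of
    its sub-rules gets a low score.
    """
    items = frozenset(items)
    scores = []
    for r in range(len(items)):
        for sub in combinations(items, r):
            context = [(x, y) for x, y in data if frozenset(sub) <= x]
            inside = [y for x, y in context if items <= x]
            outside = [y for x, y in context if not items <= x]
            scores.append(posterior_differs(
                sum(y == target for y in inside), len(inside),
                sum(y == target for y in outside), len(outside)))
    return min(scores)


# Toy data: item "A" predicts class 1, item "B" is unrelated noise.
data = (
    [(frozenset("AB"), 1)] * 18 + [(frozenset("AB"), 0)] * 2
    + [(frozenset("A"), 1)] * 18 + [(frozenset("A"), 0)] * 2
    + [(frozenset("B"), 1)] * 2 + [(frozenset("B"), 0)] * 18
    + [(frozenset(), 1)] * 2 + [(frozenset(), 0)] * 18
)

score_a = min_posterior_score(data, {"A"}, 1)
score_ab = min_posterior_score(data, {"A", "B"}, 1)
print(score_a, score_ab)
```

On this toy data the rule {A} → 1 scores close to 1, while {A, B} → 1 scores well below 0.5: within the instances already covered by {A}, adding B does not change the class rate, which is the kind of spurious specialization the abstract says the score is meant to filter out.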
Keywords
- Association Rule
- Frequent Pattern
- Mining Algorithm
- Rule Mining
- Rule Evaluation
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Batal, I., Cooper, G., Hauskrecht, M. (2012). A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_17
DOI: https://doi.org/10.1007/978-3-642-33486-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3
eBook Packages: Computer Science, Computer Science (R0)
