Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2012: Machine Learning and Knowledge Discovery in Databases, pp 260–276

A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules

  • Iyad Batal,
  • Gregory Cooper &
  • Milos Hauskrecht

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7524)

Abstract

Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method depends heavily on the evaluation function used to assess the quality of the rules. In this work, we propose a new rule evaluation score, the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious ones. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method can cover and explain the data with a much smaller rule set than existing methods.

Keywords

  • Association Rule
  • Frequent Pattern
  • Mining Algorithm
  • Rule Mining
  • Rule Evaluation

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Pittsburgh, USA

    Iyad Batal & Milos Hauskrecht

  2. Department of Biomedical Informatics, University of Pittsburgh, USA

    Gregory Cooper

Editor information

Editors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Peter A. Flach

  2. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Tijl De Bie & Nello Cristianini

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Batal, I., Cooper, G., Hauskrecht, M. (2012). A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_17

  • DOI: https://doi.org/10.1007/978-3-642-33486-3_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33485-6

  • Online ISBN: 978-3-642-33486-3

  • eBook Packages: Computer Science, Computer Science (R0)
