Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2012: Machine Learning and Knowledge Discovery in Databases, pp 243–259

A Bayesian Approach for Classification Rule Mining in Quantitative Databases

  • Dominique Gay &
  • Marc Boullé

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7524)

Abstract

We suggest a new framework for classification rule mining in quantitative data sets founded on Bayes theory – without univariate preprocessing of attributes. We introduce a space of rule models and a prior distribution defined on this model space. As a result, we obtain the definition of a parameter-free criterion for classification rules. We show that the new criterion identifies interesting classification rules while being highly resilient to spurious patterns. We develop a new parameter-free algorithm to mine locally optimal classification rules efficiently. The mined rules are directly used as new features in a classification process based on a selective naive Bayes classifier. The resulting classifier demonstrates higher inductive performance than state-of-the-art rule-based classifiers.

Keywords

  • Association Rule
  • Rule Mining
  • Pattern Mining
  • Categorical Attribute
  • Quantitative Attribute

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Orange Labs, 2, avenue Pierre Marzin, F-22307, Lannion Cedex, France

    Dominique Gay & Marc Boullé

Editor information

Editors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Peter A. Flach

  2. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

    Tijl De Bie & Nello Cristianini

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gay, D., Boullé, M. (2012). A Bayesian Approach for Classification Rule Mining in Quantitative Databases. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_16

  • DOI: https://doi.org/10.1007/978-3-642-33486-3_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33485-6

  • Online ISBN: 978-3-642-33486-3

  • eBook Packages: Computer Science (R0)
