A Bayesian Criterion for Evaluating the Robustness of Classification Rules in Binary Data Sets

Gay, Dominique; Boullé, Marc

doi:10.1007/978-3-642-35855-5_1


Part of the book series: Studies in Computational Intelligence (SCI, volume 471)


Abstract

Classification rules play an important role in prediction tasks. Their popularity is mainly due to their simple and interpretable form. Classification methods that combine rules deemed interesting (w.r.t. a defined interestingness measure) generally lead to good predictions. However, the performance of rule-based classifiers depends strongly on the interestingness measure used (e.g., confidence, growth rate) and on the measure threshold that separates interesting from non-interesting rules; setting this threshold is a non-trivial problem. Furthermore, it can easily be shown that the mined rules are individually non-robust: an interesting (e.g., frequent and confident) rule mined from the training set may no longer be confident in the test phase. In this paper, we suggest a new criterion for evaluating the robustness of classification rules in binary labeled data sets. Our criterion arises from a Bayesian approach: we propose an expression for the probability of a rule given the data. The most probable rules are thus the robust ones. Our Bayesian criterion is derived from this expression and allows us to identify the robust rules in a given set of rules without parameter tuning.
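
The non-robustness mentioned in the abstract can be illustrated with a minimal sketch. It does not implement the chapter's Bayesian criterion, which is not given on this page; it only computes the two standard interestingness measures the abstract names, confidence and growth rate, for one rule on a training set and on a test set. The toy data, the function names (`coverage`, `confidence`, `growth_rate`) and the representation of objects as (item set, class label) pairs are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each object is (set of items it contains, binary class label).
Row = Tuple[Set[str], int]


def coverage(rows: List[Row], body: Set[str], target: int = 1) -> Tuple[int, int]:
    """Return (objects covered by the rule body, covered objects of the target class)."""
    covered = [label for items, label in rows if body <= items]
    return len(covered), sum(1 for label in covered if label == target)


def confidence(rows: List[Row], body: Set[str], target: int = 1) -> float:
    """Fraction of covered objects that belong to the target class."""
    n_cov, n_pos = coverage(rows, body, target)
    return n_pos / n_cov if n_cov else 0.0


def growth_rate(rows: List[Row], body: Set[str], target: int = 1) -> float:
    """Relative frequency of the body in the target class divided by that in the other class."""
    n_target = sum(1 for _, label in rows if label == target)
    n_other = len(rows) - n_target
    n_cov, n_pos = coverage(rows, body, target)
    if n_target == 0 or n_other == 0:
        return 0.0  # undefined when a class is empty; 0.0 is an arbitrary convention here
    freq_target = n_pos / n_target
    freq_other = (n_cov - n_pos) / n_other
    if freq_other == 0:
        return float("inf") if freq_target > 0 else 0.0
    return freq_target / freq_other


# Toy data (invented): the rule {a, b} -> class 1 looks perfect on the training set...
train: List[Row] = [
    ({"a", "b"}, 1), ({"a", "b"}, 1), ({"a", "b"}, 1),
    ({"a"}, 0), ({"b"}, 0), ({"c"}, 0),
]
# ...but is much weaker on the test set.
test: List[Row] = [
    ({"a", "b"}, 1), ({"a", "b"}, 0), ({"a", "b"}, 0),
    ({"c"}, 1), ({"a"}, 0), ({"b"}, 1),
]

body = {"a", "b"}
for name, rows in (("train", train), ("test", test)):
    print(name, confidence(rows, body), growth_rate(rows, body))
```

On this invented data the rule would pass any confidence or growth-rate threshold on the training set yet fail most of them on the test set, which is the kind of behaviour a robustness criterion is meant to detect.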



Author information

Authors and Affiliations

TECH/ASAP/PROFiling & data mining, Orange Labs, 2, avenue Pierre Marzin, F-22307, Lannion Cédex, France
Dominique Gay & Marc Boullé


Corresponding author

Correspondence to Dominique Gay.

Editor information

Editors and Affiliations

Polytechnic School of Nantes University, LINA (CNRS UMR 6241), rue C. Pauc, Nantes Cedex 3, 44306, France
Fabrice Guillet
LaBRI, Univ. Bordeaux 1, 351 Cours de la Libération, Talence Cedex, 33405, France
Bruno Pinaud
Polytech'Tours, Dpt Informatique, Université François-Rabelais de Tours, 64 avenue Jean Portalis, Tours, 37200, France
Gilles Venturini
Laboratoire ERIC, Université Lumière Lyon 2, Bât L., avenue Pierre Mendès-France 5, Bron, 69600, France
Djamel Abdelkader Zighed


About this chapter

Cite this chapter

Gay, D., Boullé, M. (2013). A Bayesian Criterion for Evaluating the Robustness of Classification Rules in Binary Data Sets. In: Guillet, F., Pinaud, B., Venturini, G., Zighed, D. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 471. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35855-5_1


DOI: https://doi.org/10.1007/978-3-642-35855-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35854-8
Online ISBN: 978-3-642-35855-5
eBook Packages: Engineering, Engineering (R0)
