Abstract
Naive Bayes has been an effective and important classifier in the text categorization domain despite violations of its underlying assumptions. Although quite accurate, it tends to provide poor estimates of the posterior class probabilities, which hampers its application in cost-sensitive contexts. The apparent high confidence with which certain errors are made is particularly problematic when misclassification costs are highly skewed, since a conservative setting of the decision threshold may greatly decrease the classifier's utility. We propose an extension of the Naive Bayes algorithm that aims to discount the confidence with which errors are made. The approach is based on measuring the amount of change to the feature distribution necessary to reverse the initial classifier decision, and it can be implemented efficiently without over-complicating the process of Naive Bayes induction. In experiments with three benchmark document collections, the decision-reversal Naive Bayes is demonstrated to substantially improve over the popular multinomial version of the Naive Bayes algorithm, in some cases performing more than 40% better.
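The core idea, measuring how much the observed features would have to change to reverse the classifier's decision, can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is a simplified multinomial Naive Bayes with a hypothetical `decision_reversal_score` that divides the log-odds margin by the largest per-token contribution, a rough proxy for the number of token changes needed to flip the decision.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model with Laplace smoothing.

    docs: list of token lists; labels: 0 or 1 per document.
    Returns (vocab, log-likelihoods per class, log priors).
    """
    vocab = {t for d in docs for t in d}
    counts = {0: Counter(), 1: Counter()}
    totals = {0: 0, 1: 0}
    priors = {0: 0, 1: 0}
    for d, y in zip(docs, labels):
        priors[y] += 1
        counts[y].update(d)
        totals[y] += len(d)
    V = len(vocab)
    logp = {y: {w: math.log((counts[y][w] + alpha) / (totals[y] + alpha * V))
                for w in vocab}
            for y in (0, 1)}
    logprior = {y: math.log(priors[y] / len(docs)) for y in (0, 1)}
    return vocab, logp, logprior

def log_odds(tokens, vocab, logp, logprior):
    """Log posterior odds of class 1 vs. class 0; sign gives the decision."""
    s = logprior[1] - logprior[0]
    for t in tokens:
        if t in vocab:
            s += logp[1][t] - logp[0][t]
    return s

def decision_reversal_score(tokens, vocab, logp, logprior):
    """Hypothetical confidence proxy: margin divided by the largest
    per-token swing, i.e. roughly how many token substitutions it
    would take to reverse the initial decision."""
    margin = log_odds(tokens, vocab, logp, logprior)
    swings = [abs(logp[1][t] - logp[0][t]) for t in tokens if t in vocab]
    if not swings:
        return 0.0
    return abs(margin) / max(swings)
```

A decision backed by many weak features yields a large score (many changes needed to reverse it), while a decision resting on one or two strong features yields a small score, discounting the overconfidence the abstract describes.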
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Kołcz, A., Chowdhury, A. (2005). Improved Naive Bayes for Extremely Skewed Misclassification Costs. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. PKDD 2005. Lecture Notes in Computer Science(), vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7