Abstract
We explain why the naive Bayesian classifier may dominate the proper one, as has been observed in clinical studies; cf. Gammerman and Thatcher (Methods of Information in Medicine, 30, 15–22, 1991). Today this effect may be of concern for real-time health care monitoring or surveillance. The dominance arises from a combination of three factors: the dimension of the state space (symptom space) given a disease, which is not fixed a priori; the feature selection procedure; and the parameter estimation. Estimating conditional probabilities in high dimensions with a proper Bayesian model can lead to overfitting and to a missing value problem and, consequently, to a loss of classification accuracy. Owing to the “curse of dimensionality,” even big data sets may not compensate for this degradation.
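The dimensionality argument can be illustrated with a simple parameter count. The following is a minimal sketch, not the chapter's own method: it assumes binary symptoms and a single disease, and the function names and the figure of one million cases are hypothetical choices made for illustration. Under these assumptions the naive model needs one conditional probability per symptom, whereas the proper model needs a full joint table over all symptom combinations.

```python
def naive_param_count(num_symptoms: int) -> int:
    """Free parameters per disease if binary symptoms are treated as
    conditionally independent (assumption of this sketch)."""
    return num_symptoms


def proper_param_count(num_symptoms: int) -> int:
    """Free parameters per disease for the full joint table over
    binary symptoms: one per symptom combination, minus one because
    the probabilities sum to one."""
    return 2 ** num_symptoms - 1


def cases_per_parameter(num_cases: int, num_params: int) -> float:
    """Average number of training cases available per free parameter."""
    return num_cases / num_params


# Hypothetical training set size for one disease (illustrative only).
NUM_CASES = 1_000_000

for d in (5, 10, 20, 30):
    naive = naive_param_count(d)
    proper = proper_param_count(d)
    print(
        f"symptoms={d:2d}  naive params={naive:2d}  "
        f"proper params={proper:10d}  "
        f"cases per proper param={cases_per_parameter(NUM_CASES, proper):.4f}"
    )
```

With 30 binary symptoms the full joint table has more than a billion cells, so under these assumptions even a million cases leave most cells empty. This is one way to read the remark that big data sets may not compensate for the degradation.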
References
Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86, 948–955.
Fahrmeir, L., Hamerle, A., & Tutz, G. (1996). Multivariate statistische Verfahren. Berlin: De Gruyter.
Gammerman, A., & Thatcher, A. R. (1991). Bayesian diagnostic probabilities without assuming independence of symptoms. Methods of Information in Medicine, 30, 15–22.
Khanna, T. (1990). Foundations of neural networks. New York: Addison-Wesley, Reading etc.
Kolmogorov, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
Lenz, H.-J. (1995). On the idiot vs. proper Bayes approach in clinical diagnostic systems. In A. Gammerman (Ed.), Probabilistic Reasoning and Bayesian Belief Networks (pp. 227–236). Alfred Waller in association with UNICOM, Henley-on-Thames.
Schwartz, S., Wiles, J., Gough, I., & Phillips, S. (1993). Connectionist, rule-based and Bayesian decision aids: an empirical comparison. In D. J. Hand (Ed.), Artificial intelligence frontiers in statistics (pp. 264–278). London: Chapman & Hall.
Spiegelhalter, D. J. (1986). A statistical view of uncertainty in expert systems. In W. Gale (Ed.), Artificial intelligence and statistics (pp. 17–55). New York: Addison-Wesley, Reading.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Lenz, H. (2015). Why the Naive Bayesian Classifier for Clinical Diagnostics or Monitoring Can Dominate the Proper One Even for Massive Data Sets. In: Knoth, S., Schmid, W. (eds) Frontiers in Statistical Quality Control 11. Frontiers in Statistical Quality Control. Springer, Cham. https://doi.org/10.1007/978-3-319-12355-4_23
DOI: https://doi.org/10.1007/978-3-319-12355-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12354-7
Online ISBN: 978-3-319-12355-4
eBook Packages: Mathematics and Statistics (R0)