Why the Naive Bayesian Classifier for Clinical Diagnostics or Monitoring Can Dominate the Proper One Even for Massive Data Sets

Lenz, Hans-J.

doi:10.1007/978-3-319-12355-4_23

Part of the book series: Frontiers in Statistical Quality Control (FSQC)

Abstract

We explain the phenomenon that the naive Bayesian classifier may dominate the proper one, as has happened in clinical studies, cf. Gammerman and Thatcher (Methods of Information in Medicine, 30, 15–22, 1991). Today this effect may be of concern for real-time health care monitoring or surveillance. The reason for the dominance relation lies in a mix of three factors: a dimension of the state space (symptom space) given a disease that is not fixed a priori, the feature selection procedure, and the parameter estimation. Estimating conditional probabilities in high dimensions with a proper Bayesian model can lead to overfitting and a missing value problem and, consequently, to a loss of classification accuracy. Due to the curse of dimensionality, this degradation may not be compensated even by big data sets.
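The sparsity argument can be made concrete with a small count. The sketch below is only an illustration and assumes d binary symptoms and two diseases. The function names `parameter_counts` and `unseen_fraction` are hypothetical and do not come from the chapter. A proper Bayesian model keeps a full joint table P(x | disease) with 2^d − 1 free parameters per disease. The naive model keeps only d symptom marginals per disease. Symptoms are simulated as independent fair coin flips. This is a worst case for cell coverage, chosen purely for illustration. It does not reproduce the chapter's data or results.

```python
import random


def parameter_counts(d: int, n_classes: int) -> tuple[int, int]:
    """Free parameters of P(x | class) for d binary symptoms."""
    proper = n_classes * (2 ** d - 1)  # one full joint table per class
    naive = n_classes * d              # one Bernoulli marginal per symptom and class
    return proper, naive


def unseen_fraction(d: int, n_train: int, n_test: int, seed: int = 0) -> float:
    """Share of test cases whose symptom cell never occurs in the training data.

    Illustrative assumption: symptoms are i.i.d. Bernoulli(0.5), so a symptom
    vector is a uniform random integer in [0, 2**d) whose bit j encodes
    symptom j. This is the worst case for cell coverage.
    """
    rng = random.Random(seed)
    seen = {rng.getrandbits(d) for _ in range(n_train)}  # occupied cells of the joint table
    misses = sum(rng.getrandbits(d) not in seen for _ in range(n_test))
    return misses / n_test


if __name__ == "__main__":
    for d in (5, 10, 20, 30):
        proper, naive = parameter_counts(d, n_classes=2)
        share = unseen_fraction(d, n_train=100_000, n_test=1_000)
        print(f"d={d:2d}  proper={proper:>13,}  naive={naive:2d}  unseen={share:.3f}")
```

Take 100,000 training cases and d = 30 symptoms. Under this assumption, nearly every new symptom vector falls into a cell of the joint table that was never observed. The proper model then has no empirical estimate for that vector and must rely on a prior or smoothing. Each of the naive model's marginals, by contrast, is estimated from all training cases of the respective disease. Increasing n_train tenfold barely changes the picture at d = 30, because the number of cells grows as 2^d.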

References

Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of freedom with multiple imputation. Biometrika, 86, 948–955.
Fahrmeir, L., Hamerle, A., & Tutz, G. (1996). Multivariate statistische Verfahren. Berlin: De Gruyter.
Gammerman, A., & Thatcher, A. R. (1991). Bayesian diagnostic probabilities without assuming independence of symptoms. Methods of Information in Medicine, 30, 15–22.
Khanna, T. (1990). Foundations of neural networks. Reading, MA: Addison-Wesley.
Kolmogorov, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer.
Lenz, H.-J. (1995). On the idiot vs. proper Bayes approach in clinical diagnostic systems. In A. Gammerman (Ed.), Probabilistic reasoning and Bayesian belief networks (pp. 227–236). Henley-on-Thames: Alfred Waller in association with UNICOM.
Schwartz, S., Wiles, J., Gough, I., & Phillips, S. (1993). Connectionist, rule-based and Bayesian decision aids: an empirical comparison. In D. J. Hand (Ed.), Artificial intelligence frontiers in statistics (pp. 264–278). London: Chapman & Hall.
Spiegelhalter, D. J. (1986). A statistical view of uncertainty in expert systems. In W. Gale (Ed.), Artificial intelligence and statistics (pp. 17–55). Reading, MA: Addison-Wesley.

Author information

Authors and Affiliations

Institut für Statistik und Ökonometrie, Freie Universität Berlin, 14195, Berlin, Germany
Hans-J. Lenz

Corresponding author

Correspondence to Hans-J. Lenz.

Editor information

Editors and Affiliations

Helmut Schmidt University, Hamburg, Germany
Sven Knoth
Europa-Universität Viadrina, Frankfurt/Oder, Brandenburg, Germany
Wolfgang Schmid

About this chapter

Cite this chapter

Lenz, H.-J. (2015). Why the Naive Bayesian Classifier for Clinical Diagnostics or Monitoring Can Dominate the Proper One Even for Massive Data Sets. In: Knoth, S., Schmid, W. (eds) Frontiers in Statistical Quality Control 11. Frontiers in Statistical Quality Control. Springer, Cham. https://doi.org/10.1007/978-3-319-12355-4_23

DOI: https://doi.org/10.1007/978-3-319-12355-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12354-7
Online ISBN: 978-3-319-12355-4
eBook Packages: Mathematics and Statistics (R0)