Abstract
This article addresses the problem of learning from potentially misclassified and incomplete categorical data when the error structure is unknown and no prior information about the distribution of the data is available. We propose to use the knowledge gained from the well-known practice of double sampling to accomplish two goals. First, we estimate the unknown error structure. Second, under the framework of imprecise probability, we derive a prior Dirichlet distribution that expresses a state of quasi-near-ignorance about the data. Updating this prior with sample data leads to a quasi-near-ignorance posterior distribution that produces non-trivial estimates.
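As a rough illustration of the two ingredients named in the abstract, the sketch below estimates a misclassification matrix from a doubly sampled validation set and computes lower and upper posterior probabilities under the standard imprecise Dirichlet model. It is not the authors' quasi-near-ignorance construction, which the abstract does not specify. The function names, the toy data, and the choice `s = 2.0` are all assumptions made for illustration.

```python
from collections import Counter


def estimate_error_matrix(pairs, categories):
    """Row-normalised cross-tabulation of (true, observed) label pairs.

    Entry [t][o] estimates P(observed = o | true = t). The pairs are
    assumed to come from a validation subsample in which both the
    error-prone and the error-free classification are available.
    """
    counts = Counter(pairs)
    matrix = {}
    for t in categories:
        row_total = sum(counts[(t, o)] for o in categories)
        matrix[t] = {
            o: (counts[(t, o)] / row_total if row_total else float("nan"))
            for o in categories
        }
    return matrix


def idm_bounds(counts, s=2.0):
    """Lower and upper posterior probabilities under the standard
    imprecise Dirichlet model with hyperparameter s (an assumed value)."""
    total = sum(counts.values())
    return {c: (n / (total + s), (n + s) / (total + s)) for c, n in counts.items()}


# Toy validation pairs (true label, observed label); purely hypothetical data.
validation_pairs = [
    ("a", "a"), ("a", "a"), ("a", "b"),
    ("b", "b"), ("b", "b"), ("b", "a"),
]
error_matrix = estimate_error_matrix(validation_pairs, ["a", "b"])

# Toy category counts from a main sample; again hypothetical.
bounds = idm_bounds({"a": 6, "b": 2}, s=2.0)
```

Under the standard model the width of each interval is `s / (N + s)`, so the bounds tighten as the sample size `N` grows. How the estimated error matrix enters the prior is specific to the paper and is not reproduced here.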
Acknowledgements
The authors are grateful to an anonymous reviewer for their helpful comments and suggestions.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Omar, A., Augustin, T. (2023). Imprecise Learning from Misclassified and Incomplete Categorical Data with Unknown Error Structure. In: García-Escudero, L.A., et al. Building Bridges between Soft and Statistical Methodologies for Data Science. SMPS 2022. Advances in Intelligent Systems and Computing, vol 1433. Springer, Cham. https://doi.org/10.1007/978-3-031-15509-3_39
DOI: https://doi.org/10.1007/978-3-031-15509-3_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15508-6
Online ISBN: 978-3-031-15509-3
eBook Packages: Intelligent Technologies and Robotics (R0)