Imprecise Learning from Misclassified and Incomplete Categorical Data with Unknown Error Structure

Omar, Aziz; Augustin, Thomas

doi:10.1007/978-3-031-15509-3_39


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1433)

Included in the following conference series:

International Conference on Soft Methods in Probability and Statistics

Abstract

This article addresses the problem of learning from potentially misclassified and incomplete categorical data when the error structure is unknown and no prior information about the distribution of the data is available. We propose to use the knowledge gained from the well-known practice of double sampling to accomplish two goals. First, we estimate the unknown error structure. Second, under the framework of imprecise probability, we derive a prior Dirichlet distribution that expresses a state of quasi-near-ignorance about the data. Updating this prior with sample data leads to a quasi-near-ignorance posterior distribution that produces non-trivial estimates.
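
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the quasi-near-ignorance prior is not specified in this preview, so the second function uses the standard imprecise Dirichlet model of Walley (1996) as a stand-in, and it ignores misclassification and non-response. The function names, the toy validation pairs, the toy counts, and the choice s = 2 are all assumptions made for illustration.

```python
from collections import Counter


def estimate_error_matrix(pairs, categories):
    """Estimate P(observed = o | true = t) from a double-sampled subsample.

    `pairs` holds (true_label, observed_label) tuples for units that were
    measured with both the accurate and the fallible classifier.
    """
    counts = Counter(pairs)
    matrix = {}
    for t in categories:
        row_total = sum(counts[(t, o)] for o in categories)
        # Row-normalise the cross-tabulation; an empty row has no estimate.
        matrix[t] = {
            o: (counts[(t, o)] / row_total if row_total else float("nan"))
            for o in categories
        }
    return matrix


def idm_bounds(counts, s=2.0):
    """Lower and upper posterior expectations under the standard IDM.

    For category j with count n_j and total n, the bounds are
    n_j / (n + s) and (n_j + s) / (n + s).
    """
    n = sum(counts.values())
    return {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}


# Hypothetical validation subsample: (true label, fallible label).
validation = (
    [("a", "a")] * 8 + [("a", "b")] * 2 + [("b", "b")] * 9 + [("b", "a")] * 1
)
error_matrix = estimate_error_matrix(validation, ["a", "b"])

# Hypothetical category counts from the main sample.
bounds = idm_bounds({"a": 6, "b": 12}, s=2.0)
```

With these toy inputs the estimated probability of recording "b" when the truth is "a" is 2/10, and the interval for category "a" runs from 6/20 to 8/20. The width s / (n + s) shrinks as the sample grows, which is the sense in which an imprecise prior still yields informative posterior estimates.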

Notes

1. Tenenbein (1972) discussed some rules to calculate m based on observation cost and desired precision.
2. Interested readers are referred to Omar et al. (2022, Theorem 3).

References

Augustin, T., Coolen, F.P.A., de Cooman, G., Troffaes, M.C.M. (eds.): Introduction to Imprecise Probabilities. Wiley Series in Probability and Statistics, Wiley, Chichester (2014)
Bayarri, M.J., Berger, J.O.: The interplay of Bayesian and frequentist analysis. Stat. Sci. 19(1), 58–80 (2004)
Bernard, J.-M.: An introduction to the imprecise Dirichlet model for multinomial data. Int. J. Approximate Reasoning 39(2–3), 123–150 (2005)
Bernard, J.-M., Ruggeri, F. (eds.): Special section on the imprecise Dirichlet model (Issues in imprecise probability). Int. J. Approximate Reasoning 50(2), 201–268 (2009)
Bickel, D.R.: Blending Bayesian and frequentist methods according to the precision of prior information with applications to hypothesis testing. Stat. Methods Appl. 24(4), 523–546 (2015). https://doi.org/10.1007/s10260-015-0299-6
Bollinger, C.-R., Van Hasselt, M.: A Bayesian analysis of binary misclassification. Econ. Lett. 156, 68–73 (2017)
Bross, I.: Misclassification in 2 × 2 tables. Biometrics 10(4), 478–486 (1954)
Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2013)
Hu, Z.H.: Dirichlet process probit misclassification mixture model for misclassified binary data. Ph.D. thesis, UCL (2021). https://discovery.ucl.ac.uk/id/eprint/10140643/. Cited 15 May 2022
Küchenhoff, H., Augustin, T., Kunz, A.: Partially identified prevalence estimation under misclassification using the Kappa coefficient. Int. J. Approximate Reasoning 53(8), 1168–1182 (2012)
Manski, C.F.: Partial Identification of Probability Distributions. Springer, New York (2003). https://doi.org/10.1007/b97478
Masegosa, A.R., Moral, S.: Imprecise probability models for learning multinomial distributions from data. Applications to learning credal networks. Int. J. Approximate Reasoning 55(7), 1548–1569 (2014)
Omar, A., von Oertzen, T., Augustin, T.: Learning from categorical data subject to non-random misclassification and non-response under prior quasi-near-ignorance using an imprecise Dirichlet model. In: Ciucci, D., et al. (eds.) Proceedings of the 19th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2022) (2022, forthcoming). https://link.springer.com/chapter/10.1007/978-3-031-08974-9_43
Piatti, A., Zaffalon, M., Trojani, F., Hutter, M.: Limits of learning about a categorical latent variable under prior near-ignorance. Int. J. Approximate Reasoning 50(4), 597–611 (2009)
Raue, A., Kreutz, C., Theis, F.J., Timmer, J.: Joining forces of Bayesian and frequentist methodology: a study for inference in the presence of non-identifiability. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 371(1984), 20110544 (2013)
Swartz, T., Haitovsky, Y., Vexler, A., Yang, T.: Bayesian identifiability and misclassification in multinomial data. Can. J. Stat. 32(3), 285–302 (2004)
Tenenbein, A.: A double sampling scheme for estimating from misclassified multinomial data with applications to sampling inspection. Technometrics 14(1), 187–202 (1972)
Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London (1991)
Walley, P.: Inferences from multinomial data: learning about a bag of marbles. J. Roy. Stat. Soc. Ser. B 58(1), 3–57 (1996)

Acknowledgements

The authors are grateful to an anonymous reviewer for their helpful comments and suggestions.

Author information

Authors and Affiliations

Department of Psychology, University of the German Federal Armed Forces in Munich, Munich, Germany
Aziz Omar
Department of Statistics, LMU Munich, Munich, Germany
Aziz Omar & Thomas Augustin
Department of Mathematics, Insurance and Applied Statistics, Helwan University, Cairo, Egypt
Aziz Omar

Corresponding author

Correspondence to Aziz Omar.

Editor information

Editors and Affiliations

Departamento de Estadística e I.O., University of Valladolid, Valladolid, Spain
Luis A. García-Escudero
Departamento de Estadística e I.O., University of Valladolid, Valladolid, Spain
Alfonso Gordaliza
Departamento de Estadística e I.O., University of Valladolid, Valladolid, Spain
Agustín Mayo
Departamento de Estadística, I.O. y D.M., University of Oviedo, Oviedo, Spain
María Asunción Lubiano Gomez
Departamento de Estadística, I.O. y D.M., University of Oviedo, Oviedo, Spain
Maria Angeles Gil
Department of Computational Statistics and Data Analysis, Warsaw University of Technology, Warsaw, Poland
Przemyslaw Grzegorzewski
Department of Stochastic Methods, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Olgierd Hryniewicz

Cite this paper

Omar, A., Augustin, T. (2023). Imprecise Learning from Misclassified and Incomplete Categorical Data with Unknown Error Structure. In: García-Escudero, L.A., et al. Building Bridges between Soft and Statistical Methodologies for Data Science. SMPS 2022. Advances in Intelligent Systems and Computing, vol 1433. Springer, Cham. https://doi.org/10.1007/978-3-031-15509-3_39

DOI: https://doi.org/10.1007/978-3-031-15509-3_39
Published: 25 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15508-6
Online ISBN: 978-3-031-15509-3
eBook Packages: Intelligent Technologies and Robotics (R0)
