Imprecise Learning from Misclassified and Incomplete Categorical Data with Unknown Error Structure

  • Conference paper
Building Bridges between Soft and Statistical Methodologies for Data Science (SMPS 2022)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1433)

Abstract

This article addresses the problem of learning from potentially misclassified and incomplete categorical data when the error structure is unknown and no prior information about the distribution of the data is available. We propose to use the knowledge gained from the well-known practice of double sampling to accomplish two goals. First, we estimate the unknown error structure. Second, under the framework of imprecise probability, we derive a prior Dirichlet distribution that expresses a state of quasi-near-ignorance about the data. Updating this prior using sample data leads to a quasi-near-ignorance posterior distribution that produces non-trivial estimates.
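
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the quasi-near-ignorance prior is not specified on this page, so the standard imprecise Dirichlet model of Walley (1996) is swapped in, and incompleteness (non-response) is not modelled. The function names, the hyperparameter value s = 2, and all counts are hypothetical, chosen only for illustration.

```python
from collections import Counter


def estimate_error_matrix(pairs, categories):
    """Estimate P(observed = j | true = i) from a validation subsample.

    `pairs` holds (true, observed) labels for units classified by both the
    exact and the fallible device, as in double sampling. Rows are
    normalised counts; a row with no observations is filled with NaN.
    """
    counts = Counter(pairs)
    matrix = {}
    for i in categories:
        row_total = sum(counts[(i, j)] for j in categories)
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else float("nan"))
            for j in categories
        }
    return matrix


def idm_posterior_bounds(counts, s=2.0):
    """Lower and upper posterior expectations of each category probability
    under the standard imprecise Dirichlet model (assumed stand-in):
    lower = n_j / (n + s), upper = (n_j + s) / (n + s).
    """
    n = sum(counts.values())
    return {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}


# Hypothetical validation subsample of (true, observed) pairs.
validation_pairs = [
    ("a", "a"), ("a", "a"), ("a", "b"), ("a", "a"),
    ("b", "b"), ("b", "b"), ("b", "b"), ("b", "a"),
]
error_matrix = estimate_error_matrix(validation_pairs, ["a", "b"])

# Hypothetical category counts from the main sample.
bounds = idm_posterior_bounds({"a": 6, "b": 4}, s=2.0)
```

The width of each interval, s / (n + s), shrinks as the sample grows, which is the sense in which updating a set of Dirichlet priors yields non-trivial estimates.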

Notes

  1. Tenenbein (1972) discussed some rules to calculate m based on observation cost and desired precision.

  2. Interested readers are referred to Omar et al. (2022, Theorem 3).

References

  • Augustin, T., Coolen, F.P.A., de Cooman, G., Troffaes, M.C.M. (eds.): Introduction to Imprecise Probabilities. Wiley Series in Probability and Statistics, Wiley, Chichester (2014)

  • Bayarri, M.J., Berger, J.O.: The interplay of Bayesian and frequentist analysis. Stat. Sci. 19(1), 58–80 (2004)

  • Bernard, J.-M.: An introduction to the imprecise Dirichlet model for multinomial data. Int. J. Approximate Reasoning 39(2–3), 123–150 (2005)

  • Bernard, J.-M., Ruggeri, F. (eds.): Special section on the imprecise Dirichlet model (Issues in imprecise probability). Int. J. Approximate Reasoning 50(2), 201–268 (2009)

  • Bickel, D.R.: Blending Bayesian and frequentist methods according to the precision of prior information with applications to hypothesis testing. Stat. Methods Appl. 24(4), 523–546 (2015). https://doi.org/10.1007/s10260-015-0299-6

  • Bollinger, C.-R., Van Hasselt, M.: A Bayesian analysis of binary misclassification. Econ. Lett. 156, 68–73 (2017)

  • Bross, I.: Misclassification in 2 × 2 tables. Biometrics 10(4), 478–486 (1954)

  • Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2013)

  • Hu, Z.H.: Dirichlet process probit misclassification mixture model for misclassified binary data. Ph.D. thesis, UCL (2021). https://discovery.ucl.ac.uk/id/eprint/10140643/. Cited 15 May 2022

  • Küchenhoff, H., Augustin, T., Kunz, A.: Partially identified prevalence estimation under misclassification using the Kappa coefficient. Int. J. Approximate Reasoning 53(8), 1168–1182 (2012)

  • Manski, C.F.: Partial Identification of Probability Distributions. Springer, New York (2003). https://doi.org/10.1007/b97478

  • Masegosa, A.R., Moral, S.: Imprecise probability models for learning multinomial distributions from data. Applications to learning credal networks. Int. J. Approximate Reasoning 55(7), 1548–1569 (2014)

  • Omar, A., von Oertzen, T., Augustin, T.: Learning from categorical data subject to non-random misclassification and non-response under prior quasi-near-ignorance using an imprecise Dirichlet model. In: Ciucci, D., et al. (eds.) Proceedings of the 19th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2022) (2022, forthcoming). https://link.springer.com/chapter/10.1007/978-3-031-08974-9_43

  • Piatti, A., Zaffalon, M., Trojani, F., Hutter, M.: Limits of learning about a categorical latent variable under prior near-ignorance. Int. J. Approximate Reasoning 50(4), 597–611 (2009)

  • Raue, A., Kreutz, C., Theis, F.J., Timmer, J.: Joining forces of Bayesian and frequentist methodology: a study for inference in the presence of non-identifiability. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 371(1984), 20110544 (2013)

  • Swartz, T., Haitovsky, Y., Vexler, A., Yang, T.: Bayesian identifiability and misclassification in multinomial data. Can. J. Stat. 32(3), 285–302 (2004)

  • Tenenbein, A.: A double sampling scheme for estimating from misclassified multinomial data with applications to sampling inspection. Technometrics 14(1), 187–202 (1972)

  • Walley, P.: Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London (1991)

  • Walley, P.: Inferences from multinomial data: learning about a bag of marbles. J. Roy. Stat. Soc. Ser. B 58(1), 3–57 (1996)

Acknowledgements

The authors are grateful to an anonymous reviewer for their helpful comments and suggestions.

Author information

Corresponding author

Correspondence to Aziz Omar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Omar, A., Augustin, T. (2023). Imprecise Learning from Misclassified and Incomplete Categorical Data with Unknown Error Structure. In: García-Escudero, L.A., et al. Building Bridges between Soft and Statistical Methodologies for Data Science. SMPS 2022. Advances in Intelligent Systems and Computing, vol 1433. Springer, Cham. https://doi.org/10.1007/978-3-031-15509-3_39
