Robust Learning with Missing Data

Ramoni, Marco; Sebastiani, Paola

doi:10.1023/A:1010968702992

Published: November 2001

Machine Learning, Volume 45, pages 147–170 (2001)

Marco Ramoni¹ &
Paola Sebastiani²

Abstract

This paper introduces a new method, called the robust Bayesian estimator (RBE), to learn conditional probability distributions from incomplete data sets. The intuition behind the RBE is that, when no information about the pattern of missing data is available, an incomplete database constrains the set of all possible estimates and this paper provides a characterization of these constraints. An experimental comparison with two popular methods to estimate conditional probability distributions from incomplete data—Gibbs sampling and the EM algorithm—shows a gain in robustness. An application of the RBE to quantify a naive Bayesian classifier from an incomplete data set illustrates its practical relevance.
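The idea that an incomplete database constrains the set of possible estimates can be illustrated with a deliberately simplified sketch. It is not the RBE as defined in the paper: it assumes a single discrete variable, plain relative frequencies with no prior, and missing entries encoded as `None`. The function name `probability_bounds` is hypothetical. For each observed state, the lower bound assumes that none of the missing entries take that state, and the upper bound assumes that all of them do.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence, Tuple


def probability_bounds(
    values: Sequence[Optional[Hashable]],
) -> Dict[Hashable, Tuple[float, float]]:
    """Return (lower, upper) relative-frequency bounds for each observed state.

    Assumption (not from the paper): missing entries are marked with None and
    no information about the missing-data mechanism is used, so each missing
    entry could be any state.
    """
    n_total = len(values)
    if n_total == 0:
        raise ValueError("values must not be empty")

    # Counts of the states that were actually observed.
    observed = Counter(v for v in values if v is not None)
    n_missing = n_total - sum(observed.values())

    bounds = {}
    for state, count in observed.items():
        lower = count / n_total                # no missing entry is `state`
        upper = (count + n_missing) / n_total  # every missing entry is `state`
        bounds[state] = (lower, upper)
    return bounds


if __name__ == "__main__":
    sample = ["a", "a", "b", None, None, "a", "b", None]
    for state, (lower, upper) in probability_bounds(sample).items():
        print(f"P({state}) lies in [{lower}, {upper}]")
```

Any completion of the missing entries yields a frequency estimate inside these intervals, and the intervals widen as the share of missing data grows. States that never appear among the observed values are not listed by this sketch, although they would have a lower bound of zero.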

Author information

Authors and Affiliations

Marco Ramoni: Children's Hospital Informatics Program, Harvard Medical School, Boston, MA 02115, USA
Paola Sebastiani: Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01002, USA

About this article

Cite this article

Ramoni, M., Sebastiani, P. Robust Learning with Missing Data. Machine Learning 45, 147–170 (2001). https://doi.org/10.1023/A:1010968702992

Issue Date: November 2001
DOI: https://doi.org/10.1023/A:1010968702992
