Abstract
This paper introduces a new method, called the robust Bayesian estimator (RBE), to learn conditional probability distributions from incomplete data sets. The intuition behind the RBE is that, when no information about the pattern of missing data is available, an incomplete database constrains the set of all possible estimates. The paper provides a characterization of these constraints. An experimental comparison with two popular methods to estimate conditional probability distributions from incomplete data, Gibbs sampling and the EM algorithm, shows a gain in robustness. An application of the RBE to quantify a naive Bayesian classifier from an incomplete data set illustrates its practical relevance.
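The idea that an incomplete database constrains the set of possible estimates can be illustrated with a small sketch. It is not the RBE as defined in the paper: it covers only a single categorical variable rather than a conditional distribution, and the function name `estimate_bounds`, the use of `None` to mark a missing entry, and the symmetric Dirichlet-style `prior` parameter are all assumptions made for illustration. For each category, the lower bound assumes that no missing entry takes that value and the upper bound assumes that every missing entry does.

```python
from collections import Counter


def estimate_bounds(values, categories, prior=1.0):
    """Bound the posterior-mean estimate of each category's probability.

    values     -- observed entries; None marks a missing entry (assumed convention)
    categories -- all values the variable can take
    prior      -- pseudo-count added to every category (assumed symmetric prior)
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    counts = Counter(observed)

    # Every completion of the data has the same total count:
    # observed + missing entries, plus the prior pseudo-counts.
    total = len(values) + prior * len(categories)

    bounds = {}
    for c in categories:
        base = counts[c] + prior
        lower = base / total                # no missing entry equals c
        upper = (base + n_missing) / total  # every missing entry equals c
        bounds[c] = (lower, upper)
    return bounds


# Hypothetical data: 5 observed entries and 3 missing ones.
data = ["a", "b", None, "a", None, "b", "a", None]
bounds = estimate_bounds(data, ["a", "b"], prior=0.0)
```

Under these assumptions, any estimate obtained by filling in the missing entries, whatever the missing-data mechanism, falls inside the returned interval for each category. The interval widens as the fraction of missing entries grows.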
Cite this article
Ramoni, M., Sebastiani, P. Robust Learning with Missing Data. Machine Learning 45, 147–170 (2001). https://doi.org/10.1023/A:1010968702992