On Concentration of Discrete Distributions with Applications to Supervised Learning of Classifiers

  • Magnus Ekdahl
  • Timo Koski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4571)


Computational procedures that rely on independence assumptions in various forms are popular in machine learning, although checks on empirical data have given inconclusive results about their impact. Some theoretical understanding of when they work is available, but a definite answer seems to be lacking. This paper derives distributions that maximize the statewise difference to the respective product of marginals. These distributions are, in a sense, the worst distributions for predicting an outcome of the data-generating mechanism by independence. We also restrict the scope of new theoretical results by showing explicitly that, depending on context, independent ('naïve') classifiers can be as bad as tossing coins. Even so, independence may beat the generating model in supervised learning of classifiers, and we explicitly provide one such scenario.
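As a rough illustration of the quantity named in the abstract, the sketch below computes the largest statewise gap between a small discrete joint distribution and the product of its marginals. The function names, the use of the absolute difference, and the two example tables are assumptions made for illustration; they are not the paper's construction and do not reproduce its maximizing distributions.

```python
import numpy as np

def product_of_marginals(joint):
    """Independent approximation: product of the one-dimensional marginals."""
    d = joint.ndim
    approx = np.ones_like(joint, dtype=float)
    for axis in range(d):
        # Sum out every other variable to get the marginal of this one.
        other_axes = tuple(a for a in range(d) if a != axis)
        marginal = joint.sum(axis=other_axes)
        # Reshape so the marginal broadcasts along its own axis only.
        shape = [1] * d
        shape[axis] = joint.shape[axis]
        approx = approx * marginal.reshape(shape)
    return approx

def max_statewise_difference(joint):
    """Largest absolute gap, over all states, between joint and independent model."""
    return float(np.max(np.abs(joint - product_of_marginals(joint))))

# Hypothetical example 1: two binary variables that always agree.
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

# Hypothetical example 2: two binary variables that really are independent.
independent = np.outer([0.3, 0.7], [0.6, 0.4])

print(max_statewise_difference(dependent))
print(max_statewise_difference(independent))
```

For the perfectly dependent table, both marginals are uniform, so the independent model assigns 0.25 to every state and the gap is 0.25. For the independent table, the gap is zero up to floating-point rounding.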





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Magnus Ekdahl, Department of Mathematics, Linköpings University, SE-581 83 Linköping, Sweden
  • Timo Koski, Department of Mathematics, Linköpings University, SE-581 83 Linköping, Sweden
