Knowledge and Information Systems, Volume 24, Issue 1, pp 35–57

A non-parametric semi-supervised discretization method

  • Alexis Bondu
  • Marc Boullé
  • Vincent Lemaire
Regular Paper


Semi-supervised classification methods aim to exploit both labeled and unlabeled examples to train a predictive model. Most of these approaches make assumptions about the distribution of the classes. This article first proposes a new semi-supervised discretization method that adopts a weakly informative prior on the data. The method discretizes the numerical domain of a continuous input variable while preserving the information relevant to class prediction. An in-depth comparison of this semi-supervised method with the original supervised MODL approach is then presented. We demonstrate that the semi-supervised approach is asymptotically equivalent to the supervised approach augmented with a post-optimization of the interval bound locations.


Keywords: Bayesian · Semi-supervised · Discretization
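To make the idea of supervised discretization concrete, the sketch below finds a single cut point for one continuous variable by minimizing the weighted class entropy of the resulting two intervals. This is a generic entropy criterion for illustration only, not the MODL prior/posterior studied in the article; all function names are hypothetical.

```python
# Illustrative sketch: a single entropy-minimizing split of a continuous
# variable, in the spirit of supervised discretization (NOT the MODL method).
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Return (score, boundary) minimizing the weighted class entropy
    of the two intervals induced by the boundary."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (entropy(labels), None)  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best[0]:
            # midpoint between the two adjacent values
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best

score, boundary = best_split([1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
                             ["a", "a", "a", "b", "b", "b"])
# Perfectly separable classes: the cut falls between 3.0 and 10.0.
```

The MODL approach replaces such a greedy entropy criterion with a Bayes-optimal evaluation of a full multi-interval discretization; this sketch only conveys the underlying idea of choosing boundaries that keep class-predictive information.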





Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. EDF R&D (ICAME/SOAD), Clamart, France
  2. ORANGE LABS (TECH/EASY/TSI), Lannion, France
