Optimal Bayesian 2D-Discretization for Variable Ranking in Regression

  • Marc Boullé
  • Carine Hue
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4265)


In supervised machine learning, variable ranking aims at sorting the input variables according to their relevance with respect to an output variable. In this paper, we propose a new relevance criterion for variable ranking in a regression problem with a large number of variables. This criterion comes from a discretization of both the input and the output variables, derived as an extension of a Bayesian nonparametric discretization method for the classification case. To this end, we introduce a family of discretization grid models and a prior distribution defined on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data emphasizes the relation (or the absence of relation) between input and output and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree.
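As a rough illustration of grid-based variable ranking, the sketch below discretizes each input and the output into equal-frequency intervals and scores every input by the mutual information of the resulting 2D grid of cell counts. This is a simplified stand-in, not the Bayesian criterion derived in the paper: the fixed number of intervals, the mutual-information score, the function names, and the synthetic data are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def equal_frequency_bins(values, n_bins):
    """Map each value to an interval index in [0, n_bins) according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins

def grid_mutual_information(x, y, n_bins=4):
    """Mutual information (in nats) of the n_bins x n_bins grid of cell counts.

    Assumption: used here as a stand-in relevance score, not the paper's criterion.
    """
    n = len(x)
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    cell = Counter(zip(bx, by))   # counts per grid cell
    row = Counter(bx)             # counts per input interval
    col = Counter(by)             # counts per output interval
    return sum((c / n) * math.log(c * n / (row[i] * col[j]))
               for (i, j), c in cell.items())

def rank_variables(columns, y, n_bins=4):
    """Return (name, score) pairs sorted from most to least relevant."""
    scores = {name: grid_mutual_information(col, y, n_bins)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical synthetic data: y depends on x_rel only, x_noise is irrelevant.
rng = random.Random(0)
n = 400
x_rel = [rng.uniform(-1, 1) for _ in range(n)]
x_noise = [rng.uniform(-1, 1) for _ in range(n)]
y = [v * v + rng.gauss(0, 0.05) for v in x_rel]

ranking = rank_variables({"x_rel": x_rel, "x_noise": x_noise}, y)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Unlike this sketch, the approach described in the abstract selects the grid itself (the number and bounds of the intervals) by Bayesian model selection, so an irrelevant input can be detected directly through the most probable grid rather than through a small but nonzero score.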





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marc Boullé (1)
  • Carine Hue (1)
  1. France Télécom R&D, Lannion
