Optimal Bayesian 2D-Discretization for Variable Ranking in Regression
In supervised machine learning, variable ranking aims to sort the input variables according to their relevance with respect to an output variable. In this paper, we propose a new relevance criterion for variable ranking in regression problems with a large number of variables. The criterion is based on a discretization of both the input and the output variables, derived as an extension of a Bayesian nonparametric discretization method for the classification case. To this end, we introduce a family of discretization grid models and a prior distribution defined on this model space. For this prior, we then derive the exact Bayesian model selection criterion. The resulting most probable grid partition of the data highlights the relation (or the absence of relation) between the inputs and the output and provides a ranking criterion for the input variables. Preliminary experiments on both synthetic and real data demonstrate the criterion's capacity to select the most relevant variables and to improve a regression tree.
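To make the idea of ranking inputs through a joint input/output grid concrete, the following minimal sketch scores each input variable by the mutual information of a fixed equal-frequency grid over (input, output). This is a simplified stand-in and not the Bayesian criterion derived in the paper: here the grid size is fixed rather than selected by a model selection criterion, and the function names (`equal_frequency_bins`, `grid_mutual_information`, `rank_variables`), the bin count, and the synthetic data are assumptions made purely for illustration.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in [0, n_bins) based on its rank.

    Ties are broken by position, which is acceptable for continuous data.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def grid_mutual_information(x, y, n_bins=4):
    """Plug-in mutual information (in nats) of an n_bins x n_bins grid over (x, y).

    Illustrative relevance score only; it is NOT the paper's Bayesian criterion.
    """
    n = len(x)
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    joint = Counter(zip(bx, by))  # cell counts of the 2D grid
    px = Counter(bx)              # input interval counts
    py = Counter(by)              # output interval counts
    mi = 0.0
    for (i, j), n_ij in joint.items():
        mi += (n_ij / n) * math.log(n_ij * n / (px[i] * py[j]))
    return mi


def rank_variables(columns, y, n_bins=4):
    """Return (name, score) pairs sorted from most to least relevant."""
    scores = {
        name: grid_mutual_information(col, y, n_bins)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumed for illustration): the output depends
# non-monotonically on one input and not at all on the other.
rng = random.Random(0)
n = 400
informative = [rng.uniform(-1, 1) for _ in range(n)]
noise = [rng.uniform(-1, 1) for _ in range(n)]
target = [v ** 2 + rng.gauss(0, 0.05) for v in informative]

ranking = rank_variables({"informative": informative, "noise": noise}, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the score is computed from grid cell counts rather than from a linear fit, it can detect the quadratic dependence in this example, which a correlation coefficient would largely miss. A fixed grid, however, offers no protection against overfitting when the grid is fine relative to the sample size; that is the issue a model selection criterion over grid partitions is meant to address.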