Machine Learning, Volume 65, Issue 1, pp 131–165

MODL: A Bayes optimal discretization method for continuous attributes



While real data often come in mixed format, discrete and continuous, many supervised induction algorithms require discrete data. Efficient discretization of continuous attributes is therefore an important problem, since it affects the speed, accuracy and understandability of the induced models. In this paper, we propose MODL, a new discretization method founded on a Bayesian approach. We introduce a space of discretization models and a prior distribution defined on this model space, which yields a Bayes optimal evaluation criterion for discretizations. We then propose a new super-linear optimization algorithm that finds near-optimal discretizations. Extensive comparative experiments, on both real and synthetic data, demonstrate the high inductive performance of the new discretization method.


Keywords: Data mining · Machine learning · Discretization · Bayesianism · Data analysis
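The Bayes optimal criterion mentioned in the abstract is derived in the paper from a hierarchical prior on discretization models. As a rough illustration only, the sketch below computes a MODL-style cost for a given discretization, assuming the three-level uniform prior structure the method is built on (a uniform prior on the number of intervals, on the choice of interval bounds, and on the class distribution within each interval, plus the multinomial likelihood term); the function name, input representation, and the exact formula as written here are this sketch's assumptions, not a verbatim transcription of the paper.

```python
import math

def log_binom(n, k):
    # Natural log of the binomial coefficient C(n, k), via lgamma
    # to stay numerically stable for large counts.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def modl_cost(intervals, n_classes):
    """MODL-style Bayesian cost (a negative log posterior, up to a
    constant) of a discretization.

    intervals: list of per-interval class-count vectors, where
               intervals[i][j] is the number of instances of class j
               falling in interval i.
    """
    n = sum(sum(iv) for iv in intervals)   # total instance count
    I = len(intervals)                      # number of intervals
    J = n_classes
    # Prior term for the number of intervals (uniform over 1..n).
    cost = math.log(n)
    # Prior term for the placement of the interval bounds.
    cost += log_binom(n + I - 1, I - 1)
    for iv in intervals:
        n_i = sum(iv)
        # Prior term for the class distribution inside the interval.
        cost += log_binom(n_i + J - 1, J - 1)
        # Likelihood term: log of the multinomial coefficient
        # n_i! / (n_i1! ... n_iJ!).
        cost += math.lgamma(n_i + 1) - sum(math.lgamma(c + 1) for c in iv)
    return cost
```

Under a cost of this shape, a discretization that separates the classes cleanly scores lower (better) than one that merges impure intervals, which is what a bottom-up greedy merge algorithm of the kind described in the abstract can exploit: merge the pair of adjacent intervals whose merge decreases the cost most, and stop when no merge improves it.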



Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. France Telecom R&D, Lannion, France
