Feature Discretization with Relevance and Mutual Information Criteria

  • Artur J. Ferreira
  • Mário A. T. Figueiredo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 318)


Abstract

Feature discretization (FD) techniques often yield compact and adequate representations of the data, suitable for machine learning and pattern recognition problems. These representations usually shorten training time and yield higher classification accuracy, while allowing humans to understand and visualize the data better than the original features do. This paper proposes two new FD techniques. The first is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, and is able to perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode and is based on maximizing the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.
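The two ingredients described above can be illustrated with a minimal sketch (not the authors' actual algorithms): a Lloyd-style scalar LBG quantizer that discretizes a single feature, and an empirical estimate of the mutual information between the resulting discrete feature and the class label. All function names and the toy data are illustrative assumptions.

```python
import numpy as np

def lbg_quantizer(x, n_levels, n_iter=50):
    """Scalar LBG/Lloyd quantizer: returns codebook centers and, for each
    sample of feature x, the index of its discretization interval."""
    centers = np.linspace(x.min(), x.max(), n_levels)  # initial codebook
    for _ in range(n_iter):
        # nearest-center assignment of every sample
        idx = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # centroid update; skip empty cells to keep the codebook valid
        for k in range(n_levels):
            if np.any(idx == k):
                centers[k] = x[idx == k].mean()
    return centers, idx

def mutual_information(idx, y):
    """Empirical mutual information (in bits) between a discrete feature
    (interval indices idx) and integer class labels y."""
    joint = np.zeros((idx.max() + 1, y.max() + 1))
    for i, c in zip(idx, y):
        joint[i, c] += 1
    joint /= joint.sum()                      # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# toy usage: two well-separated value clusters, perfectly predictive of the label
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
y = np.array([0, 0, 0, 1, 1, 1])
centers, idx = lbg_quantizer(x, 2)
mi = mutual_information(idx, y)  # 1 bit: the 2-level feature fully determines y
```

A supervised discretizer in the spirit of the second technique would search over the number of levels (or interval boundaries) for each feature and keep the quantization that maximizes this mutual-information score.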


Keywords

Classification · Feature discretization · Linde-Buzo-Gray · Mutual information · Quantization · Relevance · Supervised learning


References

  1. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Networks 5, 537–550 (1994)
  2. Brown, G., Pocock, A., Zhao, M., Luján, M.: Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 13, 27–66 (2012)
  3. Chiu, D., Wong, A., Cheung, B.: Information discovery through hierarchical maximum entropy discretization and synthesis. In: Proceedings of Knowledge Discovery in Databases, pp. 125–140 (1991)
  4. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, Hoboken (1991)
  5. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: International Conference on Machine Learning (ICML), pp. 194–202 (1995)
  6. Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1022–1027 (1993)
  7. Ferreira, A., Figueiredo, M.: An unsupervised approach to feature discretization and selection. Pattern Recogn. 45, 3048–3060 (2012)
  8. Frank, A., Asuncion, A.: UCI machine learning repository (2010)
  9. Garcia, S., Luengo, J., Saez, J., Lopez, V., Herrera, F.: A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Trans. Knowl. Data Eng. 25(4), 734–750 (2013)
  10. Hellman, M.: Probability of error, equivocation, and the Chernoff bound. IEEE Trans. Inf. Theory 16(4), 368–372 (1970)
  11. Jin, R., Breitbart, Y., Muoh, C.: Data discretization unification. Knowl. Inf. Syst. 19(1), 1–29 (2009)
  12. Kononenko, I.: On biases in estimating multi-valued attributes. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1034–1040 (1995)
  13. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. GESTS Int. Trans. Comput. Sci. Eng. 32(1), 47–58 (2006)
  14. Kurgan, L., Cios, K.: CAIM discretization algorithm. IEEE Trans. Knowl. Data Eng. 16(2), 145–153 (2004)
  15. Linde, Y., Buzo, A., Gray, R.: An algorithm for vector quantizer design. IEEE Trans. Commun. 28, 84–94 (1980)
  16. Liu, H., Hussain, F., Tan, C., Dash, M.: Discretization: an enabling technique. Data Min. Knowl. Disc. 6(4), 393–423 (2002)
  17. Principe, J.: Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives, 1st edn. Springer, Heidelberg (2010)
  18. Santhi, N., Vardy, A.: On an improvement over Rényi’s equivocation bound. In: 44th Annual Allerton Conference on Communication, Control, and Computing (2006)
  19. Tsai, C.-J., Lee, C.-I., Yang, W.-P.: A discretization algorithm based on class-attribute contingency coefficient. Inf. Sci. 178, 714–731 (2008)
  20. Witten, I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2005)
  21. Yang, Y., Webb, G.: Proportional k-interval discretization for naïve-Bayes classifiers. In: 12th European Conference on Machine Learning (ECML), pp. 564–575 (2001)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Artur J. Ferreira (1, 3)
  • Mário A. T. Figueiredo (2, 3)

  1. ADEETC, Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal
  2. Instituto Superior Técnico, Instituto de Telecomunicações - Torre Norte, Lisbon, Portugal
  3. Instituto de Telecomunicações, Lisbon, Portugal