Machine Learning, Volume 107, Issue 2, pp 333–355

Learning data discretization via convex optimization

  • Vojtech Franc
  • Ondrej Fikar
  • Karel Bartos
  • Michal Sofka


Discretization of continuous input functions into piecewise constant or piecewise linear approximations is needed in many mathematical modeling problems. It has been shown that choosing the length of the piecewise segments adaptively, based on data samples, improves the accuracy of subsequent processing such as classification. Traditional approaches are often tied to a particular classification model, which results in local greedy optimization of a criterion function. This paper proposes a technique for learning the discretization parameters together with the parameters of a decision function in a convex optimization of the true objective. The general formulation is applicable to a wide range of learning problems. Empirical evaluation demonstrates that the proposed convex algorithms yield models with fewer parameters and comparable or better accuracy than existing methods.
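As a rough illustration of the piecewise constant embedding mentioned above, the following minimal sketch maps a scalar feature to a one-hot bin indicator, so that a linear function of the embedding is a piecewise constant function of the input. The function name, the equal-width bin edges, and the per-bin weights are hypothetical choices made for this example. The sketch uses fixed bins and does not reproduce the paper's learning algorithm.

```python
import numpy as np

def piecewise_constant_embedding(x, edges):
    """Map each scalar in x to a one-hot vector marking the bin it falls in."""
    x = np.asarray(x, dtype=float).ravel()
    # Using only the interior edges, np.digitize returns an index in 0..n_bins-1.
    idx = np.digitize(x, edges[1:-1])
    phi = np.zeros((x.size, len(edges) - 1))
    phi[np.arange(x.size), idx] = 1.0
    return phi

# Assumed setup: 4 equal-width bins on [0, 1] and hypothetical per-bin weights.
edges = np.linspace(0.0, 1.0, 5)
weights = np.array([0.5, 0.5, -1.0, 2.0])

x = np.array([0.1, 0.3, 0.6, 0.9])
phi = piecewise_constant_embedding(x, edges)

# A linear score over the embedding is constant within each bin.
scores = phi @ weights
print(scores)
```

In this toy setup the first two bins carry the same weight, so they could be merged into one longer segment without changing the scores, which hints at why adapting segment lengths to the data can reduce the number of parameters.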


Piecewise constant embedding; Piecewise linear embedding; Parameter discretization; Convex optimization; Classification; Histograms



VF was supported by Czech Science Foundation Grant 16-05872S. OF was supported by the internal CTU Funding SGS17/185/OHK3/3T/13.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Faculty of Electrical Engineering, Czech Technical University in Prague, Prague 6, Czech Republic
  2. Cisco Systems (Czech Republic), Praha 2, Czech Republic
