Abstract
Decision trees are a popular classification model in machine learning due to their interpretability and performance. Decision-tree classifiers are traditionally constructed using greedy heuristic algorithms that provide no guarantees on the quality of the resultant trees. In contrast, a recent line of work has employed exact optimization techniques to construct optimal decision-tree classifiers. However, most of these approaches are designed for datasets with binary features. While numeric and categorical features can be transformed into binary features, this transformation can introduce a large number of binary features and may be inefficient in practice. In this work, we present a SAT-based encoding for decision trees that directly supports non-binary data and use it to solve two well-studied variants of the optimal decision tree problem. Furthermore, we extend our approach to support cost-sensitive learning of optimal decision trees and introduce tree-pruning constraints to reduce overfitting. Extensive experiments on real-world and synthetic datasets show that our approach outperforms state-of-the-art exact techniques on non-binary datasets while consuming significantly less memory. We also show that our extension for cost-sensitive learning and our tree-pruning constraints can help improve prediction quality on unseen test data.
Notes
The paper suggests converting numeric features into categorical features with a limited number of categories via thresholding [16]; however, this conversion does not preserve the optimality of solutions w.r.t. the original numeric values.
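As a concrete illustration, here is a minimal sketch of such a thresholding conversion; the feature values and cut points are hypothetical and not taken from [16]:

```python
import numpy as np

def thresholding(values, thresholds):
    """Map each numeric value to the index of the bucket it falls into.

    Values in the same bucket become indistinguishable, which is why a
    tree that is optimal for the bucketed data need not be optimal
    w.r.t. the original numeric values.
    """
    return np.searchsorted(thresholds, values, side="right")

ages = np.array([3, 17, 24, 42, 68])          # hypothetical numeric feature
print(thresholding(ages, thresholds=[18, 40, 65]))  # [0 0 1 2 3]
```

For example, a split at threshold 30 separates 24 from 42 in the original data, but no split on the bucketed feature can separate values that share a bucket.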
Note that, since |X| is fixed, maximizing the number of correctly classified training examples is equivalent to maximizing the accuracy of Definition (6).
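For concreteness, assuming Definition (6) defines accuracy as the fraction of correctly classified training examples, \(acc(T, X) = \frac{1}{|X|}\,|\{x \in X : T(x) = c(x)\}|\), where \(T(x)\) denotes the class the tree \(T\) assigns to example \(x\) and \(c(x)\) its true class (notation assumed here). Since \(\frac{1}{|X|}\) is a positive constant, any tree maximizing the count \(|\{x \in X : T(x) = c(x)\}|\) also maximizes \(acc(T, X)\), and vice versa.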
Note that the root is by definition a branching node (Definition 1). However, in this procedure we treat it as a leaf node at the start. Since the procedure is guaranteed to iterate at least once for any non-trivial dataset, the leaf root node is guaranteed to be replaced with a branching root node.
Some categorical features induce a natural ordering and can therefore be represented as numeric features. For example, a categorical feature with the categories {Low, Medium, High} can be transformed to a numeric feature with the values {1, 2, 3}.
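A minimal sketch of such an ordinal encoding; the category names and ranks are illustrative only:

```python
# Hypothetical ordered categories mapped to numeric ranks; any
# order-preserving assignment works equally well for threshold splits.
ORDER = {"Low": 1, "Medium": 2, "High": 3}

def to_numeric(column):
    """Replace each ordered category with its numeric rank."""
    return [ORDER[value] for value in column]

print(to_numeric(["Medium", "Low", "High", "Low"]))  # [2, 1, 3, 1]
```

Once encoded this way, the feature can be handled with threshold splits rather than set-partition splits.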
Similar to \(\alpha \), a well-formedness condition on \(\alpha _C\) would dictate that \(\forall t\in \Pi ^C: \alpha _C(t)\subseteq dom(\beta (t))\).
Note that we could add clauses guaranteeing that at least one category goes to the right; however, these clauses provide little additional pruning, so we opted not to add them.
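For illustration, if the encoding had Boolean variables (hypothetical names here) \(r_{t,c}\) stating that category \(c\) of the feature \(f = \beta(t)\) is directed to the right child of branching node \(t\), the omitted constraint would be the single clause \(\bigvee_{c \in dom(f)} r_{t,c}\) per branching node \(t\), forbidding the degenerate split in which every category goes left.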
Note that we could similarly replace the degenerate node with its left child; however, the resulting tree would not be a complete tree (Definition 3).
Code obtained from https://gepgitlab.laas.fr/hhu/maxsat-decision-trees.
Code obtained from https://bitbucket.org/helene_verhaeghe/classificationtree.
Code obtained from https://github.com/aglingael/dl8.5.
Code obtained from https://github.com/FlorentAvellaneda/InferDT.
In our encoding, binary features are numeric features that take one of two possible values; however, we list them separately in Table 1 because they are supported by the baseline methods without transformation.
Note that all approaches explore the same space of (feasible and) optimal decision-tree solutions.
References
Aghaei, S., Azizi, M.J., & Vayanos, P. (2019). Learning optimal and fair decision trees for non-discriminative decision-making. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 1418–1426)
Aglin, G., Nijssen, S., & Schaus, P. (2020). Learning optimal decision trees using caching branch-and-bound search. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 3146–3153)
Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., & Rudin, C. (2018). Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research, 18, 1–78.
Avellaneda, F. (2020). Efficient inference of optimal decision trees. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 3195–3202)
Bennett, K. P. (1994). Global tree optimization: A non-greedy decision tree algorithm. Journal of Computing Science and Statistics, 26, 156–160.
Berg, J., Demirović, E., & Stuckey, P.J. (2019). Core-boosted linear search for incomplete MaxSAT. In: International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR) (pp. 39–56). Springer
Bertsimas, D., & Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7), 1039–1082.
Bessiere, C., Hebrard, E., & O’Sullivan, B. (2009). Minimising decision tree size as combinatorial optimisation. In: International Conference on Principles and Practice of Constraint Programming (CP) (pp. 173–187). Springer
Biere, A., Heule, M., & van Maaren, H. (2009). Handbook of satisfiability, vol. 185. IOS Press
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software
Cabodi, G., Camurati, P.E., Ignatiev, A., Marques-Silva, J., Palena, M., & Pasini, P. (2021). Optimizing binary decision diagrams for interpretable machine learning classification. In: 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 1122–1125). IEEE
Dechter, R., & Mateescu, R. (2004). The impact of AND/OR search spaces on constraint satisfaction and counting. In: International Conference on Principles and Practice of Constraint Programming (CP) (pp. 731–736)
Dua, D., & Graff, C. (2017). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Eén, N., & Sörensson, N. (2003). MiniSat SAT solver. http://minisat.se/Main.html
Fu, Z., & Malik, S. (2006). On solving the partial MAX-SAT problem. In: International Conference on Theory and Applications of Satisfiability Testing (SAT) (pp. 252–265). Springer
Günlük, O., Kalagnanam, J., Li, M., Menickelly, M., & Scheinberg, K. (2021). Optimal decision trees for categorical data via integer programming. Journal of Global Optimization, 1–28.
Guyon, I. (2003). Design of experiments of the NIPS 2003 variable selection benchmark. In: NIPS 2003 Workshop on Feature Extraction and Feature Selection, vol. 253
Guyon, I., Bennett, K., Cawley, G., Escalante, H.J., Escalera, S., Ho, T.K., Macià, N., Ray, B., Saeed, M., Statnikov, A., et al. (2015). Design of the 2015 ChaLearn AutoML challenge. In: 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE
Hancock, T., Jiang, T., Li, M., & Tromp, J. (1996). Lower bounds on learning decision lists and trees. Information and Computation, 126(2), 114–122.
Hastie, T., Tibshirani, R., Friedman, J.H., & Friedman, J.H. (2009). The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer
Hu, H., Siala, M., Hébrard, E., & Huguet, M.J. (2020). Learning optimal decision trees with MaxSAT and its integration in AdaBoost. In: International Joint Conference on Artificial Intelligence and Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI)
Ignatiev, A., Lam, E., Stuckey, P.J., & Marques-Silva, J. (2021). A scalable two stage approach to computing optimal decision sets. arXiv preprint arXiv:2102.01904
Ignatiev, A., Marques-Silva, J., Narodytska, N., & Stuckey, P.J. (2021). Reasoning-based learning of interpretable ML models. In: International Joint Conference on Artificial Intelligence (IJCAI), in press
Ignatiev, A., Pereira, F., Narodytska, N., & Marques-Silva, J. (2018). A SAT-based approach to learn explainable decision sets. In: International Joint Conference on Automated Reasoning (pp. 627–645). Springer
Janota, M., & Morgado, A. (2020). SAT-based encodings for optimal decision trees with explicit paths. In: International Conference on Theory and Applications of Satisfiability Testing (pp. 501–518). Springer
Kelleher, J.D., Mac Namee, B., & D’Arcy, A. (2020). Fundamentals of machine learning for predictive data analytics: Algorithms, worked examples, and case studies. MIT Press
Kotsiantis, S. B. (2013). Decision trees: a recent overview. Artificial Intelligence Review, 39(4), 261–283.
Hyafil, L., & Rivest, R. L. (1976). Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1), 15–17.
Maloof, M.A. (2003). Learning when data sets are imbalanced and when costs are unequal and unknown. In: ICML-2003 Workshop on Learning from Imbalanced Data Sets II (vol. 2, pp. 2–1)
Mosley, L. (2013). A balanced approach to the multi-class imbalance problem. Ph.D. thesis, Iowa State University
Narodytska, N., Ignatiev, A., Pereira, F., & Marques-Silva, J. (2018). Learning optimal decision trees with SAT. In: International Joint Conference on Artificial Intelligence (IJCAI) (pp. 1362–1368)
Nijssen, S., & Fromont, E. (2007). Mining optimal decision trees from itemset lattices. In: SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (pp. 530–539)
Nijssen, S., & Fromont, E. (2010). Optimal constraint-based decision tree induction from itemset lattices. Data Mining and Knowledge Discovery, 21(1), 9–51.
OscaR Team (2012). OscaR: Scala in OR. https://bitbucket.org/oscarlib/oscar
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Potdar, K., Pardawala, T. S., & Pai, C. D. (2017). A comparative study of categorical variable encoding techniques for neural network classifiers. International Journal of Computer Applications, 175(4), 7–9.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J.R. (2014). C4.5: Programs for machine learning. Elsevier
Rudin, C., & Ertekin, Ş. (2018). Learning customized and optimized lists of rules with mathematical programming. Mathematical Programming Computation, 10(4), 659–702.
Schaus, P., Aoga, J.O., & Guns, T. (2017). CoverSize: A global constraint for frequency-based itemset mining. In: International Conference on Principles and Practice of Constraint Programming (CP) (pp. 529–546). Springer
Shati, P., Cohen, E., & McIlraith, S. (2021). SAT-based approach for learning optimal decision trees with non-binary features. In: 27th International Conference on Principles and Practice of Constraint Programming (CP 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik
Sinz, C. (2005). Towards an optimal CNF encoding of Boolean cardinality constraints. In: International Conference on Principles and Practice of Constraint Programming (CP) (pp. 827–831). Springer
Verhaeghe, H., Nijssen, S., Pesant, G., Quimper, C. G., & Schaus, P. (2020). Learning optimal decision trees using constraint programming. Constraints, 25(3), 226–250.
Verwer, S., & Zhang, Y. (2019). Learning optimal classification trees using a binary linear program formulation. In: AAAI Conference on Artificial Intelligence (AAAI) (pp. 1625–1632)
Weiss, G. M. (2004). Mining with rarity: A unifying framework. ACM SIGKDD Explorations Newsletter, 6(1), 7–19.
Yu, J., Ignatiev, A., Le Bodic, P., & Stuckey, P.J. (2020). Optimal decision lists using SAT. arXiv preprint arXiv:2010.09919
Yu, J., Ignatiev, A., Stuckey, P.J., & Le Bodic, P. (2020). Computing optimal decision sets with SAT. In: International Conference on Principles and Practice of Constraint Programming (CP) (pp. 952–970). Springer
Yu, J., Ignatiev, A., Stuckey, P. J., & Le Bodic, P. (2021). Learning optimal decision sets and lists with SAT. Journal of Artificial Intelligence Research, 72, 1251–1279.
Zhou, Z. H., & Liu, X. Y. (2005). Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering, 18(1), 63–77.
Funding
The authors gratefully acknowledge funding from NSERC, the CIFAR AI Chairs program (Vector Institute), and Microsoft Research.
Ethics declarations
Conflicts of interest/Competing interests
The authors have no relevant conflicts of interest or competing interests to declare.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shati, P., Cohen, E. & McIlraith, S.A. SAT-based optimal classification trees for non-binary data. Constraints 28, 166–202 (2023). https://doi.org/10.1007/s10601-023-09348-1