Mathematical Programming Computation

Volume 10, Issue 4, pp 659–702

Learning customized and optimized lists of rules with mathematical programming

  • Cynthia Rudin
  • Şeyda Ertekin
Full Length Paper


We introduce a mathematical programming approach to building rule lists, which are a type of interpretable, nonlinear, and logical machine learning classifier involving IF-THEN rules. Unlike traditional decision tree algorithms such as CART and C5.0, this method does not use greedy splitting and pruning. Instead, it aims to fully optimize a combination of accuracy and sparsity while obeying user-defined constraints. This method is useful for producing non-black-box predictive models and has the benefit of a clear user-defined tradeoff between training accuracy and sparsity. The flexible framework of mathematical programming allows users to create customized models with a provable guarantee of optimality. The software reviewed as part of this submission was given the DOI (Digital Object Identifier)
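As a rough illustration of the ideas in the abstract, the sketch below builds a rule list (an ordered sequence of IF-THEN rules with a default label) and scores it by training accuracy minus a sparsity penalty. It is not the authors' mathematical programming formulation: the toy data, the candidate rules, the penalty weight `c`, and the helper names `predict`, `objective`, and `best_rule_list` are all hypothetical, and exhaustive enumeration stands in for a mixed-integer solver purely to show what a non-greedy search over rule lists optimizes.

```python
from itertools import permutations

# Hypothetical toy data: each example is (features, label).
DATA = [
    ({"age": 25, "visits": 3}, 1),
    ({"age": 40, "visits": 0}, 0),
    ({"age": 19, "visits": 1}, 1),
    ({"age": 55, "visits": 2}, 0),
    ({"age": 30, "visits": 0}, 0),
    ({"age": 22, "visits": 0}, 1),
]

# Hypothetical candidate rules: (name, condition, predicted label).
CANDIDATES = [
    ("age < 26", lambda x: x["age"] < 26, 1),
    ("visits >= 3", lambda x: x["visits"] >= 3, 1),
    ("age >= 50", lambda x: x["age"] >= 50, 0),
]


def predict(rule_list, default, x):
    """Return the label of the first rule whose condition holds, else the default."""
    for _name, condition, label in rule_list:
        if condition(x):
            return label
    return default


def objective(rule_list, default, data, c):
    """Training accuracy minus c times the number of rules (sparsity penalty)."""
    correct = sum(predict(rule_list, default, x) == y for x, y in data)
    return correct / len(data) - c * len(rule_list)


def best_rule_list(candidates, data, c):
    """Score every ordered subset of the candidates with every default label.

    Brute force is an assumption made for illustration only; it is feasible
    solely because the candidate set is tiny.
    """
    best = None
    for k in range(len(candidates) + 1):
        for rules in permutations(candidates, k):
            for default in (0, 1):
                score = objective(rules, default, data, c)
                if best is None or score > best[0]:
                    best = (score, list(rules), default)
    return best


score, rules, default = best_rule_list(CANDIDATES, DATA, c=0.05)
for name, _condition, label in rules:
    print(f"IF {name} THEN predict {label}")
print(f"ELSE predict {default}  (objective = {score:.2f})")
```

Raising `c` favors shorter lists at the cost of training accuracy, and lowering it does the opposite, which is the user-defined tradeoff the abstract refers to.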


Mixed-integer programming; Decision trees; Decision lists; Sparsity; Interpretable modeling; Associative classification; 68T05: Computer Science Artificial intelligence Learning and adaptive systems

Mathematics Subject Classification

68T05 Learning and adaptive systems; 90C11 Mixed integer programming; 62-04 Explicit machine computation and programs (not the theory of computation or programming)



We gratefully acknowledge funding from the MIT Big Data Initiative, and the National Science Foundation under grant IIS-1053407. Thanks to Daniel Bienstock and anonymous reviewers for encouragement and for helping us to improve the readability of the manuscript.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and The Mathematical Programming Society 2018

Authors and Affiliations

  1. Departments of Computer Science, Electrical and Computer Engineering, and Statistical Science, Duke University, Durham, USA
  2. Department of Computer Engineering, Middle Eastern Technical University, Ankara, Turkey
  3. MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, USA
