Learning Interpretable Rules for Multi-Label Classification

  • Eneldo Loza Mencía
  • Johannes Fürnkranz
  • Eyke Hüllermeier
  • Michael Rapp
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights into the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area.
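To make the contrast with multiclass classification concrete, the following Python sketch shows one possible way to represent multi-label rules: a rule's head may assign several labels at once, and a rule's body may test labels that have already been predicted, which is one simple way to express label dependencies. The names `Rule`, `predict`, and `RULES`, the toy features, and the "fire until nothing changes" prediction loop are all hypothetical illustrations chosen for this sketch; they are not the representation or the algorithm used by the authors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical instance representation: feature name -> value.
Instance = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """Hypothetical multi-label rule: IF body THEN add every label in head."""
    # The body may test the features and the labels predicted so far.
    body: Callable[[Instance, FrozenSet[str]], bool]
    # Unlike a multiclass rule, the head may contain several labels.
    head: FrozenSet[str]


def predict(rules: List[Rule], x: Instance, max_passes: int = 10) -> FrozenSet[str]:
    """Apply all rules repeatedly until the predicted label set stops changing.

    Each pass evaluates every rule against the label set from the previous
    pass, so label-dependent rules can fire once their premises are predicted.
    """
    labels: FrozenSet[str] = frozenset()
    for _ in range(max_passes):
        new = set(labels)
        for rule in rules:
            if rule.body(x, labels):
                new |= rule.head
        if new == labels:  # fixed point reached
            break
        labels = frozenset(new)
    return labels


# Toy rule set (illustrative only).
RULES = [
    # Feature-based rule: mentions of "ball" suggest football.
    Rule(body=lambda x, ls: x.get("ball", 0) > 0, head=frozenset({"football"})),
    # Label-dependent rule: football implies sports.
    Rule(body=lambda x, ls: "football" in ls, head=frozenset({"sports"})),
    # Feature-based rule for an unrelated label.
    Rule(body=lambda x, ls: x.get("election", 0) > 0, head=frozenset({"politics"})),
]

if __name__ == "__main__":
    # Predicts a set containing "football" and "sports".
    print(predict(RULES, {"ball": 2.0, "election": 0.0}))
```

In this toy run, the first pass predicts only `football`; the second pass lets the label-dependent rule add `sports`; the third pass changes nothing, so prediction stops. The `max_passes` bound is a safeguard assumed here to guarantee termination; a real learner would also have to decide in which order rules are learned and applied, which is one of the issues that makes rule learning harder for MLC.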


Multi-label classification, Label dependencies, Rule learning, Separate-and-conquer



We would like to thank Frederik Janssen for his contributions to this work. Computations for this research were conducted on the Lichtenberg high performance computer of the TU Darmstadt.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Eneldo Loza Mencía (1)
  • Johannes Fürnkranz (1)
  • Eyke Hüllermeier (2)
  • Michael Rapp (1)

  1. Knowledge Engineering Group, Technische Universität Darmstadt, Darmstadt, Germany
  2. Intelligent Systems, Universität Paderborn, Paderborn, Germany
