Abstract
We evaluate the suitability of supervised and unsupervised rules, subgroups and redescriptions as new features and as meaningful, interpretable representations for classification tasks. Although supervised rules used as features are known to improve the performance of classification algorithms, the benefits of unsupervised rules, subgroups and redescriptions, and in particular their synergy with rules, remain largely unexplored for classification tasks. To investigate this topic, we developed DAFNE (Descriptive Automated Feature Construction and Evaluation), a fully automated framework for feature construction, selection and testing. Like other available tools for rule-based feature construction, DAFNE provides fully interpretable features that offer in-depth knowledge about the studied domain problem. Our results show that DAFNE is capable of producing demonstrably useful features that increase the overall predictive performance of different classification algorithms across a set of classification datasets.
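The core idea the abstract describes, using logical rules as additional binary features for a downstream classifier, can be sketched as follows. This is not DAFNE itself (whose internals are not given here); it is a minimal illustration using two hypothetical hand-crafted rules standing in for mined subgroups or redescriptions, with scikit-learn and a standard benchmark dataset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A standard benchmark dataset (30 numeric features, binary target)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def rule_features(X):
    """Hypothetical rules (thresholds chosen for illustration only),
    each turned into a binary indicator feature."""
    r1 = (X[:, 0] > 15.0) & (X[:, 1] > 20.0)   # conjunctive rule
    r2 = (X[:, 2] < 90.0) | (X[:, 3] < 600.0)  # disjunctive rule
    return np.column_stack([r1, r2]).astype(float)

# Augment the original attribute space with the rule indicators
X_tr_aug = np.hstack([X_tr, rule_features(X_tr)])
X_te_aug = np.hstack([X_te, rule_features(X_te)])

clf = LogisticRegression(max_iter=5000).fit(X_tr_aug, y_tr)
print(round(clf.score(X_te_aug, y_te), 3))
```

In a real pipeline, the rule bodies would be produced by rule learners, subgroup discovery, or redescription mining rather than written by hand, and a feature-selection step would keep only the constructed features that improve validation performance.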
Acknowledgement
The authors acknowledge support by the Research Cooperability Program of the Croatian Science Foundation, funded by the European Union from the European Social Fund under the Operational Programme Efficient Human Resources 2014–2020, through the Grant 8525: Augmented Intelligence Workflows for Prediction, Discovery, and Understanding in Genomics and Pharmacogenomics.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mihelčić, M., Šmuc, T. (2023). Rules, Subgroups and Redescriptions as Features in Classification Tasks. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_17
Print ISBN: 978-3-031-23617-4
Online ISBN: 978-3-031-23618-1