
Rules, Subgroups and Redescriptions as Features in Classification Tasks

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1752))

Abstract

We evaluate the suitability of supervised and unsupervised rules, subgroups and redescriptions as new features and as meaningful, interpretable representations for classification tasks. Although using supervised rules as features is known to improve the performance of classification algorithms, the advantages of using unsupervised rules, subgroups and redescriptions, and in particular their synergy with rules, remain largely unexplored for classification tasks. To investigate this topic, we developed a fully automated framework for feature construction, selection and testing called DAFNE (Descriptive Automated Feature Construction and Evaluation). As with other available tools for rule-based feature construction, DAFNE provides fully interpretable features that offer in-depth knowledge about the studied domain problem. Our results show that DAFNE produces provably useful features that increase the overall predictive performance of different classification algorithms across a set of classification datasets.
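The core idea described in the abstract can be sketched as follows. This is a minimal illustration, not DAFNE's actual implementation: each discovered descriptive pattern (a rule, subgroup description, or one side of a redescription) is treated as a predicate over an instance, and its truth value becomes a new binary feature appended to the original attributes. The rules and attribute names below are hypothetical.

```python
def rule_features(instance, rules):
    """Evaluate each rule on an instance; 1 if the rule fires, else 0."""
    return [1 if rule(instance) else 0 for rule in rules]

# Hypothetical rules, e.g. as mined by subgroup discovery or
# redescription mining (attribute names are made up for illustration).
rules = [
    lambda x: x["petal_len"] > 2.5,                       # single-condition rule
    lambda x: x["petal_len"] > 2.5 and x["sepal_w"] < 3,  # conjunctive rule
]

instance = {"petal_len": 4.0, "sepal_w": 2.8}

# Augmented representation: original attributes plus one binary
# indicator per rule, ready to be passed to any classifier.
augmented = [instance["petal_len"], instance["sepal_w"]] + rule_features(instance, rules)
# augmented == [4.0, 2.8, 1, 1]
```

Because each added feature corresponds to a human-readable condition, the augmented representation stays interpretable: a classifier's reliance on a rule feature can be traced directly back to the condition that defines it.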



Acknowledgement

The authors acknowledge support by the Research Cooperability Program of the Croatian Science Foundation, funded by the European Union from the European Social Fund under the Operational Programme Efficient Human Resources 2014–2020, through the Grant 8525: Augmented Intelligence Workflows for Prediction, Discovery, and Understanding in Genomics and Pharmacogenomics.

Author information

Corresponding author

Correspondence to Matej Mihelčić.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Mihelčić, M., Šmuc, T. (2023). Rules, Subgroups and Redescriptions as Features in Classification Tasks. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_17


  • DOI: https://doi.org/10.1007/978-3-031-23618-1_17


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23617-4

  • Online ISBN: 978-3-031-23618-1

  • eBook Packages: Computer Science, Computer Science (R0)
