
Comparison Between SVM and DistilBERT for Multi-label Text Classification of Scientific Papers Aligned with Sustainable Development Goals

  • Conference paper
  • First Online:
Advances in Computational Intelligence (MICAI 2022)

Abstract

Identifying scientific articles aligned with the 17 Sustainable Development Goals (SDGs) of the UN 2030 Agenda is a valuable task for research and educational institutions, and finding an efficient, practical multi-label classification model using machine or deep learning remains relevant. This work compares the performance of a text classification model that combines Label Powerset (LP) with a Support Vector Machine (SVM) against a transfer-learning language model, DistilBERT, across five imbalanced and balanced dataset scenarios of scientific papers. A proposed classification process was implemented and evaluated with performance metrics, which confirm that the LP-SVM combination remains an option with remarkable results in multi-label text classification.
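The Label Powerset transformation described in the abstract can be sketched in a few lines: every distinct combination of labels observed in the training data is treated as a single class, so an ordinary single-label classifier such as an SVM can be trained on it. The sketch below is a minimal illustration under assumed inputs, not the paper's implementation; the tiny corpus, the SDG label sets, and the TF-IDF + LinearSVC choices are assumptions for demonstration only.

```python
# Minimal sketch of Label Powerset + SVM for multi-label text classification.
# The documents and label sets below are illustrative, not from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "solar microgrids for rural electrification",
    "crop irrigation and drought-resistant seeds",
    "renewable energy policy and food security",
    "offshore wind turbine maintenance",
]
# Each document carries a set of SDG labels (multi-label target).
label_sets = [
    frozenset({"SDG7"}),
    frozenset({"SDG2"}),
    frozenset({"SDG2", "SDG7"}),
    frozenset({"SDG7"}),
]

# Label Powerset: map each unique label combination to one class id,
# turning the multi-label problem into a single-label one.
combos = sorted(set(label_sets), key=sorted)
combo_to_id = {c: i for i, c in enumerate(combos)}
y = [combo_to_id[s] for s in label_sets]

# Train an ordinary SVM on TF-IDF features over the powerset classes.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, y)

# Predictions decode back to a label set via the combo table.
pred = clf.predict(["wind and solar power for farms"])[0]
print(sorted(combos[pred]))
```

A known trade-off of this transformation, relevant to the imbalanced scenarios the paper studies, is that rare label combinations become rare classes, so class imbalance at the combination level can grow worse than at the individual-label level.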



Author information

Correspondence to Joaquín Gutiérrez.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Morales-Hernández, R.C., Becerra-Alonso, D., Vivas, E.R., Gutiérrez, J. (2022). Comparison Between SVM and DistilBERT for Multi-label Text Classification of Scientific Papers Aligned with Sustainable Development Goals. In: Pichardo Lagunas, O., Martínez-Miranda, J., Martínez Seis, B. (eds) Advances in Computational Intelligence. MICAI 2022. Lecture Notes in Computer Science, vol. 13613. Springer, Cham. https://doi.org/10.1007/978-3-031-19496-2_5


  • DOI: https://doi.org/10.1007/978-3-031-19496-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19495-5

  • Online ISBN: 978-3-031-19496-2

  • eBook Packages: Computer Science, Computer Science (R0)
