Multilabel Toxic Comment Classification Using Supervised Machine Learning Algorithms

  • Conference paper
Machine Learning for Predictive Analysis

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 141)

Abstract

Online life has become an integral part of daily existence for a large number of people around the globe. Online commenting spaces generate a wealth of expressive content in the public domain, which contributes to a healthy environment for people. However, these spaces also carry the threats of cyberbullying, personal attacks, and abusive language. This motivates industry researchers to build automated processes to curb such behavior. The aim of this paper is to perform multi-label text categorization, where each comment can belong to multiple toxic labels at the same time. We tested two models: a recurrent neural network (RNN) and a long short-term memory network (LSTM). Both perform significantly better than the baseline models, Logistic Regression and ExtraTrees.
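
To make the multi-label setting concrete, the following is a minimal sketch of how one comment can receive several labels at once. It is not the paper's implementation: the label names, the scores, and the 0.5 threshold are assumptions chosen for illustration. Each label gets its own sigmoid, so the per-label probabilities are independent and need not sum to one.

```python
import math

# Hypothetical label set, for illustration only (not taken from the paper).
LABELS = ["toxic", "obscene", "threat", "insult"]


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Apply an independent sigmoid per label and keep labels above the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


def encode_labels(active):
    """Encode a set of active labels as a binary indicator vector."""
    return [1 if label in active else 0 for label in LABELS]


# Made-up scores for a single comment (assumed, not model output).
example_logits = [2.0, 1.2, -3.0, 0.4]
predicted = predict_labels(example_logits)
print(predicted)
print(encode_labels(predicted))
```

A single-label classifier would instead pick exactly one class, typically via softmax; the per-label threshold is what allows zero, one, or several labels per comment.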

Author information

Corresponding author

Correspondence to Darshin Kalpesh Shah.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shah, D.K., Sanghvi, M.A., Mehta, R.P., Shah, P.S., Singh, A. (2021). Multilabel Toxic Comment Classification Using Supervised Machine Learning Algorithms. In: Joshi, A., Khosravy, M., Gupta, N. (eds) Machine Learning for Predictive Analysis. Lecture Notes in Networks and Systems, vol 141. Springer, Singapore. https://doi.org/10.1007/978-981-15-7106-0_3
