Vulgarity Classification in Comments Using SVM and LSTM

  • Conference paper

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 141)

Abstract

Vast amounts of textual content appear online every day. People, exercising their freedom of speech, often offend the sentiments of readers, and numerous instances of online harassment, defamation, and bullying occur on various social networking sites. Posting such content cannot be prevented, but thanks to machine learning and deep learning it can be identified and removed. Jigsaw and Google have built tools to detect such profanity online, but these tools have not been successful in identifying the type of toxicity a comment contains. Kaggle therefore put forth a challenge in which, besides identifying whether a comment is toxic, the comment must be classified into kinds of toxicity; categories such as threat, insult, identity hate, and obscenity are considered. To address this challenge, various machine learning and deep learning models are applied, such as SVM and RNN-LSTM. Our main aim in this challenge is to study the results of using RNN-LSTM for toxicity classification. The data is first vectorized using TF-IDF and bag of words. This paper also discusses the nature of the dataset. The results obtained give promising assurance of finding a solution to this problem.
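The vectorization step the abstract describes can be illustrated with a minimal pure-Python sketch of TF-IDF weighting over bag-of-words counts. The toy comments and whitespace tokenization below are illustrative assumptions, not the paper's actual dataset or preprocessing.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF vectors for a list of tokenized documents.

    TF   = term count / document length
    IDF  = log(N / number of documents containing the term)
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

# Hypothetical toy comments (not from the Kaggle dataset)
comments = [
    "you are an idiot".split(),
    "you have a nice day".split(),
    "you have a day".split(),
]
vecs = tfidf(comments)
# "idiot" occurs in only one comment, so it gets a high weight there;
# "you" occurs in every comment, so its IDF (and weight) is zero.
```

These sparse weighted vectors are the kind of input a linear SVM classifier would consume, while the RNN-LSTM model instead operates on word sequences.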



Author information

Correspondence to Crystal Dias.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dias, C., Jangid, M. (2020). Vulgarity Classification in Comments Using SVM and LSTM. In: Somani, A.K., Shekhawat, R.S., Mundra, A., Srivastava, S., Verma, V.K. (eds) Smart Systems and IoT: Innovations in Computing. Smart Innovation, Systems and Technologies, vol 141. Springer, Singapore. https://doi.org/10.1007/978-981-13-8406-6_52
