Vulgarity Classification in Comments Using SVM and LSTM

  • Crystal Dias
  • Mahesh Jangid
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 141)

Abstract

A vast amount of text appears online every day. People exercising their freedom of speech often offend the sentiments of readers, and numerous accounts of online harassment, defamation, and bullying are reported across social networking sites. The posting of such content cannot be prevented, but machine learning and deep learning make it possible to identify the content so that it can then be removed. Jigsaw and Google have built tools to detect this kind of profanity online, but these tools have not succeeded in identifying the type of toxicity a comment contains. Kaggle therefore put forth a challenge in which a comment is not only identified as toxic but also classified by kind of toxicity. The challenge considers categories such as threat, insult, identity hate, and obscenity. To address it, various machine learning and deep learning models, such as SVM and RNN-LSTM, are applied. Our main aim is to study the results of using RNN-LSTM for toxic comment classification. The data is first vectorized using TF-IDF and bag of words. This paper also discusses the nature of the dataset. The results are promising with respect to finding a solution to this problem.
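
To make the vectorization step concrete, the following is a minimal sketch of bag-of-words counting and TF-IDF weighting in plain Python. It is not the authors' implementation: the function names, the whitespace tokenizer, the unsmoothed textbook formula tf × log(N / df), and the three example comments are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical example comments; these are NOT taken from the challenge dataset.
COMMENTS = [
    "you are an idiot",
    "thanks for the helpful edit",
    "you are helpful",
]


def tokenize(text):
    # Assumption: lowercase and split on whitespace only.
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    counts = [Counter(doc) for doc in tokenized]
    vectors = [[count[term] for term in vocab] for count in counts]
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per document.

    Assumption: tf is the term count divided by the document length and
    idf is log(N / df), without the smoothing that libraries often add.
    """
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = [sum(1 for vec in bow if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for vec in bow:
        length = sum(vec)
        weighted.append(
            [(count / length) * w if length else 0.0 for count, w in zip(vec, idf)]
        )
    return vocab, weighted
```

In this sketch a term that occurs in only one comment, such as "idiot", receives a larger weight than a term shared by several comments, such as "you". Vectors of this kind are what a classifier such as an SVM would take as input.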

Keywords

SVM, LSTM, Bag of words, TF-IDF, Toxic comment classification

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
