Abstract
A vast amount of text is posted online every day. Exercising their freedom of speech, people often post content that offends readers, and accounts of online harassment, defamation, and bullying are common across social networking sites. Posting such content cannot be prevented, but machine learning and deep learning make it possible to identify the content so that it can be removed. Jigsaw and Google have built tools to detect this kind of profanity online, but these tools have not succeeded in identifying the type of toxicity a comment contains. Kaggle therefore put forth a challenge in which, besides deciding whether a comment is toxic, the comment is classified by kind of toxicity. The challenge considers categories such as threats, insults, identity hate, and obscenity. To address it, various machine learning and deep learning models are applied, such as SVM and RNN-LSTM. Our main aim is to study the results of using RNN-LSTM for toxic comment classification. The data is first vectorized using TF-IDF and bag of words. This paper also discusses the nature of the dataset. The results are promising as a step toward a solution to this problem.
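The vectorization step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula or preprocessing, so everything here is an assumption for illustration: whitespace tokenization, term frequency normalized by comment length, and an unsmoothed idf of log(N / df). The function names `tokenize`, `bag_of_words`, and `tf_idf` and the toy comments are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; the paper's
    # actual preprocessing is not described in the abstract.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary and one raw count vector per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    return vocab, [[c[t] for t in vocab] for c in counts]


def tf_idf(docs):
    # Weight each count by tf * idf, with idf = log(N / df).
    # This unsmoothed variant is one common choice, assumed here.
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for row in bow:
        total = sum(row)
        weighted.append(
            [(c / total) * w if total else 0.0 for c, w in zip(row, idf)]
        )
    return vocab, weighted


# Toy comments, invented for illustration only.
comments = ["you are an idiot", "you are kind", "thanks for the edit"]
vocab, bow = bag_of_words(comments)
_, weights = tf_idf(comments)
```

In this sketch a word that appears in only one comment, such as "idiot", receives a larger weight than a word shared across comments, such as "you", which is the property that makes TF-IDF vectors useful as input to a classifier like an SVM.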
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dias, C., Jangid, M. (2020). Vulgarity Classification in Comments Using SVM and LSTM. In: Somani, A.K., Shekhawat, R.S., Mundra, A., Srivastava, S., Verma, V.K. (eds) Smart Systems and IoT: Innovations in Computing. Smart Innovation, Systems and Technologies, vol 141. Springer, Singapore. https://doi.org/10.1007/978-981-13-8406-6_52
DOI: https://doi.org/10.1007/978-981-13-8406-6_52
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8405-9
Online ISBN: 978-981-13-8406-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)