Identifying Offensive Content in Social Media Posts

Singh, Ashwin; Ray, Rudraroop

doi:10.1007/978-3-030-73696-5_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1402)

Included in the following conference series:

International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation

Abstract

The identification of offensive language on social media has been widely studied in recent years, owing to the volume of data these platforms generate and the consequences of such language. In this paper, we present the results of our experiments on the OLID dataset from the OffensEval shared task at SemEval 2019. We use both traditional machine learning methods and state-of-the-art transformer models such as BERT to set baselines for our experiments. We then propose fine-tuning a distilled BERT model on both OLID and an additional hate speech and offensive language dataset. Evaluated on the test set, our model yields a macro F1 score of 78.8.
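For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a macro-averaged F1 score can be computed: F1 is calculated per class and the per-class scores are averaged without weighting by class frequency. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper; the sketch returns a value in [0, 1], whereas the abstract's 78.8 appears to be on a 0 to 100 scale.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (range 0 to 1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical data, binary offensive / not-offensive labels).
y_true = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Because each class contributes equally to the average, macro F1 penalizes poor performance on the minority class more than accuracy does, which is why it is a common choice when classes are imbalanced.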

Notes

1. Link to GitHub Repository.

Author information

Authors and Affiliations

Indraprastha Institute of Information Technology, Delhi, India
Ashwin Singh & Rudraroop Ray

Corresponding author

Correspondence to Ashwin Singh.

Editor information

Editors and Affiliations

IIIT Delhi, New Delhi, India
Tanmoy Chakraborty
Illinois Institute of Technology, Chicago, IL, USA
Kai Shu
Arizona State University, Tempe, AZ, USA
H. Russell Bernard
Arizona State University, Tempe, AZ, USA
Huan Liu
IIIT Delhi, New Delhi, India
Md Shad Akhtar

About this paper

Cite this paper

Singh, A., Ray, R. (2021). Identifying Offensive Content in Social Media Posts. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_1

DOI: https://doi.org/10.1007/978-3-030-73696-5_1
Published: 09 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73695-8
Online ISBN: 978-3-030-73696-5
eBook Packages: Computer Science (R0)