
Identifying Offensive Content in Social Media Posts

  • Conference paper
Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT 2021)

Abstract

The identification of offensive language on social media has been widely studied in recent years, owing to the volume of data these platforms generate and the consequences of such content. In this paper, we present the results of our experiments on the OLID dataset from the OffensEval shared task at SemEval 2019. We use both traditional machine learning methods and state-of-the-art transformer models such as BERT to set a baseline for our experiments. We then propose fine-tuning DistilBERT on both OLID and an additional hate speech and offensive language dataset. Evaluated on the test set, our model yields a macro F1 score of 78.8.
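The abstract reports results as a macro F1 score. As a minimal sketch (not the authors' evaluation code), the functions below compute macro F1 in plain Python, assuming the binary OFF/NOT labels of OLID's offensive-language identification subtask and a 0 to 100 scale, consistent with how the 78.8 figure is reported. The helper names `f1_per_class` and `macro_f1` are hypothetical.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        # Convention assumed here: F1 is 0 when the class has no true positives
        # (this also avoids dividing by zero in precision or recall).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("OFF", "NOT")):
    """Unweighted mean of per-class F1 scores, scaled to 0-100."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = [f1_per_class(y_true, y_pred, label) for label in labels]
    return 100 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy gold labels and predictions (illustrative only, not OLID data).
    gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
    pred = ["OFF", "NOT", "OFF", "NOT", "NOT"]
    # OFF: F1 = 0.5, NOT: F1 = 2/3, so macro F1 = 58.33
    print(round(macro_f1(gold, pred), 2))  # prints 58.33
```

Macro averaging takes the unweighted mean of the per-class scores, so the minority class contributes as much to the final number as the majority class. A library routine such as scikit-learn's `f1_score(..., average="macro")` computes the same quantity.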




Author information

Correspondence to Ashwin Singh.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Singh, A., Ray, R. (2021). Identifying Offensive Content in Social Media Posts. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_1


  • DOI: https://doi.org/10.1007/978-3-030-73696-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73695-8

  • Online ISBN: 978-3-030-73696-5

  • eBook Packages: Computer Science, Computer Science (R0)
