Classification of Hateful Memes Using Multimodal Models

Part of the Algorithms for Intelligent Systems book series (AIS)

Abstract

Memes have become a popular and convenient medium for internet communication: they spread ideas across the internet and can influence people in very little time. Most memes are humorous, but some contain hateful content that can be offensive to individuals or groups. An algorithm that identifies hateful memes on social media platforms is therefore needed to prevent the spread of hate. The classification of memes as hateful or non-hateful, and the analysis of such memes, is currently an active research area. A meme may be hateful only when its image and text are read together; the same meme can appear harmless if one attends to the image or the text alone. This calls for a multimodal approach that captures the relationship between the visual and linguistic information in a meme. Our work focuses on the multimodal classification of hateful memes using fusion techniques. We use the dataset released by Facebook AI for its Hateful Memes Challenge and apply early fusion to combine the image and text modalities into a single classifier. For the two modalities we use the respective baseline models, Inception v3 for images and BERT for text, and achieve an AUC of 0.79 with a model accuracy of 63.3 percent.
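
To make the fusion step concrete, the sketch below illustrates the kind of early-fusion classifier the abstract describes: Inception v3 image features and BERT text features are concatenated into one vector before a joint classification head. This is a minimal sketch, assuming PyTorch, a recent torchvision (for the weights= argument), and the Hugging Face transformers library; the class name, layer sizes, and classifier head are illustrative assumptions, not necessarily the authors' exact configuration.

    import torch
    import torch.nn as nn
    from torchvision import models
    from transformers import BertModel, BertTokenizer

    class EarlyFusionMemeClassifier(nn.Module):
        """Hypothetical early-fusion sketch: concatenate Inception v3 image
        features with BERT text features, then classify the fused vector
        as hateful / non-hateful."""

        def __init__(self, hidden_dim=512):
            super().__init__()
            # Image branch: pretrained Inception v3 with the classification
            # head removed, exposing its 2048-d pooled feature vector.
            self.image_encoder = models.inception_v3(weights="IMAGENET1K_V1")
            self.image_encoder.aux_logits = False   # drop the auxiliary head
            self.image_encoder.AuxLogits = None
            self.image_encoder.fc = nn.Identity()
            # Text branch: BERT base; its pooled [CLS] output is 768-d.
            self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
            # Early fusion: a joint head over the concatenated
            # 2048 + 768 = 2816-d vector (layer sizes are illustrative).
            self.classifier = nn.Sequential(
                nn.Linear(2048 + 768, hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.3),
                nn.Linear(hidden_dim, 2),           # hateful vs. non-hateful
            )

        def forward(self, images, input_ids, attention_mask):
            img_feat = self.image_encoder(images)                   # (B, 2048)
            txt_feat = self.text_encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).pooler_output                                         # (B, 768)
            fused = torch.cat([img_feat, txt_feat], dim=1)          # early fusion
            return self.classifier(fused)

    # Example forward pass on dummy inputs; Inception v3 expects
    # normalized 299x299 RGB tensors.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer(["sample meme caption"], padding=True,
                    truncation=True, return_tensors="pt")
    model = EarlyFusionMemeClassifier().eval()
    with torch.no_grad():
        logits = model(torch.randn(1, 3, 299, 299),
                       enc["input_ids"], enc["attention_mask"])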

Keywords

  • Multimodal classification
  • Deep learning
  • Fusion
  • Memes
  • Social media
  • Inception V3
  • BERT
  • Visual and linguistic

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singh, B., Upadhyay, N., Verma, S., Bhandari, S. (2022). Classification of Hateful Memes Using Multimodal Models. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Bestak, R. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-6460-1_13
