CWI: A multimodal deep learning approach for named entity recognition from social media using character, word and image features

Asgari-Chenaghlu, Meysam; Feizi-Derakhshi, M. Reza; Farzinvash, Leili; Balafar, M. A.; Motamed, Cina

doi:10.1007/s00521-021-06488-4


Original Article
Published: 15 September 2021

Neural Computing and Applications, Volume 34, pages 1905–1922 (2022)

Meysam Asgari-Chenaghlu ORCID: orcid.org/0000-0002-7892-9675¹,
M. Reza Feizi-Derakhshi¹,
Leili Farzinvash¹,
M. A. Balafar¹ &
Cina Motamed²

Abstract

Named entity recognition (NER) from social media posts is a challenging task. User-generated content, the core of social media, is noisy and contains grammatical and linguistic errors, which makes tasks such as NER much harder. We propose two novel deep learning approaches that combine multimodal deep learning with transformers. Both approaches use features of the images attached to short social media posts to improve NER results. In the first approach, we extract image features using InceptionV3 and fuse them with textual features. This approach recognizes named entities more reliably when the user provides images related to those entities. In the second approach, we combine image features with the text and feed them into a BERT-like transformer. Experimental results in terms of precision, recall, and F1 score show that our approaches outperform other state-of-the-art NER solutions.
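The abstract reports precision, recall, and F1 but does not specify how they are computed. A common NER convention, assumed in this sketch, is CoNLL-style exact matching of entity spans decoded from BIO tags: a predicted entity counts as correct only if both its boundaries and its type match a gold entity. The function names `bio_to_spans` and `prf1` are illustrative, not taken from the paper.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode entity spans from BIO tags such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    etype, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        starts_new = (
            tag == "O"
            or tag.startswith("B-")
            # an I- tag whose type differs from the open entity starts a new one
            or (tag.startswith("I-") and tag[2:] != etype)
        )
        if starts_new:
            if etype is not None:
                spans.add((etype, start, i))
            etype, start = (tag[2:], i) if tag != "O" else (None, None)
        # otherwise a matching I- tag simply extends the current span
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"]]  # right boundary, wrong type for the 2nd entity
    print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Because the second predicted entity has the wrong type, it is counted as both a false positive and a false negative, so precision and recall each drop to 0.5.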

Notes

https://developer.twitter.com/en/docs/counting-characters.
Multimodal Named Entity Recognizer.
A multimedia messaging application.
GloVe: 6 billion tokens with 200-dimensional word vectors, available at: http://nlp.stanford.edu/data/glove.6B.zip.
fastText: 16 billion tokens with 300-dimensional word vectors, available at: https://dl.fbaipublicfiles.com/fastText/vectors-english/wiki-news-300d-1M.vec.zip.
InceptionV3 model pretrained on ImageNet, available at: https://keras.io/applications/#inceptionv3.
https://github.com/google-research/bert.
BERT-Tiny: https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-2_H-128_A-2.zip.
BERT-Small: https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-4_H-512_A-8.zip.
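The BERT checkpoint names above encode model size: L is the number of transformer layers, H the hidden size, and A the number of attention heads, so BERT-Tiny has 2 layers, hidden size 128 and 2 heads. The abstract does not say exactly how image features are combined with the text in the second approach, so the following is only one plausible, hypothetical variant: the 2048-dimensional globally average-pooled InceptionV3 vector is linearly projected to size H and prepended to the token embeddings as an extra pseudo-token. The helpers `parse_bert_name`, `project` and `prepend_image_token`, and the randomly initialised projection, are assumptions for illustration, not the authors' code.

```python
import random
import re
from typing import Dict, List


def parse_bert_name(name: str) -> Dict[str, int]:
    """Hypothetical helper: read L/H/A from a name like 'uncased_L-2_H-128_A-2'."""
    m = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognised checkpoint name: {name}")
    layers, hidden, heads = (int(x) for x in m.groups())
    return {"layers": layers, "hidden": hidden, "heads": heads}


def project(vec: List[float], weight: List[List[float]]) -> List[float]:
    """Linear map weight @ vec, where weight has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def prepend_image_token(token_embs: List[List[float]], image_feat: List[float],
                        weight: List[List[float]]) -> List[List[float]]:
    """Assumed fusion: the projected image vector becomes an extra first 'token'."""
    return [project(image_feat, weight)] + token_embs


if __name__ == "__main__":
    cfg = parse_bert_name("uncased_L-2_H-128_A-2")  # BERT-Tiny
    rng = random.Random(0)
    image_dim = 2048  # size of InceptionV3's global-average-pooled feature vector
    # Hypothetical projection weights; in practice these would be learned.
    weight = [[rng.gauss(0.0, 0.02) for _ in range(image_dim)] for _ in range(cfg["hidden"])]
    image_feat = [rng.random() for _ in range(image_dim)]  # stand-in for real CNN features
    tokens = [[0.0] * cfg["hidden"] for _ in range(5)]  # stand-in for 5 token embeddings
    fused = prepend_image_token(tokens, image_feat, weight)
    print(len(fused), len(fused[0]))  # 6 128
```

Projecting the image vector to the transformer's hidden size lets the self-attention layers treat it like any other position, so every word can attend to the image; swapping in BERT-Small would only change H to 512.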

Acknowledgements

The authors would like to thank Shervin Minaee for reviewing this work and providing very useful comments to improve it.

Author information

Authors and Affiliations

Department of Computer Engineering, University of Tabriz, Tabriz, Iran
Meysam Asgari-Chenaghlu, M. Reza Feizi-Derakhshi, Leili Farzinvash & M. A. Balafar
University of Orleans, Orleans, France
Cina Motamed

Corresponding author

Correspondence to Meysam Asgari-Chenaghlu.

Ethics declarations

Conflict of interest

The authors certify that there is no actual or potential conflict of interest in relation to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Asgari-Chenaghlu, M., Feizi-Derakhshi, M.R., Farzinvash, L. et al. CWI: A multimodal deep learning approach for named entity recognition from social media using character, word and image features. Neural Comput & Applic 34, 1905–1922 (2022). https://doi.org/10.1007/s00521-021-06488-4

Received: 19 September 2020
Accepted: 30 August 2021
Published: 15 September 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s00521-021-06488-4
