Cross-Media and Multilingual Image Understanding Method Based on Attention Mechanism

  • Conference paper
  • In: Advances in Intelligent Automation and Soft Computing (IASC 2021)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 80)


Abstract

Different types of media data are represented by low-level features with different dimensions and attributes, which makes them heterogeneous and directly incomparable. On the other hand, exploiting the low-level feature information extracted across media requires additional information from the different modalities, so there is a trade-off between bridging cross-modal differences and the semantic ambiguity that arises when only low-level features are used. These problems make traditional feature-learning methods unsuitable for cross-media analysis. This paper proposes a neural network architecture that combines feature extraction with contextual semantics and introduces a new attention mechanism. Experiments show that the proposed architecture achieves a higher BLEU score than traditional methods.
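The chapter's full method is not shown on this page, so the sketch below is only a rough illustration of the kind of mechanism the abstract describes: an attention layer that scores image-region features against a context (decoder) state and pools them into a single attended vector. The additive scoring form, module names, and dimensions are all assumptions for illustration, not the authors' actual design.

# A minimal additive-attention sketch in PyTorch. Hypothetical names and
# dimensions throughout; this is not the paper's network.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Scores each image-region feature against a context vector."""

    def __init__(self, feat_dim: int, ctx_dim: int, attn_dim: int) -> None:
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, attn_dim)  # project region features
        self.proj_ctx = nn.Linear(ctx_dim, attn_dim)    # project context state
        self.score = nn.Linear(attn_dim, 1)             # scalar score per region

    def forward(self, feats: torch.Tensor, ctx: torch.Tensor):
        # feats: (batch, regions, feat_dim); ctx: (batch, ctx_dim)
        energy = torch.tanh(self.proj_feat(feats) + self.proj_ctx(ctx).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=1)  # (batch, regions)
        attended = (weights.unsqueeze(-1) * feats).sum(dim=1)           # (batch, feat_dim)
        return attended, weights

# Toy usage: 4 images, 49 regions (a 7x7 CNN grid), 512-d features,
# conditioned on a 256-d decoder state.
attn = AdditiveAttention(feat_dim=512, ctx_dim=256, attn_dim=128)
attended, weights = attn(torch.randn(4, 49, 512), torch.randn(4, 256))
print(attended.shape, weights.shape)  # torch.Size([4, 512]) torch.Size([4, 49])

The BLEU evaluation mentioned in the abstract can be reproduced in spirit with NLTK's corpus-level BLEU; the tokenized sentences below are placeholders, not the paper's data.

# Hedged BLEU example using NLTK (placeholder captions, not the paper's data).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["a", "man", "rides", "a", "horse"]]]       # one reference set per image
hypotheses = [["a", "man", "is", "riding", "a", "horse"]]  # model outputs
smooth = SmoothingFunction().method1
print(corpus_bleu(references, hypotheses, smoothing_function=smooth))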





Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gao, L., Chen, Z., Wang, L., Li, B., Nie, L., Zheng, F. (2022). Cross-Media and Multilingual Image Understanding Method Based on Attention Mechanism. In: Li, X. (eds) Advances in Intelligent Automation and Soft Computing. IASC 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 80. Springer, Cham. https://doi.org/10.1007/978-3-030-81007-8_90
