Abstract
Existing multi-modal knowledge graph construction techniques have matured for processing text-modal data, but lack effective methods for other modalities such as visual data. The focus of multi-modal knowledge graph construction therefore lies in image processing and image-text fusion. At present, multi-modal knowledge graph construction pipelines often do not filter images for quality, so the resulting image sets contain noise and near-duplicate images. To address this problem, this paper studies the quality control and screening of images during multi-modal knowledge graph construction and proposes an image-refining framework for multi-modal knowledge graphs, divided into three modules. Experiments show that the framework provides higher-quality images for multi-modal knowledge graphs, and on the benchmark task of multi-modal entity alignment, entity alignment based on the multi-modal knowledge graphs constructed in this paper improves over previous models.
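The near-duplicate problem the abstract describes can be illustrated with a minimal image-deduplication sketch. The average-hash approach, the bit threshold, and the function names below are illustrative assumptions, not the paper's three-module framework:

```python
# Illustrative sketch only: filtering near-duplicate images with a simple
# average hash. The paper's actual refining framework is not specified here;
# the hash scheme and threshold are assumptions for demonstration.

def average_hash(image):
    """Hash a small grayscale image (list of rows of 0-255 intensities):
    each bit records whether a pixel is above the mean intensity."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def filter_near_duplicates(images, threshold=2):
    """Keep an image only if its hash differs from every already-kept
    image by more than `threshold` bits."""
    kept, hashes = [], []
    for img in images:
        h = average_hash(img)
        if all(hamming_distance(h, kh) > threshold for kh in hashes):
            kept.append(img)
            hashes.append(h)
    return kept

# Tiny 2x2 "images": the second is a near-duplicate of the first.
imgs = [
    [[10, 200], [10, 200]],
    [[12, 198], [11, 201]],   # near-duplicate of the first -> filtered out
    [[200, 10], [200, 10]],   # distinct pattern -> kept
]
print(len(filter_near_duplicates(imgs)))  # 2
```

In practice a perceptual hash over downsampled real images would replace this toy version, but the structure (hash, compare, keep-if-distinct) is the same.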
Copyright information
Ā© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Peng, H., Xu, H., Tang, J., Wu, J., Huang, H. (2023). Effectively Filtering Images forĀ Better Multi-modal Knowledge Graph. In: Yang, S., Islam, S. (eds) Web and Big Data. APWeb-WAIM 2022 International Workshops. APWeb-WAIM 2022. Communications in Computer and Information Science, vol 1784. Springer, Singapore. https://doi.org/10.1007/978-981-99-1354-1_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1353-4
Online ISBN: 978-981-99-1354-1
eBook Packages: Computer Science, Computer Science (R0)