Abstract
With the continued growth of the Internet, online shopping has gradually become a main mode of consumption. This paper studies the image retrieval task for goods on e-commerce websites. The main purpose of applying image retrieval on shopping websites is to let users find the goods they want more conveniently and accurately within a massive amount of commodity information. In the traditional image retrieval process, only the image or only the text of a product is used as the retrieval object. The query results therefore do not exploit the relevance and complementarity between text and images, so the advantages of combining both kinds of commodity information are lost. To solve this problem, this paper first designs an end-to-end supervised learning algorithm that projects heterogeneous data into a common metric space, where traditional indexing schemes can be applied to achieve efficient image retrieval. Secondly, a fusion method is proposed that assigns a higher weight to the better data, judged by how well the input features capture semantics. Finally, an objective function is proposed that correctly embeds the fusion features of image and text into their respective feature spaces, brings the fusion features of images and text of the same kind closer together, and separates dissimilar features. The experimental results show that the average accuracy of this method on the test set of commodity data is 70%, about 6% higher than that of the image content-based and text-based image retrieval methods, which demonstrates the effectiveness of the method.
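The three components named in the abstract (projection into a common metric space, quality-weighted fusion, and an objective that pulls same-kind features together) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the softmax weighting over two quality scores and the triplet-style margin loss are stand-in assumptions, and every name and value here (`gated_fusion`, `triplet_loss`, `retrieve`, the quality scores, the margin, the toy vectors) is hypothetical. The features are assumed to be already projected to the same dimension.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so distances are comparable.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def gated_fusion(img_feat, txt_feat, img_quality, txt_quality):
    # Assumption: each modality comes with a scalar quality score.
    # A softmax over the two scores gives the higher-quality input more weight.
    m = max(img_quality, txt_quality)
    e_img = math.exp(img_quality - m)
    e_txt = math.exp(txt_quality - m)
    w_img = e_img / (e_img + e_txt)
    w_txt = 1.0 - w_img
    fused = [w_img * a + w_txt * b for a, b in zip(img_feat, txt_feat)]
    return l2_normalize(fused)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Zero once the same-kind pair is closer than the dissimilar pair by the margin.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def retrieve(query, gallery, k=1):
    # Brute-force nearest-neighbour search in the common space;
    # a real system would use an index structure instead.
    ranked = sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))
    return ranked[:k]

# Toy 2-D features: two items of the same kind and one of a different kind.
shoe_a = gated_fusion([1.0, 0.0], [0.9, 0.1], img_quality=2.0, txt_quality=1.0)
shoe_b = gated_fusion([0.8, 0.2], [1.0, 0.0], img_quality=1.0, txt_quality=1.0)
lamp = gated_fusion([0.0, 1.0], [0.1, 0.9], img_quality=1.0, txt_quality=2.0)

loss_good = triplet_loss(shoe_a, shoe_b, lamp)  # same-kind pair used as positive
loss_bad = triplet_loss(shoe_a, lamp, shoe_b)   # dissimilar pair used as positive
nearest = retrieve(shoe_a, [lamp, shoe_b], k=1)
```

In this toy setup the loss vanishes when the same-kind pair is used as the positive and is large when the roles are swapped, which is the behaviour the objective in the abstract is meant to encourage; in a trained model the quality scores and projections would be learned rather than fixed by hand.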
Acknowledgment
This work is supported by the Natural Science Foundation of Heilongjiang Province (Project No. LH2021F036).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, H., Xu, J., Sun, H., Zhao, Z. (2022). Commodity Image Retrieval Based on Image and Text Data. In: Hassanien, A.E., Rizk, R.Y., Snášel, V., Abdel-Kader, R.F. (eds) The 8th International Conference on Advanced Machine Learning and Technologies and Applications (AMLTA2022). AMLTA 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-031-03918-8_10
DOI: https://doi.org/10.1007/978-3-031-03918-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-03917-1
Online ISBN: 978-3-031-03918-8
eBook Packages: Intelligent Technologies and Robotics (R0)