Commodity Image Retrieval Based on Image and Text Data

  • Conference paper

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 113)

Abstract

With the continued growth and spread of the Internet, online shopping has gradually become a primary mode of consumption. This paper studies image retrieval for goods on e-commerce websites. The main purpose of applying image retrieval to shopping websites is to let users find the goods they want more conveniently and accurately among massive amounts of commodity information. Traditional image retrieval uses only the image or only the text of a product as the query. The results therefore do not exploit the relevance and complementarity between text and images, which weakens retrieval of goods. To solve this problem, this paper first designs an end-to-end supervised learning algorithm that projects heterogeneous data into a common metric space, where traditional indexing schemes can be applied to achieve efficient image retrieval. Second, a fusion method is proposed that assigns a higher weight to the modality whose input features capture semantics better. Finally, an objective function is proposed that embeds the fused image and text features into the feature space so that fused features of the same class lie closer to each other and dissimilar features are separated. Experimental results show that the average accuracy of this method on the test set of commodity data is 70%, about 6% higher than content-based and text-based image retrieval methods, which demonstrates its effectiveness.
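The three ideas in the abstract (a weighted fusion of image and text features, retrieval in a shared metric space, and an objective that pulls same-class items together) can be illustrated with a minimal sketch. The abstract does not specify the network, the gate, or the loss, so everything below is an assumption made for illustration: the scalar sigmoid gate, cosine-similarity ranking, the triplet hinge loss, and all function and parameter names are hypothetical and are not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def gated_fusion(image_vec, text_vec, gate_weights, gate_bias=0.0):
    """Blend two same-length embeddings with a scalar sigmoid gate.

    Assumption: the gate is a logistic unit over the concatenated inputs.
    A gate near 1 trusts the image more, a gate near 0 trusts the text more.
    """
    concat = list(image_vec) + list(text_vec)
    z = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = 1.0 / (1.0 + math.exp(-z))
    fused = [gate * i + (1.0 - gate) * t for i, t in zip(image_vec, text_vec)]
    return l2_normalize(fused), gate


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def retrieve(query_vec, catalog, k=1):
    """Return the k catalog keys whose embeddings are most similar to the query."""
    ranked = sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]),
                    reverse=True)
    return ranked[:k]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on squared Euclidean distances (an assumed stand-in objective).

    It is zero once the positive is closer to the anchor than the negative
    by at least `margin`.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


# Toy example: 2-dimensional embeddings and an untrained gate (all weights zero).
fused, gate = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_weights=[0.0] * 4)
catalog = {"shoe": [1.0, 1.0], "hat": [1.0, -1.0]}
best = retrieve(fused, catalog, k=1)
```

With all gate weights at zero the gate is 0.5, so both modalities contribute equally. In a trained model the gate parameters would be learned jointly with the embeddings, which is one plausible reading of weighting inputs by how well they capture semantics.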

Acknowledgment

This paper is supported by the Natural Science Foundation of Heilongjiang Province Project Funding (LH2021F036).

Author information

Corresponding author

Correspondence to Zhijie Zhao.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Zhang, H., Xu, J., Sun, H., Zhao, Z. (2022). Commodity Image Retrieval Based on Image and Text Data. In: Hassanien, A.E., Rizk, R.Y., Snášel, V., Abdel-Kader, R.F. (eds) The 8th International Conference on Advanced Machine Learning and Technologies and Applications (AMLTA2022). AMLTA 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-031-03918-8_10
