Two-attribute e-commerce image classification based on a convolutional neural network

  • Zhihao Cao
  • Shaomin Mu
  • Mengping Dong
Original Article


A novel two-task learning method based on an improved convolutional neural network (CNN) is proposed, using the idea of parameter transfer from transfer learning to address the problem that a traditional CNN cannot classify two attributes of an e-commerce image simultaneously. The network has two channels, each responsible for learning one attribute of the image. First, the network is pre-trained through the channel corresponding to the most important attribute, optimizing the parameters of the earlier layers; then both channels train the network simultaneously. During training, the two learning tasks help each other by sharing parameters, which improves the convergence speed of the network and the generalization ability of the model. To address the scarcity of certain categories of e-commerce images in the dataset and the resulting class imbalance, an over-sampling method based on the mix-up algorithm is proposed. The relationship between the complexity of the two attributes and the sparsity of the CNN's output feature matrix is studied, and an improved Grad-CAM algorithm is used to visualize the image regions that are key to classifying each attribute, improving the interpretability of the network. Experiments show that the proposed CNN achieves good classification performance on both two-attribute e-commerce images and traditional images.
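The mix-up-based over-sampling mentioned in the abstract can be illustrated with a minimal sketch. This shows the generic mix-up operation (a convex combination of two samples and their labels, with the mixing weight drawn from a Beta distribution) applied to a minority class; the function name `mixup_oversample` and its parameters are hypothetical, not the authors' exact procedure:

```python
import numpy as np

def mixup_oversample(x_minority, y_minority, n_new, alpha=0.2, seed=0):
    """Generate n_new synthetic minority-class samples via mix-up.

    Each new sample is lam * x_i + (1 - lam) * x_j, where (i, j) are
    randomly chosen existing samples and lam ~ Beta(alpha, alpha).
    Labels (e.g. one-hot vectors) are mixed with the same weight.
    """
    rng = np.random.default_rng(seed)
    n = len(x_minority)
    new_x, new_y = [], []
    for _ in range(n_new):
        i, j = rng.integers(0, n, size=2)
        lam = rng.beta(alpha, alpha)
        new_x.append(lam * x_minority[i] + (1 - lam) * x_minority[j])
        new_y.append(lam * y_minority[i] + (1 - lam) * y_minority[j])
    return np.stack(new_x), np.stack(new_y)
```

Because each synthetic sample is a convex combination of real ones, the mixed one-hot labels still sum to 1, and the new samples stay inside the convex hull of the minority class, which is what makes mix-up usable as an over-sampling strategy for imbalanced data.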


Keywords: Convolutional neural network · Two-attribute image · Multi-task learning · Transfer learning



This work is supported by the First Class Discipline Funding of Shandong Agricultural University.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. College of Information Science and Engineering, Shandong Agricultural University, Tai'an, China
