Semi-supervised Learning with a Teacher-Student Network for Generalized Attribute Prediction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)

Abstract

This paper presents a study on semi-supervised learning for visual attribute prediction. In many vision applications, precise recognition of the visual attributes of objects is important but remains challenging: because defining a class hierarchy of attributes is ambiguous, training data inevitably suffer from class imbalance and label sparsity, leaving too few effective annotations. An intuitive remedy is to learn image representations effectively from unlabeled images. With that in mind, we propose a multi-teacher-single-student (MTSS) approach inspired by multi-task learning and the distillation strategies of semi-supervised learning. MTSS first trains task-specific domain experts, called teacher networks, using a label embedding technique, and then trains a unified model, called the student network, by forcing it to mimic the distributions learned by the domain experts. Our experiments demonstrate that the method not only achieves competitive performance on various benchmarks for fashion attribute prediction, but also improves robustness and cross-domain adaptability on unseen domains.
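The abstract describes the MTSS scheme only at a high level. As a rough illustration, the sketch below shows how a single student network with one head per attribute task could be trained to mimic the softened output distributions of several frozen teacher networks on (possibly unlabeled) images. This is a minimal sketch, not the authors' implementation: the ResNet-18 backbone, the temperature-scaled KL-divergence mimicking loss, and all names (MTSSStudent, distill_step, tau) are illustrative assumptions; the paper's label-embedding teachers and exact objectives are defined in the full text.

```python
# Hedged sketch of a multi-teacher-single-student distillation step.
# Assumptions (not from the paper): one frozen teacher per attribute task,
# a shared student backbone with one linear head per task, and a softened
# KL-divergence loss for mimicking teacher distributions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class MTSSStudent(nn.Module):
    """Single student: shared backbone plus one classification head per attribute task."""
    def __init__(self, num_classes_per_task):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # keep only the feature extractor
        self.backbone = backbone
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, c) for c in num_classes_per_task]
        )

    def forward(self, x):
        feat = self.backbone(x)
        return [head(feat) for head in self.heads]  # one logit vector per task


def distill_step(student, teachers, images, tau=4.0):
    """One distillation step: the student mimics each teacher's softened
    output distribution; teachers[i] must match the class count of head i."""
    student_logits = student(images)
    loss = 0.0
    for t_idx, teacher in enumerate(teachers):
        with torch.no_grad():                # teachers are frozen domain experts
            t_logits = teacher(images)
        p_teacher = F.softmax(t_logits / tau, dim=1)
        log_p_student = F.log_softmax(student_logits[t_idx] / tau, dim=1)
        loss = loss + F.kl_div(log_p_student, p_teacher,
                               reduction="batchmean") * tau ** 2
    return loss / len(teachers)
```

In this sketch the teacher outputs serve as soft targets, so the step can run on unlabeled images; a supervised term on labeled data would be added in practice.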

Keywords

Semi-supervised learning · Unlabeled data · Visual attributes

Supplementary material

Supplementary material 1: 504452_1_En_30_MOESM1_ESM.pdf (4.6 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Search Solutions Inc., Gyeonggi-do, Republic of Korea