Fast Cross-Scenario Clothing Retrieval Based on Indexing Deep Features

  • Zongmin Li
  • Yante Li
  • Yongbiao Gao
  • Yujie Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9916)


In this paper, we propose a new approach to large-scale retrieval of everyday clothing. Fast cross-scenario clothing image search is challenging because of the large number of clothing images on the internet and the visual differences between street photos (pictures of people wearing clothing, taken in daily life against complex backgrounds) and online shop photos (pictures of clothing items on people, captured by professionals in more controlled settings). We tackle cross-scenario clothing retrieval in two steps. First, we segment the clothing using coarse-to-fine hierarchical superpixel segmentation and pose estimation, which removes the background of the clothing image. Second, we represent the clothing item with deep features, which are intended to describe a wide variety of clothing effectively. In addition, to speed up retrieval over large-scale online clothing collections, we build an inverted index on the deep features by treating them as a Bag-of-Words model. In this way, similar clothing items are retrieved far faster. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches.
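
The inverted-indexing idea can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes one plausible reading of "treating deep features as Bag-of-Words", in which each feature dimension acts as a word and an image is posted only under its k strongest activations. The function names, the value of k, and the toy 6-dimensional vectors are all hypothetical.

```python
from collections import defaultdict
import math


def top_k_words(feature, k):
    """Return up to k (dimension, value) pairs with the largest positive activations."""
    ranked = sorted(enumerate(feature), key=lambda p: p[1], reverse=True)
    return [(dim, val) for dim, val in ranked[:k] if val > 0]


def build_index(features, k):
    """Map each dimension ("word") to a posting list of (image_id, weight)."""
    index = defaultdict(list)
    for image_id, feature in features.items():
        words = top_k_words(feature, k)
        # L2-normalise the kept activations so scores behave like a cosine.
        norm = math.sqrt(sum(v * v for _, v in words)) or 1.0
        for dim, val in words:
            index[dim].append((image_id, val / norm))
    return index


def search(index, query, k, top_n=5):
    """Score only the images that share at least one word with the query."""
    words = top_k_words(query, k)
    norm = math.sqrt(sum(v * v for _, v in words)) or 1.0
    scores = defaultdict(float)
    for dim, val in words:
        for image_id, weight in index.get(dim, ()):
            scores[image_id] += (val / norm) * weight
    return sorted(scores.items(), key=lambda p: p[1], reverse=True)[:top_n]


# Toy "deep features" with made-up values (assumption: real ones would
# come from a CNN layer and have thousands of dimensions).
shop_features = {
    "shop_dress": [0.9, 0.1, 0.0, 0.8, 0.0, 0.2],
    "shop_jeans": [0.0, 0.7, 0.9, 0.0, 0.1, 0.0],
    "shop_coat":  [0.1, 0.0, 0.2, 0.1, 0.9, 0.8],
}
index = build_index(shop_features, k=2)

street_query = [0.8, 0.0, 0.1, 0.7, 0.1, 0.0]
results = search(index, street_query, k=2)
print(results)
```

In this toy run only the dress shares words with the query, so the jeans and the coat are never scored. Skipping non-matching images in this way is where an inverted index would save time over a linear scan of all feature vectors.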


Keywords: Clothing retrieval, Over segmentation, Cross-scenario, Deep learning, Inverted index, Bag-of-Words



The authors gratefully acknowledge the support of the National Natural Science Foundation of China and the Scientific Research Foundation for the Excellent Middle-Aged and Youth Scientists of Shandong Province of China.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. College of Computer and Communication Engineering, China University of Petroleum (East China), Beijing, China
