Learning Type-Aware Embeddings for Fashion Compatibility

  • Mariya I. Vasileva
  • Bryan A. Plummer
  • Krishna Dusad
  • Shreya Rajpal
  • Ranjitha Kumar
  • David Forsyth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


Outfits in online fashion data are composed of items of many different types (e.g., top, bottom, shoes) that share some stylistic relationship with one another. A representation for building outfits requires a method that can learn notions of both similarity (for example, when two tops are interchangeable) and compatibility (items of possibly different types that can go together in an outfit). This paper presents an approach to learning an image embedding that respects item type and jointly learns notions of item similarity and compatibility in an end-to-end model. To evaluate the learned representation, we crawled 68,306 outfits created by users on the Polyvore website. Our approach obtains a 3–5% improvement over the state of the art on outfit compatibility prediction and fill-in-the-blank tasks using our dataset, as well as an established smaller dataset, while supporting a variety of useful queries (Code and data:


Keywords: Fashion, Embedding methods, Appearance representations



This work is supported in part by ONR MURI Award N00014-16-1-2007, in part by NSF under Grant No. NSF IIS-1421521, and in part by a Google MURA Award and an Amazon Research Faculty Award.




Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Mariya I. Vasileva (1)
  • Bryan A. Plummer (1), email author
  • Krishna Dusad (1)
  • Shreya Rajpal (1)
  • Ranjitha Kumar (1)
  • David Forsyth (1)

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
