Attention-Based Fusion for Outfit Recommendation

  • Conference paper
Fashion Recommender Systems

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

This paper describes an attention-based fusion method for outfit recommendation that fuses the information in the product image and description to capture the most important, fine-grained product features in the item representation. We experiment with different kinds of attention mechanisms and demonstrate that the attention-based fusion improves item understanding. We outperform state-of-the-art outfit recommendation results on three benchmark datasets.
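
To make the idea of attention-based fusion concrete, the following is a minimal sketch, not the method of this paper. It assumes a generic scaled dot-product attention in which a description embedding attends over image region features, followed by simple concatenation. The function names, input shapes, and random inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fuse(region_feats, text_vec):
    """Fuse image regions with a description vector via dot-product attention.

    region_feats: (R, d) array, one feature vector per image region (assumed input).
    text_vec:     (d,) array, embedding of the product description (assumed input).
    Returns the fused item vector of shape (2 * d,) and the attention weights of shape (R,).
    """
    d = region_feats.shape[1]
    scores = region_feats @ text_vec / np.sqrt(d)   # relevance of each region to the text
    weights = softmax(scores)                       # non-negative weights that sum to 1
    attended = weights @ region_feats               # text-guided summary of the image
    fused = np.concatenate([attended, text_vec])    # simple fusion: concatenate both views
    return fused, weights

# Hypothetical inputs: a 7x7 grid of region features and one description vector, d = 64.
rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 64))
text = rng.normal(size=64)
item_vec, attn = attention_fuse(regions, text)
print(item_vec.shape, float(attn.sum()))
```

In this sketch, regions whose features align with the description receive larger weights, so the fused vector emphasizes the image details that the text mentions. Other attention variants would change how the scores are computed but keep the same weighted-sum structure.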

Notes

  1. Alternatively, we could compare such item pairs in the semantic space instead. This has a negligible effect on experimental results.

  2. https://github.com/mvasil/fashion-compatibility

  3. https://github.com/xthan/polyvore-dataset

References

  1. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473. http://arxiv.org/abs/1409.0473

  2. Chen W, Huang P, Xu J, Guo X, Guo C, Sun F, Li C, Pfadler A, Zhao H, Zhao B (2019) POG: personalized outfit generation for fashion recommendation at Alibaba iFashion. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, pp 2662–2670

  3. Han X, Wu Z, Jiang YG, Davis LS (2017) Learning fashion compatibility with bidirectional LSTMs. In: ACM International Conference on Multimedia (ACM-MM)

  4. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778

  5. He R, Packer C, McAuley J (2016) Learning compatibility across categories for heterogeneous item recommendation. In: IEEE International Conference on Data Mining (ICDM)

  6. Hsiao W, Grauman K (2018) Creating capsule wardrobes from fashion images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7161–7170

  7. Li Y, Cao L, Zhu J, Luo J (2017) Mining fashion outfit composition using an end-to-end deep learning approach on set data. IEEE Trans Multimedia 19:1946–1955

  8. Li X, Song J, Gao L, Liu X, Huang W, Gan C, He X (2019) Beyond RNNs: positional self-attention with co-attention for video question answering. In: AAAI Conference on Artificial Intelligence

  9. Lin Y, Ren P, Chen Z, Ren Z, Ma J, de Rijke M (2019) Improving outfit recommendation with co-supervision of fashion generation. In: The World Wide Web Conference, pp 1095–1105

  10. Lu J, Yang J, Batra D, Parikh D (2016) Hierarchical question-image co-attention for visual question answering. In: Advances in Neural Information Processing Systems (NIPS), pp 289–297

  11. Nam H, Ha JW, Kim J (2017) Dual attention networks for multimodal reasoning and matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

  12. Seo MJ, Kembhavi A, Farhadi A, Hajishirzi H (2017) Bidirectional attention flow for machine comprehension. In: International Conference on Learning Representations (ICLR)

  13. Simo-Serra E, Fidler S, Moreno-Noguer F, Urtasun R (2015) Neuroaesthetics in fashion: modeling the perception of fashionability. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 869–877

  14. Vasileva MI, Plummer BA, Dusad K, Rajpal S, Kumar R, Forsyth DA (2018) Learning type-aware embeddings for fashion compatibility. In: The European Conference on Computer Vision (ECCV)

  15. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in Neural Information Processing Systems (NIPS), pp 5998–6008

  16. Veit A, Kovacs B, Bell S, McAuley J, Bala K, Belongie S (2015) Learning visual clothing style with heterogeneous dyadic co-occurrences. In: IEEE International Conference on Computer Vision (ICCV), pp 4642–4650

  17. Yang Z, He X, Gao J, Deng L, Smola AJ (2016) Stacked attention networks for image question answering. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 21–29

  18. Yu Z, Yu J, Fan J, Tao D (2017) Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In: IEEE International Conference on Computer Vision (ICCV), pp 1839–1848

Acknowledgements

The first author is supported by a grant of the Research Foundation – Flanders (FWO) no. 1S55420N.

Author information

Corresponding author

Correspondence to Katrien Laenen.

Appendix

A Dataset Item Types

Table 2 gives an overview of the different item types in the Polyvore68K dataset versions and the types that remain in the Polyvore21K dataset after cleaning.

Table 2 Item types kept in the Polyvore68K and Polyvore21K datasets

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Laenen, K., Moens, MF. (2020). Attention-Based Fusion for Outfit Recommendation. In: Dokoohaki, N. (eds) Fashion Recommender Systems. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-55218-3_4
