Abstract
This paper describes an attention-based fusion method for outfit recommendation that fuses information from the product image and description to capture the most important, fine-grained product features in the item representation. We experiment with different kinds of attention mechanisms and demonstrate that attention-based fusion improves item understanding. We outperform state-of-the-art outfit recommendation results on three benchmark datasets.
Notes
- 1. Alternatively, we could compare such item pairs in the semantic space instead. This has a negligible effect on experimental results.
Acknowledgements
The first author is supported by a grant from the Research Foundation – Flanders (FWO), no. 1S55420N.
Appendix
A Dataset Item Types
Table 2 gives an overview of the different item types in the Polyvore68K dataset versions and the types that remain in the Polyvore21K dataset after cleaning.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Laenen, K., Moens, MF. (2020). Attention-Based Fusion for Outfit Recommendation. In: Dokoohaki, N. (eds) Fashion Recommender Systems. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-55218-3_4
DOI: https://doi.org/10.1007/978-3-030-55218-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55217-6
Online ISBN: 978-3-030-55218-3
eBook Packages: Mathematics and Statistics (R0)