Abstract
In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on the domain of fashion and introduce Fashionpedia as a step toward mapping out the visual aspects of the fashion world. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel categories, 19 apparel parts, 294 fine-grained attributes and their relationships; (2) a dataset with everyday and celebrity event fashion images annotated with segmentation masks and their associated per-mask fine-grained attributes, built upon the Fashionpedia ontology. In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task. Fashionpedia is available at: https://fashionpedia.github.io/home/.
M. Jia, M. Shi, M. Sirotenko and Y. Cui—Equal Contribution.
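As an illustration of what the task output looks like, the following is a minimal sketch of a per-instance annotation that pairs a segmentation mask with one category and a set of localized attributes. It is not the Fashionpedia annotation format or the authors' code. The names `InstanceAnnotation`, `attribute_target`, and `mask_area` are hypothetical, as is the nested-list mask encoding. Only the counts (27 main categories, 19 apparel parts, 294 attributes) come from the abstract, and treating apparel parts as segmentable categories alongside the main categories is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Counts taken from the abstract; combining main categories and parts into
# one label space is an assumption made for this sketch.
NUM_CATEGORIES = 27 + 19
NUM_ATTRIBUTES = 294


@dataclass
class InstanceAnnotation:
    """One object instance: a mask, one category, and zero or more attributes."""
    category_id: int                  # hypothetical index in [0, NUM_CATEGORIES)
    mask: List[List[int]]             # binary mask as rows of 0/1 (hypothetical encoding)
    attribute_ids: Set[int] = field(default_factory=set)


def attribute_target(ann: InstanceAnnotation) -> List[int]:
    """Build a multi-hot vector over attributes (a multi-label target)."""
    target = [0] * NUM_ATTRIBUTES
    for attr in ann.attribute_ids:
        if not 0 <= attr < NUM_ATTRIBUTES:
            raise ValueError(f"attribute id {attr} is out of range")
        target[attr] = 1
    return target


def mask_area(ann: InstanceAnnotation) -> int:
    """Count the foreground pixels of the instance mask."""
    return sum(sum(row) for row in ann.mask)


# Example: one instance carrying two attributes on a tiny 2x3 mask.
example = InstanceAnnotation(
    category_id=5,
    mask=[[0, 1, 1], [0, 1, 0]],
    attribute_ids={3, 10},
)
example_target = attribute_target(example)
```

Because each instance may carry one or several attributes, the target is multi-hot rather than one-hot, which is what distinguishes the attribute output from the single-label category output.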
Acknowledgements
This research was partially supported by a Google Faculty Research Award. We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, Nayanathara Palanivel, and Konstantin Lopuhin for their helpful feedback and discussion during the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help in developing the fine-grained attribute annotation tool.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jia, M. et al. (2020). Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham. https://doi.org/10.1007/978-3-030-58452-8_19
DOI: https://doi.org/10.1007/978-3-030-58452-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58451-1
Online ISBN: 978-3-030-58452-8
eBook Packages: Computer Science, Computer Science (R0)