Abstract
Zero-shot object detection (ZSD) has recently been proposed for detecting objects whose categories have never been seen during training. Existing ZSD works have some drawbacks: (a) end-to-end methods sacrifice mean average precision (mAP) on seen classes; (b) feature-based methods avoid this problem but suffer from overly simple feature construction. In this paper, we therefore present a succinct but effective feature-based ZSD model whose feature construction leverages the deep feature embedding of the detector itself as the visual features of the detected objects. These features, named “Detection Feature” (DetFeat), contain not only visual representations but also context and position information, which makes them more discriminative for both seen and unseen objects. Additionally, we simulate the construction of attributes defined by human experts to generate a label embedding specific to the ZSD task, named “Simulated Attributes” (Simu-Attr). We find that Simu-Attr promotes better alignment between the visual and semantic spaces, alleviating the semantic gap. Extensive experiments show that our approach improves detection performance on unseen classes while maintaining high detection performance on seen classes. On the challenging COCO dataset, we surpass the best existing transductive ZSD method, TL-ZSD, by about 1% on unseen classes and about 10% on seen classes, using mAP as the metric.
Acknowledgments
This work is supported by the National Key Research and Development Program of China under Grant 2019YFC0118200 and the National Natural Science Foundation of China under Grant 6180332.
Ethics declarations
Conflict of Interests
We have no conflict of interest.
About this article
Cite this article
Yang, C., Wu, W., Wang, Y. et al. A novel feature-based model for zero-shot object detection with simulated attributes. Appl Intell 52, 6905–6914 (2022). https://doi.org/10.1007/s10489-021-02746-z
DOI: https://doi.org/10.1007/s10489-021-02746-z