Abstract
One key challenge in zero-shot classification (ZSC) is exploiting the knowledge hidden in unseen classes. Generative methods such as generative adversarial networks (GANs) are typically employed to synthesize the visual information of unseen classes. However, most of these methods exploit global semantic features while neglecting the discriminative differences among local semantic features when synthesizing images, which may lead to sub-optimal results; in fact, local semantic information can provide more discriminative knowledge than global information. To this end, this paper presents a new triple-discriminator GAN for ZSC, called TDGAN, which incorporates a text-reconstruction network into a dual discriminator GAN (D2GAN) to realize cross-modal mapping from text descriptions to their visual representations. The text-reconstruction network focuses on key text descriptions to align semantic relationships, enabling the synthetic visual features to effectively represent images. Sharma-Mittal entropy is incorporated into the loss function to make the distribution of synthetic classes as close as possible to that of real classes. Extensive experiments on the Caltech-UCSD Birds-200-2011 and North America Birds datasets demonstrate that the proposed TDGAN consistently yields competitive performance compared with several state-of-the-art ZSC methods.
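The Sharma-Mittal entropy mentioned above can be made concrete. The sketch below implements the standard two-parameter Sharma-Mittal definition for a discrete distribution; it is an illustration of the quantity itself, not the paper's exact loss term, and the parameter names `alpha` and `beta` are the conventional ones rather than anything specified in this abstract.

```python
import numpy as np

def sharma_mittal_entropy(p, alpha, beta):
    """Sharma-Mittal entropy of a discrete distribution p.

    H_{alpha,beta}(p) = [ (sum_i p_i^alpha)^((1-beta)/(1-alpha)) - 1 ] / (1 - beta)

    Requires alpha != 1 and beta != 1. In the appropriate limits it
    recovers Renyi (beta -> 1), Tsallis (beta -> alpha), and Shannon
    (alpha, beta -> 1) entropies.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalise defensively
    s = np.sum(p ** alpha)
    return (s ** ((1.0 - beta) / (1.0 - alpha)) - 1.0) / (1.0 - beta)
```

For a uniform distribution over n outcomes the expression collapses to (n^(1-beta) - 1) / (1 - beta), which gives a quick sanity check on the implementation.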
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61771329, 61632018).
Cite this article
Ji, Z., Yan, J., Wang, Q. et al. Triple discriminator generative adversarial network for zero-shot image classification. Sci. China Inf. Sci. 64, 120101 (2021). https://doi.org/10.1007/s11432-020-3032-8