Abstract
Traditional deep generative models rely on enormous amounts of training data to generate images of a given class. However, they face the challenges of expensive and time-consuming data acquisition, as well as the need to learn quickly from limited data of new categories. In this study, a contrastive meta-learning generative adversarial network (CML-GAN) is proposed to generate novel images of unseen classes from a few images by applying a self-supervised contrastive learning strategy within a fast adaptive meta-learning framework. By introducing a meta-learning framework into a GAN-based model, our model can efficiently learn feature representations and quickly adapt to new generation tasks with only a few samples. The proposed model takes the original input images and the images generated by the GAN-based model as inputs and evaluates both a contrastive loss and a distance loss on the feature representations of the inputs extracted by the encoder. An original input image and its generated version from the generator are treated as a positive pair, while the remaining generated images in the same batch are treated as negative samples. The model then learns to differentiate positive samples from negative ones, producing representations that pull positive pairs together and push negatives apart, which prevents model overfitting. Thus, our model can generalize to generate diverse images from only a few samples of unseen categories, while adapting quickly to new image generation tasks. Furthermore, the effectiveness of our model is demonstrated through extensive experiments on three datasets.
Data availability
The datasets used to support the experiments in this paper can be accessed via the official links: MNIST (http://yann.lecun.com/exdb/mnist/), Omniglot (https://github.com/brendenlake/omniglot/), VGG-Face (https://www.robots.ox.ac.uk/~vgg/data/vgg_face/).
References
van den Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: International Conference on Machine Learning, pp. 1747–1756 (2016)
Rezende, D.J., Mohamed, S.: Variational inference with normalizing flows. In: International Conference on Machine Learning (2015)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014)
Li, H., Zhong, Z., Guan, W., Du, C., Yang, Y., Wei, Y., Ye, C.: Generative character inpainting guided by structural information. Vis. Comput. 37, 1–12 (2021)
Li, L., Tang, J., Ye, Z., Sheng, B., Mao, L., Ma, L.: Unsupervised face super-resolution via gradient enhancement and semantic guidance. Vis. Comput. 37, 1–13 (2021)
Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. CoRR arXiv:1312.6114 (2014)
Bartunov, S., Vetrov, D.: Few-shot generative modelling with generative matching networks. In: International Conference on Artificial Intelligence and Statistics, pp. 670–678 (2018)
Clouâtre, L., Demers, M.: FIGR: few-shot image generation with Reptile. CoRR (2019)
Liang, W., Liu, Z., Liu, C.: DAWSON: a domain adaptive few shot generation framework. CoRR arXiv:2001.00576 (2020)
Phaphuangwittayakul, A., Guo, Y., Ying, F.: Fast adaptive meta-learning for few-shot image generation. IEEE Trans. Multimed. 24, 2205–2217 (2021)
Jaiswal, A., Babu, A.R., Zadeh, M.Z., Banerjee, D., Makedon, F.: A survey on contrastive self-supervised learning. Technologies 9(1), 2 (2021)
Wang, Y., Wu, X.-M., Li, Q., Gu, J., Xiang, W., Zhang, L., Li, V.O.K.: Large margin few-shot learning. CoRR arXiv:1807.02872 (2018)
Xiao, C., Madapana, N., Wachs, J.: One-shot image recognition using prototypical encoders with reduced hubness. In: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2252–2261 (2021)
Andrychowicz, M., Denil, M., Colmenarejo, S.G., Hoffman, M.W., Pfau, D., Schaul, T., de Freitas, N.: Learning to learn by gradient descent by gradient descent. CoRR arXiv:1606.04474 (2016)
Munkhdalai, T., Yu, H.: Meta networks. In: International Conference on Machine Learning, pp. 2554–2563 (2017)
Gidaris, S., Komodakis, N.: Dynamic few-shot visual learning without forgetting. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 4367–4375 (2018)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126–1135 (2017)
Nichol, A., Achiam, J., Schulman, J.: On First-Order Meta-Learning Algorithms. CoRR arXiv:1803.02999 (2018)
Jamal, M.A., Qi, G.-J.: Task agnostic meta-learning for few-shot learning. In: Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11719–11727 (2019)
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: International Conference on Learning Representations. OpenReview.net (2017)
Lake, B., Salakhutdinov, R., Gross, J., Tenenbaum, J.: One shot learning of simple visual concepts. In: Proceedings of Annual Meeting of the Cognitive Science Society, vol. 33, No. 33 (2011)
Rezende, D.J., Mohamed, S., Danihelka, I., Gregor, K., Wierstra, D.: One-shot generalization in deep generative models. arXiv preprint arXiv:1603.05106 (2016)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Antoniou, A., Storkey, A.J., Edwards, H.: Data augmentation generative adversarial networks. CoRR arXiv:1711.04340 (2017)
Hong, Y., Niu, L., Zhang, J., Zhang, L.: MatchingGAN: matching-based few-shot image generation. In: 2020 IEEE International Conference on Multimedia Expo, pp. 1–6 (2020)
Hong, Y., Niu, L., Zhang, J., Zhao, W., Fu, C., Zhang, L.: F2GAN: fusing-and-filling GAN for few-shot image generation. In: Proceedings of 28th ACM International Conference on Multimedia, pp. 2535–2543 (2020)
van den Oord, A., Li, Y., Vinyals, O.: Representation Learning with Contrastive Predictive Coding. CoRR arXiv:1807.03748 (2018)
Li, J., Zhou, P., Xiong, C., Hoi, S.C.H.: Prototypical contrastive learning of unsupervised representations. In: International Conference on Learning Representations. OpenReview.net (2021)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020)
Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. In: European Conference on Computer Vision, vol. 12356, pp. 776–794. Springer (2020)
Wang, J., Wang, Y., Liu, S., Li, A.: Few-shot fine-grained action recognition via bidirectional attention and contrastive meta-learning. CoRR arXiv:2108.06647 (2021)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
LeCun, Y., Cortes, C.: MNIST handwritten digit database. AT&T Labs. Available http://yann.lecun.com/exdb/mnist (2010)
Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2018), pp. 67–74. IEEE (2018)
Jolicoeur-Martineau, A.: The relativistic discriminator: a key element missing from standard GAN. In: International Conference on Learning Representations (2019)
Mao, Q., Lee, H.Y., Tseng, H.Y., Ma, S., Yang, M.H.: Mode seeking generative adversarial networks for diverse image synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1429–1437 (2019)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30, 6626–6637 (2017)
Xu, Q., Huang, G., Yuan, Y., Guo, C., Sun, Y., Wu, F., Weinberger, K.Q.: An empirical study on evaluation metrics of generative adversarial networks. CoRR arXiv:1806.07755 (2018)
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H.S., Hospedales, T.M.: Learning to compare: relation network for few-shot learning. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Varghese, D., Tamaddoni-Nezhad, A., Moschoyiannis, S., Fodor, P., Vanthienen, J., Inclezan, D., Nikolov, N.: One-shot rule learning for challenging character recognition. RuleML+ RR, pp. 10–27 (2020)
Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML, vol. 30, No. 1, p. 3 (2013)
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: International Conference on Learning Representations (ICLR) (2018)
Acknowledgements
This research is financially supported by the National Key Research and Development Program of China (No. 2020YFA0907800); partially supported by the open funding project of the State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China; also supported by the National Key Research and Development Program of China (Grant No. 2018YFC0807105) and the Science and Technology Committee of Shanghai Municipality (STCSM) (Grant Nos. 17DZ1101003, 18511106602, and 18DZ2252300); and the International College of Digital Innovation (ICDI), Chiang Mai University, Thailand.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest in the submission of this article for publication.
Appendices
Appendix 1: Details of the network architecture
1.1 Generator
The generator and discriminator networks are adapted from RaGAN [35]. The generator consists of one fully connected layer followed by three blocks of \(4 \times 4\) deconvolution layers with stride 2 for upsampling; each deconvolution block uses batch normalization [43] and a ReLU activation function. The last block of the generator is a \(3 \times 3\) deconvolution layer with stride 1 followed by a Tanh activation function. The input to the generator is a random noise vector \(z\).
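The architecture above can be sketched in PyTorch as follows. The kernel sizes, strides, and activations match the description; the channel widths, the \(4 \times 4\) starting resolution, the \(32 \times 32\) output size, and the 100-dimensional noise vector are assumptions for illustration, not values taken from the paper.

```python
# Hypothetical sketch of the generator described above. Layer widths and
# image resolution are assumed; only kernel/stride/activation choices
# follow the text.
import torch
import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, z_dim=100, base=256, img_channels=1):
        super().__init__()
        self.base = base
        # Fully connected layer projects the noise vector z to a 4x4 feature map.
        self.fc = nn.Linear(z_dim, base * 4 * 4)

        # Each 4x4 stride-2 deconvolution block (with BatchNorm + ReLU)
        # doubles the spatial resolution: 4 -> 8 -> 16 -> 32.
        def block(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.up = nn.Sequential(
            block(base, base // 2),
            block(base // 2, base // 4),
            block(base // 4, base // 8),
        )
        # Final 3x3 stride-1 deconvolution + Tanh maps features to image
        # channels with pixel values in (-1, 1).
        self.out = nn.Sequential(
            nn.ConvTranspose2d(base // 8, img_channels, 3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, self.base, 4, 4)
        return self.out(self.up(h))


g = Generator()
img = g(torch.randn(2, 100))
print(img.shape)  # torch.Size([2, 1, 32, 32])
```

With a 4×4 stride-2 deconvolution and padding 1, the output side length is exactly double the input, so three such blocks carry a 4×4 feature map to 32×32.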
1.2 Discriminator
The discriminator is composed of three blocks, each containing a \(3 \times 3\) convolution layer with stride 1 and a \(4 \times 4\) convolution layer with stride 2 for downsampling. Both convolution layers in each block are followed by a LeakyReLU [44] activation function and spectral normalization [45]. The last block is a \(3 \times 3\) convolution layer with stride 1, followed by a LeakyReLU activation function, spectral normalization, and one fully connected layer. The discriminator outputs a single value for predicting whether a sample is real or fake.
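A sketch of this discriminator, under the same assumed channel widths and 32×32 input resolution as the generator sketch (the paper specifies only the kernels, strides, normalization, and activations):

```python
# Hypothetical sketch of the discriminator described above. Channel widths
# and input resolution are assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


def d_block(cin, cout):
    # One block: 3x3 stride-1 conv, then 4x4 stride-2 conv; each conv uses
    # spectral normalization and is followed by LeakyReLU. The stride-2
    # conv halves the spatial resolution.
    return nn.Sequential(
        spectral_norm(nn.Conv2d(cin, cout, 3, stride=1, padding=1)),
        nn.LeakyReLU(0.2),
        spectral_norm(nn.Conv2d(cout, cout, 4, stride=2, padding=1)),
        nn.LeakyReLU(0.2),
    )


class Discriminator(nn.Module):
    def __init__(self, img_channels=1, base=32, out_dim=1):
        super().__init__()
        self.features = nn.Sequential(
            d_block(img_channels, base),   # 32 -> 16
            d_block(base, base * 2),       # 16 -> 8
            d_block(base * 2, base * 4),   # 8 -> 4
            # Last block: 3x3 stride-1 conv + LeakyReLU, resolution unchanged.
            spectral_norm(nn.Conv2d(base * 4, base * 4, 3, stride=1, padding=1)),
            nn.LeakyReLU(0.2),
        )
        # out_dim=1 gives the real/fake discriminator output.
        self.fc = nn.Linear(base * 4 * 4 * 4, out_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.fc(h)


d = Discriminator()
logits = d(torch.randn(2, 1, 32, 32))
print(logits.shape)  # torch.Size([2, 1])
```

Note that, per Appendix 1.3, constructing the same network with `out_dim=100` would yield the encoder: the architecture is shared, and only the fully connected layer's output dimension changes.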
1.3 Encoder
The encoder uses the same network architecture as the discriminator, except for the output dimension of the fully connected layer. Instead of being set to 1 as in the discriminator, it is set to the dimension of the latent features; as a result, the output dimension of the encoder is 100.
Appendix 2: More generation results
More example images generated by CML-GAN from an input noise vector z on the MNIST, Omniglot, and VGG-Face datasets are provided in Figs. 6, 7, and 8, respectively. Four images from testing tasks in unseen categories are sampled and used to test the model. As the sampled images show, CML-GAN can generate plausible and diverse images. However, generating images with within-alphabet structural variation in the Omniglot dataset remains challenging, because effective internal structural features are required for generation. Therefore, as shown in Fig. 7, the images generated from the Omniglot dataset by CML-GAN exhibit good spatial variation but little structural variation in the strokes.
Appendix 3: More interpolation results
The interpolation results are extended to four output images generated by CML-GAN. Using these four output images from different angles, the interpolated images for the MNIST, Omniglot, and VGG-Face datasets are shown in Figs. 9, 10, and 11, respectively.
Cite this article
Phaphuangwittayakul, A., Ying, F., Guo, Y. et al. Few-shot image generation based on contrastive meta-learning generative adversarial network. Vis Comput 39, 4015–4028 (2023). https://doi.org/10.1007/s00371-022-02566-3