RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

Style transfer has attracted much interest owing to its wide range of applications. Compared with English character or general artistic style transfer, Chinese character style transfer remains challenging owing to the very large vocabulary (70,244 characters in GB18030-2005) and the structural complexity of the characters. Recently, several GAN-based methods have been proposed for style transfer; however, they treat each Chinese character as a whole, ignoring the structures and radicals from which characters are composed. In this paper, a novel radical decomposition-and-rendering-based GAN (RD-GAN) is proposed to exploit the radical-level compositions of Chinese characters and achieve few-shot/zero-shot Chinese character style transfer. The RD-GAN consists of three components: a radical extraction module (REM), a radical rendering module (RRM), and a multi-level discriminator (MLD). Experiments demonstrate that our method achieves powerful few-shot/zero-shot generalization by exploiting the radical-level compositions of Chinese characters.
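No reference implementation accompanies this page, so the following is only a minimal PyTorch sketch of how the REM → RRM generator pipeline described in the abstract might fit together. The module names come from the paper; every internal detail (layer sizes, the mask-based radical decomposition, the fixed number of radical slots) is an illustrative assumption, and the multi-level discriminator (MLD) and training losses are omitted. The zero-shot ability the abstract claims follows from this factorization: an unseen character is composed of radicals already seen during training, so per-radical rendering transfers directly.

```python
import torch
import torch.nn as nn


class RadicalExtractionModule(nn.Module):
    """REM sketch: encode a source character image and split the shared
    feature map into per-radical features via soft spatial masks.
    The attention-style decomposition here is an assumption."""

    def __init__(self, num_radicals=4, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # One 1x1 head per radical slot yields a soft mask over locations.
        self.attn = nn.Conv2d(feat_dim, num_radicals, 1)

    def forward(self, x):
        f = self.encoder(x)                          # (B, C, H, W)
        masks = torch.softmax(self.attn(f), dim=1)   # (B, R, H, W)
        # Mask the shared map once per radical slot.
        return f.unsqueeze(1) * masks.unsqueeze(2)   # (B, R, C, H, W)


class RadicalRenderingModule(nn.Module):
    """RRM sketch: re-render the extracted radical features as a
    character image in the target style."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, radical_feats):
        # Merge the radical slots back onto one canvas, then decode.
        return self.decoder(radical_feats.sum(dim=1))


class RDGAN(nn.Module):
    """Generator only; the multi-level discriminator (MLD) is omitted."""

    def __init__(self):
        super().__init__()
        self.rem = RadicalExtractionModule()
        self.rrm = RadicalRenderingModule()

    def forward(self, src):
        return self.rrm(self.rem(src))


# Smoke test on a batch of two 64x64 grayscale character images.
out = RDGAN()(torch.randn(2, 1, 64, 64))
print(out.shape)  # torch.Size([2, 1, 64, 64])
```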

Keywords

GAN · Style transfer · Radical decomposition · Few-shot/zero-shot learning

Acknowledgement

This research is supported in part by NSFC (Grant No. 61936003), GD-NSF (Grant No. 2017A030312006), the Alibaba Innovative Research Foundation (Grant No. D8200510), and the Fundamental Research Funds for the Central Universities (Grant No. D2190570).

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
  2. Alibaba Group, Hangzhou, China
