Image-to-Markup Generation via Paired Adversarial Learning

  • Jin-Wen Wu
  • Fei Yin
  • Yan-Ming Zhang
  • Xu-Yao Zhang
  • Cheng-Lin Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


Motivated by the fact that humans can grasp semantic-invariant features shared by the same category while attention-based models focus mainly on discriminative features of each object, we propose a scalable paired adversarial learning (PAL) method for image-to-markup generation. PAL can incorporate the prior knowledge of standard templates to guide the attention-based model for discovering semantic-invariant features when the model pays attention to regions of interest. Furthermore, we also extend the convolutional attention mechanism to speed up the image-to-markup parsing process while achieving competitive performance compared with recurrent attention models. We evaluate the proposed method in the scenario of handwritten-image-to-LaTeX generation, i.e., converting handwritten mathematical expressions to LaTeX. Experimental results show that our method can significantly improve the generalization performance over standard attention-based encoder-decoder models.
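The abstract describes pairing each handwritten image with a standard printed template and using an adversarial game to push the encoder toward semantic-invariant features. As a minimal illustrative sketch (not the paper's implementation), the core idea can be written as a non-saturating GAN-style objective over paired features: a discriminator tries to tell template features from handwritten ones, while the encoder is trained to make them indistinguishable. The function and variable names (`paired_adversarial_losses`, `f_hw`, `f_tpl`) and the toy logistic discriminator are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def paired_adversarial_losses(f_hw, f_tpl, w, b):
    """Toy sketch of a paired adversarial objective (illustrative only).

    f_hw  : (n, d) features the attention model extracts from handwritten images
    f_tpl : (n, d) features extracted from the paired standard templates
    w, b  : parameters of a toy logistic discriminator D(f) = sigmoid(f @ w + b)

    D is trained to label template features 1 and handwritten features 0;
    the encoder is trained against the non-saturating generator loss so that
    handwritten features become indistinguishable from template features,
    i.e. semantic-invariant across writing styles.
    """
    eps = 1e-12
    p_tpl = sigmoid(f_tpl @ w + b)  # D's belief that template features are "real"
    p_hw = sigmoid(f_hw @ w + b)    # D's belief for the paired handwritten features
    # Discriminator loss: classify template vs. handwritten features correctly.
    d_loss = -np.mean(np.log(p_tpl + eps)) - np.mean(np.log(1.0 - p_hw + eps))
    # Encoder ("generator") loss: fool D into scoring handwritten features as templates.
    g_loss = -np.mean(np.log(p_hw + eps))
    return d_loss, g_loss
```

In the actual system these losses would be combined with the decoder's markup cross-entropy loss, so the encoder is jointly trained to be both discriminative for decoding and invariant across handwriting styles.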


Keywords: Paired adversarial learning · Semantic-invariant features · Convolutional attention · Handwritten-image-to-LaTeX generation



This work has been supported by the National Natural Science Foundation of China (NSFC) Grants 61721004, 61411136002, 61773376, 61633021 and 61733007.

The authors want to thank Yi-Chao Wu for insightful comments and suggestions.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jin-Wen Wu (1, 2)
  • Fei Yin (1)
  • Yan-Ming Zhang (1)
  • Xu-Yao Zhang (1)
  • Cheng-Lin Liu (1, 2, 3)
  1. NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, People’s Republic of China
  2. University of Chinese Academy of Sciences, Beijing, People’s Republic of China
  3. CAS Center for Excellence of Brain Science and Intelligence Technology, Beijing, People’s Republic of China
