Can Pre-trained Language Models Help in Understanding Handwritten Symbols?

  • Conference paper
  • In: Document Analysis and Recognition – ICDAR 2023 Workshops (ICDAR 2023)

Abstract

The emergence of transformer models such as BERT, GPT-2, GPT-3, RoBERTa, and T5 for natural language understanding has opened the floodgates towards solving a wide array of machine learning tasks in other modalities such as images, audio, music, and sketches. These language models are domain-agnostic and can therefore be applied to 1-D sequences of any kind. The key challenge, however, lies in bridging the modality gap so that they generate strong features for out-of-domain tasks. This work focuses on leveraging the power of such pre-trained language models and discusses the challenges in predicting handwritten symbols and alphabets.
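The paper does not include code at this point; as a purely illustrative sketch of the general recipe the abstract describes (and not the authors' actual architecture), the snippet below assumes a handwritten symbol is available as a sequence of (x, y, pen-state) points, projects each point into the embedding space of a frozen pre-trained GPT-2 obtained through the Hugging Face transformers library, and pools the resulting hidden states for classification. The class StrokeLMEncoder, the linear bridge point_proj, and the class count of 50 are hypothetical choices made for this example only.

    import torch
    import torch.nn as nn
    from transformers import GPT2Model

    class StrokeLMEncoder(nn.Module):
        # Hypothetical bridge: a frozen pre-trained GPT-2 consumes projected pen points.
        def __init__(self, lm_name="gpt2", n_classes=50):
            super().__init__()
            self.lm = GPT2Model.from_pretrained(lm_name)
            for p in self.lm.parameters():           # keep the language model frozen
                p.requires_grad = False
            hidden = self.lm.config.n_embd           # 768 for the base GPT-2
            self.point_proj = nn.Linear(3, hidden)   # (x, y, pen-state) -> LM embedding space
            self.classifier = nn.Linear(hidden, n_classes)

        def forward(self, strokes):                  # strokes: (batch, seq_len, 3)
            inputs_embeds = self.point_proj(strokes)
            hidden_states = self.lm(inputs_embeds=inputs_embeds).last_hidden_state
            features = hidden_states.mean(dim=1)     # average-pool over the pen sequence
            return self.classifier(features)

    # Toy usage: a batch of 4 symbols, each sampled at 64 pen points.
    model = StrokeLMEncoder()
    logits = model(torch.randn(4, 64, 3))
    print(logits.shape)                              # torch.Size([4, 50])

In this sketch only the two linear layers are trainable; the language model's weights stay fixed, which is one simple way to probe whether its pre-trained sequence representations transfer to the handwriting domain.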


Notes

  1. https://huggingface.co/.

  2. https://github.com/brendenlake/omniglot.


Acknowledgment

This work has been partially supported by the Spanish project PID2021-126808OB-I00, the Catalan project 2021 SGR 01559 and the PhD Scholarship from AGAUR (2021FIB-10010). The Computer Vision Center is part of the CERCA Program / Generalitat de Catalunya.

Author information

Corresponding author: Sanket Biswas.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tiwari, A., Biswas, S., Lladós, J. (2023). Can Pre-trained Language Models Help in Understanding Handwritten Symbols? In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-41498-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41497-8

  • Online ISBN: 978-3-031-41498-5

  • eBook Packages: Computer Science, Computer Science (R0)
