Can Pre-trained Language Models Help in Understanding Handwritten Symbols?

  • Conference paper
  • In: Document Analysis and Recognition – ICDAR 2023 Workshops (ICDAR 2023)

Abstract

The emergence of transformer models such as BERT, GPT-2, GPT-3, RoBERTa, and T5 for natural language understanding has opened the floodgates towards solving a wide array of machine learning tasks in other modalities such as images, audio, music, and sketches. These language models are domain-agnostic and can therefore be applied to 1-D sequences of any kind. The key challenge, however, lies in bridging the modality gap so that they generate strong features for out-of-domain tasks. This work focuses on leveraging the power of such pre-trained language models and discusses the challenges in predicting handwritten symbols and alphabets.
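The paper does not include code at this point; as a purely illustrative sketch of the general recipe the abstract describes (and not the authors' actual architecture), the snippet below assumes a handwritten symbol is available as a sequence of (x, y, pen-state) points, projects each point into the embedding space of a frozen pre-trained GPT-2 obtained through the Hugging Face transformers library, and pools the resulting hidden states for classification. The class StrokeLMEncoder, the linear bridge point_proj, and the class count of 50 are hypothetical choices made for this example only.

    import torch
    import torch.nn as nn
    from transformers import GPT2Model

    class StrokeLMEncoder(nn.Module):
        # Hypothetical bridge: a frozen pre-trained GPT-2 consumes projected pen points.
        def __init__(self, lm_name="gpt2", n_classes=50):
            super().__init__()
            self.lm = GPT2Model.from_pretrained(lm_name)
            for p in self.lm.parameters():           # keep the language model frozen
                p.requires_grad = False
            hidden = self.lm.config.n_embd           # 768 for the base GPT-2
            self.point_proj = nn.Linear(3, hidden)   # (x, y, pen-state) -> LM embedding space
            self.classifier = nn.Linear(hidden, n_classes)

        def forward(self, strokes):                  # strokes: (batch, seq_len, 3)
            inputs_embeds = self.point_proj(strokes)
            hidden_states = self.lm(inputs_embeds=inputs_embeds).last_hidden_state
            features = hidden_states.mean(dim=1)     # average-pool over the pen sequence
            return self.classifier(features)

    # Toy usage: a batch of 4 symbols, each sampled at 64 pen points.
    model = StrokeLMEncoder()
    logits = model(torch.randn(4, 64, 3))
    print(logits.shape)                              # torch.Size([4, 50])

In this sketch only the two linear layers are trainable; the language model's weights stay fixed, which is one simple way to probe whether its pre-trained sequence representations transfer to the handwriting domain.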


Notes

  1. https://huggingface.co/.

  2. https://github.com/brendenlake/omniglot.


Acknowledgment

This work has been partially supported by the Spanish project PID2021-126808OB-I00, the Catalan project 2021 SGR 01559 and the PhD Scholarship from AGAUR (2021FIB-10010). The Computer Vision Center is part of the CERCA Program / Generalitat de Catalunya.

Author information

Corresponding author: Sanket Biswas.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tiwari, A., Biswas, S., Lladós, J. (2023). Can Pre-trained Language Models Help in Understanding Handwritten Symbols? In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-41498-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41497-8

  • Online ISBN: 978-3-031-41498-5

  • eBook Packages: Computer Science, Computer Science (R0)
