Abstract
The emergence of transformer-based language models such as BERT, GPT-2, GPT-3, RoBERTa, and T5 for natural language understanding has opened the floodgates to solving a wide array of machine learning tasks in other modalities, such as images, audio, music, and sketches. These language models are domain-agnostic and can therefore be applied to 1-D sequences of any kind. The key challenge, however, lies in bridging the modality gap so that they generate strong features for out-of-domain tasks. This work leverages such pre-trained language models and discusses the challenges in recognizing handwritten symbols and alphabets.
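To make the idea concrete, the sketch below shows one hedged way a frozen pre-trained language model could consume a non-text 1-D sequence: a handwritten symbol represented as pen strokes is mapped into the model's embedding space by a small "bridge" layer. The stroke format (dx, dy, pen-state), the linear bridge, and the mean-pooled feature are illustrative assumptions, not the paper's actual pipeline.

```python
# A minimal, hypothetical sketch (not the paper's exact method): using a
# frozen pre-trained BERT encoder as a generic 1-D sequence encoder for
# handwritten strokes. Requires `torch` and `transformers`.
import torch
from transformers import BertModel

# Load a pre-trained language model and freeze it: we only want its features.
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()

# Toy handwritten symbol: 20 pen points, each (dx, dy, pen_lifted), the
# offset format popularized by sketch/handwriting datasets.
strokes = torch.randn(1, 20, 3)

# "Bridge" the modality gap: a small projection maps 3-D stroke points into
# the model's 768-D token-embedding space. In practice this layer would be
# trained on the downstream symbol-recognition task; here it is random.
bridge = torch.nn.Linear(3, encoder.config.hidden_size)

with torch.no_grad():
    outputs = encoder(inputs_embeds=bridge(strokes))

# Mean-pool the token features into one fixed-size symbol representation,
# which a lightweight classifier head could then consume.
features = outputs.last_hidden_state.mean(dim=1)
print(features.shape)  # torch.Size([1, 768])
```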
Acknowledgment
This work has been partially supported by the Spanish project PID2021-126808OB-I00, the Catalan project 2021 SGR 01559 and the PhD Scholarship from AGAUR (2021FIB-10010). The Computer Vision Center is part of the CERCA Program / Generalitat de Catalunya.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tiwari, A., Biswas, S., Lladós, J. (2023). Can Pre-trained Language Models Help in Understanding Handwritten Symbols? In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_15
DOI: https://doi.org/10.1007/978-3-031-41498-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41497-8
Online ISBN: 978-3-031-41498-5
eBook Packages: Computer Science, Computer Science (R0)