Open-Set Text Recognition Implementations(I): Label-to-Representation Mapping

  • Chapter
Open-Set Text Recognition

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter describes possible approaches to implementing the representation-related variable and module of the framework discussed above, i.e., the representation space and the label-to-representation mapping module. First, it introduces how characters, or units of other granularities, are represented in different methods, i.e., the representation space in which class centers (prototypes) and features extracted from input images reside. Second, it discusses the choices of human representations of labels (side information) and the approaches in the literature to implementing the label-to-representation mapping module. This module, which maps the side information to prototypes residing in the representation space, is the key to implementing class-incremental learning functionality.
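The role of the mapping module can be illustrated with a minimal sketch. It is not the chapter's implementation: the names `label_to_prototype` and `PrototypeClassifier` are hypothetical, the side information is assumed to be a flattened glyph bitmap, and plain L2 normalization stands in for the learned mapping module a real method would use. It only shows why a label-to-prototype mapping makes adding classes cheap: a new class needs a new prototype, not retraining.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def label_to_prototype(side_info):
    """Hypothetical stand-in for a learned mapping module.

    Assumption: side information is a flattened glyph bitmap, and the
    representation space is the unit sphere of the same dimension.
    """
    return normalize([float(x) for x in side_info])


class PrototypeClassifier:
    """Classifies features by cosine similarity to registered prototypes."""

    def __init__(self):
        self.prototypes = {}  # label -> prototype in the representation space

    def register(self, label, side_info):
        # Adding a class only requires mapping its side information.
        self.prototypes[label] = label_to_prototype(side_info)

    def classify(self, feature):
        f = normalize(feature)
        return max(
            self.prototypes,
            key=lambda lab: sum(a * b for a, b in zip(f, self.prototypes[lab])),
        )


clf = PrototypeClassifier()
clf.register("A", [1, 0, 0, 1])
clf.register("B", [0, 1, 1, 0])
first = clf.classify([0.1, 0.9, 0.8, 0.0])

# A novel class is added after "training" by registering its side information.
clf.register("C", [1, 1, 0, 0])
second = clf.classify([0.9, 0.8, 0.1, 0.0])
```

In this toy setting the first query is matched to "B", and the second is matched to "C" once that class has been registered, with no change to the existing prototypes.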

Notes

  1. Might also have some semantic information mixed in, without explicit isolation.

  2. Note that these methods also raise fairness concerns, because they use much more data during pretraining, which may or may not include the test images.

Author information

Corresponding author

Correspondence to Xu-Cheng Yin.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Yin, XC., Yang, C., Liu, C. (2024). Open-Set Text Recognition Implementations(I): Label-to-Representation Mapping. In: Open-Set Text Recognition. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-97-0361-6_4

  • DOI: https://doi.org/10.1007/978-981-97-0361-6_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0360-9

  • Online ISBN: 978-981-97-0361-6

  • eBook Packages: Computer Science (R0)
