Abstract
In identifying objects, people usually draw on templates stored in memory, which guide visual attention and help determine an object's category. The first character images that children learn are usually standard (normal) patterns, whereas the corresponding handwritten patterns vary widely. To learn such heavily deformed images, current deep models require millions of parameters, even though this kind of classification task seems easy for children, who learn to recognize new characters by associating them with the normal patterns they were first taught. From the perspective of human perception, when people see a new object, they first recall a similar template image from memory, and this mapping makes new objects easier to learn. Inspired by this cognitive association mechanism, this study develops a cognition-inspired handwritten character recognition model based on a proposed normal template mapping neural network. The network uses an encoder-decoder architecture to transform handwritten character images of a class into normalized characters that resemble a given printed template image representing that class. A simple shallow classifier then recognizes these normalized images, which are easier to classify. Experimental results show that the proposed model achieves comparable or higher precision than current representative deep models with far fewer parameters. The model removes the individual styles of handwritten character images and maps them to patterns similar to the normal template images, which greatly reduces classification difficulty because the classifier only needs to recognize known standard character images.
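To make the map-then-classify pipeline concrete, here is a minimal NumPy sketch. It is not the authors' network: the 4x4 toy templates, the noise and stroke-intensity model standing in for handwriting style, the one-hidden-layer tanh/sigmoid encoder-decoder, and the nearest-template rule used in place of the paper's shallow classifier are all illustrative assumptions. The point it shows is that the decoder's training target is the printed template of the input's class rather than the input itself.

```python
import numpy as np

# Hypothetical 4x4 "printed" templates for three toy classes (# = ink, . = background).
TEMPLATE_ROWS = [
    [".##.", ".##.", ".##.", ".##."],  # class 0: vertical bar
    ["....", "####", "####", "...."],  # class 1: horizontal bar
    ["#...", "#...", "#...", "####"],  # class 2: L shape
]
TEMPLATES = np.array(
    [[1.0 if c == "#" else 0.0 for row in rows for c in row] for rows in TEMPLATE_ROWS]
)  # shape (3, 16)


def make_handwritten(rng, n_per_class, noise=0.3):
    """Toy stand-in for handwriting: random stroke intensity plus Gaussian pixel noise."""
    labels = np.repeat(np.arange(len(TEMPLATES)), n_per_class)
    scale = rng.uniform(0.7, 1.3, size=(labels.size, 1))
    x = TEMPLATES[labels] * scale + rng.normal(0.0, noise, size=(labels.size, TEMPLATES.shape[1]))
    return x, labels


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mapper(x, labels, hidden=8, lr=1.0, steps=2000, seed=0):
    """Encoder-decoder trained by full-batch gradient descent to output each input's class template."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, d))
    b2 = np.zeros(d)
    target = TEMPLATES[labels]  # the target is the normal template, not the input image
    n = x.shape[0]
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)  # encoder
        y = sigmoid(h @ w2 + b2)  # decoder
        # Gradient of the batch-mean squared error w.r.t. the decoder pre-activation.
        dz2 = 2.0 * (y - target) / n * y * (1.0 - y)
        dz1 = (dz2 @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh (uses w2 before update)
        w2 -= lr * (h.T @ dz2)
        b2 -= lr * dz2.sum(axis=0)
        w1 -= lr * (x.T @ dz1)
        b1 -= lr * dz1.sum(axis=0)
    return lambda inp: sigmoid(np.tanh(inp @ w1 + b1) @ w2 + b2)


def nearest_template(images):
    """Shallow-classifier stand-in: predict the index of the closest template."""
    dists = ((images[:, None, :] - TEMPLATES[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


rng = np.random.default_rng(42)
x_train, y_train = make_handwritten(rng, 100)
x_test, y_test = make_handwritten(rng, 50)
mapper = train_mapper(x_train, y_train)
mapped = mapper(x_test)
accuracy = (nearest_template(mapped) == y_test).mean()
raw_err = np.linalg.norm(x_test - TEMPLATES[y_test], axis=1).mean()
mapped_err = np.linalg.norm(mapped - TEMPLATES[y_test], axis=1).mean()
print(f"accuracy={accuracy:.2f} raw_err={raw_err:.2f} mapped_err={mapped_err:.2f}")
```

Because every output is pulled toward one of a few fixed templates, the downstream classifier only has to separate clean, known patterns; in this toy setting a distance-to-template rule suffices, whereas real character images would call for a larger (for example, convolutional) encoder-decoder and a trained classifier.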
Data Availability
The MNIST and EMNIST datasets that support the findings of this study are publicly available online.
Funding
This research is partially sponsored by the Beijing Natural Science Foundation (No. 4202025); the Tianjin Anjian IoT Technology Enterprise Key Laboratory Research Project (No. VTJ-OT20230209-2); the Beijing VanJee Technology Co., Ltd-Beijing Municipal Science and Technology Project (No. Z201100003920003); and the Guizhou Provincial Sci-Tech Project (No. zk[2022] general 012).
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of Interest
The authors have no conflicts of interest to declare relevant to this article’s content.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Miao, J., Liu, P., Chen, C. et al. Normal Template Mapping: An Association-Inspired Handwritten Character Recognition Model. Cogn Comput (2024). https://doi.org/10.1007/s12559-024-10270-8
DOI: https://doi.org/10.1007/s12559-024-10270-8