Abstract
Countless fonts exist in a wide variety of shapes and styles, and many of them differ only in subtle features. Font identification is therefore a difficult task. In this paper, we propose a method for determining whether any two character images are from the same font. This is difficult because the difference between fonts is typically smaller than the difference between alphabet classes. In addition, the proposed method can be applied to fonts regardless of whether they appear in the training data. To accomplish this, we use a Convolutional Neural Network (CNN) trained on image pairs of various fonts. We then evaluate the model on a different set of fonts that the network has not seen. In this evaluation, the model achieves an accuracy of 92.27%. Moreover, we analyze the relationship between character classes and font identification accuracy.
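As an illustration of the pair setup described above, the following minimal Python sketch shows how fonts might be split into disjoint training and test sets, and how balanced same-font and different-font pairs could be sampled. It is not the authors' implementation. The function names (split_fonts, sample_pair, make_balanced_pairs), the random sampling scheme, and the use of (font, character) identifiers in place of images are assumptions made for illustration; the restriction to pairs of different character classes follows Note 1.

```python
import random


def split_fonts(font_names, test_fraction, seed=0):
    """Split fonts into disjoint train/test lists so test fonts are unseen in training."""
    rng = random.Random(seed)
    shuffled = list(font_names)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sample_pair(fonts, classes, same_font, rng):
    """Return ((font_a, char_a), (font_b, char_b), label).

    The two character classes are always different (assumption taken from Note 1).
    label is 1 for a same-font pair and 0 for a different-font pair.
    """
    char_a, char_b = rng.sample(classes, 2)  # two distinct character classes
    if same_font:
        font_a = font_b = rng.choice(fonts)
    else:
        font_a, font_b = rng.sample(fonts, 2)  # two distinct fonts
    return (font_a, char_a), (font_b, char_b), int(same_font)


def make_balanced_pairs(fonts, classes, n_per_label, seed=0):
    """Sample n_per_label positive and n_per_label negative pairs, then shuffle them."""
    rng = random.Random(seed)
    pairs = [sample_pair(fonts, classes, True, rng) for _ in range(n_per_label)]
    pairs += [sample_pair(fonts, classes, False, rng) for _ in range(n_per_label)]
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    # Hypothetical font names and a small alphabet, for demonstration only.
    fonts = [f"font_{i:03d}" for i in range(20)]
    train_fonts, test_fonts = split_fonts(fonts, test_fraction=0.25)
    for pair in make_balanced_pairs(test_fonts, list("ABCDEFG"), n_per_label=3):
        print(pair)
```

In a real pipeline, each (font, character) identifier would be replaced by the rendered character image, and each pair with its label would be fed to the two-input network.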
Notes
- 1.
Throughout this paper, we assume that the two images in a pair come from different character classes. This is simply because font identification becomes a trivial task for pairs of the same character class (e.g., ‘A’): if the two images are exactly the same, they are from the same font; otherwise, they are from different fonts.
- 2.
- 3.
References
Type Identifier for Beginners. Seibundo Shinkosha Publishing (2013)
Abe, K., Iwana, B.K., Holmér, V.G., Uchida, S.: Font creation using class discriminative deep convolutional generative adversarial networks. In: Asian Conference on Pattern Recognition, pp. 232–237 (2017)
Avilés-Cruz, C., Villegas, J., Arechiga-Martínez, R., Escarela-Perez, R.: Unsupervised font clustering using stochastic version of the EM algorithm and global texture analysis. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds.) CIARP 2004. LNCS, vol. 3287, pp. 275–286. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30463-0_34
Bagoriya, Y., Sharma, N.: Font type identification of Hindi printed document. Int. J. Res. Eng. Technol. 03(03), 513–516 (2014)
Chaudhuri, B., Garain, U.: Automatic detection of italic, bold and all-capital words in document images. In: International Conference on Pattern Recognition (1998)
Chen, G., et al.: Large-scale visual font recognition. In: Conference on Computer Vision and Pattern Recognition, pp. 3598–3605 (2014)
Elgammal, A.M., Ismail, M.A.: Techniques for language identification for hybrid Arabic-English document images. In: International Conference on Document Analysis and Recognition (2001)
Ghosh, D., Dube, T., Shivaprasad, A.: Script recognition-a review. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2142–2161 (2010)
Gupta, A., Gutierrez-Osuna, R., Christy, M., Furuta, R., Mandell, L.: Font identification in historical documents using active learning. arXiv preprint arXiv:1601.07252 (2016)
Hafemann, L.G., Sabourin, R., Oliveira, L.S.: Learning features for offline handwritten signature verification using deep convolutional neural networks. Pattern Recogn. 70, 163–176 (2017)
Hayashi, H., Abe, K., Uchida, S.: GlyphGAN: style-consistent font generation based on generative adversarial networks. Knowl.-Based Syst. 186, 104927 (2019)
Jeong, C.B., Kwag, H.K., Kim, S., Kim, J.S., Park, S.C.: Identification of font styles and typefaces in printed Korean documents. In: International Conference on Asian Digital Libraries, pp. 666–669 (2003)
Jung, M.C., Shin, Y.C., Srihari, S.: Multifont classification using typographical attributes. In: International Conference on Document Analysis and Recognition (1999)
Khosravi, H., Kabir, E.: Farsi font recognition based on Sobel-Roberts features. Pattern Recogn. Lett. 31(1), 75–82 (2010)
Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop (2015)
Li, Q., Li, J.P., Chen, L.: A Bezier curve-based font generation algorithm for character fonts. In: International Conference on High Performance Computing and Communications, pp. 1156–1159 (2018)
Liu, A.H., Liu, Y.C., Yeh, Y.Y., Wang, Y.C.F.: A unified feature disentangler for multi-domain image translation and manipulation. In: Advances in Neural Information Processing Systems, pp. 2590–2599 (2018)
Liu, Y., Wei, F., Shao, J., Sheng, L., Yan, J., Wang, X.: Exploring disentangled feature representation beyond face identification. In: Conference on Computer Vision and Pattern Recognition, pp. 2080–2089 (2018)
Amer, I.M., ElSayed, S., Mostafa, M.G.: Deep Arabic font family and font size recognition. Int. J. Comput. Appl. 176(4), 1–6 (2017)
Ma, H., Doermann, D.: Font identification using the grating cell texture operator. In: Document Recognition and Retrieval XII, vol. 5676, pp. 148–156. International Society for Optics and Photonics (2005)
Miyazaki, T., et al.: Automatic generation of typographic font from small font subset. IEEE Comput. Graph. Appl. 40, 99–111 (2019)
Moussa, S.B., Zahour, A., Benabdelhafid, A., Alimi, A.M.: New features using fractal multi-dimensions for generalized Arabic font recognition. Pattern Recogn. Lett. 31(5), 361–371 (2010)
Nguyen, H.T., Nguyen, C.T., Ino, T., Indurkhya, B., Nakagawa, M.: Text-independent writer identification using convolutional neural network. Pattern Recogn. Lett. 121, 104–112 (2019)
Öztürk, S.: Font clustering and cluster identification in document images. J. Electron. Imaging 10(2), 418 (2001)
Pal, U., Chaudhuri, B.B.: Identification of different script lines from multi-script documents. Image Vis. Comput. 20(13–14), 945–954 (2002)
Pan, W., Suen, C., Bui, T.D.: Script identification using steerable Gabor filters. In: International Conference on Document Analysis and Recognition (2005)
Press, O., Galanti, T., Benaim, S., Wolf, L.: Emerging disentanglement in auto-encoder based unsupervised image content transfer (2018)
Ruiz, V., Linares, I., Sanchez, A., Velez, J.F.: Off-line handwritten signature verification using compositional synthetic generation of signatures and siamese neural networks. Neurocomputing 374, 30–41 (2020)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. (2019)
Shinahara, Y., Karamatsu, T., Harada, D., Yamaguchi, K., Uchida, S.: Serif or sans: visual font analytics on book covers and online advertisements. In: International Conference on Document Analysis and Recognition, pp. 1041–1046 (2019)
Suveeranont, R., Igarashi, T.: Example-based automatic font generation. In: International Symposium on Smart Graphics, pp. 127–138 (2010)
Tan, T.: Rotation invariant texture features and their use in automatic script identification. IEEE Trans. Pattern Anal. Mach. Intell. 20(7), 751–756 (1998)
Uchida, S., Ide, S., Iwana, B.K., Zhu, A.: A further step to perfect accuracy by training CNN with larger data. In: International Conference on Frontiers in Handwriting Recognition, pp. 405–410 (2016)
Wang, Z., et al.: DeepFont: identify your font from an image. In: ACM International Conference on Multimedia, pp. 451–459 (2015)
Xing, Z.J., Wu, Y.C., Liu, C.L., Yin, F.: Offline signature verification using convolution siamese network. In: Yu, H., Dong, J. (eds.) International Conference on Graphic and Image Processing (2018)
Yang, Z., Yang, L., Qi, D., Suen, C.Y.: An EMD-based recognition method for Chinese fonts and styles. Pattern Recogn. Lett. 27(14), 1692–1701 (2006)
Zheng, Y., Ohyama, W., Iwana, B.K., Uchida, S.: Capturing micro deformations from pooling layers for offline signature verification. In: International Conference on Document Analysis and Recognition, pp. 1111–1116 (2019)
Acknowledgment
This work was supported by JSPS KAKENHI Grant Number JP17H06100.
Appendix A Font Identification Using a Dataset with Less Fancy Fonts
The dataset used in the above experiment contains many fancy fonts, so our evaluation might overestimate the font identification performance; fancy fonts are sometimes easy to identify because of their distinctive appearance. We therefore use another font dataset, the Adobe Font Folio 11.1 (see Note 3). From this font set, we selected 1,132 fonts, which comprise 511 Serif, 314 Sans Serif, 151 Serif-Sans Hybrid, 74 Script, 61 Historical Script, and (only) 21 Fancy fonts. Note that this font type classification for the 1,132 fonts is given by [1]. We used the same neural network trained on the dataset of Sect. 4.1; that is, the network was trained with the fancy font dataset and tested on the Adobe dataset. For the evaluation, 367,900 positive pairs and 367,900 negative pairs were prepared from the 1,132 fonts. With the Adobe fonts as the test set, the identification accuracy was 88.33 ± 0.89%. This is lower than the 92.27% obtained on the original dataset. However, considering that formal fonts are often very similar to each other, we can still say that character-independent font identification is possible even for formal fonts.
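The following minimal Python sketch illustrates how an accuracy figure of the form "mean ± deviation" could be computed on a balanced set of positive and negative pairs. It is an illustration under stated assumptions, not the evaluation code used in this paper: the 0.5 decision threshold, the function names (pair_accuracy, summarize), the use of the sample standard deviation, and the idea that the deviation is taken over several runs are all assumptions, since the text does not specify how the ± value was obtained.

```python
from statistics import mean, stdev


def pair_accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the same-font label.

    scores: predicted same-font probabilities (assumed network outputs).
    labels: 1 for a positive (same-font) pair, 0 for a negative pair.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def summarize(accuracies):
    """Mean and sample standard deviation of per-run accuracies, in percent."""
    return 100 * mean(accuracies), 100 * stdev(accuracies)


if __name__ == "__main__":
    # Hypothetical scores for two runs on the same four pairs (two positive, two negative).
    labels = [1, 1, 0, 0]
    runs = [
        [0.9, 0.7, 0.2, 0.6],
        [0.8, 0.4, 0.1, 0.3],
    ]
    accuracies = [pair_accuracy(scores, labels) for scores in runs]
    avg, dev = summarize(accuracies)
    print(f"accuracy: {avg:.2f} ± {dev:.2f}%")
```

Because the evaluation set above is balanced (367,900 positive and 367,900 negative pairs, 735,800 in total), a classifier that always gives the same answer would score 50%, which makes plain accuracy a reasonable summary measure here.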
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Haraguchi, D., Harada, S., Iwana, B.K., Shinahara, Y., Uchida, S. (2020). Character-Independent Font Identification. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_35
DOI: https://doi.org/10.1007/978-3-030-57058-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)