Abstract
Computerized analysis of historical documents has remained an active research area in the pattern classification community for many decades. Key challenges in the computerized analysis of historical manuscripts include automatic transcription, dating, retrieval, classification of writing styles and identification of scribes. Among these, our current study focuses on identifying writers from digitized manuscripts. We exploit convolutional neural networks (ConvNets) to extract features that characterize each writer. The ConvNets are first trained on contemporary handwriting samples and then fine-tuned on the limited set of historical manuscripts considered in our study. Dense sampling is carried out over each manuscript, producing a set of small writing patches per document. The decisions on individual patches are combined using a majority vote to determine the authorship of a query document. Preliminary experiments on a set of challenging and degraded manuscripts show promising performance.
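The patch-level decision scheme described in the abstract (dense sampling of small patches, one prediction per patch, majority vote per document) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the patch size, stride, background-patch filtering rule or tie-breaking policy, so `size`, `stride`, `min_ink` and the "smallest label wins a tie" rule are assumptions. Images are plain lists of grayscale rows to keep the sketch dependency-free, and `toy_classifier` is a hypothetical stand-in for the fine-tuned ConvNet.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Grayscale image as rows of pixel values: 0 = black ink, 255 = white background.
Image = List[List[int]]


def ink_ratio(patch: Image, threshold: int = 128) -> float:
    """Fraction of pixels darker than `threshold` (treated as ink)."""
    total = sum(len(row) for row in patch)
    ink = sum(1 for row in patch for px in row if px < threshold)
    return ink / total if total else 0.0


def dense_patches(image: Image, size: int, stride: int, min_ink: float = 0.05) -> List[Image]:
    """Slide a size x size window over the page with the given stride.

    Assumption: patches that are almost entirely background (ink ratio
    below `min_ink`) are discarded, since they say little about the writer.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches: List[Image] = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            if ink_ratio(patch) >= min_ink:
                patches.append(patch)
    return patches


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the smallest label (assumed policy)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def identify_writer(image: Image, classify_patch: Callable[[Image], str],
                    size: int, stride: int) -> str:
    """Classify every retained patch and combine the decisions by majority vote."""
    patches = dense_patches(image, size, stride)
    if not patches:
        raise ValueError("no patch contains enough ink to classify")
    return majority_vote([classify_patch(p) for p in patches])


# Toy usage: an 8x8 page whose left half is ink and right half is blank.
page = [[0] * 4 + [255] * 4 for _ in range(8)]


def toy_classifier(patch: Image) -> str:
    # Hypothetical stand-in for the fine-tuned ConvNet's per-patch prediction.
    return "writer_A" if ink_ratio(patch) >= 0.5 else "writer_B"


print(identify_writer(page, toy_classifier, size=4, stride=4))  # writer_A
```

Voting over many small patches means that a few patches misclassified because of degradation (stains, faded ink, damaged papyrus) need not change the document-level decision, as long as most patches are classified correctly.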
Acknowledgement
The authors would like to thank Dr. Isabelle Marthot-Santaniello of the University of Basel, Switzerland, for making the dataset available.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Nasir, S., Siddiqi, I. (2021). Learning Features for Writer Identification from Handwriting on Papyri. In: Djeddi, C., Kessentini, Y., Siddiqi, I., Jmaiel, M. (eds) Pattern Recognition and Artificial Intelligence. MedPRAI 2020. Communications in Computer and Information Science, vol 1322. Springer, Cham. https://doi.org/10.1007/978-3-030-71804-6_17
DOI: https://doi.org/10.1007/978-3-030-71804-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71803-9
Online ISBN: 978-3-030-71804-6
eBook Packages: Computer Science, Computer Science (R0)