Abstract
Processing and recognizing Greek Byzantine and Post-Byzantine (old Greek) documents has proven to be a tedious task in the domain of Historical Document Image Processing. Several unique characteristics of these documents (character ligatures, abbreviations, the lack of clear word division, and symbols or punctuation marks in arbitrary positions) pose significant difficulties for current processing and recognition tools. In this work, we introduce a system for processing and recognition of old Greek documents and describe all of its components: an image pre-processing module, a text line segmentation module, and a recognition module. To test the proposed system, we introduce and make publicly available a new dataset of old Greek documents that includes text line images and their corresponding transcriptions. Using this dataset, we evaluate the system's embedded recognition engine, the open-source Calamari-OCR engine, under a variety of configurations. The best configuration achieved a character error rate below 1.5%, which is acceptable and promising. Finally, the embedded OCR engine also yielded promising results when compared with other recognition methods previously proposed for old Greek documents.
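For readers unfamiliar with the metric, the character error rate (CER) is commonly computed as the character-level edit distance between the recognized text and the ground-truth transcription, divided by the number of ground-truth characters. The following is a minimal sketch under that standard definition. The function names are illustrative and are not taken from the proposed system or from Calamari-OCR, and the NFC normalization step is an assumption added so that composed and decomposed forms of accented Greek characters compare as equal.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic-programming solution)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # character missing in hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """CER over a set of text lines: total edits / total reference characters.

    Assumption: both sides are normalized to Unicode NFC before comparison.
    """
    refs = [unicodedata.normalize("NFC", r) for r in references]
    hyps = [unicodedata.normalize("NFC", h) for h in hypotheses]
    if len(refs) != len(hyps):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, a CER below 1.5% corresponds to fewer than 15 character edits per 1,000 ground-truth characters. Aggregating edits over all lines before dividing, rather than averaging per-line rates, keeps short lines from dominating the result; whether the reported figure was aggregated this way is not stated here.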
Notes
- 1.
- 2.
When notation AxB is used, A is for the horizontal axis of a layer (x-width) and B for the vertical axis (y-height)
References
Wick, C., Reul, C., Puppe, F.: Calamari - A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition. Digit. Humanit. Q. 14(1) (2020)
Ntzios, K., Gatos, B., Pratikakis, I., Konidaris, T., Perantonis, S.J.: An old Greek handwritten OCR system based on an efficient segmentation-free approach. Int. J. Doc. Anal. Recogn. (IJDAR) 9(2–4), 179–192 (2007). Special issue on historical documents
Gatos, B., Ntzios, K., Pratikakis, I., Petridis, S., Konidaris, T., Perantonis, S.J.: An efficient segmentation-free approach to assist old Greek handwritten manuscript OCR. Pattern Anal. Appl. (PAA) 8(4), 305–320 (2006)
Tsochatzidis, L., Symeonidis, S., Papazoglou, A., Pratikakis, I.: HTR for Greek historical handwritten documents. J. Imaging 7, 260 (2021)
Platanou, P., Pavlopoulos, J., Papaioannou, G.: Handwritten paleographic Greek text recognition: a century-based approach. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 6585–6589. European Language Resources Association, Marseille (2022)
Gatos, B., Pratikakis, I., Perantonis, S.J.: Adaptive degraded document image binarization. Pattern Recogn. 39, 317–327 (2006)
de Sousa Neto, A.F., Bezerra, B.L.D., Toselli, A.H., Lima, E.B.: HTR-Flor: a deep learning system for offline handwritten text recognition. In: Proceedings of the 33rd SIBGRAPI Conference on Graphics, Patterns and Images, pp. 54–61. Recife/Porto de Galinhas (2020)
Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, pp. 67–72. Kyoto (2017)
Acknowledgments
This research has been partially co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call "RESEARCH-CREATE-INNOVATE", project Culdile (Cultural Dimensions of Deep Learning, project code: Τ1ΕΔΚ-03785) and the Operational Program Attica 2014–2020, under the call "RESEARCH AND INNOVATION PARTNERSHIPS IN THE REGION OF ATTICA", project reBook (Digital platform for re-publishing Historical Greek Books, project code: ΑΤΤΡ4–0331172).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kaddas, P., Palaiologos, K., Gatos, B., Katsouros, V., Christopoulou, K. (2023). A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_23
DOI: https://doi.org/10.1007/978-3-031-41685-9_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41684-2
Online ISBN: 978-3-031-41685-9
eBook Packages: Computer Science (R0)