A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14190))

Abstract

Processing and recognizing Greek Byzantine and Post-Byzantine (old Greek) documents has proven to be a difficult task in Historical Document Image Processing. Several characteristics of these documents (character ligatures, abbreviations, the lack of clear word division, and symbols or punctuation marks in arbitrary positions) pose significant difficulties for current processing and recognition tools. In this work, we introduce a system for processing and recognizing old Greek documents and describe all of its components: an image pre-processing module, a text line segmentation module, and a recognition module. To test the proposed system, we introduce and make publicly available a new dataset of old Greek documents that includes text line images and their corresponding transcriptions. Using this dataset, we evaluate the system's embedded recognition engine, the open-source Calamari-OCR engine, under a variety of configurations. The best configuration achieved a character error rate below 1.5%, which is acceptable and promising. Finally, we also obtained promising results when comparing the embedded OCR engine with other recognition methods previously proposed for old Greek documents.
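
The character error rate quoted above is commonly defined as the total edit (Levenshtein) distance between recognized and ground-truth text lines, divided by the total number of ground-truth characters. The following is a minimal Python sketch under that assumption; it is not the evaluation code used in the paper, and the function names `edit_distance` and `character_error_rate` are hypothetical. Normalizing to Unicode NFC before comparison is also an assumption, made here because polytonic Greek can encode accented letters either as single code points or as base letters plus combining marks.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses) -> float:
    """Total edit distance over total reference length, for paired text lines."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    # Assumption: compare NFC-normalized strings so that equivalent encodings
    # of the same accented Greek letter are not counted as errors.
    refs = [unicodedata.normalize("NFC", r) for r in references]
    hyps = [unicodedata.normalize("NFC", h) for h in hypotheses]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_chars
```

Under this definition, a character error rate below 1.5% corresponds to fewer than 15 character-level edits per 1000 ground-truth characters.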

Notes

  1. https://github.com/hukaixuan19970627/yolov5_obb

  2. When notation AxB is used, A is for the horizontal axis of a layer (x-width) and B for the vertical axis (y-height).

Acknowledgments

This research has been partially co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call "RESEARCH-CREATE-INNOVATE", project Culdile (Cultural Dimensions of Deep Learning, project code: Τ1ΕΔΚ-03785) and the Operational Program Attica 2014–2020, under the call "RESEARCH AND INNOVATION PARTNERSHIPS IN THE REGION OF ATTICA", project reBook (Digital platform for re-publishing Historical Greek Books, project code: ΑΤΤΡ4–0331172).

Author information

Corresponding author

Correspondence to Panagiotis Kaddas.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kaddas, P., Palaiologos, K., Gatos, B., Katsouros, V., Christopoulou, K. (2023). A System for Processing and Recognition of Greek Byzantine and Post-Byzantine Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_23

  • DOI: https://doi.org/10.1007/978-3-031-41685-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41684-2

  • Online ISBN: 978-3-031-41685-9

  • eBook Packages: Computer Science, Computer Science (R0)
