Mathematical Models and Neural Networks for the Description and the Correction of Typical Distortions of Historical Manuscripts

  • Conference paper
Computational Science and Its Applications – ICCSA 2023 Workshops (ICCSA 2023)

Abstract

Historical manuscripts are very often degraded by ink that seeps through, or shows through, from the opposite side of the page. Suppressing this interfering text can greatly help philologists and paleographers who aim to interpret the primary text, and it now also benefits automatic text analysis. We previously proposed a data model that approximately describes this damage, and used it to generate an artificial training set that teaches a shallow neural network (NN) to classify pixels as clean or corrupted. This NN has proved effective in classifying manuscripts even where the degradation varies widely. In this paper, we modify the architecture of the NN to better account for ink saturation in areas where the two texts overlap, by including a specific class for these pixels. In our experiments, this yields a significant improvement in the classification and, consequently, in the restoration.
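
The idea of a model-generated training set with a dedicated overlay class can be illustrated with a minimal sketch. It is not the data model of the paper: the additive mixing of optical densities, the clipping used to mimic ink saturation, the four-class labelling, and every name and parameter below (`make_training_set`, `bleed`, `d_max`, `noise`) are assumptions made only for illustration.

```python
import numpy as np

def make_training_set(n, bleed=0.5, d_max=1.0, noise=0.02, seed=0):
    """Return (features, labels) for n synthetic recto/verso pixel pairs.

    Illustrative assumption, not the published model: each side shows its
    own ink plus an attenuated copy of the ink on the opposite side.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical ink maps: 1 where text is present on that side, else 0.
    ink_recto = rng.integers(0, 2, size=n)
    ink_verso = rng.integers(0, 2, size=n)
    # Ideal optical densities of the two texts (0 means clean paper).
    d_recto = ink_recto * rng.uniform(0.6, 1.0, size=n)
    d_verso = ink_verso * rng.uniform(0.6, 1.0, size=n)
    # Assumed damage model, clipped at d_max to mimic ink saturation
    # where the two texts overlap.
    obs_recto = np.clip(d_recto + bleed * d_verso, 0.0, d_max)
    obs_verso = np.clip(d_verso + bleed * d_recto, 0.0, d_max)
    features = np.stack([obs_recto, obs_verso], axis=1)
    features = features + rng.normal(0.0, noise, size=features.shape)
    # Recto labels: 0 background, 1 foreground, 2 bleed-through, 3 overlay.
    labels = ink_recto + 2 * ink_verso
    return features, labels

X, y = make_training_set(1000)
print(X.shape, np.bincount(y, minlength=4))
```

Each row of `X` pairs the observed recto density with the observed density of the aligned verso pixel, which is the kind of input a small pixel classifier could be trained on; class 3 isolates the overlay pixels where saturation occurs.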

Author information

Corresponding author

Correspondence to Anna Tonazzini.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Savino, P., Tonazzini, A. (2023). Mathematical Models and Neural Networks for the Description and the Correction of Typical Distortions of Historical Manuscripts. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023 Workshops. ICCSA 2023. Lecture Notes in Computer Science, vol 14108. Springer, Cham. https://doi.org/10.1007/978-3-031-37117-2_37

  • DOI: https://doi.org/10.1007/978-3-031-37117-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37116-5

  • Online ISBN: 978-3-031-37117-2

  • eBook Packages: Computer Science, Computer Science (R0)
