In Codice Ratio: Machine Transcription of Medieval Manuscripts

  • Serena Ammirati
  • Donatella Firmani
  • Marco Maiorino
  • Paolo Merialdo
  • Elena Nieddu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 988)

Abstract

Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing the content of historical documents conserved in the Vatican Secret Archives (VSA). As most of these documents are digitized as images, machine transcription both enables the application of Knowledge Discovery techniques and helps paleographers speed up the transcription process. Our approach combines a convolutional neural network to recognize characters, statistical language models to compose and rank word transcriptions, and crowdsourcing for scalable training data collection. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all the considered words can be transcribed without significant spelling errors.
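
To make the "compose and rank word transcriptions" step concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the language model or the classifier outputs, so this example assumes a simple character bigram model with add-one smoothing and assumes each candidate word arrives with a character-recognition probability. All names (train_bigram_model, rank_candidates, the toy corpus, and the candidate scores) are hypothetical.

```python
import math
from collections import defaultdict


def train_bigram_model(corpus, alpha=1.0):
    """Fit a character bigram model with add-alpha smoothing.

    '^' and '$' mark the start and end of a word. This is an assumed
    stand-in for the statistical language models named in the abstract.
    """
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = {"$"}
    for word in corpus:
        padded = "^" + word + "$"
        alphabet.update(word)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    vocab_size = len(alphabet)

    def log_prob(word):
        """Return the smoothed log-probability of a word under the model."""
        padded = "^" + word + "$"
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            row = counts.get(prev, {})
            numerator = row.get(cur, 0) + alpha
            denominator = sum(row.values()) + alpha * vocab_size
            total += math.log(numerator / denominator)
        return total

    return log_prob


def rank_candidates(candidates, log_prob, lm_weight=1.0):
    """Sort (word, classifier_probability) pairs by combined log-score."""
    scored = [
        (word, math.log(p) + lm_weight * log_prob(word))
        for word, p in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy Latin word list standing in for a real training corpus (assumption).
corpus = ["gratia", "gratias", "dei", "deus", "dominus"]
log_prob = train_bigram_model(corpus)

# Hypothetical candidates for one word image, with made-up classifier scores.
candidates = [("grotia", 0.35), ("gratia", 0.30), ("gratla", 0.20)]
ranked = rank_candidates(candidates, log_prob)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

In this toy setting the classifier slightly prefers the misspelling "grotia", but the language model score moves "gratia" to the top of the ranking, which illustrates why a language model is useful for ranking candidate transcriptions.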

Notes

Acknowledgments

We thank NVIDIA Corporation for the donation of a Quadro M5000 GPU, and Regione Lazio (Progetti di Gruppi di Ricerca) and Roma Tre University (Piano Straordinario di Sviluppo della Ricerca di Ateneo) for supporting our project “In Codice Ratio”.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Serena Ammirati (1)
  • Donatella Firmani (2, email author)
  • Marco Maiorino (3)
  • Paolo Merialdo (2)
  • Elena Nieddu (2)

  1. Department of Humanities, Roma Tre University, Rome, Italy
  2. Department of Computer Science, Roma Tre University, Rome, Italy
  3. Vatican Secret Archives, Vatican City, Italy