Making Large Collections of Handwritten Material Easily Accessible and Searchable
Abstract
Libraries and cultural organisations hold a wealth of digitised historical handwritten material in the form of scanned images. The vast majority of this material has not yet been transcribed, owing to technological challenges and a lack of expertise. This makes it difficult to open these historical collections to the public, and in particular to support even a simple text search across a collection. Machine learning based methods for handwritten text recognition are gaining importance, but they require a large amount of pre-transcribed text to train the system. However, it is impractical to obtain several thousand pre-transcribed documents, given the difficulties transcribers face. This paper therefore presents a training-free word spotting algorithm as an alternative for handwritten text transcription, with case studies on Alvin (a Swedish repository) and Clavius on the Web. The main focus of this work is on discussing the prospects of making materials in the Alvin platform and Clavius on the Web easily searchable using a word spotting based handwritten text recognition system.
Keywords
Transcription, Handwritten text recognition, Word spotting, Alvin, Clavius on the Web
Acknowledgment
This work was supported by the Swedish strategic research programme eSSENCE and the Riksbankens Jubileumsfond (Dnr NHS14-2068:1).