A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records

  • Aurélie Lemaitre
  • Jean Camillerapp
  • Cérès Carton
  • Bertrand Coüasnon
Original Paper

Abstract

This paper addresses the localization of handwritten fields in old pre-printed registers. The images show the usual degradation of old and damaged documents, and text extraction is further complicated by the strong interaction between handwritten and printed text. In addition, in many collections, the structure of the forms varies with the origin of the documents. This work is applied to a database of Mexican marriage records, which was published for a competition at the HIP 2013 workshop and is publicly available. We first discuss the strengths and limitations of the empirical method that was submitted to the competition. We then present a method that combines a logical description of the contents of the documents with the result of an automatic analysis of the physical properties of the collection. The distinctive feature of this analysis is that it does not require any ground truth. We show that this combined strategy locates 97.2% of the handwritten fields. The proposed approach is generalizable and could be applied to other databases.

Keywords

Historical documents, Field localization, Heterogeneous layout, Rule-based system, Word spotting, Unsupervised clustering

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Univ Rennes - CNRS - IRISA, Rennes, France
