Access by content to handwritten archive documents: generic document recognition method and platform for annotations

  • Bertrand Coüasnon
  • Jean Camillerapp
  • Ivan Leplumey
Original Paper

Abstract

This paper presents annotations needed for handwritten archive document retrieval by content. We propose two complementary ways of producing these annotations: automatically by using document image analysis and collectively by using the Internet and manual input by users. A platform for managing these annotations is presented as well as examples of automatic annotations on civil status registers, military forms (tested on 165,000 pages) and naturalization decrees, using a generic method for structured document recognition and handwriting recognition on names. Examples of collective annotations built on automatic annotations are also given. This platform is already open to the public in the reading room of the new building of the Archives départementales des Yvelines and on the Internet. About 1,450,000 images of civil status registers are available for collective annotation as well as 105,000 pages of military forms with automatic annotation of handwritten names.

Keywords

Structured document recognition · Generic method · Handwriting recognition · Automatic indexing · Annotation · Archive document

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Bertrand Coüasnon (1)
  • Jean Camillerapp (2)
  • Ivan Leplumey (2)

  1. IRISA/INRIA, Campus universitaire de Beaulieu, Rennes Cedex, France
  2. IRISA/INSA, Campus universitaire de Beaulieu, Rennes Cedex, France
