Automatic accurate broken character restoration for patrimonial documents

  • Bénédicte Allier
  • Nadia Bali
  • Hubert Emptoz
Original Paper

Abstract

In this article, we address the restoration of character shapes in images of antique documents. This class of documents generally carries a great deal of involuntary historical information that has to be taken into account to build quality digital libraries. Many document processing methods have already been proposed to cope with degraded character images, but these techniques often consist in replacing the degraded shapes with a corresponding prototype, which does not satisfy many specialists. We therefore developed our own method for accurate character restoration, based on generic image processing tools (namely Gabor filtering and the active contour model) complemented with specific, automatically extracted structural information. The principle of the method is to make an active contour recover the lost information using an external energy term based on an automatically built and selected reference character image. Results are presented for real cases taken from printed and handwritten documents.
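
The principle stated in the abstract, an active contour driven by an external energy derived from a reference character image, can be illustrated with a toy sketch. The code below is not the authors' formulation: it assumes a greedy discrete snake, takes the external energy to be the squared distance to the nearest pixel of a reference outline, and uses circles as stand-ins for character contours. All function names and parameter values (`greedy_snake`, `alpha`, the circle radii) are hypothetical.

```python
import math

def circle(cx, cy, r, n):
    """Integer pixel positions sampled on a circle (toy stand-in for a contour)."""
    return [(round(cx + r * math.cos(2 * math.pi * k / n)),
             round(cy + r * math.sin(2 * math.pi * k / n))) for k in range(n)]

def external_energy(p, reference):
    """Assumed external term: squared distance to the nearest reference pixel."""
    return min((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for q in reference)

def internal_energy(p, prev_pt, next_pt):
    """Smoothness term: squared distance to the midpoint of the two neighbours."""
    mx = (prev_pt[0] + next_pt[0]) / 2
    my = (prev_pt[1] + next_pt[1]) / 2
    return (p[0] - mx) ** 2 + (p[1] - my) ** 2

def greedy_snake(snake, reference, alpha=0.1, max_iter=200):
    """Move each point to its best 8-neighbour until no point improves."""
    snake = list(snake)
    n = len(snake)

    def energy(p, i):
        # Weighted sum of smoothness and attraction to the reference outline.
        return (alpha * internal_energy(p, snake[i - 1], snake[(i + 1) % n])
                + external_energy(p, reference))

    for _ in range(max_iter):
        moved = False
        for i in range(n):
            x, y = snake[i]
            candidates = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
            best = min(candidates, key=lambda c: energy(c, i))
            if energy(best, i) < energy(snake[i], i):  # strict improvement only
                snake[i] = best
                moved = True
        if not moved:
            break
    return snake

def total_external(snake, reference):
    """Sum of the external energy over all contour points."""
    return sum(external_energy(p, reference) for p in snake)

reference = circle(20, 20, 10, 64)   # hypothetical reference character outline
initial = circle(20, 20, 16, 24)     # contour initialised outside the shape
final = greedy_snake(initial, reference)
```

In this sketch the contour starts outside the reference outline and is pulled toward it, while the small `alpha` weight keeps neighbouring points from drifting apart. A real restoration would replace the circle by an outline extracted from the selected reference character and would also use the degraded image itself.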

Keywords

Symbol recognition, Machine print, Document image processing, Digital libraries

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  1. Lyon Research Center for Images and Intelligent Information Systems (LIRIS), Villeurbanne Cedex, France