A Comparison of Some Morphological Filters for Improving OCR Performance

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9082)

Abstract

Studying discrete space representations has recently led to the development of novel morphological operators. To date, no study has evaluated the performance of these novel operators for a specific application. This article compares the ability of several morphological operators, both old and new, to improve OCR performance when used as preprocessing filters. We design an experiment using the Tesseract OCR engine on binary images degraded with a realistic noise model dedicated to documents. We assess the performance of several morphological filters acting in complex, graph and vertex spaces, including area filters. This experiment reveals the good overall performance of complex and graph filters. MSE measurements were also performed to evaluate the denoising capability of these filters, and they again confirm the good performance of both complex and graph filtering in this respect.
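Area filters in the vertex (pixel) space, together with an MSE measure, can be illustrated with a short sketch. It is a minimal Python example, not the implementation used in the experiments. It rests on several assumptions. The binary image is a list of lists with 1 for ink and 0 for background. Pixels are 4-adjacent. The size threshold `lam` and the helper names `area_opening`, `area_closing` and `mse` are hypothetical. An area opening removes ink components smaller than `lam` pixels, such as isolated specks. Its dual, the area closing, fills background components smaller than `lam`, such as pinholes inside strokes. The MSE compares a filtered image against a clean reference.

```python
from collections import deque


def _components(img, value):
    """Yield the pixel lists of 4-connected components whose pixels equal `value`."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c] != value or seen[r][c]:
                continue
            # Breadth-first flood fill from (r, c) over 4-adjacent pixels.
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield comp


def area_opening(img, lam):
    """Remove ink (1) components with fewer than `lam` pixels (assumed 4-adjacency)."""
    out = [row[:] for row in img]
    for comp in _components(img, 1):
        if len(comp) < lam:
            for y, x in comp:
                out[y][x] = 0
    return out


def area_closing(img, lam):
    """Fill background (0) components with fewer than `lam` pixels (dual of area_opening)."""
    out = [row[:] for row in img]
    for comp in _components(img, 0):
        if len(comp) < lam:
            for y, x in comp:
                out[y][x] = 1
    return out


def mse(a, b):
    """Mean squared error between two binary images of the same size."""
    n = sum(len(row) for row in a)
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n
```

For instance, a closing followed by an opening with a small `lam` removes both a one-pixel speck and a one-pixel pinhole from a noisy glyph. The order of the two filters and the choice of `lam` are illustrative assumptions, not the protocol evaluated in the paper.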

Keywords

Character recognition, Morphological filtering, Vertex, Graphs, Simplicial complexes

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. UMR6602 - UBP/CNRS/IFMA, Institut Pascal, Aubière, France
  2. LIGM, Université Paris-Est, Équipe A3SI, ESIEE, Noisy-le-Grand Cedex, France