Word Searching in Document Images Using Word Portion Matching

  • Yue Lu
  • Chew Lim Tan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)

Abstract

This paper proposes an approach that can search for a word portion in document images, to facilitate detecting and locating user-specified query words. A feature string is synthesized from the character sequence of the user-specified word, and each word image extracted from the documents is likewise represented by a feature string. An inexact string matching technique then measures the similarity between the two feature strings. From this similarity we estimate how relevant the document word image is to the user-specified word and decide whether a portion of it is the same as the user-specified word. Experimental results on real document images show that the approach is promising: it can detect and locate document words that entirely or partially match the user-specified word.
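
The portion-matching step can be illustrated with a minimal sketch. It assumes that feature strings are ordinary Python strings and that similarity is measured with unit-cost edit distance; the abstract does not specify the actual feature codes or the scoring scheme, so both are stand-ins. The function name `best_portion_match` is hypothetical.

```python
def best_portion_match(query, word):
    """Find the portion of `word` closest to `query`.

    Returns (cost, start, end), where word[start:end] is the best-matching
    portion and cost is its unit-cost edit distance to `query`.
    Assumption: unit costs stand in for the paper's similarity measure.
    """
    m, n = len(query), len(word)
    # Row 0: an empty query prefix matches an empty portion starting
    # anywhere in `word` at zero cost. Each cell holds (cost, start).
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, m + 1):
        # Column 0: all i query symbols are unmatched.
        curr = [(i, 0)]
        for j in range(1, n + 1):
            mismatch = 0 if query[i - 1] == word[j - 1] else 1
            candidates = (
                (prev[j - 1][0] + mismatch, prev[j - 1][1]),  # match or substitute
                (prev[j][0] + 1, prev[j][1]),                 # skip a query symbol
                (curr[j - 1][0] + 1, curr[j - 1][1]),         # skip a word symbol
            )
            curr.append(min(candidates))
        prev = curr
    # The matched portion may end anywhere in `word`.
    end = min(range(n + 1), key=lambda j: prev[j][0])
    cost, start = prev[end]
    return cost, start, end


cost, start, end = best_portion_match("search", "researching")
print(cost, "researching"[start:end])
```

Because the first row is initialised to zero cost at every position, the match may start anywhere in the document word, which is what allows a query to match a portion of a longer word rather than only the whole word. A threshold on the returned cost, relative to the query length, could then decide whether the word is reported as a hit.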

Keywords

Hidden Markov Model, Document Image, Optical Character Recognition, Word Searching, Word Image
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Yue Lu (1)
  • Chew Lim Tan (1)
  1. Department of Computer Science, School of Computing, National University of Singapore, Kent Ridge, Singapore
