A HMM-Based Approach to Recognize Ultra Low Resolution Anti-Aliased Words

  • Farshideh Einsele
  • Rolf Ingold
  • Jean Hennebert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)


In this paper, we present an HMM-based system for recognizing ultra low resolution text such as that frequently embedded in images available on the web. The system is designed specifically to address the challenges of recognizing text in ultra low resolution images. We also show that word models can be advantageously built by connecting sub-HMM character models with inter-character states. Finally, we report on the promising performance of the system using HMM topologies that have been improved to take into account the presupposed minimum length of each character.
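
The idea of composing a word model from character sub-HMMs can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the actual topologies, so the number of states per character (`min_lens`), the self-loop probability (`stay`), the single mandatory `<gap>` state between adjacent characters, and the function names are all assumptions made for illustration. Each character is modelled as a left-to-right chain with as many states as its assumed minimum length in frames, so a path cannot leave the character in fewer frames than that.

```python
import numpy as np

def char_chain(min_len, stay=0.5):
    """Left-to-right chain of `min_len` states (hypothetical topology).

    Every state has a self-loop with probability `stay`; there are no skip
    transitions, so at least `min_len` frames are spent in the chain.
    Returns the internal transition block and the exit probability of the
    last state.
    """
    A = np.zeros((min_len, min_len))
    for i in range(min_len):
        A[i, i] = stay
        if i + 1 < min_len:
            A[i, i + 1] = 1.0 - stay
    return A, 1.0 - stay

def build_word_model(word, min_lens, stay=0.5):
    """Concatenate character chains, with one assumed gap state between them."""
    blocks = []  # (label, number of states)
    for k, ch in enumerate(word):
        if k > 0:
            blocks.append(("<gap>", 1))  # inter-character state (assumption)
        blocks.append((ch, min_lens[ch]))

    n = sum(size for _, size in blocks)
    A = np.zeros((n + 1, n + 1))  # one extra absorbing end state
    labels = []
    offset = 0
    for label, size in blocks:
        sub, exit_p = char_chain(size, stay)
        A[offset:offset + size, offset:offset + size] = sub
        # Leaving the last state of a block enters the first state of the next.
        A[offset + size - 1, offset + size] = exit_p
        labels += [label] * size
        offset += size
    A[n, n] = 1.0
    labels.append("<end>")
    return A, labels

# Hypothetical minimum lengths, in frames, for two characters.
A, labels = build_word_model("it", {"i": 1, "t": 2})
```

In this sketch the word "it" yields five states (one for "i", one gap state, two for "t", and the end state), and the transition matrix is upper triangular because the model is strictly left-to-right. Emission distributions and training are left out on purpose.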


Keywords: State Sequence, Emission Probability, Word Image, Grid Alignment, Adjacent Character



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Farshideh Einsele (1)
  • Rolf Ingold (1)
  • Jean Hennebert (1)
  1. Université de Fribourg, Boulevard de Pérolles 90, 1700 Fribourg, Switzerland
