A keyword retrieval system for historical Mongolian document images

  • Hongxi Wei
  • Guanglai Gao
Original Paper

Abstract

In this paper, we propose a keyword retrieval system for locating words in historical Mongolian document images. Following the word spotting approach, a collection of historical Mongolian document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. For each word image, a fixed-length feature vector is formed by taking an appropriate number of complex discrete Fourier transform coefficients of each profile feature. The system supports online image-to-image matching: it calculates the similarity between a query word image and each word image in the collection, and returns the results ranked in descending order of similarity. The query word image can be generated at retrieval time by synthesizing a sequence of glyphs. Experimental evaluation confirms the performance of the system.
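
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary word images given as lists of rows (1 = ink), uses a single row-wise projection profile as a stand-in for the paper's profile features, and uses a similarity derived from Euclidean distance, which the abstract does not specify. All function names and the default of k = 4 coefficients are hypothetical.

```python
import cmath
import math


def row_profile(image):
    """Projection profile: number of ink pixels (value 1) in each row."""
    return [sum(row) for row in image]


def dft_coefficients(signal, k):
    """First k complex DFT coefficients of a 1-D signal, divided by its length."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * u * t / n) for t, x in enumerate(signal)) / n
        for u in range(k)
    ]


def feature_vector(image, k=4):
    """Fixed-length vector: real and imaginary parts of the first k coefficients.

    The length is always 2 * k, whatever the height of the word image.
    """
    coeffs = dft_coefficients(row_profile(image), k)
    return [part for c in coeffs for part in (c.real, c.imag)]


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def rank(query, collection, k=4):
    """Return (similarity, name) pairs in descending order of similarity."""
    q = feature_vector(query, k)
    scored = [
        (similarity(q, feature_vector(image, k)), name)
        for name, image in collection.items()
    ]
    return sorted(scored, reverse=True)
```

Calling `rank(query_image, {"word_1": image_1, "word_2": image_2})` returns the collection entries ordered from most to least similar. Because only the first k coefficients are kept, word images of different heights map to vectors of the same length and can be compared directly.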

Keywords

Kanjur, Word spotting, Profile features, Discrete Fourier transform, Query image synthesis

Notes

Acknowledgments

This research is supported by the National Natural Science Foundation of China (grant number 60865003). We also thank the Natural Science Foundation of Inner Mongolia Autonomous Region for its support (grant numbers 2012MS0921 and 2011ZD11).

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. School of Computer Science, Inner Mongolia University, Hohhot, China