A keyword retrieval system for historical Mongolian document images

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we propose a keyword retrieval system for locating words in historical Mongolian document images. Following the word spotting approach, a collection of historical Mongolian document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. For each word image, a fixed-length feature vector is formed by keeping an appropriate number of complex coefficients of the discrete Fourier transform of each profile feature. The system supports online image-to-image matching: it computes the similarity between a query word image and each word image in the collection and returns the results ranked in descending order of similarity. The query word image can be generated at retrieval time by synthesizing a sequence of glyphs. Experimental evaluations confirm the performance of the system.
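
The pipeline described above (profile features, truncated DFT coefficients, similarity ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the profile features, the number of coefficients, or the similarity measure, so the three profiles below, the default `n_coeffs=8`, the length normalization, and the negative Euclidean distance are all assumptions chosen for illustration. Profiles are taken one value per row on the assumption that words run vertically in the image.

```python
import numpy as np


def profile_features(img):
    """Return three per-row profiles of a binary word image (nonzero = ink).

    The choice of profiles is an assumption; the abstract only says
    "profile-based features".
    """
    ink = np.asarray(img) > 0
    width = ink.shape[1]
    has_ink = ink.any(axis=1)
    # Number of ink pixels in each row.
    projection = ink.sum(axis=1).astype(float)
    # Distance from the left edge to the first ink pixel (width if the row is empty).
    left = np.where(has_ink, ink.argmax(axis=1), width).astype(float)
    # Distance from the right edge to the last ink pixel (width if the row is empty).
    right = np.where(has_ink, ink[:, ::-1].argmax(axis=1), width).astype(float)
    return [projection, left, right]


def dft_vector(img, n_coeffs=8):
    """Build a fixed-length vector from the first n_coeffs DFT coefficients per profile."""
    parts = []
    for profile in profile_features(img):
        # Zero-pad very short profiles so n_coeffs coefficients always exist.
        n = max(len(profile), n_coeffs)
        # Dividing by the profile length is an assumed normalization that keeps
        # magnitudes comparable across word images of different heights.
        coeffs = np.fft.fft(profile, n=n)[:n_coeffs] / len(profile)
        parts.extend([coeffs.real, coeffs.imag])
    return np.concatenate(parts)


def rank(query_img, collection, n_coeffs=8):
    """Return (index, similarity) pairs in descending order of similarity."""
    q = dft_vector(query_img, n_coeffs)
    vectors = np.stack([dft_vector(w, n_coeffs) for w in collection])
    # Assumed similarity: negative Euclidean distance between feature vectors.
    similarities = -np.linalg.norm(vectors - q, axis=1)
    order = np.argsort(-similarities, kind="stable")
    return [(int(i), float(similarities[i])) for i in order]
```

Because every word image maps to a vector of the same length (here `3 * 2 * n_coeffs` values), word images of different heights can be compared directly, without an alignment step such as dynamic time warping.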

Acknowledgments

This research is supported by the National Natural Science Foundation of China (grant number 60865003) and by the Natural Science Foundation of Inner Mongolia Autonomous Region (grant numbers 2012MS0921 and 2011ZD11).

Author information

Corresponding author

Correspondence to Hongxi Wei.

About this article

Cite this article

Wei, H., Gao, G. A keyword retrieval system for historical Mongolian document images. IJDAR 17, 33–45 (2014). https://doi.org/10.1007/s10032-013-0203-6

  • DOI: https://doi.org/10.1007/s10032-013-0203-6
