A keyword retrieval system for historical Mongolian document images

Wei, Hongxi; Gao, Guanglai

doi:10.1007/s10032-013-0203-6


Original Paper
Published: 26 February 2013

International Journal on Document Analysis and Recognition (IJDAR), Volume 17, pages 33–45 (2014)

Abstract

In this paper, we propose a keyword retrieval system for locating words in historical Mongolian document images. The system is based on word spotting: a collection of historical Mongolian document images is converted into a collection of word images by word segmentation, and several profile-based features are extracted to represent each word image. For each word image, a fixed-length feature vector is formed by taking an appropriate number of complex coefficients of the discrete Fourier transform (DFT) of each profile feature. The system supports online image-to-image matching: it computes the similarity between a query word image and each word image in the collection, and returns the results ranked in descending order of similarity. The query word image can also be synthesized from a sequence of glyphs at retrieval time. Experimental evaluations confirm the performance of the system.
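The retrieval pipeline summarized in the abstract (profile features, a fixed-length vector built from DFT coefficients, similarity ranking, and queries synthesized from glyphs) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The four profiles, the horizontal centring used to stack glyphs, the division of each coefficient by the profile length, the default of k = 7 coefficients, and the use of cosine similarity are all assumptions made for illustration, and every function name is hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple

# A binary word image: a list of rows, 1 = ink, 0 = background.
# Traditional Mongolian is written top to bottom, so every profile below
# is sampled once per row, i.e. along the writing direction (assumption).
Image = List[List[int]]


def synthesize(glyphs: Sequence[Image]) -> Image:
    """Hypothetical query synthesis: stack glyph images top to bottom,
    centring each glyph horizontally on the widest one."""
    width = max(len(g[0]) for g in glyphs)
    word: Image = []
    for g in glyphs:
        pad = width - len(g[0])
        for row in g:
            word.append([0] * (pad // 2) + list(row) + [0] * (pad - pad // 2))
    return word


def profiles(img: Image) -> List[List[float]]:
    """Four illustrative per-row profiles (not necessarily the paper's set)."""
    width = len(img[0])
    proj, left, right, runs = [], [], [], []
    for row in img:
        ink = [j for j, v in enumerate(row) if v]
        proj.append(len(ink) / width)                                 # projection profile
        left.append(ink[0] / width if ink else 1.0)                   # left contour distance
        right.append((width - 1 - ink[-1]) / width if ink else 1.0)  # right contour distance
        runs.append(float(sum(1 for j, v in enumerate(row)
                              if v and (j == 0 or not row[j - 1]))))  # number of ink runs
    return [proj, left, right, runs]


def dft_prefix(signal: Sequence[float], k: int) -> List[float]:
    """First k DFT coefficients of one profile as (real, imag) pairs,
    divided by the profile length so words of different heights compare."""
    n = len(signal)
    out: List[float] = []
    for u in range(k):
        c = sum(x * cmath.exp(-2j * cmath.pi * u * t / n)
                for t, x in enumerate(signal)) / n
        out.extend((c.real, c.imag))
    return out


def feature_vector(img: Image, k: int = 7) -> List[float]:
    """Fixed-length descriptor: 2*k numbers per profile, whatever the word height."""
    return [v for p in profiles(img) for v in dft_prefix(p, k)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def rank(query: Image, collection: Sequence[Image], k: int = 7) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs in descending order of similarity."""
    q = feature_vector(query, k)
    scored = [(i, cosine(q, feature_vector(w, k))) for i, w in enumerate(collection)]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Because each profile contributes exactly 2k numbers, word images of any height map to vectors of the same length, which is what makes a one-against-all comparison straightforward. In a real system the collection's vectors would be computed once, offline, and only the query's vector would be computed at search time.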

Acknowledgments

This research is supported by the National Natural Science Foundation of China (grant no. 60865003). We also thank the Natural Science Foundation of Inner Mongolia Autonomous Region for its support (grant nos. 2012MS0921 and 2011ZD11).

Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, 010021, China
Hongxi Wei & Guanglai Gao

Corresponding author

Correspondence to Hongxi Wei.

About this article

Cite this article

Wei, H., Gao, G. A keyword retrieval system for historical Mongolian document images. IJDAR 17, 33–45 (2014). https://doi.org/10.1007/s10032-013-0203-6

Received: 14 April 2012
Revised: 08 January 2013
Accepted: 11 February 2013
Published: 26 February 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10032-013-0203-6
