The image and ground truth dataset of Mongolian movable-type newspapers for text recognition

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

OCR approaches have advanced considerably in recent years thanks to the resurgence of deep learning. However, to the best of our knowledge, there is little work on Mongolian movable-type document recognition. One major hurdle is the lack of a domain-specific, well-labeled dataset for training robust models. This paper aims to create the first Mongolian movable-type text-image dataset for OCR research. We collected 771 paragraph-level pages segmented from 34 newspapers published between 1947 and 1952. For each page, word- and line-level text transcriptions and boundary annotations are recorded. In total, the dataset contains 86,578 word occurrences and 9711 text-line images, with a vocabulary of 7964 words. The dataset was built from scratch through image collection, text transcription, text-image alignment and manual correction. Moreover, an official train/test partition is defined, on which typical text segmentation and recognition experiments are conducted to establish strong baselines. The dataset is available for research, and we encourage researchers to develop and test new methods using it.
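To make the described annotation structure concrete, the sketch below shows one way page records with line-level transcriptions and boundaries might be loaded. The JSON layout, file names and field names are assumptions made purely for illustration; they are not the released dataset's actual format.

    # A minimal sketch of reading line-level annotations, assuming a hypothetical
    # JSON layout with one record per page; the released dataset's actual file
    # format and field names may differ.
    import json
    from dataclasses import dataclass
    from typing import List, Tuple


    @dataclass
    class LineAnnotation:
        text: str                          # line-level transcription (Unicode Mongolian)
        bbox: Tuple[int, int, int, int]    # boundary as (x, y, width, height)


    def load_page(path: str) -> List[LineAnnotation]:
        """Parse one page record into a list of text-line annotations."""
        with open(path, encoding="utf-8") as f:
            page = json.load(f)
        return [
            LineAnnotation(text=line["text"], bbox=tuple(line["bbox"]))
            for line in page["lines"]
        ]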


Notes

  1. http://www.iam.unibe.ch/fki/databases/iam-historical-document-database.

  2. https://www.primaresearch.org/datasets/ENP.

  3. https://www.digitization.edu.

  4. Latin: oNgerebel, Unicodes: U+1825 U+1829 U+182D U+1821 U+1837 U+1821 U+182A U+1821 U+182F, meaning: walk over, past.

  5. Latin: tologelegqid, Unicodes: U+1832 U+1825 U+182F U+1825 U+182D U+1821 U+182F U+1821 U+182D U+1834 U+1822 U+1833, meaning: Representatives.

  6. Latin: undusuten, Unicodes: U+1826 U+1828 U+1833 U+1826 U+1830 U+1826 U+1832 U+1821 U+1828, meaning: nation, race.

  7. Latin: uiledburilel, Unicodes: U+1826 U+1822 U+182F U+1821 U+1833 U+182A U+1826 U+1837 U+1822 U+182F U+1821 U+182F, meaning: Industry.

  8. Latin: beyeleguluged, Unicodes: U+182A U+1821 U+1836 U+1821 U+182F U+1821 U+182D U+1826 U+182F U+1826 U+182D U+1821 U+1833, meaning: finished.

  9. Latin: burilduhun, Unicodes: U+182A U+1826 U+1837 U+1822 U+182F U+1833 U+1826 U+182C U+1826 U+1828, meaning: component.

  10. Latin: bvlbasvrajv, Unicodes: U+182A U+1824 U+182F U+182A U+1820 U+1830 U+1824 U+1837 U+1820 U+1835 U+1824, meaning: training, exercising.

  11. Latin: yabvgvlvgsan, Unicodes: U+1836 U+1820 U+182A U+1824 U+182D U+1824 U+182F U+1824 U+182D U+1830 U+1820 U+1828, meaning: let somebody go.

  12. Latin: yarilqahv, Unicodes: U+1836 U+1820 U+1837 U+1822 U+182F U+1834 U+1820 U+182C U+1824, meaning: talk.

  13. Latin: tegsidhen, Unicodes: U+1832 U+1821 U+182D U+1830 U+1822 U+1833 U+182C U+1821 U+1828, meaning: average.
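The "U+XXXX" code-point lists in the notes above all fall in the Unicode Mongolian block (U+1800–U+18AF). The small helper below, an illustrative sketch rather than part of the dataset tooling, shows how such a list decodes into an actual Mongolian-script string, using footnote 4 as the example.

    # Decode a space-separated "U+XXXX" code-point list into a Unicode string.
    def codepoints_to_text(codepoints: str) -> str:
        return "".join(chr(int(cp.removeprefix("U+"), 16)) for cp in codepoints.split())


    # Footnote 4: Latin "oNgerebel", meaning "walk over, past".
    word = codepoints_to_text("U+1825 U+1829 U+182D U+1821 U+1837 U+1821 U+182A U+1821 U+182F")
    print(word, len(word))  # prints the 9-letter Mongolian word and its length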


Author information

Corresponding author

Correspondence to Feilong Bao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Lu, M., Bao, F., Zhang, H. et al. The image and ground truth dataset of Mongolian movable-type newspapers for text recognition. IJDAR (2023). https://doi.org/10.1007/s10032-023-00450-x
