How to handle bi/tri-lingual Indic texts in a single image? A new dataset of natural scene and born-digital images

Abstract

Detection and language identification of multi-lingual texts in natural scene images (NSI) and born-digital images (BDI) are popular research problems in the domain of information retrieval. Several methods addressing these problems have been evaluated over the years, mostly on NSI-based standard datasets. However, datasets featuring bi/tri-lingual Indic texts within a single image are quite few, and datasets of BDIs containing multi-lingual texts are hardly available. To this end, a new dataset called Mixed-lingual Indic Texts in Digital Images (MITDI), comprising 500 NSIs and 500 BDIs, is introduced, where each image contains text written in at least two of English, Bangla and Hindi, three languages commonly used in India. Overall, the NSI pool contains 360 images with bi-lingual texts and 140 with tri-lingual texts, whereas the BDI pool contains 489 images with bi-lingual texts and 11 with tri-lingual texts. To benchmark performance on MITDI, a deep learning based Connectionist-DenseNet framework is built and evaluated on each data pool (NSI, BDI) as well as on the combined set. The proposed dataset can serve as an important resource for evaluating state-of-the-art methods in this domain. The dataset is publicly available at: https://github.com/NCJUCSE/MITDI
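
The abstract names a Connectionist-DenseNet framework but does not spell out its layers. As a purely illustrative sketch of the language-identification stage, assuming a Keras DenseNet121 backbone with a small softmax head for three-way classification (English/Bangla/Hindi) over cropped text regions, one could write the following; the input size, head layers and optimizer settings are assumptions, not the authors' reported configuration.

# Illustrative sketch only (not the authors' exact Connectionist-DenseNet model):
# a DenseNet121 backbone with a small classification head for three-way
# language identification (English / Bangla / Hindi) on cropped text regions.
# Input size, dropout rate and learning rate are assumptions.
import tensorflow as tf

NUM_LANGUAGES = 3  # English, Bangla, Hindi

def build_language_classifier(input_shape=(224, 224, 3)):
    # DenseNet121 pretrained on ImageNet, used here as a feature extractor
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(NUM_LANGUAGES, activation="softmax")(x)
    model = tf.keras.Model(backbone.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_language_classifier()
    model.summary()

In such a pipeline, regions produced by the text-detection stage would be cropped and resized before being passed to the classifier; the paper reports results separately for the NSI pool, the BDI pool and the combined set.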

Acknowledgements

This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India, under the PURSE-II and UPE-II projects. It is also partially funded by DBT grant (BT/PR16356/BID/7/596/2016) and DST grant (EMR/2016/007213).

Author information

Corresponding author

Correspondence to Neelotpal Chakraborty.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chakraborty, N., Mitra, A., Choudhury, A. et al. How to handle bi/tri-lingual Indic texts in a single image? A new dataset of natural scene and born-digital images. Multimed Tools Appl 81, 15367–15394 (2022). https://doi.org/10.1007/s11042-022-12596-7
