
LWSINet: A deep learning-based approach towards video script identification

Multimedia Tools and Applications

Abstract

Videos broadcast via media such as television and the internet carry a high volume of text. Since Optical Character Recognition (OCR) engines are script-dependent, script identification must precede recognition. Video script identification is not trivial, however, because of issues such as low resolution, complex backgrounds, noise, and blur. In this work, we propose a deep learning-based system, LWSINet (LightWeight Script Identification Network), a 6-layered CNN, to identify video scripts. For validation, we used the publicly available CVSI-15 dataset. We also considered the effects of three common noise types, namely salt-and-pepper, Gaussian, and Poisson, on the scripts, along with their hybrid combinations. In our tests, the proposed CNN proved consistent and robust in identifying scripts in both scenarios, with and without noise. We also employed other well-known handcrafted feature-based and deep learning approaches for comparison.
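As an illustration of the three noise types named in the abstract, the following is a minimal NumPy sketch of how salt-and-pepper, Gaussian, and Poisson noise, plus one hybrid combination, might be applied to a grayscale image. It is not the authors' implementation: the function names, the noise density, the standard deviation, and the order in which the hybrid stacks the noises are all assumptions made for this example.

```python
import numpy as np


def add_salt_pepper(img, density=0.05, rng=None):
    """Set roughly `density` of the pixels to 0 (pepper) or 255 (salt).

    `density` is an assumed value, not one taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.copy()
    mask = rng.random(img.shape)             # uniform samples in [0, 1)
    noisy[mask < density / 2] = 0            # pepper
    noisy[mask >= 1 - density / 2] = 255     # salt
    return noisy


def add_gaussian(img, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise with an assumed std. dev. `sigma`."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def add_poisson(img, rng=None):
    """Resample each pixel from a Poisson law whose mean is its intensity."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = rng.poisson(img.astype(np.float64))
    return np.clip(noisy, 0, 255).astype(np.uint8)


def add_hybrid(img, rng=None):
    """One possible hybrid: Poisson, then Gaussian, then salt-and-pepper.

    The ordering is an assumption chosen for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = add_poisson(img, rng=rng)
    out = add_gaussian(out, rng=rng)
    return add_salt_pepper(out, rng=rng)


if __name__ == "__main__":
    # A flat mid-gray patch stands in for a cropped video text word image.
    patch = np.full((32, 128), 128, dtype=np.uint8)
    rng = np.random.default_rng(0)
    for fn in (add_salt_pepper, add_gaussian, add_poisson, add_hybrid):
        noisy = fn(patch, rng=rng)
        print(fn.__name__, noisy.shape, noisy.dtype)
```

Passing a seeded generator, as in the usage lines, keeps the corruption reproducible, which helps when the same noisy test set has to be shared across several classifiers.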

Author information

Corresponding author

Correspondence to Kaushik Roy.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. LWSINet: A deep learning-based approach towards video script identification. Multimed Tools Appl 80, 29095–29128 (2021). https://doi.org/10.1007/s11042-021-11103-8

  • DOI: https://doi.org/10.1007/s11042-021-11103-8
