CNN Based Transfer Learning for Scene Script Identification

  • Maroua Tounsi
  • Ikram Moalla
  • Frank Lebourgeois
  • Adel M. Alimi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10639)

Abstract

Identifying scripts in natural images is an important step in document analysis. Convolutional Neural Networks (CNNs) have recently achieved great success in image classification tasks, owing to their strong representational capacity and their invariance to translation and distortion. However, training a new CNN requires a large number of labelled images and extensive computational resources. Transfer learning from pre-trained models has been shown to ease the application of CNNs and, in some circumstances, even to boost performance. In this paper, we apply transfer learning and fine-tuning to document analysis. Specifically, we study scene script identification quantitatively by comparing the performance of transfer learning with that of learning from scratch. We evaluate two CNN architectures trained on natural images: AlexNet and VGG-16. Experimental results on several benchmark datasets, namely SIW-13, MLe2e and CVSI2015, demonstrate that our approach outperforms previous approaches as well as full training from scratch.

Keywords

Transfer learning, Convolutional Neural Network, Deep learning, Script identification, Natural scenes

Notes

Acknowledgment

This work was performed within the framework of a MOBIDOC thesis financed by the EU under the PASRI program. The authors would also like to acknowledge the partial financial support of this work through grants from the General Direction of Scientific Research (DGRT), Tunisia, under the ARUB program. The research leading to these results has received funding from the Ministry of Higher Education and Scientific Research of Tunisia under grant agreement number LR11ES48.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Maroua Tounsi (1)
  • Ikram Moalla (1)
  • Frank Lebourgeois (2)
  • Adel M. Alimi (1)

  1. REGIM-Laboratory: REsearch Groups in Intelligent Machines, National Engineering School of Sfax (ENIS), University of Sfax, Sfax, Tunisia
  2. Laboratoire d’InfoRmatique en Images et Systèmes d’information (LIRIS), INSA of Lyon, Villeurbanne, France
