Multimedia Tools and Applications, Volume 78, Issue 22, pp 32159–32186

AUTNT - A component level dataset for text non-text classification and benchmarking with novel script invariant feature descriptors and D-CNN

  • Tauseef Khan
  • Ayatullah Faruk Mollah
Article

Abstract

Automated scene text recognition from camera images has been an active research area over the last few decades. Classification of foreground object components from camera images is an essential step of Text Information Extraction (TIE). Text/non-text separation from complex document images as well as unstructured natural images is still a challenging task. Although some works have been reported in this direction, standard component-level benchmark datasets specifically for text/non-text classification are not available. In this paper, a new multi-script dataset of text and non-text components is reported along with multi-purpose ground truth annotations. A novel feature set is also designed on the basis of distance information of medial skeleton points to set benchmark performance on this dataset. In addition, a Deep Convolution Neural Network (D-CNN) based automated feature extraction and classification framework is developed for benchmarking purposes. Further insight is provided by separately assessing the two benchmark methods on component images originating from documents and from natural scenes. Experimental results show that classification accuracy is over 94.00% for the medial skeleton based feature descriptors and over 96.00% for the D-CNN framework on both types of sources, which is promising for practical scenarios.
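To make the idea of "distance information of medial skeleton points" concrete, the following is a minimal sketch and not the authors' feature set. It assumes a binary component image given as nested lists (1 = foreground), uses a two-pass city-block distance transform, and approximates the medial skeleton by local maxima of the distance map, which is a simple stand-in for the Burning Rope Algorithm named in the keywords. The function names and the chosen statistics (mean, spread, maximum) are illustrative assumptions.

```python
from statistics import mean, pstdev


def cityblock_distance_transform(img):
    """Distance from each foreground pixel (1) to the nearest background pixel (0).

    Classic two-pass scan with a 4-neighbour mask (city-block metric).
    Pixels outside the image are ignored, so pad the component with background.
    """
    h, w = len(img), len(img[0])
    inf = h + w  # larger than any possible city-block distance in the image
    dist = [[inf if img[y][x] else 0 for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: propagate from top and left
        for x in range(w):
            if dist[y][x]:
                up = dist[y - 1][x] if y > 0 else inf
                left = dist[y][x - 1] if x > 0 else inf
                dist[y][x] = min(dist[y][x], up + 1, left + 1)
    for y in range(h - 1, -1, -1):  # backward pass: propagate from bottom and right
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                down = dist[y + 1][x] if y < h - 1 else inf
                right = dist[y][x + 1] if x < w - 1 else inf
                dist[y][x] = min(dist[y][x], down + 1, right + 1)
    return dist


def ridge_points(dist):
    """Approximate skeleton: foreground pixels whose distance is not exceeded
    by any 4-neighbour (assumption: a stand-in for a true medial skeleton)."""
    h, w = len(dist), len(dist[0])
    points = []
    for y in range(h):
        for x in range(w):
            d = dist[y][x]
            if d == 0:
                continue
            neighbours = [
                dist[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            if all(d >= n for n in neighbours):
                points.append((y, x))
    return points


def skeleton_distance_features(img):
    """Hypothetical descriptor: statistics of distance values at ridge points."""
    dist = cityblock_distance_transform(img)
    values = [dist[y][x] for y, x in ridge_points(dist)]
    if not values:
        return {"mean": 0.0, "std": 0.0, "max": 0, "cv": 0.0}
    m, s = mean(values), pstdev(values)
    return {"mean": m, "std": s, "max": max(values), "cv": s / m}


# A 3-pixel-thick horizontal bar padded with one background pixel on each side.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
features = skeleton_distance_features(bar)
```

The intuition behind such statistics is that strokes of text tend to have a fairly uniform width, so the distance values along the skeleton vary little, whereas non-text blobs often show a wider spread. Whether and how the paper exploits this is not stated in the abstract.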

Keywords

  • Text Non-text classification
  • Distance Transform (DT)
  • Medial Skeleton
  • Burning Rope Algorithm
  • Deep Convolution Neural Network

Notes

Acknowledgements

The authors thank the Department of Computer Science and Engineering, Aliah University, for supporting this work. The first author also thanks the University Grants Commission (UGC), Govt. of India, for granting him the Maulana Azad National Fellowship (MANF).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Aliah University, Kolkata, India