Multimedia Tools and Applications, Volume 75, Issue 20, pp 12815–12829

Texture feature-based text region segmentation in social multimedia data

  • Sul-Ho Kim
  • Kwon-Jae An
  • Seok-Woo Jang
  • Gye-Young Kim


This paper proposes a method for segmenting text areas in images collected from social multimedia networks, using the texture features of the input images together with an artificial neural network. The proposed method consists of four main steps: extracting candidate text areas, localizing the text areas, separating the text from the background, and verifying the candidate text areas. In the candidate extraction step, blocks that may contain text are segmented from the input image on the basis of their texture features. In the localization step, only the text strings are extracted from the candidate text blocks. In the separation step, the text is separated from the background within the localized text blocks. In the verification step, an artificial neural network checks whether each extracted block actually contains text, and non-text blocks are excluded. In experiments on various types of news and non-news images, the proposed method extracted text regions more accurately than existing methods.
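The four steps described above can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the texture features, the thresholds, or the network, so block-wise intensity variance stands in for the texture feature, mean thresholding stands in for text and background separation, and a foreground-ratio check stands in for the trained neural network. All function names and parameter values are hypothetical.

```python
from statistics import mean, pvariance

# An image is a list of rows of grayscale values (0..255).
# A box is (top, left, bottom, right) with bottom and right exclusive.


def crop(image, box):
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def extract_candidates(image, block=8, min_variance=500.0):
    """Step 1: keep blocks whose intensity variance (an assumed texture cue) is high."""
    height, width = len(image), len(image[0])
    boxes = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            box = (top, left, min(top + block, height), min(left + block, width))
            pixels = [p for row in crop(image, box) for p in row]
            if pvariance(pixels) >= min_variance:
                boxes.append(box)
    return boxes


def localize(image, box, min_range=50):
    """Step 2: shrink a candidate box to the rows and columns with strong contrast."""
    top, left, _, _ = box
    region = crop(image, box)
    rows = [i for i, row in enumerate(region) if max(row) - min(row) >= min_range]
    cols = [j for j, col in enumerate(zip(*region)) if max(col) - min(col) >= min_range]
    if not rows or not cols:
        return box
    return (top + rows[0], left + cols[0], top + rows[-1] + 1, left + cols[-1] + 1)


def separate(image, box):
    """Step 3: binarize around the region's mean intensity (1 marks a darker pixel)."""
    region = crop(image, box)
    threshold = mean(p for row in region for p in row)
    return [[1 if p < threshold else 0 for p in row] for row in region]


def verify(mask, low=0.1, high=0.9):
    """Step 4: stand-in for the neural-network verifier (assumption):
    accept a mask only if its foreground ratio looks plausible for text."""
    cells = [c for row in mask for c in row]
    return low <= sum(cells) / len(cells) <= high


def segment_text(image):
    """Run the four steps in order and return (box, mask) pairs that pass verification."""
    results = []
    for box in extract_candidates(image):
        tight = localize(image, box)
        mask = separate(image, tight)
        if verify(mask):
            results.append((tight, mask))
    return results
```

In a real system, the last step would be replaced by a classifier trained on labeled text and non-text blocks, and the variance cue by whatever texture descriptor the method actually uses.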


Keywords: Social multimedia, Artificial neural network, Candidate region, Background



This work was supported by the ICT R&D program of MSIP/IITP [2014 (R0112-14-1014), The Development of Open Platform for Service of Convergence Contents].



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Sul-Ho Kim (1)
  • Kwon-Jae An (1)
  • Seok-Woo Jang (2)
  • Gye-Young Kim (1)

  1. School of Software, Soongsil University, Dongjak-Gu, South Korea
  2. Department of Digital Media, Anyang University, Manan-Gu, South Korea
