Text Region Extraction for Noisy Spam Image

  • Estqlal Hammad Dhahi
  • Suhad A. Ali (corresponding author)
  • Mohammed Abdullah Naser
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1040)

Abstract

This paper addresses filtering of image spam, a fast-spreading type of spam in which the message text is embedded in an image to evade text-based spam filters. A common countermeasure uses an optical character recognition (OCR) system to detect and recognize the embedded text, followed by a classifier that distinguishes spam from ham (legitimate mail). However, spammers have begun obscuring the text in images to prevent OCR from recovering it. To compensate for these shortcomings of OCR, a method based on text region detection is proposed. To evaluate the proposed system, it was applied to the publicly available Dredze image spam dataset, where it outperformed the original OCR system in practical use on spam images with complex backgrounds. The test results indicate that the new method detects text regions well even in noisy images.
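The abstract does not describe the detection algorithm itself. Purely as an illustration, here is one common, generic way to localize text regions in a noisy image: a 3x3 median filter against salt-and-pepper noise, an edge map of strong horizontal intensity transitions (character strokes), and horizontal then vertical projection profiles. This is not the authors' method. The nested-list grayscale image representation, the function names (`median3`, `edge_map`, `runs`, `merge_spans`, `text_regions`) and every threshold (`thresh`, `min_row_edges`, `min_height`, `max_gap`) are hypothetical choices made for this sketch.

```python
# Generic text-region localization sketch; NOT the paper's algorithm.
# All names and thresholds below are hypothetical.
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels 0-255, row-major
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def median3(img: Image) -> Image:
    """3x3 median filter to suppress salt-and-pepper noise; border pixels are copied."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of the 9 values
    return out


def edge_map(img: Image, thresh: int) -> List[List[int]]:
    """1 where the intensity jump to the right-hand neighbour is at least `thresh`."""
    h, w = len(img), len(img[0])
    return [[1 if x + 1 < w and abs(img[y][x + 1] - img[y][x]) >= thresh else 0
             for x in range(w)] for y in range(h)]


def runs(profile: List[int], min_count: int) -> List[Tuple[int, int]]:
    """Maximal inclusive index ranges where profile[i] >= min_count."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(profile):
        if v >= min_count and start is None:
            start = i
        elif v < min_count and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans


def merge_spans(spans: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Join consecutive spans separated by at most `max_gap` empty positions."""
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def text_regions(img: Image, thresh: int = 60, min_row_edges: int = 2,
                 min_height: int = 3, max_gap: int = 8) -> List[Box]:
    """Hypothetical detector: denoise, build edge map, split by projection profiles."""
    edges = edge_map(median3(img), thresh)
    width = len(edges[0])
    boxes: List[Box] = []
    # Horizontal projection: rows with enough stroke edges form candidate text bands.
    row_profile = [sum(row) for row in edges]
    for top, bottom in runs(row_profile, min_row_edges):
        if bottom - top + 1 < min_height:
            continue  # too thin to be a text line; likely residual noise
        # Vertical projection inside the band; nearby columns are grouped into one region.
        col_profile = [sum(edges[y][x] for y in range(top, bottom + 1)) for x in range(width)]
        for left, right in merge_spans(runs(col_profile, 1), max_gap):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    # 20x30 white image: four 3-pixel-wide dark bars in rows 5-10, plus two noise pixels.
    demo = [[255] * 30 for _ in range(20)]
    for c in (4, 9, 14, 19):
        for y in range(5, 11):
            for x in range(c, c + 3):
                demo[y][x] = 0
    demo[2][2] = demo[15][25] = 0
    print(text_regions(demo))  # [(5, 10, 3, 21)]
```

On the demo image, the median filter removes both isolated noise pixels, and the four bars are grouped into a single region spanning rows 5 to 10. The left bound is column 3 rather than 4 because each transition is recorded at its left-hand pixel. The thresholds would need tuning for real spam images.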

Keywords

OCR · Text localization · Spam image · Text-based spam filtering · Text features

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Estqlal Hammad Dhahi, Suhad A. Ali, Mohammed Abdullah Naser: Department of Computer Science, College of Science for Women, University of Babylon, Babylon, Iraq