Recognition System to Separate Text Graphics from Indian Newspaper

  • Conference paper
Operations Research and Optimization (FOTA 2016)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 225)

Abstract

Identifying graphics on newspaper pages and separating them from text is a challenging task, and very few works have been reported in this field. Newspapers are generally printed on low-quality paper that tends to change color over time, and this discoloration adds noise to the document that increases with age. In this work we select several features to distinguish graphics from text and also attempt to reduce the noise. First, the minimum bounding box around each object is identified by connected component analysis of the binary image. Each object is then cropped and passed through a geometric feature extraction system. Next, two different frequency analyses are performed on each object. In this way we collect both spatial and frequency domain features from the objects, which are used for training and testing with different classifiers. We have applied these techniques to Indian newspapers written in Roman script and obtained satisfactory results.
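
The first two steps described in the abstract (connected component analysis to find bounding boxes, then cropping and geometric feature extraction) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of 8-connectivity, and the two features shown (aspect ratio and fill ratio) are assumptions chosen for illustration, since the abstract does not name the features actually used.

```python
from collections import deque


def connected_components(binary):
    """Find 8-connected groups of foreground (1) pixels in a binary image.

    Returns one dict per object with its bounding box
    (top, left, bottom, right) and its pixel count.
    8-connectivity is an assumption, not taken from the paper.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            objects.append({"box": (top, left, bottom, right), "area": area})
    return objects


def crop(binary, box):
    """Cut the bounding box region out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


def geometric_features(obj):
    """Hypothetical geometric features computed from one object's box."""
    top, left, bottom, right = obj["box"]
    height = bottom - top + 1
    width = right - left + 1
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "fill_ratio": obj["area"] / (width * height),
    }


# Toy binary page: a 2x2 block and a 1x3 vertical stroke.
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

for obj in connected_components(page):
    print(obj["box"], geometric_features(obj))
```

In a fuller pipeline, each cropped object would also be passed to the frequency analyses mentioned in the abstract, and the combined feature vectors would be given to a classifier. Those stages are left out here because the abstract does not specify them.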

Acknowledgements

The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of this work. The work reported here has been partially funded by University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.

Author information

Corresponding author

Correspondence to Shantanu Jana.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jana, S., Das, N., Sarkar, R., Nasipuri, M. (2018). Recognition System to Separate Text Graphics from Indian Newspaper. In: Kar, S., Maulik, U., Li, X. (eds) Operations Research and Optimization. FOTA 2016. Springer Proceedings in Mathematics & Statistics, vol 225. Springer, Singapore. https://doi.org/10.1007/978-981-10-7814-9_14
