Abstract
Identifying graphics on newspaper pages and separating them from text is a challenging task, and very few works have been reported in this field. Newspapers are generally printed on low-quality paper that tends to change color over time, and this discoloration introduces noise that accumulates in the document as it ages. In this work we select several features to distinguish graphics from text, and we also attempt to reduce the noise. First, the minimum bounding box around each object is identified by connected component analysis of the binary image. Each object is then cropped and passed through a geometric feature extraction system. Next, two different frequency analyses are performed on each object. In this way we collect both spatial-domain and frequency-domain features from the objects, which are used to train and test different classifiers. We applied the technique to Indian newspapers printed in Roman script and obtained satisfactory results.
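The first two steps of the pipeline (connected component analysis to find each object's minimum bounding box, followed by geometric feature extraction) can be illustrated with a minimal sketch. The abstract does not state which connectivity or which geometric features were used, so the 8-connectivity, the function names, and the features shown here (`aspect_ratio`, `fill_ratio`) are assumptions chosen only for illustration; the toy `page` array is likewise hypothetical.

```python
from collections import deque


def connected_components(binary):
    """Return one list of (row, col) pixels per 8-connected foreground region.

    `binary` is a list of rows holding 0 (background) or 1 (foreground).
    8-connectivity is an assumption; the abstract does not specify it.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def geometric_features(pixels):
    """Compute the minimum bounding box and two illustrative shape features."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left, bottom, right = min(ys), min(xs), max(ys), max(xs)
    height = bottom - top + 1
    width = right - left + 1
    return {
        "bbox": (top, left, bottom, right),
        "width": width,
        "height": height,
        # Hypothetical features, not necessarily those used in the paper.
        "aspect_ratio": width / height,
        "fill_ratio": len(pixels) / (width * height),
    }


# Toy binary image: a 2x2 block and a 1x3 vertical stroke.
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
features = [geometric_features(p) for p in connected_components(page)]
```

Each entry of `features` describes one object; in a full system, vectors like these would be combined with the frequency-domain features before being passed to a classifier.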
Acknowledgements
The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jana, S., Das, N., Sarkar, R., Nasipuri, M. (2018). Recognition System to Separate Text Graphics from Indian Newspaper. In: Kar, S., Maulik, U., Li, X. (eds) Operations Research and Optimization. FOTA 2016. Springer Proceedings in Mathematics & Statistics, vol 225. Springer, Singapore. https://doi.org/10.1007/978-981-10-7814-9_14
DOI: https://doi.org/10.1007/978-981-10-7814-9_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7813-2
Online ISBN: 978-981-10-7814-9
eBook Packages: Mathematics and Statistics (R0)