Noise Reduction in Urdu Document Image–Spatial and Frequency Domain Approaches

  • R. J. Ramteke
  • Imran Khan Pathan
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 222)

Abstract

With advances in optical character recognition (OCR) technology, it is now possible to digitize printed and handwritten documents and make them editable and searchable for many scripts and languages. For Urdu script, however, segmentation remains the major unresolved challenge. Most researchers have left the segmentation of Urdu text untouched because of the complexity of the script. Well-designed preprocessing for Urdu script may reduce this complexity and simplify the segmentation process. Noise removal in Urdu is difficult because dots and modifiers, which carry meaning, resemble noise. In a character recognition system, preprocessing aims to remove or reduce noise, normalize the image against variations such as skew, slant, and size, and minimize storage requirements to increase processing speed. This paper reviews the preprocessing techniques proposed in the literature for Arabic, Persian, Jawi, and Urdu. It also enhances dark and noisy Urdu document images using histogram equalization, spatial max and median filters, and frequency-domain Gaussian lowpass filters. The resulting denoised document images can help improve the subsequent segmentation and feature extraction processes.
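
As a rough illustration of the spatial-domain steps named in the abstract, the sketch below implements histogram equalization and 3x3 median and max filters in plain Python. It is not the authors' implementation: the function names (`equalize_histogram`, `window_filter`, `median_filter`, `max_filter`), the list-of-rows image representation, the 3x3 window, and the edge-replicating border handling are all assumptions made for this example. The frequency-domain Gaussian lowpass filter is not covered here.

```python
from statistics import median


def equalize_histogram(img, levels=256):
    """Spread the grey levels of an image (list of rows of ints) over 0..levels-1."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution of grey levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    # Lookup table mapping each old grey level to its equalized value.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def window_filter(img, reducer, size=3):
    """Apply `reducer` to every size x size neighbourhood (edges are replicated)."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(reducer(window))
        out.append(row)
    return out


def median_filter(img, size=3):
    """Replace each pixel with the median of its neighbourhood."""
    return window_filter(img, lambda win: int(median(win)), size)


def max_filter(img, size=3):
    """Replace each pixel with the brightest value in its neighbourhood."""
    return window_filter(img, max, size)
```

On a light page with dark ink, both filters erase an isolated dark pixel. That is exactly the risk the abstract points to: a small dot that belongs to a letter looks the same to these filters as a noise speck, so the window size would need to be chosen with the dot size in mind.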

Keywords

Noise reduction, Histogram equalization, Spatial filter, Max filter, Median filter, Frequency domain Gaussian lowpass filters, Normalization, Slant and skew correction

Notes

Acknowledgments

This work is sponsored by a G.H. Raisoni Doctoral Fellowship, North Maharashtra University, Jalgaon. The author gratefully acknowledges this financial support.

Copyright information

© Springer India 2013

Authors and Affiliations

  1. School of Computer Sciences, North Maharashtra University, Jalgaon, India