A blackboard approach towards integrated Farsi OCR system

  • Hossein Khosravi
  • Ehsanollah Kabir
Original Paper

Abstract

An integrated OCR system for Farsi text is proposed. The system draws on several knowledge sources (KSs) and coordinates them with a blackboard approach. Some KSs, such as classifiers, are acquired a priori through offline training, while others, such as statistical features, are extracted online during recognition. An arbiter controls the interactions between the solution blackboard and the KSs. The system was tested on 20 real-life scanned documents with ten popular Farsi fonts and achieved recognition rates of 97.05% at the word level and 99.03% at the character level.
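
The control flow described in the abstract can be illustrated with a minimal sketch of the general blackboard pattern. This is not the authors' implementation: the class names (Blackboard, KnowledgeSource, Arbiter), the three toy knowledge sources, and the use of a plain string with "|" separators as a stand-in for a scanned text line are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Blackboard:
    """Shared solution state that every knowledge source reads and writes."""
    data: Dict[str, Any] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # order in which KSs fired


@dataclass
class KnowledgeSource:
    name: str
    can_run: Callable[[Blackboard], bool]  # precondition on the blackboard
    run: Callable[[Blackboard], None]      # writes a partial solution
    priority: int = 0


class Arbiter:
    """Chooses which applicable knowledge source acts next."""

    def __init__(self, sources: List[KnowledgeSource]):
        self.sources = list(sources)

    def step(self, bb: Blackboard) -> bool:
        ready = [ks for ks in self.sources if ks.can_run(bb)]
        if not ready:
            return False  # nothing left to contribute
        chosen = max(ready, key=lambda ks: ks.priority)
        chosen.run(bb)
        bb.log.append(chosen.name)
        return True

    def solve(self, bb: Blackboard, max_steps: int = 100) -> Blackboard:
        steps = 0
        while steps < max_steps and self.step(bb):
            steps += 1
        return bb


# Toy knowledge sources (hypothetical): a real system would operate on images.
def segment(bb: Blackboard) -> None:
    bb.data["segments"] = bb.data["line_image"].split("|")


def classify(bb: Blackboard) -> None:
    bb.data["labels"] = [s.strip().upper() for s in bb.data["segments"]]


def assemble(bb: Blackboard) -> None:
    bb.data["text"] = " ".join(bb.data["labels"])


SOURCES = [
    KnowledgeSource(
        "segmenter",
        lambda bb: "line_image" in bb.data and "segments" not in bb.data,
        segment,
    ),
    KnowledgeSource(
        "classifier",
        lambda bb: "segments" in bb.data and "labels" not in bb.data,
        classify,
    ),
    KnowledgeSource(
        "assembler",
        lambda bb: "labels" in bb.data and "text" not in bb.data,
        assemble,
    ),
]


def recognize(line_image: str) -> Blackboard:
    bb = Blackboard(data={"line_image": line_image})
    return Arbiter(SOURCES).solve(bb)
```

In this sketch the knowledge sources never call each other directly. Each one only inspects and updates the shared blackboard, and the arbiter decides which one fires, which is the property that lets offline-trained and online-extracted sources be combined in one loop.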

Keywords

Farsi, Persian, OCR, Blackboard approach, Segmentation and recognition

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Department of Electrical Engineering, Tarbiat Modarres University, Tehran, Iran
