A blackboard approach towards integrated Farsi OCR system

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

An Erratum to this article was published on 29 April 2009

Abstract

An integrated OCR system for Farsi text is proposed. The system draws on several knowledge sources (KSs) and coordinates them through a blackboard approach. Some KSs, such as classifiers, are acquired a priori through an offline training process, while others, such as statistical features, are extracted online during recognition. An arbiter controls the interactions between the solution blackboard and the KSs. The system was tested on 20 real-life scanned documents in ten popular Farsi fonts and achieved recognition rates of 97.05% at the word level and 99.03% at the character level.
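The control flow described in the abstract (knowledge sources reading from and writing to a shared blackboard, with an arbiter deciding which one runs next) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `needs`/`provides` scheduling rule, and the two toy knowledge sources (an online statistic and a width-threshold stand-in for a trained classifier) are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Blackboard:
    """Shared solution state that every knowledge source reads and writes."""
    data: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # order in which KSs fired


@dataclass
class KnowledgeSource:
    """Hypothetical KS: runs once its required blackboard entries exist."""
    name: str
    needs: List[str]   # blackboard keys required before this KS can run
    provides: str      # blackboard key this KS writes
    run: Callable[[Blackboard], object]

    def ready(self, bb: Blackboard) -> bool:
        has_inputs = all(key in bb.data for key in self.needs)
        return has_inputs and self.provides not in bb.data


def arbiter(bb: Blackboard, sources: List[KnowledgeSource]) -> Blackboard:
    """Fire the first ready KS repeatedly until none can contribute."""
    while True:
        ready = [ks for ks in sources if ks.ready(bb)]
        if not ready:
            return bb
        ks = ready[0]
        bb.data[ks.provides] = ks.run(bb)
        bb.log.append(ks.name)


# Toy KS computed online from the page itself (assumed example).
online_stats = KnowledgeSource(
    name="online_stats",
    needs=["segment_widths"],
    provides="mean_width",
    run=lambda bb: sum(bb.data["segment_widths"]) / len(bb.data["segment_widths"]),
)

# Toy stand-in for an offline-trained classifier (assumed example):
# it labels each segment relative to the online statistic.
toy_classifier = KnowledgeSource(
    name="toy_classifier",
    needs=["segment_widths", "mean_width"],
    provides="labels",
    run=lambda bb: [
        "wide" if w > bb.data["mean_width"] else "narrow"
        for w in bb.data["segment_widths"]
    ],
)

# The classifier is listed first but cannot run until the statistic exists,
# so the arbiter's order is driven by the blackboard contents.
bb = arbiter(
    Blackboard(data={"segment_widths": [4, 10, 6]}),
    [toy_classifier, online_stats],
)
print(bb.log, bb.data["labels"])
```

In this sketch the scheduling rule is deliberately simple (first ready source wins); a real arbiter would likely weigh confidence or cost when several sources can contribute at once.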

Author information

Corresponding author

Correspondence to Hossein Khosravi.

Additional information

An erratum to this article can be found at http://dx.doi.org/10.1007/s10032-009-0087-7

About this article

Cite this article

Khosravi, H., Kabir, E. A blackboard approach towards integrated Farsi OCR system. IJDAR 12, 21–32 (2009). https://doi.org/10.1007/s10032-009-0079-7

  • DOI: https://doi.org/10.1007/s10032-009-0079-7
