Recognition of newspaper printed in Gurumukhi script

Text recognition for newspapers printed in Gurumukhi script (translated from the Chinese title)

Journal of Central South University

Abstract

This work presents a system for recognizing newspaper text printed in Gurumukhi script. Four feature extraction techniques (zoning features, diagonal features, parabola curve fitting based features, and power curve fitting based features) are used to extract statistical properties of the printed characters. Different combinations of these features are also evaluated to improve recognition accuracy. Four classifiers are used for recognition: k-NN, linear SVM, decision tree, and random forest. The experimental database was collected from three major Gurumukhi script newspapers: Ajit, Jagbani, and Punjabi Tribune. With 5-fold cross validation and a random forest classifier, a combination of zoning, diagonal, and parabola curve fitting based features achieves a recognition accuracy of 96.19%. With a 70%/30% partition of the data set (70% for training, the remaining 30% for testing), a recognition accuracy of 95.21% is achieved.
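The pipeline summarized in the abstract (extract features from a character image, then evaluate a classifier with 5-fold cross validation) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the zone grid size, the use of ink density as the zoning feature, the interleaved fold assignment, and the toy 4x4 glyphs are not taken from the paper. For brevity it uses k-NN, one of the four classifiers named above, rather than the random forest that produced the reported 96.19%.

```python
from collections import Counter


def zoning_features(image, zones=2):
    """Ink density in each cell of a zones x zones grid (assumed formulation).

    `image` is a list of equal-length rows holding 0/1 values (1 = ink).
    """
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * height // zones, (zr + 1) * height // zones
        for zc in range(zones):
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            features.append(ink / area if area else 0.0)
    return features


def knn_predict(train_x, train_y, query, k=1):
    """Majority label among the k nearest training vectors (squared Euclidean)."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(xs, ys, folds=5, k=1):
    """Mean k-NN accuracy over `folds` folds; sample i is assigned to fold i % folds."""
    accuracies = []
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy glyphs standing in for segmented newspaper characters.
VERTICAL_BAR = [[1, 0, 0, 0] for _ in range(4)]
HORIZONTAL_BAR = [[1, 1, 1, 1]] + [[0, 0, 0, 0] for _ in range(3)]

samples = [VERTICAL_BAR, HORIZONTAL_BAR] * 5
labels = ["vertical", "horizontal"] * 5
features = [zoning_features(s) for s in samples]
accuracy = k_fold_accuracy(features, labels, folds=5, k=1)
```

On real data the feature vector would be the concatenation of the zoning, diagonal, and curve fitting features, and the 70%/30% protocol would replace the fold loop with a single train/test partition.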

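Abstract (translated from the Chinese)

This paper proposes a recognition system for newspapers printed in Gurumukhi script. Zoning features, diagonal features, parabola curve fitting features, and power curve fitting features are used to extract the statistical properties of the printed newspaper characters. To improve recognition accuracy, different combinations of these features are also applied. For recognition, four classification techniques are used: k-NN, linear support vector machine, decision tree, and random forest. The experimental database was collected from three major Gurumukhi script newspapers: Ajit, Jagbani, and Punjabi Tribune. Using 5-fold cross validation and a random forest classifier with a combination of zoning, diagonal, and parabola curve fitting features, the recognition accuracy reaches 96.19%. Using a data set partitioning strategy (70% of the data for training and the remaining 30% for testing), the recognition accuracy reaches 95.21%.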

Author information

Corresponding author

Correspondence to Munish Kumar.

About this article

Cite this article

Kaur, R.P., Jindal, M.K. & Kumar, M. Recognition of newspaper printed in Gurumukhi script. J. Cent. South Univ. 26, 2495–2503 (2019). https://doi.org/10.1007/s11771-019-4189-1

  • DOI: https://doi.org/10.1007/s11771-019-4189-1
