Abstract
This work presents a system for recognizing newspaper text printed in Gurumukhi script. Four feature extraction techniques, namely zoning features, diagonal features, parabola curve fitting based features, and power curve fitting based features, are used to extract statistical properties of the printed characters. Combinations of these features are also evaluated to improve recognition accuracy. Four classifiers are used for recognition: k-NN, linear SVM, decision tree, and random forest. The experimental database is collected from three major Gurumukhi script newspapers: Ajit, Jagbani, and Punjabi Tribune. With 5-fold cross-validation and a random forest classifier, the combination of zoning, diagonal, and parabola curve fitting based features achieves a recognition accuracy of 96.19%. With a data partitioning strategy of 70% training data and 30% testing data, a recognition accuracy of 95.21% is achieved.
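As a rough illustration of two ideas named in the abstract, the sketch below computes zoning features as the foreground-pixel density of each zone of a binarized character image, and builds index splits for 5-fold cross-validation. The abstract does not give the zone grid, the normalization, or how folds were formed, so the function names `zoning_features` and `k_fold_indices`, the default 4x4 grid, and the unshuffled interleaved folds are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def zoning_features(image: Sequence[Sequence[int]], zones: int = 4) -> List[float]:
    """Return the foreground-pixel density of each zone, in row-major order.

    `image` is a binarized character (1 = ink, 0 = background). The grid size
    `zones` is an assumed parameter; the abstract does not specify it.
    """
    height, width = len(image), len(image[0])
    if height % zones or width % zones:
        raise ValueError("image sides must be divisible by the zone count")
    zone_h, zone_w = height // zones, width // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            # Count ink pixels inside the current zone.
            ink = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each sample is tested exactly once.

    Folds are interleaved and unshuffled, an assumption made to keep the
    sketch deterministic.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


# Toy 4x4 "character" split into a 2x2 grid of zones.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
toy_features = zoning_features(toy, zones=2)
toy_splits = k_fold_indices(10, k=5)
```

In a full pipeline, the feature vector of each character would be concatenated with the other feature types and passed to a classifier such as a random forest, with accuracy averaged over the five test folds.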
Cite this article
Kaur, R.P., Jindal, M.K. & Kumar, M. Recognition of newspaper printed in Gurumukhi script. J. Cent. South Univ. 26, 2495–2503 (2019). https://doi.org/10.1007/s11771-019-4189-1
DOI: https://doi.org/10.1007/s11771-019-4189-1