HP_DocPres: a method for classifying printed and handwritten texts in doctor’s prescription

Abstract

An Optical Character Recognition (OCR) system converts document images, whether printed or handwritten, into their electronic counterparts. However, handwritten text is far more challenging to process than printed text because of individuals' erratic writing styles, and the problem becomes even more severe when the input image is a doctor's prescription. Since a prescription contains both printed and handwritten text that must be processed separately, the two classes have to be distinguished before the image is fed to the OCR engine. Considerable work has been done on separating handwritten from printed text, but little of it deals with doctors' handwriting. In this paper, we propose a method that first localizes the text regions in a doctor's prescription and then separates the printed text from the handwritten text. Because no large database is available, we apply standard data (image) augmentation techniques to evaluate the method and to demonstrate its robustness. We have also designed a Graphical User Interface (GUI) through which anyone can provide a prescription image as input and visualize the output.

Author information

Corresponding author

Correspondence to Pawan Kumar Singh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Annexure

A.1 Code

[The code listings appear as figures e and f in the original article.]

A.2 Classifier Parameters

The parameters used for the Random Forest classifier are as follows; a configuration sketch is given after the list.

  • bagSizePercent = 100

  • batchsize = 100

  • breakTiesRandomly = False

  • calcOutOfBag = False

  • computeAttributeImportance = False

  • debug = False

  • doNotCheckCapabilities = False

  • maxDepth = 0

  • numDecimalPlaces = 2

  • numExecutionSlots = 1

  • numFeatures = 0

  • numIterations = 100

  • minVarianceProp = 0.001

  • outputOutOfBagComplexityStatistics = False

  • seed = 1

  • storeOutOfBagPredictions = False
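The parameter names above coincide with those exposed by Weka's RandomForest implementation. Assuming that implementation (Weka 3.8) was the one used, the listed values could be reproduced programmatically as in the following Java sketch; the class and setter names are Weka's, and only the mapping of the listed values onto them is assumed.

import weka.classifiers.trees.RandomForest;

public class RandomForestConfig {
    // Minimal sketch: builds a RandomForest with the values listed above,
    // assuming the Weka 3.8 API. minVarianceProp (0.001) and
    // outputOutOfBagComplexityStatistics (false) are left at their defaults.
    public static RandomForest build() {
        RandomForest rf = new RandomForest();
        rf.setBagSizePercent(100);                // bagSizePercent
        rf.setBatchSize("100");                   // batchsize
        rf.setBreakTiesRandomly(false);           // breakTiesRandomly
        rf.setCalcOutOfBag(false);                // calcOutOfBag
        rf.setComputeAttributeImportance(false);  // computeAttributeImportance
        rf.setDebug(false);                       // debug
        rf.setDoNotCheckCapabilities(false);      // doNotCheckCapabilities
        rf.setMaxDepth(0);                        // maxDepth (0 = unlimited)
        rf.setNumDecimalPlaces(2);                // numDecimalPlaces
        rf.setNumExecutionSlots(1);               // numExecutionSlots
        rf.setNumFeatures(0);                     // numFeatures (0 = Weka's default)
        rf.setNumIterations(100);                 // numIterations (number of trees)
        rf.setSeed(1);                            // seed
        rf.setStoreOutOfBagPredictions(false);    // storeOutOfBagPredictions
        return rf;
    }
}

The returned object is then trained in the usual way, e.g. rf.buildClassifier(trainingInstances).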

The parameters used for the REPT classifier are as follows; a configuration sketch is given after the list.

  • batchsize = 100

  • debug = False

  • doNotCheckCapabilities = False

  • initialCount = 0.0

  • maxDepth = -1

  • minNum = 2.0

  • minVarianceProp = 0.001

  • noPruning = False

  • numDecimalPlaces = 2

  • numFolds = 3

  • seed = 1

  • spreadInitialCount = False
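These values correspond to the properties of Weka's REPTree (reduced-error pruning tree). Assuming that "REPT" denotes this classifier, a matching configuration is sketched below in Java (Weka 3.8 API assumed).

import weka.classifiers.trees.REPTree;

public class REPTreeConfig {
    // Minimal sketch: builds a REPTree with the values listed above,
    // assuming "REPT" refers to Weka's REPTree.
    public static REPTree build() {
        REPTree tree = new REPTree();
        tree.setBatchSize("100");               // batchsize
        tree.setDebug(false);                   // debug
        tree.setDoNotCheckCapabilities(false);  // doNotCheckCapabilities
        tree.setInitialCount(0.0);              // initialCount
        tree.setMaxDepth(-1);                   // maxDepth (-1 = unlimited)
        tree.setMinNum(2.0);                    // minNum
        tree.setMinVarianceProp(0.001);         // minVarianceProp
        tree.setNoPruning(false);               // noPruning (pruning enabled)
        tree.setNumDecimalPlaces(2);            // numDecimalPlaces
        tree.setNumFolds(3);                    // numFolds (data held out for pruning)
        tree.setSeed(1);                        // seed
        tree.setSpreadInitialCount(false);      // spreadInitialCount
        return tree;
    }
}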

The parameters used for the Random Tree classifier are as follows; a configuration sketch is given after the list.

  • kValue = 0

  • allowUnclassifiedInstances = False

  • batchsize = 100

  • debug = False

  • doNotCheckCapabilities = False

  • maxDepth = 0

  • minNum = 1.0

  • minVarianceProp = 0.001

  • numDecimalPlaces = 2

  • numFolds = 0

  • seed = 1
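Similarly, assuming Weka's RandomTree implementation, the listed values map onto the following Java sketch.

import weka.classifiers.trees.RandomTree;

public class RandomTreeConfig {
    // Minimal sketch: builds a RandomTree with the values listed above
    // (Weka 3.8 API assumed).
    public static RandomTree build() {
        RandomTree tree = new RandomTree();
        tree.setKValue(0);                          // kValue (0 = Weka's default attribute count)
        tree.setAllowUnclassifiedInstances(false);  // allowUnclassifiedInstances
        tree.setBatchSize("100");                   // batchsize
        tree.setDebug(false);                       // debug
        tree.setDoNotCheckCapabilities(false);      // doNotCheckCapabilities
        tree.setMaxDepth(0);                        // maxDepth (0 = unlimited)
        tree.setMinNum(1.0);                        // minNum
        tree.setMinVarianceProp(0.001);             // minVarianceProp
        tree.setNumDecimalPlaces(2);                // numDecimalPlaces
        tree.setNumFolds(0);                        // numFolds (0 = no backfitting)
        tree.setSeed(1);                            // seed
        return tree;
    }
}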

The parameters used for the Decision Table classifier are as follows; a configuration sketch is given after the list.

  • batchsize = 100

  • debug = False

  • doNotCheckCapabilities = False

  • crossVal = 1

  • displayRules = False

  • numDecimalPlaces = 2

  • search = BestFirst (LookUpCache Size = 1, Depth = 5)
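Assuming Weka's DecisionTable with BestFirst as its search method, the configuration is sketched below in Java. Reading "Depth = 5" as BestFirst's search-termination value (its Weka default) is an assumption.

import weka.attributeSelection.BestFirst;
import weka.classifiers.rules.DecisionTable;

public class DecisionTableConfig {
    // Minimal sketch: builds a DecisionTable with the values listed above
    // (Weka 3.8 API assumed). "Depth = 5" is interpreted as BestFirst's
    // search-termination parameter, which defaults to 5.
    public static DecisionTable build() {
        DecisionTable dt = new DecisionTable();
        dt.setBatchSize("100");                // batchsize
        dt.setDebug(false);                    // debug
        dt.setDoNotCheckCapabilities(false);   // doNotCheckCapabilities
        dt.setCrossVal(1);                     // crossVal (1 = leave-one-out evaluation)
        dt.setDisplayRules(false);             // displayRules
        dt.setNumDecimalPlaces(2);             // numDecimalPlaces

        BestFirst search = new BestFirst();    // search = BestFirst
        search.setLookupCacheSize(1);          // lookup cache size = 1
        search.setSearchTermination(5);        // interpreted from "Depth = 5"
        dt.setSearch(search);
        return dt;
    }
}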

The parameters used for the J48 classifier are as follows; a configuration sketch is given after the list.

  • batchsize = 100

  • binarySplits = False

  • collapseTree = True

  • confidenceFactor = 0.25

  • debug = False

  • doNotCheckCapabilities = False

  • doNotMakeSplitPointActualValue = False

  • minNumObj = 2

  • numFolds = 3

  • reducedErrorPruning = False

  • saveInstanceData = False

  • seed = 1

  • subtreeRaising = True

  • unpruned = False

  • useLaplace = False

  • useMDLcorrection = True
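Finally, assuming Weka's J48 (its C4.5 implementation), the listed values, which appear to match J48's defaults, correspond to the following Java sketch.

import weka.classifiers.trees.J48;

public class J48Config {
    // Minimal sketch: builds a J48 (C4.5) decision tree with the values
    // listed above (Weka 3.8 API assumed).
    public static J48 build() {
        J48 j48 = new J48();
        j48.setBatchSize("100");                       // batchsize
        j48.setBinarySplits(false);                    // binarySplits
        j48.setCollapseTree(true);                     // collapseTree
        j48.setConfidenceFactor(0.25f);                // confidenceFactor
        j48.setDebug(false);                           // debug
        j48.setDoNotCheckCapabilities(false);          // doNotCheckCapabilities
        j48.setDoNotMakeSplitPointActualValue(false);  // doNotMakeSplitPointActualValue
        j48.setMinNumObj(2);                           // minNumObj
        j48.setNumFolds(3);                            // numFolds (used only with reduced-error pruning)
        j48.setReducedErrorPruning(false);             // reducedErrorPruning
        j48.setSaveInstanceData(false);                // saveInstanceData
        j48.setSeed(1);                                // seed
        j48.setSubtreeRaising(true);                   // subtreeRaising
        j48.setUnpruned(false);                        // unpruned
        j48.setUseLaplace(false);                      // useLaplace
        j48.setUseMDLcorrection(true);                 // useMDLcorrection
        return j48;
    }
}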

About this article

Cite this article

Dhar, D., Garain, A., Singh, P.K. et al. HP_DocPres: a method for classifying printed and handwritten texts in doctor’s prescription. Multimed Tools Appl 80, 9779–9812 (2021). https://doi.org/10.1007/s11042-020-10151-w
