Isolated structural error analysis of printed mathematical expressions

  • Short Paper
Pattern Analysis and Applications

Abstract

Recognition of mathematical expressions (MEs) has two stages: symbol recognition and structural analysis. Symbols are recognized in the first stage, while the structure (spatial relationships such as superscript and subscript) is interpreted in the second stage. Errors in either stage may affect the overall recognition performance. Due to the complex two-dimensional nature of MEs, structural analysis is a challenging task even when all symbols are properly recognized. In the present work, we have focused on the structural analysis of printed mathematical expressions in isolation. We have analyzed various structural errors and presented the behavior of an isolated structural analysis module in the context of perfect symbol recognition. For our error analysis, we have created a database of 829 expression images. For each image in the database, we have also generated ground truth symbol labels to simulate perfect symbol recognition. As the ground truth symbol labels are readily available, our database can also be used as a benchmark to compare various structural analysis approaches.
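To make the second stage concrete, the following is a minimal sketch of how a spatial relationship between two already-labelled symbols might be decided from their bounding boxes. It is not the method of the paper: the `Symbol` class, the `relation` function, the vertical-center comparison, and the `tol` threshold are all hypothetical choices made for illustration, and image coordinates with y growing downward are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A symbol with a ground-truth label and a bounding box (hypothetical layout)."""
    label: str    # ground-truth label, standing in for perfect symbol recognition
    x_min: float
    y_min: float  # top edge; y grows downward, as in image coordinates
    x_max: float
    y_max: float  # bottom edge

    @property
    def y_center(self) -> float:
        return (self.y_min + self.y_max) / 2.0

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def relation(parent: Symbol, child: Symbol, tol: float = 0.25) -> str:
    """Classify child relative to parent by vertical offset of box centers.

    tol is an illustrative threshold, expressed as a fraction of the
    parent's height; it is an assumption, not a value from the paper.
    """
    if parent.height <= 0:
        raise ValueError("parent bounding box must have positive height")
    # Negative offset means the child sits higher on the page than the parent.
    offset = (child.y_center - parent.y_center) / parent.height
    if offset < -tol:
        return "superscript"
    if offset > tol:
        return "subscript"
    return "horizontal"


# Usage: an 'x' followed by a raised '2', a lowered 'i', and an in-line 'y'.
x = Symbol("x", 0, 10, 10, 20)
print(relation(x, Symbol("2", 10, 6, 14, 12)))   # raised box
print(relation(x, Symbol("i", 10, 18, 14, 24)))  # lowered box
print(relation(x, Symbol("y", 11, 10, 21, 20)))  # same baseline region
```

A rule this simple would fail on many real expressions (for example, descenders, fractions, or nested scripts), which is one reason structural analysis can go wrong even when every symbol label is correct.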

Author information

Corresponding author

Correspondence to P. Pavan Kumar.

About this article

Cite this article

Kumar, P.P., Agarwal, A. & Bhagvati, C. Isolated structural error analysis of printed mathematical expressions. Pattern Anal Applic 21, 1097–1107 (2018). https://doi.org/10.1007/s10044-017-0667-y

  • DOI: https://doi.org/10.1007/s10044-017-0667-y
