Abstract
Recognition of mathematical expressions (MEs) has two stages: symbol recognition and structural analysis. Symbols are recognized in the first stage, while structure (spatial relationships such as superscript and subscript) is interpreted in the second. Errors in either stage may affect overall recognition performance. Owing to the complex two-dimensional nature of MEs, structural analysis is challenging even when all symbols are correctly recognized. In the present work, we focus on structural analysis of printed mathematical expressions in isolation. We analyze various structural errors and present the behavior of an isolated structural analysis module under perfect symbol recognition. For our error analysis, we created a database of 829 expression images. For each image in the database, we also generated ground-truth symbol labels to simulate perfect symbol recognition. Because ground-truth symbol labels are readily available, our database can also serve as a benchmark for comparing structural analysis approaches.
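To make the idea of interpreting spatial relationships concrete, the following is a minimal Python sketch, not the method used in this work. It assumes that each symbol arrives with a ground-truth label and an axis-aligned bounding box, and it classifies a symbol as superscript, subscript, or horizontal relative to a base symbol from the vertical offset of their centers. The `Symbol` type, the `relation` function, and the `threshold` value are all hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """A symbol with a ground-truth label and a bounding box.

    Coordinates are image coordinates (an assumption): y grows downward,
    (x0, y0) is the top-left corner and (x1, y1) the bottom-right corner.
    """
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def relation(base: Symbol, other: Symbol, threshold: float = 0.25) -> str:
    """Classify `other` relative to `base` using vertical center offset.

    The offset is normalized by the height of `base`. The threshold of
    0.25 is an illustrative value, not one taken from the paper.
    """
    base_height = base.y1 - base.y0
    if base_height <= 0:
        raise ValueError("base symbol must have positive height")

    base_center = (base.y0 + base.y1) / 2
    other_center = (other.y0 + other.y1) / 2
    offset = (other_center - base_center) / base_height

    if offset < -threshold:
        return "superscript"  # center lies clearly above the base center
    if offset > threshold:
        return "subscript"    # center lies clearly below the base center
    return "horizontal"       # roughly on the same baseline


# Hypothetical boxes for the expression x^2, x_i, and xy.
x = Symbol("x", 0, 10, 10, 20)
two = Symbol("2", 11, 5, 16, 11)
i = Symbol("i", 11, 17, 16, 23)
y = Symbol("y", 11, 10, 21, 20)

print(relation(x, two), relation(x, i), relation(x, y))
```

Because the labels are taken as given, any wrong output from a rule like this one is a purely structural error, which is the kind of error that isolated analysis is meant to expose. A single threshold on center offset is deliberately naive; it ignores ascenders, descenders, and nested structures.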
Cite this article
Kumar, P.P., Agarwal, A. & Bhagvati, C. Isolated structural error analysis of printed mathematical expressions. Pattern Anal Applic 21, 1097–1107 (2018). https://doi.org/10.1007/s10044-017-0667-y
DOI: https://doi.org/10.1007/s10044-017-0667-y