Abstract
This paper proposes using a formal grammar to verify mathematical formulae in a practical mathematical OCR system. Just as a C compiler detects syntax errors in a source file, a verification mechanism should find errors in the output of mathematical OCR. A linear monadic context-free tree grammar (LM-CFTG) is employed as the formal framework for defining “well-formed” mathematical formulae, and a cubic-time parsing algorithm for LM-CFTGs is presented. For practical evaluation, a verification system for mathematical OCR was developed, and its effectiveness is demonstrated on the ground-truthed mathematical document database InftyCDB-1 and on a misrecognition database newly constructed for this study.
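As a rough illustration of grammar-based verification in cubic time, the sketch below runs the classical CKY recognizer on a token string. It is only a string-grammar analogue and not the LM-CFTG algorithm of the paper; the toy grammar, the token set, and the name `is_well_formed` are assumptions made for this example.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not the paper's grammar).
# Terminal -> nonterminals that derive it directly.
UNARY = {"x": {"E"}, "1": {"E"}, "+": {"Op"}, "(": {"L"}, ")": {"R"}}
# (B, C) -> nonterminals A that have a rule A -> B C.
BINARY = {
    ("E", "X"): {"E"},   # E -> E X   (sum)
    ("Op", "E"): {"X"},  # X -> Op E
    ("L", "Y"): {"E"},   # E -> L Y   (parenthesized expression)
    ("E", "R"): {"Y"},   # Y -> E R
}

def is_well_formed(tokens, start="E"):
    """Return True if the token sequence is derivable from `start` (CKY)."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(UNARY.get(tok, ()))
    # Three nested loops over span length, start, and split point: O(n^3).
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]
```

Under this toy grammar, `is_well_formed(list("(x+1)"))` accepts the input, while an output such as `x+` with a dangling operator is rejected, which is the kind of error a verifier is meant to flag.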
References
Fujiyoshi, A., Suzuki, M., Uchida, S.: Verification of mathematical formulae based on a combination of context-free grammar and tree grammar. In: Proceedings of the 7th International Conference on Mathematical Knowledge Management (MKM 2008), pp. 415–429. LNCS(LNAI) 5144 (2008)
Chan, K.F., Yeung, D.Y.: Mathematical expression recognition: a survey. Int. J. Document Anal. Recognit. 3(1), 3–15 (2000)
Fujiyoshi, A., Kasai, T.: Spinal-formed context-free tree grammars. Theory Comput. Syst. 33(1), 59–83 (2000)
Anderson, R.H.: Syntax-directed recognition of hand-printed two-dimensional mathematics. In: Klerer, M., Reinfelds, J. (eds) Interactive Systems for Experimental Applied Mathematics, pp. 436–459. Academic Press, Dublin (1968)
Chou, P.A.: Recognition of equations using a two-dimensional stochastic context-free grammar. In: Proceedings of SPIE, vol. 1199, pp. 852–863 (1989)
Grbavec, A., Blostein, D.: Mathematics recognition using graph rewriting. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR ’95), vol. 2, pp. 417–421 (1995)
Lavirotte, S., Pottier, L.: Optical formula recognition. In: Proceedings of the 4th International Conference on Document Analysis and Recognition (ICDAR ’97), pp. 357–361 (1997)
Raja, A., Rayner, M., Sexton, A.P., Sorge, V.: Towards a parser for mathematical formula recognition. In: Proceedings of the 5th International Conference on Mathematical Knowledge Management (MKM 2006), pp. 139–151. LNCS 4108 (2006)
Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Reading (1979)
Sikkel, K., Nijholt, A.: Parsing of context-free languages. In: Rozenberg, G., Salomaa, A. (eds) Handbook of Formal Languages, vol. 2, pp. 61–100. Springer, Berlin (1997)
Fujiyoshi, A.: Analogical conception of Chomsky normal form and Greibach normal form for linear, monadic context-free tree grammars. IEICE Trans. Inf. Syst. E89-D(12), 2933–2938 (2006)
Joshi, A.K., Levy, L.S., Takahashi, M.: Tree adjunct grammars. J. Comput. Syst. Sci. 10(1), 136–163 (1975)
Joshi, A.K., Schabes, Y.: Tree-adjoining grammars. In: Rozenberg, G., Salomaa, A. (eds) Handbook of Formal Languages, vol. 3, pp. 69–124. Springer, Berlin (1997)
Abeillé, A., Rambow, O. (eds): Tree Adjoining Grammars: Formalisms, Linguistic Analysis and Processing. CSLI Publications, Stanford (2000)
Fujiyoshi, A.: Application of the CKY algorithm to recognition of tree structures for linear, monadic context-free tree grammars. IEICE Trans. Inf. Syst. E90-D(2), 388–394 (2007)
Suzuki, M., Tamari, F., Fukuda, R., Uchida, S., Kanahori, T.: Infty—an integrated OCR system for mathematical documents. In: Proceedings of ACM Symposium on Document Engineering 2003, pp. 95–104 (2003)
Donnelly, C., Stallman, R.: Bison: The yacc-compatible parser generator. Available on: http://www.gnu.org/software/bison/manual/ (2006)
Mozilla Firefox. http://www.mozilla.com/firefox/
Infty Project. http://www.inftyproject.org/en/
Eto, Y., Suzuki, M.: Mathematical formula recognition using virtual link network. In: Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR 2001), pp. 430–437 (2001)
Kanahori, T., Sexton, A., Sorge, V., Suzuki, M.: Capturing abstract matrices from paper. In: Proceedings of the 5th International Conference on Mathematical Knowledge Management (MKM 2006), pp. 124–138. LNCS 4108 (2006)
Suzuki, M., Uchida, S., Nomura, A.: A ground-truthed mathematical character and symbol image database. In: Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR 2005), vol. 2, pp. 675–679 (2005)
Additional information
An earlier version of this article [1] was presented at the 7th International Conference on Mathematical Knowledge Management (MKM 2008).
Cite this article
Fujiyoshi, A., Suzuki, M. & Uchida, S. Grammatical Verification for Mathematical Formula Recognition Based on Context-Free Tree Grammar. Math. Comput. Sci. 3, 279–298 (2010). https://doi.org/10.1007/s11786-010-0023-8
DOI: https://doi.org/10.1007/s11786-010-0023-8