Grammatical Verification for Mathematical Formula Recognition Based on Context-Free Tree Grammar

Mathematics in Computer Science

Abstract

This paper proposes the use of a formal grammar for the verification of mathematical formulae for a practical mathematical OCR system. Like a C compiler detecting syntax errors in a source file, we want to have a verification mechanism to find errors in the output of mathematical OCR. A linear monadic context-free tree grammar (LM-CFTG) is employed as a formal framework to define “well-formed” mathematical formulae. A cubic time parsing algorithm for LM-CFTGs is presented. For the purpose of practical evaluation, a verification system for mathematical OCR is developed, and the effectiveness of the system is demonstrated by using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study.
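
To make the compiler analogy concrete, the following is a minimal sketch of grammar-based verification using the classical CKY algorithm on token strings. It is not the LM-CFTG tree parser presented in the paper: the toy grammar (`UNARY`, `BINARY`), the token set, and the function name `cky_accepts` are all hypothetical and chosen only for illustration. The three nested loops over span length, start position, and split point are what give CKY its cubic running time in the input length.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for this sketch):
#   E -> E R | 'x' | 'y'      R -> P E      P -> '+'
UNARY = {"x": {"E"}, "y": {"E"}, "+": {"P"}}
BINARY = {("E", "R"): {"E"}, ("P", "E"): {"R"}}


def cky_accepts(tokens, start="E"):
    """Return True if `tokens` can be derived from `start` under the toy grammar."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(UNARY.get(tok, ()))
    for span in range(2, n + 1):              # length of the substring
        for i in range(n - span + 1):         # start position
            for split in range(1, span):      # length of the left part
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for pair in product(left, right):
                    table[i][span - 1] |= BINARY.get(pair, set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    print(cky_accepts("x + y".split()))    # well-formed under the toy grammar
    print(cky_accepts("x + + y".split()))  # a doubled operator is rejected
```

A verifier built this way flags any OCR output that the grammar cannot derive, much as a compiler reports a syntax error; the paper's system applies the same idea to tree-structured formulae rather than flat token strings.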

References

  1. Fujiyoshi, A., Suzuki, M., Uchida, S.: Verification of mathematical formulae based on a combination of context-free grammar and tree grammar. In: Proceedings of the 7th International Conference on Mathematical Knowledge Management (MKM 2008), pp. 415–429. LNCS(LNAI) 5144 (2008)

  2. Chan, K.F., Yeung, D.Y.: Mathematical expression recognition: a survey. Int. J. Document Anal. Recognit. 3(1), 3–15 (2000)

  3. Fujiyoshi, A., Kasai, T.: Spinal-formed context-free tree grammars. Theory Comput. Syst. 33(1), 59–83 (2000)

  4. Anderson, R.H.: Syntax-directed recognition of hand-printed two-dimensional mathematics. In: Klerer, M., Reinfelds, J. (eds) Interactive Systems for Experimental Applied Mathematics, pp. 436–459. Academic Press, Dublin (1968)

  5. Chou, P.A.: Recognition of equations using a two-dimensional stochastic context-free grammar. In: Proceedings of SPIE, vol. 1199, pp. 852–863 (1989)

  6. Grbavec, A., Blostein, D.: Mathematics recognition using graph rewriting. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR ’95), vol. 2, pp. 417–421 (1995)

  7. Lavirotte, S., Pottier, L.: Optical formula recognition. In: Proceedings of the 4th International Conference on Document Analysis and Recognition (ICDAR ’97), pp. 357–361 (1997)

  8. Raja, A., Rayner, M., Sexton, A.P., Sorge, V.: Towards a parser for mathematical formula recognition. In: Proceedings of the 5th International Conference on Mathematical Knowledge Management (MKM 2006), pp. 139–151. LNCS 4108 (2006)

  9. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison Wesley, Reading (1979)

  10. Sikkel, K., Nijholt, A.: Parsing of context-free languages. In: Rozenberg, G., Salomaa, A. (eds) Handbook of Formal Languages, vol. 2, pp. 61–100. Springer, Berlin (1997)

  11. Fujiyoshi, A.: Analogical conception of Chomsky normal form and Greibach normal form for linear, monadic context-free tree grammars. IEICE Trans. Inf. Syst., E89-D(12), 2933–2938 (2006)

  12. Joshi, A.K., Levy, L.S., Takahashi, M.: Tree adjunct grammars. J. Comput. Syst. Sci. 10(1), 136–163 (1975)

  13. Joshi, A.K., Schabes, Y.: Tree-adjoining grammars. In: Rozenberg, G., Salomaa, A. (eds) Handbook of Formal Languages, vol. 3, pp. 69–124. Springer, Berlin (1997)

  14. Abeillé, A., Rambow, O. (eds): Tree Adjoining Grammars: Formalisms, Linguistic Analysis and Processing. CSLI Publications, Stanford (2000)

  15. Fujiyoshi, A.: Application of the CKY algorithm to recognition of tree structures for linear, monadic context-free tree grammars. IEICE Trans. Inf. Syst., E90-D(2), 388–394 (2007)

  16. Suzuki, M., Tamari, F., Fukuda, R., Uchida, S., Kanahori, T.: Infty—an integrated OCR system for mathematical documents. In: Proceedings of ACM Symposium on Document Engineering 2003, pp. 95–104 (2003)

  17. Donnelly, C., Stallman, R.: Bison: The yacc-compatible parser generator. Available on: http://www.gnu.org/software/bison/manual/ (2006)

  18. Mozilla Firefox. http://www.mozilla.com/firefox/

  19. Infty Project. http://www.inftyproject.org/en/

  20. Eto, Y., Suzuki, M.: Mathematical formula recognition using virtual link network. In: Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR 2001), pp. 430–437 (2001)

  21. Kanahori, T., Sexton, A., Sorge, V., Suzuki, M.: Capturing abstract matrices from paper. In: Proceedings of the 5th International Conference on Mathematical Knowledge Management (MKM 2006), pp. 124–138. LNCS 4108 (2006)

  22. Suzuki, M., Uchida, S., Nomura, A.: A ground-truthed mathematical character and symbol image database. In: Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR 2005), vol. 2, pp. 675–679 (2005)

Author information

Corresponding author

Correspondence to Akio Fujiyoshi.

Additional information

An earlier version of this article [1] was presented at the 7th International Conference on Mathematical Knowledge Management (MKM 2008).

About this article

Cite this article

Fujiyoshi, A., Suzuki, M. & Uchida, S. Grammatical Verification for Mathematical Formula Recognition Based on Context-Free Tree Grammar. Math. Comput. Sci. 3, 279–298 (2010). https://doi.org/10.1007/s11786-010-0023-8

  • DOI: https://doi.org/10.1007/s11786-010-0023-8