Grammar-based techniques for creating ground-truthed sketch corpora

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Although publicly available ground-truthed corpora have proven useful for training, evaluating, and comparing recognition systems in many domains, few such corpora currently exist for sketch recognizers, and for math recognizers in particular. This paper presents a general approach to creating large, ground-truthed corpora for structured sketch domains such as mathematics. In this approach, random sketch templates are generated automatically from a grammar model of the sketch domain. The templates are transcribed by hand and then automatically annotated with ground truth. The annotation procedure uses the generated templates to find a matching between the transcribed symbols and the generated symbols. A large, ground-truthed corpus of handwritten mathematical expressions, presented in the paper, illustrates the utility of the approach.
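To make the template-generation step concrete, here is a minimal sketch of drawing random expressions from a grammar. It is not the grammar model used in the paper: the toy context-free grammar, the `generate` function, and the `max_depth` cutoff are all assumptions introduced for illustration, and the output is a flat list of symbols rather than a two-dimensional sketch layout.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its alternative productions.
# Anything that is not a key of GRAMMAR is treated as a terminal symbol.
GRAMMAR = {
    "EXPR": [["TERM"], ["TERM", "+", "EXPR"], ["TERM", "-", "EXPR"]],
    "TERM": [["ATOM"], ["ATOM", "^", "ATOM"], ["FRAC"]],
    "FRAC": [["(", "EXPR", ")", "/", "(", "EXPR", ")"]],
    "ATOM": [["x"], ["y"], ["2"], ["3"]],
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Randomly expand `symbol` and return the resulting list of terminals."""
    if symbol not in GRAMMAR:
        return [symbol]
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, always take the first shortest production
        # so that the recursion is guaranteed to bottom out.
        options = [min(options, key=len)]
    production = rng.choice(options)
    terminals = []
    for part in production:
        terminals.extend(generate(part, rng, depth + 1, max_depth))
    return terminals


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the templates are reproducible
    for _ in range(3):
        template = generate("EXPR", rng)
        # The ordered symbol list doubles as the ground truth for this template.
        print(" ".join(template))
```

Because each template is produced by the generator, its symbol identities are known before anyone writes it down. Under that assumption, annotation reduces to matching the transcribed symbols against this known list.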

Author information

Corresponding author

Correspondence to Scott MacLean.

About this article

Cite this article

MacLean, S., Labahn, G., Lank, E. et al. Grammar-based techniques for creating ground-truthed sketch corpora. IJDAR 14, 65–74 (2011). https://doi.org/10.1007/s10032-010-0118-4

  • DOI: https://doi.org/10.1007/s10032-010-0118-4
