A Database of Glyphs for OCR of Mathematical Documents

  • Conference paper
Mathematical Knowledge Management (MKM 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3863)

Abstract

Automatic document analysis tools for mathematical texts are necessary to enlarge the pool of mathematical knowledge available in electronic form. However, the development of such tools is currently hindered by the weakness of optical character recognition systems in dealing with the large range of mathematical symbols and the often subtle but important distinctions in font usage in mathematical texts. Research on better systems for mathematical optical character recognition depends crucially on an extensive, high-quality database of glyphs used in mathematical texts for training and testing. We present such a database of symbols, constructed from a large set of characters available in the LaTeX document preparation system, that can serve as a basis for mathematical text recognition. We describe its integration into a prototypical optical character recognition system for mathematics that enables the construction of LaTeX source documents from mathematical documents available as images. From the lessons learned in this work we derive a road map for further research into the area of mathematical text analysis.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sexton, A., Sorge, V. (2006). A Database of Glyphs for OCR of Mathematical Documents. In: Kohlhase, M. (ed.) Mathematical Knowledge Management. MKM 2005. Lecture Notes in Computer Science, vol 3863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11618027_14

  • DOI: https://doi.org/10.1007/11618027_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31430-1

  • Online ISBN: 978-3-540-31431-8

  • eBook Packages: Computer Science (R0)
