MaxTract: Converting PDF to LaTeX, MathML and Text

Baker, Josef B.; Sexton, Alan P.; Sorge, Volker

doi:10.1007/978-3-642-31374-5_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7362)

Included in the following conference series:

International Conference on Intelligent Computer Mathematics

Abstract

In this paper we present the first public, online demonstration of MaxTract, a tool that converts PDF files containing mathematics into multiple formats including LaTeX, HTML with embedded MathML, and plain text. Using a bespoke PDF parser and image analyser, we directly extract character and font information to use as input for a linear grammar which, in conjunction with specialised drivers, can accurately recognise and reproduce both the two-dimensional relationships between symbols in mathematical formulae and the one-dimensional relationships present in standard text.
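
To make the idea of recovering two-dimensional relationships from character and font information concrete, the following is a minimal sketch, not MaxTract's actual grammar or drivers, which the abstract does not specify. It assumes each extracted glyph carries a horizontal position, a baseline and a font size, and it classifies a glyph as a superscript or subscript when it is smaller than the preceding base glyph and its baseline is shifted by more than a threshold. The names `Glyph`, `relation` and `to_latex`, and the threshold of 0.2, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str        # symbol as extracted from the PDF (hypothetical input format)
    x: float         # left edge of the glyph
    baseline: float  # baseline y-coordinate; y grows upward, as in PDF user space
    size: float      # font size in points


def relation(base: Glyph, nxt: Glyph, tol: float = 0.2) -> str:
    """Classify nxt relative to base as 'super', 'sub' or 'inline'.

    The baseline shift is measured in units of the base font size; the
    tolerance of 0.2 is an assumed value, not one taken from the paper.
    """
    shift = (nxt.baseline - base.baseline) / base.size
    if nxt.size < base.size and shift > tol:
        return "super"
    if nxt.size < base.size and shift < -tol:
        return "sub"
    return "inline"


def to_latex(glyphs: list[Glyph]) -> str:
    """Walk glyphs left to right, attaching scripts to the last inline glyph."""
    out = []
    base = None
    for g in sorted(glyphs, key=lambda g: g.x):
        rel = "inline" if base is None else relation(base, g)
        if rel == "super":
            out.append("^{" + g.char + "}")
        elif rel == "sub":
            out.append("_{" + g.char + "}")
        else:
            out.append(g.char)
            base = g  # only inline glyphs become the new base
    return "".join(out)


example = [
    Glyph("x", 0.0, 0.0, 10.0),
    Glyph("i", 5.0, -2.5, 7.0),   # smaller and lowered: subscript
    Glyph("2", 5.2, 4.0, 7.0),    # smaller and raised: superscript
    Glyph("+", 10.0, 0.0, 10.0),
    Glyph("y", 16.0, 0.0, 10.0),
]
print(to_latex(example))  # x_{i}^{2}+y
```

A real system would also need to handle fractions, radicals, matrices and nested scripts, which this left-to-right pass deliberately ignores.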

The main goals of MaxTract are to provide translation services into standard mathematical markup languages and to add accessibility to mathematical documents on multiple levels. This includes accessibility in the narrow sense of providing access to content for print-impaired users, such as those with visual impairments, dyslexia or dyspraxia, and, more generally, enabling any user to access the mathematical content at more re-usable levels than the merely visual. MaxTract produces output compatible with web browsers, screen readers, and tools such as copy and paste, which is achieved by enriching the regular text with mathematical markup. The output can also be used directly, within the limits of the presentation MathML produced, as machine-readable mathematical input to software systems such as Mathematica or Maple.

Author information

Authors and Affiliations

School of Computer Science, University of Birmingham, UK
Josef B. Baker, Alan P. Sexton & Volker Sorge

Editor information

Editors and Affiliations

Department of Information and Computing Sciences, Utrecht University and Open Universiteit Nederland, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Johan Jeuring
Department of Computer Science, University College London, Gower Street, WC1E 6BT, London, UK
John A. Campbell
Department of Computing and Software, McMaster University, 1280 Main Street West, L8S 4K1, Hamilton, ON, Canada
Jacques Carette
Department of Computer Science and Engineering, Texas A&M University, 77843-3112, College Station, TX, USA
Gabriel Dos Reis
Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Petr Sojka
Laboratoire de Recherche en Informatique (LRI - UMR8623) PCRI, Université Paris-Sud, Batiment 650, 91405, Orsay Cedex, France
Makarius Wenzel
The University of Birmingham, School of Computer Science, B15 2TT, Edgbaston, Birmingham, UK
Volker Sorge

About this paper

Cite this paper

Baker, J.B., Sexton, A.P., Sorge, V. (2012). MaxTract: Converting PDF to LaTeX, MathML and Text. In: Jeuring, J., et al. Intelligent Computer Mathematics. CICM 2012. Lecture Notes in Computer Science, vol 7362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31374-5_29

DOI: https://doi.org/10.1007/978-3-642-31374-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31373-8
Online ISBN: 978-3-642-31374-5
eBook Packages: Computer Science, Computer Science (R0)
