MathTools: An Open API for Convenient MathML Handling

  • André Greiner-Petter
  • Moritz Schubotz
  • Howard S. Cohl
  • Bela Gipp
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11006)


Mathematical formulae carry complex and essential semantic information in a variety of formats. Accessing this information with different systems requires a standardized machine-readable format that is capable of encoding presentational and semantic information. Even though MathML is an official recommendation by W3C and an ISO standard for representing mathematical expressions, we could identify only very few systems which use the full descriptiveness of MathML. MathML’s high complexity results in a steep learning curve for novice users. We hypothesize that this complexity is the reason why many community-driven projects refrain from using MathML, and instead develop problem-specific data formats for their purposes. We provide a user-friendly, open-source application programming interface for controlling MathML data. Our API allows one to create, manipulate, and efficiently access commonly needed information in presentation and content MathML. It also provides tools for calculating differences and similarities between MathML expressions, including the distance between expressions under different similarity measures. In addition, we provide adapters for numerous conversion tools and the canonicalization project. Our toolkit facilitates processing of mathematics for digital libraries without the need to obtain XML expertise.
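To illustrate the kind of XML handling such a toolkit is meant to hide, the following minimal sketch uses only the standard JDK DOM parser. It does not use the MathTools API: the class name `MathMLSketch` and its methods are hypothetical, and the Jaccard similarity over identifier sets is a simple stand-in chosen for illustration, not one of the similarity measures the toolkit implements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; plain JDK only, not part of MathTools.
final class MathMLSketch {
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    /** Returns the text of every presentation-MathML <mi> element, in document order. */
    static List<String> identifiers(String mathml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookup
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mathml)));
            NodeList nodes = doc.getElementsByTagNameNS(MATHML_NS, "mi");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse MathML input", e);
        }
    }

    /** Jaccard similarity of the identifier sets of two expressions (illustrative stand-in). */
    static double identifierSimilarity(String a, String b) {
        Set<String> left = new HashSet<>(identifiers(a));
        Set<String> right = new HashSet<>(identifiers(b));
        if (left.isEmpty() && right.isEmpty()) {
            return 1.0; // two expressions without identifiers are treated as identical
        }
        Set<String> intersection = new HashSet<>(left);
        intersection.retainAll(right);
        Set<String> union = new HashSet<>(left);
        union.addAll(right);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String open = "<math xmlns=\"" + MATHML_NS + "\"><mrow>";
        String close = "</mrow></math>";
        String f = open + "<mi>a</mi><mo>+</mo><mi>b</mi>" + close;
        String g = open + "<mi>a</mi><mo>+</mo><mi>c</mi>" + close;
        System.out.println(identifiers(f));
        System.out.println(identifierSimilarity(f, g));
    }
}
```

Even this small task requires namespace-aware parsing and manual node traversal, which is the kind of XML expertise the abstract argues users should not need.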


Keywords

MathML, API, Toolkit, Java



We would like to thank Felix Hamborg, Vincent Stange, Jimmy Li, Telmo Menezes, and Michael Kramer for contributing to the MathTools project. We are also indebted to Akiko Aizawa for her advice and for hosting the first two authors as visiting researchers at the National Institute of Informatics (NII) in Tokyo. This work was supported by the FITWeltweit program of the German Academic Exchange Service (DAAD) as well as by the German Research Foundation (DFG) through grant no. GI 1259/1.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • André Greiner-Petter (1, corresponding author)
  • Moritz Schubotz (1)
  • Howard S. Cohl (2)
  • Bela Gipp (1)

  1. Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
  2. Applied and Computational Mathematics Division, National Institute of Standards and Technology, Mission Viejo, USA
