Towards Machine-Actionable Modules of a Digital Mathematics Library

The Example of DML-CZ
  • Michal Růžička
  • Petr Sojka
  • Vlastimil Krejčíř
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7961)

Abstract

Publishing and archiving mathematical literature presents its own set of problems. In reaching the goal of building a global digital mathematics library (DML), smaller DMLs play an indispensable role in collecting, validating, digitizing and checking data from smaller publishers.

In this paper, we give an overview of the technical challenges of building a machine-actionable set of modules that we have developed over almost a decade of evolution of the Czech Digital Mathematics Library (DML-CZ). First, we survey methods of effective automated data acquisition from content providers. Then we show OCR processing of mathematical documents and automated segmentation of plain-text references for metadata enhancement and effective DOI lookup. Finally, we describe the connection to the European Digital Mathematics Library (EuDML) project and the public interfaces of DML-CZ, designed for the best possible visibility and accessibility.

Keywords

DML-CZ EuDML DOI ParsCit references validation DSpace OAI-PMH TeX LaTeX Tralics Infty machine-actionable digital library library automation Google Scholar webometrics 

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Michal Růžička (1, 2)
  • Petr Sojka (1)
  • Vlastimil Krejčíř (1, 2)
  1. Faculty of Informatics, Masaryk University, Brno, Czech Republic
  2. Institute of Computer Science, Masaryk University, Brno, Czech Republic
