Abramowitz and Stegun – A Resource for Mathematical Document Analysis

  • Alan P. Sexton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7362)

Abstract

Despite advances in the state of the art of mathematical and scientific document analysis, the field is significantly hampered by the lack of large, open, copyright-free resources for research on, and cross-evaluation of, different algorithms, tools and systems.

To address this deficiency, we have produced a new, high-quality scan of Abramowitz and Stegun’s Handbook of Mathematical Functions and made it available on our web site. This text is fully copyright-free and hence publicly and freely available for all purposes, including document analysis research. Its history and the respect in which scientists have held the book make it an authoritative source for many types of mathematical expressions, diagrams and tables.

The difficulty of building an initial working document analysis system is a significant barrier to entry to this research field. To reduce that barrier, we have added intermediate results of such a system to the web site, so that research groups can pursue the research challenges that interest them without having to implement the full tool chain themselves. These intermediate results include the full collection of connected components from the text, with location information; a set of geometric moments and invariants for each connected component; and segmented images for all plots.
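To make these intermediate results concrete, the following is a minimal sketch of how connected components, their locations, and moment invariants could be computed from a binarised page image. It is not the author's tool chain, and it does not model the file formats used on the web site. The function names, the use of 8-connectivity, and the choice to compute only the first two of Hu's invariants are assumptions made for illustration.

```python
from collections import deque


def connected_components(image):
    """Group 8-connected foreground pixels (value 1) into components.

    `image` is a list of equal-length rows of 0/1 values.
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Location information: (top, left, bottom, right), all inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def hu_first_two(pixels):
    """First two Hu moment invariants of a binary component."""
    m00 = len(pixels)  # zeroth-order moment: the pixel count
    cy = sum(y for y, _ in pixels) / m00
    cx = sum(x for _, x in pixels) / m00

    def mu(p, q):
        # Central moment: translation invariant.
        return sum((x - cx) ** p * (y - cy) ** q for y, x in pixels)

    def eta(p, q):
        # Normalised central moment: additionally scale invariant.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    for component in connected_components(page):
        print(bounding_box(component), hu_first_two(component))
```

Because the invariants are built from normalised central moments, a glyph and a translated or 90-degree rotated copy of it produce the same values, which is what makes such features useful for comparing components across a page.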

Keywords

Ground Truth; Document Image; Optical Character Recognition; Ground Truth Data; Moment Invariant
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alan P. Sexton (1)
  1. School of Computer Science, University of Birmingham, UK
