Mathematical Symbol Indexing

  • Simone Marinai
  • Beatrice Miotti
  • Giovanni Soda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5883)

Abstract

This paper addresses the indexing and retrieval of mathematical symbols in digitized documents. The proposed approach uses Shape Contexts (SC) to describe the shape of mathematical symbols. Indexed symbols are represented with a vector space-based method grounded in SC clustering. We explore the use of a Self-Organizing Map (SOM) to perform the clustering, and we compare several approaches to computing the SCs. Retrieval performance is measured on a large collection of mathematical symbols gathered from the widely used INFTY database.
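The abstract only outlines the pipeline, so the following Python sketch illustrates one plausible reading of it, not the authors' implementation. It computes log-polar shape contexts in the style of Belongie et al., clusters them with a small SOM whose nodes act as "visual words", represents each symbol as a count vector over those nodes, and ranks indexed symbols by cosine similarity. The bin counts and radii, the 4 x 4 SOM grid and its learning schedule, squared-Euclidean node matching, cosine ranking, the toy point sets, and every function name (`shape_contexts`, `bmu`, `train_som`, `symbol_vector`, `cosine`, `retrieve`) are illustrative assumptions.

```python
import math
import random

# ASSUMPTION: 5 radial x 12 angular log-polar bins and these radii are
# illustrative choices; the abstract does not give the authors' settings.
def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar shape context for every point of a symbol.

    Radii are divided by the mean pairwise distance, so the descriptor does
    not depend on the symbol's size; each histogram is normalised to sum to 1.
    """
    pair = [math.dist(p, q) for a, p in enumerate(points) for q in points[a + 1:]]
    mean_d = (sum(pair) / len(pair) if pair else 1.0) or 1.0
    log_in, log_span = math.log(r_inner), math.log(r_outer) - math.log(r_inner)
    descs = []
    for i, (xi, yi) in enumerate(points):
        hist = [0.0] * (n_r * n_theta)
        for j, (xj, yj) in enumerate(points):
            r = math.hypot(xj - xi, yj - yi) / mean_d
            if i == j or not r_inner <= r < r_outer:
                continue  # skip the point itself and points outside the support
            rb = min(int((math.log(r) - log_in) / log_span * n_r), n_r - 1)
            ang = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            tb = int(ang / (2 * math.pi) * n_theta) % n_theta
            hist[rb * n_theta + tb] += 1
        total = sum(hist) or 1.0
        descs.append([h / total for h in hist])
    return descs

def bmu(nodes, x):
    """Index of the best-matching SOM node (ASSUMPTION: squared Euclidean)."""
    return min(range(len(nodes)), key=lambda k: sum((w - v) ** 2 for w, v in zip(nodes[k], x)))

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, seed=0):
    """Online SOM on a rows x cols grid with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(rows * cols)]  # init from samples
    sigma0, steps, t = max(rows, cols) / 2, epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = t / steps  # linear decay of learning rate and radius
            lr, sigma = lr0 * (1 - frac), max(sigma0 * (1 - frac), 0.5)
            br, bc = divmod(bmu(nodes, x), cols)
            for k, w in enumerate(nodes):
                r, c = divmod(k, cols)
                g = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for d, xd in enumerate(x):
                    w[d] += lr * g * (xd - w[d])
            t += 1
    return nodes

def symbol_vector(points, nodes):
    """Bag-of-visual-words vector: how many of the symbol's SCs map to each node."""
    v = [0.0] * len(nodes)
    for sc in shape_contexts(points):
        v[bmu(nodes, sc)] += 1
    return v

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_points, index, nodes, k=3):
    """Names of the k indexed symbols most similar to the query (cosine)."""
    q = symbol_vector(query_points, nodes)
    return sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)[:k]

# Toy usage (hypothetical data): synthetic point sets stand in for symbol contours.
circle = [(math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24)) for k in range(24)]
bar = [(0.0, float(y)) for y in range(12)]
cross = [(float(x), 0.0) for x in range(-5, 6)] + [(0.0, float(y)) for y in range(-5, 6) if y]
symbols = {"circle": circle, "bar": bar, "cross": cross}
nodes = train_som([sc for pts in symbols.values() for sc in shape_contexts(pts)])
index = {name: symbol_vector(pts, nodes) for name, pts in symbols.items()}
print(retrieve(circle, index, nodes))
```

Using the SOM grid as the codebook means that neighbouring nodes hold similar shape contexts, which is the topological ordering a SOM adds over plain k-means. A realistic index would be trained on many thousands of symbols and would typically weight the raw counts (for example with tf-idf); both are omitted here for brevity.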

Keywords

Digital Library, Visual Word, Document Image, Optical Character Recognition, Shape Context
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Simone Marinai (1)
  • Beatrice Miotti (1)
  • Giovanni Soda (1)
  1. Dipartimento di Sistemi e Informatica, University of Florence, Italy