A Bottom-Up OCR System for Mathematical Formulas Recognition

  • Wei Wu
  • Feng Li
  • Jun Kong
  • Lichang Hou
  • Bingdui Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4113)


An OCR system is presented for understanding mathematical formulas in binary images of printed documents. The system uses a novel component-labeling algorithm to extract local maximum components from the image, and uses these components to locate the mathematical formulas. A character recognition algorithm based on neural networks is then applied. To segment merged characters in the image, a novel segmentation algorithm based on a modified SOM (self-organizing map) neural network is introduced into the system. Using an LL(1) grammar, the system converts the recognition results into a LaTeX file.
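To make the final stage more concrete, the following is a minimal sketch of how an LL(1)-style predictive parser can turn a stream of recognized symbols into LaTeX. The abstract does not give the system's actual grammar, so the toy grammar here, the linear text input (a real formula recognizer works from two-dimensional layout), and the names `tokenize`, `Parser`, and `to_latex` are all assumptions made for illustration.

```python
import re

# Hypothetical toy grammar (an assumption, not the grammar used in the paper):
#   expr   -> term (('+' | '-') term)*
#   term   -> factor ('/' factor)*
#   factor -> atom ('^' atom)?
#   atom   -> IDENT | NUMBER | '(' expr ')'
TOKEN_RE = re.compile(r"\s*([A-Za-z]+|\d+|[-+/^()])")


def tokenize(text):
    """Split a linear formula string into symbol tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser that decides each step from one lookahead token."""

    def __init__(self, tokens):
        self.tokens = tokens + ["$"]  # "$" marks the end of input
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def eat(self, expected=None):
        tok = self.tokens[self.i]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        out = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            out = f"{out} {op} {self.term()}"
        return out

    def term(self):
        out = self.factor()
        while self.peek() == "/":
            self.eat()
            # A division is emitted as a LaTeX fraction.
            out = f"\\frac{{{out}}}{{{self.factor()}}}"
        return out

    def factor(self):
        base = self.atom()
        if self.peek() == "^":
            self.eat()
            return f"{base}^{{{self.atom()}}}"
        return base

    def atom(self):
        tok = self.peek()
        if tok == "(":
            self.eat("(")
            inner = self.expr()
            self.eat(")")
            return f"({inner})"
        if tok[0].isalnum():
            return self.eat()
        raise SyntaxError(f"unexpected token {tok!r}")


def to_latex(text):
    parser = Parser(tokenize(text))
    out = parser.expr()
    parser.eat("$")  # reject trailing input
    return out


print(to_latex("x^2 + y/2"))
```

Each parsing function chooses its next step by inspecting a single lookahead token, which is the defining property of LL(1) parsing; a table-driven parser would encode the same decisions in a parse table.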


Keywords: Document Image; Mathematical Formula; Zernike Moment; Optical Character Recognition System; Recognition Ratio





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wei Wu (1)
  • Feng Li (1)
  • Jun Kong (2)
  • Lichang Hou (1)
  • Bingdui Zhu (1)
  1. Dept. Appl. Math., Dalian University of Technology, Dalian, China
  2. Northeast Normal University, Changchun, China
