Recognition of Malayalam Documents

  • N.V. Neeba
  • Anoop Namboodiri
  • C.V. Jawahar
  • P.J. Narayanan
Chapter
Part of the Advances in Pattern Recognition book series (ACVPR)

Abstract

Malayalam is an Indian language with its own script, spoken by 40 million people, and it has a rich literary tradition. A character recognition system for this language would be of immense help in a spectrum of applications ranging from data entry to reading aids. The Malayalam script has a large number of similar characters, which makes the recognition problem challenging. In this chapter, we present our approach to the recognition of Malayalam documents, both printed and handwritten. We present classification results as well as ongoing activities.

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  • N.V. Neeba
    • 1
  • Anoop Namboodiri
    • 1
  • C.V. Jawahar
    • 1
  • P.J. Narayanan
    • 1
  1. Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
