DL Architecture for Indic Scripts

  • Suryaprakash Kompalli
  • Srirangaraj Setlur
  • Venugopal Govindaraju
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)


In this study, we outline computational issues in the design of a Digital Library (DL) for Indic languages. The complex structure of Indic script characters calls for new OCR analysis techniques and user interface (UI) designs. This paper describes a multi-tier software architecture that provides text and image processing tools as independent, reusable entities. We also outline techniques for measuring and evaluating the individual stages of an Indic script recognition engine.
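The abstract pairs two ideas that can be sketched together: processing tools packaged as independent, interchangeable stages, and evaluation of each stage against ground truth. The Python below is a minimal sketch under our own assumptions, not the authors' system. The `Stage` wrapper, `run_and_evaluate`, and the two toy string tools are hypothetical, and edit distance over Unicode code points is one common error measure, not a metric stated in the paper. The toy data also hints at why Indic compound characters complicate evaluation: the Devanagari conjunct क्ष is stored as three code points (KA U+0915, VIRAMA U+094D, SSA U+0937), so a recognizer that drops the virama is charged only one code-point error even though the visible character is wrong.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (unit cost for insertion, deletion, substitution)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # delete r
                            curr[j - 1] + 1,           # insert h
                            prev[j - 1] + (r != h)))   # substitute (or match)
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by the length of the reference."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


@dataclass
class Stage:
    """Hypothetical wrapper: one independent, reusable processing tool."""
    name: str
    tool: Callable[[object], object]


def run_and_evaluate(stages: List[Stage], data: object,
                     truth: Dict[str, Sequence]) -> Dict[str, float]:
    """Run the stages in order and score every stage that has ground truth."""
    scores: Dict[str, float] = {}
    for stage in stages:
        data = stage.tool(data)
        if stage.name in truth:
            scores[stage.name] = error_rate(truth[stage.name], data)
    return scores


# Toy stand-ins that operate on strings instead of page images.
conjunct = "\u0915\u094d\u0937"  # क्ष = KA + VIRAMA + SSA (three code points)

stages = [
    Stage("clean", lambda s: s.strip()),                     # hypothetical cleanup
    Stage("recognize", lambda s: s.replace("\u094d", "")),   # toy recognizer that drops the virama
]
scores = run_and_evaluate(stages, "  " + conjunct + " ",
                          {"clean": conjunct, "recognize": conjunct})
print(scores)
```

Because each tool agrees with its neighbours only on input and output, one stage (for example, the recognizer) can be swapped out without touching the others, and the per-stage scores show where errors first appear. Whether errors should be counted over code points, as here, or over compound characters as the reader sees them is a design choice this sketch leaves open.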


Keywords (machine-generated, not supplied by the authors): Digital Library · Compound Character · Image Processing Tool · Document Image Analysis · Qwerty Keyboard


Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Suryaprakash Kompalli (1)
  • Srirangaraj Setlur (1)
  • Venugopal Govindaraju (1)
  1. CEDAR, UB Commons, Amherst, USA
