Retrieval from Document Image Collections

  • A. Balasubramanian
  • Million Meshesha
  • C. V. Jawahar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)


This paper presents a system for retrieving relevant documents from large collections of document images. We achieve effective search and retrieval over a large collection of printed document images by matching image features at the word level. Words are represented using profile-based and shape-based features. A novel partial matching scheme based on dynamic time warping (DTW) handles morphologically variant words, which is useful for grouping similar words together during indexing. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. System-level issues for retrieval, such as scalability and effective delivery, are also addressed.
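The abstract does not specify the exact features or the partial-matching formulation, so the following is only a minimal sketch under stated assumptions. It assumes binarized word images (lists of 0/1 rows, 1 = ink), describes each column by three profile features commonly used in word spotting (ink density, upper profile, lower profile), and implements "partial matching" as open-end DTW, where a query may align to any prefix of a candidate so that a stem can match a longer inflected form. The names `column_features`, `dtw`, `group_words` and the `threshold` parameter are hypothetical, and the greedy grouping is an illustrative stand-in for the indexing step, not the authors' method.

```python
import math
from typing import List, Sequence

Image = List[List[int]]   # assumed: binarized word image, rows of 0/1, 1 = ink
Feature = List[float]     # per-column feature vector


def column_features(img: Image) -> List[Feature]:
    """Per-column profile features (assumed choice): ink density, upper and
    lower profile, all normalized by image height. Empty columns get 1.0."""
    h, w = len(img), len(img[0])
    feats = []
    for x in range(w):
        col = [img[y][x] for y in range(h)]
        ink = [y for y, v in enumerate(col) if v]
        upper = ink[0] / h if ink else 1.0            # gap above first ink pixel
        lower = (h - 1 - ink[-1]) / h if ink else 1.0  # gap below last ink pixel
        feats.append([sum(col) / h, upper, lower])
    return feats


def _dist(a: Feature, b: Feature) -> float:
    # Euclidean distance between two column feature vectors
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(q: Sequence[Feature], c: Sequence[Feature], partial: bool = False) -> float:
    """DTW distance between query q and candidate c, normalized by n + j.
    With partial=True (hypothetical open-end variant), q may end at any
    column j of c, so q is effectively matched against a prefix of c."""
    n, m = len(q), len(c)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = _dist(q[i - 1], c[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    if partial:
        return min(D[n][j] / (n + j) for j in range(1, m + 1))
    return D[n][m] / (n + m)


def group_words(words: List[List[Feature]], threshold: float) -> List[List[int]]:
    """Greedy grouping for indexing (illustrative): a word joins the first group
    whose representative (first member) partially matches it within threshold."""
    groups: List[List[int]] = []
    for idx, feats in enumerate(words):
        for g in groups:
            rep = words[g[0]]
            # match the shorter sequence against a prefix of the longer one
            a, b = (rep, feats) if len(rep) <= len(feats) else (feats, rep)
            if dtw(a, b, partial=True) <= threshold:
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Toy example: a "stem" and the same stem with extra suffix columns appended.
stem = [[0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 1, 1, 1]]
suffix = [[1, 1, 0],
          [0, 0, 0],
          [0, 0, 1]]
cand = [s + t for s, t in zip(stem, suffix)]
other = [[1, 1, 1, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

stem_f, cand_f, other_f = (column_features(x) for x in (stem, cand, other))
print(dtw(stem_f, cand_f, partial=True))   # 0.0: stem matches the prefix exactly
print(dtw(stem_f, cand_f))                 # > 0: full alignment must absorb the suffix
print(group_words([stem_f, cand_f, other_f], threshold=0.01))  # [[0, 1], [2]]
```

The open-end formulation is one simple way to tolerate suffixes; a real system would also need size normalization, noise handling, and a tuned threshold.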



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • A. Balasubramanian (1)
  • Million Meshesha (1)
  • C. V. Jawahar (1)
  1. Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
