Word Spotting for Indic Documents to Facilitate Retrieval

Part of the book series: Advances in Pattern Recognition (ACVPR)

Abstract

With advances in the digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts has hampered information extraction from a large body of documents of cultural and historical importance. This chapter presents two relevant topics in this area. First, we describe a script-specific keyword spotting method for Devanagari documents that exploits domain knowledge of the script. Second, we address the needs of a digital library that must provide access to a collection of documents in multiple scripts. This requires intelligent solutions that scale across different scripts, and we present a script-independent keyword spotting approach for this purpose. Experimental results illustrate the efficacy of our methods.

Acknowledgment

This material is based upon work supported by the National Science Foundation under grant nos. IIS-0112059, IIS-0535038, and IIS-0849511.

Corresponding author

Correspondence to Anurag Bhardwaj.

Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Bhardwaj, A., Setlur, S., Govindaraju, V. (2009). Word Spotting for Indic Documents to Facilitate Retrieval. In: Govindaraju, V., Setlur, S. (eds) Guide to OCR for Indic Scripts. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-330-9_15

  • DOI: https://doi.org/10.1007/978-1-84800-330-9_15

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-329-3

  • Online ISBN: 978-1-84800-330-9

  • eBook Packages: Computer Science, Computer Science (R0)
