Digitizing a Million Books: Challenges for Document Analysis

  • K. Pramod Sankar
  • Vamshi Ambati
  • Lakshmi Pratha
  • C. V. Jawahar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)


This paper describes the challenges facing the document image analysis community in building large digital libraries with diverse document categories. The challenges are identified from the experience of ongoing efforts to digitize and archive one million books. A smooth workflow has been established for archiving large quantities of books with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising from the diversity of content in digital libraries.


Keywords: Digital Library, Document Image, Digital Content, Digitization Process, Indian Language



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • K. Pramod Sankar (1)
  • Vamshi Ambati (2)
  • Lakshmi Pratha (1)
  • C. V. Jawahar (1)
  1. Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad, India
  2. Institute for Software Research International, Carnegie Mellon University, USA
