Multimedia Tools and Applications, Volume 75, Issue 7, pp 3843–3877

A workflow management system to feed digital libraries: proposal and case study

  • Ángeles S. Places
  • Antonio Fariña
  • Miguel R. Luaces
  • Óscar Pedreira
  • Diego Seco

Abstract

Building a digital library of antique documents involves not only technical implementation issues but also the practical problems of digitizing large document collections. Antique documents are usually delicate and must be handled with care. In addition, a poor state of preservation and the use of unrecognizable font types make automatic text recognition harder, so the recognized text requires human revision and correction. Expert participation in the digitization process is therefore mandatory and, consequently, costly. In this paper, we present a framework for managing the workflow of digitizing large collections of antique documents. We describe the digitization process and a tool that supports all of its phases and tasks. We also present a case study describing how the workflow management system was applied to the digitization of more than 10,000 documents from 19th-century journals. Finally, we describe the resulting digital library, focusing on its most important technological issues.

Keywords

Digital libraries; Text retrieval; Workflow management system

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Ángeles S. Places (1)
  • Antonio Fariña (1)
  • Miguel R. Luaces (1)
  • Óscar Pedreira (1)
  • Diego Seco (1, 2)

  1. Database Laboratory, Facultade de Informática, University of A Coruña, A Coruña, Spain
  2. Department of Computer Science, University of Concepción, Concepción, Chile
