A workflow management system to feed digital libraries: proposal and case study
Building a digital library of antique documents involves not only technical implementation issues but also the practical problems of digitizing large collections. Antique documents are usually delicate and must be handled with care. In addition, a poor state of preservation and the use of typefaces that recognition software does not handle well make automatic text recognition harder, so the recognized text requires further human revision and correction. This makes the participation of experts in the digitization process mandatory and, therefore, costly. In this paper, we present a framework for managing the workflow of digitizing large collections of antique documents. We describe the digitization process and a tool that supports all of its phases and tasks. We also present a case study describing how the workflow management system was applied to the digitization of more than 10,000 documents from 19th-century journals. Finally, we describe the resulting digital library, focusing on the most important technological issues.
Keywords: Digital libraries, Text retrieval, Workflow management system