DiCoMo: the digitization cost model

Abstract

Estimating digitization costs is very difficult: accurate values are hard to obtain because of the large number of unknown factors. Nevertheless, digitization projects need a precise idea of the cost and time involved in developing their contents. The common practice when starting to digitize a new collection is to set a schedule and make a firm commitment to meet it, in terms of both cost and deadlines, before the actual digitization work begins. As with software development projects, incorrect estimates produce delays and cost overruns. Based on methods used in Software Engineering for software development cost prediction, such as COCOMO and Function Points, and using historical data gathered over five years at the Miguel de Cervantes Digital Library (MCDL) project during the digitization of more than 12,000 books, we have developed a time-and-cost estimation method named DiCoMo (Digitization Cost Model) for digital content production in general. The method can be adapted to different production processes, such as the production of digital XML or HTML texts by scanning and OCR followed by human proofreading and error correction, or the production of digital facsimiles (scanning without OCR). The accuracy of the estimates improves over time, since the algorithms can be tuned with adjustments based on historical data gathered from previous tasks. Finally, we consider the problem of parallelizing tasks, i.e. dividing the work among a number of encoders who work in parallel.
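The abstract does not state the DiCoMo equations, so the following is only a minimal sketch of the general idea it describes: a COCOMO-style power law, effort = a * size^b, whose coefficients are recalibrated from historical records. Measuring size in pages, the function names, and the synthetic history values are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def fit_power_law(sizes, efforts):
    """Least-squares fit of effort = a * size**b, done in log-log space.

    Hypothetical helper: illustrates calibration from historical data,
    not the actual DiCoMo procedure.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares slope and intercept on the log-transformed data.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_effort(size, a, b):
    """Predicted effort for a new task of the given size."""
    return a * size ** b


# Synthetic history (assumed values, generated from a known power law
# so that the fit can be checked): size in pages, effort in hours.
history_pages = [100, 200, 400, 800]
history_hours = [0.5 * p ** 1.1 for p in history_pages]

a, b = fit_power_law(history_pages, history_hours)
predicted_hours = estimate_effort(300, a, b)
```

Refitting `a` and `b` each time new completed tasks are added to the history is one simple way in which estimates of this kind could improve over time.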

Keywords

Cost and time estimates; Digitization; Contents production; DL project management

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Operating Research Center (CIO), Universidad Miguel Hernández de Elche, Elche, Spain
  2. Department of Languages and Information Systems, Universidad de Alicante, Alicante, Spain
