DiCoMo: An Algorithm Based Method to Estimate Digitization Costs in Digital Libraries
Estimating the production costs of web content is difficult: the many unknown factors make exact predictions hard to obtain. Digitization projects nevertheless need a reliable idea of the cost and time involved in developing their contents. As with software development projects, incorrect estimates lead to delays and cost overruns. Building on methods used in Software Engineering to predict software development costs, such as COCOMO and Function Points, and using historical data gathered during five years of work at the Miguel de Cervantes Digital Library, where more than 12,000 books were digitized, we have refined an equation for estimating digitization costs named DiCoMo (Digitization Cost Model). The method can be adapted to different production processes, such as producing digital XML or HTML texts through scanning, OCR, and human proofreading, or producing digital facsimiles (scanned images without OCR). The a priori estimates are improved as the project evolves by means of adjustments based on real data obtained from earlier stages of the production process, so each estimate is a refinement of the previous one based on the work done so far.
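The DiCoMo equation itself is not given here, so the following is only a minimal sketch of the general idea, assuming the power-law form of basic COCOMO (effort = a * size^b) carries over to digitization. The function names, the use of pages as the size unit, and the choice to re-fit only the multiplier `a` from completed batches are all illustrative assumptions, not the authors' published method.

```python
import math


def estimate_effort(size, a, b):
    """COCOMO-style power law: effort = a * size ** b.

    `size` is a hypothetical size measure (e.g. pages to digitize);
    `a` and `b` are calibration constants (assumed, not DiCoMo's values).
    """
    return a * size ** b


def recalibrate(observations, b):
    """Re-fit the multiplier `a` from completed work, holding `b` fixed.

    `observations` is a list of (size, actual_effort) pairs from earlier
    production stages. The fit is least squares in log space, which
    reduces to averaging log(effort) - b * log(size).
    """
    if not observations:
        raise ValueError("at least one observation is required")
    residuals = [math.log(effort) - b * math.log(size)
                 for size, effort in observations]
    return math.exp(sum(residuals) / len(residuals))


# A priori estimate with illustrative constants.
a_prior, b = 2.0, 1.05
initial = estimate_effort(300, a_prior, b)

# After some batches are finished, refine the multiplier and re-estimate.
completed = [(50, 140.0), (120, 380.0)]  # made-up (pages, hours) pairs
a_refined = recalibrate(completed, b)
refined = estimate_effort(300, a_refined, b)
```

Holding the exponent fixed and re-fitting only the multiplier keeps the refinement stable when few batches have been completed; with more historical data, both constants could be fitted together.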