Journal of Grid Computing, Volume 10, Issue 1, pp 133–149

Digital Preservation in Grids and Clouds: A Middleware Approach

  • Peter Wittek
  • Sándor Darányi


Digital preservation is the persistent archiving of digital assets for future access and reuse, irrespective of the underlying platform and software solutions. Existing preservation systems have a strong focus on Grids, but the advent of cloud technologies offers an attractive option. We describe a middleware system that enables a flexible choice between a Grid and a cloud for ad-hoc computations that arise during the execution of a preservation workflow and also for archiving digital objects. The choice between different infrastructures remains open during the lifecycle of the archive, ensuring a smooth switch between different solutions to accommodate the changing requirements of the organization that needs its digital assets preserved. We also offer insights on the costs, running times, and organizational issues of cloud computing, proving that the cloud alternative is particularly attractive for smaller organizations without access to a Grid or with limited IT infrastructure.


Digital preservation, Grid, Cloud





Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Swedish School of Library and Information Science, University of Borås, Borås, Sweden
