Stress-Testing General Purpose Digital Library Software

  • David Bainbridge
  • Ian H. Witten
  • Stefan Boddie
  • John Thompson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5714)

Abstract

DSpace, Fedora, and Greenstone are three widely used open-source digital library systems. In this paper we report on scalability tests performed on these tools by ourselves and others. The tests range from repositories populated with synthetically produced data to real-world deployments with content measured in millions of items. A case study details how one of the systems performed when used to produce fully searchable newspaper collections containing in excess of 20 GB of raw text (2 billion words, with 60 million unique terms), 50 GB of metadata, and 570 GB of images.

Keywords

Digital Library; Unique Term; Newspaper Page; Server Response Time; Digital Library System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • David Bainbridge (1)
  • Ian H. Witten (1)
  • Stefan Boddie (2)
  • John Thompson (2)

  1. Department of Computer Science, University of Waikato, Hamilton, New Zealand
  2. DL Consulting Ltd, Hamilton, New Zealand
