Towards Very Large Scale Digital Library Building in Greenstone Using Parallel Processing

  • John Thompson
  • David Bainbridge
  • Hussein Suleman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7008)

Abstract

As very large digital library collections become more commonplace, software tools must adapt accordingly. This paper reports on an evolution of the Greenstone Digital Library software to support parallel processing during the collection building phase. A series of experiments was conducted first to establish a basic speed-up factor, and then to deconstruct the parallelisation process in order to understand the execution profile of the application. Several bottlenecks were identified and resolved, further improving performance. The adaptation of Greenstone confirms that the build phase is indeed a suitable candidate for parallelisation, and suggests that parallel processing is a new avenue for exploration in emerging digital library architectures.
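
As a rough illustration of the two ideas the abstract relies on, here is a minimal sketch, assuming a simple static split of a collection's documents across p workers. It is a hypothetical example and not Greenstone's build code or its Open MPI integration. It also computes the conventional speed-up measure S = T_serial / T_parallel and the efficiency E = S / p. The function names `partition`, `speedup` and `efficiency`, and the document file names, are illustrative assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], num_workers: int) -> List[List[T]]:
    """Statically split items into num_workers contiguous chunks.

    Chunk sizes differ by at most one, so each (hypothetical) worker
    receives a near-equal share of the documents to process.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    base, extra = divmod(len(items), num_workers)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Speed-up S = T_serial / T_parallel."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds: float, parallel_seconds: float, num_workers: int) -> float:
    """Parallel efficiency E = S / p, where p is the number of workers."""
    return speedup(serial_seconds, parallel_seconds) / num_workers


if __name__ == "__main__":
    docs = [f"doc{n}.xml" for n in range(10)]  # hypothetical file names
    for rank, chunk in enumerate(partition(docs, 3)):
        print(rank, chunk)
    print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4))
```

A static split is the simplest scheme. When document sizes vary widely, a dynamic work queue, where each worker requests the next document as it finishes, usually balances load better. The abstract does not say which scheme the parallel Greenstone build uses.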

Keywords

Greenstone · VLDL · Parallel Processing · Open MPI

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • John Thompson (1)
  • David Bainbridge (1)
  • Hussein Suleman (2)
  1. Department of Computer Science, University of Waikato, Hamilton, New Zealand
  2. Department of Computer Science, University of Cape Town, Cape Town, South Africa