Digital libraries and spatial information processing
The construction and support of general digital libraries offer a rich area for research into, and application of, parallel and distributed computing. Whether large, distributed digital libraries can perform adequately over the Internet may well depend on the contributions that parallel computing can make to their performance. This is particularly true of digital libraries that support access to multimedia materials by geographical reference, as opposed to more traditional access by author, title, and subject. We survey significant classes of problems in the construction and maintenance of such libraries that can benefit from parallel and distributed processing, with particular emphasis on information that is organized, implicitly or explicitly, in terms of spatial reference. Such information includes items whose contents are explicitly organized in terms of spatial reference, such as maps, images, and video (and hence in terms of the “vector” and “raster” data models), as well as items whose contents are only implicitly so organized, such as texts. Three aspects of such information encourage the application of parallel computation: the large size of individual items (often in the gigabyte range); the fact that much of the information in the items is in “implicit” form and requires significant computation to make explicit; and the possibility of decomposing a computation into relatively independent components corresponding to some partition of the geographic space.
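The last of these aspects, decomposition by a partition of geographic space, can be illustrated with a minimal sketch. It is not taken from the survey. It assumes a raster held in memory as a list of rows and splits its extent into non-overlapping rectangular tiles. A hypothetical per-tile task (counting cells above a threshold, standing in for some feature-extraction step that makes "implicit" information explicit) runs on each tile independently, and the partial results are then combined. The names `partition`, `count_above`, and `parallel_count` are illustrative only. A thread pool is used here for portability; in CPython, CPU-bound work would need `ProcessPoolExecutor` or distributed workers to achieve real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Raster = List[List[int]]
# A tile is (row_start, row_end, col_start, col_end), with half-open ranges.
Tile = Tuple[int, int, int, int]


def partition(n_rows: int, n_cols: int, tile_h: int, tile_w: int) -> List[Tile]:
    """Split the raster extent into non-overlapping rectangular tiles (hypothetical helper)."""
    tiles = []
    for r in range(0, n_rows, tile_h):
        for c in range(0, n_cols, tile_w):
            # Edge tiles are clipped to the raster boundary.
            tiles.append((r, min(r + tile_h, n_rows), c, min(c + tile_w, n_cols)))
    return tiles


def count_above(raster: Raster, tile: Tile, threshold: int) -> int:
    """Per-tile task: count cells whose value exceeds threshold (stand-in for feature extraction)."""
    r0, r1, c0, c1 = tile
    return sum(1 for r in range(r0, r1) for c in range(c0, c1) if raster[r][c] > threshold)


def parallel_count(raster: Raster, tile_h: int, tile_w: int,
                   threshold: int, workers: int = 4) -> int:
    """Run the per-tile task on every tile concurrently, then sum the partial counts."""
    tiles = partition(len(raster), len(raster[0]), tile_h, tile_w)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda t: count_above(raster, t, threshold), tiles)
        return sum(partials)


if __name__ == "__main__":
    # Synthetic 10 x 10 raster; the tiled result must match a sequential scan.
    raster = [[(r * 10 + c) % 7 for c in range(10)] for r in range(10)]
    sequential = sum(1 for row in raster for v in row if v > 3)
    print(parallel_count(raster, 3, 4, 3) == sequential)
```

This decomposition works because each tile's count depends only on the cells inside it. Operations with neighborhood dependencies, such as filtering or edge detection, would need tiles with overlapping border ("halo") regions so that each tile can be processed without consulting its neighbors.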
The survey is organized around the main functional components of a general digital library: the interface, catalog, storage, and ingest components. Examples of the problems discussed include the use of library servers that take the form of multicomputers and whose scheduling is optimized to support many simultaneous library users; the use of high-performance parallel computing to extract catalog meta-information and to translate between meta-information representation languages; its use to preprocess large datasets into forms appropriate for storage; its use to process spatially indexed items at query time, in support of user-centered workspaces; and the use of multicomputing to process user queries over large networks of workstations.