The UC Berkeley Digital Library Project: Re-thinking Scholarly Information Dissemination and Use
Information technology is not merely providing enhanced versions of the services we have come to expect from libraries; it is inducing a fundamental change in the way information is created, disseminated, and used. The shift from the current centralized, discrete publishing model toward a distributed, continuous, self-publishing model is already underway. However, if this shift is left to its own devices, some of the better aspects of the current model, such as peer review, may be compromised, even as the opportunity for new services is afforded. Effort will also be required to provide first-class support in the emerging infrastructure for data that are not textual in nature, such as images, videos, maps, and scientific data sets.
Many tools and technologies will be useful in enhancing and exploiting this view of the emerging information infrastructure. One set of tools relates to document technologies. "Multivalent Documents" is a new model of documents that seems useful in this context. The multivalent document model is (i) highly open, meaning that it supports an open-ended variety of document formats and functions, (ii) highly extensible, meaning that it can be extended and customized in novel ways to meet particular user needs, and (iii) highly distributed, meaning that components of a document may exist as separate networked resources, which are combined dynamically into a coherent document. A particularly attractive aspect of the model is the manner in which it supports "spontaneous collaboration", the ability of a user to annotate web pages, scanned images, and other networked resources to which that user has no privileged relation.
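The distributed aspect of this model, and the way it permits annotation without privileged access, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Layer` and `ComposedDocument`, the fields, and the example locators are hypothetical and are not taken from the project's software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Layer:
    """One component of a document, held as a separate resource (hypothetical)."""
    kind: str      # e.g. "scanned-image", "ocr-text", "annotation"
    source: str    # illustrative locator for where the layer lives
    owner: str     # who published this layer
    content: str


@dataclass
class ComposedDocument:
    """A document assembled at read time from independent layers (hypothetical)."""
    layers: list = field(default_factory=list)

    def add_layer(self, layer):
        # Layers are appended, never merged into one another, so a reader's
        # annotation leaves the publisher's base layer untouched.
        self.layers.append(layer)

    def view(self, kinds=None):
        # Select the layers to combine; None means all of them.
        return [l for l in self.layers if kinds is None or l.kind in kinds]


# Two layers published by the library, one added by an unprivileged reader.
base = Layer("scanned-image", "http://example.org/page1.tiff", "library", "<page image>")
ocr = Layer("ocr-text", "http://example.org/page1.txt", "library", "recognized text")
note = Layer("annotation", "http://example.net/notes/1", "reader", "See also section 3.")

doc = ComposedDocument()
for layer in (base, ocr, note):
    doc.add_layer(layer)

owners = [l.owner for l in doc.view()]
text_only = [l.kind for l in doc.view(kinds={"ocr-text", "annotation"})]
```

In this sketch the reader's annotation is stored at a different locator from the layers it comments on, which is one plausible reading of how annotation can proceed without write access to the original resource.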
Multivalent documents address some issues in manipulating on-line resources. Finding those resources is still problematic, especially for those in image form. "Automatic content analysis" is the set of techniques for analyzing the content of information objects so as to facilitate their subsequent access. We present some recent developments in this area for accessing document images, photographs, and text.