Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)

Abstract

Flexible digital library systems need to be able to accept, or “import,” documents and metadata in a variety of forms, and associate metadata with the appropriate documents. This paper analyzes the requirements of the import process for general digital libraries. The requirements include (a) format conversion for source documents, (b) the ability to incorporate existing conversion utilities, (c) provision for metadata to be specified in the document files themselves and/or in separate metadata files, (d) format conversion for metadata files, (e) provision for metadata to be computed from the document content, and (f) flexible ways of associating metadata with documents or sets of documents. We argue that these requirements are so open-ended that they are best met by an extensible architecture that facilitates the addition of new document formats and metadata facilities to existing digital library systems. An implementation of this architecture is briefly described.
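
As an illustration only, the extensible architecture argued for above might be sketched as follows. This is a minimal sketch and not the implementation described in the paper: the names `Document`, `ImportPlugin`, `PlainTextPlugin`, `SimpleHeaderPlugin`, `Importer`, and `word_count`, the `.eml`-style header format, and the precedence rule (embedded metadata first, external metadata overriding it, computed metadata only filling gaps) are all assumptions made for the example. It touches requirements (a), (c), (e), and (f); wrapping existing conversion utilities (b) and converting metadata files (d) are left out.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Document:
    """Internal form of an imported document: text plus metadata."""
    source: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ImportPlugin:
    """Hypothetical base class: one subclass per source format."""
    extensions: tuple = ()

    def accepts(self, filename: str) -> bool:
        return PurePath(filename).suffix.lower() in self.extensions

    def convert(self, filename: str, raw: str) -> Document:
        raise NotImplementedError


class PlainTextPlugin(ImportPlugin):
    extensions = (".txt",)

    def convert(self, filename: str, raw: str) -> Document:
        return Document(source=filename, text=raw)


class SimpleHeaderPlugin(ImportPlugin):
    """Treats 'Key: value' lines before the first blank line as
    metadata embedded in the document file itself."""
    extensions = (".eml",)

    def convert(self, filename: str, raw: str) -> Document:
        head, _, body = raw.partition("\n\n")
        meta = {}
        for line in head.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        return Document(source=filename, text=body, metadata=meta)


Extractor = Callable[[Document], Dict[str, str]]


class Importer:
    def __init__(self, plugins: Iterable[ImportPlugin],
                 extractors: Iterable[Extractor] = ()):
        self.plugins = list(plugins)
        self.extractors = list(extractors)

    def import_file(self, filename: str, raw: str,
                    external: Optional[Dict[str, Dict[str, str]]] = None) -> Document:
        # Format conversion: the first plugin that accepts the file wins.
        for plugin in self.plugins:
            if plugin.accepts(filename):
                doc = plugin.convert(filename, raw)
                break
        else:
            raise ValueError(f"no plugin accepts {filename!r}")
        # External metadata keyed by path prefix, so one entry can cover a
        # whole set of documents; longer (more specific) prefixes are
        # applied last and therefore override shorter ones.
        for prefix in sorted(external or {}, key=len):
            if filename.startswith(prefix):
                doc.metadata.update(external[prefix])
        # Computed metadata only fills in keys that are still missing.
        for extract in self.extractors:
            for key, value in extract(doc).items():
                doc.metadata.setdefault(key, value)
        return doc


def word_count(doc: Document) -> Dict[str, str]:
    """Example of metadata computed from the document content."""
    return {"WordCount": str(len(doc.text.split()))}


importer = Importer([PlainTextPlugin(), SimpleHeaderPlugin()], [word_count])
external = {
    "reports/": {"Collection": "Reports"},
    "reports/2002/": {"Year": "2002"},
}
doc = importer.import_file(
    "reports/2002/a.eml",
    "Title: Annual review\n\nAll is well.",
    external,
)
print(doc.metadata)
```

Under these assumptions, supporting a new document format means adding one `ImportPlugin` subclass, and a new kind of computed metadata means adding one extractor function; the `Importer` itself does not change.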

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Witten, I.H., Bainbridge, D., Paynter, G., Boddie, S. (2002). Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_29

  • DOI: https://doi.org/10.1007/3-540-45747-X_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44178-6

  • Online ISBN: 978-3-540-45747-3

  • eBook Packages: Springer Book Archive
