Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)

Abstract

In this paper we show how we used robust human language technology, such as our domain-independent and customisable named entity recogniser, for automatic content annotation and indexing in two digital library applications. Each application posed a unique challenge: one required adapting the language processing components to the non-standard written conventions of 18th-century English, while the other required processing material in multiple modalities. This reusable technology could also form the basis for computational tools for the study of cultural heritage languages, such as Ancient Greek and Latin.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bontcheva, K., Maynard, D., Cunningham, H., Saggion, H. (2002). Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_46

  • DOI: https://doi.org/10.1007/3-540-45747-X_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44178-6

  • Online ISBN: 978-3-540-45747-3

  • eBook Packages: Springer Book Archive
