Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content

Bontcheva, Kalina; Maynard, Diana; Cunningham, Hamish; Saggion, Horacio

doi:10.1007/3-540-45747-X_46

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

In this paper we show how we used robust human language technology, such as our domain-independent and customisable named entity recogniser, for automatic content annotation and indexing in two digital library applications. Each application posed its own challenge: one required adapting the language processing components to the non-standard written conventions of 18th-century English, while the other required processing material in multiple modalities. This reusable technology could also form the basis for computational tools for the study of cultural heritage languages, such as Ancient Greek and Latin.

Author information

Authors and Affiliations

Dept of Computer Science, University of Sheffield, 211 Portobello St, S1 4DP, Sheffield, UK
Kalina Bontcheva, Diana Maynard, Hamish Cunningham & Horacio Saggion

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131, Padova, Italy
Maristella Agosti
Istituto di Scienza e Tecnologie dell’Informazione (ISTI-CNR), Area della Ricerca CNR di Pisa, Via G. Moruzzi 1, 56124, Pisa, Italy
Costantino Thanos

About this paper

Cite this paper

Bontcheva, K., Maynard, D., Cunningham, H., Saggion, H. (2002). Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_46

DOI: https://doi.org/10.1007/3-540-45747-X_46
Published: 13 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44178-6
Online ISBN: 978-3-540-45747-3
eBook Packages: Springer Book Archive
