Abstract
In this paper we show how we used robust human language technology, such as our domain-independent and customisable named entity recogniser, for automatic content annotation and indexing in two digital library applications. Each of these applications posed a unique challenge: one required adapting the language processing components to the non-standard written conventions of 18th century English, while the other presented the challenge of processing material in multiple modalities. This reusable technology could also form the basis for the creation of computational tools for the study of cultural heritage languages, such as Ancient Greek and Latin.
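To make the annotate-then-index idea more concrete, here is a minimal sketch of one way a gazetteer-based entity annotator could be adapted to 18th-century typography and its output turned into a retrieval index. It is not the authors' named entity recogniser. The gazetteer contents, the normalisation rules (mapping the long s "ſ" to "s" and lowercasing to neutralise the period's capitalised common nouns), and the names `GAZETTEER`, `normalise`, `annotate` and `build_index` are all hypothetical and chosen for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: normalised surface form -> entity type (illustrative only).
GAZETTEER = {
    "london": "Location",
    "samuel johnson": "Person",
    "royal society": "Organization",
}


def normalise(text: str) -> str:
    """Assumed normalisation for a few 18th-century conventions."""
    text = text.replace("\u017f", "s")  # long s (U+017F) -> s
    return text.lower()                 # neutralise capitalised common nouns


def annotate(doc_id, text, gazetteer=GAZETTEER, max_len=3):
    """Return (doc_id, entity_type, phrase) tuples using longest-match lookup."""
    tokens = re.findall(r"\w+", normalise(text))
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                annotations.append((doc_id, gazetteer[phrase], phrase))
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return annotations


def build_index(docs):
    """Map each (entity_type, phrase) annotation to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _, entity_type, phrase in annotate(doc_id, text):
            index[(entity_type, phrase)].add(doc_id)
    return index


docs = {
    "d1": "Mr. Samuel Johnſon dined in London.",
    "d2": "A Paper read before the Royal Society at London.",
}
index = build_index(docs)
print(sorted(index[("Location", "london")]))  # ['d1', 'd2']
```

Normalising before lookup keeps the gazetteer itself in modern spelling, so the same resource can be reused on modern text; a real system would need far richer spelling variation rules than the two assumed here.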
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bontcheva, K., Maynard, D., Cunningham, H., Saggion, H. (2002). Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_46
DOI: https://doi.org/10.1007/3-540-45747-X_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44178-6
Online ISBN: 978-3-540-45747-3
eBook Packages: Springer Book Archive