Abstract
Given the high labor cost of developing new lexical resources for Machine Translation (MT) and other language processing systems, it is desirable to make the most of the resources that already exist. This paper describes work on two MT projects that share a common goal: the creation, maintenance, and reuse of lexical information. That goal involves a range of tasks, from mining machine-readable dictionaries (MRDs) to defining a repository capable of housing this diverse lexical information. We outline both efforts, focusing on the problems encountered and the intermediate results achieved. While the ultimate goal of automatically processing on-line resources into multi-purpose lexical repositories is far from being achieved, our experience shows that significant applications can make use of the partially processed information produced en route. We concentrate on the project that used multiple lexical resources as the basis for two natural language processing (NLP) tools: a segmenter and a glosser for Thai. Finally, we make recommendations for future resource development, with a view toward mitigating the difficulties of merging information from diverse sources.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miller, K.J., Zajic, D.M. (1998). Lexicons as Gold: Mining, Embellishment, and Reuse. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science, vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_43
DOI: https://doi.org/10.1007/3-540-49478-2_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65259-5
Online ISBN: 978-3-540-49478-2
eBook Packages: Springer Book Archive