Lexicons as Gold: Mining, Embellishment, and Reuse

Miller, Keith J.; Zajic, David M.

doi:10.1007/3-540-49478-2_43

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1529)

Abstract

Given the high labor costs of developing new lexical resources for Machine Translation (MT) and language processing systems, it is desirable to make the most of those resources already in existence. This paper describes the work being carried out on two MT projects that share a common goal: the creation, maintenance and reuse of lexical information. This goal calls into play a range of tasks from dictionary mining of machine-readable dictionaries (MRDs) to the definition of a repository capable of housing this diverse lexical information. This paper outlines the two efforts, focusing on the problems encountered and the intermediate results achieved. While the ultimate goal of the automated processing of on-line resources into multi-purpose lexical repositories is far from being achieved, our experience has shown that there are significant applications that can make use of the partially processed information produced en route. We will describe our experience with two projects, with a focus on one which utilized multiple lexical resources to provide the basis for two natural language processing (NLP) tools: a segmenter and a glosser for Thai. Finally, we make recommendations for future resource development, with a view toward mitigating the difficulties of merging information from diverse sources.

Author information

Authors and Affiliations

The MITRE Corporation, 1820 Dolley Madison Boulevard, McLean, VA, 22102-3481
Keith J. Miller & David M. Zajic

Editor information

Editors and Affiliations

Computing Research Lab, New Mexico State University, Box 30001 / 3CRL, Las Cruces, NM, 88003, USA
David Farwell
SYSTRAN Inc., 7855 Fay Avenue, Suite 300, P.O. Box 907, La Jolla, CA, 92037, USA
Laurie Gerber
Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292-6695, USA
Eduard Hovy

About this paper

Cite this paper

Miller, K.J., Zajic, D.M. (1998). Lexicons as Gold: Mining, Embellishment, and Reuse. In: Farwell, D., Gerber, L., Hovy, E. (eds) Machine Translation and the Information Soup. AMTA 1998. Lecture Notes in Computer Science, vol 1529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49478-2_43

DOI: https://doi.org/10.1007/3-540-49478-2_43
Published: 24 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65259-5
Online ISBN: 978-3-540-49478-2
eBook Packages: Springer Book Archive
