
Translation Disambiguation in Mixed Language Queries

Machine Translation

Abstract

Code-switching is very common among bilingual speakers, and spoken queries from these speakers are typically in mixed language. In this paper, we propose an unsupervised method for understanding mixed-language queries that uses only a monolingual corpus and a bilingual dictionary: secondary-language words embedded in a primary-language query are translated into words of the primary language. We found that using a single disambiguation feature for translation is more effective than using multiple features, provided this feature is based on the most salient seed-word, chosen automatically by confidence scoring. We propose and compare four types of disambiguation features based on context seed-words. A baseline method uses the nearest neighboring seed-word as the disambiguation feature. Multiple-context seed-word voting is also proposed to enlarge the context window. By contrast, merely weighting context words by their inverse distance degrades performance, as it runs counter to the potential underlying syntactic relations between words. Our final proposal uses multiple-context seed-words and the translation candidates of all mixed-language words to select a single most salient seed-word for translation disambiguation. With this feature, translation disambiguation accuracy is 83.7% for all words in the ATIS spontaneous speech query database and 66.7% for content words.
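The following is a minimal sketch, not the authors' implementation, that contrasts three of the strategies named in the abstract (nearest seed-word, multiple-context voting, single most salient seed-word) on one toy query. Everything in it is a labeled assumption for illustration: the placeholder token `SEC_A` stands for a secondary-language word, its dictionary entry and the co-occurrence counts are invented, raw counts stand in for whatever association measure a real system would compute from the monolingual corpus, and the gap between the top two candidate scores is a hypothetical substitute for the paper's confidence scoring. It also handles only one mixed-language word, whereas the abstract's final method draws on the translation candidates of all mixed-language words.

```python
from collections import Counter

# Hypothetical toy data: co-occurrence counts that would normally be
# collected from a primary-language (here English) monolingual corpus.
COOC = {
    ("flights", "leave"): 40, ("flies", "leave"): 45,
    ("flights", "boston"): 50, ("flies", "boston"): 2,
    ("flights", "morning"): 20, ("flies", "morning"): 18,
}
# Hypothetical bilingual dictionary entry for one secondary-language word.
DICTIONARY = {"SEC_A": ["flights", "flies"]}
# Assumed stopword list; these are not used as context seed-words.
STOPWORDS = {"which", "in", "the"}


def score(candidate, seed):
    """Association between a candidate translation and a seed-word (raw count)."""
    return COOC.get((candidate, seed), 0)


def seed_positions(tokens):
    """Primary-language content words (context seed-words) with their positions."""
    return [(i, t) for i, t in enumerate(tokens)
            if t not in DICTIONARY and t not in STOPWORDS]


def best_for_seed(candidates, seed):
    """Candidate translation most strongly associated with one seed-word."""
    return max(candidates, key=lambda c: score(c, seed))


def nearest_seed(tokens, pos):
    """Baseline: disambiguate using only the nearest neighboring seed-word."""
    candidates = DICTIONARY[tokens[pos]]
    _, seed = min(seed_positions(tokens), key=lambda p: abs(p[0] - pos))
    return best_for_seed(candidates, seed)


def seed_voting(tokens, pos):
    """Multiple-context voting: each seed-word votes for its best candidate."""
    candidates = DICTIONARY[tokens[pos]]
    votes = Counter(best_for_seed(candidates, s) for _, s in seed_positions(tokens))
    return votes.most_common(1)[0][0]


def margin(candidates, seed):
    """Assumed confidence score: gap between the top two candidate scores."""
    top = sorted((score(c, seed) for c in candidates), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0)


def salient_seed(tokens, pos):
    """Select the single most confident seed-word, then translate with it alone."""
    candidates = DICTIONARY[tokens[pos]]
    seed = max((s for _, s in seed_positions(tokens)),
               key=lambda s: margin(candidates, s))
    return seed, best_for_seed(candidates, seed)


query = ["which", "SEC_A", "leave", "boston", "in", "the", "morning"]
print(nearest_seed(query, 1))   # flies
print(seed_voting(query, 1))    # flights
print(salient_seed(query, 1))   # ('boston', 'flights')
```

On these invented numbers, the nearest seed-word (`leave`) weakly favors the wrong candidate, while voting and the single most confident seed-word (`boston`) both select `flights`. This illustrates why the closest context word is not necessarily the most informative one; it is not evidence about the paper's reported results.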

Author information

Correspondence to Pascale Fung.

About this article

Cite this article

Cheung, P., Fung, P. Translation Disambiguation in Mixed Language Queries. Mach Translat 18, 251–273 (2004). https://doi.org/10.1007/s10590-004-7692-5
