ICE-TEA: In-Context Expansion and Translation of English Abbreviations

Ammar, Waleed; Darwish, Kareem; El Kahki, Ali; Hafez, Khaled

doi:10.1007/978-3-642-19437-5_4

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

The wide use of abbreviations in modern texts poses interesting challenges and opportunities for natural language processing (NLP). Abbreviations are dynamic in nature, and they are also far more polysemous than regular words. Technologies that exhibit some level of language understanding may therefore be adversely affected by the presence of abbreviations. This paper addresses two related problems: (1) expansion of abbreviations given a context, and (2) translation of sentences that contain abbreviations. First, an efficient retrieval-based method for English abbreviation expansion is presented. Then, a hybrid system is used to pick among simple abbreviation-translation methods. On a test set of sentences that contain abbreviations, the hybrid system improves on the baseline MT system by 1.48 BLEU points.
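
As a rough illustration of what retrieval-based expansion in context can look like, the sketch below ranks candidate expansions of an abbreviation by how well the words of the input sentence match a short text associated with each candidate. It is not the method of the paper, whose details are not given in the abstract. The toy index, the function names (`tokenize`, `cosine`, `expand`), and the use of cosine similarity over raw word counts are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy index (an assumption for illustration): each candidate
# expansion is paired with a snippet of text in which it typically occurs.
INDEX = {
    "ACL": [
        ("Association for Computational Linguistics",
         "conference papers on parsing translation language processing"),
        ("anterior cruciate ligament",
         "knee injury surgery athletes ligament tear recovery"),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(abbreviation, sentence, index=INDEX):
    """Return candidate expansions, best match to the sentence first."""
    query = Counter(tokenize(sentence))
    scored = [
        (cosine(query, Counter(tokenize(expansion + " " + context))), expansion)
        for expansion, context in index.get(abbreviation, [])
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [expansion for _, expansion in scored]


if __name__ == "__main__":
    print(expand("ACL", "He tore his ACL and needs knee surgery"))
    print(expand("ACL", "Her paper on machine translation was accepted at ACL"))
```

A practical system would replace the toy index with a search engine over a large corpus and a stronger ranking function; the sketch only shows the shape of the lookup-then-rank step.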

Author information

Authors and Affiliations

Cairo Microsoft Innovation Center, Microsoft, 306 Chorniche El-Nile, Maadi, Cairo, Egypt
Waleed Ammar, Kareem Darwish, Ali El Kahki & Khaled Hafez

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Ammar, W., Darwish, K., El Kahki, A., Hafez, K. (2011). ICE-TEA: In-Context Expansion and Translation of English Abbreviations. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_4

DOI: https://doi.org/10.1007/978-3-642-19437-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
