A Language Modeling Approach for Acronym Expansion Disambiguation
Nonstandard words such as proper nouns, abbreviations, and acronyms are a major obstacle in natural language text processing and information retrieval. Acronyms, in particular, are difficult to read and process because they are often domain-specific with a high degree of polysemy. In this paper, we propose a language modeling approach for the automatic disambiguation of acronym senses using context information. First, a dictionary of all possible expansions of acronyms is generated automatically. The dictionary is used to look up all candidate expansions, or senses, of a given acronym. The extracted dictionary consists of about 17 thousand acronym-expansion pairs defining 1,829 acronyms from different fields, with an average of 9.47 expansions per acronym. Training data is collected automatically from documents downloaded from the results of search engine queries. The collected data is used to build a unigram language model that models the context of each candidate expansion. In the in-context expansion prediction phase, the relevance of each candidate expansion is calculated from the similarity between the context of the specific acronym occurrence and the language model of that candidate. Unlike other work in the literature, our approach can decline to expand an acronym when it is not confident in the disambiguation. We have evaluated the performance of our language modeling approach and compared it with a tf-idf discriminative approach.
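The pipeline described above (one unigram model per candidate expansion, context scoring, and an option to reject) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper. The add-alpha smoothing, the use of log-likelihood as the similarity measure, the margin-based rejection rule, the `min_margin` value, and all function names and toy training texts are assumptions made for illustration only.

```python
import math
from collections import Counter


def train_unigram(docs):
    """Count word frequencies over the training documents of one expansion."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_likelihood(context, counts, vocab_size, alpha=1.0):
    """Log-probability of the context words under an add-alpha smoothed
    unigram model (the smoothing scheme is an assumption)."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in context.lower().split()
    )


def disambiguate(context, models, min_margin=1.0):
    """Return the best-scoring expansion, or None when the decision is
    not confident (hypothetical rule: top two scores closer than min_margin)."""
    if not models:
        return None
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    scored = sorted(
        ((log_likelihood(context, c, vocab_size), exp) for exp, c in models.items()),
        reverse=True,
    )
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None  # reject: leave the acronym unexpanded
    return scored[0][1]


# Toy training texts for two senses of "ATM" (invented for this example).
models = {
    "Asynchronous Transfer Mode": train_unigram(
        ["network cell switching bandwidth protocol", "network packets switching"]
    ),
    "Automated Teller Machine": train_unigram(
        ["bank cash withdraw card", "bank account cash machine"]
    ),
}

confident = disambiguate("withdraw cash from the bank", models)
rejected = disambiguate("the weather today", models)
```

With these toy models, `confident` is `"Automated Teller Machine"`, while `rejected` is `None` because none of the context words favor either sense. A margin on log-likelihoods is only one way to realize the rejection option; a threshold on a normalized posterior would serve the same purpose.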
Keywords: word sense disambiguation, information extraction, language modeling