A Language Modeling Approach for Acronym Expansion Disambiguation

  • Akram Gaballah Ahmed (email author)
  • Mohamed Farouk Abdel Hady
  • Emad Nabil
  • Amr Badr
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)

Abstract

Nonstandard words such as proper nouns, abbreviations, and acronyms are a major obstacle in natural language text processing and information retrieval. Acronyms, in particular, are difficult to read and process because they are often domain-specific and highly polysemous. In this paper, we propose a language modeling approach for the automatic disambiguation of acronym senses using context information. First, a dictionary of all possible expansions of acronyms is generated automatically. The dictionary is used to look up all candidate expansions, or senses, of a given acronym. The extracted dictionary consists of about 17 thousand acronym-expansion pairs defining 1,829 expansions from different fields, with an average of 9.47 expansions per acronym. Training data is collected automatically from documents downloaded via the results of search engine queries. The collected data is used to build a unigram language model of the context of each candidate expansion. In the in-context expansion prediction phase, the relevance of each candidate expansion is calculated from the similarity between the context of the specific acronym occurrence and the language model of that candidate. Unlike other work in the literature, our approach can decline to expand an acronym when it is not confident in the disambiguation. We evaluated the performance of our language modeling approach and compared it with a tf-idf discriminative approach.
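
The prediction phase described in the abstract can be illustrated with a minimal sketch. The abstract gives no smoothing method, similarity measure, or rejection rule, so everything below is an assumption made for illustration: additive (add-alpha) smoothing, log-likelihood of the context as the similarity score, and a rejection rule based on the score gap between the two best candidates. The function names, the `margin` parameter, and the toy training sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_unigram(docs):
    """Count word frequencies over the training documents of one expansion."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_likelihood(context, counts, vocab_size, alpha=1.0):
    """Log-probability of the context under an add-alpha smoothed unigram model.

    Additive smoothing is an assumption; the abstract does not name a method.
    """
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + alpha) / (total + alpha * vocab_size))
        for word in context.lower().split()
    )


def disambiguate(context, models, margin=1.0):
    """Return the best-scoring expansion, or None to decline.

    Declining when the two best scores differ by less than `margin` is a
    hypothetical confidence rule, not the one used by the authors.
    """
    if not models:
        return None
    # Shared vocabulary: every training word plus the words of the context.
    vocab = set(context.lower().split())
    for counts in models.values():
        vocab.update(counts)
    scored = sorted(
        ((log_likelihood(context, counts, len(vocab)), expansion)
         for expansion, counts in models.items()),
        reverse=True,
    )
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None
    return scored[0][1]


# Toy training data for two senses of "ACL" (invented for this example).
models = {
    "Association for Computational Linguistics": train_unigram([
        "papers on parsing and machine translation at the conference",
        "computational linguistics conference papers",
    ]),
    "anterior cruciate ligament": train_unigram([
        "knee injury surgery ligament tear",
        "knee ligament surgery recovery",
    ]),
}

print(disambiguate("knee surgery after ligament tear", models))
print(disambiguate("sunny weather today", models))
```

With this toy data, the first context shares several words with the medical sense and is expanded accordingly, while the second context matches neither model well enough to clear the margin, so the sketch declines to expand it.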

Keywords

word sense disambiguation, information extraction, language modeling

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Akram Gaballah Ahmed (2), email author
  • Mohamed Farouk Abdel Hady (1)
  • Emad Nabil (2)
  • Amr Badr (2)

  1. Microsoft, Redmond, USA
  2. Faculty of Computers and Information, Cairo University, Cairo, Egypt
