Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis

  • Yaakov HaCohen-Kerner
  • Ariel Kass
  • Ariel Peretz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5039)


Abbreviations are common in both written and spoken language. However, they are not always explicitly defined, and in many cases they are ambiguous. In this research, we present a process that attempts to resolve abbreviation ambiguity. Various features have been explored, including context-related and statistical methods. The application domain is Jewish Law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. Several variants of the one sense per discourse hypothesis, obtained by varying the scope of discourse, have been implemented. Several common machine learning methods have been tested to find a successful integration of these variants. The best results have been achieved by SVM, with 96.09% accuracy.
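
The idea of varying the scope of discourse can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `majority_sense_in_scope`, the use of a sentence index as the position, the `window` parameter, and the placeholder abbreviations and sense labels are all assumptions made for illustration. The sketch assigns an ambiguous abbreviation the most frequent known sense among other occurrences of the same abbreviation that fall within the chosen scope.

```python
from collections import Counter
from typing import Optional

# Hypothetical record: (sentence index, abbreviation, known sense or None).
Occurrence = tuple[int, str, Optional[str]]


def majority_sense_in_scope(
    occurrences: list[Occurrence],
    target_index: int,
    window: Optional[int],
) -> Optional[str]:
    """Guess the sense of the target occurrence from its neighbours.

    Counts the known senses of other occurrences of the same abbreviation
    whose sentence index lies within `window` of the target. A window of
    None means the whole document is the discourse. Returns None when the
    scope contains no labelled evidence.
    """
    target_pos, target_abbr, _ = occurrences[target_index]
    counts: Counter = Counter()
    for i, (pos, abbr, sense) in enumerate(occurrences):
        if i == target_index or abbr != target_abbr or sense is None:
            continue
        if window is None or abs(pos - target_pos) <= window:
            counts[sense] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Placeholder data: "AA" is ambiguous between two made-up senses.
occurrences: list[Occurrence] = [
    (0, "AA", "sense_1"),
    (1, "AA", "sense_1"),
    (2, "BB", "sense_x"),
    (3, "AA", "sense_1"),
    (40, "AA", "sense_2"),
    (41, "AA", None),  # the occurrence to disambiguate
    (42, "AA", "sense_2"),
]

narrow = majority_sense_in_scope(occurrences, 5, window=2)
whole_document = majority_sense_in_scope(occurrences, 5, window=None)
no_evidence = majority_sense_in_scope(occurrences, 5, window=0)
```

With this placeholder data, a narrow scope and a whole-document scope give different answers for the same occurrence, which is why the choice of scope is worth treating as an experimental variable. How the outputs of several such variants are combined by a learned classifier is outside the scope of this sketch.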


Support Vector Machine, Natural Language Processing, Good Variant, Word Sense Disambiguation, Count Rule
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yaakov HaCohen-Kerner (1)
  • Ariel Kass (1)
  • Ariel Peretz (1)
  1. Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel
