Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study

  • Julie Mennes
  • Ted Pedersen
  • Els Lefever
Project Notes

Abstract

Cross-disciplinary communication is often impeded by terminological ambiguity. Hence, cross-disciplinary teams would greatly benefit from using a language technology-based tool that allows for the (at least semi-) automated resolution of ambiguous terms. Although no such tool is readily available, an interesting theoretical outline of one does exist. The main obstacle to the concrete realization of this tool is the current lack of an effective method for the automatic detection of the different meanings of ambiguous terms across different disciplinary jargons. In this paper, we set up a pilot study to experimentally assess whether the word sense induction technique of ‘context clustering’, as implemented in the software package ‘SenseClusters’, might be a solution. More specifically, given several sets of sentences coming from a cross-disciplinary corpus containing a specific ambiguous term, we verify whether this technique can classify each sentence in accordance with the meaning of the ambiguous term in that sentence. For the experiments, we first compile a corpus that represents the disciplinary jargons involved in a project on Bone Tissue Engineering. Next, we conduct two series of experiments. The first series focuses on determining appropriate SenseClusters parameter settings using manually selected test data for the ambiguous target terms ‘matrix’ and ‘model’. The second series evaluates the actual performance of SenseClusters using randomly selected test data for an extended set of target terms. We observe that SenseClusters can successfully classify sentences from a cross-disciplinary corpus according to the meaning of the ambiguous term they contain. Hence, we argue that this implementation of context clustering shows potential as a method for the automatic detection of the meanings of ambiguous terms in cross-disciplinary communication.
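
To make the idea of context clustering concrete, the following is a minimal sketch and not the SenseClusters implementation. It assumes a plain bag-of-words representation of each sentence, cosine similarity, and a simple average-link agglomerative clustering, all chosen here for illustration only. The four example sentences, the stopword list, and the function names (context_vector, cosine, cluster_contexts) are hypothetical and do not come from the study's corpus or software.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "to", "we", "for", "into", "and"}

def context_vector(sentence, target):
    """Bag-of-words vector of the words surrounding the target term."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(t for t in tokens if t != target and t not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_contexts(sentences, target, k):
    """Average-link agglomerative clustering of sentence contexts into k groups."""
    vectors = [context_vector(s, target) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the two most similar clusters until k remain.
    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        x, y = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Turn the cluster membership lists into one label per sentence.
    labels = [0] * len(sentences)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Invented toy sentences for the ambiguous term 'matrix' (not from the paper's corpus).
sentences = [
    "The extracellular matrix supports cell adhesion in bone tissue.",
    "Cells secrete collagen into the extracellular matrix of the tissue.",
    "The stiffness matrix is inverted to solve the linear system.",
    "We compute the eigenvalues of the matrix for the linear system.",
]
labels = cluster_contexts(sentences, "matrix", k=2)
print(labels)
```

In this toy setting the two biological sentences end up in one cluster and the two mathematical sentences in the other, because each pair shares context words that the other pair lacks. Which context representation and clustering settings work on real cross-disciplinary data is exactly what the first series of experiments sets out to determine.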

Keywords

Cross-disciplinary communication; Disambiguation; Word sense induction; SenseClusters; Terminological ambiguity

Notes

Acknowledgements

The work presented in this paper was carried out in the context of a PhD fellowship funded by the Research Foundation—Flanders (FWO). We thank Prof. Dr. Liesbet Geris for sharing her cross-disciplinary experiences as the Scientific Coordinator of Prometheus and providing us with the necessary information for the corpus compilation. We also want to thank Prof. Dr. Stephan van der Waart van Gulik for his constructive feedback which helped to improve the paper significantly.

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Language and Translation Technology Team, Ghent University, Ghent, Belgium
  2. Department of Computer Science, University of Minnesota, Duluth, MN, USA
