Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study

Abstract

Cross-disciplinary communication is often impeded by terminological ambiguity. Hence, cross-disciplinary teams would greatly benefit from a language technology-based tool that allows for the (at least semi-) automated resolution of ambiguous terms. Although no such tool is readily available, an interesting theoretical outline of one does exist. The main obstacle to the concrete realization of this tool is the current lack of an effective method for automatically detecting the different meanings of ambiguous terms across disciplinary jargons. In this paper, we set up a pilot study to experimentally assess whether the word sense induction technique of ‘context clustering’, as implemented in the software package ‘SenseClusters’, might be a solution. More specifically, given several sets of sentences drawn from a cross-disciplinary corpus, each set containing a specific ambiguous term, we verify whether this technique can classify each sentence in accordance with the meaning of the ambiguous term in that sentence. For the experiments, we first compile a corpus that represents the disciplinary jargons involved in a project on Bone Tissue Engineering. Next, we conduct two series of experiments. The first series focuses on determining appropriate SenseClusters parameter settings using manually selected test data for the ambiguous target terms ‘matrix’ and ‘model’. The second series evaluates the actual performance of SenseClusters using randomly selected test data for an extended set of target terms. We observe that SenseClusters can successfully classify sentences from a cross-disciplinary corpus according to the meaning of the ambiguous term they contain. Hence, we argue that this implementation of context clustering shows potential as a method for the automatic detection of the meanings of ambiguous terms in cross-disciplinary communication.
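
As a rough illustration of the context clustering idea described in the abstract, the following Python sketch groups sentences containing the target term ‘matrix’ by the words that surround it. It is not SenseClusters and makes no claim to reproduce its feature selection or clustering algorithms: the four sentences are invented toy data, the function names are hypothetical, and plain average-link agglomerative clustering over cosine similarity is used here only as a simple stand-in.

```python
import math
from collections import Counter


def context_vector(sentence, target, window=6):
    """Count the words within `window` positions of each target occurrence."""
    tokens = sentence.lower().split()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_contexts(sentences, target, k=2, window=6):
    """Merge the most similar clusters (average link) until k remain."""
    vectors = [context_vector(s, target, window) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(sentences)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Invented toy sentences: two mathematical and two biological uses of 'matrix'.
sentences = [
    "the stiffness matrix is inverted to solve the linear system",
    "we compute the eigenvalues of the matrix to solve the system",
    "cells deposit collagen in the extracellular matrix of the tissue",
    "the extracellular matrix supports cells in bone tissue",
]
labels = cluster_contexts(sentences, "matrix", k=2)
print(labels)  # [0, 0, 1, 1]
```

In this toy setting, function words such as ‘the’ contribute to the similarity scores; a realistic setup would presumably filter them and work with far more sentences per target term.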

Notes

  1. Note that, in this paper, we remain agnostic with respect to the relation between ambiguity and related phenomena like polysemy, fuzziness, vagueness and generality.

  2. There are many general concerns about the use of sense inventory based approaches. For example, there is the difficulty of demarcating the semantic information that should be included in a sense description, and that of distinguishing between closely related senses (Edmonds and Kilgarriff 2002). Relying on sense inventories is particularly problematic in the context of CD projects. Such projects require the compilation of a custom inventory by selecting sense descriptions from existing ‘disciplinary’ sense inventories. Yet, it is unclear how one can ensure that relevant sense descriptions are selected from relevant sense inventories. Moreover, new sense descriptions would need to be developed for terms that are not included in existing inventories.

  3. In this paper, we use the notions ‘word’ and ‘term’ interchangeably, though we use the latter especially when we want to stress that a lexical unit has a meaning.

  4. For more information, go to http://senseclusters.sourceforge.net.

  5. The more specialized a corpus is, the fewer broad definitions it contains. This means that references to more general or high-level components of the meanings of terms will be scarce, and thus are less likely to be picked up by a context clustering technique.

  6. Because it spans different disciplines, the corpus is highly variegated: one meaning (e.g. ‘having the capacity to cause rotation’) will often be referred to by different terms (e.g. ‘couple’ in kinematics and ‘force’ in kinetics). This poses a challenge for context clustering, as the corpus exhibits not only term ambiguity but also (latent) synonymy.

  7. For more information, go to https://www.mtm.kuleuven.be/prometheus.

  8. The sub-corpora do not perfectly mirror the original texts as the accuracy of the recognition results was only sanity-checked.

  9. The underlying reason is that SenseClusters is based on the distributional hypothesis as mentioned earlier in Section 2. See also Subsection 3.4.

  10. We define ‘best result’ as the highest accuracy.

  11. Because the combination of feature type ‘co-occurrences’ with a window size of ‘6’ yielded better results than the combination of feature type ‘co-occurrences’ with a window size of ‘3’, we omitted the latter combination from the fourth round of experiments.

  12. https://www.wordclouds.com
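
Note 10 defines the ‘best result’ as the highest accuracy. Since induced clusters carry no sense names, computing an accuracy requires some mapping from clusters to gold-standard senses. The Python sketch below shows one common way this could be done, by trying every one-to-one assignment of clusters to senses and keeping the best score. This is an assumption made for illustration, not a confirmed description of the evaluation used in the paper; the function name and the toy labels are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(cluster_labels, gold_senses):
    """Accuracy under the best one-to-one mapping of clusters to senses."""
    clusters = sorted(set(cluster_labels))
    senses = sorted(set(gold_senses))
    # Pad with None so that surplus clusters can be left unmapped.
    slots = senses + [None] * max(0, len(clusters) - len(senses))
    best = 0
    for assignment in permutations(slots, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(
            mapping[c] == g for c, g in zip(cluster_labels, gold_senses)
        )
        best = max(best, correct)
    return best / len(gold_senses)


# Toy example: five sentences, two induced clusters, two gold senses.
accuracy = clustering_accuracy(
    [0, 0, 1, 1, 1],
    ["math", "math", "bio", "bio", "math"],
)
print(accuracy)  # 0.8
```

Exhaustive search over assignments is only practical for a handful of clusters, which is the situation for a small number of senses per target term.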

References

  • Agirre, E., & Edmonds, P. (2006). Word sense disambiguation: Algorithms and applications. Berlin: Springer.

  • Ankeny, R. A., & Leonelli, S. (2011). What’s so special about model organisms? Studies in History and Philosophy of Science Part A, 42(2), 313–323.

  • Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the association for computational linguistics (pp. 238–247). Baltimore, Maryland, USA: ACL.

  • Benda, L., Poff, L., Tague, C., Palmer, M., Pizzuto, J., Cooper, S., et al. (2002). How to avoid train wrecks when using science in environmental problem solving. BioScience, 52(12), 1127–1139.

  • Biemann, C. (2006). Chinese whispers: An efficient graph clustering algorithm and its application to natural language processing problems. In Proceedings of the first workshop on graph based methods for natural language processing, New York City (pp. 73–80).

  • Bracken, L. J., & Oughton, E. A. (2006). ‘What do you mean?’ The importance of language in developing interdisciplinary research. Transactions of the Institute of British Geographers, 31(3), 371–382.

  • Church, K., & Hanks, P. (1989). Word association norms, mutual information, and lexicography. In Proceedings of the 27th annual meeting of the association for computational linguistics, Vancouver, British Columbia (pp. 76–83).

  • de Boer, Y., de Gier, A., Verschuur, M., & de Wit, B. (2006). Bruggen bouwen. Onderzoekers over hun ervaringen met interdisciplinair onderzoek in Nederland. RMNO, KNAW, NWO & COS. Retrieved from https://www.knaw.nl/shared/resources/actueel/publicaties/pdf/Bruggen_Bouwen_Onderzoekers_over_interdisciplinair_onderzoek_2006.pdf/view.

  • Deerwester, S., Dumais, S., Landauer, T., Furnas, G., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.

  • Edmonds, P., & Kilgarriff, A. (2002). Introduction to the special issue on evaluating word sense disambiguation systems. Natural Language Engineering, 8(4), 279–291.

  • Escudero, G., Màrquez, L., & Rigau, G. (2000). Boosting applied to word sense disambiguation. In R. López de Mántaras & E. Plaza (Eds.), Machine learning: ECML 2000 (pp. 129–141). Berlin: Springer.

  • Francl, M. (2015). Chemical doublespeak. Nature Chemistry, 7(7), 533.

  • Hall, T. E., & O’Rourke, M. (2014). Responding to communication challenges in transdisciplinary sustainability science. In K. Huutoniemi & P. Tapio (Eds.), Transdisciplinary sustainability studies (pp. 135–155). Routledge.

  • Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.

  • Harvey, R., & Lund, V. (2007). Biofilms and chronic rhinosinusitis: systematic review of evidence, current concepts and directions for research. Rhinology, 45(1), 3–13.

  • Heemskerk, M. (2003). Conceptual models as tools for communication across disciplines. Conservation Ecology, 7(3), ??.

  • Iacobacci, I., Pilehvar, M., & Navigli, R. (2016). Embeddings for word sense disambiguation: An evaluation study. In Proceedings of the 54th annual meeting of the association for computational linguistics (pp. 897–907). Berlin, Germany: ACL.

  • Karypis, G. (2002). CLUTO: A clustering toolkit. Technical report, Department of Computer Science, University of Minnesota, Minneapolis.

  • Klein, J. T. (1996). Crossing boundaries: Knowledge, disciplinarities, and interdisciplinarities. Charlottesville: University of Virginia Press.

  • Lefever, E., Hoste, V., & De Cock, M. (2011). ParaSense or how to use parallel corpora for word sense disambiguation. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies (pp. 317–322). Portland, Oregon, USA: Association for Computational Linguistics.

  • Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd annual meeting of the association for computational linguistics (pp. 302–308). Baltimore, Maryland, USA: ACL.

  • Lutter, C. (2015). Comparative approaches to visions of community. History and Anthropology, 26(1), 129–143.

  • Macken, L., Lefever, E., & Hoste, V. (2013). TExSIS: Bilingual terminology extraction from parallel corpora using chunk-based alignment. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication, 19(1), 1–30.

  • Mennes, J. (2018). SenseDisclosure. A new procedure for dealing with problematically ambiguous terms in cross-disciplinary communication. Language Sciences, 69, 57–67.

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. In Proceedings of the international conference on learning representations (ICLR).

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).

  • Naiman, R. (1999). A perspective on interdisciplinary science. Ecosystems, 2(4), 292–295.

  • Nijhout, H., Reed, M., & Ulrich, C. (2008). Mathematical models of folate-mediated one-carbon metabolism. Vitamins & Hormones, 79, 45–82.

  • O’Rourke, M., & Crowley, S. J. (2013). Philosophical intervention and cross-disciplinary science: The story of the toolbox project. Synthese, 190, 1–18.

  • Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161–199.

  • Pedersen, T. (2006). Unsupervised corpus-based methods for WSD. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation: Algorithms and applications (pp. 133–166). Berlin: Springer.

  • Pedersen, T. (2013). Duluth: Word sense induction applied to web page clustering. In Second joint conference on lexical and computational semantics (*SEM), Volume 2: Proceedings of the seventh international workshop on semantic evaluation (SemEval 2013) (pp. 202–206).

  • Pedersen, T. (2015). Duluth: Word sense discrimination in the service of lexicography. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 438–442).

  • Pedersen, T., Purandare, A., & Kulkarni, A. (2005). Name discrimination by clustering similar contexts. In Proceedings of the sixth international conference on intelligent text processing and computational linguistics, Mexico City (pp. 220–231).

  • Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Empirical methods in natural language processing (EMNLP) (pp. 1532–1543).

  • Purandare, A., & Pedersen, T. (2004). Word sense discrimination by clustering contexts in vector and similarity spaces. In Proceedings of the conference on computational natural language learning, Boston, MA (pp. 41–48).

  • Salton, G. (1971). The SMART retrieval system: Experiments in automatic document processing. Upper Saddle River, NJ: Prentice-Hall.

  • Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.

  • Serre, D. (2010). Matrices: Theory and applications. Graduate texts in mathematics. (2nd ed.). Springer-Verlag New York.

  • Spârck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 11–21.

  • Thompson, J. (2009). Building collective communication competence in interdisciplinary research teams. Journal of Applied Communication Research, 37(3), 278–297.

  • Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.

  • Van de Kauter, M., Coorman, G., Lefever, E., Desmet, B., Macken, L., & Hoste, V. (2013). LeTs Preprocess: The multilingual LT3 linguistic preprocessing toolkit. Computational Linguistics in the Netherlands Journal, 3, 103–120.

  • Van de Cruys, T., & Apidianaki, M. (2011). Latent semantic word sense induction and disambiguation. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies (pp. 1476–1485). Portland, Oregon, USA: Association for Computational Linguistics.

  • Vick, D. W. (2004). Interdisciplinarity and the discipline of law. Journal of Law and Society, 31(2), 163–193.

  • Yu, L. C., Wang, J., Lai, K., & Zhang, X. (2017). Refining word embeddings for sentiment analysis. In Empirical methods in natural language processing (EMNLP) (pp. 545–550).

Acknowledgements

The work presented in this paper was carried out in the context of a PhD fellowship funded by the Research Foundation – Flanders (FWO). We thank Prof. Dr. Liesbet Geris for sharing her cross-disciplinary experiences as the Scientific Coordinator of Prometheus and for providing us with the information needed for the corpus compilation. We also thank Prof. Dr. Stephan van der Waart van Gulik for his constructive feedback, which helped to improve the paper significantly.

Author information

Corresponding author

Correspondence to Julie Mennes.

About this article

Cite this article

Mennes, J., Pedersen, T. & Lefever, E. Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study. Lang Resources & Evaluation 53, 889–917 (2019). https://doi.org/10.1007/s10579-019-09455-7

  • DOI: https://doi.org/10.1007/s10579-019-09455-7

Keywords

  • Cross-disciplinary communication
  • Disambiguation
  • Word sense induction
  • SenseClusters
  • Terminological ambiguity