Semantic word shifts in a scientific domain

  • Baitong Chen
  • Ying Ding
  • Feicheng Ma

Abstract

Understanding semantic word shifts in scientific domains is essential for facilitating interdisciplinary communication. Using a data set of published papers in the field of information retrieval (IR), this paper studies the semantic shifts of words in IR by mining per-word topic distributions over time. We propose that semantic word shifts occur not only over time but also across topics. The shifts are examined from two perspectives: the topic level and the context level. At the topic level, the word-topic distribution over time is used to distinguish stable words from unstable words, and the diverging and converging trends among unstable words reveal characteristics of the topic evolution process. At the context level, shifts are further detected through similarities between word vectors. Our work associates semantic word shifts with the evolution of topics, which facilitates a better understanding of semantic word shifts in terms of both topics and contexts.

Keywords

Word-topic distribution; Semantic shifts; Semantic analysis

Notes

Acknowledgements

This work is funded by the National Natural Science Foundation of China (Grant Nos. 71420107026 and 71704138). The present study is an extended version of an article presented at the 16th International Conference on Scientometrics and Informetrics, Wuhan (China), 16–20 October 2017 (Chen et al. 2017a).

References

  1. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
  2. Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
  3. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
  4. Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In Proceedings of GSCL (pp. 31–40).
  5. Chen, B., Ding, Y., & Ma, F. (2017a). Mapping the semantic word shifts in topics in the field of information retrieval. In Proceedings of ISSI 2017: The 16th international conference on scientometrics and informetrics (pp. 1335–1341). Wuhan University, China.
  6. Chen, B., Tsutsui, S., Ding, Y., & Ma, F. (2017b). Understanding the topic evolution in a scientific domain: an exploratory study for the field of information retrieval. Journal of Informetrics, 11(4), 1175–1189.
  7. Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on machine learning (pp. 160–167). New York, NY, USA: ACM.
  8. Ding, Y., & Stirling, K. (2016). Data-driven discovery: a new era of exploiting the literature and data. Journal of Data and Information Science, 1(4), 1–9.
  9. Griffiths, T. L., & Steyvers, M. (2003). Prediction and semantic association. In Advances in Neural Information Processing Systems (pp. 11–18). Cambridge, MA, USA: MIT Press.
  10. Gulordava, K., & Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram Corpus. In Proceedings of the GEMS 2011 workshop on geometrical models of natural language semantics (pp. 67–71). Stroudsburg, PA, USA: Association for Computational Linguistics.
  11. Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. arXiv:1605.09096 [Cs].
  12. Harris, Z. S. (1954). Distributional structure. Word, 10, 146–162.
  13. Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 856–864). Cambridge, MA, USA: MIT Press.
  14. Kenter, T., Wevers, M., Huijnen, P., & de Rijke, M. (2015). Ad hoc monitoring of vocabulary shifts over time. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1191–1200). New York, NY, USA: ACM.
  15. Kim, Y., Chiu, Y.-I., Hanaki, K., Hegde, D., & Petrov, S. (2014). Temporal analysis of language through neural language models. arXiv:1405.3515 [Cs].
  16. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
  17. Lehmann, W. P. (1993). Historical linguistics: An introduction (3rd edition). London; New York: Routledge.
  18. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv:1301.3781 [Cs].
  19. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems 26 (pp. 3111–3119). New York: Curran Associates Inc.
  20. Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks (pp. 45–50).
  21. Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016). Visualizing large-scale and high-dimensional data. In Proceedings of the 25th international conference on world wide web (pp. 287–297). Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee.
  22. Wang, S., Schlobach, S., & Klein, M. (2011). Concept drift and how to identify it. Web Semantics: Science, Services and Agents on the World Wide Web, 9(3), 247–265.
  23. Wijaya, D. T., & Yeniterzi, R. (2011). Understanding semantic change of words over centuries. In Proceedings of the 2011 international workshop on detecting and exploiting cultural diversity on the social web (pp. 35–40). New York, NY, USA: ACM.
  24. Xu, J., Ding, Y., & Malic, V. (2015). Author credit for transdisciplinary collaboration. PLoS ONE, 10(9), e0137968.
  25. Yan, E., Ding, Y., Milojević, S., & Sugimoto, C. R. (2012). Topics in dynamic research communities: an exploratory study for the field of information retrieval. Journal of Informetrics, 6(1), 140–153.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2018

Authors and Affiliations

  1. Shanghai University, Shanghai, China
  2. Indiana University, Bloomington, USA
  3. Wuhan University, Wuhan, China
  4. Tianjin Normal University, Tianjin, China