
A distant supervision method based on paradigmatic relations for learning word embeddings

  • Hybrid Artificial Intelligence and Machine Learning Technologies
  • Published in: Neural Computing and Applications

Abstract

Word embeddings learned from external resources have succeeded in improving many NLP tasks. However, existing embedding models still face challenges in situations where fine-grained semantic information is required, e.g., distinguishing antonyms from synonyms. In this paper, a distant supervision method is proposed that guides the training process by introducing semantic knowledge from a thesaurus. Specifically, the proposed approach shortens the distance between a target word and its synonyms by controlling their movements either unidirectionally or bidirectionally, yielding three models: the Unidirectional Movement of Target model (UMT), the Unidirectional Movement of Synonyms model (UMS) and the Bidirectional Movement of Target and Synonyms model (BMTS). Extensive computational experiments have been conducted and their results analyzed. The results show that the proposed models not only efficiently capture the semantic information of antonyms but also achieve significant improvements on both intrinsic and extrinsic evaluation tasks. To validate the performance of the proposed models (UMT, UMS and BMTS), results are compared against well-known models, namely Skip-gram, JointRCM, WE-TD and dict2vec, on four benchmark tasks: word analogy (intrinsic), synonym-antonym detection (intrinsic), sentence matching (extrinsic) and text classification (extrinsic). A case study illustrates how the proposed models work in practice. Overall, the proposed distant supervision method based on paradigmatic relations for learning word embeddings outperforms the existing models it is compared against.
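The snippet below is a minimal, illustrative sketch (not the authors' implementation) of the synonym-attraction idea described in the abstract: given word vectors and a thesaurus that maps a target word to its synonyms, each update step shortens the target-synonym distance by moving only the target (cf. UMT), only the synonyms (cf. UMS), or both (cf. BMTS). The toy vocabulary, the 50-dimensional random vectors, the learning rate and the synonym_step helper are assumptions made purely for illustration; in the paper this supervision guides the embedding training process itself, whereas the stand-alone snippet only demonstrates the distance-shortening behaviour.

    # Illustrative sketch only; vocabulary, dimensions and update rule are assumed.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["good", "great", "fine", "bad"]           # toy vocabulary (assumed)
    emb = {w: rng.normal(size=50) for w in vocab}      # toy 50-d embeddings (assumed)
    thesaurus = {"good": ["great", "fine"]}            # toy synonym lexicon (assumed)

    def synonym_step(emb, thesaurus, lr=0.05, mode="both"):
        """One gradient-style step that shortens target-synonym distances.

        mode="target"   : move only the target word   (cf. UMT, unidirectional)
        mode="synonyms" : move only the synonyms      (cf. UMS, unidirectional)
        mode="both"     : move target and synonyms    (cf. BMTS, bidirectional)
        """
        for target, syns in thesaurus.items():
            for s in syns:
                # Gradient of 0.5 * ||v_target - v_syn||^2 with respect to v_target.
                diff = emb[target] - emb[s]
                if mode in ("target", "both"):
                    emb[target] = emb[target] - lr * diff   # pull target toward synonym
                if mode in ("synonyms", "both"):
                    emb[s] = emb[s] + lr * diff             # pull synonym toward target

    before = float(np.linalg.norm(emb["good"] - emb["great"]))
    for _ in range(100):
        synonym_step(emb, thesaurus, mode="both")
    after = float(np.linalg.norm(emb["good"] - emb["great"]))
    print(f"distance(good, great): {before:.3f} -> {after:.3f}")

Running the sketch prints a sharply reduced distance between "good" and "great" while "bad" is left untouched, which is the qualitative effect the three proposed models aim for during training.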


Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. It is planned to release all the datasets and code used in this study after the paper is published.

  3. https://github.com/wabyking/TextClassificationBenchmark [31].

References

  1. Adel H, Schütze H (2014) Using mined coreference chains as a resource for a semantic task. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1447–1452

  2. Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 17th international conference on computational linguistics, vol 1. Association for Computational Linguistics, pp 86–90

  3. Bian J, Gao B, Liu T-Y (2014) Knowledge-powered deep learning for word embedding. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 132–148

  4. Chen Z, Lin W, Chen Q, Chen X, Wei S, Jiang H, Zhu X (2015) Revisiting word embedding for contrasting meaning. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: Long papers), vol 1, pp 106–115

  5. Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning. ACM, pp 160–167

  6. Culler JD (1986) Ferdinand de Saussure. Cornell University Press, Ithaca

  7. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  8. Huang EH, Socher R, Manning CD, Ng AY (2012) Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th annual meeting of the association for computational linguistics: long papers, volume 1. Association for Computational Linguistics, pp 873–882

  9. Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy E, Smith NA (2014) Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166

  10. Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Mikolov T et al (2013) DeViSE: a deep visual-semantic embedding model. In: Advances in neural information processing systems, pp 2121–2129

  11. Ganitkevitch J, Van Durme B, Callison-Burch C (2013) PPDB: the paraphrase database. In: Proceedings of the 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 758–764

  12. Harris ZS (1954) Distributional structure. Word 10(2–3):146–162

  13. Hinton GE (1986) Learning distributed representations of concepts. In: Proceedings of the eighth annual conference of the cognitive science society, vol 1. Amherst, MA, pp 12

  14. Key LEB, Noble BP (2017) Course in general linguistics. Macat Library

  15. Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882

  16. Kusner M, Sun Y, Kolkin N, Weinberger K (2015) From word embeddings to document distances. In: International conference on machine learning, pp 957–966

  17. Lazaridou A, Baroni M et al (2015) A multitask objective to inject lexical contrast into distributional semantics. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 2: short papers), vol 2, pp 21–26

  18. Li Q, Uprety S, Wang B, Song D (2018) Quantum-inspired complex word embedding. arXiv preprint arXiv:1805.11351

  19. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  20. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  21. Mikolov T, Yih W-T, Zweig G (2013) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 746–751

  22. Nguyen KA, Walde SS, Vu NT (2016) Integrating distributional lexical contrast into word embeddings for antonym-synonym distinction. arXiv preprint arXiv:1605.07766

  23. Nguyen KA, Walde SS, Vu NT (2017) Distinguishing antonyms and synonyms in a pattern-based neural network. arXiv preprint arXiv:1701.02962

  24. Nickel M, Kiela D (2017) Poincaré embeddings for learning hierarchical representations. In: Advances in neural information processing systems, pp 6338–6347

  25. Ono M, Miwa M, Sasaki Y (2015) Word embedding-based antonym detection using thesauri and distributional information. In: Proceedings of the 2015 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 984–989

  26. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543

  27. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365

  28. Sala F, De Sa C, Gu A, Ré C (2018) Representation tradeoffs for hyperbolic embeddings. In: International conference on machine learning, pp 4457–4466

  29. Tissier J, Gravier C, Habrard A (2017) Dict2vec: learning word embeddings using lexical dictionaries. In: Conference on empirical methods in natural language processing (EMNLP 2017), pp 254–263

  30. Vilnis L, McCallum A (2014) Word representations via Gaussian embedding. arXiv preprint arXiv:1412.6623

  31. Wang B, Wang L, Wei Q (2018) TextZoo, a new benchmark for reconsidering text classification. arXiv preprint arXiv:1802.03656

  32. Yu M, Dredze M (2014) Improving lexical embeddings with semantic knowledge. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (volume 2: short papers), vol 2, pp 545–550

Acknowledgements

This work is supported by the Quantum Access and Retrieval Theory (QUARTZ) project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 721321.

Author information

Corresponding author

Correspondence to Benyou Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Li, J., Hu, R., Liu, X. et al. A distant supervision method based on paradigmatic relations for learning word embeddings. Neural Comput & Applic 32, 7759–7768 (2020). https://doi.org/10.1007/s00521-019-04071-6
