Abstract
Word embeddings learned from external resources have improved many NLP tasks. However, existing embedding models still struggle in situations where fine-grained semantic information is required, e.g., distinguishing antonyms from synonyms. In this paper, a distant supervision method is proposed that guides the training process with semantic knowledge drawn from a thesaurus. Specifically, the proposed approach shortens the distance between a target word and its synonyms by controlling their movements in the embedding space, either unidirectionally or bidirectionally, yielding three models: the Unidirectional Movement of Target model (UMT), the Unidirectional Movement of Synonyms model (UMS) and the Bidirectional Movement of Target and Synonyms model (BMTS). Extensive computational experiments were conducted and their results analyzed. The results show that the proposed models not only capture the semantic information of antonyms efficiently but also achieve significant improvements on both intrinsic and extrinsic evaluation tasks. To validate their performance, the proposed models (UMT, UMS and BMTS) are compared against well-known models, namely Skip-gram, JointRCM, WE-TD and dict2vec, on four benchmark tasks: word analogy (intrinsic), synonym–antonym detection (intrinsic), sentence matching (extrinsic) and text classification (extrinsic). A case study illustrates how the proposed models work. Overall, a distant supervision method based on paradigmatic relations is proposed for learning word embeddings, and it outperforms the existing models in these comparisons.
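The core idea of the three models can be illustrated with a minimal sketch. The following is an assumption-laden toy example, not the paper's actual training objective: it only shows which vectors are allowed to move under each scheme (target only, synonyms only, or both), using a single attraction step toward the other vector with a hypothetical learning rate `lr`.

```python
import numpy as np

def pull(vec_a, vec_b, lr=0.1):
    # Move vec_a a small step toward vec_b (gradient step on squared distance).
    return vec_a + lr * (vec_b - vec_a)

def synonym_step(target, synonym, mode, lr=0.1):
    """One synonym-attraction update under the three movement schemes.

    UMT  : only the target vector moves toward the synonym.
    UMS  : only the synonym vector moves toward the target.
    BMTS : both vectors move toward each other.
    """
    if mode == "UMT":
        return pull(target, synonym, lr), synonym
    if mode == "UMS":
        return target, pull(synonym, target, lr)
    if mode == "BMTS":
        return pull(target, synonym, lr), pull(synonym, target, lr)
    raise ValueError(f"unknown mode: {mode}")

# Toy 2-D vectors: every scheme shrinks the target-synonym distance,
# but BMTS shrinks it fastest because both endpoints move.
t = np.array([1.0, 0.0])
s = np.array([0.0, 1.0])
dists = {}
for mode in ("UMT", "UMS", "BMTS"):
    t2, s2 = synonym_step(t, s, mode)
    dists[mode] = float(np.linalg.norm(t2 - s2))
```

In a real embedding model this attraction term would be added to the usual context-prediction loss (e.g. Skip-gram's), so that thesaurus synonyms are drawn together while corpus statistics are preserved; the sketch isolates only the movement-direction distinction.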
Notes
We plan to release all the datasets and code used in this study after the paper is published.
References
Adel H, Schütze H (2014) Using mined coreference chains as a resource for a semantic task. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1447–1452
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 17th international conference on computational linguistics, vol 1. Association for Computational Linguistics, pp 86–90
Bian J, Gao B, Liu T-Y (2014) Knowledge-powered deep learning for word embedding. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 132–148
Chen Z, Lin W, Chen Q, Chen X, Wei S, Jiang H, Zhu X (2015) Revisiting word embedding for contrasting meaning. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: Long papers), vol 1, pp 106–115
Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning. ACM, pp 160–167
Culler JD (1986) Ferdinand de Saussure. Cornell University Press, Ithaca
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Huang EH, Socher R, Manning CD, Ng AY (2012) Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th annual meeting of the association for computational linguistics: long papers, vol 1. Association for Computational Linguistics, pp 873–882
Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy E, Smith NA (2014) Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166
Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Mikolov T et al (2013) Devise: a deep visual-semantic embedding model. In: Advances in neural information processing systems, pp 2121–2129
Ganitkevitch J, Van Durme B, Callison-Burch C (2013) Ppdb: the paraphrase database. In: Proceedings of the 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 758–764
Harris ZS (1954) Distributional structure. Word 10(2–3):146–162
Hinton GE (1986) Learning distributed representations of concepts. In: Proceedings of the eighth annual conference of the cognitive science society, vol 1. Amherst, MA, p 12
Key LEB, Noble BP (2017) Course in general linguistics. Macat Library
Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882
Kusner M, Sun Y, Kolkin N, Weinberger K (2015) From word embeddings to documentdistances. In: International conference on machine learning, pp 957–966
Lazaridou A, Baroni M et al (2015) A multitask objective to inject lexical contrast into distributional semantics. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 2: short papers), vol 2, pp 21–26
Li Q, Uprety S, Wang B, Song D (2018) Quantum-inspired complex word embedding. arXiv preprint arXiv:1805.11351
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Mikolov T, Yih W-T, Zweig G (2013) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 746–751
Nguyen KA, Walde SS, Vu NT (2016) Integrating distributional lexical contrast into word embeddings for antonym-synonym distinction. arXiv preprint arXiv:1605.07766
Nguyen KA, Walde SS, Vu NT (2017) Distinguishing antonyms and synonyms in a pattern-based neural network. arXiv preprint arXiv:1701.02962
Nickel M, Kiela D (2017) Poincaré embeddings for learning hierarchical representations. In: Advances in neural information processing systems, pp 6338–6347
Ono M, Miwa M, Sasaki Y (2015) Word embedding-based antonym detection using thesauri and distributional information. In: Proceedings of the 2015 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 984–989
Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365
Sala F, De Sa C, Gu A, Ré C (2018) Representation tradeoffs for hyperbolic embeddings. In: International conference on machine learning, pp 4457–4466
Tissier J, Gravier C, Habrard A (2017) Dict2vec: learning word embeddings using lexical dictionaries. In: Conference on empirical methods in natural language processing (EMNLP 2017), pp 254–263
Vilnis L, McCallum A (2014) Word representations via Gaussian embedding. arXiv preprint arXiv:1412.6623
Wang B, Wang L, Wei Q (2018) Textzoo, a new benchmark for reconsidering text classification. arXiv preprint arXiv:1802.03656
Yu M, Dredze M (2014) Improving lexical embeddings with semantic knowledge. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: short papers), vol 2, pp 545–550
Acknowledgements
This work is supported by the Quantum Access and Retrieval Theory (QUARTZ) project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 721321.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, J., Hu, R., Liu, X. et al. A distant supervision method based on paradigmatic relations for learning word embeddings. Neural Comput & Applic 32, 7759–7768 (2020). https://doi.org/10.1007/s00521-019-04071-6