Learning class-specific word embeddings


Recent years have seen the success of applying word embedding algorithms to natural language processing (NLP) tasks. Most word embedding algorithms produce only a single embedding per word. Because many words are polysemous, a single embedding cannot discriminate between a word's senses. Some prior work uses the context in which a word appears to learn multiple embeddings per word. However, context-based solutions are problematic for short texts, such as tweets, which have limited context. Moreover, existing approaches tend to enumerate all possible context types of a particular word regardless of the target application. Applying multiple vector representations per word in NLP tasks can be computationally expensive because all possible combinations of word senses in a snippet need to be considered. Sometimes, a word sense can be captured when the class information or label of the short text is available. For example, in a disaster-related dataset, when a text snippet is labeled “hurricane related”, the word “water” in the snippet is more likely to refer to rain and flooding; when a snippet is labeled “hurricane unrelated”, “water” can be interpreted in its more general meaning. In this work, we propose to use class information to make word representations more discriminative. Instead of enumerating all potential senses per word in the text, the number of vector representations per word should be a function of the downstream classification task. We show that setting the number of vector representations per word to the number of classes in the classification task is often sufficient to resolve the polysemy. Word embeddings learned from neural language models typically exhibit good linear compositionality. We use this property to encode class information into the vector representation of a word.
We explore four approaches to training class-specific embeddings that encode class information by exploiting label information and the linear compositionality of word embeddings. We present a general framework, consisting of a pair of convolutional neural networks, that takes the learned class-specific word embeddings as input for text classification tasks. We evaluate our approach and framework on topic classification of a disaster-focused Twitter dataset and on a benchmark Twitter sentiment classification dataset from SemEval 2013. Our results show a relative accuracy improvement of 3–4% over a recent baseline.
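To make the idea of encoding class information through linear compositionality concrete, the following is a minimal sketch, not the authors' method: the abstract does not specify the four training approaches, so the sketch simply assumes that a class-specific vector is the sum of a word vector and a class vector. All vector values, the neighbour word "drink", the label names, and the helper names `class_specific` and `cosine` are hypothetical and chosen only for illustration.

```python
import math

# Toy 2-dimensional word embeddings; the values are invented for illustration.
word_vecs = {
    "water": [0.5, 0.5],
    "flood": [0.9, 0.1],   # stands in for the disaster-related sense
    "drink": [0.1, 0.9],   # hypothetical stand-in for the general sense
}

# Hypothetical class vectors, one per label of the classification task.
class_vecs = {
    "hurricane_related": [0.6, -0.2],
    "hurricane_unrelated": [-0.2, 0.6],
}


def class_specific(word, label):
    """Assumed composition: element-wise sum of word and class vectors."""
    return [w + c for w, c in zip(word_vecs[word], class_vecs[label])]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# One vector per (word, class) pair: the number of representations per word
# equals the number of classes, not the number of dictionary senses.
table = {
    word: {label: class_specific(word, label) for label in class_vecs}
    for word in word_vecs
}

for label, vec in table["water"].items():
    print(label,
          "flood:", round(cosine(vec, word_vecs["flood"]), 3),
          "drink:", round(cosine(vec, word_vecs["drink"]), 3))
```

In this toy setup the plain vector for "water" is equally similar to both neighbours, while each class-specific variant leans toward the neighbour matching its label. With real embeddings the class vectors would have to be learned, and the additive form is only one possible composition.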




  1. Example 3 in Table 1 is extracted from the disaster-focused Twitter corpus T6 [13], which we describe in Sect. 4.1.

  2.

  3. These three tweets are extracted from SemEval 2013 training data.


  1. Nematzadeh A, Meylan SC, Griffiths TL (2017) Evaluating vector-space models of word representation, or, the unreasonable effectiveness of counting words near other words. In: Proceedings of the 39th Annual Meeting of the Cognitive Science Society
  2. Harris ZS (1954) Distributional structure. Word 10:146–162
  3. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp 3111–3119
  4. Liu Q, Ling Z-H, Jiang H, Hu Y (2016) Part-of-speech relevance weights for learning word embeddings. arXiv preprint arXiv:1603.07695
  5. Sienčnik SK (2015) Adapting word2vec to named entity recognition. In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Vilnius, Lithuania, 109, Linköping University Electronic Press, pp 239–243
  6. Levy O, Goldberg Y (2014) Dependency-based word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol 2, pp 302–308
  7. Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics
  8. Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
  9. Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp 427–431
  10. Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp 2227–2237
  11. Zheng X, Feng J, Chen Y, Peng H, Zhang W (2017) Learning context-specific word/character embeddings. In: AAAI Conference on Artificial Intelligence
  12. Tian F, Dai H, Bian J, Gao B, Zhang R, Chen E, Liu T-Y (2014) A probabilistic model for learning multi-prototype word embeddings. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp 151–160
  13. Olteanu A, Castillo C, Diaz F, Vieweg S (2014) CrisisLex: a lexicon for collecting and filtering microblogged communications in crises. In: Proceedings of the International AAAI Conference on Weblogs and Social Media
  14. Huang EH, Socher R, Manning CD, Ng AY (2012) Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, Association for Computational Linguistics, pp 873–882
  15. Neelakantan A, Shankar J, Passos A, McCallum A (2015) Efficient non-parametric estimation of multiple embeddings per word in vector space. arXiv preprint arXiv:1504.06654
  16. Yu M, Dredze M (2014) Improving lexical embeddings with semantic knowledge. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp 545–550
  17. Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy EH, Smith NA (2014) Retrofitting word vectors to semantic lexicons. CoRR abs/1411.4166
  18. Yu M, Gormley M, Dredze M (2014) Factor-based compositional embedding models. In: NIPS Workshop on Learning Semantics
  19. Chen X, Liu Z, Sun M (2014) A unified model for word sense representation and disambiguation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1025–1035
  20. Kuang S, Davison BD (2018) Class-specific word embedding through linear compositionality. In: Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), pp 390–397
  21. Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1746–1751. arXiv preprint arXiv:1408.5882
  22. Landauer TK, Foltz PW, Laham D (1998) An introduction to latent semantic analysis. Discourse Process 25:259–284
  23. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
  24. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
  25. Ling W, Dyer C, Black A, Trancoso I (2015) Two/too simple adaptations of word2vec for syntax problems. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 1299–1304
  26. Chen Y, Perozzi B, Al-Rfou R, Skiena S (2013) The expressive power of word embeddings. In: ICML 2013 Workshop on Deep Learning for Audio, Speech, and Language Processing
  27. Trask A, Michalak P, Liu J (2015) sense2vec - a fast and accurate method for word sense disambiguation in neural word embeddings. arXiv preprint arXiv:1511.06388
  28. Guo J, Che W, Wang H, Liu T (2014) Learning sense-specific word embeddings by exploiting bilingual resources. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp 497–507
  29. Su J, Wu S, Zhang B, Wu C, Qin Y, Xiong D (2018) A neural generative autoencoder for bilingual word embeddings. Inf Sci 424:287–300
  30. Pelevina M, Arefyev N, Biemann C, Panchenko A (2017) Making sense of word embeddings. arXiv preprint arXiv:1708.03390
  31. Bollegala D, Yoshida Y, Kawarabayashi K (2018) Using k-way co-occurrences for learning word embeddings. In: AAAI 2018 Conference on Artificial Intelligence
  32. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543
  33. Scheepers T, Kanoulas E, Gavves E (2018) Improving word embedding compositionality using lexicographic definitions. In: Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 1083–1093
  34. Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606
  35. Athiwaratkun B, Wilson AG, Anandkumar A (2018) Probabilistic FastText for multi-sense word embeddings. In: Conference of the Association for Computational Linguistics (ACL)
  36. Reynolds D (2015) Gaussian mixture models. Encyclopedia of biometrics, pp 827–832
  37. Chen H, Wei B, Liu Y, Li Y, Yu J, Zhu W (2018) Bilinear joint learning of word and entity embeddings for entity linking. Neurocomputing 294:12–18
  38. Mitchell J, Lapata M (2008) Vector-based models of semantic composition. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp 236–244
  39. Li Q, Shah S, Liu X, Nourbakhsh A (2017) Data sets: word embeddings learned from tweets and general data. arXiv preprint arXiv:1708.03994
  40. Attardi G (2015) DeepNL: a deep learning NLP pipeline. In: Proceedings of NAACL-HLT, pp 109–115
  41. Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28:100–108


This material is based in part upon work supported by the National Science Foundation under Grant No. CMMI-1541177.

Author information

Correspondence to Sicong Kuang.

Additional information

The original version of this article was revised: The article Learning class‑specific word embeddings, written by Sicong Kuang and Brian D. Davison, was originally published electronically on the publisher's internet portal (currently SpringerLink) on 23 October 2019 with open access. With the author(s)’ decision to step back from Open Choice, the copyright of the article changed on 18 November 2019 to © Springer Science+Business Media, LLC, part of Springer Nature 2019 and the article is forthwith distributed under the terms of copyright.

About this article

Cite this article

Kuang, S., Davison, B.D. Learning class-specific word embeddings. J Supercomput (2019). https://doi.org/10.1007/s11227-019-03024-z


  • Word embeddings
  • Text classification
  • Polysemy