Learning class-specific word embeddings

  • Sicong KuangEmail author
  • Brian D. Davison


Recent years have seen the success of applying word embedding algorithms to natural language processing (NLP) tasks. Most word embedding algorithms only produce a single embedding per word. This makes the learned embeddings indiscriminative since many words are polysemous. Some prior work utilizes the context in which the word resides to learn multiple word embeddings. However, context-based solutions are problematic for short texts, such as tweets, which have limited context. Moreover, existing approaches tend to enumerate all possible context types of a particular word regardless of their target applications. Applying multiple vector representations per word in NLP tasks can be computationally expensive because all possible combinations of senses of words in a snippet need to be considered. Sometimes, a word sense can be captured when the class information or label of the short text is presented. For example, in a disaster-related dataset, when a text snippet is labeled as “hurricane related”, the word “water” in the snippet is more likely to be interpreted as rain and flood; when a snippet is labeled as “hurricane unrelated”, the word “water” can be interpreted as its more general meaning. In this work, we propose to use class information to enhance the discriminativeness of words. Instead of enumerating all potential senses per word in the text, the number of vector representations per word should be a function of the future classification task. We show that learning the number of vector representations per word according to the number of classes in the classification task is often sufficient to clarify the polysemy. Word embeddings learned from neural language models typically have the property of good linear compositionality. We utilize this property to encode class information into the vector representation of a word. We explore four approaches to train class-specific embeddings to encode class information by utilizing the label information and the linear compositionality property of word embeddings. We present a general framework consisting of a pair of convolutional neural networks to utilize the learned class-specific word embeddings as input for text classification tasks. We evaluate our approach and framework on topic classification of a disaster-focused Twitter dataset and a benchmark Twitter sentiment classification dataset from SemEval 2013. Our results show a relative accuracy improvement of 3–4% over a recent baseline.


Word embeddings Text classification Polysemy 



This material is based in part upon work supported by the National Science Foundation under Grant No. CMMI-1541177.


  1. 1.
    Nematzadeh A, Meylan SC, Griffiths TL (2017) Evaluating vector-space models of word representation, or, the unreasonable effectiveness of counting words near other words. In: Proceedings of the 39th Annual Meeting of the Cognitive Science SocietyGoogle Scholar
  2. 2.
    Harris ZS (1954) Distributional structure. Word 10:146–162CrossRefGoogle Scholar
  3. 3.
    Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp 3111–3119Google Scholar
  4. 4.
    Liu Q, Ling Z-H, Jiang H, Hu Y (2016) Part-of-speech relevance weights for learning word embeddings, arXiv preprint arXiv:1603.07695
  5. 5.
    Sienčnik SK (2015) Adapting word2vec to named entity recognition, In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Vilnius, Lithuania, 109, Linköping University Electronic Press, pp 239–243Google Scholar
  6. 6.
    Levy O, Goldberg Y (2014) Dependency-based word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol 2, pp 302–308Google Scholar
  7. 7.
    Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceeding of the 52nd Annual Meeting of the Association for Computational LinguisticsGoogle Scholar
  8. 8.
    Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155zbMATHGoogle Scholar
  9. 9.
    Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp 427–431Google Scholar
  10. 10.
    Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations, In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), vol 1, pp 2227–2237Google Scholar
  11. 11.
    Zheng X, Feng J, Chen Y, Peng H, Zhang W (2017) Learning context-specific word/character embeddings, AAAI Conference on Artificial IntelligenceGoogle Scholar
  12. 12.
    Tian F, Dai H, Bian J, Gao B, Zhang R, Chen E, Liu T-Y (2014) A probabilistic model for learning multi-prototype word embeddings, In: The 25th International Conference on Computational Linguistics: Technical Papers Proceedings of COLING 2014, pp 151–160Google Scholar
  13. 13.
    Olteanu A, Castillo C, Diaz F, Vieweg S (2014) Crisislex: A lexicon for collecting and filtering microblogged communications in crises. In: Proceedings of the International AAAI Conference on Weblogs and Social MediaGoogle Scholar
  14. 14.
    Huang EH, Socher R, Manning CD, Ng AY (2012) Improving word representations via global context and multiple word prototypes, In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, Association for Computational Linguistics, pp 873–882Google Scholar
  15. 15.
    Neelakantan A, Shankar J, Passos A, McCallum A (2015) Efficient non-parametric estimation of multiple embeddings per word in vector space, arXiv preprint arXiv:1504.06654
  16. 16.
    Yu M, Dredze M (2014) Improving lexical embeddings with semantic knowledge, In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp 545–550Google Scholar
  17. 17.
    Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy EH, Smith NA (2014) Retrofitting word vectors to semantic lexicons, CoRR abs/1411.4166Google Scholar
  18. 18.
    Yu M, Gormley M, Dredze M (2014) Factor-based compositional embedding models. In: NIPS Workshop on Learning SemanticsGoogle Scholar
  19. 19.
    Chen X, Liu Z, Sun M (2014) A unified model for word sense representation and disambiguation, In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1025–1035Google Scholar
  20. 20.
    Kuang S, Davison BD (2018) Class-specific word embedding through linear compositionality. In: Proceedings of the IEEE international conference on big data and smart computing (BigComp), pp 390–397Google Scholar
  21. 21.
    Kim Y (2014) Convolutional neural networks for sentence classification, In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1746–1751. ArXiv preprint arXiv:1408.5882
  22. 22.
    Landauer TK, Foltz PW, Laham D (1998) An introduction to latent semantic analysis. Discourse Process 25:259–284CrossRefGoogle Scholar
  23. 23.
    Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022zbMATHGoogle Scholar
  24. 24.
    Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537zbMATHGoogle Scholar
  25. 25.
    Ling W, Dyer C, Black A, Trancoso I (2015) Two/too simple adaptations of word2vec for syntax problems, In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 1299–1304Google Scholar
  26. 26.
    Chen Y, Perozzi B, Al-Rfou R, Skiena S (2013) The expressive power of word embeddings. In: ICML 2013 Workshop on Deep Learning for Audio, Speech, and Language ProcessingGoogle Scholar
  27. 27.
    Trask A, Michalak P, Liu J (2015) sense2vec-a fast and accurate method for word sense disambiguation in neural word embeddings, arXiv preprint arXiv:1511.06388
  28. 28.
    Guo J, Che W, Wang H, Liu T (2014) Learning sense-specific word embeddings by exploiting bilingual resources, In: The 25th International Conference on Computational Linguistics: Technical Papers Proceedings of COLING 2014, pp 497–507Google Scholar
  29. 29.
    Su J, Wu S, Zhang B, Wu C, Qin Y, Xiong D (2018) A neural generative autoencoder for bilingual word embeddings. Inf Sci 424:287–300MathSciNetCrossRefGoogle Scholar
  30. 30.
    Pelevina M, Arefyev N, Biemann C, Panchenko A (2017) Making sense of word embeddings, arXiv preprint arXiv:1708.03390
  31. 31.
    Bollegala D, Yoshida Y, Kawarabayashi K (2018) Using k-way co-occurrences for learning word embeddings, In: AAAI 2018 Conference on Artificial IntelligenceGoogle Scholar
  32. 32.
    Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543Google Scholar
  33. 33.
    Scheepers T, Kanoulas E, Gavves E (2018) Improving word embedding compositionality using lexicographic definitions, In: Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 1083–1093Google Scholar
  34. 34.
    Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword information, arXiv preprint arXiv:1607.04606
  35. 35.
    Athiwaratkun AGW Ben, Anandkumar A (2018) Probabilistic FastText for multi-sense word embeddings, In: Conference of the Association for Computational Linguistics (ACL)Google Scholar
  36. 36.
    Reynolds D (2015) Gaussian mixture models. Encyclopedia of biometrics, pp 827–832CrossRefGoogle Scholar
  37. 37.
    Chen H, Wei B, Liu Y, Li Y, Yu J, Zhu W (2018) Bilinear joint learning of word and entity embeddings for entity linking. Neurocomputing 294:12–18CrossRefGoogle Scholar
  38. 38.
    Mitchell J, Lapata M (2008) Vector-based models of semantic composition. In: Proceeding of the Annual Meeting of the Association for Computational Linguistics, pp 236–244Google Scholar
  39. 39.
    Li Q, Shah S, Liu X, Nourbakhsh A (2017) Data sets: word embeddings learned from tweets and general data, arXiv preprint arXiv:1708.03994
  40. 40.
    Attardi G (2015) DeepNL: a deep learning NLP pipeline. In: Proceedings of NAACL-HLT, pp 109–115Google Scholar
  41. 41.
    Hartigan JA, Wong MA (1979) Algorithm as 136: a k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28:100–108zbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019
Corrected Publication November 2019

Authors and Affiliations

  1. 1.Lehigh UniversityBethlehemUSA

Personalised recommendations