Linguistic Information in Word Embeddings

  • Ali Basirat
  • Marc Tang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11352)


We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent, respectively, grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments, based on a single-layer feed-forward neural network, classifying semantic features required significantly fewer neurons than classifying grammatical features. However, semantic features also generated higher entropy in the classification output despite their high accuracy. Furthermore, the count/mass distinction proved difficult for the model, even though the number of neurons was tuned almost to its maximum.
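
The contrast between accuracy and output entropy can be illustrated with a minimal sketch. It assumes the classifier produces one raw score per class and that entropy means the Shannon entropy of the softmax distribution over those scores; the exact measure used in the paper is not stated in this abstract. The function names and all score values are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits; 0 means a fully confident prediction."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mean_output_entropy(score_rows):
    """Average entropy of the softmax output over a set of classified words."""
    return sum(entropy(softmax(row)) for row in score_rows) / len(score_rows)


# Hypothetical output scores for two nouns in a binary task
# (assumed gold labels: class 0 for the first noun, class 1 for the second).
confident = [[4.0, -4.0], [-3.5, 3.5]]
hesitant = [[0.4, 0.0], [0.0, 0.3]]

# Both score sets pick the assumed gold class by argmax, so accuracy is the
# same, yet the second set spreads probability mass far more evenly.
print(mean_output_entropy(confident))
print(mean_output_entropy(hesitant))
```

Under these assumptions, a classifier can be equally accurate on two features while being much less certain about one of them, which is the pattern the abstract reports for semantic features.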


Neural network, Nominal classification, Swedish, Word embedding



Our work on this paper was fully collaborative; the order of the authors' names is alphabetical and does not reflect any asymmetry in contribution. We are grateful for the fruitful discussion with the audience of the Special Session on Natural Language Processing in Artificial Intelligence at the 10th International Conference on Agents and Artificial Intelligence in Funchal, Madeira. We would like to express our gratitude to our colleagues Linnea Öberg, Karin Koltay, Rima Haddad, and Josefin Lindgren for their comments and support. We also appreciate the constructive comments from the anonymous referees. We are fully responsible for any remaining errors.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
