Using Word Embeddings to Analyze how Universities Conceptualize “Diversity” in their Online Institutional Presence


The term diversity can be operationalized demographically (in terms of physical or external characteristics such as race, gender, ethnicity and nationality) or intellectually (in terms of mental phenomena such as viewpoints, beliefs, ideas and political opinion). This work examines the context in which the concept of diversity is used by 50 elite US universities in their online institutional presence. Distributional semantics theory is leveraged to quantify semantic similarity between linguistic items based on their distributional properties in a large sample of language data taken from the universities' online profiles. The language modelling is carried out using word2vec, a state-of-the-art machine learning model widely used by the natural language processing community to create vector representations of words (i.e. word embeddings). The model uses a neural network trained to reconstruct the linguistic context of words in the training corpus. As a by-product of the training objective, word2vec embeds words into a learned vector space in which words that share common contexts, and thus, according to the distributional hypothesis, semantic meaning, are located in close proximity to one another. A quantitative analysis of cosine similarities between word vectors derived from the corpus of text retrieved from the universities' online institutional profiles shows that the diversity concept is much closer to demographic operationalizations of diversity such as race, gender, ethnicity or nationality than to intellectual ones such as viewpoints, values, beliefs or political orientation. That is, the universities studied tend to use the word diversity predominantly in its demographic denotation, to refer to variety of external appearance rather than variety of mental phenomena. This is significant in light of the severe lack of ideological diversity in universities across the US, with the vast majority of faculty leaning left of center.
Universities' emphasis on using the term diversity to denote demographic subtypes of diversity could be indicative of a majority power structure in the Academy that tries to hinder the fostering of viewpoint diversity by steering diversity efforts towards demographic interpretations of the word. At the very least, the results of this work suggest that universities, as judged from the way they use language in their own online institutional profiles, prioritize demographic types of diversity, centered on variety of external appearance cues, over intellectual heterogeneity.
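The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's code or data: the study derives its word vectors by training word2vec on text from the universities' online profiles, whereas the three-dimensional vectors below are invented purely for illustration, and the names `cosine_similarity`, `mean_similarity` and `toy_vectors` are hypothetical. The sketch only shows how the cosine similarity between "diversity" and a group of demographic terms might be compared with its similarity to a group of intellectual terms.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(target, terms, vectors):
    """Average cosine similarity between `target` and each word in `terms`."""
    sims = [cosine_similarity(vectors[target], vectors[t]) for t in terms]
    return sum(sims) / len(sims)


# Hypothetical vectors, invented for this example only. In the study these
# would be embeddings learned by word2vec from the university corpus.
toy_vectors = {
    "diversity":  [0.9, 0.3, 0.1],
    "race":       [0.8, 0.2, 0.1],
    "gender":     [0.9, 0.4, 0.0],
    "ethnicity":  [0.7, 0.3, 0.2],
    "viewpoints": [0.1, 0.2, 0.9],
    "beliefs":    [0.2, 0.1, 0.8],
    "ideas":      [0.3, 0.3, 0.7],
}

demographic_terms = ["race", "gender", "ethnicity"]
intellectual_terms = ["viewpoints", "beliefs", "ideas"]

demographic_score = mean_similarity("diversity", demographic_terms, toy_vectors)
intellectual_score = mean_similarity("diversity", intellectual_terms, toy_vectors)

print(f"mean similarity to demographic terms:  {demographic_score:.3f}")
print(f"mean similarity to intellectual terms: {intellectual_score:.3f}")
```

With real embeddings the same two helper functions would apply unchanged; only `toy_vectors` would be replaced by a lookup into the trained model's vocabulary. Averaging over each term group is one simple way to summarize the comparison and is an assumption of this sketch, not necessarily the aggregation used in the article.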




Author information



Corresponding author

Correspondence to David Rozado.


About this article


Cite this article

Rozado, D. Using Word Embeddings to Analyze how Universities Conceptualize “Diversity” in their Online Institutional Presence. Soc 56, 256–266 (2019).



Keywords

  • Diversity
  • Word embeddings
  • Word2vec
  • Computational content analysis
  • Viewpoint diversity