Using Word Embeddings to Analyze How Universities Conceptualize “Diversity” in Their Online Institutional Presence


The term diversity can be operationalized demographically (in terms of physical or external characteristics such as race, gender, ethnicity and nationality) or intellectually (in terms of mental phenomena such as viewpoints, beliefs, ideas and political opinion). This work examines the context in which the concept of diversity is used by 50 elite US universities in their online institutional presence. Distributional semantics theory is leveraged to quantify the semantic similarity between linguistic items based on their distributional properties in a large sample of language data taken from the universities' online profiles. The language modelling is carried out using word2vec, a state-of-the-art machine learning model widely used by the natural language processing community to create vector representations of words (i.e. word embeddings). The model uses a neural network trained to reconstruct the linguistic context of words in the training corpus. As a by-product of this training objective, word2vec embeds words into a learned vector space in which words that share common contexts, and therefore, according to the distributional hypothesis, semantic meaning, are located in close proximity to one another. A quantitative analysis of cosine similarities between word vectors derived from the corpus of text retrieved from the universities' online institutional profiles shows that the concept of diversity is much closer to demographic operationalizations of diversity, such as race, gender, ethnicity or nationality, than to intellectual ones, such as viewpoints, values, beliefs or political orientation. That is, the universities studied tend to use the word diversity predominantly in its demographic denotation, to refer to variety of external appearance rather than to variety of mental phenomena. This is significant in light of the severe lack of ideological diversity in universities across the US, where the vast majority of faculty lean left of center.
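The core comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the helper names `cosine_similarity` and `mean_similarity` are hypothetical, the term lists simply mirror the examples named in the text, averaging over each term group is one simple aggregation assumed here, and the 3-dimensional vectors are invented toy values rather than embeddings learned from the university corpus. Cosine similarity is the dot product of two vectors divided by the product of their lengths.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def mean_similarity(embeddings, target, terms):
    """Average cosine similarity between `target` and each term found in the vocabulary.

    Terms missing from the vocabulary are skipped (hypothetical aggregation choice).
    """
    sims = [
        cosine_similarity(embeddings[target], embeddings[t])
        for t in terms
        if t in embeddings
    ]
    if not sims:
        raise ValueError("none of the terms are in the vocabulary")
    return sum(sims) / len(sims)

# Toy 3-dimensional vectors invented for illustration only; real word2vec
# embeddings are learned from the corpus and usually have hundreds of dimensions.
toy_embeddings = {
    "diversity": [0.9, 0.1, 0.2],
    "race": [0.8, 0.2, 0.1],
    "gender": [0.85, 0.05, 0.3],
    "viewpoints": [0.1, 0.9, 0.2],
    "beliefs": [0.2, 0.8, 0.1],
}

# Term groups mirroring the two operationalizations named in the text.
DEMOGRAPHIC = ["race", "gender", "ethnicity", "nationality"]
INTELLECTUAL = ["viewpoints", "values", "beliefs", "political"]

demo = mean_similarity(toy_embeddings, "diversity", DEMOGRAPHIC)
intel = mean_similarity(toy_embeddings, "diversity", INTELLECTUAL)
print(f"demographic: {demo:.3f}  intellectual: {intel:.3f}")
```

With a trained gensim `Word2Vec` model, the learned vectors are exposed through `model.wv`, and `model.wv.similarity(w1, w2)` returns the cosine similarity for a pair of in-vocabulary words, so it could stand in for the hand-written `cosine_similarity` above.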
The universities' emphasis on using the term diversity to denote demographic subtypes of diversity could be indicative of a majority power structure in the Academy that tries to hinder the fostering of viewpoint diversity by steering diversity efforts towards demographic interpretations of the word. At the very least, the results of this work suggest that universities, as judged from the way they use language in their own online institutional profiles, prioritize demographic types of diversity, centered on variety in external appearance, over intellectual heterogeneity.

