Method
We estimated the meanings of politically charged words, based on spontaneous speech from primary election debates, as used by the two major American political parties at different points in time, by the 2016 presidential nominees, and by a population of voters, and then compared these estimates. We also compared the party alignment of the major 2016 candidates by feeding the debates into the model in chronological order and sampling the relationships between candidates and their respective parties at different time points. Using the learned semantic representations, we conducted graph analyses of the centrality of each politically charged word in the semantic spaces to identify words that were disproportionately important to a single party as well as words that were roughly equally important to both. Finally, we compared word associations generated by human raters with those derived from the computational models to identify similarities and differences between the presidential candidates' representations of these terms and non-political, everyday representations of the same terms.
Input corpus and key terms
The input corpus to our computational model is based on publicly available presidential speeches and documents, all in machine-readable text format. We selected only primary Republican and Democratic presidential debate transcripts from 1999 to 2016 from the Presidential Document Archive of the American Presidency Project (http://www.presidency.ucsb.edu/index_docs.php), a non-profit, non-partisan document archive hosted at the University of California, Santa Barbara, CA (Peters & Woolley, 2016; http://www.presidency.ucsb.edu/). We did not use data from the 2004 and 2012 election years because each featured an incumbent who went on to win the election, and we were more interested in tracking changes in political concepts along party lines over time.
To select from this database the keywords/terms representing the politically charged concepts, we used a combination of human judgment and computational analyses. The corpus was cleaned by identifying named entities and combining them into single words, removing stop words, removing inflections from nouns, verbs, and adverbs, and lemmatizing the resulting words using the POS (part-of-speech) tagger and named entity recognizer from the Stanford Natural Language Toolkit, an open-source software package for the analysis of language and speech corpora (http://nlp.stanford.edu/software). The words were first passed through an algorithm that (a) favored nouns used frequently in the debates and (b) avoided nouns used disproportionately by any single political party. This was achieved as follows: each word identified as a noun by the POS tagger was assigned a score equal to the sum of that word's normalized frequencies across each party's debates divided by the absolute difference between the two parties' normalized frequencies. This ensured that the selected nouns were both important and as non-partisan as possible, because words used disproportionately by only one party received lower scores. A subset of high-scoring nouns was then manually selected by the researchers. We also included noun phrases, because many political concepts are expressed by two or more words (e.g., "health care"). The noun phrases were first identified by an algorithm that found sets of nouns more likely to co-occur with each other than with other words; they were then combined into "single words" in the corpus so that the model would learn a distinct representation for each noun phrase. The final list included a total of 213 single words and 397 word phrases as the key concepts. A subset of these, the 136 words shown in Fig. 1, is presented in Table 1.
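The noun-scoring rule described above can be sketched as follows. This is a minimal illustration of the selection logic, not the actual pipeline; the function name and the toy token lists are our own, and a small constant is added to the denominator to handle perfectly balanced words.

```python
from collections import Counter

def noun_scores(rep_nouns, dem_nouns):
    """Score candidate nouns so that frequent, bipartisan words rank highest.

    Sketch of the selection rule in the text: the sum of a word's
    normalized per-party frequencies divided by the absolute difference
    between those frequencies. Words favored by only one party get a
    large denominator and hence a low score.
    """
    rep_freq, dem_freq = Counter(rep_nouns), Counter(dem_nouns)
    rep_total, dem_total = sum(rep_freq.values()), sum(dem_freq.values())
    eps = 1e-9  # avoids division by zero for perfectly balanced words
    scores = {}
    for word in set(rep_freq) | set(dem_freq):
        r = rep_freq[word] / rep_total
        d = dem_freq[word] / dem_total
        scores[word] = (r + d) / (abs(r - d) + eps)
    return scores
```

A frequent word used equally by both parties (e.g., "economy") would thus outscore a word used only by one party.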
This subset of 136 words met one further criterion, which was that the word had to appear at least five times in the input to any semantic space. For example, when building a semantic space for Donald Trump, the word had to appear five or more times in Donald Trump’s speech. If comparing Donald Trump, Hillary Clinton, and the Republican and Democratic parties across all time points, then the word had to appear at least five times in each candidate’s speech, and in the documents corresponding to the parties and time points.
Table 1 Word list of 136 concepts for the semantic spaces
Model algorithm
A three-layer artificial neural network as implemented in the word2vec model (Mikolov et al., 2013a, b, c; see also https://code.google.com/p/word2vec/) was used to learn the distributional statistics among words and contexts from the input corpus (see Input Corpus and Key Terms). Like other computational semantic space models, word2vec exploits the semantic information distributed in large-scale text or speech corpora, specifically by learning the co-occurrence statistics that hold among words and contexts (which could include words, phrases, sentences, or entire documents, as discussed in the Introduction). However, it learns these distributional statistics with a neural network algorithm, specifically the back-propagation algorithm (Rumelhart et al., 1986), according to which the network updates its connection weights as follows. Each time the network is presented with an input-to-output mapping, the discrepancy (or error) between the target output and the actual output is calculated; this error is then propagated back through the network so that the relevant connection weights can be updated in proportion to the error. Over time, the network's connection weights become optimized for producing the desired output given new input patterns.
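The error-driven weight update just described can be illustrated with a toy example for a single linear unit (a deliberately minimal sketch; word2vec's actual networks propagate the error through a hidden layer via the chain rule):

```python
def delta_update(w, x, target, lr=0.1):
    """One error-driven weight update in the spirit of back-propagation,
    shown for a single linear unit: compute the output, measure the
    discrepancy from the target, and nudge each weight in proportion to
    the error and its input."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # actual output
    error = target - y                        # discrepancy from target
    return [wi + lr * error * xi for wi, xi in zip(w, x)]
```

Repeated application of this update shrinks the error geometrically, which is the sense in which the weights become "optimized over time."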
There are two major mechanisms implemented in word2vec: (a) Skip-gram (SG): given a target word, the network predicts the context associated with that word (e.g., multiple words co-occurring with the target word), and (b) Continuous bag-of-words (CBOW): given a continuous set of word strings as the context, the network finds the target word that best fits the context. In our modeling, we trained the model using both algorithms as implemented in Python's Gensim package (Rehurek, 2010) and concatenated the representations into vectors with 4,000 dimensions. The decision to use concatenated vector representations was based on the considerations that (a) the two algorithms may be sensitive to different types of word associations (e.g., dominant, paradigmatic, vs. non-dominant associations), which may implicate different processing mechanisms (Jung-Beeman, 2005), and (b) previous work has suggested that concatenated vectors can in some cases provide increased accuracy in representing subtle semantic differences (Fyshe et al., 2013; Schloss & Li, 2016; see footnote 1). We used word2vec's default settings, with a window size of five and a minimum word count of five (words used fewer than five times were excluded), for all models reported below except the time-course analysis. For the time-course analysis reported in Fig. 2, we used the initial 2,000 dimensions from the Fyshe semantic spaces (Fyshe et al., 2013) and updated the vectors with the BEAGLE model's algorithm for updating semantic information (Jones & Mewhort, 2007), which implements an episodic memory model of semantic learning that simulates continuous change over time. The purpose of this model was to plot gradual changes in the semantic space over the course of the presidential debates. Although we used a different learning algorithm for this analysis, we kept the same window size and minimum word-count threshold as in the word2vec models.
Furthermore, we only used the debates from 2015–2016 for this analysis, and they were entered into the model in chronological order.
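The concatenation of SG and CBOW representations described above can be sketched as follows. We assume two already-trained models expose per-word vectors as mappings (e.g., gensim's `model.wv[word]`, where the SG and CBOW variants are selected with the `sg=1` / `sg=0` parameter of `Word2Vec`); the function name is our own.

```python
import numpy as np

def concatenate_embeddings(sg_vectors, cbow_vectors, vocab):
    """Concatenate each word's Skip-gram and CBOW vectors into a single
    representation. With 2,000 dimensions per component model, the
    result is the 4,000-dimensional vector used in the analyses."""
    return {w: np.concatenate([sg_vectors[w], cbow_vectors[w]]) for w in vocab}
```

The concatenated vector simply juxtaposes the two representations, so downstream cosine comparisons draw on both kinds of distributional evidence.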
Construction of political semantic spaces
To derive semantic space vectors from the presidential debate documents, we initialized the vectors of all target words as follows: (a) words that did not correspond to the political concepts in Table 1 were assigned the 2,000-dimensional Fyshe vector representation; and (b) words that were neither in the Fyshe model vocabulary nor in the list in Table 1 were assigned a random vector drawn from a normal distribution with the mean and standard deviation of each of the 2,000 dimensions of the Fyshe vectors used in our model (Fyshe et al., 2013). The Fyshe vectors were calculated from a generic, politically neutral corpus of 16 billion words and 50 million documents (Callan & Hoy, 2009). When training the model, a single semantic space was constructed each time, but the political concepts were allowed to vary individually so that candidate-specific, party-specific, or time-specific semantic vectors could be constructed depending on the goal of each analysis. This was achieved by tagging each politically charged word with a unique marker, for example, "_c" for Hillary Clinton. Thus, the model treated "economy_c" (for Clinton) and "economy_t" (for Donald Trump) as separate words, which allowed us to compare the word "economy" as used by Clinton versus Trump. Different numbers of words contributed to each analysis depending on the specific semantic space derived (and how the words were tagged) as a function of party, candidate, and election year. For example, if "child_education_c" and "child_education_t" did not each appear five times in Clinton's and Trump's speeches, the concept "child_education" was excluded. The minimum word-count cut-off ensured that there were enough data to build an accurate representation of a given word.
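The speaker-specific tagging scheme can be sketched as a simple preprocessing pass over each tokenized speech (the function name is illustrative; the actual pipeline would also handle the combined noun phrases):

```python
def tag_key_terms(tokens, key_terms, speaker_tag):
    """Append a speaker-specific marker to politically charged terms so the
    model learns a separate vector per speaker (e.g., "economy_c" for
    Clinton vs. "economy_t" for Trump); all other tokens stay shared
    across speakers."""
    return [t + speaker_tag if t in key_terms else t for t in tokens]
```

Because non-key words remain untagged, their vectors are learned from everyone's speech, while each tagged concept acquires a vector specific to one candidate, party, or time point.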
In cases where we compared entire political parties, or political parties at certain time points, all speeches in the corresponding debates tagged as belonging to a specific party were entered into the model. When comparing Clinton and Trump with their respective parties, the speeches from Clinton or Trump were separated from the speeches of the other presidential candidates entered into the model for the respective parties (e.g., "DEM_2016" for the 2016 Democratic Party, which included speeches by Bernie Sanders, and "REP_2016" for the 2016 Republican Party, which included speeches by Ted Cruz and Jeb Bush). For the analysis comparing the political parties across time depicted in Fig. 1, we used 135 of the terms from Table 1, i.e., all of them except "border" (which was not a frequently used word before September 11, 2001). For the analyses that included individual candidates, we used 53 terms (see Results section).
Analysis and display of semantic spaces
In addition to building the semantic spaces from the speech corpora as described, we used several statistical methods to analyze and visualize the vectors of key political concepts from the high-dimensional semantic spaces that varied by party, candidate, or time of election.
Similarity measures and comparisons
Both cosine similarity and Euclidean distance were used as measures of similarity (see Fig. 1). The cosine similarity between two vectors is the cosine of the angle between them, a measure of the geometric similarity of the two vectors in a high-dimensional space that is widely used to reflect the similarity of language users' semantic representations (e.g., Landauer et al., 2013). The Euclidean distance is the standard measure of distance between two points in space. In general, we characterized the meaning of any single political concept as its cosine similarities with all other political concepts (of which there were 135 or 53; see previous section), and a semantic space as the entire set of these pairwise cosine similarities. To compare political semantic spaces, we used the Euclidean distance, that is, the square root of the sum of the squared differences between the pairwise similarities of political concepts in the two semantic spaces. While the individual cosine measures provide detailed information about which concepts move closer together or farther apart at different time points across parties and individuals, the Euclidean distances characterize the aggregate of these changes.
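These two measures, and how they combine to compare whole spaces, can be sketched as follows (a minimal illustration under the definitions above; the function names are our own, and each space is assumed to map concept names to vectors):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: their dot product divided
    by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def space_distance(space_a, space_b, concepts):
    """Euclidean distance between two semantic spaces: the square root of
    the sum of squared differences between corresponding pairwise cosine
    similarities in each space."""
    total = 0.0
    for i, w1 in enumerate(concepts):
        for w2 in concepts[i + 1:]:
            diff = cosine(space_a[w1], space_a[w2]) - cosine(space_b[w1], space_b[w2])
            total += diff * diff
    return math.sqrt(total)
```

Two identical spaces yield a distance of zero; the distance grows as the pairwise concept relations in the two spaces diverge.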
Ordinary least squares regression (OLS)
Ordinary least squares (OLS) is a standard regression method for evaluating linear relationships between a set of predictor variables and an outcome variable. In our case the outcome variable was the Euclidean distance between the representations of a concept in two semantic spaces, and the predictor variables were the difference in time (0, 8, or 16 years) as a continuous variable, REP-REP and DEM-DEM as categorical variables, and the linear change over time, coded 1, 2, and 3 for the same-year between-party comparisons REP_00-DEM_00, REP_08-DEM_08, and REP_16-DEM_16, respectively. That is, the last variable tested whether there was a linear increase or decrease in the distance between the same concept in the Republican and Democratic parties over the three major elections. For each of the 135 concepts, there were 15 (\(\binom{6}{2}\); see footnote 2) possible comparisons, resulting in 2,025 (= 15 × 135) data points.
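The construction of the 15 comparisons per concept, with the predictor coding described above, can be sketched as follows (an illustrative data-assembly step, not the fitted model; the dictionary keys are our own labels):

```python
from itertools import combinations

def comparison_rows(concept):
    """Enumerate the 15 pairwise comparisons among the six party-by-year
    semantic spaces for one concept, with illustrative predictor coding:
    time difference in years, a between-party indicator, and a 1/2/3
    code for the same-year cross-party contrasts."""
    spaces = [("REP", 2000), ("REP", 2008), ("REP", 2016),
              ("DEM", 2000), ("DEM", 2008), ("DEM", 2016)]
    rows = []
    for (p1, y1), (p2, y2) in combinations(spaces, 2):
        rows.append({
            "concept": concept,
            "time_diff": abs(y1 - y2),       # 0, 8, or 16 years
            "between_party": int(p1 != p2),  # cross-party comparison?
            # same-concept, same-year cross-party contrasts coded 1, 2, 3
            "cross_party_year": ({2000: 1, 2008: 2, 2016: 3}[y1]
                                 if p1 != p2 and y1 == y2 else 0),
        })
    return rows
```

Repeating this for all 135 concepts yields the 2,025 rows entered into the regression.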
Multi-dimensional scaling (MDS)
MDS is a method that analyzes the multi-dimensional features of objects or groups with respect to their similarities or dissimilarities and transforms the overall similarity into Euclidean distances on a two-dimensional plot, where the Euclidean distance is the straight-line distance between two points in the plane. The farther apart two objects/groups are located on the MDS plot, the more dissimilar their multi-dimensional features are.
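A generic form of this transformation, classical (Torgerson) MDS, can be sketched as follows. This is a sketch of the standard algorithm, not necessarily the exact implementation behind the paper's plots:

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed n objects in k dimensions so that
    straight-line distances approximate the input distance matrix.
    Steps: square the distances, double-center them, eigendecompose,
    and scale the top-k eigenvectors."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * j @ d2 @ j                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:k]      # top-k eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```

When the input distances are themselves Euclidean, the embedding reproduces them exactly; otherwise it gives the best low-dimensional approximation in this least-squares sense.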
Graph/network centrality
Network centrality is a graph-theoretical measure of how important a node is to a network in terms of its connectivity to other nodes. In our analyses, the nodes are individual words and the edges are weighted by the Euclidean distance (edge length) between nodes. Words that are high in eigenvector centrality are considered important, or central, to the organization of the political concepts. We used this method to examine concept centrality (see Results section): we calculated the Euclidean distances between the semantic representations of all terms in the REP_2008, DEM_2008, REP_2016, and DEM_2016 spaces and in Trump's and Clinton's semantic spaces, and converted these to a graph in which each concept was a node and the Euclidean distances between concepts served as weighted edges. The eigenvector centrality of each concept was then calculated.
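Eigenvector centrality itself can be computed by power iteration on the weighted adjacency matrix, as in the generic sketch below (libraries such as NetworkX provide this; the edge weights here are hypothetical, and in practice distances would first be converted to connection strengths):

```python
import numpy as np

def eigenvector_centrality(adj, iters=1000, tol=1e-10):
    """Eigenvector centrality by power iteration: a node is central to the
    extent that it is strongly connected to other central nodes. `adj`
    is a symmetric weighted adjacency matrix; the result is the leading
    eigenvector, normalized to unit length."""
    a = np.asarray(adj, dtype=float)
    x = np.ones(a.shape[0])
    for _ in range(iters):
        x_new = a @ x
        x_new /= np.linalg.norm(x_new)  # renormalize each step
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new
```

In a small triangle graph where one node has the strongest edges, that node receives the highest centrality, matching the intuition that central concepts are those tightly linked to other central concepts.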
Results
Figure 1 presents a snapshot of the overall political semantic spaces. Figure 1A–C shows the similarity adjacency matrices for the 135 semantic vectors of politically charged concepts (see Table 1 for the word list). Each matrix is symmetric and has three regions with different meanings: the lower-left triangle contains the pairwise cosine similarities between the 135 concepts in the speeches of the Republican candidates; the upper-right triangle is the same for the Democratic candidates (the same 135 concepts appear in both groups); and the bottom-right square shows the between-party cosine similarities. Figure 1D plots the multi-dimensional scaling (MDS) results for the semantic spaces of each major party (DEM, REP) in three major election years (2000, 2008, and 2016). The figure indicates very distinct profiles of semantic spaces for the two parties, and the DEM and REP spaces are clearly divided on the MDS plot.
We further verified these trends across the political semantic spaces in a regression analysis predicting how different each of the individual 135 concepts would be between any two of the semantic spaces (see Table 2 and Fig. 1). The main effects of time difference (0, 8, or 16 years) and "between party" (a binary variable indicating whether the representations were calculated from presidential debates of different parties) are shown in Table 2. The results suggest that the representation of a word is likely to differ across parties, p < .001, but not over time, p > .05. However, a second model, which included an interaction term for the time-difference and between-party variables and provided a better overall fit, indicated a significant interaction between the two variables (p < .001), such that a word's political meaning was more likely to differ between parties as more time passed, but became more similar within the same party as more time passed (i.e., the meanings of concepts have diverged over time between parties but may have consolidated within parties; see also Figs. 2 and 3). Both models included 135 random intercepts for the individual concepts (not depicted in Table 2) and six random intercepts for the combinations of the three election years and the two parties (depicted in Table 2), indicating that our findings account for variance specific to the stimuli used in this study and may generalize to other politically charged concepts and contexts. Furthermore, a contrast on the beta coefficients comparing the random effects of the three Republican election years and the three Democratic election years (see Table 2) revealed that the expected change in meaning was smaller for the Republican semantic spaces than for the Democratic spaces (t = 14.88, p < .001).
This suggests that although each party tended to use the same words more consistently within itself than across parties (see also Fig. 4), the Republican Party showed greater internal similarity, or consistency, than the Democratic Party across election time points in the organization and representation of political concepts and ideologies.
Table 2 Results of ordinary least squares (OLS) regression analysis predicting similarity differences in the political semantic spaces across time and party
Given the between-party differences over time, we were interested in whether political conceptual changes might occur at two different time scales: "macro-changes" over an extended period (i.e., from 1999 to 2016) and "micro-changes" within a short time span (e.g., the 12 months within 2016). Figure 2 shows the macro-changes (2A) and micro-changes (2B) as pairwise comparisons of the Euclidean distances between semantic spaces. The higher the bar, the more dissimilar the two semantic spaces are. For example, in Fig. 2A, the largest difference was between the Republican Party in 2000 and the Democratic Party in 2016 (i.e., REP_00 vs. DEM_16), followed by REP_08 versus DEM_16 and REP_16 versus DEM_16, indicating that the Democrats in the 2016 election year differed maximally from the early Republicans. Importantly, within the same party, the semantic spaces of REP_16 versus REP_00 were more dissimilar than those of REP_08 versus REP_00 (and similarly for the DEM spaces), suggesting divergence not only between the political parties but also within each party over the last 16 years, consistent with our regression analysis. These patterns of macro-changes are suggestive of increasingly extreme conceptual views in each party.
Such changes can also occur in the micro-change landscape, as shown in Fig. 2B for the 2016 primary elections. The figure charts the dissimilarities between the semantic spaces from October 2015 to April 2016 by contrasting the monthly semantic spaces of the prominent party candidates (Trump, Clinton, and Sanders) with those of their respective parties, showing that (a) both Sanders and Trump were highly dissimilar to their respective parties, while Clinton was more similar to the average Democrat; (b) both Clinton and Trump became more similar to their respective parties over time; and (c) these changes occurred earlier for Clinton (end of 2015) and much later for Trump (April 2016).
The semantic space models of the macro- and micro-changes reflect quite closely the general picture of the political dynamics and are consistent with public intuitions of an increasingly polarized political system in the USA. To corroborate this, we further compared the political semantic spaces with the Fyshe vectors (Fyshe et al., 2013), a semantic space based on distributional statistics from a very large corpus that likely reflects a more generic, politically neutral word usage. Figure 3 displays (a) the MDS plot of the DEM and REP spaces at two different times, against the Clinton, Trump, and Fyshe spaces, and (b) pairwise comparisons. Since we were particularly interested in how the 2016 presidential candidates compared with previous candidates, we collapsed election years 2000 and 2008 in this analysis. Both the MDS analysis and the pairwise comparisons indicate that (a) the politically charged semantic spaces were maximally different from the politically neutral semantic space of the same concepts (i.e., the Fyshe space was maximally dissimilar to the other spaces); (b) Trump's semantic space was maximally different from the other political semantic spaces (in the MDS plot) but more similar to the Fyshe space; and (c) Clinton's semantic space was more similar to the DEM spaces than to the REP spaces, and more similar to both the DEM and REP spaces than Trump's was. These model comparisons illustrate, on the one hand, the conceptual differences between partisan ideologies and the general public's concepts and, on the other, the alignment (or misalignment) between the presidential candidates and their respective parties.
Finally, to examine the specific conceptual contrasts between parties and candidates, we conducted a graph analysis of concept centrality (see Analysis and display of semantic spaces under Method). Figure 4 shows the results of this analysis on 53 key political concepts that appeared at least five times in each presidential candidate's speeches. In Fig. 4A, words are ordered and color-coded by their relative centrality, and the edges (thickness of lines) indicate how closely associated two words are across semantic spaces. In Fig. 4B the size of each word is weighted by its relative frequency of use, but the order of the words is random. Thus, Fig. 4 shows which concepts are most important to each party and which other concepts are associated with them. To further identify differences in word associations, we compared the politically central concepts (on the left side of the ring network in Fig. 4A) and their five most closely associated words (the "nearest neighbors"; see Table 3 for examples). We then compared the nearest neighbors of the associated words from Clinton's and Trump's semantic spaces with human word ratings from the University of South Florida (USF) Free Association Norms (Nelson et al., 1998; http://w3.usf.edu/FreeAssociation). The USF word associations were based on more than 6,000 human raters' responses to over 5,000 words: the associations were generated by asking each rater to write down the first strongly associated word that came to mind when presented with a cue word. Comparing the political semantic spaces with everyday human association ratings should reveal more clearly how the political concepts are differentially central to the different parties.
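The "nearest neighbor" lists used above can be computed directly from the learned vectors, as in this sketch (the function name is our own; it assumes a mapping from words to vectors, as produced by any of the semantic spaces):

```python
import math

def nearest_neighbors(word, vectors, n=5):
    """Return the n words whose vectors have the highest cosine similarity
    to `word`, i.e., the word's closest associates in the semantic space."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)
    target = vectors[word]
    sims = [(w, cos(target, v)) for w, v in vectors.items() if w != word]
    return [w for w, _ in sorted(sims, key=lambda t: -t[1])[:n]]
```

Running this on a candidate-specific space (e.g., the vectors tagged "_c" or "_t") yields the per-candidate neighbor lists that are compared with the human association norms.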
To further verify whether the USF word associations (collected from participants in 1998) would differ from word associations generated by participants today, we used Amazon's Mechanical Turk (MTurk) to collect free associations for the same 53 words (as in Fig. 4) from 324 participants (the same participants as in Study Three; see below). Similarly to the USF word-association task, participants were asked to write down the first three words that came to mind when given a word prompt.
Table 3 Examples of politically central concepts/terms and word associations in presidential candidate speeches and in non-political human ratings
Examination of the key concepts and their word associations shows very clear distinctions between the political parties and the candidates on the one hand, and between the political semantic spaces and the non-political word associations on the other. For example, for Democrats, "health-care," "education," and "family" had higher centrality, whereas for Republicans "border," "country," and "military" were the most central. Furthermore, for Clinton, "education" was most strongly associated with "women" and "family," whereas for Trump its closest associations were "money" and "Democrat"; for Clinton, "business" was related to "education" and "help," whereas for Trump it was associated with "deal" and "country" (see Table 3). These data are highly consistent with the analyses of Kievit-Kylar and Jones (2012) on Obama's versus Bush's speeches (see Introduction). Interestingly, Trump treated many key notions (including "family" and "education") as highly associated with "deal" and "business," probably reflecting his view of these concepts from a business person's perspective, whereas Clinton frequently associated many concepts with "women," perhaps from her perspective on gender and equality. In both the USF Association Norms and the MTurk workers' responses, the nearest neighbors are more "mundane" and non-politically oriented, such as "love" and "home" for "family" and "school" and "teacher" for "education." Very similar word associations were observed in the USF norms and the MTurk data, showing that human raters' associations for these terms did not differ between 1998 and 2016. Our analyses thus reflect deep-rooted conceptual differences between the candidates, between their parties, and between the political and non-political meanings of these terms, and they provide evidence for the use of free word associations to validate statistical relations captured in the semantic spaces.