Introduction

Alongside the growing levels of immigration flows experienced in European countries in recent decades, the increasing negative framing of immigrants and refugees in public discourse has become a major concern [1,2,3,4,5,6]. The media, politicians, and key social actors are often responsible for propagating misperceptions concerning the image of immigrant and refugee groups inside the host countries [2, 7,8,9,10] through the repetition and amplification of stereotyped discourse, which can foster fear and encourage hate-motivated attitudes, leading to problematic outcomes. Such misconceptions are especially timely and relevant, having played a major role in important political events, such as Brexit, the increasing support for extreme right-wing political parties, and the rising nationalism in Europe [8, 11,12,13].

Past work indicates that attitudes and biases of dominant social groups are reflected in language [9, 14,15,16,17]. Thus, by studying the discourse of such groups it is possible to observe explicit and/or implicit stereotypes and other types of social discrimination. For instance, biases may be expressed by explicitly voicing biased beliefs, e.g., “immigrants are bringing crime and terror to the host country”, or in a more subtle and implicit manner, e.g., “... se tratarán asuntos como el terrorismo internacional, la delincuencia organizada, la inmigración irregular, o esa área de libertad, seguridad, y justicia en la que se está trabajando a nivel europeo...” (“... they will discuss issues such as international terrorism, organized crime, irregular immigration or that area of freedom, security, and justice in which work is being done at a European level...”) where the irregular immigration problem is mentioned on the same level as terrorism and organized crime.

However, qualitative studies of language bias are work-intensive and often limited to small datasets or concepts. This problem is further aggravated in settings where it is necessary to examine several years of data, i.e., a diachronic analysis. The challenge in this scenario lies not only in the large amounts of data generated each day (e.g., in social media and news) but also in the need to identify potentially overlooked language nuances over time, which calls for systematic, efficient, and reusable methods for conducting discourse analysis. Fortunately, computational techniques are helpful tools to this end.

In the past decade, language models have become highly popular in the Natural Language Processing (NLP) area. Word embedding models, for instance, are powerful machine-learning-based representations of human language that allow for the quantification of relationships between words through efficient numerical operations inside the vector space, i.e., a quantitative model for representing word meaning. By identifying patterns of word association present in the training data, such models are able to quantify word meaning similarity and solve analogies, among other tasks. However, also due to this ability, the learned representations reflect social biases present in the training dataset, e.g., sexism, racism, antisemitism [2, 9, 15, 18,19,20,21]. Among the biases present in human language, there are stereotypes, a type of social bias that is present when discourse about a given group overlooks the diversity of its members and focuses only on a small set of features [22, 23].

As the literature shows, word embedding models are convenient for the analysis of stereotypes and other social biases, as embedding-based methods are useful for depicting even biases that are not directly stated in the texts  [2, 9, 14, 15, 18, 20, 24]. Due to being able to learn patterns of word associations, word embeddings are capable of encoding both implicit and explicit biases in their geometry.

In this paper, we apply embedding-based methods to investigate stereotypes related to immigrant/refugee groups and stereotypical concepts in 22 years of political discourse (1997–2018). We take a culturally diverse approach, by analyzing the discourse of four different European countries, namely Denmark, the Netherlands, Spain, and the United Kingdom. We observe how the image of immigrants changes across the years for each of the aforementioned countries by analyzing (a) changes over time in the semantic spaces of immigration-related target words and (b) embedding projections over five stereotypical frame categories of immigrants, proposed by Sánchez-Junquera et al., 2021 [22]: (i) discrimination victims, (ii) suffering victims, (iii) economic resource, (iv) collective threat, and (v) personal threat. Then, we examine the effects of sociopolitical variables, such as the number of offences reported in the host country and the public opinion measured by social surveys, on the stereotype measurements computed in step (b), using the Bayesian multilevel framework. Finally, we reflect on the prospects and challenges of using word embedding methods for studying immigrant and refugee stereotypes in multilingual settings. Our contributions are focused on the immigrant and refugee stereotypical bias analysis including non-English data sources (Danish, Dutch, and Spanish) in a multilingual and diachronic setting as well as the interdisciplinarity with social sciences and survey research.

Our findings indicate that the aforementioned outgroups, i.e., immigrants and refugees, are associated with the aforementioned stereotypical categories and that the immigrant group is more strongly associated with the stereotypical frames than the refugee groups, especially in the case of the collective and personal threat categories. Furthermore, we show that the analysis of word embeddings was capable of detecting certain events, e.g., the British Windrush scandal, and the Kosovo conflict.

This paper is organized as follows. First, we introduce the fundamental concepts for the understanding of this work in Sect. 2. Then, in Sect. 3 we discuss related work. Subsequently, in Sect. 4 we present our hypotheses and method, encompassing data, metrics, model training and evaluation, and statistical frameworks. Our findings are presented in Sect. 5, followed by a discussion of results, challenges, and limitations in Sect. 6. Finally, in Sect. 7 we present our conclusions and future work.

Fundamentals

In this section, we define some concepts that are fundamental for both delimiting the scope and facilitating the reader’s understanding of this work. We start by introducing social bias. Then, we briefly explain how word embeddings are capable of encoding social biases in their geometry.

Social bias and stereotypes

According to Mummendey and Wenzel, 1999 [25], social discrimination is “... an ingroup’s subjectively justified unequal, usually disadvantageous, evaluation or treatment of an outgroup, that the latter (or an outside observer) would deem unjustified.” Although one would usually think only of adverse biases when talking about social discrimination, intergroup bias is not necessarily negative in nature; it can also be positive [26,27,28]. Regardless of being negative or positive, social theory states that biases arise from the process of identifying with a social group and trying to distinguish it positively from other social groups, thus creating a source of increased self-worth and an “us-and-them” duality [28]. Several types of biases can be observed in human languages, and among them there is the stereotype, a kind of social bias that can be observed when discourse is focused on a set of beliefs about the characteristics of a given group [29], thus ignoring the diversity of its members.

Social scientists and psychologists have been studying both explicit and implicit forms of biases imprinted in language for many years, as a way of investigating patterns of social stereotyping and discrimination. One way of measuring biases is through the use of surveys. Survey projects, such as the European Social Survey (ESS) [30] or the European Values Study (EVS) (Footnote 1), aim to measure respondents’ attitudes in relevant social domains (e.g., immigration, politics, climate change, social trust) through the administration of standardized and structured questionnaires to representative population samples across countries. In the questionnaires, the respondents are presented with opinion statements, for instance “Would you say it is generally bad or good for [country]’s economy that people come to live here from other countries?”. Then, respondents are asked to evaluate the statement on a scale, e.g., from 0 (bad for the economy) to 10 (good for the economy). Although negative sentiment towards immigrants and refugees can be significantly masked and under-reported in opinion surveys [1, 31,32,33,34], this method has been applied to monitor anti-immigrant perceptions in many countries across the years.

Another well-known method for quantifying social biases is the Implicit Association Test (IAT) [35], which aims to measure biases by analyzing the association between certain categories and attributes. When taking the IAT, the participants are prompted to quickly pair attributes (e.g., peaceful or violent, pleasant or unpleasant) with categories (e.g., immigrants and locals, Catholics and Muslims) by similarity. The test works under the assumption that there are large differences in response times when subjects are asked to pair two given concepts they find similar, in contrast to two concepts they find different [15]. Therefore, the IAT was often used to measure human biases and stereotypes, and later inspired a method for measuring biases in word embeddings: the Word Embeddings Association Test (WEAT) introduced by Caliskan et al. [15]. Both IAT and WEAT use two lists of target words, i.e., the categories, and two lists of attributes to analyze the strength of associations between concepts, or groups (e.g., women, immigrants) and attributes (e.g., good or bad) or stereotypes (e.g., safe or dangerous).
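To make the structure of such a test concrete, the following is a minimal sketch of a WEAT-style effect size, assuming cosine similarity as the association measure and the sample standard deviation for normalisation. The function names and the two-dimensional toy vectors are hypothetical and serve only as an illustration; they are not taken from any trained model.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to attribute set A minus that to attribute set B."""
    return (statistics.mean([cosine(w, a) for a in attrs_a])
            - statistics.mean([cosine(w, b) for b in attrs_b]))


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of the mean associations of X and Y, normalised by the spread over all targets."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Assumption: sample standard deviation; implementations differ on this choice.
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Toy 2-d vectors (illustrative only).
targets_x = [[1.0, 0.1], [0.9, 0.2]]   # first target list, close to attribute A
targets_y = [[0.1, 1.0], [0.2, 0.9]]   # second target list, close to attribute B
attrs_a = [[1.0, 0.0]]
attrs_b = [[0.0, 1.0]]

effect = weat_effect_size(targets_x, targets_y, attrs_a, attrs_b)
print(round(effect, 3))
```

A positive value indicates that the first target list is more strongly associated with the first attribute list than the second target list is; swapping the two target lists flips the sign.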

However, the aforementioned approaches need highly standardized measurement instruments and a minimally controlled environment to be applied. Furthermore, there are contexts where stereotypes are presented in more subtle and strategic ways, e.g., in political discourse, where explicit judgment of the traits (e.g., competence, integrity) of migrant groups is unlikely to be found. In this scenario, the use of stereotypes also assumes the function of shaping the attitudes and opinions of the general public and even influencing certain political outcomes [36,37,38,39,40]. For instance, the statement “En lo que va de año han llegado a Canarias más de 3500 personas en pateras.” (“So far this year, more than 3500 people have arrived in the Canary Islands in boats.”) does not explicitly frame immigrants as a threat, but it implicitly raises concerns about large numbers of “illegal immigrants/refugees” disorderly entering the country.

In this work, we focus on the study of stereotypes concerning immigrants and refugees. We are especially interested in analyzing specific stereotypical frames that are commonly applied in the political debate about asylum-seeking and immigration, such as the association between immigrant groups and criminality in the host country.

Measuring biases with word embeddings

When talking about biases in the context of algorithms, according to Friedman and Nissenbaum, 1996 [41] three types of biases should be taken into account: preexisting, technical, and emergent. While technical (e.g., algorithm overfit) and emergent (e.g., bias measured in extrinsic task evaluations) are also problems in NLP models, in this work we focus on the first kind, i.e., the preexisting bias, which concerns the social bias that is encompassed in the text used to train the models.

Preexisting bias exists in texts due to the nature of language, i.e., members of dominant social groups either implicitly or explicitly propagate stereotypes and biases in the language they use when talking about certain outgroups, such as immigrants and refugees. In the case of political discourse, rather than unintended bias occurrences, in most cases, the stereotypes are inserted or even designed in the narrative in a deliberate way that allows politicians to construct a frame useful for shaping public opinion [14, 42]. For this reason, it is often difficult to observe explicit bias in political discourse, and methods for uncovering implicit connections are necessary.

Word embedding analysis is a useful method for investigating the implicit connections in human language, since word embeddings learn to represent word meanings by observing the contexts in which the words appear. For instance, if in a given dataset there are many instances of sentences similar to “The majority of illegal immigrants to Italy come from countries such as Nigeria, Ghana, and Senegal where the drivers for emigration tend to be more economic rather than fear of persecution.”, the training algorithm will then learn relations between the words “immigrants” and “illegal” since they often co-occur. Moreover, embeddings do not simply represent word co-occurrence, but rather they depict the relations of each word to every other in the training dataset [17]. In other words, if the model learns that “immigrants” and “refugees” are used in similar contexts, then their word vector representations will be similar and the word “refugees” will also be associated with “illegal”, even if “refugees” does not co-occur explicitly with “illegal” in the training data.
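This indirect association can be illustrated with a minimal sketch that uses plain sentence-level co-occurrence counts as a simplified stand-in for trained embeddings. The five-sentence corpus and the function name `context_vectors` are hypothetical; the point is only that two words can obtain similar vectors without ever co-occurring.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: "refugees" never appears in a sentence with "illegal".
sentences = [
    "illegal immigrants cross the border",
    "illegal immigrants arrive by boat",
    "refugees cross the border",
    "refugees arrive by boat",
    "parliament approves the budget",
]


def context_vectors(corpus):
    """Map each word to a Counter of the other words in its sentences."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)


vectors = context_vectors(sentences)

direct_cooccurrence = vectors["refugees"]["illegal"]          # first-order count
sim_illegal = cosine(vectors["refugees"], vectors["illegal"])  # shared contexts
sim_budget = cosine(vectors["refugees"], vectors["budget"])    # unrelated word

print(direct_cooccurrence, round(sim_illegal, 3), round(sim_budget, 3))
```

Although the direct co-occurrence count is zero, “refugees” ends up closer to “illegal” than to the unrelated word “budget”, because it shares contexts with “immigrants”. Trained embedding models capture this kind of second-order relation in a dense and more robust form.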

An example of bias imprinted in the word embedding geometry is represented in Fig. 1. These graph networks depict the 20 nearest neighbors of the words “immigrants” and “immigranten” (“immigrants”) computed using our word embedding models for the year 2001. It is possible to observe that in both the Dutch and the English datasets these words are strongly linked to the concept of illegality and trafficking (e.g., illegal, traffickers, mensensmokkelaars and clandestiene in the Dutch dataset). Here, it is important to point out that “illegal” is not simply a term to describe the administrative condition of migrants, i.e., lacking adequate documentation to authorize their presence in the host country, but rather that illegality implies criminality and thus this term confers the criminal status to all individuals that could end up in an irregular situation due to a myriad of reasons [43]. That is, it not only oversimplifies a complex situation but also invokes a negative frame that could influence public opinion.

Fig. 1. The 20 nearest neighbors of the words “immigrants” and “immigranten”
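As an illustration of how such neighbor lists can be obtained, the following minimal sketch ranks all vocabulary words by cosine similarity to a query word. The three-dimensional vectors and the helper `nearest_neighbors` are hypothetical and are not taken from the trained models; in practice the same ranking would be computed over the full vocabulary of a yearly model with k = 20.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(word, embeddings, k):
    """Return the k words whose vectors are most similar to the vector of `word`."""
    query = embeddings[word]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Made-up toy vectors, for illustration only.
toy_embeddings = {
    "immigrants": [0.9, 0.1, 0.0],
    "illegal": [0.8, 0.2, 0.1],
    "traffickers": [0.7, 0.1, 0.3],
    "budget": [0.0, 0.9, 0.2],
    "fisheries": [0.1, 0.2, 0.9],
}

neighbors = nearest_neighbors("immigrants", toy_embeddings, k=2)
print(neighbors)
```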

Related Work

The study of biases in human language through embedding methods became popular with gender-bias studies [18, 19, 44,45,46]. Then, in the following years, the NLP research community started exploring other types of social discrimination, such as ethnic, age, and religious bias, also expanding the frameworks of analysis from time-invariant to diachronic [14, 20, 47,48,49,50,51,52]. However, as often happens in the NLP area, most works were conducted using English as a target language. Nonetheless, biases exist in all human languages, as well as in many shapes, which calls for research on other target languages and types of biases.

Wevers, 2019 [24] quantified gender biases in six Dutch newspapers categorized ideologically as liberal, social-democratic, neutral/conservative, Protestant, and Catholic, spanning 40 years of data. They computed the strength of association between group vectors representing the female and male gender spaces and a list of target words. The results show gender bias towards women and changes concerning the measured biases within and between newspapers over time. Tripodi et al., 2019 [9] investigated antisemitism in public discourse in France using diachronic word embeddings trained on a large corpus of French books and periodicals containing keywords related to Jews. Computing the local changes of Jewish-related target words over time and embedding projections, they tracked the dynamics of antisemitic bias in the religious, economic, sociopolitical, racial, ethnic, and conspiratorial domains. They showed that their embedding method was useful to observe the social discrimination patterns against Jews previously described by historians. Lauscher et al., 2020 [21] conducted an analysis concerning racism and sexism-related biases in Arabic word embeddings across different types of embedding models and texts (e.g., user-generated content, news), dialects, and time. They applied different tests for measuring biases in word embeddings and found that the bias steadily increased over time for their period of analysis (2007 to 2017).

Kroon et al., 2020 [2] quantified the dynamics of stereotypical associations towards different outgroup nationalities (e.g., Moroccan, Somali, Afghani, Belgian, German) concerning low-status and high-threat concepts in 11 years of Dutch news data. The authors investigated both time-invariant and time-variant hypotheses, focusing on the difference in the strength of associations regarding the group membership, i.e., ingroups such as Dutch and German versus outgroups such as Moroccan and Somali. The authors found strong associations with the outgroups, which increased throughout the years of analysis. Moreover, by using sociopolitical variables and panel data analysis, e.g., the size of the outgroup population and criminality rates, their results indicated that the media narrative concerning such outgroups is dissociated from real demographic trends. Sánchez-Junquera et al., 2021 [22] detected stereotypes towards immigrants in political discourse by focusing on the narrative scenarios, i.e., the frames, used by political actors. They created their own taxonomy to capture immigrant stereotype dimensions, which is adopted in this work. Then, using the aforementioned taxonomy, they produced an annotated dataset with sentences that Spanish politicians have stated in the Congress of Deputies. This dataset was used to train classifiers to automatically detect stereotypes and distinguish between the stereotype categories proposed by the authors.

Chulvi et al., 2023 [37] analyzed immigrant stereotypical framing in the Spanish Parliament for the period of 1996–2016 through the construction of linguistic indices. The authors studied 2,516 interventions about immigration delivered by representatives of the two political parties that alternated in power during that period (conservative Popular Party and Socialist Party). The study shows that both the rhetorical strategy to present immigrants as victims or as a threat and the language style that politicians employ reveal an interaction between the ideology of the party and the party’s political position in government or in the opposition.

Moreover, other recent works [53, 54] investigate how stereotypes and prejudice against immigrants, among other targets, are often conveyed in social media using irony or humor, which are subtle strategies for spreading prejudice and perpetuating stereotypes because they evade moral judgment and justify discriminatory acts.

The literature concerning bias detection in multilingual settings is still scarce and recent, as such a scenario poses greater challenges than monolingual ones, such as the coherence of word meanings across different languages. Câmara et al., 2022 [55] quantified gender, racial, ethnic, and intersectional social biases across five models trained on sentiment analysis tasks in English, Spanish, and Arabic. Ahn and Oh, 2021 [56] verified the existence of ethnic biases in monolingual BERT models for English, German, Spanish, Korean, Turkish, and Chinese, while proposing a new multi-class bias measure to quantify the degree of ethnic bias in such language models. Further, they proposed two bias mitigation methods using multilingual and word alignment approaches. Névéol et al., 2022 [57] contributed to the analysis of multilingual stereotypes by creating an English and French dataset (Footnote 2) that enables the comparison across such languages, while also characterizing biases that are specific to each country (United States and France) and language. Their dataset addresses ethnic, gender, sexual orientation, nationality, and age biases, among others. Afterward, the authors used their dataset to quantify stereotypes in three French and one multilingual language model.

Our study distinguishes itself from the aforementioned studies by (i) the interdisciplinarity with social sciences and survey research, as the selected survey questions measure attitudes of the ingroups towards immigrant groups and can be interpreted as a proxy for cultural/economic threat perception; (ii) the study, selection, and processing of specific words for analyzing immigration stereotypes across 4 different languages; and (iii) the distinction between immigrant and refugee groups in our analysis. Additionally, we contribute to the scarce literature on stereotypical bias analysis with non-English data sources (Danish, Dutch, and Spanish), multilingual, and diachronic settings.

Method

In this work, we apply word embedding-based methods for quantifying social stereotyping toward immigrants and refugees in the political discourse of Denmark, the Netherlands, Spain, and the United Kingdom over time (1997–2018). We justify our choice of target countries/languages according to the following factors: (i) contrasts and similarities between the countries, as well as shifts of political stances concerning migration within them over time; (ii) occurrence of meaningful events that shaped the debate along with the image of immigration and asylum-seeking in these countries; (iii) size of available parliamentary datasets including the target languages for analysis; and (iv) familiarity of the authors with the target languages.

Concerning aspects (i) and (ii), the United Kingdom, for instance, has experienced debates and policy changes regarding migration, notably in the context of Brexit, as the referendum in 2016 to leave the European Union (EU) was influenced by concerns about immigration [58, 59]. Since the 1960s, the United Kingdom’s immigration and asylum policies became progressively restrictive [60,61,62], especially in the period of 2010–2015 during the Conservative-Liberal Democrat coalition government and later the Conservative government which included measures to reduce net migration, tighten asylum procedures, and limit benefits and access to public services for immigrants [63].

Likewise, the Netherlands and Denmark experienced growing negative framing of migrants and restraining immigration/asylum-seeking policies. The Netherlands tightened immigration/asylum policies in the late 1990s and early 2000s as the political landscape saw a move towards right-leaning parties [64] and particularly after the 2002 elections, marked by the assassination of the populist politician Pim Fortuyn [65]. The changes included modifications in requirements for family reunification, integration exams, and policies for encouraging skilled migration while discouraging low-skilled migration. The Netherlands also changed its view of integration, as earlier policies advocated for cultural diversity and encouraged migrants to retain their own cultural identity, but recent ones focused on Dutch culture assimilation, and slogans such as “multiculturalism has failed” became common in the political sphere [65,66,67]. In Denmark, immigration and asylum-seeking were framed as relatively minor political issues during the 1980s, but the stance and rhetoric radically changed during the 1990s and continued through the following decades [68,69,70]. The Danish approach to the integration of newcomers also changed. While in the 1990s Denmark was well-known in terms of granting its citizens equal opportunities and respecting the cultural and religious differences of minority groups, from 2006 onward the focus switched to what Danish society should demand from migrants, culturally and economically [69]. Throughout the years, both countries adopted measures concerning language proficiency, mandatory culture courses, integration exams, strict requirements for family reunification and permanent residency, as well as decreased social benefits for migrants.

Spain, in contrast to the aforementioned countries, had primarily been an emigration country until the mid-1980s [71], and it was not until the 1990s, and especially the mid-90s, that the migrant inflow became relevant [72]. Nonetheless, the increase in immigrant/refugee influxes did not lead to significant public and political backlash [73]. Notably, up to the early 2000s immigration was seldom framed as an issue and Spain’s immigration/asylum-seeking policies included initiatives such as regularization programs for undocumented immigrants, improving access to welfare benefits, integration, and social inclusion. This scenario changed around 2005 when irregular migration became a hot topic and was frequently broadcasted in the media [74] as well as brought up in the rhetoric of the parties due to electoral political competition [75, 76].

Due to the aforementioned factors and political contexts, we believe Denmark, the Netherlands, Spain, and the United Kingdom are interesting case studies to investigate the dynamics of stereotypical associations towards immigrant and refugee groups over time.

To study the domain of political discourse about immigration, we selected the Danish, Dutch, English, and Spanish portions of four multilingual and diachronic parliamentary corpora, namely Europarl, Parlspeech V2, ParlaMint and the Digital Corpus of the European Parliament (DCEP), while to handle the analysis of large datasets and uncover both implicit and explicit patterns of word associations, we employ the representation of texts through word embeddings. We provide further information about the selected corpora and embedding models in Subsection 4.1.

To verify the association between immigrant and refugee groups and stereotypical frames, we adopt the social psychology grounded categories proposed by Sánchez-Junquera et al., 2021 [22]: “We found that in public discourse immigrants could be presented as (i) equals to the majority but the target of xenophobia (i.e., must have same rights and same duties but are discriminated), (ii) victims (e.g., people suffering from poverty or exploitation), (iii) an economic resource (i.e., workers that contribute to economic development), (iv) a threat for the group (i.e., cause of disorder because they are illegal and too many and introduce unbalances in our societies), or (v) a threat for the individual (i.e., a competitor for limited resources or a danger to personal welfare and safety).” Based on the aforementioned categories, we both analyze (a) the changes over time in the semantic spaces of immigrant and refugee target words and (b) perform embedding projections over a set of words that represent such stereotypical frames. While step (a) will give us a sense of how the context surrounding immigrant/refugee target words changes across the years (e.g., through analysis of the target words’ nearest neighbors), step (b) allows us to quantify the strength of association between the target words and the stereotypical frames.
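Step (b) can be illustrated with a minimal sketch that scores the association between a target word and a frame as the cosine similarity between the target vector and the centroid of the frame word vectors, computed once per yearly model. This is only one possible way to operationalize such a projection and is an assumption of this example; the helper names, the frame word list, and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def frame_association(model, target, frame_words):
    """Similarity between a target word and the centroid of the frame words found in the model."""
    frame_vecs = [model[w] for w in frame_words if w in model]
    if target not in model or not frame_vecs:
        return None  # target or frame vocabulary missing in this yearly model
    return cosine(model[target], centroid(frame_vecs))


# Hypothetical yearly models: word -> vector (made-up values).
yearly_models = {
    1997: {"immigrants": [1.0, 0.0], "illegal": [0.0, 1.0], "disorder": [0.2, 1.0]},
    1998: {"immigrants": [0.6, 0.8], "illegal": [0.0, 1.0], "disorder": [0.2, 1.0]},
}
collective_threat = ["illegal", "disorder"]  # illustrative frame word list

scores = {
    year: frame_association(model, "immigrants", collective_threat)
    for year, model in sorted(yearly_models.items())
}
for year, score in scores.items():
    print(year, round(score, 3))
```

In this toy setting the score rises from 1997 to 1998 because the target vector moves toward the frame words, which is the kind of change over time that a diachronic projection is meant to expose.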

In this work, we distinguish between immigrants and refugees in our analysis, aiming to assess differences in the representation and stereotypical associations concerning these groups. Migrant categories such as “immigrants” and “refugees” are often conflated in political discourse; nonetheless, they theoretically refer to distinct groups of people and motives for immigration and therefore may inspire different preferences in public opinion [77]. For instance, previous work indicates that some European countries display more positive attitudes toward refugee groups, because these groups are seen as having legitimate reasons for their immigration, when compared to groups perceived as “economic migrants” [77,78,79,80,81,82,83,84,85]. Thus, our first research hypothesis is that we can notice differences in the stereotypical framing of immigrants and refugees (H1).

Even though it is not possible to be completely sure that the political actors actively distinguish between immigrant and refugee groups in their discourse when applying quantitative methods, by analyzing the underlying linguistic patterns through the use of word embeddings we may be able to reduce this uncertainty.

Then, we investigate the strength of association between immigrant/refugee groups and stereotypical frames in a multilingual context. That is, other than analyzing the stereotypical associations for each of the countries individually, we are interested in seeing if cross-national patterns of social discrimination emerge. Although stereotype and bias formation are highly influenced by culture, we believe that certain sociopolitical processes, e.g., the refugee crisis triggered by the Syrian civil war [86,87,88], or the rise of far-right parties in Europe in recent years [89,90,91,92,93], can spark the use of social discrimination frames in public discourse. Although the four countries have distinct histories and approaches to handling migration, all political parties make use of frames to invoke specific mental representations of immigrants/refugees, especially in recent years, since the topics of immigration/asylum-seeking and integration issues have become politicised [64, 86, 94, 95]. For instance, in view of events such as the refugee crisis, countries with far-right parties in power frequently center their discourse on framing immigrants/refugees as a threat, also stressing the need to secure external borders. On the other hand, center and/or left-oriented parties may be more inclined to adhere to the victimization frames and address the topic as a humanitarian emergency. Therefore, our second hypothesis is that we can observe cross-national patterns in the stereotypical framing of immigrant and refugee groups across the different European countries selected in this study (H2).

As much discourse theory and research into cross-lingual text analysis argue, the social and political context is central to the meaning of discourse [96,97,98]. Consequently, there are country-specific variables that influence stereotypes and each country has its own political history with migrant groups of certain backgrounds [10, 99,100,101]. That is, what might be a stereotype in a given culture might not be relevant in another [102], and furthermore, there are stereotypical words that are context-specific, e.g., “Moros” in Spanish, or “Perker” in Danish (Footnote 3). Although such explicit and derogatory words most probably will not be present in political discourse, we allow for local occurrences to come forward and observe words that refer to specific immigrant/refugee backgrounds.

Finally, certain sociopolitical indicators, such as unemployment and criminality rates or the outgroup influx, could be relevant to indicate changes in public perception and discourse about immigrants/refugees [103,104,105,106,107], even though the link between immigration/asylum-seeking and for instance, increase in crime numbers, is not necessarily observed in reality [108, 109]. We aim to examine the effect of sociopolitical indicators that are relevant to the context of attitudes towards immigrants/refugees in our stereotype measurements, thereby allowing both for a comprehensive comparative and a more context-specific analysis, enriching the findings in general. Hence, our third hypothesis is that sociopolitical indicators such as the GDP of the host country, criminality, and unemployment numbers will have an effect on the stereotype measurements (H3).

To assess the aforementioned hypotheses, we adopt the data, metrics, and models described in this section.

Data

To train our word embedding models we combine the Danish, Dutch, English, and Spanish portions of the following parliamentary corpora: (i) Europarl [110] (release 7)Footnote 4; (ii) Parlspeech V2 [111]; (iii) ParlaMint [112]; and (iv) IM-PRESS/PRESS, Written Question, Written Question Answer, Oral Question and Questions for Question Time portionsFootnote 5 of the Digital Corpus of the European Parliament (DCEP) [113].

We merged the texts coming from the 4 aforementioned corpora into language-specific datasets. Then, we split our language-specific datasets by year and preprocessed the data by removing all punctuation except for apostrophes and hyphens, lowercasing all words, removing URIs, and concatenating some expressions of interest for our analysis (e.g., “people_trafficking”, “organised_crime”, “illegal_work”). The number of tokens per year and language after the preprocessing phase is depicted in Figs. 2 and 3.
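The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the URI pattern, and the short expression list are assumptions, and URIs are removed before punctuation here simply so that they are still recognizable when matched.

```python
import re
import string

# Hypothetical subset of multi-word expressions; the full list is not reproduced here.
EXPRESSIONS = ["people trafficking", "organised crime", "illegal work"]

# Assumed URI pattern (http/https/www prefixes only).
URI_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

# All ASCII punctuation except apostrophes and hyphens is mapped to a space.
PUNCT_TO_REMOVE = "".join(c for c in string.punctuation if c not in "'-")
PUNCT_TABLE = str.maketrans(PUNCT_TO_REMOVE, " " * len(PUNCT_TO_REMOVE))


def preprocess(text, expressions=EXPRESSIONS):
    """Return a list of tokens after URI removal, lowercasing,
    punctuation removal, and multi-word expression concatenation."""
    text = URI_PATTERN.sub(" ", text)
    text = text.lower()
    text = text.translate(PUNCT_TABLE)
    text = " ".join(text.split())  # collapse repeated whitespace
    for expr in expressions:
        pattern = r"\b" + re.escape(expr) + r"\b"
        text = re.sub(pattern, expr.replace(" ", "_"), text)
    return text.split()
```

For example, `preprocess("Organised crime, see http://example.org/x! It's well-known.")` yields the tokens `organised_crime`, `see`, `it's`, and `well-known`.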

Fig. 2 Number of tokens per year in the Spanish and English datasets

Fig. 3 Number of tokens per year in the Danish and Dutch datasets

Finally, we use the yearly datasets to train different word embedding models, resulting in 88 models (4 languages times 22 years).

Defining multilingual lists

To quantify the associations between immigrants and refugees and stereotypical frames, it is crucial to ensure that the words chosen to represent such frames are adequate and that we can maintain the meaning equivalence across languages. Our initial word list to describe such concepts was constructed based on the multilingual European Migration Network (EMN) glossary, which contains approximately 500 terms and concepts reflecting the most recent European policy on migration and asylum.Footnote 6

We manually created our initial set of words using the vocabulary of the aforementioned glossary, taking into account the term entries and descriptions. We selected words that fit in the following topics: security and threat perception, poverty, employment conditions, social welfare, social acceptance and integration, anti-immigrant sentiment, migratory movements, exploitation of vulnerable groups and trafficking, social trust, documentation/authorization to reside in the host country, hosting and reception conditions, and perception of outgroup size. Then, we consulted with native speakers to verify the appropriateness of the selected initial subset, provide translations (using the English terms as source), and expand the list if deemed necessary.

Most of the words selected through this process were used exclusively for preprocessing the dataset, while others were also used to quantify the strength of association with the five stereotypical categories. Preprocessing the datasets to concatenate multi-word expressions of interest for our analysis (e.g., “organized crime” becomes “organized_crime”) was a crucial step, since the unit of representation of the embedding models is the word, i.e., multi-word expressions are not automatically recognized and treated as a single unit. Having the multi-word expressions of interest represented as a unit is especially important for the analysis concerning the local changes in the semantic space (Sect. 5.1).

For the words used to measure the association with the five stereotypical categories, we additionally queried our yearly datasets to check the term frequencies across all languages. We opted for using terms that had high frequencies in all years of the language-specific datasets.

Sociopolitical data

To build an indicator of social threat perception, we use the mean score of three survey items from the European Social Survey (ESS) [30] rounds 1 to 9 (2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016 and 2018).Footnote 7 Each survey was responded to by at least 1500 people (per country). We used the Danish, Dutch, English, and Spanish respondent’s answers on 11-point scales to the following questions: (i) “Is [country] made a worse or a better place to live by people coming to live here from other countries?” (imwbcnt variable); (ii) “Would you say that [country]’s cultural life is generally undermined or enriched by people coming to live here from other countries?” (imueclt variable) and; (iii) “Would you say it is generally bad or good for [country]’s economy that people come to live here from other countries?” (imbgeco variable). The indicator of social threat perception has the role of representing attitudinal data in the analysis, or in other words, it indicates if the measured stereotype is also a reflection of the ingroup perceptions of immigrant/refugee groups.

The missing data points were imputed using Amelia II [114], a software package for multiple imputation of multivariate incomplete data, which combines bootstrapping with the expectation-maximization (EM) algorithm as a data imputation strategy and was specifically created to handle incomplete Political Science datasets.

We weighted the respondents' answers using the design weights multiplied by the population weights provided by the ESS, which corrects for selection bias. We also removed survey respondent entries when they corresponded to special answer categories, namely “77 - Refusal”, “88 - Don’t know”, and “99 - No answer”. The percentage of special answer category entries over the total dataset size per year, language, and variable name is shown in Appendix A.
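As an illustration of how such an indicator could be computed, the sketch below takes the weighted mean, over respondents, of each respondent's mean answer to the three items. It rests on several assumptions: the weight field names `dweight` and `pweight` are taken to be the ESS design and population weights, a respondent is dropped entirely if any of the three answers is a special category, and the function name is hypothetical.

```python
SPECIAL_CODES = {77, 88, 99}  # Refusal, Don't know, No answer
ITEMS = ("imwbcnt", "imueclt", "imbgeco")


def threat_indicator(respondents):
    """Weighted mean of the per-respondent mean score over the three items.

    Each respondent is a dict with the three item answers plus the
    assumed weight fields 'dweight' and 'pweight'.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r in respondents:
        answers = [r[item] for item in ITEMS]
        # Assumption: skip the respondent if any answer is a special category.
        if any(a in SPECIAL_CODES for a in answers):
            continue
        weight = r["dweight"] * r["pweight"]  # design times population weight
        weighted_sum += weight * sum(answers) / len(answers)
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no valid respondents")
    return weighted_sum / weight_total
```

Whether special answers are dropped per item or per respondent changes the result slightly; the per-respondent choice here is only one possible reading.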

For the remaining indicators, we use the following country-specific time series from the Eurostat,Footnote 8 the Organisation for Economic Co-operation and Development (OECD)Footnote 9 and the World Development Indicators (WDI)Footnote 10 databases: (i) Immigration by age and sex (Eurostat); (ii) “Refugee population by country or territory of asylum” (WDI); (iii) Unemployment by sex and age (Eurostat); (iv) Offences recorded by the police by offence category (Eurostat); (v) Gross domestic product (GDP) per capita (OECD) and; (vi) Aid disbursements to countries and regions - humanitarian aid destined for developing countries (OECD). Such datasets are publicly available.

In the case of the offences indicator, it was necessary to merge two datasets (CRIM_GEN and CRIM_OFF_CAT), since the first has records of historical data (1993–2007) and the latter from 2007 until the present. To maintain consistency between the two datasets, we kept only the International Classification of Crime for Statistical Purposes (ICCS) categories that were present in both datasets. Namely, the included categories and their respective ICCS codes are: Burglary of private residential premises (ICCS05012), Intentional homicide (ICCS0101), Robbery (ICCS0401), and Unlawful acts involving controlled drugs or precursors (ICCS0601).

Except for the ESS data, there were very few instances of missing data points in the aforementioned datasets, namely: the number of immigrants in the year 2005 for the United Kingdom, and the number of homicides committed (ICCS0101 category) in the Netherlands for the years 2010, 2011, and 2012. Such missing data points were also imputed using Amelia II.

Models

Our analysis encompasses both word embedding and statistical models. While the word embedding models are our main object of analysis in this paper, the statistical models allow us to examine the effect of sociopolitical indicators in the time series composed of the yearly stereotype measurements. In this section, we provide details about the embedding training and evaluation, as well as the specification of the statistical models.

Word embedding models

Using the language-specific datasets filtered by year, we trained 300-dimensional fastText skip-gram embedding models using a context window of 6 words on both sides and \(n\)-grams of size 2. Only words that appeared at least 10 times in each yearly dataset were considered in the training phase, and the resulting word vectors were \(L_2\) normalized.

We evaluate the quality of the Dutch, English, and Spanish embeddings using generic word similarity benchmarks originally in English and then extended to other languages, namely the Miller & Charles (MC-30), Rubenstein & Goodenough (RG-65), and WordSimilarity 353 (WS-353) benchmarks provided by Barzegar et al. [115]. For the Danish models, we use only the WS-353 benchmark,Footnote 11 since, to the best of our knowledge, there are no Danish translations of the other aforementioned benchmarks. We provide the mean accuracy of the embedding models per language and evaluation benchmark in Appendix B.

Statistical models

To explore the relationship between wealth (measured as GDP per capita), criminality, unemployment, immigrant/refugee group size in the host country, humanitarian aid destined for developing countries, public opinion (measured by the ESS), and stereotypical associations, we use the Bayesian multilevel modeling framework. A multilevel model is an extension of a regression, in which data is structured in groups and coefficients can vary by group [116] and it is helpful for scenarios where there is some dependency in the data, i.e., correlations that arise from the observations being clustered in some way.

We consider the Bayesian model an appropriate choice for this analysis since it takes into account the pooled structure of our data and allows accounting for both group effects and error correlation. For all the five stereotype categories taken into account in this work, we operationalize the dependent variable stereotype association as described in Eq. 1:

$$\begin{aligned} \begin{aligned} stereotype&= (\beta _0 + b_{0,country}) + (\beta _1 + b_{1,country})\,year \\&\quad + \beta _2 ESS + \beta _3 offences + \beta _4 size + \beta _5 GDP + \beta _6 unemp \\&\quad + \beta _7 aid + \beta _8 immigrant + \beta _9 year + \epsilon \end{aligned} \end{aligned}$$
(1)

where size is the size of the immigrant/refugee groups, unemp is the unemployment numbers, aid is the humanitarian aid destined to developing countries, \(\epsilon\) is the random error term, and immigrant is a dummy variable whose value is 0 when representing the refugee group and 1 when representing the immigrant group.

The \(\beta\) coefficients represent the fixed, or population-level, effects, which apply to all observations in the data. On the other hand, the b coefficients represent the random effects, which concern the variations within sub-populations, like country and year. By adding the \((\beta _0 + b_{0,country})\) and \((\beta _1 + b_{1,country})\,year\) terms we let each country have its own intercept and year slope.

Due to the limited availability of the sociopolitical indicators hereby mentioned, we restrict the time period to 2000–2018 for the analysis with the Bayesian models. Additionally, we applied a log transformation to the GDP and then standardized all predictors (except immigrant and year, which are categorical variables) per country data using the standard z-score, which applies the following transformation:

$$\begin{aligned} standardized\;value=(original\;value - mean)/standard\;deviation \end{aligned}$$
(2)

We also scale up the stereotype association by multiplying the measurements by 10. By doing the aforementioned data transformations, all variables have approximately the same scale, which then helps with model convergence and avoids performance issues during the model fit.
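The transformations described above can be sketched for a single country as follows. This is only an illustration under stated assumptions: the function names and the dictionary layout are hypothetical, and the sample standard deviation is used because the text does not say whether the sample or population version was applied.

```python
import math
import statistics


def zscore(values):
    """Standardize values as (value - mean) / standard deviation (Eq. 2)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values]


def prepare_country(predictors, stereotype):
    """Transform one country's yearly series.

    predictors: dict mapping a predictor name (e.g. 'GDP', 'unemp') to a
    list of yearly values; categorical variables are assumed to be
    excluded by the caller.
    stereotype: list of yearly stereotype measurements.
    """
    standardized = {}
    for name, values in predictors.items():
        if name == "GDP":
            values = [math.log(v) for v in values]  # log transform before z-score
        standardized[name] = zscore(values)
    scaled_stereotype = [10 * s for s in stereotype]
    return standardized, scaled_stereotype
```

Standardizing within each country removes level differences between countries, so the coefficients describe within-country variation around each country's own mean.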

We fit one model for each of the five stereotypical categories, using fifteen thousand iterations. We provide further information about the models’ robustness in Appendix 3.

Metrics

Distributional semantic models maintain the properties of vector spaces and adopt the hypothesis that the meaning of a word is conveyed in its co-occurrences, i.e., as stated by the English linguist J. R. Firth, “You shall know a word by the company it keeps.” [117]. Therefore, to measure the similarity between two given words represented by the vectors \(v_1\) and \(v_2\) we can apply the cosine similarity of the \(L_2\) normalized vectors. As shown by Garg et al. [20], one could also apply the Euclidean distance interchangeably.
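The interchangeability of the two measures follows from the identity \(\Vert u - v\Vert ^2 = 2 - 2\cos (u, v)\), which holds for unit-length vectors. The short sketch below checks this numerically; the function names are illustrative only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """Cosine similarity, computed as the dot product of normalized vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# For L2-normalized vectors the squared distance equals 2 - 2 * cosine,
# so ranking neighbors by either measure gives the same order.
u, v = [3.0, 4.0], [4.0, 3.0]
cos_uv = cosine_similarity(u, v)
dist_uv = euclidean_distance(l2_normalize(u), l2_normalize(v))
```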

First, we analyze the changes in the semantic space of the immigrant and refugee words. The words used for each of the languages are shown in Table 1. The plural masculine forms were chosen in the Dutch, Danish, and Spanish languages due to having a higher frequency than singular/feminine inflections.

Table 1 English, Dutch, Danish, and Spanish target words used to investigate stereotypical associations concerning immigrants and refugees. *In the case of Dutch, we include the word allochtonen in some steps of the analysis as this term is widely used to refer to immigrants and their descendants in the Netherlands

To track the changes that occur in the semantic space of the aforementioned words for each of the 4 languages, we apply the local neighborhood measure introduced by Hamilton et al. [118], which quantifies the extent to which a word vector’s similarity with its nearest semantic neighbors has changed across time.

In order to calculate the local neighborhood measure, it is first necessary to compute a second-order similarity vector. We begin by computing the word \(w_{\textrm{i}}\)’s set of k nearest neighbors using the cosine similarity metric for each given year y and its subsequent year \(y+1\), designated by the ordered sets \(N_{\textrm{k}}(w^{\mathrm{(y)}}_{\textrm{i}})\) and \(N_{\textrm{k}}(w^{\mathrm{(y+1)}}_{\textrm{i}})\) respectively. In our experiments, we set \(k=50\). Then, we construct the second-order similarity vector of the word \(w_{\textrm{i}}\) for the years y and \(y+1\) using the aforementioned neighbor sets as follows:

$$\begin{aligned} s^{\mathrm{(y)}}(j)=cossim(w^{\mathrm{(y)}}_{\textrm{i}},w^{\mathrm{(y)}}_{\textrm{j}})\;|\;w_{\textrm{j}} \in N_{\textrm{k}}(w^{\mathrm{(y)}}_{\textrm{i}}) \cup N_{\textrm{k}}(w^{\mathrm{(y+1)}}_{\textrm{i}}) \end{aligned}$$
(3)

Knowing the second-order similarity vectors \(s^{\mathrm{(y)}}_{\textrm{i}}\) and \(s^{\mathrm{(y+1)}}_{\textrm{i}}\) of the word \(w_{\textrm{i}}\), we can finally calculate the cosine distance as depicted in Eq. 4:

$$\begin{aligned} distance(w_{\textrm{i}}^{\mathrm{(y)}},w_{\textrm{i}}^{\mathrm{(y+1)}})=1\;-\;cossim(s^{\mathrm{(y)}}_{\textrm{i}},s^{\mathrm{(y+1)}}_{\textrm{i}}) \end{aligned}$$
(4)

The cosine distance depicts how distant two given vectors are from each other, i.e., the closer to zero the distance is, the more similar the two vectors are.
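A compact sketch of the local neighborhood measure (Eqs. 3 and 4) is given below. It assumes that each yearly embedding is a plain dictionary mapping words to vectors, and that neighbors missing from either year's vocabulary are skipped; both are simplifications for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, emb, k):
    """The k words most similar to `word` in one yearly embedding."""
    scored = [(cosine(emb[word], vec), other)
              for other, vec in emb.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def local_neighborhood_distance(word, emb_y, emb_y1, k=50):
    """Cosine distance between the second-order similarity vectors of
    `word` in year y and year y+1."""
    union = set(nearest_neighbors(word, emb_y, k))
    union |= set(nearest_neighbors(word, emb_y1, k))
    # Assumption: keep only neighbors present in both yearly vocabularies.
    shared = sorted(w for w in union if w in emb_y and w in emb_y1)
    s_y = [cosine(emb_y[word], emb_y[w]) for w in shared]
    s_y1 = [cosine(emb_y1[word], emb_y1[w]) for w in shared]
    return 1.0 - cosine(s_y, s_y1)
```

Because only similarities computed within each year are compared, the measure does not require aligning the yearly embedding spaces with one another.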

Next, to quantify biases in the embeddings’ semantic space, we project words onto certain semantic axes [9, 15, 18]. In our case, we project the immigrant and refugee words onto the semantic axes representing the 5 different stereotype categories we use in our analysis. We operationalize a semantic axis as \(a = w_{\textrm{i}} - w_{\textrm{j}}\) and the projection of a word w onto it as the dot product \(p = w \cdot a\), where the higher the value of p, the more biased the word w is toward the semantic axis a.

We define sets of word pairs for each of the five stereotype categories to compute the bias subspaces. The sets are depicted in Table 2, where each line is a different word pair and, in each pair, one word represents a positive concept, such as “integration”, while the other represents a negative concept, like “discrimination”.

The words were defined from resources such as the vocabulary mentioned in Sect. 4.1, as well as from literature about immigration studies. We checked the frequency of each word in the yearly language-specific datasets and removed those with low frequency. Due to this restriction, words such as xenophobia (and its respective translations) were not added to the Discrimination victims category, for instance. Additionally, when defining the words we prioritized keeping the consistency of meaning among the four languages.

Table 2 Word pairs used to compute the bias subspaces for each of the five stereotype categories

After defining the word pairs that represent the stereotypical categories, we quantify the mean stereotype for all the years in our dataset using Eq. 5, where n is the number of word pairs in each stereotype category, and negative and positive are the negative (e.g., criminality, exclusion, competition) and positive (e.g., safety, integration, cooperation) words in the word pairs, respectively.

$$\begin{aligned} stereotype(w_{\textrm{i}}, category) = \frac{1}{n} \sum _{j=1}^{n}w_{\textrm{i}} \cdot (negative_{\textrm{j}} - positive_{\textrm{j}}) \end{aligned}$$
(5)

When stereotype is positive in value, it means that the group (e.g., immigrants, refugees) is more strongly associated with the negative words (e.g., criminality, exclusion, competition), whereas if the stereotype is negative in value, the group is more strongly associated with the positive words (e.g., safety, integration, cooperation). We specifically chose this method for measuring biases aiming at literature consistency, since it was used and validated in past works concerning bias measurements in word embeddings [9, 18].
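Eq. 5 can be illustrated with a few lines of code. The sketch below assumes that the target word and the pair words are already available as \(L_2\) normalized vectors of equal length; the function names and the toy vectors in the test are hypothetical and carry no empirical meaning.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def stereotype(target, pairs):
    """Mean projection of `target` onto the (negative - positive) axes.

    pairs: list of (negative_vector, positive_vector) tuples, one per
    word pair of a stereotype category.
    """
    total = 0.0
    for negative, positive in pairs:
        axis = [n - p for n, p in zip(negative, positive)]
        total += dot(target, axis)
    return total / len(pairs)
```

A positive return value indicates that the target vector lies closer to the negative words of the category, and a negative value that it lies closer to the positive ones, matching the interpretation given above.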

Finally, to quantify the similarity between the different stereotype time series and check for patterns we use Dynamic Time Warping (DTW). In short, DTW is an algorithm that measures the similarity of time series by means of finding the optimal alignment path between them, with the objective of minimizing some distance measurement between them [119]. In our case, we use Euclidean distance when computing the DTW.
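For reference, a textbook dynamic-programming formulation of DTW for two one-dimensional series is sketched below. It is not necessarily the implementation used in this work; for scalar observations the Euclidean distance between two points reduces to the absolute difference.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping cost between two numeric sequences.

    cost[i][j] holds the minimal accumulated distance needed to align
    the first i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # Euclidean distance for scalars
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a point of b
                                 cost[i][j - 1],      # repeat a point of a
                                 cost[i - 1][j - 1])  # advance both series
    return cost[n][m]
```

Two series with the same shape but shifted in time obtain a small DTW cost, which is why the measure suits comparing stereotype trends whose peaks occur in different years across countries.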

Results

In this section, we present the results derived from our study in three parts. We start by (i) quantifying the local changes in the semantic space and examining how the context, i.e., the neighborhood, of the words used to refer to both immigrants and refugees changes across the years 1997–2018. Then, we (ii) show the findings concerning the projections of the immigrant/refugee target words on the five different semantic axes that correspond to the stereotype categories adopted in this work, namely discrimination victims, suffering victims, economic resources, collective threat, and personal threat. Lastly, we (iii) analyze the effect of certain sociopolitical indicators, such as criminality and unemployment numbers, on our yearly stereotype measurements using the Bayesian multilevel analysis framework.

Local changes in the semantic space

The local changes concerning each of the target words are shown in Figs. 4, 5, 6, and 7.Footnote 12 In a nutshell, these graphs depict how much the representation of a given word, e.g., the word vector corresponding to the word refugees, changes when compared to the previous year (orange line) and the base year, 1997 (blue line). Since the word vector representations in the embedding models are linked to the context in which the corresponding words are used, such local changes give us a sense of how the context in which the political actors refer to immigrants and refugees differs across the years.

In the case of the word refugees, it is possible to observe that when comparing the vector from one year to another (orange line) the word’s context differs substantially at first, and then stabilizes with the passage of time, i.e., the cosine distance decreases. When compared to 1997 (blue line), the context distances itself from the original one with the passage of time. The same behavior can be observed for the words immigranten and allochtonen (Dutch), and indvandrere (Danish). In addition, a similar pattern emerges for the immigrants word vector, but in this case the trends are not as sharp as those of the aforementioned terms. In the case of the word indvandrere, there is a noticeable peak in the cosine distance for the years 2013–2015 when compared to the previous year.

What we notice is that, in some cases, there are increases in the difference of the context surrounding the target words around the period of 2012–2015, depicted by some peaks in the trends. The years where those peaks happen differ depending on the analyzed word vector, for instance, there is a pronounced peak in 2015 for vluchtelingen whereas for the flygtninge vector, the peaks happen in 2012–2013. This can be observed for most of the refugee target words, namely refugiados, vluchtelingen, and flygtninge. Therefore such increases could be related to the beginning of the sociopolitical process known as the refugee crisis, since this sudden flow of population had a substantial impact on domestic politics and immigration/asylum-seeking policies of most European countries [120, 121].

Fig. 4 Local neighborhood measure for the words immigrants and refugees. The blue line shows the cosine distance of the second-order vector for each year compared to 1997, while the orange line shows the cosine distance of each year compared to the preceding year

Fig. 5 Local neighborhood measure for the words inmigrantes and refugiados (Spanish). The blue line shows the cosine distance of the second-order vector for each year compared to 1997, while the orange line shows the cosine distance of each year compared to the preceding year

Fig. 6 Local neighborhood measure for the words indvandrere and flygtninge (Danish). The blue line shows the cosine distance of the second-order vector for each year compared to 1997, while the orange line shows the cosine distance of each year compared to the preceding year

Fig. 7 Local neighborhood measure for the words immigranten, allochtonen, and vluchtelingen (Dutch). The blue line shows the cosine distance of the second-order vector for each year compared to 1997, while the orange line shows the cosine distance of each year compared to the preceding year

To further investigate the local changes, we now analyze the words that were introduced as nearest neighbors of the immigrant and refugee target words for each of the four languages. Five of the nearest neighbors concerning the immigrants and refugees target words are shown in Tables 3 and 4 respectively, ordered by decreasing cosine similarity. These tables depict the new words that have been introduced in the local neighborhood of the target words when compared to the previous year. Therefore, the words in the row “2005–2006” indicate which new neighbors were introduced in the year 2006 when compared to the year 2005, for instance. Due to space limitations, we restrict the number of neighbouring words to 5 per year.
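The selection of newly introduced neighbors can be sketched as a simple filter over two ranked neighbor lists. The function below is a hypothetical helper: it assumes both lists are already ordered by decreasing cosine similarity with the target word.

```python
def new_neighbors(ranked_prev, ranked_curr, limit=5):
    """Words in the current year's ranked neighbor list that were absent
    from the previous year's list, keeping the current ranking order."""
    seen = set(ranked_prev)
    return [w for w in ranked_curr if w not in seen][:limit]
```

For instance, calling it with the 2005 and 2006 neighbor lists of a target word would produce the entries of the “2005–2006” row.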

Concerning the local neighborhood of the words indvandrere (Danish), immigranten (Dutch), immigrants (English), and inmigrantes (Spanish) depicted in Table 3, the association between immigrants and illegal acts is clearly visible. In all datasets, but especially in the case of Dutch, English, and Spanish, we notice a large number of neighbors referring to either human or drug trafficking, e.g., mensensmokkel (“people smuggling”), menneskesmuglere (“people smugglers”), tráfico_seres_humanos (“human trafficking”), drug_smuggling, and child_trafficking, and to criminality, such as delincuentes (“delinquents”), criminals, misdadige (“criminal”), or criminal organizations like organised_crime, mafias, indvandrerbander (“immigrant gangs”), and georganiseerde_criminaliteit (“organized crime”). Several forms of the word illegal (e.g., illegaal, ulovlige, ilegal, illegality) can be observed as well. Terms related to illegal working also emerge, such as illegal_working, illegale_arbejdere (“illegal workers”), illegale_arbeid (“illegal work”). It seems that the victimization frame is also used, given the presence of words related to labor exploitation/slave work, e.g., explotación_laboral (“labor exploitation”), exploitative, slaves, uitgebuit (“exploited”).

Other topics evident in the local neighborhood are the arrival of immigrants by sea and mass arrivals. Although the topic of immigrants arriving by sea borders has been present in the nearest neighbors for the English and Spanish texts since the first years of analysis, e.g., represented by words such as shores, pateraFootnote 13 and shipwrecks, it is possible to observe points in time where the topic of mass arrivals becomes relevant for all the languages. Starting in 2006, words related to mass immigration and migratory pressure, such as masseindvandring (“mass immigration”), llegada_masiva (“massive arrival”), avalanchas (“avalanches”), migratiedruk (“migratory pressure”) begin to appear. The Canary Islands are also mentioned, matching the timeline of the arrivals of more than thirty thousand immigrants in this place in the year 2006, an event known as the “Crisis de los cayucos”.

Another point in time when massive arrivals are mentioned in all the local neighborhoods is the years 2015–2017, which coincide with the sociopolitical process known as the refugee crisis. We again notice the emergence of words related to mass immigration and migratory pressure, e.g., asylpres (“asylum pressure”), flygtningekrisen (“refugee crisis”), migrant_crisis, migratiecrisis, presión_migratoria and migratiedruk (both meaning “migratory pressure”). The discourse in the Spanish dataset seems to be more focused on humanitarian aid for these years, given the presence of words such as drama_humanitario (“humanitarian drama”), ayuda_humanitaria (“humanitarian aid”), and derecho_asilo (“right of asylum”).

In the case of the Dutch neighborhood, we perceive words such as asieltsunami and asielinvasie (“asylum tsunami” and “asylum invasion”) which denote a threat framing of the migrant groups. By examining a few occurrences of these words in the 2015 Dutch dataset we find some evidence supporting this assumption, for instance “... de premier moet echter juist het nederlandse belang dienen en hij moet de nederlandse grenzen sluiten voor asielzoekers zodat de nederlandse burger wordt verlost van de voortgaande asielinvasie. Helaas gaat het kabinet echter door met het weggeven van ons land aan de massa immigrate, aan de islamisering, en aan de ongekozen bureaucraten van de Europese Unie...” (“... the prime minister must serve the Dutch interest and he must close the Dutch borders to asylum seekers so that the Dutch citizen is relieved of the ongoing asylum invasion. Unfortunately, however, the cabinet continues to give away our country to mass immigration, to islamization, and to the unelected bureaucrats of the European Union...”).

In fact, we notice a high incidence of words related to the topic of Islam in the Dutch local neighborhood in 2017, such as islamisering (“Islamization”), de-islamiseren (“de-Islamization”), niet-moslims (“non-Muslims”), and moslimterroristen (“Muslim terrorists”). The appearance of such words matches the timeline of the political scenario of the Netherlands in 2017, with the presence of a strong framing of Islam as one of the greatest issues for the country used by the founder and frontman of the radical right Party for Freedom (PVV), Geert Wilders. One of the slogans used by the PVV, immigratiestop (“Stop immigration”), is also present in the nearest neighbors of the target word immigranten in 2007. Muslims also appear in the Danish local neighborhood (muslimske) in the years 2013 and 2018. Furthermore, Moroccan and other African or Middle Eastern ethnic groups such as Somalians, Eritreans, and Kurdish people are sometimes mentioned in the neighborhoods of all four languages throughout the years. In 2013 we find an explicit instance of Moroccans being framed as a problem (marokkanenprobleem) in the Dutch local neighborhood.

Additionally, for the Danish and Dutch local neighborhoods, we noticed instances of words that referred to certain immigrant and refugee backgrounds as a monolithic group. For instance, we observed occurrences of words such as “ikke-vestlige” and “niet-westerse” (both meaning “non-western”). By analyzing the term “non-western”, one could grasp that this word does not make reference to actual geographic borders, but rather to a certain set of values (e.g., cultural and religious) that separates the Western countries from the “rest” of the world. In a similar vein, in recent years another category has become dominant in Denmark: MENAPT, referring to people from the Middle East, North Africa, Pakistan, or Turkey, that is, mainly Muslim countries. Replacing explicit references to migrant nationalities or ethnic backgrounds with a term that refers to the differences, and even the supposed incompatibility, between cultures can be interpreted as a semantic strategy for masking socially discriminatory arguments and policies [122].

Similarly, in the case of the Danish local neighborhood, we also noticed the presence of the word “nydanskere”, i.e. “new Danes”, or Danes of immigrant descent, which distinguishes citizens of Danish ethnicity from “other” Danes. The term “nydanskere”, created by a group of companiesFootnote 14 founded in 1998, originally had a positive meaning of diversity management and labor integration. However, it was then adopted by the Danish media and right-wing government, which resignified the term and associated it with that government’s agenda of defining what it means to be Danish and of framing ethnic minorities, especially those of non-western origin, as a burden to society [123]. Therefore, nydanskere became one of the politically correct labels for referring to minority Danes mainly from the Middle East and North Africa [124].

Finally, we observe some of the geographic locations that appear in the neighborhood of the “indvandrere”, “immigranten”, “immigrants”, and “inmigrantes” target words. Among the locations, we detect that the Spanish autonomous cities Ceuta and Melilla, the French island of Mayotte, and especially the Italian island of Lampedusa are mentioned in more than one language for the same year throughout the years of analysis. That is because these places played an important role in the debates about borders and irregular migration since they were considered entry points for migrants and refugees.

The isle of Lampedusa, for instance, started receiving a lot of attention after the dramatic increase in the arrivals of immigrants and refugees, especially from 2011 onward, due to the migratory influxes triggered by the conflicts related to the Arab Spring and one of the worst migrant-related tragedies, in which more than 300 people died. The increase in migratory flows was framed by some governments as an invasion and a potential threat to public order, which raised social alarm and gave way to the implementation of more restrictive migration policies and to the rise in support for populist parties in many European countries. This framing was often tactically intertwined with the one of victimization and the need for humanitarian aid as an excuse for implementing “tough-but-humane” migration management procedures [125].

The reception centers for immigrants in both Lampedusa and Mayotte were harshly criticized by the United Nations High Commissioner for Refugees (UNHCR) in 2009 due to the terrible conditions and overcrowding. Due to this situation, the reception center in Lampedusa was set on fire in both 2009 and 2011 as a form of protest. In 2011, lampedusa appears in the local neighborhoods of all four languages.

Moreover, we observe the presence of the words tarajal and mueren (“die”) in the nearest neighbors of the word inmigrantes in 2014, which matches the event known as the “Tarajal tragedy”, in which African immigrants died trying to reach the Spanish beach of El Tarajal. This episode was the subject of controversy due to the reaction of the Spanish Civil Guard, which opened fire against the immigrants trying to reach the Spanish coast in an attempt to disperse them.

Another controversial event detected in the nearest neighbors of the target words is the British Windrush scandal in 2018. In this political scandal, several citizens were wrongly detained and threatened with deportation. Many of these detained citizens were from the Windrush generation.Footnote 15

Table 3 Words introduced in the local neighborhood of the words referring to immigrants (“indvandrere”, “immigranten”, “immigrants”, and “inmigrantes”) target words in comparison to the previous year for the Danish, Dutch, English and Spanish embeddings. Words are ordered according to the cosine similarity with the target words
Table 4 Words introduced in the local neighborhood of the words referring to refugees (“flygtninge”, “vluchtelingen”, “refugees”, and “refugiados”) target words in comparison to the previous year for the Danish, Dutch, English and Spanish embeddings. Words are ordered according to the cosine similarity with the target words
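The notion of "words introduced in the local neighborhood in comparison to the previous year" used in Tables 3 and 4 can be illustrated with a minimal sketch. It assumes one embedding model per year, represented here as a plain dictionary from word to vector; the function names, the neighborhood size k, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(vectors, target, k):
    """Return the k words closest to `target`, most similar first."""
    scored = [(cosine(vectors[target], vec), word)
              for word, vec in vectors.items() if word != target]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

def introduced_neighbors(prev_vectors, curr_vectors, target, k=2):
    """Words in this year's top-k neighborhood that were absent last year."""
    previous = set(nearest_neighbors(prev_vectors, target, k))
    current = nearest_neighbors(curr_vectors, target, k)
    return [word for word in current if word not in previous]

# Toy 2-d vectors (hypothetical values, not taken from the trained models).
emb_2010 = {"refugees": [1.0, 0.0], "aid": [0.9, 0.1],
            "camp": [0.8, 0.3], "lampedusa": [0.0, 1.0]}
emb_2011 = {"refugees": [1.0, 0.0], "aid": [0.9, 0.1],
            "camp": [0.2, 0.9], "lampedusa": [0.8, 0.2]}

new_words = introduced_neighbors(emb_2010, emb_2011, "refugees", k=2)
```

Because the current-year list is already sorted by similarity, the introduced words keep the cosine-similarity ordering described in the table captions.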

We now turn our attention to the nearest neighbors of the refugee target words (“flygtninge”, “vluchtelingen”, “refugees”, and “refugiados”) depicted in Table 4. Differently from the local neighborhood of the immigrant target words, which contained several terms related to illegality, crime, and trafficking, the neighborhood of refugee target words seems to be more in the spectrum of discourse about humanitarian actions, like ayuda_humanitaria, humanitaire_hulp (both meaning “humanitarian aid”), humanitarian_aid, flygtningehjælp (“refugee aid”), humanitarian_protection, and voedselhulp (“food aid”).

On the other hand, we notice the presence of words framing refugees as a problem, e.g., flygtningekatastrofe (“refugee disaster”), flygtningeproblem and vluchtelingenprobleem (both meaning “refugee problem”), especially in the Danish and Dutch nearest neighbors. Furthermore, we see the occurrence of many words that relate to deportation and repatriation, such as repatriating, repatriering, tilbagesendelse (“returns”), expulsiones (“expulsions”), deportatie (“deportation”), deportados (“deported”), non-refoulement.Footnote 16 In other words, although the discourse of humanitarian aid has been strongly present over the years, it seems that discussing the return of refugees to their home countries is more relevant than topics such as refugee integration.

Other recurring topics in the neighborhood of all languages are the conflicts, e.g., war-torn, krijgsgevangenen (“war prisoners”), conflict-affected, krigszonen (“war zone”), combates (“combats”), burgeroorlog (“civil war”), ethnic_cleansing, massamoorden (“mass killings”), and torture/persecution, like marteling, folteringen (both meaning “torture”), torturados (“tortured”), persecution, torturofre (“torture victims”), etc. Mentions of starvation are also noticed, like hongersnood and hambruna (both meaning “famine”), starvation, etc. Such terms are linked to the suffering victim frame.

Furthermore, we notice the presence of many terms related to wars or conflicts which lead to the displacement of refugee groups. For instance, in the first years of analysis, we find occurrences of mentions of Bosnians, Kosovars, and Albanians (kosovoalbanere, kosovo-albanezen, bosnische, kosovan, albanokosovares), which refer to the Kosovo conflict (1998–1999) between Albanian Muslims and Serb Christians. The ethnic tensions and war crimes committed during this conflict led many civilians to flee the affected areas, and in addition, many other Albanians were deported from Kosovo, being displaced to the bordering countries Albania and Macedonia.

In the following years, we observe references to bhutanese and nepal. The conflicts between the government of Bhutan and immigrants/descendants of the Nepali ethnic group date back to the 1980s [126]. The nationalist policies and propaganda led to a series of acts of violence against the ethnic Nepalis in Bhutan, including torture and persecution, a context that is also captured in the vicinity of the target words, judging by the presence of words like persecution, tortura, and marteling (the latter two meaning “torture”). During this time, many members of the persecuted group were either expelled or fled from Bhutan and took shelter mostly in the United Nations High Commissioner for Refugees (UNHCR) camps in Nepal. Finally, in the 2000s, after years of discussion and under increasing pressure from the international community, Bhutan and Nepal reached an agreement about the voluntary return of Bhutanese refugees living in Nepalese camps.

Another issue widely discussed by the international community and also captured in the embedding vicinity in the year 2001 was the situation of the Burundian refugees in Tanzania, who fled their home country due to the civil war that started in 1993 and led to mass killings of the Tutsi ethnic group. In 2001, a return plan was outlined to help these refugees get back to their home country, assisted by the UNHCR. In this year, mentions of tanzania are observed in the vicinity of all target words. In the same year, we also start finding mentions of chechen, which is connected to the Second Chechen War, between Russia and Chechnya, which lasted from 1999 to 2009. Many civilians escaping from the war fled to Ingushetia, resulting in a crisis in reception management and an epidemic of tuberculosis. It is possible to observe references to Ingushetia in the nearest neighbors of the target words for the year 2003. Mentions of chechen were also observed in 2005.

Moreover, we perceive that analyzing the word embedding vicinity in the case of the refugees is quite useful to distinguish which ethnic group of refugees is being most actively discussed in the parliaments at a given moment. Other than the already mentioned ethnic groups, we see that throughout the years many others are detected, such as the Iraqi refugees who were mostly received in Jordan and SyriaFootnote 17 and the Eritrean refugees kept as hostages in Sinai.Footnote 18 Other than the groups mentioned, the nearest neighbors sometimes contained words referring to vulnerable groups like minors, e.g., flygtningebørn, vluchtelingenkinderen (both meaning “refugee children”), minors, mindreårig, minderjarigen (both meaning “minors”).

Additionally, in accordance with our expectations, the embedding vicinity was successful in capturing the convergence of topics triggered by relevant sociopolitical processes. For instance, especially in the period of 2014–2016, we can see the emergence of terms related to the “refugee crisis” and the struggle to deal with the reception of the refugees, such as flygtningekrise, vluchtelingencrisis, crisis_refugiados, migration_crisis, asylpres (“asylum pressure”), drama_humanitario (“humanitarian drama”), asielcrisis (“asylum crisis”), and vluchtelingendrama (“refugee drama”).

The nearest neighbors also depict many locations that are relevant for the debates about the refugees since they represent places where the refugee groups come from, e.g., Syria, including the ones coming from the city of Kobane due to the siege launched by the Islamic State of Iraq and the Levant (ISIL) in 2014 (detected in the English nearest neighbors), or where they are sheltered, for instance, the 2004 mentions of chad (tsjaad in Dutch), which may be related to the arrival in Chad of Sudanese refugees escaping from the war in Darfur, a region of neighboring Sudan (darfur is also mentioned in the nearest neighbors) [127].

In the case of the Danish local neighborhood, it is possible to observe the appearance of interesting terms related to the refugee status, such as fup-asylansøgere (“fraudulent asylum seekers”), facto-flygtninge (referring to “de facto-flygtninge” which means “de facto refugees”), and kvoteflygtninge (“quota refugees”).Footnote 19 The appearance of such terms is most probably linked to the changes in the Danish 1983 Immigration Act. The original Danish 1983 Immigration Act was viewed as quite progressive and improved the legal position of asylum seekers; however, it was later greatly tightened on several important points, e.g., family reunification and the time required to acquire permanent residence.

By 2002, the view of the Danish immigration legislation had changed from one of the most liberal to one of the most restrictive in Europe. Although “kvoteflygtninge” and “de facto-flygtninge” were actually legal categories, with the election of the Dansk Folkeparti (“Danish People’s Party”)Footnote 20 polemic terms such as “fup-asylansøgere” and “bekvemmelighedsflygtninge” (“refugees of convenience”),Footnote 21 which are not legal categories, but instead politically positioned terms, started permeating the political language [128].

Another topic that is quite relevant in the nearest neighbors of the Danish word flygtninge but has very few occurrences in the other languages is that of family reunification. Across the years it is possible to observe many instances concerning this topic, e.g., familiesammenføringer (“family reunifications”) and familiesammenføringsreglerne (“the family reunification rules”). Family reunification and marriage immigration were some of the ways of legally living in European countries; therefore, since the 2000s several European countries, such as Austria, Belgium, Denmark, Germany, France, the Netherlands, Sweden, and the United Kingdom, have passed significant legislative amendments to restrict family reunification rules [129,130,131,132,133]. By 2002, Denmark had adopted one of the most restrictive regulations concerning family reunification, aiming at preventing arranged marriages practiced among certain immigrant groups, but also at imposing great difficulties on individuals coming from non-European “third world” countries [132, 134].

To conclude this section of the analysis, we noticed that although the words used to refer to immigrants are certainly more associated with concepts related to the personal and collective threat frames (e.g., terrorism, trafficking, criminality), the discourse about immigrant groups might be mixed with discourse about refugee groups. That is, sometimes terms that clearly belong to the sphere of the discourse about refugees, such as klimaflygtninge (“climate refugees”) and vluchtelingencrisis (“refugee crisis”), appeared in the vicinity of target words used to refer to immigrants. This could be related to political actors and the media conflating the terms used to refer to refugees with the ones used to refer to immigrants even though these are two different categories [135,136,137,138].

Stereotype projections

The results of the target word projections on the five stereotype categories are depicted in Figs. 8 and 9, where positive values indicate a stronger association with adverse concepts, such as criminality, poverty, etc. We observe that both in the case of immigrants (Fig. 8) and refugees (Fig. 9), the association with adverse concepts is overall positive, especially for the categories of collective threat, economic resource, personal threat, and suffering victims.
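To make the sign convention concrete, the following minimal sketch computes one possible association score. It is an illustrative assumption, not the projection procedure used in this work: the score is taken to be the mean cosine similarity of a target vector to a set of adverse concept vectors minus its mean similarity to a set of favorable concept vectors, so that positive values indicate a stronger association with the adverse concepts. The function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def stereotype_score(target, adverse, favorable):
    """Mean similarity to adverse concepts minus mean similarity to favorable ones."""
    adv = sum(cosine(target, a) for a in adverse) / len(adverse)
    fav = sum(cosine(target, f) for f in favorable) / len(favorable)
    return adv - fav

# Toy 2-d vectors (hypothetical values, for illustration only).
target_vec = [0.9, 0.2]
adverse_vecs = [[1.0, 0.0], [0.8, 0.1]]     # e.g., words such as "criminality"
favorable_vecs = [[0.0, 1.0], [0.1, 0.9]]   # e.g., words such as "integration"

score = stereotype_score(target_vec, adverse_vecs, favorable_vecs)
```

Under this assumed definition, swapping the two concept sets flips the sign of the score, which mirrors how negative values in the figures are read as a stronger association with positive concepts.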

In the case of the collective threat frame, it is possible to observe that the association with the English and Spanish target words immigrants and inmigrantes is higher than for the Danish and Dutch target words. We also notice that the trends concerning the words indvandrere and immigranten follow a similar pattern in certain time periods, e.g., 2005–2010, and then 2012–2018, which we confirmed by computing the alignment path between these two trends using DTW. The computed distances (d) for the 2005–2010 and 2012–2018 periods are 0.05 and 0.08, respectively, where lower values indicate greater similarity. In the context of the words used to refer to refugees, for the Danish word flygtninge we observe a mostly decreasing trend with some peaks in 2001 and 2006. For the other target words the picture is mixed and no meaningful patterns emerge.
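For readers unfamiliar with DTW, the following minimal sketch shows how an alignment path and its distance can be computed for two yearly series. It is a textbook dynamic-programming formulation that assumes the absolute difference as local cost and no normalization or windowing; the exact configuration behind the distances reported in this section may differ, and the function name is hypothetical.

```python
def dtw(a, b):
    """Return (distance, path) of the optimal DTW alignment of two series."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],       # a advances alone
                                    cost[i][j - 1],       # b advances alone
                                    cost[i - 1][j - 1])   # both advance
    # Backtrack from the end to recover which points were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves)
    path.reverse()
    return cost[n][m], path

# Identical trends align point by point with zero distance.
distance, path = dtw([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
# A trend that lingers one extra year on its first value still aligns at zero cost.
shifted_distance, _ = dtw([0.0, 0.0, 1.0], [0.0, 1.0])
```

The index pairs in the returned path indicate which years of one trend are matched to which years of the other, which is how a sub-period such as 2005–2010 can be read off an alignment.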

Regarding the discrimination frame, in the case of the Danish words indvandrere and flygtninge, there are many years where the stereotype association is negative, which means that the target words are more strongly associated with positive concepts, such as integration and inclusion. For the target words concerning refugees, all the trends seem to follow roughly the same behavior in the years 2001–2004. However, when applying DTW to compare the trends pairwise, we found alignments only between refugees-flygtninge and refugiados-vluchtelingen, for the years 2002–2004 and 2001–2002 respectively (\(d=0.04\) in both cases). As for the target words concerning immigrants, there is a noticeable peak in 2014 for the immigranten, allochtonen, and immigrants words. Likewise, an increase in the strength of association can be observed in 2014 for immigranten, immigrants, inmigrantes, and indvandrere terms. Furthermore, when analyzing the aligned paths produced by the DTW, we observe a pattern for the trends concerning immigranten, immigrants, and inmigrantes for the years 1997–1999 (\(d_{immigranten-immigrants,inmigrantes}=0.04\) and \(d_{immigrants-inmigrantes}=0.07\)).

In the matter of the economic resource stereotypical frame, we observe that the trends of immigranten and inmigrantes follow the same pattern in 1997–2000, whereas inmigrantes and immigrants coincide in 2004–2007, which we confirm by computing the alignment paths using DTW, where the values of resulting distances are 0.05 and 0.01 respectively. Furthermore, the strength of association with adverse concepts concerning all the immigrant-related target words decreased in the year 2014 when compared to 2013, which also happens for the Dutch and Spanish words vluchtelingen and refugiados.

We now turn our attention to the personal threat stereotypical frame. For all the immigrant-related target words, we notice a rise in the strength of association in the year 2011, followed by a drop in 2012, except concerning the allochtonen term. Then in 2013, the association with adverse concepts rises again for the allochtonen, immigranten, and indvandrere words. For the immigrants and inmigrantes terms, although the association values also rose in 2013, the local peak happened in 2014. Furthermore, by computing the DTW we find an alignment path between immigranten and inmigrantes trends for the years 1997–2000 (\(d=0.05\)). As for the target words concerning refugees, we observe certain partial patterns. For instance, the trends regarding the flygtninge and refugees words have roughly the same behavior in the years 1999–2004, and then again in 2015–2017. When computing the alignment, we observe that the years 2002–2004 (\(d=0.01\)) and 2016–2017 (\(d=0.005\)) were included in the path. We also find an alignment between flygtninge and refugiados, but only for the years 2013–2014 (\(d=0.009\)).

Lastly, we analyze the graphs concerning the suffering victim frame. For the immigrant target words, we see that the word allochtonen is more strongly associated with the adverse concepts. We also notice that all the trends behave similarly between 2009 and 2011. When comparing the trends pairwise with DTW, we find that the 2009–2011 period appears in all alignment paths except those between immigrants-immigranten/inmigrantes and inmigrantes-allochtonen. Moreover, the alignment path between immigranten and indvandrere covers the 2009–2018 period (\(d=0.07\)), which is the largest pattern we observed. For the refugee target words, we see that the trends regarding the flygtninge and vluchtelingen words are similar between 2007 and 2012. Through validation with DTW, we see that the 2008–2012 period is included in the alignment (\(d=0.01\)). We also find alignments between refugees-flygtninge/vluchtelingen for the years 2008–2014 (\(d=0.05\)) and 2008–2013 (\(d=0.03\)).

Fig. 8

Projection of stereotypical bias concerning immigrants according to the 5 stereotype categories for all languages. The positive values indicate a stronger association with adverse concepts, e.g., criminality, poverty, etc

Fig. 9

Projection of stereotypical bias concerning refugees according to the 5 stereotype categories for all languages. The positive values indicate a stronger association with adverse concepts, e.g., criminality, poverty, etc

Therefore, although we did not observe cross-national patterns that span the whole period of analysis, we were able to identify some partial patterns between target words. We also detect that for many target words, the highest values of strength of association with the stereotypical frames happened between 2011 and 2016.

We also compare the strength of associations between the stereotypical frames and the immigrant, refugee, and citizen groups. The results of this analysis are depicted in Figs. 10 to 13. As seen in Figs. 10 and 11, for the English and Spanish embeddings the association with adverse concepts is overall positive for the immigrant and refugee groups, while it is overall negative for the words that refer to the country citizens (british and españoles). As can be observed, the collective and personal threat stereotype categories are more strongly associated with the immigrant and refugee groups than the other categories. Furthermore, the values are noticeably higher for the immigrant words when compared to the refugee words, meaning that immigrant words are more negatively framed.

Fig. 10

Comparative projection of stereotypical bias according to the 5 stereotype categories for the English language. The positive values indicate a stronger association with adverse concepts, e.g., criminality, poverty, etc

Fig. 11

Comparative projection of stereotypical bias according to the 5 stereotype categories for the Spanish language. The positive values indicate a stronger association with adverse concepts, e.g., criminality, poverty, etc

In the case of the Danish and Dutch embedding stereotype projections, as seen in Figs. 12 and 13, the association values are also higher for the target words concerning immigrants (indvandrere, allochtonen, and immigranten). As in the case of the inmigrantes and immigrants terms, the personal and collective threat frames are overall more associated with these target words. However, we see that the target words regarding refugees (flygtninge and vluchtelingen) are often more strongly associated with the suffering victim frame than with the personal threat frame.

Regarding the citizen groups, for the Danish embeddings, we observe that the plural definite form of Dansker (“Danish”), which is danskerne, is less associated with the adverse concepts in the stereotype categories than the plural indefinite of Dansker (danskere). Both word forms are used in the yearly datasets to refer to Danish citizens, with similar frequency. We believe that one of the reasons for this difference is the higher lexical similarity between danskere and words used to refer to immigrants, such as nydanskere, since the fastText embeddings take sub-word information into account to generate the word vectors, i.e., each word is represented as a bag of character n-grams.
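The sub-word overlap argument can be checked with a minimal sketch. It extracts character n-grams in the fastText style, i.e., from the word wrapped in the boundary markers < and >, assuming n-gram lengths from 3 to 6 (the fastText defaults; the settings of the models trained here may differ), and compares the overlap with nydanskere using the Jaccard index. The function names are hypothetical, and the Jaccard index is only a proxy for shared sub-word information, not the similarity computed by the embeddings themselves.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word` wrapped in boundary markers, fastText-style."""
    wrapped = f"<{word}>"
    return {wrapped[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(wrapped) - n + 1)}

def jaccard(a, b):
    """Share of n-grams that two sets have in common."""
    return len(a & b) / len(a | b)

target = char_ngrams("nydanskere")
overlap_indefinite = jaccard(char_ngrams("danskere"), target)   # danskere
overlap_definite = jaccard(char_ngrams("danskerne"), target)    # danskerne
```

In this sketch danskere shares its word-final n-grams (such as ere>) with nydanskere, whereas danskerne does not, so the indefinite form has the larger overlap.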

Similarly, the strength of association with the stereotypical frames for the word nederlanders might be higher due to the presence of word forms such as niet-nederlanders (“non-Dutch”) found in the yearly datasets. Also by quickly examining the yearly datasets we find instances of statements such as “... Hebt u met de minister-president gesproken over de mogelijkheid om nederlanders van marokkaanse of andere afkomst het nederlanderschap te ontnemen en ze daarna alsnog uit te zetten?...” (“... Have you spoken to the Prime Minister about the possibility of depriving Dutch nationals of Moroccan or other origin of their Dutch citizenship and then deporting them?...”) where the political actors use the term nederlanders to refer to immigrants that acquired the Dutch citizenship.

Fig. 12

Comparative projection of stereotypical bias according to the 5 stereotype categories for the Danish language. The positive values indicate a stronger association with adverse concepts, e.g., criminality, poverty, etc

Fig. 13

Comparative projection of stereotypical bias according to the 5 stereotype categories for the Dutch language. The positive values indicate a stronger association with adverse concepts, e.g., criminality, poverty, etc

Effects of sociopolitical indicators

In this section, we explore the effects of the sociopolitical indicators on our stereotypical association time series using the Bayesian multilevel framework and the model specification described in Eq. 1. The summary of the population-level and the group-level effects for the 5 different models are shown in Tables 5 and 6, respectively.

Table 5 Population-Level (fixed) effects of the predictors used to describe the five different stereotypical frame associations. Estimated errors are shown in parentheses

To interpret these models, we take one as an example, namely the one referring to the collective threat category. The other models can be interpreted using the same logic. The first important point we notice in the population-level effects is that the effect of the dummy variable Immigrant is positive. This means that, in accordance with our expectations, immigrants are more strongly associated with the collective threat stereotypical frame than refugees. As mentioned in the methods section, the independent variables were standardized per country, and thus the regression coefficients are interpreted as standard deviations conditional on the country. Therefore, in this case, the strength of association between immigrants and the collective threat category is 0.15 standard deviations higher than for refugees, conditional on the country.
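Per-country standardization can be illustrated with a minimal sketch in which each value is converted to a z-score using the mean and sample standard deviation of its own country. The function name, the column names, and the toy figures are hypothetical and not taken from the indicators used in this work.

```python
from statistics import mean, stdev

def standardize_by_group(rows, group_key, value_key):
    """Return copies of `rows` with `value_key` z-scored within each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    # Mean and sample standard deviation computed separately per group.
    stats = {group: (mean(values), stdev(values)) for group, values in groups.items()}
    standardized = []
    for row in rows:
        mu, sigma = stats[row[group_key]]
        standardized.append({**row, value_key: (row[value_key] - mu) / sigma})
    return standardized

# Toy unemployment figures (hypothetical values, for illustration only).
rows = [
    {"country": "DK", "unemp": 4.0}, {"country": "DK", "unemp": 6.0},
    {"country": "DK", "unemp": 8.0}, {"country": "ES", "unemp": 10.0},
    {"country": "ES", "unemp": 20.0}, {"country": "ES", "unemp": 30.0},
]
z_scores = [row["unemp"] for row in standardize_by_group(rows, "country", "unemp")]
```

After this transformation, a one-unit change in a predictor corresponds to one standard deviation within the respective country, regardless of how different the raw levels are across countries.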

Then, we turn our attention to the other predictors included in the model. We see that the regression coefficients of predictors such as the size of the refugee/immigrant groups, the amount of money spent by the host country to help developing countries (Aid), and the unemployment numbers (Unemp) are also positive. Hence, increases in the strength of stereotypical association are associated with growth in the number of refugees/immigrants and unemployed nationals in the host countries, as well as with larger amounts of money destined for humanitarian aid.

On the other hand, the GDP predictor, which serves as a proxy for the country’s economic growth, has a negative regression coefficient value, which means that as the GDP of the host country rises, the stereotypical association decreases. Our proxy for social threat perception (ESS) coefficient is also negative. Since the ESS questions measure public opinion on a scale from 0 to 10, with 10 being the most positive view (see Sect. 4.1), a larger value in the ESS predictor means that the population has a better view of the immigrant groups. Thus, the more immigrants/refugees are framed as a collective threat, the more the ESS decreases, which means that the public opinion about these groups is worse.

Interestingly, the number of offences reported in the host country also has a negative coefficient. That is, although there may be lower crime rates in a given country, the sense of perceived threat remains high. Most people do not search for the real values of criminality rates when forming a conception of how dangerous their country or neighborhood is; rather, their threat perception is a reflection of their personal experiences and of information received from their peers, the news, and the government. Therefore, although the framing of immigrants as a collective threat seems to be dissociated from actual crime rates, it can have a real impact on citizens’ perceptions.

As for the time predictor, we see that the association with the stereotypical frames is usually higher than in the base year (2000). We also perceive that the increase in the association is higher in the years 2001, 2008, 2009, and 2011. Furthermore, we can notice some points of inflection in the strength of association; for instance, in 2005 the association was 0.02 standard deviations lower than in the base year, but in 2006 it was 0.17 standard deviations higher.

We now focus on the random effects terms that we can interpret, shown in Table 6. The standard deviation of the intercept (sd__(Intercept)) depicts how much the stereotypical frame association varies from country to country. Judging by the value of the coefficient, the variation across countries is high, which we already suspected when looking at the stereotype projection graphs in Sect. 5.2. Likewise, the sd__(yearx) terms show how much the year trends differ from country to country, and we also observe large fluctuations across the years. Given the large variation and the partial patterns found in Sect. 5.2, we believe that clustering the countries into groups could be a way of better assessing similarities between countries that belong to the same cluster and differences between clusters.

Table 6 Group-Level (random) effects concerning the five different stereotypical frame associations. Estimated errors are shown in parentheses

Discussion

We now reflect on some of the challenges and promises of using embeddings for the discourse analysis of diachronic data.

As shown in our analysis and supported by the literature, word embedding models are a powerful tool for analyzing texts, particularly in diachronic studies or settings where there is a large amount of data involved. We found that the analysis of the nearest neighborhood of the target words used to refer to immigrants was quite useful to pinpoint certain locations and events relevant for migration-related discussions, e.g., Lampedusa, or language adopted by politicians to frame certain minorities, e.g., nydanskere. In the case of the refugee target words, it was interesting to see that the vicinity depicted the different ethnic groups that the political debate was most focused on, depending on the year.

Nonetheless, the findings should be supplemented by social theory, as it is not possible to deepen the interpretation of some word embedding outputs without knowing the political, cultural, and social context in which they appear. For instance, we detected some references to integration in Tables 3 and 4, such as integración_inmigrantes (“immigrant integration”), integration_flygtninge (“refugee integration”), and integration. However, one might wonder what integration actually means for the government of each country. As we mentioned in Sect. 4, countries like Denmark and the Netherlands changed their perspective of what integration means over the years, shifting from a socially and culturally inclusive approach to one much more labor-oriented and focused on cultural assimilation.

For instance, in the case of Denmark, the Integration Programme for immigrants, refugees, and reunified family members over the age of 18 is basically a reward system that gives economic incentives to these individuals and to the municipalities that receive them, as long as they comply with compulsory training, acquire a job, pass the Danish language exam, etc. Although this perspective of integration aims at self-sufficiency and financial independence, it overlooks cultural diversity. In fact, since the employment numbers for the refugees and some immigrant groups are significantly lower than for Danish citizens, one of the main political narratives at the time was that the integration and employment policies had failed to integrate “non-western” immigrants and refugees into the labor market [139].

Besides the uncertainty about the meaning of integration, based only on the word form it is also not possible to know if the integration is being framed as a success or as a failure. Judging by the presence of other terms such as integrationsproblemer (“integration problems”) and flygtningeproblem (“refugee problem”), we suspect that integration is being negatively framed, but it would be necessary to further investigate the issue in order to be sure.

Indeed, one of the main limitations of any kind of multimodal or multilingual study is the lack of details about national, but also potentially regional, local, and community-level, variations. Therefore, we can only talk about the broader picture, but this is also a strength of this approach, i.e., it can serve as a preliminary step to a more specific and detailed multi-scalar analysis of a word such as integration.

Another limitation imposed by the setting of this study, i.e., being both multilingual and diachronic, is the impossibility of using measurement instruments, e.g. survey questions, that leverage certain ingroup perceptions. For instance, it could be that the public perception is that the size of the immigrant/refugee groups is much larger than it really is, and that could be a better indicator of immigration bias than the real immigrant/refugee group sizes. Although there are published cross-national survey data about this topic, such studies are rarely conducted, resulting in very few data points over the years, i.e. missing data, which is not suitable for diachronic studies.

Concerning the technical challenges, the preprocessing of the training dataset also requires expert knowledge. For example, if certain multi-word expressions (MWEs) that could be relevant for the analysis, e.g., “organized crime”, are not properly preprocessed, then the embedding model will learn the representations of the two words separately, i.e., “organized” and “crime”, and not as a single unit. Resources such as the EMN glossary of asylum and migration terms used in this work are helpful tools to identify relevant MWEs; however, since human language is creative and MWEs do not always appear in the same form (e.g., human trafficking, trafficking in human beings), having a procedure to recognize MWEs based on bigrams/trigrams and proximity to the target words could potentially speed up the process. Nonetheless, it would still be beneficial for domain specialists to revise and complement this information.
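A procedure of this kind could look like the following minimal sketch, which counts bigrams that occur within a fixed token window of a target word and keeps those reaching a minimum count. It is a hypothetical illustration rather than a procedure used in this work: the function name, the window size, and the threshold are assumptions, and a realistic version would additionally filter stop words or rank candidates with an association measure.

```python
from collections import Counter

def mwe_candidates(sentences, targets, window=5, min_count=2):
    """Count bigrams occurring within `window` tokens of any target word."""
    counts = Counter()
    for tokens in sentences:
        positions = [i for i, tok in enumerate(tokens) if tok in targets]
        for i in range(len(tokens) - 1):
            first, second = tokens[i], tokens[i + 1]
            if first in targets or second in targets:
                continue  # skip bigrams that contain the target word itself
            if any(abs(i - p) <= window for p in positions):
                counts[(first, second)] += 1
    return [(bigram, c) for bigram, c in counts.most_common() if c >= min_count]

# Toy tokenized sentences (hypothetical, for illustration only).
sentences = [
    "immigrants are linked to organized crime".split(),
    "organized crime exploits immigrants".split(),
    "the budget debate continues".split(),
]
candidates = mwe_candidates(sentences, targets={"immigrants"})
```

The surviving candidates could then be handed to domain specialists for review before being merged into single tokens (e.g., organized_crime) in the training data.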

Moreover, mixing types of embeddings, such as word and sentence embeddings, or even contextual embedding models such as the Bidirectional Encoder Representations from Transformers (BERT),Footnote 22 could both enrich the set of results and give more flexibility concerning the unit of analysis, i.e., from words to sentences. Therefore, this strategy of combining different model architectures and comparing different embedding semantic spaces could also potentially reduce the time spent on preprocessing tasks and provide more information to the analysis based solely on the embedding outputs, which is worth exploring.

When dealing with a multilingual setting, some difficulties arise, such as keeping the equivalence of the meaning of the words used to investigate the association between the target words and the desired categories. In this context, it is not just a matter of finding an adequate translation for a given word, but also that the translation in question needs to appear at least a certain number of times (the more, the better) in the dataset used to train the embedding models. This problem is further aggravated when dealing with domain-specific texts, such as parliamentary speeches. As political actors carefully and intentionally choose the words used to communicate their message, the vocabulary adopted to study this phenomenon is more restrictive than the one that would be used to investigate media text, for instance. Defining such a vocabulary can be time-consuming; therefore, auxiliary resources such as the EMN glossary of asylum and migration terms and/or the knowledge of scholars of migration studies are convenient to speed up the process.
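The frequency requirement can be illustrated with a minimal sketch that keeps a concept only if its translation reaches a minimum count in the corpus of every language. The function name, the threshold, and the toy corpora are hypothetical; the threshold applied in practice depends on the corpus size and on the embedding training settings.

```python
from collections import Counter

def usable_concepts(candidates, corpora, min_count=5):
    """Keep concepts whose translation reaches `min_count` in every corpus."""
    freqs = {lang: Counter(tokens) for lang, tokens in corpora.items()}
    return [concept for concept, translations in candidates.items()
            if all(freqs[lang][translations[lang]] >= min_count for lang in corpora)]

# Toy token lists per language (hypothetical, for illustration only).
corpora = {
    "en": ["crime"] * 3 + ["smuggling"],
    "es": ["delincuencia"] * 2 + ["contrabando"] * 2,
}
candidates = {
    "crime": {"en": "crime", "es": "delincuencia"},
    "smuggling": {"en": "smuggling", "es": "contrabando"},
}
kept = usable_concepts(candidates, corpora, min_count=2)
```

A concept dropped by such a filter would need an alternative translation or a synonym that is frequent enough in all languages, which is where glossaries and domain experts become useful.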

Also regarding multilingual settings, we observe that for many languages, such as Danish, the classic benchmarks for embedding evaluation (e.g., MC-30, RG-65) are not available. There is an immense body of research concerning word embeddings; however, this fact does not seem well reflected in the way embeddings are evaluated. This gap in the literature is problematic, as ensuring that the learned language representations, i.e. the embeddings, have good quality should be as important as ensuring the performance of the learning process or creating different forms of representations. Furthermore, although not strictly necessary, it would be beneficial to have domain-specific benchmarks to evaluate the quality of word embeddings trained with domain-specific data.

In the absence of ready-to-use embedding evaluation benchmarks, the next best option would be to have a set of guidelines on how to develop quality evaluation benchmarks. Nonetheless, we could not find literature concerning this topic. Not having a clear set of guidelines for expanding the resources for embedding evaluation to other languages and domains is detrimental, since it affects the consistency of evaluation in both monolingual and multilingual settings.

Finally, we also consider the reusability of the trained embedding models in other studies. That is, although the models trained in this work are valuable to political discourse analysis, and could be leveraged for insights into studies concerning the media, they are not very useful for everyday discourse analysis. Given the amount of work and energy involved in the creation of such models (even more so when taking into account LLMs such as BERT), we believe it would be beneficial for the scientific community to invest resources in exploring the possibility of isolating and activating different parts of multi-domain models or transferring the knowledge from one model to others, i.e., transfer learning.

Conclusion and future work

In this work, we quantified the association of words used to refer to immigrants/refugees with five different stereotype categories and then explored the effects of sociopolitical variables on our stereotype measurements in a multilingual and diachronic setting. Our analysis provides evidence that political discourse links immigrant and refugee groups to stereotypical frames, and that word embedding models are useful for pinpointing important events, locations, and the vocabulary adopted by political actors in immigrant and refugee debates across time. It was also possible to identify distinct points in time where the strength of association with certain stereotypical frames rose cross-nationally, or where discourse converged on a specific topic, e.g., the Iraqi refugees in 2014.

Our findings also show that the words used to refer to immigrant groups are more strongly associated with negative concepts, such as trafficking, terrorism, and criminality, i.e., threat-related frames, while terms regarding refugee groups seem mostly linked to a humanitarian perspective for the tested datasets. Furthermore, although the words used to refer to immigrants are more negatively loaded, the terms used to refer to immigrants and refugees are sometimes conflated. As is often the case with generalizations concerning minorities, it is dangerous to invoke a homogeneous vision of groups that have dramatically different contexts. Moreover, the conflation of terms can influence public opinion concerning these two different groups, and political actors may leverage the already negative framing of immigrants to invoke the same sentiment against refugees [138].

The Bayesian analysis using sociopolitical indicators confirmed that immigrant groups are more negatively framed than refugee groups and that, depending on the analyzed frame/indicator, discourse about immigrants and refugees can be dissociated from variables such as the number of offenses reported in the host country. Here, it is important to note that, regardless of the actual demographic trends, stereotypical discourse can have a real and negative effect on public perception of the relationship between immigrants/refugees and concepts such as criminality and unemployment. Additionally, the association with adverse stereotypes mostly rises across the years when compared to the base year of analysis, especially in 2011 for most stereotypical frames.

In future work, we intend to expand the types of embeddings used in our analysis by including sentence embeddings in our multilingual and diachronic settings. We believe that using sentences as the unit of analysis will complement the word embedding outputs and give more context and flexibility for operationalizing the stereotypical frames. Moreover, we are interested in developing procedures for automatically identifying and preprocessing multi-word expressions that can be relevant to the domain of analysis, such as the examples given in this work (e.g., organized crime, organized criminal organization, criminal network, etc.).