Analyzing social media for measuring public attitudes toward controversies and their driving factors: a case study of migration

Chen, Yiyi; Sack, Harald; Alam, Mehwish

doi:10.1007/s13278-022-00915-7

Original Article
Open access
Published: 10 September 2022

Volume 12, article number 135, (2022)

Social Network Analysis and Mining

Yiyi Chen^1,2,
Harald Sack^1,2 &
Mehwish Alam^1,2

Abstract

Alongside other online media for expressing opinions, such as blogs and forums, social media (such as Twitter) has become one of the most widely used channels through which people express their opinions. With increasing interest in the topic of migration in Europe, it is important to process and analyze these opinions. To this end, this study aims at measuring public attitudes toward migration in terms of sentiment and hate speech from a large number of tweets crawled on the decisive topic of migration. This study introduces a knowledge base (KB) of anonymized, annotated migration-related tweets termed MigrationsKB (MGKB). Tweets posted from 2013 to July 2021 in the European countries that host immigrants are collected, pre-processed, and filtered using advanced topic modeling techniques. BERT-based entity linking and sentiment analysis, complemented by attention-based hate speech detection, are performed to annotate the curated tweets. Moreover, external databases are used to identify the potential social and economic factors causing negative public attitudes toward migration. The analysis aligns with the hypothesis that countries with more migrants have fewer negative and hateful tweets. To further promote research in the interdisciplinary fields of social sciences and computer science, the outcomes are integrated into MGKB, which significantly extends the existing ontology to cover public attitudes toward migration and economic indicators. This study further discusses the use cases and exploitation of MGKB. Finally, MGKB is made publicly available, fully supporting the FAIR principles.

1 Introduction

Measuring public attitudes toward controversial issues such as war, COVID-19, migration, and climate change has become one of the mainstream challenges in the social sciences. These attitudes can be measured with the help of surveys as well as interviews with specific individuals (Dennison and Drazanova 2018; Drazanova 2020). However, only a limited amount of data can be collected, processed, and analyzed in this way. On the other hand, social media has become one of the most widely used and essential channels for the public to express their opinions about events around the globe.

Furthermore, migration has become one of the mainstream controversial topics in developed countries due to its effects on their culture, economy, and demographics (such as age, gender, and distribution). Many efforts have been put into studying the attitudes of the public toward migrations from various perspectives based on survey data (Hainmueller and Hopkins 2014; Dennison and Drazanova 2018; Helen Dempster and Hargrave 2020). This study, in particular, focuses on analyzing the social media platform Twitter to quantify and study public attitudes toward migrations and to identify different factors that could be probable causes of these attitudes. Since the study mainly focuses on analyzing Twitter data, several challenges arise: millions of tweets in noisy natural language are posted around the globe about a particular topic each day, which makes it impossible for humans to process this information and makes automated processing necessary.

This paper focuses more explicitly on proposing a framework that measures public attitudes toward a chosen controversial issue, i.e., migration. Within the framework, MigrationsKB (MGKB) is constructed to achieve the following goals of the case study: (i) providing a better understanding of public attitudes toward migrations, (ii) explaining possible reasons why these attitudes toward migrations are what they are, (iii) defining a KB called MGKB built by taking into account the semantics underlying this field of study, (iv) defining possible scenarios where it can be applied, and (v) publishing this resource using FAIR principles (Wilkinson et al. 2016), i.e., making the resource Findable, Accessible, Interoperable, and Reusable (FAIR).

In order to study the public attitudes toward migrations as well as their drivers, the current study utilizes advanced artificial intelligence (AI) methods based on knowledge graphs and neural networks. The geotagged tweets are extracted using migration-related keywords to analyze public attitudes toward migrations in the destination countries in Europe. The irrelevant tweets are then filtered out using the state-of-the-art neural network-based topic modeling technique Embedded Topic Model (Dieng et al. 2020). The study further utilizes contextualized word embeddings (Liu et al. 2020) and transfer learning for sentiment analysis, and attention-based convolutional neural networks and bidirectional long short-term memory for hate speech detection. Temporal and geographical dimensions are then explored to measure public attitudes toward migrations at a specific time in a specific country. Entity linking is applied to identify the entity mentions linked to Wikipedia and Wikidata to enable easy search over the tweets related to a particular topic. In order to identify the potential social and economic factors driving the migration flows, external databases, such as Eurostat (2021) and Statista (O’Neill 2021), are used to analyze the correlation between the public attitudes and the established social and economic indicators (e.g., unemployment rate and disposable income) in a specific country in a certain period. The analysis aligns with the hypothesis that countries with more migrants have fewer negative and hateful tweets. This kind of analysis can help provide an overview of the countries that are more welcoming to migrants. Further analysis is provided in Sect. 5.
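The correlation step mentioned above can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient, which this section does not specify, and it uses made-up placeholder series; the function name `pearson` and both variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; they are NOT taken from the study.
negative_share = [0.30, 0.32, 0.35, 0.33]    # yearly share of negative tweets
unemployment_rate = [5.0, 5.4, 6.1, 5.8]     # yearly unemployment rate (%)

r = pearson(negative_share, unemployment_rate)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 or -1 would only indicate a linear association for one country over the chosen years; it says nothing about causation on its own.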

In order to enable reusability of the analysis results, the outcome is then integrated into MGKB, which is an extension of the ontology as initially defined in TweetsKB (Fafalios et al. 2018). It is extended by defining new classes and entities to cover the geographical information of the tweets and the results of hate speech detection, and by integrating the information about the social and economic indicators that could be the potential cause of negativity or hatred toward migrants. Using the populated MGKB and social and economic indicators, a detailed analysis of potential factors affecting the public attitudes toward migrations is conducted. Finally, the use cases and scenarios are defined, and the answers can be retrieved with the help of SPARQL queries. The source code has been made publicly available for reproducibility via GitHub.^{Footnote 1} Information related to MGKB is available through the web page.^{Footnote 2} MGKB is queryable via a SPARQL endpoint,^{Footnote 3} and the dump of annotated data is available at Zenodo.^{Footnote 4}

This paper is structured as follows: Sect. 2 discusses the related work. Sect. 3 details how the resource is generated, while Sect. 4 presents the ontology underlying MGKB. Sect. 5 presents a detailed analysis of economic/social factors affecting the public attitudes toward migrations. Sect. 6 discusses some use cases, scenarios, and the sustainability of MGKB. Finally, Sect. 7 concludes the paper and gives an insight into future work.

2 Related work

This section discusses studies that combine KBs and Twitter information belonging to various domains. It then discusses studies that analyze migration-related social media data. Finally, an insight into the studies assessing the public attitudes toward migrations is presented.

2.1 Knowledge bases based on Twitter data

Several studies provide a KB containing tweets from a particular period to make them more usable by researchers. TweetsKB contains more than 1.5 billion tweets spanning more than 7 years (February 2013–December 2020), including entity and sentiment annotations. It provides a publicly available RDF dataset using established vocabularies to explore different data scenarios, such as entity-centric sentiment analysis and temporal entity analysis. In the event of the COVID-19 pandemic, TweetsCOV19 (Dimitrov et al. 2020) was released, which deploys the RDF schema of TweetsKB. It provides a KB of COVID-19-related tweets, building on a TweetsKB subset spanning from October 2019 to April 2020. The study applies the same feature extraction and data publishing methods as TweetsKB. Apollo (Alam et al. 2020b) is a visualization tool analyzing textual information in the geotagged Twitter streams of COVID-19-related hashtags using sliding windows, which performs sentiment and emotion detection of the masses regarding the trending topics of #COVID-19.

As a step forward in combining KB and Twitter information in the field of analyzing migration-related data, MigrAnalytics (Alam et al. 2020a) was introduced. It uses TweetsKB as a starting point to select data during the peak migration period from 2016 to 2017. MigrAnalytics analyzes tweets about migrations from TweetsKB and then further combines European migration statistics to correlate with the selected tweets. However, it uses a very naive algorithm for performing sentiment analysis, and it does not introduce any sophisticated way to remove irrelevant tweets. Most recently, the dynamic embedded topic model (Dieng et al. 2019) was deployed to analyze tweets and capture the temporal evolution of migration-related topics on relevant tweets. The results were then used to extend TweetsKB (Chen et al. 2021). In contrast, the methods used for generating MGKB are more advanced and recent in sentiment analysis, hate speech detection, and entity linking. The RDFS model is extended with relevant topics, as well as geographical information. Moreover, social and economic factors are extracted from external databases, and a correlation analysis between the potential driving factors and the semantic analysis output is conducted (cf. Sect. 5).

2.2 Migration-related social media data analysis

With the ever-growing volume of user input on social media platforms, there have been many efforts in analyzing the data from social media networks, such as Twitter and Facebook, regarding the topic of migrations. In Zagheni et al. (2014), the authors use geolocated data for about 500,000 users in OECD^{Footnote 5} countries from Twitter to infer international and internal migration patterns during May 2011–April 2013, while using a difference-in-difference approach to reduce selection bias of the Twitter data with the OECD population when inferring trends in out-migration rates for single countries. Another work (Hübl et al. 2017) uses geotagged tweets to identify and visualize refugee migration patterns from the Middle East and North Africa to Europe during the initial surge of refugees aiming for Europe in 2015. In another study (Drakopoulos et al. 2020) leveraging the geoinformation of the tweets, the authors use machine learning techniques to study Twitter’s political conversation about the negotiation process for the formation of the government in Spain between 2015 and 2016 over different cities, and the factors conditioning the debate are analyzed, such as demographics, cultural factors, and proximity to the centers of political power. Recently, Armstrong et al. (2021) discuss the challenges of identifying migration from geolocated Twitter data. Furthermore, they conclude that the data used for analyzing migration patterns are highly skewed by the subpopulation “transnationals” (i.e., citizens who seemingly live in two or more countries simultaneously and seamlessly move across borders) rather than conventionally classified migrants. The skewness of the data limits its utility in studying migration populations. In comparison, MGKB deals with text data in EU destination countries of refugees regardless of the origin countries and measures the attitudes toward migrations in general.

Focused on migration-related party communication on social media, Heidenreich et al. (2020) analyzes migration discourses from the official accounts of political parties on Facebook across Spain, the UK, Germany, Austria, Sweden, and Poland. The study concludes that political actors from extreme left/right parties address migration more frequently and negatively than the political players in the middle of the political spectrum. Instead of political actors, MGKB focuses on general public sentiments toward migration.

To analyze Twitter data over time, Aletti et al. (2021) present a model to reproduce the sentiment curve of the tweets related to specific topics and periods, including the Italian debate on migration from January to February 2019, and to provide a prediction of the sentiment of future posts based on a reinforcement learning mechanism (Aletti et al. 2020). The reinforcement learning mechanism is based on the most recent observations and a random persistent fluctuation of the predictive mean. In Drakopoulos et al. (2021), by contrast, the authors focus on determining which tweets cause multiple sentiment polarity alternations to occur, based on a window segmentation approach and an offline framework for discovering and tracking sentiment shifts of a Twitter conversation while it unfolds. However, the sentiment analysis conducted in Aletti et al. (2021) uses the polyglot (Chen and Skiena 2014) Python sentiment module, and Drakopoulos et al. (2021) use SentiStrength,^{Footnote 6} both of which are lexicon- and rule-based methods. In comparison, the language models trained for MGKB provide state-of-the-art sentiment analysis and hate speech detection. MGKB facilitates the analysis of sentiment evolution over time concerning refugees (cf. Sect. 3.3).

2.3 Public attitudes toward migrations

While the popularity of the topic of migration has risen dramatically over the last decade in Europe, many efforts have been invested in analyzing the public attitudes toward migrations from various aspects. For instance, Hainmueller and Hopkins (2014) is based on the studies conducted during the last two decades explaining public attitudes on immigration policy in North America and Western Europe. The authors investigate the natives’ attitudes toward immigration from political economy and political psychology perspectives. In Dennison and Drazanova (2018), the authors explore the academic literature and the most up-to-date data across 17 countries on both sides of the Mediterranean. The study summarizes theoretical explanations for attitudes toward immigration, including media effects, economic competition, contact and group threat theories, early life socialization effects, and psychological effects. It also concludes that in Europe, attitudes toward immigration are notably stable rather than becoming more negative. In contrast, Helen Dempster and Hargrave (2020) emphasize the factors of individuals’ values and worldviews. They state that individual factors (i.e., personality, early life norm acquisition, tertiary education, familial lifestyle, and personal worldview) have a more stable and stronger impact on a person’s attitudes toward immigration than the influence of politicians and media. More recently, Coninck et al. (2021) investigate how the quality and quantity of (in)direct intergroup contact relate to attitudes toward refugees, based on the contact hypothesis proposed by Allport (1954). The hypothesis postulates that intergroup contact reduces prejudice between members of traditionally opposed racial groups (Ata et al. 2009; Barlow et al. 2012), which is reflected in the analysis of driving factors of public attitudes toward migrations in this study (cf. Sects. 5.2 and 5.3). In Dennison and Drazanova (2018), Helen Dempster and Hargrave (2020), and Coninck et al. (2021), survey data are used exclusively, while for Hainmueller and Hopkins (2014), a comprehensive assessment of approximately 100 studies, including both survey and field experiment data, is conducted.

In contrast, many analyses regarding public attitudes toward migrations are performed based on automated approaches. In Freire-Vidal and Graells-Garrido (2019), Twitter data are leveraged to characterize local attitudes toward immigration, with a case study on Chile, where the immigrant population has drastically increased in recent years. Lapesa et al. (2020) and Blokker et al. (2021) introduce a debate corpus specific to the immigration discourse, sourced from Die Tageszeitung in 2015, a major national German quality newspaper, using a semi-automatic procedure, which integrates manual annotation and natural language processing (NLP) methods. In comparison, MGKB focuses on the data from average Twitter users to reflect a more realistic public opinion regarding migrations. Most recently, instead of using news outlets as a data source, Rowe et al. (2021) use Twitter to track the public sentiment regarding immigration during the early stages of the COVID-19 pandemic in Germany, Italy, Spain, the UK, and the United States (US). They find no evidence of a significant increase in anti-immigration sentiment. In this study, however, we observe slight increases of 2% and 1% in the percentages of hateful and negative tweets toward migrations, respectively, from 2019 to 2020 during the early stages of the pandemic in the European destination countries, as shown in Fig. 8 (left). The figure also shows no significant increase in overall anti-immigration sentiment over the last decade; however, evidence of persistent anti-immigration sentiment is presented (as shown in Fig. 5). In Pitropakis et al. (2020), the authors collected and annotated an immigration-related dataset of publicly available tweets in UK, US, and Canadian English and explored anti-immigration speech detection using language features. Our study covers Twitter data across 11 EU countries spanning from 2013 to 2020. With the help of the topics and linked entities in MGKB, the public sentiments toward migrations over the last decade, including those around specific events such as COVID-19 and the Syrian Civil War, can be queried (cf. Sect. 6).

3 Pipeline for constructing MigrationsKB

MGKB is an extension of TweetsKB with a specific focus on the topic of migration (as the name suggests). The goals of MGKB are: (i) to provide a semantically annotated, queryable resource about public attitudes on social media toward migrations, and (ii) to provide an insight into which factors, in terms of economic/social indicators, drive those attitudes. In order to achieve these goals, a pipeline for constructing MigrationsKB is shown in Fig. 1. Step $\textcircled {1}$ is defining migration-related keywords and performing keyword-based extraction of geotagged tweets and their metadata. In step $\textcircled {2}$, the extracted tweets are preprocessed before further analysis. In step $\textcircled {3}$, topic modeling is performed to refine the tweets by removing irrelevant tweets crawled in the tweet extraction phase. Contextual embeddings are then used for performing sentiment analysis in step $\textcircled {4}$. In order to further analyze the negative sentiments in terms of hate speech against the immigrants/refugees, tweets are further classified into three classes, i.e., hate, offensive, and normal, which is step $\textcircled {5}$. To enable search by mentioned entities in tweets, entity linking to Wikipedia and Wikidata is conducted in step $\textcircled {6}$. Furthermore, an analysis of factors causing the negative sentiment or the hatred against immigrants/refugees is performed with the help of visualization and statistical methods. In order to make this information queryable with the help of SPARQL queries, MGKB is constructed and populated with information extracted using the previously described steps (step $\textcircled {7}$). The statistics about these relevant factors, such as the unemployment rate and the gross domestic product growth rate (GDPR), are extracted from Eurostat, Statista, UK Parliament (Powell et al. 2021), and Office for National Statistics (Leaker 2021) (step $\textcircled {8}$). To facilitate applications in various fields, in step $\textcircled {9}$, the use cases and queries are given in detail.
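To make the output of the annotation steps more concrete, the following is a minimal sketch of how annotated tweets could be aggregated per country and year. The `AnnotatedTweet` record and the `label_shares` helper are hypothetical names introduced only for illustration. The hate speech labels (hate, offensive, normal) come from the pipeline description, whereas the sentiment label names are an assumption.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedTweet:
    """Hypothetical record for one tweet after the annotation steps."""
    country: str
    year: int
    sentiment: str    # assumed label set: "negative", "neutral", "positive"
    hate_label: str   # "hate", "offensive", or "normal"


def label_shares(tweets, field):
    """Return the share of each label of `field` per (country, year)."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        counts[(tweet.country, tweet.year)][getattr(tweet, field)] += 1
    return {
        key: {label: n / sum(counter.values()) for label, n in counter.items()}
        for key, counter in counts.items()
    }


# Toy input for illustration only; not data from MGKB.
sample = [
    AnnotatedTweet("DE", 2019, "negative", "hate"),
    AnnotatedTweet("DE", 2019, "positive", "normal"),
    AnnotatedTweet("DE", 2019, "negative", "normal"),
    AnnotatedTweet("DE", 2019, "neutral", "normal"),
]
sentiment_shares = label_shares(sample, "sentiment")
hate_shares = label_shares(sample, "hate_label")
```

Shares of this kind, computed per country and year, are the sort of quantity that can be compared against yearly economic indicators.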

3.1 Collecting migration-related tweets

In order to identify the public attitudes toward migrations in the EU countries, the first step is to select a list of destination countries, i.e., the countries hosting the immigrants/refugees. The annually aggregated statistics about asylum applications available on Eurostat are used to obtain the countries with a higher frequency of asylum applications during the period from 2013 to 2020. The list of countries includes Germany, Spain, Poland, France, Sweden, the United Kingdom (UK), Austria, Hungary, Switzerland, the Netherlands, and Italy.
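The country selection can be sketched as a simple ranking over aggregated yearly counts. This is only an illustration: the text does not state the exact selection rule, so the `top_n` cut-off, the function name `top_destinations`, and the record layout are assumptions, and the numbers below are placeholders rather than Eurostat figures.

```python
from collections import defaultdict


def top_destinations(records, start=2013, end=2020, top_n=11):
    """Rank countries by total asylum applications within [start, end].

    `records` is an iterable of (country, year, applications) tuples.
    """
    totals = defaultdict(int)
    for country, year, applications in records:
        if start <= year <= end:
            totals[country] += applications
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [country for country, _ in ranked[:top_n]]


# Placeholder records for illustration only; NOT real Eurostat values.
records = [
    ("Country A", 2013, 10),
    ("Country B", 2013, 5),
    ("Country A", 2021, 100),   # outside the period, ignored
    ("Country B", 2020, 20),
    ("Country C", 2015, 1),
]
selected = top_destinations(records, top_n=2)
```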

In the second step, relevant tweets are extracted using keywords related to the topic of immigration and refugees, obtained with the help of word embeddings. The words “immigration” and “refugee” are used as the seed words, based on which the top-50 most similar words are extracted using the pretrained Word2Vec model on Google News and fastText embeddings. These keywords are then manually filtered for relevance. Based on these keywords, the initial round of crawling the tweets is performed. Then, the crawled tweets are analyzed, and the most frequent hashtags, i.e., the hashtags occurring in more than 100 crawled tweets, are selected. These hashtags are verified manually for relevance and then used with the keywords for crawling tweets spanning from January 2013 to July 2021. The keywords and selected popular hashtags for filtering tweets are available on the GitHub repository.^{Footnote 7} The 20 most frequently occurring hashtags containing “refugee” and “immigrant” are shown as the result of a query example on the web page.^{Footnote 8}
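The two automatic parts of this step, keyword expansion by embedding similarity and hashtag selection by frequency, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses cosine similarity over a tiny hand-written vector table instead of the pretrained Word2Vec and fastText models, and the helper names `most_similar` and `frequent_hashtags` are hypothetical. The manual relevance checks described above are not modeled.

```python
import math
import re
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(seed, vectors, top_k=50):
    """Return the top_k words closest to `seed` by cosine similarity."""
    seed_vec = vectors[seed]
    scored = [(w, cosine(seed_vec, v)) for w, v in vectors.items() if w != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


HASHTAG = re.compile(r"#(\w+)")


def frequent_hashtags(tweets, min_tweets=100):
    """Return hashtags that occur in more than `min_tweets` tweets."""
    counts = Counter()
    for text in tweets:
        # A set ensures each hashtag is counted once per tweet.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return {tag for tag, n in counts.items() if n > min_tweets}


# Toy 2-dimensional vectors for illustration only; real embeddings have
# hundreds of dimensions and come from pretrained models.
toy_vectors = {
    "refugee": [1.0, 0.0],
    "asylum": [0.9, 0.1],
    "migrant": [0.8, 0.3],
    "football": [0.0, 1.0],
}
candidates = most_similar("refugee", toy_vectors, top_k=2)

toy_tweets = ["#RefugeesWelcome now"] * 101 + ["#rare topic"] * 100
popular = frequent_hashtags(toy_tweets, min_tweets=100)
```

Counting each hashtag once per tweet matches the wording "occurring in more than 100 crawled tweets"; counting raw occurrences instead would favor tweets that repeat a tag.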

Table 1 Statistics of crawled and preprocessed tweets
