Data and Methodology in the Twitter EP2019 Analysis

Palonen, Emilia; Sibinescu, Laura; Koljonen, Juha; Herkman, Juha

doi:10.1007/978-3-031-41737-5_1

Abstract

The chapter introduces the data collection process and methods used in the study. The main dataset was assembled from material collected from seven EU countries that represented so-called Twitter countries during the 2019 EP elections: the Netherlands, Germany, Finland, Italy, Spain, Ireland and the UK. The countries cover the South-North and Centre-Periphery dimensions in Europe and adequately follow the system models of politics and media devised by Hallin and Mancini. The data were gathered in real time during the EP election campaign in May 2019, based on hashtags. Two datasets were collected: raw data comprising 1,552,674 tweets from 222,169 accounts from all 27 EU countries, covering all actors participating in the campaign discussions, and a more selective main database of 49,492 tweets belonging to 2512 politically affiliated accounts in the seven above-mentioned countries. The raw data were used in computational topic modelling to trace the timeline of various topics and how they relate to each other. The computational and manual word frequency analysis of the main data was used to identify the themes favoured by various political actors in specific countries, and a network analysis was carried out to map the activities of these tweeters and their relationships. In addition, the chapter shows the methodological particularities in each country and discusses the 2019 EP elections as a specific context for the study.

1.1 Introduction

The starting point in collecting our data and defining our methodology was the idea that the EP elections can be seen as a specific moment and Twitter data as a ‘piñata’ (Lindgren, 2020) or keyhole to the discussions of European issues in their national contexts. We believe that analysing the Twitter debates during the 2019 EP election campaigns in several European countries can provide insight into the complex question of the Europeanisation of national political public spheres in times of a hybrid media environment (Chadwick, 2013). Existing research on Twitter for the 2019 EP elections has addressed candidates (Stier et al., 2021), Spitzenkandidaten (Rivas-de-Roca & García-Gordillo, 2020), political parties (Alonso-Muñoz & Casero-Ripollés, 2020), and the populist radical right (Heft et al., 2022), but we chose to look at the wider discussions online by immersing ourselves in data gathered through campaign hashtags, operationalising the idea of ‘hashtag publics’ (Rambukkana, 2015) or ‘landscapes’ (Koljonen & Palonen, 2021).

By using hashtags instead of selected accounts as the principle for collecting data, we believe we can better sift out the themes and discourses promoted by various actors around European (election) issues without pre-determination. We can cover both the strategic top-down communication of institutions such as political parties and the bottom-up communication of their rivals and individual actors (see Heft et al., 2022). This also helps us to explore the relationality between the various Twitter accounts. It enables us to discover the role of populism as an antagonistic logic of constructing political dynamics and ‘us’ building in national political communication contexts (Herkman, 2022; Vulovic & Palonen, 2023). Therefore, in this research, we focused on sets of national data based on hashtags used in particular countries.

The data collection and analyses here are linked to two research projects funded by the Academy of Finland, Mainstreaming Populism in the Twenty-First Century (MAPO) and Whirl of Knowledge: Cultural Populism and Polarisation in European Politics and Societies (WhiKnow), which ran between 2017 and 2022. The Twitter data were originally collected in real time under the WhiKnow project from all EU countries during a one-month period around the EP elections in May 2019. The original data cover actors of all types taking part in social media discussions. However, the scope of the data was narrowed for this analysis from the MAPO perspective to ‘political actors’ and to the so-called Twitter countries in which Twitter played a central role in the political communication of the 2019 EP election debates. Therefore, this study investigates political communication in seven countries: the Netherlands, Germany, Finland, Italy, Spain, Ireland and the UK (the selection of countries is discussed later in more detail).

For our research purposes, a multi-method combination of computational and manual analysis of actors and Twitter content in the 2019 EP elections seemed the most appropriate. Therefore, we first proceeded by manually coding and computationally grouping Twitter metadata to find the main actors and tweeting activities during the 2019 EP election discussions. Secondly, we carried out a computational and manual arrangement of the vocabularies and word counts of the terminologies used, in order to find the themes promoted by various actors in national political public spheres. By undertaking this word-frequency analysis we aimed to investigate how Europeanism and nationalism, or Europhilia and Euroscepticism, were represented in our data. Thirdly, we proceeded with computational LDA topic modelling to find out how different topics and actors were related to each other and how the topics developed over time during the campaign period in our data. Fourthly, we carried out social network analysis to find out the importance of different actors and possible clusters in the communication networks of our Twitter data.

Gathering a full database from social media is an impossible quest, as it will always be a partial iteration of what the engines provide (cf. Maurer, 2022; Hopster, 2021). The research strategy has been interpretive (Bevir, 2010), following the ethos of poststructuralism, which forms the backbone of the chosen cultural approach to theorising populism and emphasises relationality (Laclau, 2005; Howarth, 2015). The data were gathered mainly in real time around Europe and are always partial, which makes it difficult to treat them as exact measures of the ultimately uncapturable hashtag landscape. Therefore, an approach of data anarchism has been adopted; the results, even if partial, indicate some developments in each of the countries, and what is beyond the data piñata can be discussed in the interpretive analysis of the data (Lindgren, 2020). Whereas in positivist political science the measures of surveys or election results are taken as the given indication of reality, the interpretive approach seeks to approach perceptions of reality through the indications provided by the available data. Heuristic tools enable this research to go beyond description and provide explanations or logics for political developments (Glynos & Howarth, 2007).

There are several ways to minimise the discrepancy of the received data, from intercoder reliability to matching datasets, but in this pilot project approach we went beyond choosing one single hashtag such as EP2019 and were therefore at the mercy of the hashtags provided and their use in the national contexts (see Table 1.3). In fact, these were even local contexts in cases like Spain and Ireland, where local elections were combined with the European ones. This has effects on the data and on the results of our inquiry, which the interpretive approach therefore needs to consider. We would also like to note that conducting the data collection in real time in 2019 was an experience different from what Twitter research subsequently became, as the platform first made its historical data available and then restricted access in 2023 (e.g., Kupperschmidt, 2023).

We believe that despite these discrepancies in the data, the results are indicative of some dynamics in the EP elections, of emerging themes and of how different political parties articulate them. Considering the tradition of the analysis of political ideology, significant dynamics would be visible in a sample set of data. Furthermore, despite the interpretive research strategy, we do believe that the results are more or less reproducible and can be interpreted similarly to other studies on the 2019 EP elections (e.g., Stier et al., 2021; Heft et al., 2022). We maintain that even if the same approach cannot be adopted for the 2024 elections, a sufficiently similar study of a widely used social media platform will be possible. This would provide interesting longitudinal insights for the work we have embarked on in this study. Next, we discuss the selection of the data and the methods above in more depth, but before that, we briefly describe the procedure for selecting the seven countries included, and the context of EP elections as a specific research subject.

1.2 Selecting the Countries

The seven countries selected for our analysis represent different contexts for populism and political communication. Three of these countries (the Netherlands, Germany and Finland) represent the ‘democratic corporatist model’, two are ‘polarised pluralist’ (Italy and Spain), and two are ‘liberal’ countries (Ireland and the UK) in Hallin and Mancini’s (2004) seminal classification of politics and media systems. A long history of agrarian populism colours the Finnish political field, but there is also a long history of both right and left populism in Italy, the Netherlands and the UK. In all of the countries, populist radical right movements have been successful during the twenty-first century (especially in Italy, the Netherlands, Finland, and Germany), but in Spain the left-wing populist Podemos became popular before the radical right Vox. In Ireland, clearly radical right parties have remained marginal so far, but populist antagonisms have coloured party politics in other ways, and in the UK, Brexit in particular had a devastating effect in terms of populist antagonisms during the 2010s.

All the countries included in the analysis may be called ‘Twitter countries’, in the sense that Twitter had a significant role in their political communications. This does not mean that Twitter was the most important channel for political communication – indeed, how importance could even be measured is debatable. In many of these countries, traditional news media – television, press, radio – continue to play a vital role in political communication, and of the social media platforms Facebook may have had more accounts among the population than Twitter. However, in 2019 Twitter was used especially by decision-makers, politicians, journalists, experts, and civil society activists, making it a powerful platform for political communication (e.g., Jungherr, 2016). Therefore, in the ‘hybrid media’ circulation of political discussions (Chadwick, 2013), Twitter has played a key role in these countries.

We did gather data for several other countries where Twitter is widely used, but left out many potential ones to limit the scope of our study. France, Denmark, Portugal and Sweden were excluded because we believe that the countries chosen already meet the criteria of the particular country groups in Europe. We also gathered data from the Central East European region, or the member states from 2004 onwards, but we managed to collect proper data from Twitter only for Poland. We finally left Poland out of the comparative analysis for editorial reasons. We believe that, in addition to Hallin and Mancini’s (2004) categorisation of political and media systems, our selection covers at least the South-North and Centre-Periphery dimensions in Europe adequately. We also gathered transnational hashtags, but these are not analysed in this volume because of its focus on country cases.

1.3 Context: European Parliamentary Elections

European parliamentary politics is structured through national elections held every five years. The European Parliamentary (EP) elections are a significant event on the continent, as the EU parliamentarians in each member state are elected to represent the citizens at the European level. Generally, the Members of the European Parliament (MEPs) are elected through the national parties, or in some cases through citizen lists or movements. In the 2019 elections, the Greek politician Giannis Varoufakis, for example, was running in Germany on the DiEM25 list. The number of nationally available seats is relative to the population of each member state, and in 2019 there were additional Brexit seats that were distributed between the member states due to the effects of the event. The electoral systems and the voting dates also vary between the constituent countries (see Table 1.1).

Table 1.1 EP 2019 elections in countries included in our study

In the European Parliament, the elected politicians join parliamentary groups, the largest of which are the established centre-right European People’s Party group EPP, the Alliance of Liberals and Democrats for Europe group ALDE, and the Socialists and Democrats group S&D. However, especially among the populist radical right (PRR) party family, new groups are negotiated after the elections for each parliamentary cycle. Over time, other groups bringing together populist parties have emerged, such as the Eurosceptics group, the European Conservatives and Reformists group ECR, and the new Identity and Democracy group ID after the 2019 EP election. The power distribution between European Parliament groups has been transformed in the twenty-first century, so that the traditional and largest right and left-wing groups, the EPP and S&D respectively, have lost ground, and liberal, green, and radical populist groups have increased their proportion of seats in the parliament.

Whereas EPP and S&D represented almost 66% of the vote after the 1999 EP election, their share of the vote was no more than 45% in the 2019 election; thus, about 20 percentage points of the vote have been redistributed from traditional political groups to more radical liberal or populist conservative actors over the last 20 years. In addition, the share of the Non-Inscrits (NI) group has significantly increased over the years, demonstrating the increasing diversification in the political spectrum of the European Parliament (see Table 1.2).

Table 1.2 Share (%) of votes for different EP party groups in the twenty-first century

Most of the parties in our analysis that were described as populist joined the Identity and Democracy group after the EP 2019 election. These parties included radical right movements such as the Finns Party (Finland), the Dutch Party for Freedom PVV (the Netherlands), Lega (Italy), and Alternative für Deutschland AfD (Germany). However, Vox (Spain) joined the European Conservatives and Reformists group ECR, and members from Podemos (Spain) and Sinn Féin, along with some independents from Ireland, joined the leftist GUE/NGL group. Another political party from Italy often called populist, the Five Star Movement, joined the Non-Inscrits group of the European Parliament after the 2019 election. Thus, even if the ID group has unified many European radical right movements, not all of them belong to the same group in the European Parliament, and there is even more variation in the group memberships among all political movements described as populist, depending on their ideological orientations. The group memberships changed later, but our focus here is on the EP2019 campaign period. Therefore, in this study, the actors are investigated in the context of their own national elections, not through their election manifestos, as we seek to give precedence to the ‘political over the institutional politics, or democracy over demography’ (Palonen, 2021).

1.4 The Data Collection

The data used in this book originate from the collaboration between MAPO and WhiKnow during 2019–2021. For its purposes, WhiKnow required a wider range of data focusing not just on politicians and political parties, but also on institutions and public figures in other fields, such as media, academia and NGOs. In collecting these data, the goal of WhiKnow was to examine how knowledge forms and circulates in polarising contexts, such as elections. Thus, WhiKnow data were collected from the overall social media conversation around the EP elections in 2019. The MAPO project network and its research focus on populist dynamics were used to analyse these data, with the focus placed on political actors.

We also wish to emphasise that in the period of data collection in spring 2019, Twitter was living through its ‘golden age’ as a forum for political communication. By 2019, Twitter had proved to be a popular platform, used by several of the most powerful politicians in the world, not least by Donald Trump, whose success and public visibility largely relied on his provocative tweeting activities (see Groshek & Koc-Michalska, 2017; Kreis, 2017). Twitter’s reputation as a platform was intrinsically intertwined with political communication cycles in hybrid media environments (Chadwick, 2013). It was not until 2021, when Twitter banned Trump’s and some other actors’ accounts as harmful to society, that the star of the platform as a provocative political communication forum began to wane. After the controversial businessman and billionaire Elon Musk bought the platform in 2022 and started to alter its organisation and principles according to his own mindset, the popularity of Twitter as a political communication forum decreased again. Therefore, at the time of writing, the future of Twitter was unclear, and we can argue that our Twitter data come from a historical moment when Twitter really was the social media platform of political communication in many countries, which makes our data emblematic for the analysis of the Europeanisation of national public spheres. At the time when this volume is published, Twitter has been renamed X, but we use ‘Twitter’ throughout the book.

Data collection and preparation consisted of three steps. Firstly, we received a number of hashtags and keywords from country experts collaborating with our project, relevant to the EP elections both domestically and internationally. Aside from general hashtags, such as #EP2019, #europeanelections or #eupol, we used country-specific hashtags, as well as hashtags we identified during the data collection process in relation to emerging events, such as Ibizagate in Austria, and voter disenfranchisement in the UK (see Table 1.3).

Table 1.3 List of hashtags and sampling keywords in different countries

The second step was to collect data based on these hashtags and keywords. The data were gathered synchronously in May 2019, the month in which the EP elections took place, using the MeCodify platform (Al-Saqaf, 2016). Early in the process, however, we noticed that, based on the volume of incoming data, there was a clear divide in social media use between what we have termed ‘Twitter countries’ and ‘Facebook countries’, with the latter being significantly concentrated in Central and Eastern Europe. Combined with the fact that opportunities for automated data collection on Facebook were already limited at the time, this resulted in an overall dataset largely consisting of Twitter data and skewed towards Western, Southern and Northern Europe. Synchronous data collection in weekly batches also meant that follower counts were recorded at different times for different accounts; they are reported mainly for the latest entry. In the case of the Netherlands, where the data were amended with historical data, we have also used the follower numbers from the gathering period, as indicated in the respective table.

In total, we collected a raw dataset of 1,552,674 tweets from 222,169 accounts for WhiKnow from all 27 EU member states. Of these raw data, 969,562 tweets belonged to the seven ‘Twitter countries’ included in our analysis. Once data collection concluded at the end of May 2019, the final step was to prepare the data for analysis. To do this, we asked our collaborating country experts to manually classify tweeters into several categories based on their public Twitter bios. In particular, we were interested in public figures and institutions: politicians, journalists, academics, political parties, media outlets, universities and NGOs/think-tanks. In addition to this categorisation, our collaborators also classified accounts, where possible, according to their political affiliation, either to specific parties or more generically as left/right or single-issue (for instance environment, immigration, LGBTQ+ issues).

This categorisation or coding was done by hand by country expert partners or research assistants with country-specific knowledge in 2019–2020. Coders were trained for this task by the WhiKnow research team. Nevertheless, there were some challenges regarding codeability, as in some countries the bio descriptions of Twitter accounts are transparent about political sympathies whereas in others they are not. General knowledge was used in many cases. Some countries, like the UK, were double-coded by the project team before analysis. Inter-coder reliability was sought, but not always achieved, due to the largely pro-bono nature of this activity.

Some datasets, such as those for the UK and the Netherlands, were updated with a historical dataset after the coding of the accounts, as certain sections were missing. The update for the Netherlands was due to the weakness of the original range of hashtags, while for the UK the lack of the Conservative Party in the data was due to the fact that the Tories were not communicating about the 2019 European Parliamentary elections in the first place.

These insights are useful in considering what kind of research Twitter research was in 2019. On the one hand, the need for synchronous data gathering in the absence of historical data meant capturing the bios at the time of communication and obtaining fuller data, as tweets that were later removed would still appear. On the other hand, it was not possible to test different hashtags or keywords, and the data gathering was quite an intensive real-time process. We are grateful to all the people involved in the collection, consultation and categorisation of the data, which was done under the leadership of Laura Sibinescu, in particular, and Emilia Palonen, the PI of the WhiKnow project. Hand-coding of the data enabled us to see a larger picture of the types of accounts that were active in the countries. In hindsight, some combination of automated and hand-coding would have made sense. At the time of writing, we could consider trying AI such as ChatGPT-4 in the annotation process (Törnberg, 2023).

1.5 The Data for This Research

To answer questions of political communication in more detail, we carried out an in-depth content analysis of political actors’ tweets extracted from the WhiKnow raw data. Therefore, for the analyses other than topic modelling, we selected from the WhiKnow data only those tweets and accounts that were classified as belonging to political actors, including politicians and parties, in seven countries: the Netherlands, Germany, Finland, Italy, Spain, Ireland, and the UK. This gave us 49,492 tweets belonging to 2512 accounts, which is about 5.1% of all the tweeting activity in the original raw data in these countries. There may be some flaws in the classification of political actors, depending on the country, because of difficulties in recognising them comprehensively. However, we believe that our classification gives a sufficiently reliable overview of the distribution of the themes and topics between various political actors in the selected countries. The distribution of these accounts across countries is shown in Table 1.4.
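The selection step can be illustrated with a minimal Python sketch. It is not the project's code: the `Tweet` record, the coding sheet `ACCOUNT_CATEGORY` and the category labels are hypothetical stand-ins for the expert-coded classification sheets. The last line reproduces the share reported above from the two tweet counts given in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    account: str
    country: str
    text: str


# Hypothetical coding sheet: account name -> category assigned by a country expert.
ACCOUNT_CATEGORY = {
    "mep_anna": "politician",
    "green_party_fi": "political party",
    "daily_news": "media outlet",
}

# Categories treated as "political actors" in this sketch (an assumption).
POLITICAL_CATEGORIES = {"politician", "political party"}


def select_political_tweets(tweets, account_category):
    """Keep only tweets whose author was coded as a political actor."""
    return [
        t for t in tweets
        if account_category.get(t.account) in POLITICAL_CATEGORIES
    ]


def share_percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


raw = [
    Tweet("mep_anna", "FI", "Vote on Sunday! #EP2019"),
    Tweet("daily_news", "FI", "Turnout figures so far #EP2019"),
    Tweet("green_party_fi", "FI", "Climate is on the ballot #EP2019"),
    Tweet("unknown_user", "FI", "Who should I vote for? #EP2019"),
]

political = select_political_tweets(raw, ACCOUNT_CATEGORY)
print(len(political), "of", len(raw), "toy tweets come from political actors")

# 49,492 selected tweets out of 969,562 raw tweets in the seven countries.
print(share_percent(49_492, 969_562), "% of the raw tweets")
```

Uncoded accounts (here `unknown_user`) simply drop out of the selection, which mirrors the point made above that unrecognised political actors are a possible source of error.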

Table 1.4 Data summary

In regard to country selection according to the Hallin and Mancini (2004) model of media and political systems, we can see some over-emphasis of tweets from democratic-corporatist countries and of accounts from liberal countries. This derives from the fact that there was an additional democratic-corporatist country in the country selection compared to the other two models, and that in Ireland the local elections and a referendum were conducted simultaneously with the 2019 EP elections, which especially increased the number of accounts in liberal countries. However, with these exceptions in mind, the distribution of sample tweets was relatively even between the country groups.

The full dataset per country was used for LDA topic modelling, which was matched with politics-coded accounts. This means that the original dataset was larger but was analysed using the same range of accounts. The combination of the datasets means better reliability of the interpretive process: we did both thematic analysis and topic modelling, explained in more detail below.

Although the amount of data used in the analysis of political actors’ tweeting activity and content is small when viewed against recent ‘big data’ comparative studies on political campaigning and polarisation on Twitter (see e.g., Urman, 2020; Stier et al., 2018; Stier et al., 2021), in our study we sought to combine qualitative semi-automated content analysis with careful contextualisation rather than the actor analysis emphasised in big data analyses. The chapters in this book thus join other smaller-scope studies on populism and online polarisation, such as Waisbord and Amado (2017), Ernst et al. (2017) and Gruzd and Roy (2014). In addition to contextualisation, the smaller amount of data also makes it easier for human readers to identify and zoom into particular areas of interest, such as the content communicated by a specific politician or party, or the emergence and evolution of certain themes and events during the election process. Therefore, our data fit the purposes of our interpretive approach well.

The semi-automated approach we use throughout this book is meant to answer our overarching empirical research questions: what particular topics and themes did political actors distribute on Twitter during the 2019 EP elections, and how were various actors and topics linked to each other? To answer these questions, we applied three analysis methods: thematic analysis, topic modelling and social network analysis.

1.6 Thematic Analysis

In thematic analysis, we explored the content of 49,492 tweets summarised in Table 1.4. The idea behind thematic analysis is to identify the most frequently discussed themes in the text of the tweets. This is where our semi-automated analysis had the most substantial human contribution, as we were interested in a closer reading of content, i.e., ‘what is there’, rather than a completely automated extraction of latent topics, which is used on a wider scale, discussed in a section below.

The first step in the thematic analysis was to separate tweets into categories according to the party affiliation of their authors. Thus, we obtained specific text corpora for each political party. The size of these corpora varies quite significantly between party affiliations and countries, although accounts affiliated with larger parties also tend to tweet more. The varying corpus sizes influence the results, in that less data will result in fewer or more generic themes, compared to the richer insights gained from more substantial content. Differences in language syntax may also influence thematic variations in our overall data. Despite these remarks, we believe that we have enough data from each country to cover the main thematic emphases in country-specific election discussions.

In the second step, we used SpaCy (Honnibal & Montani, 2017) to clean the text corpora and find word frequencies for each party. SpaCy is an open-source Python package for information extraction and Natural Language Processing of large texts. Finding word frequencies is a fairly simple procedure that first requires the text to be tidied up by filtering out stop words such as prepositions, pronouns, common verbs and other very frequently used words; and lemmatisation, which removes grammatical inflections and returns words to their dictionary forms so they can be counted together. For instance, if a tweet corpus contains several instances of both ‘party’ and ‘parties’, lemmatisation transforms the plural form to a singular, so the true frequency of the term ‘party’ can be counted.
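The procedure described above (filter stop words, reduce words to their dictionary forms, count) can be sketched in a few lines of Python. To keep the example self-contained, a toy stop-word set and a toy lemma lookup are used as stand-ins; they are assumptions for illustration only. In an actual SpaCy pipeline the same decisions would be read from each token's `is_stop` and `lemma_` attributes.

```python
import re
from collections import Counter

# Toy stand-ins for a real stop-word list and lemmatiser (assumptions).
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "are", "for", "our", "we"}
LEMMAS = {"parties": "party", "elections": "election", "voted": "vote", "votes": "vote"}


def word_frequencies(tweets):
    """Count lemmatised, non-stop-word tokens across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lower-case and keep runs of letters only (drops digits and punctuation).
        for token in re.findall(r"[^\W\d_]+", text.lower()):
            if token in STOP_WORDS:
                continue
            # Map inflected forms to their dictionary form where one is known.
            counts[LEMMAS.get(token, token)] += 1
    return counts


corpus = [
    "The party is ready for the elections",
    "Two parties voted together in the election",
    "We vote for our party",
]

freq = word_frequencies(corpus)
for word, n in freq.most_common(3):
    print(word, n)
```

In this toy corpus ‘party’ and ‘parties’ are counted together, which is the effect of lemmatisation described above.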

In the final step, the authors went through the lists of words and their frequencies manually, and grouped together words they determined to be thematically related into more general topics, such as elections, the EU, environment and economic issues. Adding up the frequencies of these words shows the prevalent themes affiliated with each political party. Because many words appear just once or a few times in the data, only the most significant words and word groups were listed, in order to find the most important themes favoured by various actors and actor groups. In each country, the 300 to 500 most significant words and word groups were included in the analysis, covering the majority of the whole word corpus in the Twitter data. Because of our focus on the Europeanisation of national public spheres and the relationship between populism and political communication, we were especially interested in vocabularies discussing such themes as Europhilia, Euroscepticism, nationalism, patriotism, immigration, climate change and media.
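Once the manual grouping exists, adding up frequencies per theme is mechanical. The sketch below assumes a hand-made mapping from words to themes (`THEME_OF_WORD`, hypothetical) and a word-frequency table for one party with invented counts; words that the coders did not assign to any theme are left out of the totals.

```python
from collections import Counter

# Hypothetical result of the manual grouping: word -> theme.
THEME_OF_WORD = {
    "vote": "elections",
    "election": "elections",
    "candidate": "elections",
    "eu": "EU",
    "europe": "EU",
    "parliament": "EU",
    "climate": "environment",
    "emission": "environment",
}


def theme_totals(word_freq, theme_of_word):
    """Sum word frequencies within each manually defined theme."""
    totals = Counter()
    for word, n in word_freq.items():
        theme = theme_of_word.get(word)
        if theme is not None:  # skip words not assigned to a theme
            totals[theme] += n
    return totals


# Invented word frequencies for one party corpus.
party_freq = {"vote": 40, "election": 25, "eu": 30, "climate": 12, "emission": 3, "weather": 1}

totals = theme_totals(party_freq, THEME_OF_WORD)
for theme, n in totals.most_common():
    print(theme, n)
```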

Of course, because the data were collected around the 2019 EP elections, the themes of voting and campaigning, as well as EU institutions, are at the top in most cases. However, in each country specific themes outside the election vocabulary arose and yielded interesting results. Once themes were identified, they were examined in the broader context of each country’s political and media system.

1.7 Topic Modelling

As mentioned above, in the topic modelling we used the whole WhiKnow Twitter raw data corpus, rather than the category-defined dataset in which only tweets from accounts that had been coded to match particular parties were used. These data were first pre-processed and cleaned by discarding information not related to our analysis. We retained only the timestamp, the username of the tweeter and the actual text of the tweet. Contrary to the thematic analysis, for topic modelling, the data were not stemmed or lemmatised, because stemming or lemmatisation might discard the parts of the words which are used to define ‘us’, ‘them’ and ‘other’. We were interested in the dynamics within the text, particularly the articulation of a frontier between ‘us’ and others in the tweets. In addition, there is some evidence that stemming might not produce any significant improvement of the results and might in fact degrade the topic stability (Schofield & Mimno, 2016).
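As a minimal illustration of this pre-processing step, the Python sketch below keeps only the three retained fields and leaves the tweet text untouched, so that no stemming or lemmatisation is applied. The field names and the example record are hypothetical; the actual export format of the collection platform is not described here.

```python
# Fields kept for topic modelling; the names are assumptions for illustration.
KEPT_FIELDS = ("timestamp", "username", "text")


def strip_metadata(record):
    """Return a copy of the record containing only the retained fields."""
    return {field: record[field] for field in KEPT_FIELDS}


raw_record = {
    "timestamp": "2019-05-23T09:15:00Z",
    "username": "example_user",
    "text": "Our Europe, their Brussels #EP2019",
    "followers": 1200,
    "lang": "en",
}

clean = strip_metadata(raw_record)
print(clean)
```

Because the text is passed through unchanged, pronouns and inflected forms such as ‘our’ and ‘their’ remain available to the topic model.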

We used LDA topic modelling to generate a meaningful overview of the tweets. First, we applied the R package ‘ldatuning’ to estimate the optimal number of topics for each country’s data set (k-means). Second, from the optimised suggestions of models we selected the one which contained the fewest topics. The reason for this was to ease the interpretation, as models with more topics (in some cases well over 100) can be harder to interpret than models with a comparably smaller number of topics.
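The selection rule (take the suggested topic numbers and keep the smallest) can be sketched as follows. The analysis itself was run in R; this Python sketch only illustrates the rule. The metric names and scores are invented, and how the suggestions of different metrics were combined in practice is an assumption: here each metric nominates its best-scoring number of topics, and the smallest nominee is chosen.

```python
def best_k(scores, maximise):
    """Topic number with the best score for one metric."""
    pick = max if maximise else min
    return pick(scores, key=scores.get)


def fewest_topics(metric_scores, maximise_flags):
    """Collect each metric's suggested topic number and return the smallest."""
    candidates = {
        best_k(metric_scores[name], maximise_flags[name])
        for name in metric_scores
    }
    return min(candidates), sorted(candidates)


# Invented scores per candidate number of topics (k) for two hypothetical metrics.
metric_scores = {
    "metric_a": {20: 0.61, 40: 0.74, 80: 0.72},  # higher is better
    "metric_b": {20: 0.35, 40: 0.22, 80: 0.19},  # lower is better
}
maximise_flags = {"metric_a": True, "metric_b": False}

chosen, candidates = fewest_topics(metric_scores, maximise_flags)
print("suggested:", candidates, "chosen:", chosen)
```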

Each country’s data were analysed separately with LDA topic modelling using the R package ‘topicmodels’, with each tweet as a separate document. As topic volatility or stability is a known challenge with topic modelling (e.g., Wilkerson & Casas, 2017), we analysed five models with the same parameters for each country. The reason for this was to produce more robust results, as country specialists could base their interpretation on all five models. These five models are non-identical, although the results are similar. This adds to the reliability of the data. In addition, a simple topic validation was done by a non-country expert, first by checking whether party-affiliated accounts mainly tweeted about their party, and second by checking how election day affected the number of topics.

The results were post-processed to facilitate interpretation. Party affiliation (if any) was added to the accounts based on the country-based classification sheets. Next, all tweets from party affiliated accounts were extracted separately for each political party in each country. That is, we formed separate data sheets of all tweets from all party affiliated accounts in all countries. On the data sheets, the topics were sorted to show which topics the party affiliated accounts tweeted about most often, and which topics they did not tweet about. Generally, the tweets of party affiliated accounts belonged to topics related to their party. A time series for each country was produced by plotting the number of tweets per topic for each day on a simple line graph. This time series was prepared from all tweets (that is, the time series figures ignore party affiliation).
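The post-processing steps described above can be sketched as follows. This is an illustrative Python example rather than the scripts used in the study: the record layout (`timestamp`, `username`, `topic`), the affiliation dictionary and both function names are assumptions made for the sake of the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


def tweets_per_topic_per_day(tweets):
    """Count tweets per (day, topic) over all tweets, ignoring party affiliation."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["timestamp"]).date().isoformat()
        counts[(day, tweet["topic"])] += 1
    return counts


def topics_by_party(tweets, affiliation):
    """For each party, list (topic, count) pairs sorted from most to least tweeted."""
    per_party = defaultdict(Counter)
    for tweet in tweets:
        party = affiliation.get(tweet["username"])
        if party is not None:  # skip accounts without a coded affiliation
            per_party[party][tweet["topic"]] += 1
    return {party: counter.most_common() for party, counter in per_party.items()}


# Hypothetical tweets, each already assigned its most probable topic.
tweets = [
    {"timestamp": "2019-05-23T10:15:00", "username": "alice", "topic": 3},
    {"timestamp": "2019-05-23T18:40:00", "username": "bob", "topic": 3},
    {"timestamp": "2019-05-24T09:05:00", "username": "alice", "topic": 7},
    {"timestamp": "2019-05-24T11:30:00", "username": "alice", "topic": 3},
]
affiliation = {"alice": "Party A"}  # hypothetical classification sheet

daily = tweets_per_topic_per_day(tweets)
ranking = topics_by_party(tweets, affiliation)
```

The daily counts correspond to the points of the per-topic line graph, and the ranking corresponds to one party's sorted data sheet.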

We applied topic modelling in this study to understand the discursive field and its focus points. We analysed the five sets of topic models with the same parameters, as in our interpretation this allows us to see five alternative but viable computational interpretations of the data. That is, each of the five models offers an interpretation of the data, and none of them is better than another. This also means that the unit of investigation or analysis is not a single word in the topic modelling term listing, but the field or sheet of topics in all five models and the common themes manifested in them. Furthermore, we discovered a recurring pattern in all datasets, irrespective of country, language and party: there were recognisable ‘campaign topics’. We consider this a strong sign of the reliability and comparability of the data.

1.8 Social Network Analysis

Our goal with social network analysis (SNS) was to examine how Twitter accounts are connected to each other and the patterns that emerged from these connections: how were different political actors related to each other in the campaign communication of the 2019 EP elections? For instance, do parties consistently keep to their own Twitter bubbles, or do they communicate with other political parties? And if so, which accounts act as such communication nodes? Network analysis is a way of visualising the polarisation expressed as retweets, replies and mentions between the Twitter accounts.

Similar to thematic analysis, preparing the data for SNS begins with separating tweets according to party affiliation. However, instead of content, the aim is to select the tweets that contain a reference to another Twitter account, denoted in-text by ‘@’ in front of a username (some tweets reference multiple accounts). This is most often a retweet, but can also be a mention or a reply from one account to another. However, since there is no clear consensus in the literature over the nature of such linkages, we chose not to differentiate between them (see Metaxas et al., 2014; Wells et al., 2020). Thus, the mention of a ‘@username’ is treated more generically as a connection between the tweet author and said username within the social network.
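As a minimal sketch of this step, the Python code below extracts every ‘@username’ reference from a tweet and records it as an undifferentiated connection from the author to the referenced account. The regular expression, the lower-casing of usernames, the exclusion of self-references and the function name are assumptions of this example, not details reported for the study.

```python
import re
from collections import Counter

# '@' followed by word characters, not preceded by a word character
# (so that e-mail addresses such as x@y.org are not treated as mentions).
MENTION = re.compile(r"(?<!\w)@(\w+)", re.ASCII)


def mention_edges(tweets):
    """Count author -> referenced-account connections.

    Retweets, replies and mentions are deliberately not distinguished:
    any '@username' in the text creates a connection.
    """
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            source, destination = author.lower(), target.lower()
            if source != destination:  # ignore self-references
                edges[(source, destination)] += 1
    return edges


# Hypothetical (author, text) pairs.
tweets = [
    ("alice", "RT @Bob: go and vote! cc @carol"),
    ("bob", "@alice thanks"),
    ("carol", "no mentions here, mail me at x@y.org"),
]

edges = mention_edges(tweets)
```

The resulting weighted edge list can be exported, for instance as a CSV file of source, target and weight columns, for use in network analysis software.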

In the following step, we used Gephi, an open-source network analysis software (Bastian et al., 2009), to visualise these connections as part of a social network. Here we were particularly interested in the possible formation of separate communication bubbles that might indicate polarisation, so we measured the role of each account in the network by considering the degree to which it might act as a connection between such clusters. Thus, we calculated each network node’s betweenness centrality, which measures how well the node allows information to pass from one part of the network to another (Golbeck, 2015).

In other words, Twitter accounts with high degrees of betweenness centrality are worth examining as they connect different parts of the overall network. On the other hand, low values of betweenness centrality suggest a network in which certain clusters are formed (for instance, several accounts retweeting a single tweet but not communicating with each other) but are weakly connected or wholly unconnected, suggesting distinct communication bubbles. Both these situations, combined with the thematic analysis described in the section above, can be used to study polarisation in political communication.
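To make the measure concrete, the following Python sketch computes unnormalised betweenness centrality for a small undirected, unweighted network, using the shortest-path counting approach commonly attributed to Brandes. It is an illustration only and makes no claim about how Gephi computes or scales the values internally; the toy network and all names in it are hypothetical.

```python
from collections import deque


def undirected_adjacency(edges):
    """Build an adjacency mapping from a list of (a, b) connections."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness_centrality(adj):
    """Unnormalised betweenness centrality of every node."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        distance = {v: -1 for v in adj}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if distance[w] < 0:  # first visit
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both ends in an undirected graph.
    return {v: value / 2.0 for v, value in centrality.items()}


# Two hypothetical clusters joined only through the account 'bridge'.
edges = [
    ("a1", "hubA"), ("a2", "hubA"), ("a3", "hubA"),
    ("b1", "hubB"), ("b2", "hubB"),
    ("bridge", "hubA"), ("bridge", "hubB"),
]
scores = betweenness_centrality(undirected_adjacency(edges))
```

In this toy network the peripheral accounts score 0, while ‘bridge’ scores 12 because every shortest path between the two clusters passes through it. Removing ‘bridge’ would leave two unconnected bubbles.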

1.9 Some Country Particularities

The final points to address in this chapter are a few country-specific characteristics that emerged in the data collection and classification process. In general, these characteristics and their impact on country insights are discussed in more depth in the corresponding country chapters.

The Netherlands was a stand-out case due to the significantly larger number of tweets produced by a small number of accounts: the average number of tweets per account is almost double that of the overall cross-country dataset (see Table 1.4). During the initial data collection with country-specific hashtags, the Netherlands yielded only 1342 tweets from 168 accounts, because politicians used hashtags less on Dutch Twitter than in the other cases in this book. By adding keywords related to the 2019 EP elections to the hashtags used in the original data collection round, we were able to significantly increase the number of tweets for analysis. However, the larger amount of data does not affect the comparability of the results from the Netherlands with those of the other countries examined in this book. This is firstly because thematic analysis is mainly concerned with qualitative insights resulting from linguistic analysis, and secondly, because the number of Dutch Twitter accounts – which are used in the network analysis part of each chapter – is not significantly larger compared to other countries.

The Irish data were also re-extracted from the master database and re-coded because of deficiencies in the sample of political actors and their classifications in the original database. As a result, the total number of Irish tweets increased from 7759 to 10,446, and the dataset was ultimately more comprehensive. In the Italian case, the number of tweets was also low compared to the full body of Italian data, but this was because the coding was more selective on the category of politician. The data were nevertheless not re-coded for the present analysis. The Italian data, which appear rather small here, have subsequently been amended by re-coding and used in a comparative setting with the Finnish and the Dutch data by Yannick Lahti for doctoral research successfully defended at the University of Bologna in June 2022.

Finally, the thematic analysis of the Finnish case was in large part done manually, because the spaCy package used for the other countries did not have a Finnish language model at the time of the analysis. Thus, while word frequency was obtained automatically, much of the data cleaning and lemmatisation was done manually by Finnish speakers to obtain results comparable to the other cases. Furthermore, in the chapter on Germany, another approach to topic modelling was taken: the chapter used hashtag co-occurrence tables that were imported into the network visualisation and analysis software ‘Visone’ and then clustered by applying the Louvain clustering method to the network (Blondel et al., 2008).
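A hashtag co-occurrence table of the kind mentioned for the German chapter could be built along the following lines. This Python sketch covers only the counting step, under the assumption that two hashtags co-occur when they appear in the same tweet; the clustering itself (for example with the Louvain method) is not shown. The regular expression, the lower-casing of hashtags and the example tweets are hypothetical choices for illustration.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def hashtag_cooccurrence(texts):
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    pairs = Counter()
    for text in texts:
        # Deduplicate and sort so that each pair has one canonical order.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical tweets.
texts = [
    "#EP2019 #Europawahl jetzt wählen",
    "#europawahl #EP2019 #Klima",
    "nur #Klima",
]

table = hashtag_cooccurrence(texts)
```

Each pair and its count can be read as a weighted edge between two hashtags, which is the form in which such a table is typically handed to network software for clustering.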

1.10 Conclusion

This chapter has included information on the EP elections’ results across the countries to give background to the country chapters and their analysis of the communication, politics and populist dynamics. It further outlined our data-gathering strategy, based on hashtag landscapes on Twitter, and the multiple analytical methods that were applied to the data, to be further interpreted by our country experts. These included simple attention to the tweet activity, themes, topics and networks that could be found in the data.

The following chapters use the methods described here to examine and contextualise populism and polarisation at the country level. We trust that the insights of the multi-method analysis are robust and comparable for an interpretive approach to European politics. In fact, carrying out multiple forms of analysis on the same data, whether the dataset defined by category or the original full-country data, enables a more nuanced reading of the data. While the top-lists of tweeters and tweets are particular to this data, they are the result of systematic data gathering, and they also illustrate the social-mediated election debate in the country. The networks have enabled us to see the interconnectedness of the accounts, while the themes and topics in the different datasets speak of the thematic priorities and contestations in the election campaigns.

The study gives testimony to the use of Twitter in 2019 as a major platform for political debates, and to the fact that it was made available for research purposes. We recognise that this was a unique opportunity for data gathering on the European scale. It was premised on the idea of widespread use of Twitter in political communication, which we used to focus on seven countries with different media and political systems. In the next European Parliamentary elections in 2024, the platform and data availability may differ. In fact, we doubt it would be feasible to carry out an identical study, due to the changes in the Twitter API policy and in the Europe-wide appeal of the platform. For a multi-method interpretive analysis, there will be other opportunities. We hope our current research strategy will provide a basis for both regionally and longitudinally comparative analysis on another set of material, using similar analytical tools from the interpretive perspective.

References

Alonso-Muñoz, L., & Casero-Ripollés, A. (2020). Populism against Europe in social media: The Eurosceptic discourse on Twitter in Spain, Italy, France, and United Kingdom during the campaign of the 2019 European Parliament Election. Frontiers in Communication, 5. https://doi.org/10.3389/fcomm.2020.00054
Al-Saqaf, W. (2016). MeCoDEM’s open-source tool for simplifying big data analysis and visualization [Computer software]. http://www.mecodem.eu/mecodify/
Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. Third international AAAI conference on weblogs and social media. https://doi.org/10.1609/icwsm.v3i1.13937
Bevir, M. (2010). Editor’s introduction: Interpretive political science. In M. Bevir (Ed.), Interpretive Political Science Interpretive theories (pp. xxi–xlii). Sage Publications.
Blondel, V., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008
Chadwick, A. (2013). The hybrid media system: Politics and power. Oxford University Press.
Ernst, N., Engesser, S., Büchel, F., Blassnig, S., & Esser, F. (2017). Extreme parties and populism: An analysis of Facebook and Twitter across six countries. Information, Communication & Society, 20(9), 1347–1364. https://doi.org/10.1080/1369118X.2017.1329333
Glynos, J., & Howarth, D. (2007). Logics of critical explanation in social and political theory (1st ed.). Routledge. https://doi.org/10.4324/9780203934753
Golbeck, J. (2015). Analyzing networks. In J. Golbeck (Ed.), Introduction to social media investigation: A hands-on approach (pp. 221–235). Syngress.
Groshek, J., & Koc-Michalska, K. (2017). Helping populism win? Social media use, filter bubbles, and support for populist presidential candidates in the 2016 US election campaign. Information, Communication & Society, 20(9), 1389–1407. https://doi.org/10.1080/1369118X.2017.1329334
Gruzd, A., & Roy, J. (2014). Investigating political polarization on Twitter: A Canadian perspective. Policy & Internet, 6(1), 28–45. https://doi.org/10.1002/1944-2866.POI354
Hallin, D. C., & Mancini, P. (2004). Comparing media systems: Three models of media and politics. Cambridge University Press. https://doi.org/10.1017/CBO9780511790867
Heft, A., Reinhardt, S., & Pfetsch, B. (2022). Mobilization and support structures in radical right party networks. Digital political communication ecologies in the 2019 European parliamentary elections. Information, Communication & Society. https://doi.org/10.1080/1369118X.2022.2129269
Herkman, J. (2022). A cultural approach to populism. Routledge. https://doi.org/10.4324/9781003267539
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
Hopster, J. (2021). Mutual affordances: The dynamics between social media and populism. Media, Culture & Society, 43(3), 551–560. https://doi.org/10.1177/0163443720957889
Howarth, D. (2015). Introduction: Discourse, hegemony and populism: Ernesto Laclau’s political theory. In D. Howarth (Ed.), Ernesto Laclau: Post-Marxism, populism and critique (pp. 1–20). Routledge. https://doi.org/10.4324/9780203762288
Jungherr, A. (2016). Twitter use in election campaigns: A systematic literature review. Journal of Information Technology & Politics, 13(1), 72–91. https://doi.org/10.1080/19331681.2015.1132401
Koljonen, J., & Palonen, E. (2021). Performing and contesting control during of the Covid-19 pandemic in Finland: Interpretative topic modelling and discourse theoretical reading of the government communication and hashtag landscape. Frontiers in Political Science. https://doi.org/10.3389/fpos.2021.689614
Kreis, R. (2017). The ‘tweet politics’ of President Trump. Journal of Language and Politics, 16(4), 607–618. https://doi.org/10.1075/jlp.17032.kre
Kupperschmidt, K. (2023, Feb 16). Twitter’s plan to cut free data access evokes fair amount of panic among scientists. Science. https://www.science.org/content/article/twitters-plan-cut-free-data-access-evokes-fair-amount-panic-among-scientists
Laclau, E. (2005). On populist reason. Verso.
Lindgren, S. (2020). Data theory: Interpretive sociology and computational methods. Polity Press.
Maurer, P. (2022). Populism and social media. In A. Ceron (Ed.), Elgar Encyclopedia of technology and politics (pp. 37–42). Edward Elgar Publishing LTD.
Metaxas, P. T., Mustafaraj, E., Wong, K., Zeng, L., O’Keefe, M., & Finn, S. (2014). Do retweets indicate interest, trust, agreement? https://doi.org/10.48550/arXiv.1411.3555
Palonen, E. (2021). Democracy vs. demography: Rethinking politics and the people as debate. Thesis Eleven, 164(1), 88–103. https://doi.org/10.1177/0725513620983686
Rambukkana, N. (Ed.). (2015). Hashtag publics: The power and politics of discursive networks (pp. 13–28). Peter Lang Publishing.
Rivas-de-Roca, R., & García-Gordillo, M. (2020). Thematic Agenda on Twitter in the 2019 European Parliament Elections: A comparative study between ‘Spitzenkandidaten’ and National Candidates. Tripodos, 49, 29–49. https://doi.org/10.51698/tripodos.2020.49p29-49
Schofield, A., & Mimno, D. (2016). Comparing Apples to Apple: The effects of Stemmers on topic models. Transactions of the Association for Computational Linguistics, 4, 287–300. https://doi.org/10.1162/tacl_a_00099
Stier, S., Bleier, A., Lietz, H., & Strohmaier, M. (2018). Election campaigning on social media: Politicians, audiences, and the mediation of political communication on Facebook and Twitter. Political Communication, 35(1), 50–74. https://doi.org/10.1080/10584609.2017.1334728
Stier, S., Froio, C., & Schünemann, W. J. (2021). Going transnational? Candidates’ transnational linkages on Twitter during the 2019 European Parliament elections. West European Politics, 44(7), 1455–1481. https://doi.org/10.1080/01402382.2020.1812267
Törnberg, P. (2023). ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning. https://doi.org/10.48550/arXiv.2304.06588.
Urman, A. (2020). Context matters: Political polarization on Twitter from a comparative perspective. Media, Culture & Society, 42(6), 857–879. https://doi.org/10.1177/0163443719876541
Vulovic, M., & Palonen, E. (2023). Nationalism, populism or peopleism? Clarifying the distinction through a two-dimensional lens. Nations and Nationalism, 29(2), 546–561. https://doi.org/10.1111/nana.12920
Waisbord, S., & Amado, A. (2017). Populist communication by digital means: Presidential Twitter in Latin America. Information, Communication & Society, 20(9), 1330–1346. https://doi.org/10.1080/1369118X.2017.1328521
Wells, C., Shah, D., Lukito, J., Pelled, A., Pevehouse, J. C., & Yang, J. (2020). Trump, Twitter, and news media responsiveness: A media systems approach. New Media & Society, 22(4), 659–682. https://doi.org/10.1177/1461444819893987
Wilkerson, J., & Casas, A. (2017). Large-scale computerized text analysis in political science: Opportunities and challenges. Annual Review of Political Science, 20, 529–544. https://doi.org/10.1146/annurev-polisci-052615-025542

Author information

Authors and Affiliations

Helsinki Institute for Social Sciences and Humanities (HSSH); Political Science, Faculty of Social Sciences, University of Helsinki, Helsinki, Finland
Emilia Palonen
Political Science, Faculty of Social Science, University of Helsinki, Helsinki, Finland
Laura Sibinescu & Juha Koljonen
Media and Communication Studies, Faculty of Social Sciences, University of Helsinki, Helsinki, Finland
Juha Herkman

Corresponding author

Correspondence to Emilia Palonen.

Editor information

Editors and Affiliations

Media and Communication Studies, Faculty of Social Sciences, University of Helsinki, Helsinki, Finland
Juha Herkman
Helsinki Institute for Social Sciences and Humanities (HSSH); Political Science, Faculty of Social Sciences, University of Helsinki, Helsinki, Finland
Emilia Palonen

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Palonen, E., Sibinescu, L., Koljonen, J., Herkman, J. (2024). Data and Methodology in the Twitter EP2019 Analysis. In: Herkman, J., Palonen, E. (eds) Populism, Twitter and the European Public Sphere. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-41737-5_1

DOI: https://doi.org/10.1007/978-3-031-41737-5_1
Published: 31 March 2024
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-031-41736-8
Online ISBN: 978-3-031-41737-5
eBook Packages: Literature, Cultural and Media Studies (R0)
