1 Introduction

The political ideology and polarization of a country are tightly linked to aspects related to the development of a society. This has attracted the interest of the research community to study the effect of ideology and polarization in society. We can find works exploring the impact of political ideology on the economic growth of a country [13] or discussing whether the level of education may impact the ideology of an individual [47]. Some works provide evidence on how the ideology of individuals also affects the trust in science [8, 9], the attitudes toward environmental decisions [10], climate change [11], fashion trends [12], consumption preferences [13], etc. Similarly, the literature has proved that political polarization is closely related to other socioeconomic variables such as the economic expectations [14] or the corruption perception in a country [15]. Some works also claim that polarization may improve the level of democracy [16]. For instance, the degree of polarization of a country could affect the voters’ turnout [17, 18]. Finally, recent works have also analyzed the impact of political ideology and polarization on the COVID-19 pandemic [1922].

This large body of research shows that measuring political ideology and polarization is of great importance (directly) for the research community to tackle multiple socio-economic issues and (indirectly) for policymakers that can use the output of the literature to take action to improve the quality of societies. For this reason, it is crucial to have efficient methodologies to measure political ideology and polarization.

The most widespread way to measure ideology and polarization relies on defining ad-hoc surveys, interviews, polls, etc., and processing the collected information to derive a quantitative metric for ideology or polarization in a given country or region. Public opinion research entities run surveys regularly for a specific country or region, often electoral polls (e.g., once a month or quarter), in most of the countries or a broader scope including multiple countries or regions. Examples of the latter are the World Value Survey (WVS) [23], the European Value Survey (EVS) [24] or the European Quality of Government Index survey (EQGI) [25]. Although the quality of these surveys is not under discussion since they are of great value for the research community, policymakers, and other stakeholders, they have some limitations. First of all, all these surveys, independently whether they are run for a single or multiple country/regions are expensive. Second, they usually present either low geographical and/or temporal resolutions. On the one hand, surveys run frequently (e.g., electoral polls) and usually focus on a single country or region. Expanding them further will very likely imply substantially higher costs and time. On the other hand, surveys with a large geographical coverage usually have a limited temporal resolution. For instance, the WVS, EVS, or EQGI are released every 4 years.

We believe that there is room to explore novel alternatives to surveys that allow us to achieve the three requirements implicitly discussed before: large geographical coverage, low temporal resolution, and low cost. This paper aims to propose a novel methodology using information publicly available on the Facebook (FB) pages of political parties (e.g., number of followers, posts published, posts reactions, etc.) to predict political ideology and polarization across the 27 countries forming the European Union (EU) plus the United Kingdom (UK), i.e., the 28 EU members before Brexit. We clarify that in the case of polarization we are actually focusing on party system polarization as a way to measure the distribution of parties in the left-right dimension in a parliament. The underlying idea is to understand whether we can effectively predict ideology and polarization with data that roughly captures: political parties’ popularity (number of followers), political party activity (posting frequency), and users’ engagement (reactions such as likes, shares, or comments). Note, that the reason to select users’ engagement [2630], popularity [27, 31] and activity [28, 32, 33] is because previous work in the literature have studied the relationship between those metrics from social media and elements around electoral results. In case we succeed in generating an accurate prediction model we will meet the three pre-defined requirements. First, our methodology is designed to cover a large geographical area including 28 different countries. Second, our methodology is designed to provide results at a very high frequency (e.g., every week) since the data for our model is readily available. Third, our methodology is inexpensive since it relies on publicly available data that can be collected and processed at a low cost. In addition, our solution allows using a single data source and methodology to compute the polarization and ideology in parallel in all EU countries, which is of great interest to perform comparative studies. Furthermore, the high temporal resolution offered by this methodology enables the possibility of studying the ideology and polarization evolution over time with fine granularity. Finally, the methodology is extensible beyond the EU + UK. In a nutshell, the main contribution of this paper is defining a methodology that allows computing the political ideology and polarization in each of the 28 EU + UK countries at a low cost and high frequency. To the best of our knowledge, there is no prior work that meets these requirements in a large geographical scope such as the EU + UK.

The core of the proposed methodology is the definition of a model to predict the political ideology and the party polarization of a country. To create that model, we have first obtained the data from 234 FB pages belonging to the most important political parties in each of the 28 countries considered in our study between Jan 1. 2019, and Dec. 31. 2021. For each of these pages, we collect all the posts publicly available along with their timestamp, the number of reactions attracted by each post (likes, shares, and comments), and the total number of followers of the page. Second, we study what are the best variables available in our dataset to build the model Third, we compute ideology and polarization metrics with the variables collected using them as a proxy of party vote share for each country, with the possibility of computing them in different time windows. Fourth, we apply a regression model based on the previous metrics between Jan 2019 and May 2019 because we used the May 2019 EU parliament election, where all the 28 countries were involved, as ground truth for training our model. Finally, to evaluate and validate the performance and accuracy of our model, we use national elections of 15 EU countries that took place between 2019 and 2020 and 4 elections run in 2016. The results show that the proposed models of ideology and polarization display good accuracy. This means that they can be used as a good proxy to measure these political parameters meeting the three requirements we imposed. Our findings suggest that the post activity and the comments received in posts are good indicators to predict ideology, whereas the post activity and reactions received in posts are the best features to estimate polarization. The proposed methodology allows monitoring the evolution of the ideology and polarization over time in small intervals.

We also evaluated the performance of the ideology and polarization models to predict parties’ vote share. We obtain an average Mean Average Error (MAE) higher than 8 (i.e., 8 percentage points). As we expected, a model created from a few metrics extracted from FB posts posted on political parties’ pages is not accurate enough to predict parties’ vote share.

The second contribution of this paper is the implementation of a barometer that leverages the output of our model to monitor the evolution of the political ideology and polarization of the 28 countries in our dataset. The barometer has been implemented as an interactive website that displays the ideology and polarization of the political parties and countries over time. The website allows selecting only one party/country or comparing multiple parties/countries in the same visualization. The site also provides aggregated information on the activity and engagement in FB across all parties used in this study. Finally, we adopt an open science approach making the data used to generate the polarization and ideology graphs available to be downloaded and used by other researchers and stakeholders.

To conclude the introduction section, we would like to state that the goal of this paper is not replacing traditional methods to measure political ideology, but propose a novel approach that can coexist with traditional ones and provide other beneficial features (i.e., low-cost, high temporal and spatial resolution, etc.) that may be of interest for studies that require fine-grained temporal data across many countries.

2 Dataset

In this section, we introduce the data collection process and the resulting dataset. We use this information for a twofold purpose. First, to create a regression model to quantify the political ideology and polarization for each of the EU + UK countries based on the information related to the Facebook posts from political parties in these countries. Second, to create an interactive website that implements a Eurobarometer that monitors the political ideology and polarization within the EU + UK countries. After introducing the data collection and dataset, we also present a few interesting insights out of a preliminary analysis of the data.

2.1 Dataset collection and description

We use two different data collection processes to create a complete dataset used for implementing the model that predicts political ideology and polarization. On the one hand, we have used a publicly available repository to retrieve the political ideology assigned to the political parties within the 28 countries covered in this paper. On the other hand, we have collected the information from the FB pages associated with those political parties including the number of followers, the posts published, and the users’ reactions to these posts. Following, we describe in detail each of the two processes.

2.1.1 Political parties and ideology dataset

We relied on an external geopolitical open database to retrieve a comprehensive list of EU + UK political parties for our study along with their political ideology. We did an intensive search and evaluated multiple open databases such as the Global Party Survey 2019 [34], V-Party [35], or ParlGov [36], which provide political information for many countries. We selected ParlGov because it includes the best information for the goal of our research. Overall, the ParlGov database includes 1294 political party entries within the EU and UK.Footnote 1 The main reasons to select ParlGov are: \((i)\) it includes an extensive list of political parties within the EU + UK; \((\mathit{ii})\) it provides the ideology for each political party using the standard left-right scale [37]. This scale ranges between 0 (extreme left) and 10 (extreme right). It is important to note that we assumed the ideology of a party is time-invariant in the period considered in our work; \((\mathit{iii})\) ParlGov provides relevant information about the electoral events in the 28 EU + UK countries. We use the electoral results to train and evaluate the performance of our prediction models.

2.1.2 Facebook parties’ page selection

To create the dataset, the first step was mapping the EU + UK political parties included in ParlGov to their official main Facebook accounts. This was not an easy step since it required carrying out manual verification in many of the analyzed countries. For instance, some parties have multiple official pages that include the main page of the party but also Facebook pages belonging to regional sections of the party, or particular groups within the party such as young associations. In this paper, we only take into consideration the information from the main page of each party because \((i)\) in the vast majority, the main page is the one attracting the most attention in terms of followers, page likes, and reactions to posts; \((\mathit{ii})\) the main page of the party is the one that best captures the political messages the organization wants to disseminate to the overall society, \((\mathit{iii})\) even though in some cases the FB page of the political leader of a party attracts more attention than the main page of the party, we do not consider them for our study, as the leaders of parties could change at any point. For instance, since the last Spanish elections in Nov. 2019 3 out of the 5 leaders of major parties have changed. Altogether, out of the 1294 political parties retrieved from ParlGov in EU + UK, we were able to find the official Facebook main page for 246 of them, out of which 234 have an assigned value on the left-right scale.

Our goal is to represent the highest amount of vote share possible in each country, some small parties may not be represented, but we had made an exhaustive search for every party that represents at least 5% of the vote share in recent elections. Out of the 159 parties with a left-right value and more than 5% of votes in the 2019 EU elections, we collected information from 153 parties. Altogether, the 246 parties received (in average) the 90.45% of the votes in the 2019 EU Elections (see the disaggregated values per country in Table 1). All the countries’ parties we considered represent above 65% of votes, and all but Croatia and Greece above 75%. Overall, our FB dataset is very comprehensive and includes the vast majority of relevant EU parties. The remaining parties within the ParlGov dataset are either very minor parties, inactive, or simply do not exist anymore.

Table 1 Number of posts and political parties in our dataset from each country

To illustrate the ParlGov left-right ideology distribution, Fig. 1 displays the left-right position of each of the 234 parties of our Facebook dataset. Each vertical line represents the ideology of each political party according to the value that ParlGov assigns them on the left-right scale. This figure depicts at a glance the political-ideological distribution of the main political Facebook parties across the EU and UK. We observe that the left-most party belongs to Luxembourg, i.e., Déi Lénk, with an ideology value equal to 0.5263. On the opposite extreme, the two right-most parties are in France (i.e., Rassemblement National) and Belgium (i.e., Vlaams Blok), with values of 9.6854 and 9.6622, respectively. Finally, the number of political parties is quite heterogeneous across the countries in our dataset. In Table 1 we can see that Belgium and the Netherlands include the most political parties in our dataset (12), while Malta has the least (4).

Figure 1
figure 1

Political parties’ ideology distribution in European countries. Each vertical line represents the ideology of each political party

2.1.3 Facebook dataset

We collect the information available within the main official Facebook pages of the political parties. These pages are public on Facebook and anyone can browse through their posts. For each party page on Facebook, we collect \((i)\) the number of followers on the page, \((\mathit{ii})\) the number of likes the page has received, \((\mathit{iii})\) the content (i.e., text, URL links, etc.) of the public posts published by the political parties in their FB page from January 1st, 2019 until December 31st, 2021, \((\mathit{iv})\) users’ engagement for each of the posts. This includes the number of reactions, shares, comments, and video views (in case the content of the post is a video) from the users. In particular, the reactions include multiple options such as like, love, care, haha, wow, sad and angry. We obtain the aggregated number of reactions as well as the specific number for each reaction type. We gather the value of each of the engagement variables for a post at least 48 hours after the post was published. We have observed that the engagement increase mainly occurs in the 48 hours after the post publication in the vast majority of posts, as shown in Fig. 2. Overall, our dataset is roughly capturing the popularity (i.e., followers and likes of the page), activity (number of posts published over time), and engagement (number of reactions, shares, and comments per post) of each political party under analysis. The relationship between these dimensions and the vote share received by a party in elections had been studied on Facebook, with different degrees of success [26, 27, 32], which could suggest that these indicators could be related to ideology and polarization, as the definitions of both ideology and polarization need the party vote share as an input, those definitions will be introduced in Sect. 3.1.

Figure 2
figure 2

Engagement evolution after the publication of a post. The vertical line represents the 48 hours mark

We created an automatic process to navigate through the parties’ pages and scrap the publicly available posts from the 234 FB pages in our list. Our process collected posts information for 3 complete years between January 2019 and December 2021. As a result of this, our dataset includes more than 500k posts across the 234 parties along with the left-right ideology ParlGov assigned to each of them.

2.2 Insights of political parties activity in Facebook

Before moving to the main objective of this paper, i.e., creating the political ideology and polarization model, we briefly discuss a few interesting insights derived from a comparative analysis of the activity of the EU + UK political parties on Facebook. We look into four specific elements: (i) activity overtime referred to as temporal analysis; (ii) activity across countries referred to as inter-country analysis; (iii) activity across political parties within the same country referred to as intra-country analysis; and, (iv) activity per ideology ideology analysis. To carry out some of these analyses we rely on the results depicted in Table 1. This table shows the number of political parties considered in each country, the aggregated percentage of votes those parties obtained in the EU parliament 2019 elections per country, the overall number of posts collected per country, and the metrics associated with two different methods to measure the FB activity of political parties. The first one measures the aggregated activity of all parties by looking at the average number of posts per day published in each country (Country-PPD) along with its standard deviation (Country-SD). The second one measures the average posts per day published by a single political party in each country (Party-PPD) along with its standard deviation (Party-SD).

2.2.1 Temporal analysis of activity

Figure 3 shows the aggregated number of posts published per week across the 28 countries between Jan. 1, 2019, and Dec. 31, 2021. As we expected, the distribution of posts per week is not homogeneously distributed. Instead, we observe spikes and valleys.

Figure 3
figure 3

The total number of posts published. The vertical line represents the 2019 European Elections

We have included a vertical dashed line in the figure corresponding to the last EU parliament election days (23-26 May 2019) that ran in parallel in all the 28 countries analyzed in this paper because at that time UK was still an EU member. We observe that the largest peak happens in the week before the elections May 13-19, 2019. After the elections, the number of posts published reduced drastically.

The second-largest spike happens in autumn 2020, mainly due to the regional elections of Italy on September 20 and 21, with a relevant but smaller contribution from the Czech Republic regional elections on 2 and 3 October. Finally, we observe that Christmas week always shows a decrease in political party activity on Facebook in 2019, 2020, and 2021. We note that Appendix A.1 includes the figures for the number of posts per week for each of the 28 countries of our dataset. We observe the same pattern in the countries holding individual elections at different periods after the European elections.

The temporal activity results demonstrate that, as previously reported in the literature [38], political parties increase their activity before the electoral periods. This implicitly implies that political parties consider Facebook as a relevant channel to disseminate their messages.

2.2.2 Inter-country activity comparison

On average, we have collected 18,940 posts per country. Luxembourg is the country with the lowest number of posts (5144), and Italy is the one with the greatest number (42,399). Similarly, we observe a substantial difference in the intensity level of daily political posts across the 28 analyzed countries. Italy is the most active country with 38.7 posts per day on average, followed by Spain (32.9) and Portugal (24.7). Contrarily, the countries showing the lowest daily activity are Luxembourg (5.3), Malta (5.7), and Lithuania (9.4). The large standard deviation observed in the Country-PPD aligns with the results depicted in the temporal analysis since there are days/weeks with intense activity (mostly before electoral periods) and days of very low activities (e.g., Christmas).

To roughly understand whether the publishing intensity may respond to basic socio-economic parameters such as the population or wealth, we have computed the Pearson correlation between the Country-PPD and the population and GDP per capita of each country. The population shows a rather strong correlation of 0.447 with the Country-PPD, while the GDP per capita is in contrast negatively correlated with a person correlation equal to −0.395. Therefore, high population and low GDP per capita seem to be indicators of a higher activity of political parties in FB.

To understand why a lower GDP per capita may imply more Facebook activity, we have also computed the correlation between GDP per capita and country ideology obtaining a result of −0.37. This roughly means that countries with lower GDP per capita are more biased towards the right. As we will see in Sect. 2.2.4, left parties (a group of parties whose ideology is between 1.5 and 3.5) are less active on Facebook. Therefore, the observed correlation between GDP per capita and ideology may explain why countries with a lower GDP per capita are more active.

In a nutshell, our results show that the FB activity of political parties substantially diverges from country to country when considering the EU + UK.

2.2.3 Intra-country activity comparison

In this subsection, we analyze whether the posts are evenly distributed among the political parties when you consider a single country or instead a few parties are responsible for a large portion of all published posts. We leverage the Party-PPD and the Party-SD depicted in Table 1 to perform the analysis.

In most of the countries, the average publishing activity of parties (Party-PPD) is below 3, with 17 out of the 28 countries with a value between 1.5 and 3. Luxembourg is the country with the lowest Party-PPD with a value of 0.8 posts per day. Contrarily, Italian parties are the ones that publish the most on average, with a Party-PPD of 5.5.

The average Party-SD/Party-PPD ratio across the 28 countries is almost 0.5. The higher this ratio is, the more divergent the publication rate is across political parties in a country. Italy is the country showing the highest ratio (1.3) which means there may be some parties that publish many more FB posts than others. If we look in detail, we find that Lega per Salvini Premier is responsible for 55.95% of all the posts published in Italy in the analyzed period. Portugal and Greece are the countries showing the second and third largest divergence in the FB posting rate among their political parties (0.84 and 0.71, respectively). Contrarily, Estonia, Latvia, Luxembourg, and Austria show a rather homogeneous posting rate across parties with a Party-SD/Party-PPD ratio below 0.3.

In summary, the divergence among the FB publishing rate of political parties within a country largely varies from country to country in our dataset, with Italy representing the extreme case with a very dominant party.

2.2.4 Ideology-based activity comparison

The literature has evidenced the differences in the impact that ideological messages have on users depending on the spectrum of such messages in the left-right scale [39]. In this section, we aim to explore whether we can observe substantial differences in the aggregated activity of political parties according to the ideology assigned by ParlGov. This means, whether parties at a particular part of the political spectrum (e.g., left, central, right) post more messages or not.

Figure 4 displays a boxplot showing per each ideological position the distribution of the number of posts published per party for a particular ideology. We divided the parties into 6 ideological groups according to the left-right scale using the following intervals: far-left \(0 < 1.5\), left \(1.5 < 3.5\) centre-left \(3.5 < 5\), centre-right \(5 < 6.5\), right \(6.5 < 8.5\), far-right \(8.5 < 10\). In the analyzed period, the right parties group is the set with the highest number of posts per party on median (2088). In contrast, the left parties are clearly the group where parties published the least posts on median (1598), which represents an outlier in comparison with the other ideological groups. Finally, Appendix A.2 includes a figure with the number of posts published per week for each of the six ideological groups considered. All the ideologies present similar patterns with an increase in activity around electoral periods.

Figure 4
figure 4

Boxplot of posts published by each party, divided by ideology. The ideology is divided in the left-right scale using the following intervals: far-left \(0 < 1.5\), left \(1.5 < 3.5\) centre-left \(3.5 < 5\), centre-right \(5 < 6.5\), right \(6.5 < 8.5\), far-right \(8.5 < 10\)

In a nutshell, our results reveal that parties on the right-wing spectrum are more active in FB than parties on the left wing.

3 Methods

This section introduces the methodology we propose to predict the ideology and polarization of EU + UK countries using the FB dataset we have collected for this purpose. We organize the section as follows: (i) we briefly introduce the metrics of ideology and polarization used in this paper along with the ground truth data used to train our prediction model; (ii) we list the variables extracted from our FB dataset to be used as input of our model; (iii) we describe our regression model to predict the ideology and polarization metrics.

3.1 Ideology and party system polarization definition

The political picture of a country can be defined in many different ways. Each country has its unique national issues and particularities that are intrinsic to them. However, our goal is to study and compare the European political framework to build an automatic and general predictor of the ideology and polarization that can be applied to all EU + UK countries. However, ideology and polarization are different metrics that require different definitions.

The first one, country ideology, specifies the general ideology of a country, which is known as the ideological center of gravity [40]. In other words, we compute the country ideology as the weighted average ideology according to the portion of votes received by each political party, as follows:

$$ \mathit{II}_{c} = \frac{\sum_{i=1}^{N}\mathit{VoteShare}_{i} * \mathit{PartyLR}_{i}}{\sum_{i=1}^{N}\mathit{VoteShare}_{i} }, $$
(1)

where VoteShare refers to the percentage of votes received by party i in country c, and PartyLR to the left-right value of that party. In this paper, the ideology of a party is defined as a value in the left-right scale between 0 (far-left ideology) and 10 (far-right ideology). We will use the ideological value ParlGov assigns EU + UK political parties.

The second one, polarization, measures the ideological spread of a particular country. There are multiple definitions of political polarization, in particular, in this paper, we use the Party System Polarization which measures the variance in the distribution of parties in a parliament in the left-right dimension. We can find multiple metrics to measure party polarization in the literature. Some authors propose to use the fragmentation of a party system with metrics such as the Herfindahl index or the Laakso and Taagepera index [41] (they are interchangeable since one is the inverse of the other). These indexes use the number of effective parties, reducing the relevance of small parties. We can find more sophisticated polarization indexes that consider the ideology of the parties on the left-right scale. For instance, Sigelman and Yough [42] proposed a metric introducing the ideology of each party. In this metric, the polarization is computed with a quadratic formulation that introduces relatively large polarization changes associated with small ideology variations. Dalton [43] defined an improved index that presented a more balanced scale that facilitates the comparison among countries and the interpretability of the results. Finally, other proposals combine both inputs, the number of parties and the ideology. An example is introduced in [44], where authors use an index similar to Dalton’s taking also into consideration the number of effective parties.

In this work, we choose the Polarization Index (PI) proposed by Dalton [43] since, compared to the state-of-the-art metrics, Dalton’s PI allows us to: \((i)\) do a cross-national comparison with a simple index, \((\mathit{ii})\) consider the parties relevance and their position in a political scale, \((\mathit{iii})\) it is bounded between 0 and 10 as the left-right scale, which simplifies the comparison and interpretability of the results. The Polarization Index proposed by Dalton is defined as:

$$ \mathit{PI}_{c} = \sqrt{\sum_{i=1}^{N} \mathit{VoteShare}_{i} * \biggl( \frac{\mathit{PartyLR}_{i}-\mathit{II}_{c}}{5} \biggr)^{2}}, $$
(2)

where \(\mathit{II}_{c}\) represents the ideology value of the country c (see Equation (1)). Hence, this index allows us to understand the spread of polarization between a group of N analyzed parties. Particularly, a value of 0 means that every party is in the same ideological position, i.e., the same left-right ideology value; and a value of 10 would mean that the vote is split in half in the opposite positions of the ideological spectrum.

3.2 Ground truth

Before discussing how to estimate the ideology and polarization of a country over time, it is necessary to have a reliable ground-truth reference for the ideology and polarization metrics we have defined. The best proxy for our approach is the electoral results. This proxy assumes that the aggregate ideology of a country is linked to the percentage of votes received by the different political parties with different ideologies. Therefore, we use electoral results as ground truth for our model.

Since the main goal of our paper is to create an ideology and polarization prediction model that can be used in any of the 28 countries considered in our work, the best ground truth dataset is the results of the May 2019 EU parliament elections because all the 28 countries run elections in parallel. The EU parliament ballot results have been used as the ground-truth information to train our regression model.

We could have used instead, the results of the 28 countries’ national elections. Unfortunately, in the 3 years considered in our work, not all the countries had national elections. Nevertheless, we will use the results of 15 national elections between 2019 and 2020 to validate the performance of our ideology and polarization prediction models.

As we introduced above, to obtain the country ideology in a given country based on the actual ballot results, we compute a weighted mean taking into account the Vote Share of each party in the election along with their ideology value on the scale of 0-10. Analogously, we use the Dalton formula (see Equation (2)) to compute the polarization ground truth using the electoral results.

3.3 List of Facebook variables considered in the prediction model

Based on the data extracted from Facebook, we can compute a country’s ideology and polarization by using different variables from our dataset. First, we calculate the value of a variable for every party in a country (e.g., the total number of posts published by each party), then, we express those variables as a percentage. Finally, we get an ideology or polarization value for each country using the formulas described in Sect. 3.1, replacing the Vote Share for the percentages calculated. Our dataset includes variables covering three dimensions: popularity, activity, and engagement. It is worth noting that we use those dimensions to replace the Vote Share as previous works have studied the relationship between them [26, 27, 32].

For popularity, we use the number of followers and page likes from political parties’ Facebook pages. To capture activity we rely on two metrics related to the posting activity of political parties: the total number of posts and the posts published per day, which measures the median time difference of two consecutive posts by a party, expressed as the number of publications made each day. Finally, to cover the engagement attracted by political parties we create multiple metrics extracted from the reactions, comments, shares, and video views received by political parties’ posts. In particular, if we consider reactionsFootnote 2 we produce the following metrics per political party: the total number of reactions from all the aggregated posts, the median number of reactions per post, the average number of reactions per post, and the median number of reactions per day or week. We compute the same metrics for comments, shares, and video views as well. We have decided to use multiple time windows (day, week, all-time) as well as the median and the mean in the case of engagement because we were not sure beforehand which metric would work better in the model.

Table 2 summarizes all the variables we have just introduced. These variables represent the set we will explore to create our prediction model, but the model, as we will see later, will not use all of them.

Table 2 List of metrics used to understand, build and compute the ideology and polarization regression model

The methodology to create the ideology and polarization models based on Facebook data and assess their quality is divided into three steps: (i) we first find correlations among the variables included in Table 2 to eliminate those variables highly correlated; (ii) second, we build and train multiple regression models using different combinations of variables and select the one showing the best performance based on the ground-truth data obtained from the May 2019 EU parliament elections. We also analyze the results using different data time windows to understand what is the most appropriate time span to maximize the performance of our model. For the month before the elections, we use a time window for every week, and the rest are every other week except the last one, i.e., the largest time window covers 4 months and 3 weeks from Jan. 1 2019 until May 21. 2019, two days before the EU parliament elections started.Footnote 3 The shortest time window is one week before the elections. (iii) We use the electoral results of national elections to assess the performance of our prediction model.

Finally, we note that we tried to use regularization techniques using the lasso regression [45], which led to similar results, but the performance did not improve overall. In Appendix A.6 we discuss the models and results obtained with Lasso regression.

3.4 Regression model

We introduce the described methodology to build the best-performing regression model to estimate the ideology and polarization of a country using Facebook information from political parties. We first break down and study the correlation between the metrics presented in Sect. 3.3. Then, we display the procedure used to train the model. Finally, we depict the cross-validation methodology that allows for finding the most promising models among the tested ones.

3.4.1 Variables selection

A good variable selection is crucial to building an efficient model and preventing cases of over and underfitting. We study the correlation between every pair of metrics to understand which ones are equivalent and highly correlated. Hence, if we find several correlated metrics, we will only use one of them in the model.

Figure 5 shows the correlation between each pair of ideology metrics considering the whole data period from January 1, 2019, to December 31, 2021. The Figure displays both a heat color map (right upper part) and the values of the Pearson correlation (left lower part) between each pair of metrics in our study. The diagonal represents the correlation of a variable with itself, and therefore, it will always be 1. This analysis will allow unveiling which variables are highly correlated to avoid using them together in the model.

Figure 5
figure 5

Pearson correlation between Country Ideology variables

We find high correlations in the activity variables among them and the engagement variables among them above. In many cases, this correlation is above 0.85. Contrarily, the correlation is generally considerably lower when you compare variables belonging to different types. The max and min correlation between engagement and activity variables are 0.78 and 0.51, respectively. The popularity metrics are the ones showing less correlation among them and with the engagement and activity group. We note that when we consider a different time window the correlation values between pairs of metrics could change. To create our regression models we avoid using together variables that show a \(\text{correlation} \geq 0.60\).

3.4.2 Multiple regression models

The number of samples to build our model is limited to the 27 countries in our research.Footnote 4 For this reason, the regression analysis should not contain many independent variables. We consider a general rule of thumb for the sample size, where there should be at least 10 samples per model variable [46]. Hence, we set to 2 the number of metrics we can use for our model to be representative enough. There could be other metrics that may add value to the regression, however, due to the limited number of samples we avoid using more than 2 variables to prevent our model from overfitting. Therefore, we will construct our final linear regression model (both for ideology and polarization) as in Equation (3):

$$ y \sim \beta _{0} + \beta _{1} x_{1} + \beta _{2} x_{2} $$
(3)

Where y is the predicted ideology (or polarization) value, \(x_{1}\) and \(x_{2}\) are the computed values with the dataset, and β describes the coefficients.

We remind that we trained models using different time windows (as described in Sect. 3.3) for all pairs of metrics with a correlation lower than 0.60. It is worth noting that we eliminated the models where any of the variables have no statistical significance i.e., p-value greater than 0.05 in any of the features or the intercept. Finally, we compute the Variance Inflation Factor (VIF) metric for all of the models ensuring they do not present any colinearity.

In order to discern which are the best models to estimate both the ideology and polarization, it is necessary to perform some kind of validation. For this purpose, we perform a Repeated K-Fold Cross-Validation using the data from the European Elections. This validation allows us to measure the success of our model and pick the ones achieving the greatest accuracy.

To increase the robustness of the cross-validation, we used the following procedure:

\((i)\):

For each country in our dataset \(c_{i}\) (\(i \in [1,27]\)), we include each \(c_{i}\) in a random group \(r_{k}\) (\(k \in [1,5]\)). Having 27 samples, it is not possible to split the data in a uniformly balanced manner, the majority of folds would include 5 samples but some of them would vary in size from 4 to 7 samples.

\((\mathit{ii})\):

Following, to train and cross-validate the models of Sect. 3.4.2, we use \(k-1\) (4 out of 5 in our case) groups to train the models and the remaining k to validate the result and compute the accuracy metrics coefficient of determination (\(R^{2}\)) and Root-Mean-Square Error (RMSE). We take all the combinations in this process and repeat the cross-validation k times. Therefore, first, we use \(r_{k} \in [1,4]\) to train the model and \(r_{5}\) to validate it. Second, we use \(r_{k} \in 1,2,3,5\) to train the model and \(r_{4}\) to validate. Again, we follow this procedure for all the \(r_{k}\) combinations letting aside one \(r_{k}\) for validation. Lastly, we extract the aggregated \(R^{2}\) and RMSE.

To increase the robustness of the cross-validation, the process \((i)\) is iterated 100 times, and each time we include each \(c_{i}\) in a random group \(r_{k}\). After that, the K-Fold Cross-Validation method \((\mathit{ii})\) of the previous paragraph is repeated. Finally, we average the cross-validation results of those iterations, providing insight into how the models work in comparison with each other.

Figure 6 shows the R2 value for all the models cross-validated for the different combinations of variables. The figure shows in the left-side diagonal the maximum R2 value among the evaluated time windows. The right-side diagonal depicts the time window (in months before the elections) that maximizes the R2 for each pair of variables. In Table 3 a summary of the best performing models is shown, including variables used, the value of coefficients and their statistical significance, and the performance of the model (R2 & RMSE), and the optimal time window. In this table, we did not include pairs of variables of the same group, i.e., \(I_{2}\) (posts and reactions_median) is the best performing model using posts and reactions, therefore the model of posts and reactions_mean is not included.

Figure 6
figure 6

Cross-validated linear regression models. (a) Ideology models, (b) Polarization models

Table 3 Best performing models

For polarization models, it is clear that \(P_{1}\) is by far the best model, as it provides a \(R^{2}\) of 0.8, this model uses posts and reactions_median to estimate this metric. In the case of ideology, both \(I_{1}\) and \(I_{2}\) provide a very similar result in both R2 and RMSE, selecting one or another should not impact the performance. Therefore, we selected the \(I_{1}\) model, as its RMSE is slightly lower.

3.4.3 Temporal analysis of the selected models

Even though the models had been cross-validated, it is necessary to explore how the performance of these models changes when the time window varies in size, as we would like our models to work in different window lengths. This analysis is relevant and necessary since our purpose is to depict a time window where our models are accurate in the case at some point we are not able to collect the optimum data period from Facebook pages.

Figure 7 shows the performance of these models for time windows between 7 to almost 150 days, using the cross-validation methodology of the previous section.

Figure 7
figure 7

Selected models with different time windows. (a) R2 (b) RMSE

On the one hand, for the case of the country’s ideology, the \(I_{1}\) model displays the best performance when the time window is between 15 and 75 days. Particularly, \(I_{1}\) provides the lowest RMSE (0.276) with a 0.5 months time window of data. Therefore this model shows a great estimation achievement in that range, with an RMSE lower than 0.31 and R2 greater than 0.72, making it also flexible in terms of data input. Outside of that range, \(I_{1}\) falls to lower accuracy, although the performance would be still acceptable.

On the other hand, for the case of the country’s Dalton polarization, \(P_{1}\) also performs better in time windows between 15 and 75 days, with an RMSE lower than 0.47 and R2 greater than 0.75.

Although the optimal time windows are quite short, we would like to clarify that models with wider time windows for both ideology and polarization also show similar performances, this suggests that the effects of electoral campaigns (such as political parties being more active or people engaging more with political posts) does not affect the performance of the methodology followed. If the models were only to work due to campaign effects, there would be visible a drop in performance in Fig. 7.

In summary, the analysis performed to get the best ideology and polarization estimations in a country by using Facebook data shows that models \(I_{1}\) and \(P_{1}\) behave great with a data time window between 15 and 75 days. These models use only 2 variables as input and their error is very small, the coefficients and the time windows selected are the ones that were shown in Table 3.

It is important to highlight that although we only used 2 variables, adding more features could improve the accuracy and diminish the error of estimations. However, we would need a higher sample size to provide meaningful insights in this term. To the best of the authors’ knowledge, this is the first study that estimates a country’s ideology and polarization with predictive models relying on Facebook data. In the following section, we explore the actual predicting ability of \(I_{1}\) and \(P_{1}\) in real Parliament Elections of 15 individual countries.

4 Results

To evaluate the performance of the selected models, i.e., \(I_{1}\) and \(P_{1}\), we compare their ideology and polarization predictions to the actual ballot results from 15 individual national parliamentary elections that took place in 2019 and 2020. Even more, to further validate our model, we also use four elections happening in 2016 to understand whether the performance of our model is also good even to predict ideology and polarization in the past. If the results are good for both recent elections and past elections, this would mean our models may be a good tool for social scientists, political scientists, etc. to study EU ideology and polarization over long time periods, which would allow monitoring both political variables on a continuous basis, to observe changes in tendencies even between two close elections.

4.1 Model performance evaluation using 15 recent individual country elections

For each of the elections happening in 2019 and 2020, we compute the Ideology and Polarization values as described in Sect. 3.1, using the Vote Share of each party with its left-right value as reported by ParlGov. Then, we apply models \(I_{1}\) and \(P_{1}\) to predict the ideology and polarization of the country running the election, and measure the error of our prediction.

Table 4 and Table 5 show the date of each election and the predicted and real values for country ideology and Dalton polarization together with the error of the predictions. The tables also show the baseline model performance, which uses the previous known ideology and polarization point, i.e., using the results of the previous European election, to test whether the past information is an accurate indicator of either metric. Finally, the last row of the table assesses the model accuracy using the RMSE across the 15 elections under analysis. Overall the results are surprisingly good with an average RMSE of 0.28 for ideology and 0.27 for polarization. That means, our models are properly capturing the global ideology and polarization of a country according to the actual ballot results in most cases.

Table 4 Ideology Prediction results in 15 individual country elections
Table 5 Polarization Prediction results in 15 individual country elections

If we look at the results for political ideology, we observe that our models are quite accurate in most of the cases since the error for \(2/5\) and \(3/5\) of the countries is lower than 0.15 and 0.3, respectively. The higher error happens in Estonia. However, we believe the error is still acceptable since it is slightly deviating to the right from the actual ideology shown by the electoral results. Even though is worth mentioning that the baseline obtains slightly better results, meaning that the ideology of a country stays rather static between two consecutive elections.

We observe similar results for the case of polarization where \(7/15\) countries show an error lower than 0.15 and \(11/15\) lower than 0.3. The worst prediction happens in the UK where our model overestimates the polarization with an error of 0.658. However, the results of the \(P_{1}\) are highly better than the baseline model. In this case, the baseline model is not enough to predict the polarization changes.

Overall, the obtained results demonstrate that our models can be used as accurate proxies to monitor political ideology and polarization in Europe. This means that the information embedded in the activity of political parties on their FB pages can be exploited to produce real-time monitoring of the ideology and polarization evolution in Europe.

4.2 Performance results in prior elections

We have demonstrated that our model is a good proxy to predict political ideology and polarization in a 2 years period. However, we believe our models would be a much more useful tool to provide ideology and polarization metrics if we could prove they are also efficient to predict both metrics prior to the period analyzed. This means social scientists, political scientists, etc. could use the ideology and polarization metrics resulting from our model over long periods of time for their studies.

To carry out this analysis we had to extend our data collection beyond the initial planned period, i.e., January 1, 2019, to December 31, 2021. We selected four countries running elections in 2016: Ireland, Lithuania, Slovakia, and Spain; and one from 2017: Germany and collected the information from the FB pages of their political parties a few months before the elections. We then used \(I_{1}\) and \(P_{1}\) and data prior (0.5 and 0.75 months of prior data) to each election to predict the ideology and polarization in each of the 5 compared countries, and compare the model predictions to the actual ideology and polarization extracted from the ballot results. Table 6 shows country ideology and Dalton polarization resulting from our model predictions and the election results along with the associated error. At the bottom of the table, we show the RMSE for the four elections under analysis.

Table 6 Prediction results in 5 country elections from 2016 and 2017

We observe that the performance of our political ideology model (RMSE = 0.253) obtains slightly better results in these five elections prior (2016, 2017) to the EU parliament elections and the 15 elections run between 2019 and 2020. In the case of the Dalton polarization our model shows a slightly worse performance when considering the 5 elections RMSE = 0.454). Still, we believe it is a rather good estimator to measure the polarization of EU countries in the past if we consider that the Dalton polarization ranges between 0 and 10.

5 EU Political Barometer website

To complete our research work we have built an interactive website, referred to as EU Political Barometer, that allows other researchers, stakeholders, and any other interested person, to easily get access to our research results. Our website displays information about the activity and engagement per political party and country over time as well as the evolution of the political ideology and polarization of the EU + UK countries based on the predictions extracted from our models.

Our website displays the information from Jan. 1st, 2019 until the present day since we keep collecting the public posts from the political parties’ FB pages continuously. That means, our website will show up-to-date information about activity, engagement, ideology, and polarization as well as historical information from Jan. 2019.

We believe our EU Political Barometer represents a valuable contribution to the EU landscape since to the best of our knowledge is the first effort of this nature that aggregates together relevant insights of political activity across the 27 EU member states plus the UK. Even more, we aim to contribute to the open science by making openly downloadable the estimations of our model for ideology and polarization for each of the 28 countries considered in a standard format such as CSV files. This way we aim that anyone willing to use our predictions of ideology and polarization does not need to replicate our whole methodology and especially avoids collecting all the required data. We believe this represents a positive and relevant contribution to the open science effort. The EU Political Barometer is publicly available at https://eupoliticalbarometer.uc3m.es.

Next, we briefly describe a few dashboards available on our website that directly relates to the content of this paper. It is important to note that the website includes other elements beyond what we cover in this paper.

  • Ideology & Polarization: In this dashboard, the user can go through several panels covering the ideology and polarization of a country. The models resulting from our research allow us to estimate the ideology and polarization metrics of a country at any point in time. In addition, the flexibility of our solution allows using different time resolutions to visualize the results. Figure 8 shows an example of the Ideology & Polarization panel, where we observe the evolution of Italy’s polarization across different months. The figure includes two lines, one for the predicted values, and the other for the EWMA (Exponentially Weighted Moving Average), as is necessary to consider the small sizes of the time windows of the selected models. This dashboard includes another very interesting panel (Fig. 9) where we plot in a two-dimension plane, where the x-axis refers to ideology and the y-axis to polarization, the results for each country denoted using their flag. Users can use a time bar to visualize the results at a specific time or can play a video that shows how the ideology and polarization of each country evolve over time. We also provide a panel that includes choropleths showing the ideology and polarization in maps of Europe (see Fig. 10).

    Figure 8
    figure 8

    Dalton Polarization in Italy

    Figure 9
    figure 9

    Political positions of European Countries

    Figure 10
    figure 10

    Political Ideology of European Countries

  • Dashboard per country/party: We have a specific dashboard that allows the user to select any country or party they are interested in. For the selected country/party the dashboard displays multiple panels depicting (among other things): ideology, polarization, activity, and engagement over time. The user can zoom in to look into detail at a specific time window. In the case of countries, the figures represent the aggregated results across all political parties. As an illustrative example, Fig. 11 shows (for the case of Austria) one of the panels included in the website that represents the portion of posts contributed by each political party over time.

    Figure 11
    figure 11

    Percentage of pots published by each party in Austria

    In general, the feature of breaking down the data per country/party provides the user with much practical information about the political landscape of that particular country/party.

  • Comparison dashboard: We developed a specific dashboard for comparing countries and/or political parties. This is a very interesting functionality since it allows comparing in a single figure the difference (over time) among two or more countries/parties across different metrics. As an illustrative example, Fig. 12 shows the selection panel where we have choose three far-right parties in Europe: Rassemblement National (France), Vox (Spain), Alternative für Deutschland (Germany). In turn, Fig. 13 shows the website panel comparing the activity (posts per week) for those three political parties.

    Figure 12
    figure 12

    Selection of parties to compare

    Figure 13
    figure 13

    Post-activity of RN, Vox and AfD

  • Country stats dashboard: We also included a dashboard where the user can see the statistics and information about our dataset. It displays the number of posts, reactions, parties, followers, and other parameters related to the dataset of a particular country, and also the aggregated values of all countries.

To sum up, this section has introduced another major outcome of our research, the EU Political Barometer. This website leverages the output from the methodology described in this paper to obtain the estimations of the political ideology and polarization, which represent one of the most relevant insights of the barometer.

6 Discussion

Our results suggest that the ideology of a country can be predicted through a simple model that combines activity and engagement from political parties’ pages. However, it is important to acknowledge that scientifically explaining the causal relationship embedded in any model using social media to predict political outcomes such as ideology, polarization, vote share, etc, is a very complex task that requires a complete research work that goes beyond the scope of this paper. Although we cannot provide a scientifically proven causal link between activity, engagement and ideology, we believe it is important to include a speculative discussion to describe the complex elements embedded in a social media platform such as Facebook and roughly explain how they may be linked to the observed correlation in our results.

6.1 Social media importance in political parties communication

There is a consensus that in the current society social media is an essential channel in political parties communication and more and more political actors use social media channels to directly convey their messages to the citizens [47, 48]. Among other things, social media allows political actors to communicate directly with citizens, circumventing the influence of traditional news outlets that could potentially shape the content of the messages they would like to project [49]. In addition, a recent report from Reuters on Digital News [50] indicates that global news interest has declined from 63% in 2017 to 51% in 2022. This decline has not been equally distributed across channels. The report indicates that social media news consumption is stagnant while traditional media news consumption is declining. Overall, we can safely state that social media platforms have become an essential part of citizens’ news consumption [47, 50].

Under the described scenario, it seems important for parties to be active in social media so they can convey their messages to the citizens as well as define strategies that allow them to increase as much as possible the engagement they attract through that activity. However, to the best of the authors knowledge no one has a master recipe to guarantee a successful strategy.

6.2 Strategy and newsfeed algorithms

The participation of political parties on social media has the final objective of impacting the vote share results in elections. Which could be accomplished with different electoral goals, such as mobilizing supporters or persuading non-supporters [51]. Any of these strategies have a common ground in order to be successful, posts have to reach as many users as possible in order to achieve an efficient outcome. In other words, political parties aim is that the posts they publish become viral posts and reach a large number of users. Parties can follow different strategies to try to have a high impact on social media. However, the virality of a post is a very complex phenomenon that depends on multiple factors: social media platform algorithms, users proactively sharing the post, etc. Although parties are responsible for some aspects to achieve such virality, there is a fundamental factor that is out of control of the political organizations such as the social media platform algorithms.

These algorithms are specialized to deliver content with the objective to retain the user in the platform as much time as possible. This implies that the selection of posts is personalized for each user, which is done by using existing data of the user that the site had already collected. This means that political parties’ posts are competing with all kinds of content: influencers, celebrities, sport stars, news outlets, companies, etc. To the best of our knowledge there is no political party able to reverse engineering the algorithms in order to exploit them in their favor. Therefore, a big part of the engagement success of a post is out of the control of political parties.

Social media algorithms, including Facebook ones, are opaque and very complex. They combine many different factors to decide what posts may be relevant for a given user. The referred factors are used to compute a score for every possible post that can be delivered to a specific user. The posts with the highest scores will be displayed in the user’s newsfeed. Facebook vaguely clarifies how their algorithm works and unveils some of the factors their algorithms take into account [52, 53]. Some relevant factors are: (i) how often the user interacts with the posts of the account, group, or page, (ii) the popularity / engagement of a post: a post is more likely to appear if it has a high number of comments, reactions, or shares, (iii) how the user interacts with the same type of post (there are different types, such as text, image, video, link posts), (iv) how recently the content was posted. Finally, according to the available information Facebook avoids showing similar items to a user in a row. This means it is unlikely that a user receives two consecutive posts in their newsfeed from the same account.

Although it is clear algorithms are out of the control of political parties, they still have some room in order to maximize the chances to impact as many users as possible with their posts. In particular, based on the factors driving Facebook algorithm, political parties are responsible for: (i) increasing their user base, (ii) manage their posting frequency (i.e., activity), (iii) creating interesting content to increase the interaction of the users with their posts (i.e., engagement). We next briefly discuss each of these three elements.

6.2.1 Fanbase

The fanbase of a political party is important to increase the impact of posts, as the number of followers is a proxy of the interest of users in a particular political party. Having more followers increases the chances of having your post delivered to more user feeds, which in turn will very likely improve your engagement. We validate this hypothesis using our dataset. Figure 14 represents the number of followers (x axis) Vs. the median number of reactions per post and the maximum number of reactions achieved by each party in their most viral post (y axis) on a logarithmic scale. The results show that the engagement of a political post is highly correlated to the number of followers of the political party on Facebook.

Figure 14
figure 14

Number of followers and reactions by each party. (a) Median reactions per post (b) Maximum amount of reactions in a post

6.2.2 Activity

The political actors have control over their posting activity on the platform, as they can decide how frequently they post content on social media. There are two factors considered by Facebook posts selection algorithm that are relevant to make a decision: (i) Facebook algorithm will very rarely deliver consecutive posts from the same account to a user, (ii) Facebook will privilege recent posts over old ones.

On the one hand, publishing a post every few minutes does not make sense, as the newsfeed system will not promote the majority of posts to the pages’ followers. Therefore, a very high posting activity will lower attention per post. This effect can be seen near elections, as parties are more likely to increase their post rate, and we have observed that even though the total engagement increases, the attention per post is much lower. In addition, this effect can also be observed in individual parties like in the Lega per Salvini Premier in Italy, which is the most active party in Europe (and the most followed Italian party). However, their average engagement per post is lower than some other parties with a much smaller fanbase in Italy. They are the third and fifth party in terms of mean comments per post and median reactions per post respectively out of the seven Italian parties. Said that, their total engagement is much higher than the other parties.

On the other hand, having a very low posting rate (e.g., one post per week) is also inefficient since Facebook newsfeed algorithm privileges recent posts.

In a nutshell, political parties should apply a reasonable tradeoff and avoid very high or very low posting patterns.

6.2.3 Content of interest

Another aspect which is very relevant to the success of a post is the content itself. The more interesting the content for the users the more likely it is to obtain a large engagement. If users start liking, sharing, commenting, etc. on a post the algorithms will understand it is a relevant post and will very likely start delivering it more frequently in other users’ newsfeed. In other words, posts including content of high interests are much more likely to generate more engagement.

In a nutshell, while political parties can implement actions in order to improve the likelihood to increase engagement, they are still subject to the social media algorithms that ultimately decide which posts are delivered to which users.

6.3 Filter bubbles

As we have discussed before, posting a high amount of content is not beneficial to the average amount of engagement, even though it is possible that the feed system can produce filter bubbles, which is a term that denotes the limitation of information that a user receives on a certain topic [54]. A user can fall in a political information bubble if their friends share a big proportion of posts from a particular side of the political spectrum, even with low engagement these posts can still be seen in the feed and have an impact on the user. This exposure to this type of content, which in some cases may include news, could then affect the user’s vote choice in elections towards that side [55]. On a related note, posts with news articles shared by friends are seen as more trustworthy than if the same article was received by the news outlet [56], reinforcing the idea of the formation of a political bubble or echo chamber.

6.4 Limitations

Performing a comprehensive analysis of the impact of posts would require knowing how many times a given post has been displayed in users’ feeds. Borrowing the online advertising concept, the number of impressions of a post. The literature of online advertising has demonstrated that the more times you impact a user the more likely it is to influence them [57]. If we extrapolate that to political communication it would mean that showing your posts frequently to a given user may increase the possibility of influencing them. Therefore, a complete picture of the success of political parties posts in Facebook would require knowing the number of impressions. Unfortunately, this is something that only Facebook can do. An external researcher cannot obtain that information because it would require monitoring in real-time the posts displayed in all the newsfeeds.

Another metric that could also help to obtain a better understanding on this topic is the number of unique accounts interacting with parties’ posts, i.e., whether the engagement of the posts of a party always come from the same set of users or it varies depending on the content. If the posts of a party are always shared by the same people, their impact may be much lower that it seems.

In this paper, we have used instead the concept of engagement understood as the number of interactions of users with the political parties posts, which is something we can collect from Facebook. Therefore, our model uses the engagement as a proxy of posts success, which we believe is something reasonable but not complete.

6.5 Summary

Our model suggests that high activity and high average engagement per post (i.e., mean comments for ideology and median reactions for polarization) leads to a higher vote rate that ultimately implies higher influence of that party in the ideology or polarization of a given country. Someone could then assume that based on our model if a political party suddenly increases their activity on Facebook, it would imply that in the short/medium term it will obtain better electoral results and would have more ideological influence in the country. However, it is important to clarify that as our results show that a higher activity correlates with a lower engagement per post. This means that increasing the activity negatively affects the engagement variable included in our models.

Therefore, what our model actually suggests is that the success of political parties requires a trade-off between activity and engagement per post. While political parties can manage the former part of the equation they cannot control the latter part, which very much depends on the social media algorithms as discussed before.

In summary, the lesson we extract from our results is that: (i) it is important that political parties are active in social media up to some extent. It is better publishing four posts per day than one post per month, in order to increase your engagement, but this is not enough; (ii) political parties need to create interesting posts that capture the attention of the users to increase the interaction with them and maximize the likelihood that social media algorithm understand this is a relevant post and display it more frequently in feeds; (iii) finally, in this same line, it is important that political parties implement actions to increase the number of followers, page likes, etc. which again may help to increase the engagement.

7 Related work

There is a vast literature covering political ideology and polarization. We will organize the related work into subsections to better describe the different works relevant to our paper.

7.1 Traditional methods

The majority of studies about political ideology and polarization rely on the traditional methodology of conducting surveys and evaluating the responses of interviewees. As stated in Sect. 1, researchers have used this methodology to analyze different socioeconomic parameters associated to political ideology and polarization [16, 810, 12, 15, 17, 19, 20].

Researchers have also tried to infer the political ideology of users and locations from surveys and polls. For instance, Wright et al. [58] collect several polls run by CBS News and the New York Times to create a base of 76k respondents to measure the ideology of US states. Even in that research, the authors acknowledge the need for massive data collection to achieve better results. For this reason, they propose to merge several polls’ information. More recently, Chirumbolo et al. [59] tried to estimate the ideology of 517 individuals who answered questions about their personality, ideology, and voting information. They then associate personality with different kinds of political ideology.

Although surveys are a very powerful tool to perform ideology/polarization analysis in well-defined and bounded human groups (e.g., country, region, etc.), they have inherent limitations due to the complexity, time-consumption, and cost of widely expanding them when for instance someone wants to cover many different countries. That is why there are only a limited number of efforts such as the World Value Survey [23] covering a large number of countries based on surveys. Our work contributes a methodology that infers political ideology and polarization at a very low-cost, low complexity, and high time granularity across many countries, which has been proven accurate.

7.2 Social media

For a long time conducting surveys was the only way to measure the opinions and behaviors of users. Social media has become a powerful academic research tool. It avoids some of the limitations of traditional research: manual data entry, high costs, and time-consuming [60]. Extracting information from social media could also provide a broader spectrum of data that is not feasible to achieve with conventional surveys. Therefore, social media data have been used for multiple purposes related to our work.

7.2.1 Electoral results and popularity

We can find many works in the literature using social media data to predict electoral results. Most of these works rely on Twitter data. The work of Tumasjan et al. [61] is the first study to use Twitter to predict electoral results. They computed the number of tweets mentioning each party to estimate the 2009 German federal results. However, this research attracted criticism [6264] upon issues such as the data collection procedure or the time window selected for their analysis. Another study used sentiment analysis [65] to show how tweets correlate to traditional political polls. Zohu et al. [66] used a dataset of 110M tweets from 6.3M users and build several AI models to predict the outcome of the 2019 Argentine elections. Aurav et al. [67] and Bermingham et al. [68] also use the volume of tweets mentioning a candidate before elections to estimate the outcome. Besides, DiGrazia et al. [69] found a high correlation between the number of tweets referencing a politician and their election results.

Researchers have also used Twitter to perform a sentiment analysis on political questions. There are few works [7072] that have rely on sentiment analysis to estimate the number of seats in the 2015 Delhi, 2015 UK, and 2017 Punjab elections, respectively. Similarly, the sentiment analysis on Twitter have been used to predict the results for the 2012 US [73], 2016 US [74], 2018 Chilean [75], 2019 Indonesian [76], and 2020 US [77] presidential elections.

Beyond the prediction of electoral results, there are works [78] where the authors try to understand the popularity of political candidates in the Italian 2011 and French 2012 elections.

Although most of the works around electoral results are based on Twitter we can find few efforts using other platforms. For instance, in [79, 80] the authors used the volume of online searches using Google Trends to predict the outcome of US elections.

On Facebook, the relationship between the variables chosen in this article (i.e., posting activity, post engagement, and popularity) and the vote share of a party is a topic that has been studied in the past, e.g., Triantafillidou and Lappas [26] studied the 2019 general Greek elections in which the votes received was significantly related to different engagement measures (reactions, shares, and comments). Other studies had studied whether the followers of Facebook pages are related to the vote share obtained in elections, Cameron et al. [27] studied this phenomenon and they found a significant relationship between these variables, even though the effect seems to be small. The posting activity of politicians and its effect on vote share was studied by Hsin-Chen Lin [32], even though it seems that the number of posts in politician accounts is not significantly related to the vote share.

Our work differs from previous ones in two facts. First, we differ from them on the goal since we do not aim to predict electoral results but to obtain the ideology and polarization of a country. Second, all the introduced works focus on a single country whereas we are more ambitious and try to create a model valid across many countries.

7.2.2 Political ideology and polarization

More related to our work, social media has been also used to study the ideology of individuals and political messages. We can find works using both Twitter and Facebook data for this purpose

Barberá et al. [81] used Twitter data to analyze the communication and engagement of users to individuals with similar ideologies. Preoţiuc-Pietro et al. [82] relied on Twitter data to examine whether a user could be classified in a left-right scale by their posts published on Twitter.

Using Facebook data, Cecobelli [38] analyses the difference in the messages of political leaders from 18 countries. The results show that they tend to increase their post frequency and post content in the electoral campaign period. This is aligned with our activity analysis results presented in Sect. 2.2.1.

Ernst et al. [83] use data from Facebook and Twitter (845 Facebook posts and 555 tweets) from 88 politicians in 8 different countries. They found that populist ideas tend to compose the political strategy of politicians in the extremes of the left-right spectrum, more often than centrist politicians deliver that kind of message.

Bond et al. [84] collected data from 6.2M individuals who liked two or more verified Facebook pages from politicians. They build a model measuring the ratio of likes and fans on politicians’ pages and estimate the ideology. They provide a score of each politician’s page and estimate users’ ideology by considering an average of this score. Later, they validate the model against two sources: 20k+ users who disclosed their political ideology on their Facebook profile, and a survey conducted by over 20k+ users of their dataset, asking them to classify themselves in 5 ideological points. Finally, among other questions they explore, they find that women tend to be more liberal, and married couples correlate best in ideology.

Finally, Praet et al. use Facebook “likes” on pages as a tool to evaluate the political ideology of individuals, as likes on non-political Facebook pages could lead to identifying the ideology of a person [85], and also to study whether political polarization extends to lifestyle preferences [86].

Overall, our work contributes to the literature by proposing a new methodology to measure political ideology and polarization using Facebook data. In contrast to some of the works introduced above, we are not interested in measuring the ideology and polarization of an individual but in a country. Our approach allows the analysis of ideology and polarization indicators in a fine-grain time-continuous manner since we benefit from using information publicly available to anyone in any country. Moreover, our model focuses on a broad list of 28 countries (EU + UK), in contrast with previous works which focus on one or a reduced number of countries. We provide an easy way of doing cross-national analysis since the data necessary to estimate these measures is language-independent and only relies on posts’ indicators. The outcome models, both for ideology and polarization, work together in any of the 28 countries at any point in time. To the best of our knowledge, this is the broadest modeling of political ideology and polarization in terms of countries using social media data. As discussed in Sect. 4 the estimations are accurate to give an insight into the political situation and the evolution of each country.

8 Conclusions

This paper leverages the public posts political parties publish on their main Facebook pages to estimate the political ideology and polarization across 28 countries, the 27 EU member states, and the UK. To this end, we have created a model for ideology and a model for polarization based on the activity of the political parties, the engagement they achieve with their posts, and the popularity of their Facebook pages. We have used the EU parliament elections held in May 2019, where all the 28 considered countries were involved, to build our model. We validated the accuracy of our model using 19 elections, 15 happening in the years 2019 and 2020, and the remaining 4 in 2016. The results show a very good performance of the proposed models leading to rather small errors compared to the ground truth derived from the actual ballot results. To the best of our knowledge, there is no prior work in the literature offering a solution that measures the ideology of all EU countries + UK countries.

Our methodology provides a few advantages over traditional methods meeting the requirements pursued in the design of this research: (i) it is a low-cost option. (ii) It allows measuring ideology/polarization at a high-resolution time scale. (iii) It works across 28 countries. Using traditional methods such as surveys, interviews, or polls to measure the ideology/polarization across 28 countries at a high time resolution (e.g., every week) is an unfeasible option due to its complexity and cost.

Therefore, our work delivers a novel tool that uses social media information to produce specific country metrics that can be used for further purposes. For instance, political ideology and polarization can be used by political scientists, social scientists, economic researchers, etc. as another metric in their studies to correlate ideology/polarization to other socio-economic metrics. In addition, our models have the advantage that they can estimate the ideology and polarization of a country as long as we can get access to roughly one month of public posts from the political parties’ FB pages within that country. This brings two important benefits: (i) we can monitor the evolution of the ideology/polarization at high time resolution (e.g., one week), which allows understanding tendencies in the ideology/polarization over pre-defined periods and may allow inferring whether specific events in a country are impacting on any of the two variables; (ii) since the public posts in FB pages of political parties are typically available from several years ago, our model allows obtaining estimations of the political ideology/polarization in the past. In many cases, researchers are interested in studying the impact of specific past events, and our methodology may be useful in this case as well.

Although our methodology is superior to traditional surveys for some features, we want to acknowledge upfront that traditional methods to measure ideology and polarization are still very relevant. Surveys are very useful to perform deep analysis over specific aspects related to ideology/polarization compared to our methodology. In summary, our research does not aim to invalidate traditional methods, but to coexist with them, and offer a new tool to the research community that may be useful for specific analyses that cannot be covered by surveys.

Moreover, we wanted to make the results of our work accessible and available to the research community, relevant stakeholders in the EU, and general individuals who may be interested in politics. To this end, we have built an interactive website referred to as EU Political Barometer. The site can be found at the following address: https://eupoliticalbarometer.uc3m.es. This website includes very detailed information about the activity and engagement of the 234 political parties in our dataset as well as the aggregated information per country from Jan 1, 2019, until the present. In addition, we also include the temporal evolution of our model estimations for the political ideology and polarization per country in the same period. Users accessing our website can easily select the country or political party of their interest and play with the temporal resolution they want to visualize. Therefore, one of the main advantages of this site is the possibility of displaying up-to-date historical data with a very small resolution span of one week. The results associated with multiple political parties or multiple countries can be visualized together to easily visualize differences among them. The website offers more information and functionalities that go beyond the scope of this paper. Our ultimate goal is to contribute to the open science initiative and allow other researchers to use our ideology and polarization estimations for their work.