The analysis we present is based on the responses given by the 4.2 million users of the StemWijzer edition for the 2010 Dutch parliamentary election. The Netherlands is a suitable case for analysing the method of calculating matches because of the widespread use of VAAs and its multi-party system. With only two parties the method is likely to have limited effect, but with a multitude of parties the effects are presumably larger. Moreover, countries with a multi-party system have traditionally been frontrunners in VAA use, precisely because voters need to weigh so many alternatives. It is therefore particularly important to examine design effects in such cases. StemWijzer is the most widely used VAA in the Netherlands and thus provides a rich data set to explore this issue.
The 30 items included in the 2010 edition of StemWijzer are listed in Appendix A. The log files of the online application contain information on the position taken on each statement and the extra weight allocated to each statement by users, as well as the party positions. This allowed us to compare the results of different methods of calculating the match between voters and parties.
Answer profiles that contained only missing answers (‘skip this question’) were excluded. In addition, we excluded about 30 000 recommendations that we suspect were computer-generated: in three cases a single IP address requested advice thousands of times, giving exactly the same answers to the statements. Next, we took a random sample of 10 000 cases for further analysis, because analysing the full data set would be too memory-intensive. Although sampling introduces some error, a sample of this size yields results that are almost certainly very close to those for the full data set. Most users in our sample (90 per cent) answered all 30 statements and very few skipped more than five questions (1 per cent). A majority (77 per cent) made use of the opportunity to select statements that they considered particularly important (mean=5.1; standard deviation=4.7).
The 2010 edition of StemWijzer by default included the 11 political parties that were represented in the Second Chamber of the Dutch parliament at that time. Voters had the opportunity to deselect any of these parties, while they could also select any of six additional parties for which the application included data. The log files do not contain information on which parties were (de)selected by each user, but the low number of recommendations for parties that were not included by default (2 per cent) suggests that few users made use of this possibility. Because we are interested in the effect of VAA aggregation procedures on the voting advice for users, we opted to include the default selection of parties in our analysis. This presumably best reflects the advice that the majority of users received.
Alternative methods of calculating voting advice
We implemented eight methods of calculating voting advice on the basis of the spatial models and metrics discussed in the preceding paragraphs: (a) high-dimensional agreement method (the method that StemWijzer used in its 2010 edition), (b) high-dimensional city block distance method, (c) high-dimensional Euclidean distance method, (d) one-dimensional model, (e) two-dimensional model, (f) three-dimensional model induced from parties’ answers to the statements, (g) three-dimensional model induced from users’ answers to the statements, and (h) seven-dimensional ‘spider’ model. The implementation of the first three methods is relatively simple (see Appendix B). One calculates the agreement or distance between the answers of a particular user and those of each party, ignoring missing answers on the part of the user. Extra weights allocated by users increase the agreement or distance between user and party. For the agreement method this results in an agreement score, which is highest for the ‘best match’. The city block and Euclidean distance methods result in (weighted) distance scores, which are lowest for the ‘best match’. We included the extra weights voters could put on statements in these models (as this has been standard practice in StemWijzer and other VAAs), but our findings would have been similar had these weights not been taken into account.
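To make the three high-dimensional methods concrete, a minimal sketch in Python (an illustration only: the exact scoring rules are given in Appendix B, and the doubling of user-weighted statements assumed here is a simplification):

```python
import numpy as np

def match_score(user, party, weights, method="agreement"):
    """Match between one user and one party on the statements.

    Answers are coded -1 (disagree), 0 (neutral), 1 (agree); np.nan marks
    a skipped statement and is ignored. `weights` holds 1 for a normal
    statement and 2 where the user allocated extra weight (an illustrative
    assumption; the exact weighting scheme is described in Appendix B).
    """
    user = np.asarray(user, dtype=float)
    party = np.asarray(party, dtype=float)
    w = np.asarray(weights, dtype=float)
    answered = ~np.isnan(user)                    # ignore missing user answers
    u, p, w = user[answered], party[answered], w[answered]
    if method == "agreement":                     # higher score = better match
        return float(np.sum(w * (u == p)))
    if method == "cityblock":                     # lower score = better match
        return float(np.sum(w * np.abs(u - p)))
    if method == "euclidean":                     # lower score = better match
        return float(np.sqrt(np.sum(w * (u - p) ** 2)))
    raise ValueError(f"unknown method: {method}")
```

The ‘best match’ is then the party with the highest agreement score, or the lowest city block or Euclidean distance.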
The one-, two-, and multidimensional models require a method of combining items into issue dimensions. For the one-dimensional model this is a matter of determining the direction of each statement: does agreeing imply a left-wing/progressive position or a right-wing/conservative position? We determined this on a priori grounds and checked the homogeneity of the resulting policy dimension using Loevinger’s H. For the answers of the political parties, this resulted in a scale with H=0.37, which is low but acceptable. For the users, the homogeneity coefficient is very low (H=0.07), which indicates that a one-dimensional approach is insufficient for voters. Nonetheless, we include this method to demonstrate what adopting it would mean. Moreover, some scholars have argued that a one-dimensional left-right model is suitable in the context of electoral choice (Downs, 1957; Van der Eijk and Niemöller, 1983).
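For reference, Loevinger’s H compares the observed inter-item covariances with the maximum covariances attainable given the item marginals; a perfect Guttman scale yields H=1. A sketch for the dichotomous case (a simplification: our statements allow a neutral answer, and the analysis above uses the polytomous coefficient):

```python
import numpy as np

def loevinger_h(X):
    """Loevinger's H for a set of dichotomous (0/1) items.

    H is the sum of observed inter-item covariances divided by the sum of
    the maximum covariances attainable given the item marginals. A sketch
    for the dichotomous case only; the analysis in the text uses the
    polytomous generalisation.
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)                          # item popularities
    cov_sum = covmax_sum = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            cov_sum += np.cov(X[:, i], X[:, j], bias=True)[0, 1]
            lo, hi = min(p[i], p[j]), max(p[i], p[j])
            covmax_sum += lo * (1.0 - hi)       # max covariance given marginals
    return float(cov_sum / covmax_sum)
```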
The two-dimensional model closely follows the method of Kieskompas. The model consists of a socio-economic left-right dimension and a progressive-conservative (GAL/TAN) dimension (Marks et al, 2006) and each statement has been assigned to either dimension (see Appendix A). The resulting scales are not very strong with H coefficients of 0.20 and 0.40 respectively for the parties’ answers, and very weak H coefficients of 0.06 and 0.07 for the user data. Similar results concerning the strength of the Kieskompas model have been found by Otjes and Louwerse (2011, pp. 10–13), who analysed the items in this VAA.
The two-dimensional model has been constructed by selecting the relevant policy dimensions a priori. However, one could argue that relevant political issues should be selected first and that the appropriate spatial model should be induced from the patterns of (party or voter) answers given to these statements (Otjes and Louwerse, 2011). We may thus find inductively that parties’ answers to the statements can be captured well by a one-dimensional or two-dimensional spatial model. This type of model was fitted using classical multidimensional scaling (MDS). This method uses a Euclidean distance measure between actors (parties or users), based on their answers to the VAA statements, and tries to find a low-dimensional approximation of those distances. We applied this method in two ways: once on the basis of party positions and once on the basis of voter positions. The degree to which a low-dimensional model accurately represents the distances between parties is measured by Kruskal’s Stress-I statistic; stress levels below 10 per cent are considered acceptable. For the data set of party responses to the statements a three-dimensional solution was found to be acceptable (Stress=4.37). The next step was to connect each statement to one of the three dimensions, which we did by regressing parties’ answers to each statement on these dimensions, a technique called property fitting. We included an item in the dimension that yielded the highest beta coefficient for that particular item, provided that the R2 was larger than 0.30. In this way 29 out of the 30 items were included in one of the scales (see Appendix A). We use additive scales to construct the model, because this reflects most closely how other VAAs, such as Bússola Eleitoral and Kieskompas, construct their spatial model. An additional advantage of this method is that it is more transparent for users than more sophisticated techniques such as factor analysis or MDS.
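The induction step can be sketched as follows: classical MDS recovers low-dimensional coordinates from a matrix of Euclidean distances between actors, and property fitting regresses each item on the recovered dimensions. A simplified illustration (the actual analysis was run with standard statistical software):

```python
import numpy as np

def classical_mds(D, ndim=3):
    """Classical (Torgerson) MDS: coordinates in `ndim` dimensions whose
    Euclidean distances approximate the distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:ndim]      # largest eigenvalues first
    return eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))

def property_fit(item, coords):
    """Regress one item's answers on the MDS dimensions; returns the
    standardised coefficients and R-squared used to assign the item."""
    X = np.column_stack([np.ones(len(item)), coords])
    b, *_ = np.linalg.lstsq(X, item, rcond=None)
    resid = item - X @ b
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((item - item.mean()) ** 2)
    betas = b[1:] * coords.std(axis=0) / item.std()
    return betas, float(r2)
```

An item is then assigned to the dimension with the largest absolute standardised coefficient, provided the R-squared exceeds the threshold.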
The resulting scales had high H values of 0.71, 0.75 and 0.54, respectively (based on parties’ answers). However, when applied to the users’ answers these scales are not very strong (H=0.09, 0.06 and 0.15).
In an alternative specification users’ answers to the statements were used in an MDS analysis. Thus, instead of a ‘party space’, a ‘voter space’ was constructed. Answer patterns of users proved to be more erratic than those of parties: a three-dimensional solution had a stress level of 30 per cent, and including more dimensions only reduced this level very gradually, to 13 per cent for a 10-dimensional model. For reasons of clarity and comparability, we decided to stick to a three-dimensional model in this case as well, despite the poor fit. After all, the logic behind a low-dimensional model of party positions is to provide users with insight into the different policy stances of parties – presenting a 10-dimensional model would defeat this objective. Three dimensions have also been induced by Aarts and Thomassen (2008) based on their analysis of voters’ evaluation scores of political parties. The H coefficients for the three dimensions were 0.29, 0.08 and 0.13, respectively, which is (somewhat) better than the H values (for voters) obtained from the party space, as one might expect. Still, the scalability of these items is low. To determine the resulting voting advice from the multidimensional models, the distances between users and parties were calculated as (unweighted) Euclidean distances.
The last method, a seven-dimensional model that reflects the spider diagram, has been implemented by assigning issues to one of the seven categories that were used in the EU Profiler’s spider diagram (see Appendix A; Trechsel and Mair, 2011, Figure 6). The assignment was based on a priori grounds and checked using the homogeneity coefficient H. For the answers provided by the parties, the H values for each of the seven issue dimensions were over 0.3, except for welfare state politics. Although the coefficient for this category could be improved by changing the direction of some of the items, this would run contrary to the substantive meaning of the category. The a priori approach fits most closely to how smartvote and EU Profiler construct their spider models (note that smartvote uses eight categories, but the principle is the same). Users’ and parties’ answers to the statements were recoded and summed, so that a score of 100 on a particular axis indicates complete agreement and 0 indicates complete disagreement. The total distance between voters and parties was calculated as the city block distance, which most closely matches the way a spider diagram represents the voting advice.
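The recoding behind the spider model can be illustrated as follows (a sketch under the assumption that each axis score is the direction-corrected mean of a category’s answers, rescaled to run from 0 to 100):

```python
import numpy as np

def axis_score(answers):
    """Rescale a category's direction-corrected answers, coded -1 (disagree),
    0 (neutral), 1 (agree), to a 0-100 axis score: 100 indicates complete
    agreement with the category, 0 complete disagreement."""
    a = np.asarray(answers, dtype=float)
    return float(100 * (a.mean() + 1) / 2)

def spider_distance(user_axes, party_axes):
    """City block distance between two spider profiles (lower = closer)."""
    u, p = np.asarray(user_axes, float), np.asarray(party_axes, float)
    return float(np.sum(np.abs(u - p)))
```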
Three measures to compare the models
We use three different measures to compare the advice stemming from the alternative models. The first measure focuses on the party that provides the best match and indicates how often two given methods provided the same ‘best match’. If an individual received the advice ‘Freedom Party’ (PVV) using the city block metric as well as the Euclidean metric, this constitutes a full match. There are also cases where the ‘best match’ was a tie between two or more parties. When at least one party was among the best matches in both methods, this was regarded as a partial match. All other cases were treated as ‘no match’. Although a benchmark for this measure is somewhat arbitrary, one could argue that we should observe at least two matches for every mismatch, which would correspond to 67 per cent (full) matches.
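This classification can be summarised in a few lines, representing each method’s ‘best match’ as the set of parties tied for the top position (treating a single identical recommendation as a full match and any other overlap as a partial match, our interpretation of the rule above):

```python
def match_type(best_a, best_b):
    """Compare the sets of tied 'best match' parties from two methods.

    A single identical recommendation counts as a full match; any other
    overlap between the sets as a partial match; disjoint sets as no match.
    """
    best_a, best_b = set(best_a), set(best_b)
    if best_a == best_b and len(best_a) == 1:
        return "full"
    if best_a & best_b:
        return "partial"
    return "none"
```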
The second measure focuses on the degree of match between the user and each individual party, thus looking more broadly than at which party provided the best match. To capture the similarity of the advice, we calculated a correlation coefficient between the match scores of two methods for each individual party. For example, if there is a perfect linear relationship between the Labour Party scores according to the agreement method and the city block method, the correlation coefficient equals one. To estimate the overall similarity between two methods we take the mean of these correlations across the 11 parties. These average correlation coefficients should be rather high, given that the various methods are all based on the same data and aim at the same outcome. Correlation coefficients of 0.7 or higher, which corresponds to roughly 50 per cent explained variance or more, should be achievable.
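The second measure thus amounts to averaging eleven per-party Pearson correlations; a minimal sketch:

```python
import numpy as np

def mean_party_correlation(scores_a, scores_b):
    """Mean Pearson correlation, across parties, between the match scores
    that two methods assign to the same users.

    scores_a, scores_b: arrays of shape (n_users, n_parties), one column
    of match scores per party.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    rs = [np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(a.shape[1])]
    return float(np.mean(rs))
```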
The third measure concerns the number of times that each party was recommended at the aggregate level (where tied recommendations are divided between the parties concerned). Some methods may divide the recommendations more evenly across parties, while other methods may favour specific parties. Furthermore, it is possible that specific parties ‘benefit’ from a particular method. In contrast to other studies (Kleinnijenhuis and Krouwel, 2008; Walgrave et al, 2009) we do not take the election result as the ‘gold standard’ for comparison. After all, the aim of a VAA is not to predict or mimic the election result, but to inform voters about their substantive policy match with parties. Voters may well decide to vote on other grounds. Nevertheless, a comparison between the number of recommendations and the actual number of votes may be considered interesting, because it indicates to what extent the electorate supports parties that most closely represent their views on a wide range of policy issues. More importantly, if we find large differences between the proportion of recommendations for particular parties across the alternative methods, this provides clear evidence for our hypothesis that the method used to calculate the advice matters.
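The third measure, finally, reduces to counting recommendations per party while splitting ties equally; a sketch (the party labels in the example are placeholders):

```python
from collections import Counter

def recommendation_shares(best_matches, parties):
    """Number of recommendations per party at the aggregate level, with
    tied recommendations divided equally between the parties concerned.

    best_matches: one set of tied best-match parties per user.
    """
    counts = Counter({p: 0.0 for p in parties})
    for best in best_matches:
        for p in best:
            counts[p] += 1.0 / len(best)
    return counts
```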