Introduction

Political efficacy is defined as “an individual’s perceived ability to participate in and influence the political system” (Yeich & Levine, 1994, p. 259). It is usually conceptualized as having two interrelated but distinct dimensions (Balch, 1974): external political efficacy refers to the belief that the political system is responsive to citizens’ demands (Balch, 1974, p. 24), whereas internal political efficacy refers to “an individual’s perception of her/his abilities to execute political actions […]” (Sohl, 2014, p. 42). Internal political efficacy has been argued to be an important psychological predictor of political behavior (Bandura, 1997; Campbell, Gurin, & Miller, 1954, p. 187). Empirical research supports this prediction with regard to different forms of political behavior, such as participation in elections (Gallego & Oberski, 2012, p. 437), political protest (Chang & Chyi, 2009), and other forms of political action (e.g., Krampen, 1990; Vecchione & Caprara, 2009).

Assumptions about the psychological underpinnings of self-efficacy beliefs can be drawn from the social cognitive theory (SCT) by Bandura (1991). The theory assumes that people are capable of exercising control over their actions through self-reflection and self-regulation. Among the mechanisms involved in exercising such control, self-efficacy beliefs play a central role: They are judgements of personal capability regarding a specific desirable behavior (Bandura, 1997) and influence a person’s motivation, perseverance, and performance, as well as the consequences of that performance (Bandura, 1991).

The conceptual similarity between Bandura’s (1997) self-efficacy concept and the concept of internal political efficacy is evident. However, the two strands of literature have developed largely separately from each other. Only recently did Caprara, Vecchione, Capanna, and Mebane (2009) develop a self-report measure of perceived political self-efficacy (P-PSE). In contrast to previously existing measures (e.g., Campbell et al., 1954; Niemi, Craig, & Mattei, 1991), it was developed based on the self-efficacy concept of Bandura’s (1991, 1997) SCT framework and follows the main principles for constructing self-efficacy measures set out by Bandura (2006), such as the inclusion of different relevant and measurable tasks as benchmarks of successful behavior.

We aimed to extend this line of work by providing a German version of the P-PSE scale. Furthermore, we conducted empirical tests of the validity of the German translation of this scale. Most importantly, we tested whether this scale provides incremental validity over and above existing measures of internal political efficacy.

Theoretical background

In a review of definitions, Sohl (2014) illustrated how the exact meaning of the concept of internal efficacy differs between studies. She identified three components that are regularly used to define internal efficacy: (1) the perception “that one can exert influence (affect political outcomes)” (pp. 36–37), (2) the “perceived ability to […] execute political actions” (p. 37), and (3) a perception of “understanding politics/the political system” (p. 37). Definitions differ in which of these components they include. For example, Niemi et al. (1991) focus on beliefs about the “competence to understand and participate effectively” (p. 1407), but do not explicitly mention successful influence on outcomes as a necessary component of internal efficacy. In contrast, Balch (1974) focuses solely on the perceived availability of “means of influence” (p. 24), without mentioning specific abilities or understanding. We argue that a focus on the second component (i.e., internal efficacy as a perceived ability) is the most useful, because (a) it best separates internal efficacy from related concepts, such as external efficacy (e.g., Balch, 1974; Cohen, Vigoda, & Samorly, 2001), political awareness (Zaller, 1992), and political sophistication (e.g., Luskin, 1987); and (b) it matches the psychological concept of perceived self-efficacy from SCT (Bandura, 1991, 1997).

The SCT framework assumes that people take an agentic role in planning and executing their own behavior, and that self-efficacy beliefs (i.e., beliefs in one’s personal capability of mastering a desirable behavior) play a key role in initiating and executing specific actions (Bandura, 1991, 1997). For any domain of functioning, a person can judge his or her domain-specific self-efficacy. From a perceived self-efficacy perspective, internal political efficacy can thus be understood as the belief that one is capable of mastering the tasks necessary to participate successfully in the political process (cf. Bandura, 1997).

Traditional measures of internal efficacy

While the importance of the concept of internal political efficacy is undisputed, there has been controversy about how to measure it appropriately (for an overview, see Morrell, 2003, pp. 591–595). Most researchers use internal efficacy measures based on the National Election Study (NES) scale introduced by Niemi et al. (1991). The four items of the NES scale are “I consider myself to be well qualified to participate in politics”, “I feel that I have a pretty good understanding of the important political issues facing our country”, “I feel that I could do as good a job in public office as most other people”, and “I think that I am better informed about politics and government than most people” (Niemi et al., 1991, p. 1408). Although there is no exact translation of the NES scale for use in German surveys, two adaptations have been validated in Germany: a three-item scale by Vetter (1997) and a two-item scale by Beierlein, Kemper, Kovaleva, and Rammstedt (2014). Their items closely resemble those that were used to create the NES scale (cf. Craig, Niemi, & Silver, 1990). The correlational patterns with related variables (e.g., political interest, participation propensity, and sociodemographic variables) indicate that both scales (henceforth “Vetter scale” and “Beierlein scale”) measure the same construct as the English NES scale (cf. Arzheimer, 2005; Beierlein et al., 2014).

Caprara et al. (2009) identify two conceptual weaknesses in the traditional measures of internal efficacy that potentially limit their predictive validity. First, although self-efficacy is a psychological construct with an extensive body of psychological theory and research surrounding it (for an extensive review, see Bandura, 1997), the NES scale was not constructed on the basis of theoretical insights into the psychological nature of the concept. Second, the NES items address a relatively small set of politically relevant skills, focusing on understanding (rather than participating in) the political process (for a similar critique of the NES scale, see Bandura, 1997, pp. 483–484, and Caprara & Vecchione, 2017, p. 287).

The P-PSE scale as an alternative measure

In response to these limitations, Caprara et al. (2009) constructed the P-PSE scale based on the guidelines that Bandura (2006) created for the construction of domain-specific self-efficacy measures: The items should be formulated in terms of capability judgements, their content should represent all tasks relevant to what is considered successful behavior, and they should include varying levels of task difficulty. Applying these principles to the political domain, Caprara et al. (2009) proposed two main sets of skills necessary to participate successfully in a representative democracy: (1) voicing and effectively promoting one’s own political opinions and (2) exercising control over elected officials. From these skills, they derived ten items referring to concrete tasks of political participation; for each task, respondents rate their capability of mastering it (see Table 1). The scale’s variety of specific tasks contrasts with the NES scale, which relies on a narrower understanding of political competence that focuses on knowledge and comprehension of politics. The P-PSE scale can therefore be argued to cover the phenomenon of internal efficacy better than its predecessors and hence to offer a more content-valid measure of the concept (cf. Fontaine, 2005, p. 804).

Table 1 Item wordings of the original P-PSE scale

Although the scale was published in English, only an Italian version of the full P-PSE scale has been validated so far (see Table 5 in the Appendix for the Italian wording). Using several Italian samples, the authors confirmed the scale’s internal consistency, reliability, and criterion validity (Caprara et al., 2009; Vecchione et al., 2014). Most importantly, they provided evidence of incremental validity over the NES scale in predicting various types of political participation behavior.

In addition, Vecchione et al. (2014) proposed a short version of the P-PSE scale consisting of a subset of four items (3, 4, 8, and 10), which they argue adequately represent the content of the full scale. Their studies on Italian, Spanish, and Greek samples confirmed the validity of this short scale.

Translation and methodology

Since the original scale was published in English, we used the English version as the source instrument for the translation, thereby ensuring that future translations into other languages can be based on the same source instrument. After one German native speaker had translated the scale into German, we reviewed the translation in a three-person team. The team consisted of two Germans and one English native speaker, all fluent in the respective other language, who, as suggested by the Best Practice Guidelines for Cross-Cultural Surveys, combined different levels of disciplinary expertise (Survey Research Center, 2016, p. 245). Following the guidelines, we aimed to “keep the content of the questions semantically similar; keep the question format similar within the bounds of the target language; [and] retain measurement properties, including the range of response options offered” (Survey Research Center, 2016, pp. 233–234). To achieve these goals, we based our translation on an asking-the-same-questions-and-translation approach (ASQT; Survey Research Center, 2016, p. 234) and tried to stay as close to the content of the original items as possible. As in the original instrument, the introduction of the translated version asks about respondents’ perceived capability to execute different political activities. Following this instruction, each item expresses a capability judgment (“I feel capable to…”), which is rated on a five-point agreement scale. The translated items are displayed in Table 5 in the Appendix, together with the instruction and response categories.

Measures

In order to validate the translated scale, we conducted an online survey including the translated P-PSE items and several scales of related constructs. The P-PSE items were administered using a five-point Likert agreement scale with only the extreme categories labelled (“completely disagree” and “completely agree”).

For validation, we included the German internal efficacy scales by Beierlein et al. (2014; Spearman–Brown coefficient = 0.83) and Vetter (1997; McDonald’s ω = 0.78), a five-item political interest scale (Otto & Bacherle, 2011; McDonald’s ω = 0.94), a three-item external efficacy scale (Vetter, 1997; McDonald’s ω = 0.74), a three-item general self-efficacy scale (Beierlein, Kemper, Kovaleva, & Rammstedt, 2013; McDonald’s ω = 0.88), a left–right self-placement item (based on GESIS, 2015), and a list of eleven items asking about respondents’ political participation behavior (e.g., “During the last two years, how often did you actively participate in a political party or movement?”; McDonald’s ω = 0.83). All measures are provided in detail in the online supplementary materials.

Sample

The survey was conducted in October 2016 by the professional sampling agency Respondi (www.respondi.com). Sampling followed a quota plan that was representative of the German adult population regarding age, gender, and formal education. All respondents gave their informed consent before starting the survey and received a financial incentive for their participation from the sampling agency. After the exclusion of careless responders (cf. Meade & Craig, 2012; for documentation, see supplementary material) and listwise exclusion of cases with missing values (N = 64), a total of N = 1025 cases was used for analysis. The sample consisted of 51.7% females and 48.3% males. The mean age was 51.7 years (SD = 16.5). Of the participants, 36.7% reported low levels of formal education (‘Hauptschule’ or no degree at all), 30.5% reported medium levels (‘Realschule’), and 32.8% reported high levels (‘Abitur’ or ‘Fachabitur’). The sample distributions closely approximated population parameters, even after the exclusion of careless responders and cases with missing values (for a detailed sample description, see supplementary materials).

In order to assess the cross-cultural invariance of the scale, we used an Italian sample from Caprara et al. (2009, Study 1), which the authors kindly provided. The data were collected via face-to-face questionnaires in Italy in 2008. The participants were recruited by psychology majors, who conducted the interviews as part of a course assignment (for more details, see Caprara et al., 2009, p. 1006). All respondents participated voluntarily. After listwise exclusion of cases with missing values (N = 30), the Italian sample comprised N = 1654 valid cases. Although it was a convenience sample, it was diverse regarding gender, age, and formal education: The sample consisted of 54.4% females and 45.6% males. Respondents’ age ranged from 19 to 89 years (M = 44.7; SD = 17.6). Of the respondents, 20.7% had completed elementary or junior high school, 55.6% had completed high school, and 23.8% had achieved a university degree. A detailed sample description and comparison to population distributions is provided in the online supplementary materials. The Italian P-PSE scale consisted of the ten original items (see Table 5 in the Appendix for the Italian wording and response categories) and yielded an internal consistency of McDonald’s ω = 0.92.

Psychometric properties of the translated scale

The German P-PSE scale had a mean score of M = 2.78 (SD = 0.88). Item–total correlations ranged from r = 0.49 (item 1) to r = 0.79 (items 6 and 7), with a mean of r = 0.67 (SD = 0.11). Item difficulty ranged from 0.24 (item 9) to 0.62 (item 1), with a mean difficulty of 0.44 (SD = 0.11). Following Vecchione et al. (2014), we used items 3, 4, 8, and 10 as a short version of the P-PSE scale, which had a mean score of M = 2.90 (SD = 1.01). Further item-level statistics and intercorrelations are reported in Table 4 in the Appendix.
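The item-level statistics above can be computed along the following lines. This is a minimal Python sketch, not the R code used for the analyses; the function names are hypothetical. It assumes responses coded 1 to 5, corrected item–total correlations (each item correlated with the sum of the remaining items), and item difficulty defined as the item mean rescaled to the 0 to 1 range, (mean - 1)/4. The text does not state which difficulty definition was used; under this one, the mean difficulty would be (2.78 - 1)/4 ≈ 0.445, which would be consistent with the reported 0.44. Scale scores are item means, matching the reported 1 to 5 metric.

```python
import numpy as np

def item_analysis(items, min_score=1, max_score=5):
    """Corrected item-total correlations and item difficulties.

    items: respondents x items array of Likert responses (assumed coded 1..5).
    """
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    r_it = np.array([
        # correlate item j with the sum of all *other* items (corrected r_it)
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])
    # difficulty: item mean rescaled to 0..1 (assumed definition)
    difficulty = (items.mean(axis=0) - min_score) / (max_score - min_score)
    return r_it, difficulty

def scale_scores(items, short_items=(3, 4, 8, 10)):
    """Mean score over all items and over the short-form items (1-based item numbers)."""
    items = np.asarray(items, dtype=float)
    full = items.mean(axis=1)
    short = items[:, [i - 1 for i in short_items]].mean(axis=1)
    return full, short

# toy data only (not the study data): 4 respondents x 10 items
demo = np.repeat([[1], [2], [4], [5]], 10, axis=1)
r_it, diff = item_analysis(demo)
full, short = scale_scores(demo)
```

The toy matrix gives every item the same response pattern, so all corrected item–total correlations are 1 and every difficulty is (3 - 1)/4 = 0.5.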

Objectivity

The translated P-PSE scale contains a written instruction, Likert scale response options, and a simple aggregation rule to obtain the scale-level score (see Table 5 in Appendix). For paper-and-pencil and computer-based questionnaires, these features sufficiently ensure objectivity regarding administration and scoring (Lösel, 1999).

Reliability

To assess reliability, we calculated the internal consistency estimator omega (ω) described by McDonald (1999), which has been shown to perform better than Cronbach’s α (Dunn, Baguley, & Brunsden, 2014). All data analyses were conducted in R version 3.5.1 (R Core Team, 2017) via RStudio version 1.1.456 (RStudio Team, 2015). The internal consistency of the P-PSE scale was ω = 0.91 (95% CI [0.91, 0.92]), which indicates high internal consistency. The four-item short scale had a lower but still acceptable internal consistency of ω = 0.84 (95% CI [0.83, 0.86]).
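For a one-factor (congeneric) model with loadings λᵢ and residual variances θᵢ, McDonald's ω is (Σλᵢ)² / ((Σλᵢ)² + Σθᵢ). The Python sketch below only illustrates this formula; the function name and the example loadings are hypothetical, and the text does not say which R function produced the reported estimates and confidence intervals.

```python
def mcdonald_omega(loadings, residual_variances=None):
    """McDonald's omega for a congeneric one-factor model.

    If residual variances are not given, the loadings are assumed to be
    standardized, so each residual variance is 1 - loading**2.
    """
    loadings = list(loadings)
    if residual_variances is None:
        residual_variances = [1.0 - lam ** 2 for lam in loadings]
    common = sum(loadings) ** 2  # variance attributable to the common factor
    return common / (common + sum(residual_variances))

# hypothetical standardized loadings for a four-item scale
print(round(mcdonald_omega([0.7, 0.7, 0.7, 0.7]), 3))  # prints 0.794
```

With four loadings of 0.7, the numerator is 2.8² = 7.84 and the residual variances sum to 4 × 0.51 = 2.04, so ω = 7.84 / 9.88.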

Dimensionality

We tested the unidimensionality of the full and short scales using confirmatory factor analysis (CFA) with one latent factor. CFA was conducted using the lavaan package (version 0.6-3; Rosseel, 2012) with diagonally weighted least squares estimation due to the items’ ordinal level of measurement (Kline, 2016, pp. 257–258). As recommended by Hu and Bentler (1999), we assessed model fit by jointly considering the comparative fit index (CFI; acceptable fit > 0.95) and the standardized root mean square residual (SRMR; acceptable fit < 0.08). Both indices supported a single latent factor for the full scale (CFI = 0.993; SRMR = 0.048) as well as for the short scale (CFI = 0.999; SRMR = 0.022; see the upper part of Table 2 for an overview).
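The joint cutoff rule can be written as a small Python check. This is a sketch: the function name is hypothetical, and the cutoffs are the ones stated above. Requiring both conditions, rather than either one, reflects the joint-criteria strategy described in the text. Applied to the reported CFA results, both models pass.

```python
def acceptable_fit(cfi, srmr, cfi_cutoff=0.95, srmr_cutoff=0.08):
    """Joint rule: fit is acceptable only if CFI is above AND SRMR is below its cutoff."""
    return cfi > cfi_cutoff and srmr < srmr_cutoff

# CFA results reported for the German sample
reported = {"full scale": (0.993, 0.048), "short scale": (0.999, 0.022)}
for name, (cfi, srmr) in reported.items():
    print(name, acceptable_fit(cfi, srmr))
```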

Table 2 Confirmatory factor analysis (CFA) and measurement invariance (MI) model fit

Measurement invariance

The P-PSE full scale has only been validated in Italy so far (although the short scale has also been validated in Spanish and Greek samples). Therefore, we tested the cross-cultural invariance of the scale by comparing our German sample (n₁ = 1025) to the Italian sample (n₂ = 1654). Using multigroup CFAs, we tested for configural invariance (same factor structure across samples), followed by metric invariance (same factor loadings across samples), scalar invariance (same item intercepts across samples), and residual invariance (same error variances across samples; for a similar procedure, see Baumert et al., 2014; for an overview of measurement invariance conventions, see Putnick & Bornstein, 2016). We assessed the fit of the configural invariance model using the same criteria as before (Hu & Bentler, 1999). Each subsequent model is nested within its preceding model (e.g., metric invariance within configural invariance) and was therefore assessed in comparison to that model. To judge whether fit differences between nested models are substantial, we used the cutoff criteria by Chen (2007). For large samples, she recommends a change in the comparative fit index of ΔCFI = 0.01 as the main criterion, with changes of ΔRMSEA = 0.015 in the root mean square error of approximation and ΔSRMR = 0.01 (0.03 for metric invariance) as additional criteria.
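The nested-model comparison can be sketched in Python as follows. The function and dictionary names are hypothetical, and the function only reports each criterion; how to weigh them (here, ΔCFI as the main criterion) remains a judgment call, as in the text. The fit changes are entered as magnitudes of worsening: the decrease in CFI and the increases in RMSEA and SRMR relative to the preceding, less constrained model. A criterion passes if the change stays below its cutoff.

```python
# Cutoffs as described in the text (Chen, 2007); SRMR is more lenient for metric invariance
CUTOFFS = {
    "metric":   {"cfi": 0.010, "rmsea": 0.015, "srmr": 0.030},
    "scalar":   {"cfi": 0.010, "rmsea": 0.015, "srmr": 0.010},
    "residual": {"cfi": 0.010, "rmsea": 0.015, "srmr": 0.010},
}

def chen_check(step, d_cfi, d_rmsea, d_srmr):
    """Compare fit changes for one invariance step against its cutoffs.

    d_cfi: decrease in CFI; d_rmsea, d_srmr: increases in RMSEA and SRMR,
    each relative to the preceding (less constrained) model.
    """
    cut = CUTOFFS[step]
    return {
        "cfi": d_cfi < cut["cfi"],
        "rmsea": d_rmsea < cut["rmsea"],
        "srmr": d_srmr < cut["srmr"],
    }

# fit changes reported for the German-Italian comparison (partial invariance models)
print(chen_check("metric", 0.009, 0.023, 0.018))
print(chen_check("scalar", 0.006, 0.009, 0.007))
print(chen_check("residual", 0.003, 0.001, 0.006))
```

For the partial metric model, this reproduces the pattern described in the text: the CFI and SRMR criteria are met, while the RMSEA criterion is not.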

Using the same specifications as before, we found configural invariance between the two samples (CFI = 0.996; RMSEA = 0.035; SRMR = 0.036; see the lower part of Table 2). However, full metric invariance could not be established (e.g., ΔCFI = 0.015). We identified Item 2 as differing most strongly in its loading across samples and, by allowing this item’s loading to vary freely between samples as suggested by Vandenberg and Lance (2000, p. 57), established a model of partial metric invariance. This model met two of our fit criteria (ΔCFI = 0.009; ΔSRMR = 0.018) but exceeded the cutoff value of the third (ΔRMSEA = 0.023). Since ΔCFI is recommended as the main criterion (Chen, 2007, p. 501) and absolute model fit was still good in terms of the Hu and Bentler (1999) criteria, we accepted this model of partial metric invariance. Since the majority of item loadings were invariant (Vandenberg & Lance, 2000, p. 38; see also Putnick & Bornstein, 2016), we proceeded to test scalar invariance. Again, the full scalar invariance model failed our criteria (e.g., ΔCFI = 0.033). Thus, we identified the three items whose intercepts diverged most strongly (Items 1, 2, and 3) and, by allowing these intercepts to differ between groups, established a model of partial scalar invariance (ΔCFI = 0.006; ΔRMSEA = 0.009; ΔSRMR = 0.007). Finally, we tested the residual invariance model against the preceding model and found no substantial decrease in fit (ΔCFI = 0.003; ΔRMSEA = 0.001; ΔSRMR = 0.006).

In summary, the ten P-PSE items load on a single latent factor in both samples (Germany and Italy), and nine out of ten items do so with equal factor loadings across samples; that is, the scale shows partial metric invariance. The observed non-invariance of Item 2 (“Make certain that the political representatives you voted honor their commitments to the electorate”) indicates a cross-cultural difference in how strongly the described task (i.e., monitoring elected representatives) relates to the latent construct of internal efficacy, with a lower standardized loading in Germany (λ₂ = 0.53) than in Italy (λ₂ = 0.78). Apparently, German voters perceive the control of elected officials as less related to internal efficacy beliefs than Italian voters do. One possible explanation might be Italians’ experience of relatively frequent snap elections (1994, 1996, and 2008) and frequent changes of government leader in recent years, which might be perceived as evidence that elected officials are actually controlled by the people. Since only one item displays metric non-invariance, interpretation of the overall P-PSE mean score can be assumed to be unaffected (Steenkamp & Baumgartner, 1998; Steinmetz, 2013). Full scalar invariance was impeded by three out of ten items, with higher standardized intercepts in Germany (ν₁ = 3.07, ν₂ = 2.90, ν₃ = 2.36) than in Italy (ν₁ = 2.09, ν₂ = 2.05, ν₃ = 1.80). Since between-group differences in item intercepts can affect the comparability of observed mean scores (Steinmetz, 2013), we suggest using latent modelling when group mean comparisons are essential, because partial scalar invariance is a sufficient prerequisite for latent mean comparisons (Steenkamp & Baumgartner, 1998; see also Steinmetz, 2013).

Validity

We assessed construct, criterion and incremental validity of the translated P-PSE scale and its four-item short scale.

Construct validity

In order to assess construct validity of the translated scale, we postulated a nomological network of theoretically related (convergent validity) and unrelated (discriminant validity) constructs and tested the correlations between these constructs and the P-PSE score (Hartig, Frey, & Jude, 2008, pp. 148–154). All results are displayed in Table 3.

Table 3 Pearson’s correlations of the P-PSE scales with theoretically related (convergent validity) and unrelated (discriminant validity) constructs

As expected, we found high correlations with both preexisting internal efficacy scales. There was also a high correlation with political interest, in line with previous studies (e.g., Craig et al., 1990, p. 305; Foschi & Lauriola, 2014, p. 350). We found medium-sized correlations with external efficacy and general self-efficacy, which is plausible, since both constructs are conceptually related to internal efficacy (Balch, 1974; Bandura, 1997). In line with previous findings and theoretical assumptions (e.g., Caprara et al., 2009), the P-PSE score was unrelated to left–right orientation but showed a small correlation with ideological extremity, operationalized as the squared z-standardized left–right score. Regarding sociodemographic variables, men and highly educated people scored higher on the P-PSE scale than women and people with lower levels of formal education, which is the typical pattern for internal efficacy (e.g., Arzheimer, 2005, p. 199). Additionally, we found a small positive correlation with age. In sum, the correlations with external criteria reveal the expected pattern for a measure of internal efficacy: high correlations with other internal efficacy measures and political interest, medium-sized correlations with related self-belief variables, and null correlations with independent constructs. The same pattern emerged for the four-item short scale (see Table 3).
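The extremity variable can be derived from the left–right self-placement as follows. This is a minimal Python sketch with a hypothetical function name. It assumes the sample standard deviation (ddof=1); the text does not state which variant was used, or the range of the self-placement scale.

```python
import numpy as np

def ideological_extremity(left_right):
    """Squared z-standardized left-right self-placement.

    Uses the sample standard deviation (ddof=1), which is an assumption.
    """
    x = np.asarray(left_right, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return z ** 2

print(ideological_extremity([1, 5, 9]))  # prints [1. 0. 1.]
```

Because the z-scores are centered on the sample mean, a respondent at the sample's average position receives an extremity of 0, whether or not that average coincides with the midpoint of the scale.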

Criterion validity

One predominant aspect of internal efficacy is its predictive value regarding political participation behavior (Bandura, 1997; Krampen, 1990; Vecchione & Caprara, 2009). We therefore assessed the scale’s criterion validity (Hartig et al., 2008, p. 156) by examining its relationship with the propensity to participate in politics. Similar to other researchers in the field (e.g., Kaase, 1999; Peterson, Speer, & McMillan, 2008), we asked about past involvement in eleven different political participation activities (e.g., stating one’s political opinion or signing a political online petition) and used these items to build an index of political participation propensity (McDonald’s ω = 0.83). We estimated the scale’s criterion validity with a hierarchical regression model that included age, gender, and education as control variables. As expected, the P-PSE scale explained a substantial amount of variance in respondents’ participation propensity over and above the sociodemographic variables, both in its full ten-item version (β = 0.28, p < 0.001, ΔR² = 0.26) and in its four-item short version (β = 0.25, p < 0.001, ΔR² = 0.26).
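The two-step logic of the hierarchical regression can be sketched in Python with ordinary least squares. This sketch uses simulated data and hypothetical function names, not the study data or the authors' code. Education is treated as a single numeric predictor purely for illustration; the text does not say how it was coded. ΔR² is the gain in explained variance when the P-PSE score is added to the block of control variables.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS regression of y on X; an intercept column is added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def hierarchical_delta_r2(controls, predictors, y):
    """R^2 for step 1 (controls) and step 2 (controls + predictors), and their difference."""
    r2_1 = r_squared(controls, y)
    r2_2 = r_squared(np.column_stack([controls, predictors]), y)
    return r2_1, r2_2, r2_2 - r2_1

# simulated data only (not the study data)
rng = np.random.default_rng(0)
n = 1000
controls = np.column_stack([
    rng.normal(50, 16, n),   # age
    rng.integers(0, 2, n),   # gender (0/1)
    rng.integers(1, 4, n),   # education (1 = low, 2 = medium, 3 = high)
])
ppse = rng.normal(2.8, 0.9, n)
participation = 0.01 * controls[:, 0] + 0.5 * ppse + rng.normal(0, 1, n)
print(hierarchical_delta_r2(controls, ppse, participation))
```

The incremental-validity models in the next subsection follow the same pattern. There, a traditional internal efficacy scale is added to the step-1 block before the P-PSE score enters in step 2.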

Incremental validity

In their validation study, Caprara et al. (2009) showed that the original P-PSE scale accounted for unique variance in several indicators of political participation over and above the traditional NES scale. To corroborate the incremental value of the scale in the German context, we aimed to replicate this finding with regard to the Vetter and Beierlein scales. We estimated two further hierarchical regression models of political participation propensity. Each model included the aforementioned control variables and one of the traditional measures in the first step, before adding the translated P-PSE scale in a second step. The P-PSE scale increased the explained variance by ΔR² = 0.12 beyond the Vetter scale and by ΔR² = 0.13 beyond the Beierlein scale. The four-item short scale showed the same incremental value over the traditional scales. Detailed results of all regression models are documented in the online supplementary materials.

Discussion and conclusion

Research on internal political efficacy has not yet reached a consensus about how to measure the construct, and many of the previous measures have been criticized for several reasons (e.g., Bandura, 1997, pp. 483–484; Morrell, 2003, p. 595). Largely neglected by scholars studying political efficacy, SCT (Bandura, 1991) offers a psychological and systematic perspective on self-efficacy beliefs. Based on this theory, Caprara et al. (2009) created the P-PSE scale as a new measure of internal efficacy. Because it is constructed in terms of capabilities related to relevant participation behavior, it offers an arguably more content-valid alternative to the established measures of internal efficacy. We translated and validated the scale for use in German samples. One limitation of our study concerns the labelling of the response categories in terms of agreement. Ratings in terms of strength of confidence might have been more consistent, and future researchers might want to try them as an alternative to our response categories. Nevertheless, analogous to Caprara et al. (2009), the results confirm the reliability and construct validity of the translated scale. Analyses of measurement invariance revealed that the translated scale yields the same factorial structure as the original scale by Caprara et al. (2009), as well as partial metric and scalar invariance. Regarding the most important external criterion, political participation propensity, the scale surpasses the established internal efficacy measures, thereby attesting to its potential value for the study of political behavior. In addition, a four-item short version of the scale yielded similar results, albeit with a small decrease in internal consistency, and hence offers an economical alternative that is especially suited for large surveys.