Item comparability in cross-national surveys: results from asking probing questions in cross-national web surveys about attitudes towards civil disobedience

This article focuses on assessing item comparability in cross-national surveys by asking probing questions in Web surveys. The “civil disobedience” item from the “rights in a democracy” scale of the International Social Survey Program (ISSP) serves as a substantive case study. Identical Web surveys were fielded in Canada (English-speaking), Denmark, Germany, Hungary, Spain, and the U.S. Either a category-selection probe or a comprehension probe was presented in the Web surveys after the closed-ended “civil disobedience” item. Responses to the category-selection probe reveal that, notably in Germany, Hungary, and Spain, respondents deplore the detachment of politicians from the people and their lack of responsiveness. Responses to the comprehension probe show that, mainly in the U.S. and Canada, violence and/or destruction are associated with civil disobedience. These results suggest reasons for the peculiar statistical results found for the “civil disobedience” item in the ISSP study. On the whole, Web probing proves to be a valuable tool for identifying interpretation differences and potential bias in cross-national survey research.


Comparability needs in cross-national survey research
Large-scale cross-national survey projects, such as the World Values Survey (WVS), the European Values Study (EVS), the International Social Survey Program (ISSP), the European Social Survey (ESS), and the Eurobarometer have established long time series, partly beginning in the 1970s and 1980s. A major goal of these surveys is the analysis of social change in a comparative perspective, which requires the continued invariant measurement of constructs across countries. However, countries can differ in the way the majority of the population interprets a question, even if the question seems well-translated. In addition, social change over time may lead to changes in item interpretation in a country. These aspects are a threat to the long-term validity of cross-national surveys because differences across countries and over time could be methodological artifacts rather than "real" differences between countries and time periods.
D. Behr (corresponding author) · M. Braun · L. Kaczmirek · W. Bandilla, GESIS - Leibniz Institute for the Social Sciences, P.O. Box 122155, 68072 Mannheim, Germany; e-mail: dorothee.behr@gesis.org
Good questionnaire design with diverse international input in the form of expert review, pretesting, and the like (Harkness et al. 2010) can help prevent some of the problems mentioned above. However, even in the major cross-national surveys, pretesting across many different countries prior to questionnaire finalization is rare, and so interpretation differences can never be fully ruled out. Moreover, appropriate questionnaire design cannot prevent items from changing their meaning over time or from taking on an unintended or differential meaning in a country that newly joins a survey program and simply has to accept the (core) questionnaire as it is (Miller et al. 2011).
In view of these "threats", assessing data comparability is a necessary prerequisite of any substantive analysis of cross-national survey data. Although the application of data-analytic procedures is an appropriate means for detecting equivalence problems, it is not conducive to explaining the causes of incomparability. Knowledge of these causes, however, could serve to improve measurement instruments for future surveys. Even if measurement instruments cannot be changed, as is often the case in an ongoing time series, knowledge of the causes of incomparability can usefully inform the substantive interpretation of the data.

Cognitive interviewing
A possible solution for detecting measurement artifacts and learning about different item interpretation across countries is to conduct cognitive interviewing. Cognitive interviewing helps to uncover the cognitive processes respondents use when answering a survey question. Respondents have several tasks to complete when answering a survey item: interpreting a question, generating an opinion, matching the opinion to a response category, and editing the response taking social desirability into consideration (e.g., Schwarz 1996; Strack and Martin 1987; Tourangeau et al. 2000). Surveys are likely to contain at least some questions which do not reflect or match social reality and issues of public debate. They may also include terminology that is not consistently understood by all respondents or that is too vague, thus opening up paths for a variety of possible interpretations. Here, different cultural contexts could lead to different item interpretations across countries. This may even be the case with items that seem well-translated.
There are two major cognitive interviewing techniques used in survey research: think-aloud, where respondents verbalize their thoughts as they answer survey questions, and verbal probing, where interviewers ask follow-up questions to obtain additional information on responses (Beatty and Willis 2007; Willis 2005). Among the probing techniques, comprehension probing (such as "What does the term xy mean to you?") is a particularly suitable means to reveal country-specific interpretations of individual terms or phrases. Category-selection probing (Prüfer and Rexroth 2005) can also be regarded as an appropriate means to assess item comparability. With category-selection probing, respondents are asked for their reasons for having selected a particular scale value for a closed question. The answers to category-selection probes make it possible to analyze the differentiations respondents make in the interpretation of an item. In particular, "silent misinterpretations" (DeMaio and Rothgeb 1996) can be detected, where respondents seemingly do not have problems with the interpretation of an item but actually misinterpret its meaning.

Cognitive interviewing across countries
The present use of cognitive interviewing (techniques) in the cross-national context is restricted. First, these techniques typically serve to improve a comparative source questionnaire or a translation prior to fielding (e.g., Childs and Goerman 2010; Fitzgerald et al. 2011; Forsyth et al. 2007). Implementing cognitive techniques in a cross-national survey itself (similar to Schuman's "random probes", 1966) or conducting cognitive interviews "after the fact" to explain problems in the dataset is still rare. Recent exceptions, however, are the post-survey cognitive interviewing studies by Thrasher et al. (2011), Reeve et al. (2011), and Morren et al. (Forthcoming). Second, the comparative cognitive studies that are conducted mostly apply to different ethnic groups within a single country, usually the U.S. (e.g., Goerman and Caspar 2010; Willis et al. 2008). While there are certainly challenges involved in organizing these studies in one country, it is even more difficult to organize them across countries, especially if comparability of results is the goal (Lee 2012). Experienced cognitive interviewers may not be available in all countries, and even if they were, it would be necessary to standardize procedures across countries for reasons of comparability. For instance, different house styles in recruiting respondents or different guidelines specifying the conduct of interviews would need to be harmonized, at least to some extent (Miller et al. 2011; Thrasher et al. 2011). Third, the number of cases per country in cross-national studies is usually too small to draw more generalizable conclusions on the differences between country-specific answer patterns (Thrasher et al. 2011). One of the few cross-national cognitive studies, in which seven countries participated and in which, among others, ESS questions were tested, conducted no more than ten interviews in three countries and no more than 20 in another two.
Other large cross-national survey projects, such as the ISSP, do not implement comparative cognitive pretests at all. Instead, they leave it to individual countries whether such pretests are conducted and do not make an attempt to systematically compare the results across countries.
There is no compelling reason why methods for analyzing cognitive processes involved in the interpretation of items need to be restricted in the way they currently are. First, while assessment of survey questions ideally takes place before items are used in the field, much could be learned as well from post-survey evaluation and from targeting problems that show up in the final data set. Second, the application of cognitive interviewing techniques does not have to be based on personal interviews. The critical issues of (non-)availability of cognitive interviewers across countries or of harmonization of probing procedures (Thrasher et al. 2011) could be circumvented by implementing probing techniques in self-administered surveys. Third, the dominant assumption that low case numbers will be sufficient when using cognitive interviewing techniques is counterproductive. Low case numbers do not permit the analysis of diverging argumentation patterns across countries or a meaningful quantification of results. Here, too, self-administered surveys seem a suitable option.
Combining all aspects mentioned above, we propose to implement probing questions in cross-national Web surveys to assess comparability of items. The general feasibility of asking probing questions on the Web has already been tested and confirmed by Behr et al. (2012) and Behr et al. (Forthcoming). In these studies, data collection and substantive analyses were restricted to Germany. This paper extends this research to the international arena and addresses the question to what extent probing techniques implemented in cross-national Web surveys make it possible to unravel different interpretation patterns across countries. Directly related to this, we investigate the usefulness of different probing techniques (comprehension vs. category-selection) and the contribution that each one can make to identifying comparability or non-comparability of items.

Case study: the "civil disobedience" item in the ISSP
To demonstrate the utility of implementing probing questions in cross-national Web surveys, we investigate why respondents answer the "civil disobedience" item from the 2004 ISSP Citizenship Module the way they do. The ISSP is an annual cross-national survey program, fielding general population surveys on topics that are relevant to the social sciences. Set up in 1984, the ISSP had grown from four to 48 member countries by 2011. The questionnaire design process is characterized by rigorous cross-cultural collaboration, which cannot preclude, however, that some items might eventually include bias of some sort or other. In the ISSP, the "civil disobedience" item is part of the following scale on rights of people in a democracy (ISSP Research Group 2004a):
There are different opinions about people's rights in a democracy. On a scale of one to seven, where one is not at all important and seven is very important, how important is it…
a. that all citizens have an adequate standard of living
b. that government authorities respect and protect the rights of minorities
c. that government authorities treat everybody equally regardless of their position in society
d. that politicians take into account the views of citizens before making decisions
e. that people be given more opportunities to participate in public decision-making
f. that citizens may engage in acts of civil disobedience when they oppose government action.
Results from the ISSP (ISSP Research Group 2004b) are displayed in Table 1. They show that Canada, Denmark, Germany (eastern/western), Hungary, Spain, and the U.S. have similarly high values for an index based on items a-e (the index, indicating the mean score on these five items, ranges from 6.2 to 6.6). However, the mean of the "civil disobedience" item shows a pattern that clearly distinguishes between two groups of countries. While the mean for civil disobedience is lower than the index in all countries, it is particularly low for Canada, Denmark, and the U.S. (between 3.8 and 4.1). In Germany (eastern and western), Hungary, and Spain it ranges from 5.0 to 5.5.
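The comparison between the index and the single item can be sketched in a few lines of Python. The respondent records below are invented for illustration and are not ISSP data; the keys `a` to `f` simply mirror the item letters of the scale, and the sketch assumes complete answers on items a-e, since the handling of missing values in the index is not described here.

```python
from statistics import mean

# Hypothetical respondent records (NOT ISSP data): ratings from 1 to 7 for
# items a-f of the "rights in a democracy" scale; None marks item non-response.
respondents = [
    {"country": "US", "a": 7, "b": 6, "c": 7, "d": 6, "e": 6, "f": 3},
    {"country": "US", "a": 6, "b": 6, "c": 7, "d": 7, "e": 5, "f": None},
    {"country": "ES", "a": 7, "b": 7, "c": 7, "d": 6, "e": 6, "f": 6},
    {"country": "ES", "a": 6, "b": 6, "c": 6, "d": 7, "e": 7, "f": 5},
]

INDEX_ITEMS = ["a", "b", "c", "d", "e"]


def country_summary(rows, country):
    """Return (index mean over items a-e, mean of item f) for one country."""
    rows = [r for r in rows if r["country"] == country]
    # Index: each respondent's mean over items a-e, then averaged per country.
    index = mean(mean(r[i] for i in INDEX_ITEMS) for r in rows)
    # Item f: mean over respondents who gave a substantive answer.
    item_f = mean(r["f"] for r in rows if r["f"] is not None)
    return index, item_f


for c in ("US", "ES"):
    idx, item_f = country_summary(respondents, c)
    print(f"{c}: index a-e = {idx:.2f}, item f = {item_f:.2f}, gap = {idx - item_f:.2f}")
```

The gap between the two means per country is the quantity on which the two country groups differ.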
This peculiar pattern could be explained in at least two ways. First, the differences may be due to country differences in trust in government. Donovan et al. (2008), for instance, find that the need to engage in civil disobedience is negatively correlated with trust in government. Trust in government depends on a variety of factors, including the evaluation of democratic performance and of governmental responsiveness. Corruption likewise plays its part in shaping trust in government.
Second, the peculiar pattern across countries might equally be due to country-specific interpretations of the term "civil disobedience". In Canada, Denmark, and the U.S., respondents might associate negative events or situations with the term and thus be less inclined to support civil disobedience. If the key term "civil disobedience" is indeed understood differently across countries, we would have to deal with a lack of cross-cultural validity. The high non-response rate for civil disobedience in the ISSP data set (between 3.3 and 17.5 % across countries) already suggests that something is problematic with the item. The amount of non-response for the "civil disobedience" item has also led researchers to discard the item from analyses (e.g., Bolzendahl and Coffe 2008). Our study investigates whether differences in the level of trust in government and/or different cognitive representations of the term civil disobedience contribute to country differences.

Data source
The data in this paper come from Web surveys that were conducted in Canada (English-speaking), Denmark, Germany (eastern and western Germany as separate regions), Hungary, Spain, and the U.S. in January 2011. Participants in these surveys were drawn from nonprobability access panels. Our Web surveys targeted nationals aged 18-65. Quotas based on gender, age (18-30, 31-50, and 51-65), and education (lower education vs. higher education) were used to obtain a balanced, albeit not representative, sample. Thus, gender and education were similarly distributed, and the mean age ranged from 40.6 to 42.3. In total, 3,695 respondents across all countries completed the survey. The mean answer time (for all countries) was almost 17 min, and the mean break-off rate was 11.4 %. Table 2 provides descriptive statistics for the individual countries. The study presented here is part of a larger methods project that aims at detecting comparability problems in cross-national survey research through the development and assessment of Web probing.

Questionnaire
The questionnaire covered the topics of family (e.g., gender roles) and politics (e.g., attitudes towards rights of people in a democracy and towards immigrants) and mainly used items from the International Social Survey Program. The topical blocks were rotated to avoid sequence effects. In total, the questionnaire contained 36 closed-ended items. In addition, each respondent received eight open-ended probing questions, including category-selection and comprehension probes. Soft edit checks were included with all items and probes: If respondents did not answer a question, a message was shown that reminded them of the importance of their answer. Respondents were then able to choose between giving an answer or skipping to the next question. The ISSP "rights of people in a democracy" scale, as presented above, was implemented with each item appearing on a separate screen. The response scale was end-point labeled and ranged from one "not at all important" to seven "very important" (horizontal display). A "can't choose" category was explicitly offered to respondents. After the final item on civil disobedience, respondents were randomly assigned to either the screen with the comprehension probe ("What ideas do you associate with the phrase 'civil disobedience'? Please give examples.") or to the screen with the category-selection probe ("Please explain why you selected [1-7 or 'can't choose' inserted].") (see Fig. 1 for screenshots). Respondents thus received only one of these probes. Furthermore, within the politics section we asked about respondents' interest in politics ("How interested are you in politics? Fairly interested, interested, some interest, little interest, not at all interested (1-5)"; reverse coded such that high values indicate high interest). In addition, we asked respondents to place themselves on the left-to-right scale ("Many people use the terms 'left' and 'right' to designate different political positions. We have a scale here that runs from left to right. Where would you position your own political views on this scale?"). The scale itself was randomly varied in eight experimental versions, which differed in the number of scale points (ten or 11), the presence of a DK option, and the labeling (end numbers only or the full set of numbers). Political interest and left-right orientation will serve as control variables in the analyses to assess potential bias in our samples. After all, these variables are likely to be correlated with the civil disobedience item and might introduce bias if their distributions in our Web surveys were highly unrealistic.
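Two of the mechanics described above, the reverse coding of political interest and the random split between the two probes, can be sketched as follows. The function names, the probe labels, and the seed are assumptions made for this illustration; the survey software actually used is not described here.

```python
import random

# Political interest is asked on a 1-5 scale where 1 is the highest and
# 5 the lowest level of interest, so it is flipped for analysis.
def reverse_code(value, low=1, high=5):
    """Flip a rating so that high values indicate high interest."""
    return low + high - value


# Hypothetical labels for the two probe screens.
PROBES = ("comprehension", "category_selection")


def assign_probe(rng):
    """Send a respondent to exactly one of the two probes, at random."""
    return rng.choice(PROBES)


rng = random.Random(2011)  # fixed seed only to make the sketch reproducible
assignments = [assign_probe(rng) for _ in range(10)]
print(reverse_code(1), reverse_code(5), assignments)
```

Simple randomization of this kind yields split sizes that are only approximately equal.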

Translation of probe answers, development of the coding scheme, and coding
The Danish, Hungarian, and Spanish answers to the civil disobedience probes were translated into German by professional translators who had been briefed on the particularities of these texts as well as on translation and coding needs. The translators also served as a point of contact for queries that came up during the coding process, e.g., on the scope of meaning of certain terms. Since we were able to code English and German answers in our research team, we did not commission any translation for these languages.
Two distinct substantive coding schemes were developed for the two types of probes owing to the different nature of responses that we obtained from respondents. Obviously, explicitly asking for examples in the case of the comprehension probe led to a high number of respondents offering example lists such as "Picketing, rallying, speaking out publicly". Answers such as these were not common for the category-selection probe.
Multiple substantive coding was possible for each probe answer and further defined in the coding guidelines. In addition to substantive coding, non-response codes were assigned to answers. We considered the following to be non-response answers: (1) no text entry at all; (2) ?, -, letter combinations such as "fdg"; (3) don't knows; (4) refusals such as "n/a"; (5) other meaningless entries such as "it is like that", "it is not that important", "why not"; and (6) incomprehensible and therefore non-codeable answers such as "to an extent" or "against us" (see Holland and Christian 2009 for a similar non-response definition). We tried to build the substantive coding scheme for category-selection and comprehension probing in a way that similarities between both probing types became visible.
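Some of the non-response categories listed above lend themselves to a rough automatic pre-screening. The following is a minimal sketch under that assumption; it is not the procedure used in this study, where codes were assigned manually. The word lists, labels, and the function name `nonresponse_code` are hypothetical, and the consonant-only pattern is a crude heuristic that will misclassify some real abbreviations.

```python
import re

# Illustrative patterns only; the study assigned these codes manually.
DONT_KNOW = {"don't know", "dont know", "dk", "no idea"}
REFUSAL = {"n/a", "na", "no comment"}
MEANINGLESS = {"it is like that", "it is not that important", "why not"}


def nonresponse_code(answer):
    """Return a non-response label for a probe answer, or None if the
    answer should go on to substantive (manual) coding."""
    text = answer.strip().lower()
    if not text:
        return "NO_ENTRY"        # (1) no text entry at all
    if text in DONT_KNOW:
        return "DONT_KNOW"       # (3) don't knows
    if text in REFUSAL:
        return "REFUSAL"         # (4) refusals
    if text in MEANINGLESS:
        return "MEANINGLESS"     # (5) other meaningless entries
    # (2) punctuation only, or a short run of consonants such as "fdg"
    if re.fullmatch(r"[\W_]+", text) or re.fullmatch(r"[b-df-hj-np-tv-xz]{2,6}", text):
        return "NONSENSE"
    # (6) incomprehensible answers need human judgement and are not detected here.
    return None


for sample in ["", "fdg", "?", "n/a", "Why not", "Picketing, rallying, speaking out publicly"]:
    print(repr(sample), "->", nonresponse_code(sample))
```

Answers that return `None` would still have to be read by a coder, since category (6) and many borderline cases cannot be identified by pattern matching.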
Inter-rater agreement based on 10 % of answers proved satisfactory for both the comprehension probe (between 79 and 88 % across countries) and the category-selection probe (between 75 and 92 % across countries). The low of 75 % was partly due to mismatches within the different non-response codes. Corrections to the dataset were made following the inter-rater assessments.
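As an illustration of how such agreement figures can be obtained, the sketch below computes simple percent agreement, assuming that two coders agree on an answer when they assign exactly the same set of codes. The exact formula used in the study is not spelled out in the text, and the codings shown are invented.

```python
def percent_agreement(coder_a, coder_b):
    """Share of answers (in %) to which both coders assigned identical code sets."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of answers")
    matches = sum(set(a) == set(b) for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codings of five probe answers (multiple codes per answer allowed).
coder_1 = [["RIGHTS"], ["NO RESPONSIVENESS"], ["OTHER METHODS", "ACCEPTANCE"],
           ["DONT_KNOW"], ["RIGHTS"]]
coder_2 = [["RIGHTS"], ["GENERAL DISSATISFACTION"], ["ACCEPTANCE", "OTHER METHODS"],
           ["DONT_KNOW"], ["RIGHTS"]]

print(percent_agreement(coder_1, coder_2))  # 4 of 5 answers match -> 80.0
```

Under this strict definition, a mismatch between two different non-response codes counts as a disagreement, which is the kind of mismatch mentioned above for the 75 % figure.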

Analytical procedures
First, the Web survey data is compared to the ISSP data. We regard the replication of the country patterns for the index of the "rights in a democracy scale" (all countries having about the same mean for items a-e) and the "civil disobedience" item (two groups of countries with markedly different means) as a precondition for using the Web survey data to shed light on the ISSP data. Second, we present the substantive coding schemes for the two probe types. Third, we present the distributions of the codes across countries. The significance of country differences is assessed using logistic regression, with the respective codes as dependent variables and countries as independent variables. Finally, we assess to what extent the substantive codes can explain the variance of the answers for the "civil disobedience" item. For that purpose, we run separate regression analyses for the category-selection probe and the comprehension probe. The substantive codes serve as independent variables in these analyses; the "civil disobedience" item is the dependent variable. Further socio-demographic items serve as control variables.
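The test of country differences for a single code can be sketched without a statistics package. The sketch relies on one known property: when country is the only (categorical) predictor, the fitted probabilities of a logistic regression equal the observed country proportions, so the country model can be compared with an intercept-only model through a likelihood-ratio statistic. The counts are invented, and the paper does not state which test statistic was used, so this is an assumption.

```python
import math


def log_likelihood(successes, n, p):
    """Binomial log-likelihood (without the constant) for a given probability p."""
    ll = 0.0
    if successes:
        ll += successes * math.log(p)
    if n - successes:
        ll += (n - successes) * math.log(1 - p)
    return ll


def lr_test_country(counts):
    """Likelihood-ratio statistic: logistic model with country dummies
    versus an intercept-only model.

    counts maps country -> (answers carrying the code, answers in total).
    Returns (statistic, degrees of freedom).
    """
    total_k = sum(k for k, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    ll_null = log_likelihood(total_k, total_n, total_k / total_n)
    ll_full = sum(log_likelihood(k, n, k / n) for k, n in counts.values())
    return 2 * (ll_full - ll_null), len(counts) - 1


# Hypothetical counts for one code in two countries.
statistic, df = lr_test_country({"US": (5, 100), "DE": (15, 100)})
# 3.841 is the 5 % critical value of the chi-square distribution with 1 df.
print(f"LR = {statistic:.2f}, df = {df}, significant at 5 %: {statistic > 3.841}")
```

With more than two countries the degrees of freedom increase accordingly, and the critical value has to be taken from the matching chi-square distribution.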

Comparison of ISSP and Web survey data
To begin with, we compare the ISSP and the Web survey results on the "rights of people in a democracy" scale (Table 1). The comparison shows that the striking patterns found in the ISSP are neatly reproduced in the Web surveys. That is, the value for the index (items a-e) is about the same for all countries/regions. The right to engage in civil disobedience is supported less than the other rights in all the countries. The biggest gap between the index and the civil disobedience mean is once again found in Canada, Denmark, and the U.S. Furthermore, similar to the ISSP, the "civil disobedience" item is marked by a high non-response rate, which is 7.4 % overall. For the other items of the "rights in a democracy" scale, non-response reaches at most 2.2 % in a country. In fact, many respondents indicated that they did not understand the question and/or the term "civil disobedience" and, therefore, were not able to provide an answer.

Coding scheme for the category-selection probe
Next, we present the substantive coding scheme for the category-selection probe (see Table 3, which also lists examples for each of the codes). The codes for the category-selection probe stand for reasons, in the widest sense, that respondents put forward to explain their answer value regarding civil disobedience. In line with our assumption that a low level of trust in government plays a role in the civil disobedience rating, we were able to identify answers that went in this direction. We distinguished two dimensions. The code NO RESPONSIVENESS assembles answers that focus on problems with vertical accountability between politicians and respondents, on the lack of politicians' responsiveness towards voters' needs, and on serving only the interests of big business instead of those of the voters. The code GENERAL DISSATISFACTION represents discontent with the government in a more general way, without respondents accusing the government of acting against voters' needs, breaking election promises, or serving only big business. The focus with GENERAL DISSATISFACTION is more on the importance of having a method at one's disposal that helps to show dissatisfaction whenever a situation of discontent with the government arises. Admittedly, it was not always easy to draw a line between these two codes. A clear-cut example for NO RESPONSIVENESS would be (answer edited for spelling and punctuation mistakes): "If you look at 'empires' of the past, you will see that corruption from within was their downfall. Now look at how things are today. Government officials abuse their power for their own gain. They are too busy satisfying their own needs, the needs of special interest, and no longer care for the working man. Even though they are the backbone of this nation." (U.S., 2868). A clear-cut example for GENERAL DISSATISFACTION would be: "People must have a way of showing government when the latter are making the wrong decisions" (Canada, 5703). A less straightforward case (coded as GENERAL DISSATISFACTION) was: "You should be allowed to protest when the government does not put the public first" (Canada, 9626). The inter-rater assessment helped to evaluate and improve our coding in the critical cases. In addition, we also ran analyses with both dimensions merged to ensure that critical coding cases did not introduce measurement artifacts.
Another fundamental line of reasoning referred to freedom of speech, to one's right to protest, one's right to participate in decision-making, etc. (RIGHTS). Furthermore, we had respondents defining or describing what civil disobedience should (not) look like and under which conditions it would be (un-)acceptable (RANGE OF ACCEPTABILITY). Another group of respondents preferred other means to be taken (first) in order to put their message across, such as elections or available legal routes (OTHER METHODS). Still another group referred to respecting and accepting authority and the elected government, taking into account that not all activities please everyone (ACCEPTANCE). Also, some respondents explicitly mentioned violence or destruction that go along with civil disobedience (EXPLICIT VIOLENCE). Respondents also more implicitly referred to violence and destruction in the sense that they were aware that the actions do not always end on a peaceful note and that they took this into account when answering (IMPLICIT VIOLENCE). Then, there were respondents who listed negative effects more in general, effects other than explicit violence and/or destruction (NEGATIVE EFFECTS), for example, disruption or disturbance of the public or the political system in general. Some respondents argued that civil disobedience has no effect whatsoever or is of no use (NO EFFECT), while others again spoke of positive effects of civil disobedience (POSITIVE EFFECTS). Finally, we had a group of respondents who emphasized the ambiguous or unclear meaning of the term civil disobedience (AMBIVALENCE). All other answers which could not be coded into any of the above codes were assigned to the OTHER code. In general, answers were either assigned to non-response codes or to (a combination of) the substantive codes. AMBIVALENCE and OTHER, however, could not be combined with any other code.
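The combination rules for the category-selection codes can be written down as a small consistency check. This is a sketch for illustration: the constant and function names are hypothetical, non-response codes are assumed to be handled separately, and the code labels are taken from the text above.

```python
# Substantive codes of the category-selection probe, as named in the text.
SUBSTANTIVE = {
    "NO RESPONSIVENESS", "GENERAL DISSATISFACTION", "RIGHTS",
    "RANGE OF ACCEPTABILITY", "OTHER METHODS", "ACCEPTANCE",
    "EXPLICIT VIOLENCE", "IMPLICIT VIOLENCE", "NEGATIVE EFFECTS",
    "NO EFFECT", "POSITIVE EFFECTS", "AMBIVALENCE", "OTHER",
}
# Codes that may not be combined with any other code.
EXCLUSIVE = {"AMBIVALENCE", "OTHER"}


def is_valid_coding(codes):
    """Check one answer's substantive codes against the combination rules."""
    codes = set(codes)
    if not codes or not codes <= SUBSTANTIVE:
        return False  # empty coding or unknown code label
    if codes & EXCLUSIVE and len(codes) > 1:
        return False  # AMBIVALENCE and OTHER must stand alone
    return True


print(is_valid_coding(["RANGE OF ACCEPTABILITY", "IMPLICIT VIOLENCE"]))  # True
print(is_valid_coding(["OTHER", "RIGHTS"]))                              # False
```

A check of this kind could be run over the coded dataset before the distributions are tabulated.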

Coding scheme for the comprehension probe
The codes for the comprehension probe represent what respondents associate with the term "civil disobedience". The codes as well as examples are presented in Table 4. In line with our assumptions about negative associations, we were able to discern the code VIOLENCE/DESTRUCTION where respondents name violent and/or destructive activities, such as rioting or looting. Similarly, we were able to identify a code that addresses disturbances more in general, such as blocking streets or other disruptions (DISTURBANCE). The counter-code to VIOLENCE/DESTRUCTION assembles answers where respondents explicitly refer to the peacefulness of activities (PEACEFUL), such as "nonviolent protest". We had an additional code for respondents who listed activities ranging from peaceful to violent (PEACEFUL/VIOLENCE). A large group of respondents simply listed activities without making any explicit reference to peacefulness, violence/destruction or disturbances, such as "demonstrating, picketing, writing letters" or "sit-in, marches" (LISTING ACTIVITIES). Demonstrations or marches are usually peaceful events, but without any explicit reference to peacefulness we did not assign such answers to the code PEACEFUL. Equally, without an explicit reference to disturbances we did not assign such answers to the code DISTURBANCE; in practice, of course, each demonstration or sit-in will somehow be a disturbance to a certain group of people. Then, there were respondents who thought of breaking the law (BREAKING LAW) and those who thought of breaking rules more in general (BREAKING RULES). These categories could not always be strictly separated on the semantic level. Here, too, we merged these codes for additional analyses to make sure that our differentiation did not introduce any bias. With the comprehension probe we also had respondents displaying general or more specific dissatisfaction with the government.
However, since it was not always clear to us whether respondents' dissatisfaction was part of a definition they gave us for the term "civil disobedience" or whether they merely wanted to express their opinion (e.g., "I just believe that people elect the politicians, so if they do not do the job they said they would then we as a people have the right to kick them out of office and find someone new to do their job right." U.S., 11499), and since reliable coding proved difficult in some of these cases, we eventually abstained from coding these dissatisfaction answers separately and used the OTHER code instead. Furthermore, we grouped together answers that were demonstrably copied from the Web (COPY-PASTE). However, only 24 answers out of more than 1,800 were identified as the result of copy-paste activities. The code OTHER rounded off the coding scheme: answers that did not fall into the above listed codes and answers that particularly belonged to the realm of an opinion statement were assigned to this code. In general, answers were either assigned to non-response codes or to (a combination of) the substantive codes. COPY-PASTE and OTHER were single-coded and distinctive codes.

Distribution of substantive codes for the category-selection probe
Table 5 shows the distribution of codes for the category-selection probe. The columns do not add up to 100 % since multiple coding was allowed. The country pattern for NO RESPONSIVENESS (first row in Table 5) is most revealing. This code not only has a high (overall) frequency but it also mirrors the divide between the two groups of countries that we have already seen in the data of the closed item. While only between 4.2 and 6.5 % of American, Danish, and Canadian respondents express their discontent over a lack of vertical accountability or responsiveness, between 11.6 and 18.0 % of German, Hungarian, and Spanish respondents do so. A logistic regression with the code NO RESPONSIVENESS as dependent variable and country as independent variable demonstrates that these country differences are significant (model not presented). The code GENERAL DISSATISFACTION has the highest (overall) prevalence. Here, too, we see a tendency in line with the peculiar pattern, but the country differences are not significant in this case. Furthermore, the codes RANGE OF ACCEPTABILITY and IMPLICIT VIOLENCE show a striking pattern (country differences significant for both codes). Only between 4.2 and 7.6 % name a RANGE OF ACCEPTABILITY in Germany and Hungary (e.g., "only within the law", HU, 3197, or "all needs to remain within an appropriate scope - violence is never a solution", DE, 6171). In the other countries, this code has an occurrence of 10.8 to 16.7 %. IMPLICIT VIOLENCE predominates in Canada and the U.S. with around 13 % compared to 6 % and lower in the other countries. The high levels of both RANGE OF ACCEPTABILITY and IMPLICIT VIOLENCE for the U.S. and Canada can be explained by the coding rules. Whenever the range of acceptability was defined in terms of peacefulness, both codes were given; e.g., "Citizens should be allowed to engage in acts of civil disobedience up to a point, but not go as far as things becoming violent" (U.S., 2898). Finally, the code OTHER METHODS was hardly mentioned in Germany and Hungary compared to the other countries (significant country differences).
Note to Table 5: n = 1,721, respondents having values 1-7 on closed civil disobedience item, excl. DKs. The basis for each percentage is the number of respondents in the category-selection split in each country. The entries in the rows do not add up to 100 % since multiple coding was allowed.

Distribution of substantive codes for the comprehension probe
Table 6 shows the distribution of codes for the comprehension probe. Once again, the columns do not add up to 100 % since multiple coding was allowed. The most remarkable code for the comprehension probe is VIOLENCE/DESTRUCTION. Canada and the U.S. in particular are set apart from the others. Respondents from Canada and the U.S. name destructive/violent activities much more often than respondents in the other countries. Logistic regression with VIOLENCE/DESTRUCTION as dependent variable reveals that the country difference is significant (model not presented). When it comes to explicitly mentioning the peacefulness of civil disobedience or the peacefulness/violence range (PEACEFUL & PEACEFUL/VIOLENCE), Canada and the U.S. are again on top (significant country differences). This underlines the importance that the dimension of peacefulness vs. violence has for Americans and Canadians when it comes to civil disobedience. BREAKING LAW particularly dominates in Denmark and Spain with around 24 % compared to 8-16 % in the other countries (significant country differences).
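Why percentages in such distribution tables exceed 100 % in total can be seen from a small sketch. Each answer may carry several codes, and the base of every percentage is the number of respondents in the country's split. The answers below are invented and the function name `code_shares` is hypothetical.

```python
from collections import Counter, defaultdict


def code_shares(coded_answers):
    """coded_answers: list of (country, codes). Returns {country: {code: percent}},
    with the number of answers per country as the base of each percentage."""
    n_by_country = Counter(country for country, _ in coded_answers)
    hits = defaultdict(Counter)
    for country, codes in coded_answers:
        hits[country].update(set(codes))  # count each code once per answer
    return {
        country: {code: 100.0 * k / n for code, k in hits[country].items()}
        for country, n in n_by_country.items()
    }


# Hypothetical coded comprehension-probe answers.
sample = [
    ("US", {"VIOLENCE/DESTRUCTION", "BREAKING LAW"}),
    ("US", {"LISTING ACTIVITIES"}),
    ("DK", {"BREAKING LAW"}),
    ("DK", {"BREAKING LAW", "PEACEFUL"}),
]

shares = code_shares(sample)
print(shares["US"])
print(sum(shares["US"].values()))  # exceeds 100 because of multiple coding
```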

Regression of civil disobedience items on substantive codes
The substantive codes were subsequently used in regression analyses. We start with the category-selection split (Table 7).
In model 1, we regressed the civil disobedience item on the countries. Significant differences were found between the U.S. (the baseline) and Spain, Hungary, and (western/eastern) Germany. Denmark and Canada did not differ from the U.S. The explained variance is 12 %. In model 2, we added all the substantive category-selection codes. The country differences were markedly reduced but remained significant. The explained variance rose to 36 %. Since the codes alone explain almost 30 % of variance, the second model leaves us with almost 6 % of variance that is due to the remaining country effect. Codes in model 2 which had significant effects on civil disobedience and, at the same time, sufficiently large case numbers that also differed across countries (see Table 5 on country distributions) were: NO RESPONSIVENESS, GENERAL DISSATISFACTION, RIGHTS, and OTHER METHODS. The first three codes increased the support for civil disobedience, while the fourth code decreased it. In model 3, we added socio-demographic variables (sex, education, age, but also left-right orientation and political interest) as control variables. While a political tendency towards the right significantly decreased the score on the civil disobedience item, the other variables had no effect on civil disobedience. The increase in explained variance in model 3 is practically zero. More importantly, the overall results produced in model 2 did not change with the introduction of the control variables.

Note: n = 1,702, respondents having values 1-7 on closed civil disobedience item, excl. DKs. COPY-PASTE*: according to manual coding; few further culprits were identified by keystroke logging (total identified so far: 24). The basis for each percentage is the number of respondents in the comprehension split in each country. The entries in the rows do not add up to 100 % since multiple coding was allowed.
We ran additional analyses with, on the one hand, codes GENERAL DISSATISFACTION and NO RESPONSIVENESS merged and, on the other hand, codes GENERAL DISSATISFACTION, NO RESPONSIVENESS, and RIGHTS merged to control for any potential bias that might have arisen through coding. The effects described above could be neatly replicated. Additionally, we ran the analyses with a binary non-response variable (non-response to the probe) added as a predictor. There are practically no changes in explained variance compared to the models above, but we find that those who failed to provide a substantive response to the probe have in general more positive attitudes with regard to civil disobedience (0.42; p < 0.01).

Country coefficients in models 1 to 3 (regression table excerpt):

Country              Model 1           Model 2           Model 3
Germany (eastern)    0.98*** (0.15)    0.85*** (0.14)    0.78*** (0.14)
Germany (western)    1.07*** (0.15)    0.87*** (0.14)    0.79*** (0.14)
Hungary              1.32*** (0.15)    1.12*** (0.14)    1.15*** (0.14)
Spain                0.62*** (0.15)    0.62*** (0.14)    0.56*** (0.14)

Similar analyses for the comprehension answers led to the following results (see Table 8): Introducing country as independent variable in model 1 led to 11 % of explained variance for civil disobedience. Upon adding the comprehension codes to the regression, the country differences could again be reduced, but to a smaller degree than was the case for the category-selection split. The explained variance in model 2 rose to 22 %. With 14 % of explained variance through the substantive codes alone, this leaves us, in model 2, with 8 % of variation due to the remaining country effect. The following codes were found to have a significant impact on civil disobedience and were at the same time of frequent occurrence across countries: VIOLENCE/DESTRUCTION, PEACEFUL, LISTING ACTIVITIES, and PEACE/VIOLENCE. While the first of these codes decreased the support for civil disobedience, the other three codes increased it. Upon introducing the socio-demographic variables in model 3, we find that placing oneself towards the political right and higher education decrease the civil disobedience value. By contrast, a higher political interest goes hand in hand with an increased importance rating for civil disobedience. Overall, however, the socio-demographic variables do not change the patterns found in the previous models, and the explained variance only rises by 1 % in model 3.
We also set up regression models with probe non-response as additional independent variable besides the substantive codes. Those who failed to give a substantive answer to the comprehension probe did not differ significantly from the other respondents. Apart from that, results from the above models could be replicated. The replication of the above results was also possible when merging the codes BREAKING LAW and BREAKING RULES and introducing them into regression analyses.
In sum, both coding schemes, for the category-selection and the comprehension probe, helped to reduce the pure country effects, the category-selection answers more so than the comprehension answers.
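The variance figures reported in this section follow a nested-model logic: the remaining country effect is the explained variance of the model with countries and codes minus the explained variance of the codes alone (36 % minus almost 30 % for the category-selection split, 22 % minus 14 % for the comprehension split). The following minimal Python sketch illustrates this logic under the assumption of ordinary least squares regression. The data are an invented toy design with one country dummy and one code dummy, not the study's data, and the helper `ols_r2` is hypothetical.

```python
def ols_r2(X, y):
    """R^2 of an OLS fit of y on the columns of X (an intercept is added here)."""
    design = [[1.0] + list(x) for x in X]
    p = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in design) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented toy design, NOT the study's data: (country dummy, code dummy, cell size).
# The code is more frequent in country 1, so country and code effects overlap.
CELLS = [(0, 0, 80), (0, 1, 20), (1, 0, 40), (1, 1, 60)]

rows = []
for country, code, size in CELLS:
    for i in range(size):
        noise = 1.0 if i % 2 == 0 else -1.0  # sums to zero within each cell
        rows.append((country, code, 3 + 0.5 * country + 1.5 * code + noise))

y = [r[2] for r in rows]
r2_country = ols_r2([[c] for c, _, _ in rows], y)   # analogue of model 1
r2_codes = ols_r2([[k] for _, k, _ in rows], y)     # codes alone
r2_full = ols_r2([[c, k] for c, k, _ in rows], y)   # analogue of model 2
remaining_country = r2_full - r2_codes

print(f"country only: {r2_country:.3f}, codes only: {r2_codes:.3f}, both: {r2_full:.3f}")
print(f"variance left to the country effect: {remaining_country:.3f}")
```

In this toy design the share attributable to country shrinks once the code is controlled for, because part of the raw country difference is carried by the different prevalence of the code in the two countries.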

Discussion
We set out to shed light on interpretation differences across countries for the civil disobedience item of the ISSP by making use of Web probing. A precondition for any meaningful use of the Web survey data to explain the ISSP data was the replication of the country-specific answer pattern for both the "rights in a democracy" scale and the "civil disobedience" item. We were successful in achieving this. The first step towards answering our research questions was then to code the answers to the category-selection and the comprehension probe.
The regression analyses with the category-selection probe answers revealed that a low level of trust in government, notably expressed through the codes NO RESPONSIVENESS and GENERAL DISSATISFACTION, was the most obvious driver of the high importance rating of the civil disobedience item. The country-specific distributions for these codes were in line with the peculiar civil disobedience pattern (U.S., Canada, and Denmark vs. Spain, (western/eastern) Germany, and Hungary). We, therefore, suggest that they explain part of the country differences for the closed item. In particular, the lack of responsiveness of politicians towards their electorate in Germany, Hungary and, to a weaker degree, in Spain seems to have played a decisive role in the civil disobedience rating. The differences in the level of trust are essentially substantive differences which reflect (save for imprecise measurement) real country differences. They should not endanger the equivalence status of the item, that is, they should not put into question the comparability of the item.
By contrast, the regression analyses with the comprehension probe answers brought to light a critical equivalence issue. Interpretation patterns involving a violence/peace divide were the strongest predictors for the civil disobedience score. The code VIOLENCE/DESTRUCTION in particular stands out in this context. Canadians and U.S.-Americans associate civil disobedience much more frequently with violence and destruction than respondents in the other countries. We suggest that these negative associations led in particular to the low importance rating of civil disobedience in Canada and the U.S. With many respondents in certain countries thinking of destructive and violent actions, while this hardly or never happens among respondents in other countries, we are faced with different understandings of the term "civil disobedience" (that is, we essentially have a different attitude object) and, thus, with a lack of equivalence.
These findings in combination bring us to the conclusion that the low civil disobedience values for Canada and the U.S. seem to result from both valid country differences (somewhat better perception of politicians in Canada and the U.S. than in the other countries) and a methods artifact (associations of violence/destruction that systematically add a shade of interpretation that hardly exists in the other countries). For Denmark, the low value might be due to mainly peaceful associations with civil disobedience, coupled with higher satisfaction with politicians than in Germany, Hungary, or Spain. Any future use of the item should specify the form of civil disobedience to ensure similar understanding and comparability across countries.

Limitations
In our study, the category-selection probe answers were more successful in reducing country effects than the comprehension probe answers. However, since we implemented the two probes in different splits and thus only had respondents answering either of the probes, we do not know how argumentation and interpretation interact within subjects. The different patterns could be additive but also overlapping. Future studies may administer both probes to the same respondents in order to allow the assessment of the relative explanatory power of the diverse patterns.
The Web probing method itself has some limitations. Although the majority of answers helped us to uncover country-specific interpretation patterns and thus served our needs, we need to issue a cautionary note. Respondents do not necessarily give answers that match the type of probing, not to mention non-response. A comprehension probe may thus lead, for instance, to elaborations on respondents' opinions rather than to respondents' definitions of certain terms. Such deviating behavior requires further investigation in the future.
Related to the issue above is the fact that respondents can search the Web for definitions of terms when asked a comprehension probe. Although this did not seem problematic in our study, it has to be considered when implementing comprehension probes in Web studies.
As with all probing techniques, one cannot fully be sure that the definitions or rationalizations given by the respondents are indeed what was going on in respondents' minds when answering the closed-ended survey questions. Still, the major answer patterns presented in this study (i.e., NO RESPONSIVENESS and GENERAL DISSATISFACTION for the category-selection probe, and VIOLENCE/DESTRUCTION for the comprehension probe) seem to be a plausible explanation for the answer values chosen by the respondents.
Our analytical approach itself, i.e., to offer answers on what might have happened in the ISSP based on conclusions from Web survey data, includes a mode switch, the use of both a representative and a non-representative survey, and a time lag. Our conclusions thus have to be treated with caution, although the replication of the two-group pattern for the civil disobedience item speaks in favor of our approach. We recommend that in the future the time lag between a representative survey and a Web survey should be reduced. Alternative approaches may include follow-up Web surveys with the same respondents from a representative field or employing a probability-based Web panel, if available.
As for the future application of the method, Web probing, rather than being implemented after a survey, could equally be used as a pure pretesting tool, that is, with a view to evaluating questionnaires and modifying them for future use. The time needed to analyze (and code) hundreds of answers across countries, including the time potentially needed for translation, may be more than is usually available during the questionnaire design phase, though. The method could equally be used to assess (changes in) item interpretation across countries at regular intervals.
Despite our general optimism about the method, we wish to stress that Web probing cannot replace traditional face-to-face cognitive interviewing, in particular when in-depth information on response processes is sought that can only be obtained with several follow-up probes, when emergent probes (developed flexibly based on the behavior of the subject, Willis 2005) are considered the prevailing cognitive interviewing paradigm, or when particular groups are targeted that cannot be reached via Web surveys. Still, cross-national Web probing has its potential and deserves further attention in practice and research.