Introduction

In surveys of sensitive attributes, social desirability bias threatens the validity of direct self-reports (Tourangeau & Yan, 2007). If some respondents choose to reply in line with social or legal norms rather than truthfully, prevalence estimates will be distorted in the direction of the socially desirable answer (Paulhus, 1991; Tourangeau & Yan, 2007). This response bias poses a serious threat to the interpretation of the results of survey studies on socially (un-)desirable or illegal behaviors, such as drug use, prejudice, abortion, tax evasion, and plagiarism. Indirect questioning techniques try to control for socially desirable responding. In the present article, we focus on two non-randomized response techniques (NRRT) that have recently been proposed as an advancement upon traditional randomized response techniques (RRT; Warner, 1965). In particular, we investigate the crosswise model and the triangular model (CWM and TRM; Yu, Tian, & Tang, 2008) and present the first direct comparison of the two methods’ validity.

Randomized response techniques (RRT) and non-randomized response techniques (NRRT)

Randomized response techniques (RRT; Warner, 1965) ensure that individual responses remain completely confidential in order to encourage respondents to provide more honest answers less distorted by social desirability bias. In the original RRT, respondents are presented with two statements: a sensitive statement (e.g., “I have taken cocaine”) and its negation (e.g., “I have never taken cocaine”). Respondents are then asked to reply to only one of the statements with “true” or “false”. The statement respondents are asked to react to is determined by the outcome of a randomization procedure (e.g., the roll of a die or the respondent’s birth month). It is impossible for the experimenter to tell whether any individual respondent has admitted to consuming cocaine because the experimenter is not informed of the outcome of the randomization. The distribution of the potential randomization outcomes is known, however; therefore, prevalence estimates for the sensitive attribute can be obtained on the sample level via appropriate statistical procedures (Warner, 1965). In a meta-analysis of 32 “weak” validation studies, RRTs obtained higher prevalence estimates than direct questions (DQ) for various sensitive personal attributes (Lensvelt-Mulders, Hox, van der Heijden, & Maas, 2005). The superiority of RRTs over DQ was found to increase with the increasing sensitivity of the attribute under investigation.
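As a minimal sketch of the sample-level estimation step, assume Warner's (1965) design: each respondent is directed to the sensitive statement with a known probability p (p ≠ .5) and to its negation otherwise. Solving λ = pπ + (1 − p)(1 − π) for π then gives the prevalence estimate. The function name and the counts below are hypothetical and only illustrate the arithmetic.

```python
def warner_estimate(n_true: int, n: int, p: float) -> float:
    """Warner (1965) prevalence estimate (hypothetical helper for illustration).

    n_true: number of "true" answers
    n:      sample size
    p:      probability of being directed to the sensitive statement (p != 0.5)
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the prevalence unidentifiable")
    lam = n_true / n  # observed proportion of "true" answers
    # P("true") = p * pi + (1 - p) * (1 - pi); solve this for pi
    return (lam + p - 1) / (2 * p - 1)


# Hypothetical example: p = .7, and 380 of 1000 respondents answer "true"
print(round(warner_estimate(380, 1000, 0.7), 3))  # prints 0.2
```

The estimator is not truncated, so sampling error can push it outside [0, 1]. It has the same algebraic form as Eq. 1 for the crosswise model, with p in place of r.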

According to the “more is better” criterion, higher prevalence estimates for sensitive attributes are assumed to be more valid because they are presumably less distorted by social desirability bias. However, finding such a pattern can only be considered “weak” validity evidence, since higher prevalence estimates may still underestimate—or overestimate—the true prevalence (Umesh & Peterson, 1991). Therefore, “strong” validation studies in which the prevalence of the sensitive attribute is known and can be used as an external validation criterion are considered the gold standard. However, such studies are costly and difficult to implement, making strong validation studies quite rare (Lensvelt-Mulders et al., 2005; Umesh & Peterson, 1991). A meta-analysis found only six strong validation studies comparing the prevalence estimates of RRTs to estimates obtained via DQ. Overall, RRTs were found to provide more valid estimates than DQ, but still substantially underestimated the true prevalence (Lensvelt-Mulders et al., 2005).

RRTs have also been criticized for their relative inefficiency, the complexity of their instructions, and the need to use an external randomization device (Landsheer, van der Heijden, & van Gils, 1999; Ulrich, Schröter, Striegel, & Simon, 2012; Yu et al., 2008). Consequently, non-randomized response techniques (NRRTs) such as the CWM and the TRM (Tian & Tang, 2014; Yu et al., 2008) have recently been proposed as an advancement upon RRTs. In contrast to RRTs, NRRTs integrate the randomization procedure directly into the answer options. They thus have simpler instructions and are easier for the experimenter to administer and for respondents to understand. Moreover, the TRM is more efficient than the original RRT (Warner, 1965) in most cases (Yu et al., 2008).

The crosswise model

In the CWM, respondents are simultaneously presented with two statements. The first statement A refers to a sensitive attribute with unknown prevalence π (e.g., “I have taken cocaine”); the second statement B refers to a nonsensitive control attribute with known prevalence r that is used for randomization (e.g., “My mother was born in November or December”). Respondents are requested to provide a joint answer to both statements by indicating whether “both statements are true or none of the statements is true” or “exactly one statement (irrespective of which one) is true”. As in the original RRT, respondents can honestly choose either of the answers, while their individual status with respect to the sensitive statement A remains completely confidential. On the sample level, however, a maximum likelihood estimate for the prevalence π of the sensitive attribute can be obtained by using the formula (Yu et al., 2008):

$$ {\hat{\uppi}}_{\mathrm{CWM}}=\frac{{\hat{\uplambda}}_{\mathrm{CWM}}+r-1}{2r-1} \tag{1} $$

where \( {\hat{\uplambda}}_{\mathrm{CWM}} \) is the observed proportion of respondents choosing the first answer option (“both statements are true or none of the statements is true”). An essential advantage of the CWM is its response symmetry: both answer options can and must be chosen by both carriers and noncarriers of the sensitive attribute, depending on their status with respect to the nonsensitive statement B (e.g., their mother’s birth month).
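Eq. 1 translates directly into code. The sketch below also returns a standard error. It assumes the usual binomial variance of λ̂ divided by (2r − 1)², which follows because Eq. 1 is a linear transform of a binomial proportion. The function name and the counts are hypothetical.

```python
import math


def cwm_estimate(n_first: int, n: int, r: float):
    """Crosswise-model prevalence estimate (Eq. 1) and its standard error.

    n_first: respondents choosing "both statements are true or none is true"
    n:       respondents answering in CWM format
    r:       known prevalence of the nonsensitive statement B (r != 0.5)
    """
    lam = n_first / n                       # observed proportion, Eq. 1
    pi_hat = (lam + r - 1) / (2 * r - 1)    # Eq. 1 (unconstrained)
    # Var(pi_hat) = lam * (1 - lam) / (n * (2r - 1)^2)
    se = math.sqrt(lam * (1 - lam) / n) / abs(2 * r - 1)
    return pi_hat, se


# Hypothetical counts: 350 of 500 respondents choose the first option, r = .158
pi_hat, se = cwm_estimate(350, 500, 0.158)
print(f"pi_hat = {pi_hat:.3f}, SE = {se:.3f}")
```

The factor 1/|2r − 1| shows why r is chosen well away from .5. As r approaches .5, the standard error grows without bound.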

To quantify the objective confidentiality protection of the two answer options in the CWM, conditional probabilities can be derived using Bayes’ theorem (Bayes, 1763) according to the procedure described by Lanke (1976) and Fligner, Policello, and Singh (1977). The conditional probabilities of being identified as a carrier of the sensitive attribute given that one has selected the first (“both statements are true or none of the statements is true”) versus the second answer option [“exactly one statement (irrespective of which one) is true”] are:

$$ \mathrm{Pr}_{\mathrm{CWM}}\left(\text{carrier}\mid\text{“both/none true”}\right)=\frac{\mathrm{Pr}_{\mathrm{CWM}}\left(\text{carrier}\cap\text{“both/none true”}\right)}{\mathrm{Pr}_{\mathrm{CWM}}\left(\text{“both/none true”}\right)} \tag{2.1} $$
$$ \mathrm{Pr}_{\mathrm{CWM}}\left(\text{carrier}\mid\text{“one true”}\right)=\frac{\mathrm{Pr}_{\mathrm{CWM}}\left(\text{carrier}\cap\text{“one true”}\right)}{\mathrm{Pr}_{\mathrm{CWM}}\left(\text{“one true”}\right)} \tag{2.2} $$

These equations can be reformulated using the parameters for prevalence estimation from Eq. 1:

$$ \mathrm{Pr}_{\mathrm{CWM}}\left(\text{carrier}\mid\text{“both/none true”}\right)=\frac{{\hat{\uppi}}_{\mathrm{CWM}}\cdot r}{{\hat{\uplambda}}_{\mathrm{CWM}}} \tag{2.3} $$
$$ \mathrm{Pr}_{\mathrm{CWM}}\left(\text{carrier}\mid\text{“one true”}\right)=\frac{{\hat{\uppi}}_{\mathrm{CWM}}\cdot\left(1-r\right)}{1-{\hat{\uplambda}}_{\mathrm{CWM}}} \tag{2.4} $$

A comparison of the numerators of Eqs. 2.3 and 2.4 shows that the probability of being identified as a carrier of the sensitive attribute depends on both the randomization probability, r, and its complement, 1 − r. The equations also show that, for a randomization probability of 0 < r < .5, the conditional probability of being identified as a carrier is lower for the first answer option (“both/none true”) than for the second (“one true”). The reason is that carriers select the first option with probability r and the second with probability 1 − r, whereas noncarriers do the reverse. When r < 1 − r, the first option is therefore the less indicative of carrier status. For .5 < r < 1, this relationship reverses because r > 1 − r; hence, in these cases, choosing the second answer option is associated with the lower risk. Importantly, however, the probability of being identified as a carrier of the sensitive attribute exceeds zero for both answer options in all cases of 0 < \( {\hat{\uppi}}_{\mathrm{CWM}} \) < 1, 0 < r < 1, and 0 < \( {\hat{\uplambda}}_{\mathrm{CWM}} \) < 1. In practical applications of the CWM, these conditions are usually met, because researchers typically ensure that the expected prevalence of the sensitive attribute, the randomization probability, and the proportion of respondents choosing the first answer option all differ from 0% and 100%. In such cases, no CWM answer option provides a “safe” alternative that respondents can choose to explicitly deny being a carrier of the sensitive attribute.
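The reversal at r = .5 and the absence of a zero-risk option can be checked numerically. The sketch below evaluates Eqs. 2.3 and 2.4 with λ set to its model-implied value πr + (1 − π)(1 − r) rather than an observed proportion. It assumes a hypothetical prevalence of π = .3, and the function name is illustrative.

```python
def cwm_disclosure_risk(pi, r):
    """Eqs. 2.3 and 2.4: P(carrier | answer) for both CWM answer options."""
    lam = pi * r + (1 - pi) * (1 - r)        # model-implied P("both/none true")
    risk_first = pi * r / lam                # Eq. 2.3
    risk_second = pi * (1 - r) / (1 - lam)   # Eq. 2.4
    return risk_first, risk_second


# Hypothetical prevalence pi = .3; r values chosen on both sides of .5
for r in (0.158, 0.3, 0.7, 0.842):
    first, second = cwm_disclosure_risk(pi=0.3, r=r)
    print(f"r = {r:.3f}: both/none true -> {first:.3f}, one true -> {second:.3f}")
```

Replacing r by 1 − r swaps the two risks. Neither risk is zero for any 0 < r < 1.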

Even though a “safe” answer option is unavailable, respondents confronted with a CWM question might still be tempted to assess the relative risk of each answer option. To succeed, however, they would have to (i) correctly estimate the randomization probability (r) and (ii) derive and understand the relationship between the randomization probability and the conditional probabilities from Eqs. 2.3 and 2.4. Soeken and Macready (1982) have already demonstrated that respondents are rather poor at estimating the relationship between the randomization probability and the objective privacy protection afforded by the RRT. The exception is extreme randomization probabilities, which eliminate confidentiality. In light of this finding, and of the time-consuming computations that would be necessary, we argue that respondents are quite unlikely to assess the relative risk of the answer options successfully. Response symmetry reduces the incentive to provide untruthful answers (Ostapczuk, Moshagen, Zhao, & Musch, 2009a). We therefore propose that the high symmetry of the two CWM response options will lead to a higher proportion of honest responses than a direct question, which affords no confidentiality and offers a safe answer option that eliminates all risk of being associated with an undesirable behavior.

The validity of the CWM is supported by several weak validation studies in which the CWM yielded higher, and therefore presumably more valid, prevalence estimates for sensitive attributes than DQ. The attributes investigated in these validation studies included:
- crossing the street on a “No Walk” signal in plain view of children (Hoffmann, Meisters, & Musch, 2019);
- xenophobia (Hoffmann & Musch, 2016);
- the use of anabolic steroids among bodybuilders (Nakhaee, Pakravan, & Nakhaee, 2013);
- distrust in the Trust Game (Thielmann, Heck, & Hilbig, 2016);
- plagiarism (Jann, Jerke, & Krumpal, 2012);
- tax evasion (Korndörfer, Krumpal, & Schmukle, 2014; Kundt, Misch, & Nerré, 2017);
- prejudice against female leaders (Hoffmann & Musch, 2019);
- the intention to vote for the German right-wing party Alternative for Germany (Waubert de Puiseau, Hoffmann, & Musch, 2017).

Additionally, in a first strong validation study, the CWM provided highly accurate estimates of the known prevalence of an experimentally induced sensitive attribute, whereas DQ severely underestimated it (Hoffmann, Diedenhofen, Verschuere, & Musch, 2015). The CWM has also been shown to be more comprehensible than other indirect questioning techniques and to evoke a higher level of trust than conventional direct questions (Hoffmann, Waubert de Puiseau, Schmidt, & Musch, 2017). A recent study provided evidence that the CWM is quite robust even against deliberate faking: “fake good” instructions impaired the validity of DQ prevalence estimates but not of CWM prevalence estimates (Hoffmann et al., 2019). This robustness is likely attributable to respondents’ inability to identify a “safe” answer in the symmetric CWM.

Some recent studies have reported a problematic tendency for the CWM to produce false positives; that is, some noncarriers of the sensitive attribute were falsely classified as carriers. This can result in an overestimation of the prevalence of sensitive attributes (Höglinger & Diekmann, 2017; Höglinger & Jann, 2018). In Höglinger and Diekmann (2017), the CWM overestimated the prevalence of two sensitive attributes whose known prevalence was near zero, with estimates of 5% and 8%, respectively. Höglinger and Jann (2018) also observed false positives in a CWM survey. However, Meisters, Hoffmann, and Musch (2019) found no evidence for an overestimation of the known prevalence of an experimentally induced sensitive attribute. They suggested two ways to address the problem of false positives: providing respondents with more comprehensive and detailed instructions, and ensuring that respondents actually comprehend the procedure.

The triangular model

In the TRM, respondents are also presented with a sensitive statement A with unknown prevalence π (e.g., “I have taken cocaine”) and a nonsensitive statement B with known prevalence r (e.g., “My mother was born in November or December”) to which they must provide a joint response. The response options in the TRM are: “none of the statements is true” versus “at least one of the statements (irrespective of which one) is true”. A maximum likelihood estimate for the prevalence π is given by (Yu et al., 2008):

$$ {\hat{\uppi}}_{\mathrm{TRM}}=1-\frac{{\hat{\uplambda}}_{\mathrm{TRM}}}{1-r} \tag{3} $$

where \( {\hat{\uplambda}}_{\mathrm{TRM}} \) is the observed proportion of respondents choosing the first answer option (“none of the statements is true”). As in the CWM, carriers of the sensitive attribute can choose the second answer option [“at least one of the statements (irrespective of which one) is true”] without disclosing their true status with respect to the sensitive statement. This is because noncarriers of the sensitive attribute who carry the nonsensitive attribute used for randomization (e.g., respondents whose mother was born in November or December) must give this response as well. However, in contrast to the CWM, the TRM is an asymmetric model. The answer option “none of the statements is true” provides a “safe” alternative that explicitly precludes being a carrier of the sensitive attribute. Respondents who are eager to distance themselves from the sensitive attribute may therefore be tempted to choose this safe response option even when the instructions direct them to the other option.
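The TRM estimator in Eq. 3 can be sketched in the same way as the CWM estimator. The standard error again assumes the binomial variance of λ̂, here divided by (1 − r)². The function name and the counts are hypothetical.

```python
import math


def trm_estimate(n_none: int, n: int, r: float):
    """Triangular-model prevalence estimate (Eq. 3) and its standard error.

    n_none: respondents choosing "none of the statements is true"
    n:      respondents answering in TRM format
    r:      known prevalence of the nonsensitive statement B (r < 1)
    """
    lam = n_none / n                  # observed proportion for Eq. 3
    pi_hat = 1 - lam / (1 - r)        # Eq. 3 (unconstrained)
    # Var(pi_hat) = lam * (1 - lam) / (n * (1 - r)^2)
    se = math.sqrt(lam * (1 - lam) / n) / (1 - r)
    return pi_hat, se


# Hypothetical counts: 300 of 500 respondents answer "none true", r = .158
pi_hat, se = trm_estimate(300, 500, 0.158)
print(f"pi_hat = {pi_hat:.3f}, SE = {se:.3f}")
```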

The asymmetry of the TRM is reflected in the conditional probabilities of being identified as a carrier of the sensitive attribute given the first (“none of the statements is true”) versus the second answer option [“at least one of the statements (irrespective of which one) is true”]:

$$ \mathrm{Pr}_{\mathrm{TRM}}\left(\text{carrier}\mid\text{“none true”}\right)=\frac{\mathrm{Pr}_{\mathrm{TRM}}\left(\text{carrier}\cap\text{“none true”}\right)}{\mathrm{Pr}_{\mathrm{TRM}}\left(\text{“none true”}\right)} \tag{4.1} $$
$$ \mathrm{Pr}_{\mathrm{TRM}}\left(\text{carrier}\mid\text{“at least one true”}\right)=\frac{\mathrm{Pr}_{\mathrm{TRM}}\left(\text{carrier}\cap\text{“at least one true”}\right)}{\mathrm{Pr}_{\mathrm{TRM}}\left(\text{“at least one true”}\right)} \tag{4.2} $$

Eqs. 4.1 and 4.2 can be reformulated in terms of the parameters for prevalence estimation from Eq. 3 in a straightforward way, because in the TRM the numerator of Eq. 4.1 refers to an impossible event. According to the TRM instructions, no carrier of the sensitive attribute may choose the first answer option (“none of the statements is true”), since the sensitive statement A is true for carriers by definition. Consequently, all carriers must choose the second answer option, and the numerator of Eq. 4.2 equals the prevalence π. The probability of being a carrier when answering “none of the statements is true” is therefore 0:

$$ \mathrm{Pr}_{\mathrm{TRM}}\left(\text{carrier}\mid\text{“none true”}\right)=\frac{0}{{\hat{\uplambda}}_{\mathrm{TRM}}}=0 \tag{4.3} $$
$$ \mathrm{Pr}_{\mathrm{TRM}}\left(\text{carrier}\mid\text{“at least one true”}\right)=\frac{{\hat{\uppi}}_{\mathrm{TRM}}}{1-{\hat{\uplambda}}_{\mathrm{TRM}}} \tag{4.4} $$

As Eq. 4.3 shows, respondents can be sure that the first answer option (“none of the statements is true”) is associated with a zero probability of being identified as a carrier of the sensitive attribute, irrespective of the randomization probability. Choosing this “safe” answer option is likely to attract respondents who are keen to make a positive or avoid a negative impression; this in turn is likely to result in underestimates due to dishonest responses (cf. Jerke & Krumpal, 2013).
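The asymmetry can be made concrete with a sketch of Eqs. 4.3 and 4.4. It uses the model-implied λ = (1 − π)(1 − r) and a hypothetical prevalence of π = .3. The function name is illustrative.

```python
def trm_disclosure_risk(pi, r):
    """Eqs. 4.3 and 4.4: P(carrier | answer) for both TRM answer options."""
    lam = (1 - pi) * (1 - r)              # model-implied P("none true")
    risk_none = 0.0                       # Eq. 4.3: carriers cannot answer "none true"
    risk_at_least_one = pi / (1 - lam)    # Eq. 4.4: every carrier answers here
    return risk_none, risk_at_least_one


# Hypothetical prevalence pi = .3 at several randomization probabilities
for r in (0.158, 0.5, 0.842):
    none, at_least_one = trm_disclosure_risk(pi=0.3, r=r)
    print(f"r = {r:.3f}: none true -> {none:.3f}, at least one true -> {at_least_one:.3f}")
```

Whatever value r takes, the “none true” risk stays at zero. This is the “safe” option that has no counterpart among the CWM answer options.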

Research on the validity of the TRM is relatively scarce. Two experimental validation studies have compared a TRM and a DQ control condition (Erdmann, 2019; Jerke & Krumpal, 2013). One study found prevalence estimates for plagiarism obtained via the TRM to descriptively exceed those obtained via DQ. However, this difference was not statistically significant (Jerke & Krumpal, 2013). In a second study, TRM estimates were comparable to and not significantly different from DQ estimates for three different sensitive attributes (Erdmann, 2019). Self-protective answering behavior facilitated by the asymmetric nature of the TRM is a possible explanation for these findings.

Comparison of the CWM and the TRM

In terms of theoretical properties, a potential advantage of the CWM over the TRM is that only the CWM offers response symmetry. If respondents confronted with CWM questions are therefore less tempted to provide evasive answers, or are less successful in identifying a self-protective choice, this should result in prevalence estimates with higher validity. On the other hand, the TRM is usually more efficient than the CWM (with some exceptions for high values of r; cf. Theorem 3 in Yu et al., 2008). Accordingly, the TRM would be preferable to the CWM if both models were equally valid. However, we argue that the validity of the prevalence estimates is even more important than the efficiency of parameter estimation.
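The efficiency comparison can be illustrated with a sketch. It assumes the sampling variances of the unconstrained estimators in Eqs. 1 and 3:

- Var(π̂_CWM) = λ_CWM(1 − λ_CWM)/[n(2r − 1)²], with λ_CWM = πr + (1 − π)(1 − r);
- Var(π̂_TRM) = λ_TRM(1 − λ_TRM)/[n(1 − r)²], with λ_TRM = (1 − π)(1 − r).

The prevalence π = .3 and the grid of r values are hypothetical. The ranking depends on both π and r.

```python
def var_cwm(pi, r, n=1):
    """Sampling variance of the unconstrained CWM estimator (Eq. 1), r != .5."""
    lam = pi * r + (1 - pi) * (1 - r)
    return lam * (1 - lam) / (n * (2 * r - 1) ** 2)


def var_trm(pi, r, n=1):
    """Sampling variance of the unconstrained TRM estimator (Eq. 3)."""
    lam = (1 - pi) * (1 - r)
    return lam * (1 - lam) / (n * (1 - r) ** 2)


# Ratio < 1 means the TRM is more efficient at this (pi, r)
for r in (0.158, 0.3, 0.7, 0.9):
    ratio = var_trm(0.3, r) / var_cwm(0.3, r)
    print(f"r = {r:.3f}: Var(TRM) / Var(CWM) = {ratio:.2f}")
```

With π = .3, the TRM has the smaller variance at low r, such as the value of .158 used in this study. The CWM becomes more efficient at high r, in line with the exceptions noted above.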

The only existing evidence on this question comes from a comparison across studies. The prevalence of plagiarism has been assessed with both the TRM (Jerke & Krumpal, 2013) and the CWM (Jann et al., 2012). In these studies, the CWM estimate significantly exceeded the DQ estimate, whereas the TRM and DQ estimates did not differ significantly. Moreover, the CWM estimate (Jann et al., 2012) was descriptively higher than the TRM estimate (Jerke & Krumpal, 2013). This pattern of results tentatively suggests that the symmetric CWM might be superior to the asymmetric TRM in discouraging dishonest responses, and thus in obtaining more valid estimates (cf. Jann et al., 2012; Jerke & Krumpal, 2013). However, a more conclusive test would require a direct comparison of the CWM and the TRM in a single sample using an experimental design, which would allow alternative explanations for the observed differences in prevalence estimates to be ruled out. Taking up this challenge, the current study provides the first experimental comparison of the validity of the CWM and the TRM and contrasts the performance of the two models with a DQ control condition. Xenophobia and opposition to further refugee admissions were chosen as the sensitive attributes for this validation study.

Xenophobia and opposition to reception of refugees in Germany

Xenophobia, a fear of—or negative attitude towards—foreigners, is quite prevalent in Germany (Heitmeyer, 2012; Krumpal, 2012; Wagner & van Dick, 2001). Since the “refugee crisis” of 2015, attitudes towards foreigners in general and refugees in particular have become more negative among the German population (Bertelsmann Stiftung, 2017). A representative survey revealed that 54% of the German population opposes the further intake of refugees, whereas most Germans still perceive a “welcoming culture” in Germany (Bertelsmann Stiftung, 2017). This discrepancy is likely to lead to social pressure to deny xenophobic attitudes and endorse further refugee admissions (Zick, Hövermann, & Krause, 2012). Direct self-reports on xenophobic attitudes and reluctance to grant asylum to refugees have indeed been found to be distorted by social desirability bias, leading to underestimates of their prevalence (D'Ancona, 2013; Krumpal, 2012; Moshagen & Musch, 2012; Ostapczuk, Musch, & Moshagen, 2009b). Indirect questioning techniques have been shown to lead to higher, and thus presumably more valid, estimates of both the prevalence of xenophobic attitudes (Hoffmann & Musch, 2016; Krumpal, 2012; Ostapczuk, Musch, et al., 2009b) and opposition to further refugee admissions (Moshagen & Musch, 2012).

We expected more xenophobic attitudes and greater opposition to further refugee admissions among respondents with a right-oriented versus a left-oriented political orientation, as a right-oriented political orientation has been shown to be positively associated with xenophobic attitudes (cf. Alba & Johnson, 2000; Zick, Küpper, & Hövermann, 2011). However, we also expected that the perceived sensitivity of these attitudes might vary as a function of political orientation. Right-oriented respondents may be more willing to openly express their disapproval of foreigners and refugees, whereas left-oriented respondents might feel hesitant to openly admit to a xenophobic attitude, and therefore choose to respond in a socially desirable rather than truthful manner. Therefore, the differences in prevalence estimates obtained via direct versus indirect questioning were used not only to assess the validity of the competing non-randomized response techniques, but also to investigate the influence of political orientation on question sensitivity.

The current study

This study is the first to experimentally compare the validity of two NRRTs (symmetric CWM and asymmetric TRM) and contrast their performance with that of conventional direct questioning (DQ). We expected that both NRRTs would outperform DQ in terms of delivering more valid prevalence estimates for two socially undesirable attributes. We also expected a beneficial effect of response symmetry (cf. Ostapczuk, Moshagen, et al., 2009a), and therefore predicted that the symmetric CWM would outperform the asymmetric TRM with respect to successful control of social desirability bias. Furthermore, we investigated a potential moderating influence of self-ascribed political orientation (from “left” to “right”) on prevalence estimates for xenophobia and opposition to the further intake of refugees. Finally, a nonsensitive control attribute with known prevalence was included to test for method-specific biases in the form of a general tendency towards over- or underestimation. If the CWM and the TRM allow us to obtain prevalence estimates for the control attribute that correspond to DQ estimates and to the known prevalence, this provides strong evidence for the validity of these indirect questioning techniques.

Method

Sample

The initial sample consisted of 1544 students from the University of Düsseldorf, recruited using a non-probability sampling plan (a convenience sample of attendees of large introductory lectures from various faculties and courses of study). Due to nonresponse to at least one of the experimental, demographic, or political orientation questions, 162 respondents (10.49%) were excluded from further analyses. Dropout rates did not differ significantly across experimental conditions [condition 1: 8.27%, condition 2: 12.65%, condition 3: 9.77%, condition 4: 12.17%, condition 5: 10.00%, condition 6: 10.08%; χ²(5) = 3.64, p = .603, Cramér's V = .05]. Dropouts also did not differ significantly from the final sample in age [final sample: M = 21.40, dropouts: M = 21.78; t(1515) = 0.71, p = .478, d = 0.06] or political orientation [final sample: M = −0.88, dropouts: M = −0.95; t(1445) = −0.32, p = .748, d = 0.04]. There was therefore no evidence that experimental condition, age, or political orientation influenced whether respondents provided complete data and were thus included in the final sample. The final sample consisted of 1382 respondents (60.1% female) with a mean age of 21.40 years (SD = 5.66). Respondents were contacted on the university campus before their lectures started and were asked to complete a short one-page survey. This survey study was carried out in accordance with the revised Declaration of Helsinki (World Medical Association, 2013) and the ethical guidelines of the German Society for Psychology (Berufsverband Deutscher Psychologinnen und Psychologen & Deutsche Gesellschaft für Psychologie, 2016). Before participating, all respondents were informed of the purpose of the study and of the strict anonymization of all data. All respondents consented to participate on a voluntary basis and received no financial compensation.

Survey design

The one-page paper-pencil questionnaire contained three experimental questions, one question on respondents’ self-reported political orientation, and demographic questions on respondents’ age and gender. The first two experimental questions asked about the two sensitive attributes (xenophobia and opposition to further refugee admissions). The third question referred to a nonsensitive control attribute, the first letter of the respondent’s surname. According to official statistics, the prevalence of this attribute in Germany is known to be πcontrol = 22% (Reinders, 1996), and the university’s student registry confirmed this figure. All respondents were presented with all three experimental questions. The order of the attributes addressed by the experimental questions was fixed (question 1: xenophobia, question 2: opposition to refugee admission, question 3: first letter of surname K, L, M). The assignment of the three questioning techniques (CWM, TRM, DQ) to the three experimental questions was permuted. This resulted in six different experimental conditions, to which respondents were randomly assigned. Table 1 gives an overview of questions, questioning techniques, and number of respondents by experimental condition. This design allowed us to manipulate and analyze the questioning technique as a between-subjects variable for each attribute. After data collection, responses were pooled across experimental conditions to obtain answer frequencies for each questioning technique and each experimental question:

  • Question 1 (xenophobia): CWM format, 233 + 221 = 454 respondents (32.85%); TRM format, 231 + 231 = 462 respondents (33.43%); DQ format, 234 + 232 = 466 respondents (33.72%).

  • Question 2 (opposition to refugee admission): CWM format, 231 + 234 = 465 respondents (33.65%); TRM format, 233 + 232 = 465 respondents (33.65%); DQ format, 221 + 231 = 452 respondents (32.71%).

  • Question 3 (first letter of surname: K, L, M): CWM format, 231 + 232 = 463 respondents (33.50%); TRM format, 221 + 234 = 455 respondents (32.92%); DQ format, 233 + 231 = 464 respondents (33.57%).

Table 1 Questions, questioning techniques, and number of respondents by experimental condition

Sensitive statements

The sensitive statement used to measure xenophobia read as follows: “I would mind if my 20-year-old daughter had a relationship with a Turkish man.” It was adapted from Bogardus’ social distance scale (Bogardus, 1933) and had previously been used by Hoffmann and Musch (2016) in this form and by Jimenez (1999); Ostapczuk, Musch, et al. (2009b); and Silbermann and Hüsers (1995) with respect to other minority groups. The sensitive statement regarding opposition to further intake of refugees read as follows: “Germany has already received more than enough refugees.”

CWM format

In the CWM format, two statements were presented simultaneously: one of the two sensitive statements and a nonsensitive control statement with known prevalence r (father’s month of birth: r = .158 based on official birth statistics; Pötzsch, 2012).

  • Statement A: “I would mind if my 20-year-old daughter had a relationship with a Turkish man.”

  • Statement B: “My father was born in November or December.”

Respondents could choose from the two answer options “both statements are true or both statements are false” and “exactly one statement is true (irrespective of which one)”. The statements regarding the other two topics were adapted accordingly.

TRM format

In the TRM format, two statements were presented simultaneously: the remaining sensitive statement and a nonsensitive control statement with known prevalence r (mother’s month of birth: r = .158 based on official birth statistics; Pötzsch, 2012).

  • Statement A: “Germany has already received more than enough refugees.”

  • Statement B: “My mother was born in November or December.”

Respondents could choose from the two answer options “both statements are false” and “at least one statement is true (irrespective of which one)”. The statements regarding the other two topics were adapted accordingly.

DQ format

In the DQ format, the nonsensitive control statement with known prevalence (π = 22%; Reinders, 1996) read as follows: “My surname begins with one of the following letters: K, L, M.” Respondents then had to indicate whether the statement was “true” or “false”. The statements regarding the two sensitive attributes were presented in the same way.

Political orientation

To assess respondents’ political orientation, we presented the question: “Political beliefs are often labeled as rather ‘left’ or rather ‘right’. Where on that scale would you place yourself?” Responses were recorded on an 11-point Likert scale from “left” (−5) to “right” (+5).

Statistical analyses

To obtain and compare prevalence estimates for the three attributes under investigation, we formulated multinomial processing tree models for all three questioning techniques (Batchelder, 1998; Batchelder & Riefer, 1999), following the procedure detailed in works such as Moshagen, Hilbig, and Musch (2011); Moshagen, Musch, and Erdfelder (2012); and Ostapczuk, Musch, and Moshagen (2011). The parameter π referred to the prevalence of the attribute to be estimated. In the CWM and TRM question formats, the parameter r referred to the known prevalence of the nonsensitive attributes used for randomization, that is, of the respondent’s mother or father having been born in November or December. Official statistics from the Federal Statistical Office on more than 2.3 million births in Germany show that about 15.8% of all births in a year take place in November or December (Pötzsch, 2012, p. 17). This figure was therefore considered the best estimate of the prevalence of the nonsensitive attributes, and the parameter r was fixed at .158 in both the CWM and TRM formats. Fig. 1 shows the processing tree diagrams for the CWM, TRM, and DQ formats.

Fig. 1 Tree diagram for direct questions and for questions posed according to the crosswise model and the triangular model. The parameter π represents the unknown prevalence of the sensitive attribute, and the parameter r represents the known randomization probability.
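The processing trees can be read as a generative model. The sketch below simulates respondents branch by branch. It assumes fully truthful responding, as the trees do, and a hypothetical prevalence of π = .3. The simulated share of the first answer category can then be compared with the model-implied probabilities: πr + (1 − π)(1 − r) for the CWM, (1 − π)(1 − r) for the TRM, and π for DQ. These are the quantities that Eqs. 1 and 3 invert. Function names, the seed, and the number of simulated respondents are illustrative choices.

```python
import random

R = 0.158  # randomization probability used in the study


def simulate_answer(technique: str, carrier: bool, b_true: bool) -> str:
    """Follow one branch of the processing tree for a truthful respondent."""
    if technique == "DQ":
        return "true" if carrier else "false"
    if technique == "CWM":
        # both true or none true <=> carrier status equals B status
        return "both/none true" if carrier == b_true else "one true"
    if technique == "TRM":
        return "none true" if not (carrier or b_true) else "at least one true"
    raise ValueError(f"unknown technique: {technique}")


def first_category_share(technique, pi, r=R, n=100_000, seed=1):
    """Simulated share of the first answer category among n respondents."""
    rng = random.Random(seed)
    first = {"DQ": "true", "CWM": "both/none true", "TRM": "none true"}[technique]
    hits = 0
    for _ in range(n):
        carrier = rng.random() < pi   # sensitive attribute with prevalence pi
        b_true = rng.random() < r     # nonsensitive statement B with prevalence r
        hits += simulate_answer(technique, carrier, b_true) == first
    return hits / n


for technique in ("DQ", "CWM", "TRM"):
    print(technique, round(first_category_share(technique, pi=0.3), 3))
```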

Prevalence estimates were obtained from the empirically observed answer frequencies using the expectation–maximization algorithm (Dempster, Laird, & Rubin, 1977; Hu & Batchelder, 1994) as implemented in the multiTree software (Moshagen, 2010). Model fit was assessed via the asymptotically χ²-distributed likelihood-ratio statistic G², as detailed in, for example, Ostapczuk, Musch, et al. (2009b), Moshagen et al. (2011), and Hoffmann and Musch (2016). For each attribute, the multinomial processing tree models for all three questioning techniques were saturated (df = 0, G² = 0), because the number of independent answer categories was just sufficient to estimate all parameters in the three questioning technique conditions. For each of the attributes under investigation (xenophobia, opposition to further refugee admission, first letter of surname: K, L, M), three prevalence estimates were obtained (CWM, TRM, and DQ), based on the response frequencies of three independent, nonoverlapping groups (cf. Table 1). For xenophobia, for example, three independent proportions were available: the proportion of respondents answering “both/none true” to the CWM question, the proportion answering “none true” to the TRM question, and the proportion answering “true” to the DQ question. These proportions yielded completely independent estimates of the prevalence of xenophobia for each questioning technique format (πCWM, πTRM, and πDQ). To compare these prevalence estimates, we assessed the difference in model fit (ΔG²) between an unrestricted baseline model and a restricted alternative model in which the respective parameters were set equal to each other or fixed at a constant (e.g., πCWM = πDQ or πCWM = .22).
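The ΔG² comparison can be sketched in plain Python rather than multiTree. The sketch assumes one binomial answer category per format, as in Fig. 1. It fits a shared π by golden-section search, which is a swapped-in optimizer (the study used the EM algorithm in multiTree); this works because each log-likelihood is concave in π. ΔG² is then referred to a χ² distribution with one degree of freedom. The counts are hypothetical and are not the study's data, and all function names are illustrative.

```python
import math

R = 0.158  # randomization probability fixed in the CWM and TRM trees


def p_first(technique: str, pi: float, r: float = R) -> float:
    """Model-implied probability of the first answer category (cf. Fig. 1)."""
    if technique == "CWM":   # "both/none true"
        return pi * r + (1 - pi) * (1 - r)
    if technique == "TRM":   # "none true"
        return (1 - pi) * (1 - r)
    if technique == "DQ":    # "true"
        return pi
    raise ValueError(f"unknown technique: {technique}")


def loglik(groups, pi: float) -> float:
    """Binomial log-likelihood of groups (technique, n_first, n) sharing one pi."""
    total = 0.0
    for technique, n_first, n in groups:
        p = min(max(p_first(technique, pi), 1e-12), 1 - 1e-12)  # avoid log(0)
        total += n_first * math.log(p) + (n - n_first) * math.log(1 - p)
    return total


def fit(groups, iters: int = 200):
    """ML estimate of a shared pi on [0, 1] via golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = 0.0, 1.0
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if loglik(groups, c) < loglik(groups, d):
            a = c          # maximum lies in [c, b]
        else:
            b = d          # maximum lies in [a, d]
    pi_hat = (a + b) / 2
    return pi_hat, loglik(groups, pi_hat)


def delta_g2_equal(group1, group2):
    """Delta G2 (df = 1) and p-value for the restriction pi_1 = pi_2."""
    _, ll1 = fit([group1])
    _, ll2 = fit([group2])
    _, ll0 = fit([group1, group2])
    dg2 = max(2 * (ll1 + ll2 - ll0), 0.0)
    return dg2, math.erfc(math.sqrt(dg2 / 2))   # upper tail of chi-square(1)


def delta_g2_fixed(group, value: float):
    """Delta G2 (df = 1) and p-value for fixing pi at a known value, e.g. .22."""
    _, ll1 = fit([group])
    dg2 = max(2 * (ll1 - loglik([group], value)), 0.0)
    return dg2, math.erfc(math.sqrt(dg2 / 2))


# Hypothetical counts: (technique, n choosing the first option, n in group)
cwm, dq = ("CWM", 300, 454), ("DQ", 60, 466)
print(fit([cwm])[0], fit([dq])[0])
print(delta_g2_equal(cwm, dq))
```

For each group alone, the fitted π reproduces the closed form of Eq. 1 or Eq. 3, because each single-format model is saturated.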

To analyze the influence of political orientation, we split the sample into two independent, nonoverlapping groups of left- versus right-oriented respondents based on their answers to the Likert-scaled item on self-ascribed political orientation (from “left” = −5 to “right” = +5). For each of these groups, we established separate multinomial processing trees to obtain and compare prevalence estimates for the two sensitive attributes (xenophobia and opposition to further refugee admission), following the procedure detailed above. Both political orientation (left, right) and questioning technique format (CWM, TRM, DQ) varied between subjects. This allowed us to conduct pairwise comparisons of prevalence estimates between political orientation groups within a specific questioning technique format (e.g., πCWM, left versus πCWM, right) and between questioning technique formats within a specific political orientation group (e.g., πCWM, left versus πTRM, left). As before, we assessed the difference in model fit (ΔG²) between an unrestricted baseline model and a restricted alternative model in which the respective parameters were set equal to each other or fixed to a constant (e.g., πCWM, left = πCWM, right or πCWM, left = πTRM, left).

To assess interactions between questioning technique and political orientation on the prevalence estimates for the two sensitive attributes, we introduced parametric order constraints. We did so by re-parameterizing the original multinomial models established for estimating the prevalence of xenophobia and the prevalence of opposition to further refugee admission, respectively (Hoffmann & Musch, 2019; Knapp & Batchelder, 2004). Within each of these models, we replaced the parameter used for estimating the prevalence among left-oriented respondents in the DQ condition (πDQ, left) with the corresponding parameter for right-oriented respondents (πDQ, right) multiplied by a shrinkage factor (αDQ, left:right). The CWM and TRM conditions were re-parameterized analogously (πCWM, left = πCWM, right × αCWM, left:right; πTRM, left = πTRM, right × αTRM, left:right). The shrinkage factors αDQ, left:right, αCWM, left:right, and αTRM, left:right thus represent the ratio of the prevalence among politically left-oriented respondents to the prevalence among politically right-oriented respondents in the DQ, CWM, and TRM conditions, respectively. For example, for the question on xenophobia, a shrinkage factor of αDQ, left:right = .33 means that in the DQ condition, respondents who label themselves as politically “left” are only .33 times as likely to admit to xenophobic attitudes as respondents who label themselves as politically “right”. To test for interactions between questioning technique and political orientation, the shrinkage factors were compared using the ΔG² statistic, as described above (e.g., αDQ, left:right = αCWM, left:right). In this analysis, significant changes in model fit indicate significant interactions (Hoffmann & Musch, 2019). The multiTree model equations and the empirically observed answer frequencies are available in appendices A and B in the electronic supplementary material.
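The shrinkage-factor re-parameterization and the interaction test can be illustrated with a small, self-contained Python sketch. It does not reproduce multiTree's EM routine. Instead, it profiles the likelihood numerically with golden-section search and rests on several assumptions. First, α is bounded to [0, 1], which builds in the order constraint πleft ≤ πright. Second, the profile likelihood is unimodal in α. Third, r is read as the probability that the nonsensitive statement is true. The technique labels follow the text. The value of r, all counts, and all function names are hypothetical. The unrestricted model estimates one α per technique, and the restricted model forces a shared α, so ΔG² has df = 1.

```python
import math

EPS = 1e-12  # keeps log() finite at the parameter boundaries


def answer_prob(pi, technique, r):
    """P(indicative answer): DQ "true", CWM "both/none true", TRM "none true".

    Assumption: r is the probability that the nonsensitive statement is true.
    """
    if technique == "DQ":
        return pi
    if technique == "CWM":
        return pi * r + (1 - pi) * (1 - r)
    if technique == "TRM":
        return (1 - pi) * (1 - r)
    raise ValueError(f"unknown technique: {technique}")


def loglik(pi, technique, k, n, r):
    """Binomial log-likelihood (constant omitted)."""
    lam = min(max(answer_prob(pi, technique, r), EPS), 1 - EPS)
    return k * math.log(lam) + (n - k) * math.log(1 - lam)


def golden_max(f, lo=0.0, hi=1.0, tol=1e-9):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def profile_ll(alpha, cond, r):
    """Log-likelihood maximized over pi_right, with pi_left = alpha * pi_right.

    cond = (technique, (k_left, n_left), (k_right, n_right)).
    """
    tech, (k_l, n_l), (k_r, n_r) = cond
    f = lambda p: loglik(alpha * p, tech, k_l, n_l, r) + loglik(p, tech, k_r, n_r, r)
    return f(golden_max(f))


def fit_alpha(cond, r):
    """Shrinkage factor alpha in [0, 1] and the maximized log-likelihood."""
    alpha = golden_max(lambda a: profile_ll(a, cond, r))
    return alpha, profile_ll(alpha, cond, r)


def interaction_test(cond_a, cond_b, r):
    """Delta G^2 (df = 1) for equal shrinkage factors, e.g. alpha_DQ = alpha_CWM."""
    ll_free = fit_alpha(cond_a, r)[1] + fit_alpha(cond_b, r)[1]
    shared = lambda a: profile_ll(a, cond_a, r) + profile_ll(a, cond_b, r)
    return 2 * (ll_free - shared(golden_max(shared)))


if __name__ == "__main__":
    r = 0.25  # hypothetical randomization probability
    # Hypothetical (k, n) counts for left- and right-oriented respondents.
    dq = ("DQ", (30, 300), (90, 300))
    cwm = ("CWM", (180, 300), (165, 300))
    print(fit_alpha(dq, r)[0], fit_alpha(cwm, r)[0])
    print(interaction_test(dq, cwm, r))
```

In this hypothetical example, the DQ counts imply πleft = .10 and πright = .30 (α ≈ .33), and the CWM counts imply πleft = .30 and πright = .40 (α = .75). A significant ΔG² would indicate that the left:right ratio differs between the two questioning techniques.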

Results

Table 2 contains the prevalence estimates and the test statistics for parameter comparisons for the two sensitive attributes and the nonsensitive control attribute, broken down by questioning technique.

Table 2 Parameter estimates (standard errors in parentheses) and parameter comparisons for the sensitive and nonsensitive attributes by questioning technique

Xenophobia (Sensitive Attribute 1)

Prevalence estimates for xenophobia were significantly higher when assessed via the CWM (31.65%) than when assessed via the TRM (20.05%) or DQ (15.45%). The TRM yielded descriptively but not significantly higher prevalence estimates for xenophobia than DQ. This finding suggests that the prevalence of xenophobia was underestimated in the DQ and TRM conditions, whereas the CWM seems to have successfully controlled for socially desirable responding.

Opposition to refugee intake (Sensitive Attribute 2)

The prevalence estimates for opposition to further intake of refugees obtained via the CWM (43.56%) were descriptively but not significantly higher than those obtained via the TRM (37.43%) or DQ (36.73%). Prevalence estimates obtained via TRM and DQ also did not differ significantly.

Nonsensitive control attribute with known prevalence: First letter of surname

For the first letter of the respondents’ surnames as a nonsensitive control attribute, all questioning techniques yielded similar prevalence estimates that did not differ significantly from the known prevalence of 22% (CWM: 23.32%, TRM: 22.22%, DQ: 24.35%). These highly accurate prevalence estimates suggest that none of the questioning techniques under investigation was subject to systematic bias in the form of a general tendency towards over- or underestimation.

Exploratory moderator analyses

Analyses based on a median split of the sample by self-reported political orientation revealed that political orientation moderated the prevalence estimates for xenophobia and opposition to further refugee admissions (see Table 3).

Table 3 Parameter estimates (standard errors in parentheses) and parameter comparisons for the sensitive attributes by questioning technique and political orientation

As expected, the prevalence estimates for both sensitive attributes were higher among politically right-oriented than among politically left-oriented respondents for all questioning techniques. For xenophobia, the prevalence estimates obtained via the TRM and DQ were significantly higher for right-oriented than for left-oriented respondents; in the CWM, however, prevalence estimates varied only slightly as a function of political orientation. For opposition to refugee intake, all three questioning techniques provided significantly higher prevalence estimates among politically right-oriented than among politically left-oriented respondents; in the CWM condition, however, this difference was descriptively smaller than in the two other conditions. Interaction analyses revealed a significant interaction between questioning technique and political orientation for both sensitive attributes. The shrinkage factors indicated that the difference in prevalence estimates between politically right-oriented and politically left-oriented respondents was significantly larger in the DQ than in the CWM condition (see Fig. 2).

Fig. 2

Prevalence estimates for xenophobia (left panel) and opposition to refugee admissions in Germany (right panel) by political orientation (median split). CWM = crosswise model, TRM = triangular model, DQ = direct questioning

In the subsample of left-oriented respondents, prevalence estimates in the CWM condition were higher, and thus presumably more valid, than those in the DQ condition. In the subsample of right-oriented respondents, however, the CWM estimates only slightly and nonsignificantly exceeded those obtained via DQ. Thus, social desirability bias seems to have exerted a substantially stronger influence on politically left-oriented than on politically right-oriented respondents. This pattern of results suggests that politically left-oriented respondents perceived the questions measuring xenophobia and opposition to refugee admissions as more sensitive than politically right-oriented respondents did.

Discussion

In this study, we conducted the first experimental comparison of the validity of a symmetric NRRT (the crosswise model; CWM) and an asymmetric NRRT (the triangular model; TRM), and contrasted the performance of these two models with that of a conventional direct questioning approach (DQ). To this end, we assessed xenophobic attitudes and opposition to the further intake of refugees in Germany using two sensitive statements of unknown prevalence. Additionally, following a “strong” validation approach, we included a nonsensitive control statement (first letter of respondents’ surnames) with known prevalence to test for potential method-specific biases that would lead to a general tendency towards over- or underestimation.

As expected, both NRRTs yielded higher estimates for xenophobia than DQ. However, only the CWM provided estimates that were significantly higher than those obtained via DQ, thus satisfying the “more is better” criterion. Moreover, the CWM estimates were significantly higher than the TRM estimates, indicating the superiority of the symmetric CWM over the asymmetric TRM. For opposition to further refugee admissions, the CWM yielded descriptively higher prevalence estimates than the TRM and DQ; however, none of the pairwise comparisons of parameter estimates was significant. All three questioning techniques accurately recovered the known prevalence of the nonsensitive control attribute. Thus, both the CWM and the TRM met the criteria of a successful “strong” validation. As expected, exploratory analyses of the influence of political orientation revealed that prevalence estimates for the sensitive attributes were higher among right-oriented respondents than among left-oriented respondents. Interestingly, interaction analyses showed that this discrepancy was less pronounced when prevalence estimates were obtained via the CWM than via the TRM or DQ.

Our results indicate that while both NRRTs yielded higher estimates than DQ, only the CWM satisfied the “more is better” criterion for xenophobia and was thus better able than DQ to control for social desirability on this question. The estimates obtained via the TRM were descriptively but not significantly higher than those obtained via DQ. None of the questioning techniques under investigation exhibited a method-specific tendency towards over- or underestimation; instead, the prevalence estimates for the nonsensitive control attribute with known prevalence were highly accurate for all questioning techniques. In light of these findings, we recommend that the CWM be used to control for social desirability bias and maximize the validity of prevalence estimates in surveys of sensitive personal attributes.

We also found that for xenophobia, the symmetric CWM outperformed the asymmetric TRM in terms of validity. Response symmetry, or the absence of an objectively “safe” answer option, has been shown to increase the confidentiality of individual answers and thus also compliance with RRT instructions (Ostapczuk, Moshagen, et al., 2009a). Response symmetry seems to prevent respondents from faking their answers (cf. Hoffmann et al., 2019). This may be either because respondents understand that their privacy is perfectly protected and they cannot make a negative impression, or because they are simply unable to identify a self-protective response. In contrast, the asymmetric TRM offers a “safe” answer option and therefore seems to be more prone to deliberate faking than the CWM. The TRM offers the theoretical advantage that, under many conditions, it provides more efficient estimates (Yu et al., 2008). This advantage was also reflected in the smaller standard errors of the TRM estimates compared with the CWM estimates in the current study. However, the CWM was clearly preferable to the TRM in terms of the superordinate criterion of measurement validity. Interestingly, our results are in line with two previous studies on plagiarism, one using the CWM and the other using the TRM (Jann et al., 2012; Jerke & Krumpal, 2013): the study applying the CWM obtained descriptively higher prevalence estimates than the study applying the TRM. However, this observation is based on a comparison across studies and samples, and therefore does not provide unequivocal evidence for the superiority of either model. The present study provides the first experimental comparison of the two models and yields direct evidence for the assumption, first formulated by Jann et al. (2012) and Jerke and Krumpal (2013), that the CWM outperforms the TRM in terms of validity.
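The efficiency argument can be made concrete with the asymptotic standard errors of the two closed-form estimators. For the CWM, the estimator is π̂ = (λ̂ + r − 1)/(2r − 1), where λ̂ is the proportion answering “both/none true”. For the TRM, it is π̂ = 1 − λ̂/(1 − r), where λ̂ is the proportion answering “none true”. The sketch below assumes that r is the probability that the nonsensitive statement is true. The values of π, r, and n are hypothetical and chosen only to illustrate the comparison. Which model is more efficient depends on r and π. With a small r, the TRM has the smaller standard error; with a large r, the CWM can win. This is consistent with the qualification “under many conditions”.

```python
import math


def se_dq(pi, n):
    """Standard error of a direct-question proportion."""
    return math.sqrt(pi * (1 - pi) / n)


def se_cwm(pi, r, n):
    """Asymptotic SE of pi_hat = (lam_hat + r - 1) / (2r - 1); requires r != .5."""
    lam = pi * r + (1 - pi) * (1 - r)  # P("both/none true")
    return math.sqrt(lam * (1 - lam) / n) / abs(2 * r - 1)


def se_trm(pi, r, n):
    """Asymptotic SE of pi_hat = 1 - lam_hat / (1 - r); requires r < 1."""
    lam = (1 - pi) * (1 - r)  # P("none true")
    return math.sqrt(lam * (1 - lam) / n) / (1 - r)


if __name__ == "__main__":
    n = 500  # hypothetical group size
    for r in (0.25, 0.80):  # hypothetical randomization probabilities
        for pi in (0.1, 0.3, 0.5):
            print(f"r={r:.2f} pi={pi:.1f}  DQ={se_dq(pi, n):.4f}  "
                  f"CWM={se_cwm(pi, r, n):.4f}  TRM={se_trm(pi, r, n):.4f}")
```

Both indirect estimators have larger standard errors than DQ at the same n. This is the price of protecting individual answers, and it is why efficiency is weighed against validity rather than treated as the deciding criterion.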

In an exploratory analysis, we found a moderating influence of self-reported political orientation on prevalence estimates. For left-oriented respondents, the CWM provided substantially higher prevalence estimates than DQ, indicating that within this subgroup, the prevalence estimates obtained via DQ were strongly distorted by social desirability bias. For right-oriented respondents, the CWM obtained only slightly higher prevalence estimates than DQ, indicating that within this subgroup, social desirability bias had a somewhat weaker impact, as the topics under investigation were likely perceived as less sensitive by respondents less reluctant to openly express negative attitudes towards foreigners. Hence, the CWM proved particularly effective among left-oriented respondents, affirming the assumption that the usefulness of indirect questioning techniques increases with topic sensitivity (cf. Lensvelt-Mulders et al., 2005). Consequently, we particularly recommend that indirect questioning techniques such as the CWM be applied when investigating issues that are highly sensitive for a particular group of respondents, as a strong social desirability influence might result in strongly biased results for such groups if only direct self-reports are used.

Limitations and future research directions

It is necessary to acknowledge that the student population investigated in the present study is not representative of the population at large. Therefore, our pattern of findings and the prevalence estimates obtained in the present study cannot be generalized beyond the sample we investigated and would need to be replicated in other populations. Estimates for the prevalence of xenophobic attitudes might turn out to be even higher in a more representative sample that also includes less educated respondents, as lower education has repeatedly been shown to be associated with a higher incidence of xenophobic attitudes (D'Ancona, 2013; Hjerm, 2001; Ostapczuk, Musch, et al., 2009b; Zick et al., 2011). Student-only samples are also presumably more homogeneous, which increases the statistical power to detect differences between questioning techniques. Less educated respondents generally have greater difficulty understanding indirect questioning techniques (Hoffmann et al., 2017) and therefore tend to produce more false positives (Meisters et al., 2019). Developing better instructions that are easily comprehensible even for less educated respondents is thus of considerable importance for future research using randomized and non-randomized response techniques in more heterogeneous samples.

The results of the current study revealed no method-specific tendency towards over- or underestimation. This result is in line with previous studies that also found no deviation between CWM estimates and the known prevalence of a nonsensitive control attribute (Hoffmann & Musch, 2016) or a sensitive attribute (Hoffmann et al., 2015). To check for potential bias in the form of a general preference for one of the two answer options, future studies should try to replicate the present results using the extended crosswise model (ECWM; Heck, Hoffmann, & Moshagen, 2018). The ECWM is a recent modification of the CWM that allows for detecting instruction nonadherence without negatively affecting statistical efficiency.

As a final remark, it should be noted that the difference between the CWM and DQ conditions was larger for xenophobia than for opposition to further refugee admissions. A potential explanation for this finding is that respondents perceived opposition to further refugee admissions as less sensitive than the expression of xenophobic attitudes. This reasoning is supported by the higher percentage of respondents in the DQ condition who admitted that they opposed further intake of refugees (36.73%), compared with the much lower percentage who admitted to xenophobia (15.45%). The lower perceived sensitivity of opposing further refugee intake might be fueled by the increasing popularity of right-wing populist parties such as the Alternative for Germany, which cites social and economic concerns as a reason for limiting further refugee admissions.

Conclusions

The present research showed that non-randomized response techniques provide more valid prevalence estimates for socially undesirable attributes than conventional direct questions (DQ). The crosswise model (CWM) in particular was able to successfully control for the influence of social desirability bias, and it outperformed the triangular model (TRM), presumably owing to the favorable influence of the response symmetry found in the CWM but not in the TRM. We also found that the sensitivity of the two questions was contingent on respondents’ political orientation, and that the CWM provided the most valid estimates for respondents for whom these questions were most sensitive. Based on these results, we recommend the use of the CWM over the TRM or DQ for topics that are highly sensitive in a survey’s target population.

Open practice statement

All data and equation files necessary to reproduce the parameter estimates reported in this manuscript are provided in appendices A and B in the electronic supplementary material.

Author note

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Grant number 393108549.