In this article, a new method of reducing socially desirable responding in Internet surveys is proposed. Self-reports on sensitive topics are susceptible to misreporting, due in part to social desirability (Tourangeau & Yan, 2007). There are several ways in which researchers have attempted to reduce socially desirable responding (see King & Bruner, 2000; Meier, 1994; Nederhof, 1985; Paulhus, 1991; Tourangeau & Yan, 2007, for overviews of methods). These methods vary in complexity and applicability, depending on the type of questions or questionnaires and the purpose of the research. Due to the lack of control over the questioning situation in Internet surveys (de Leeuw & Hox, 2008) and to decreased attentiveness to the task (Lozar Manfreda & Vehovar, 2008), the more complex methods may be difficult to implement. Currently, no method can be recommended for reducing socially desirable responding in Internet questionnaires. It would thus be beneficial to find a practical and simple way of reducing socially desirable responding in Internet surveys, one that could easily be adapted to any type of questioning—that is, to single items or multi-item scales. The method proposed here simply uses questions on honest responding to increase participants’ processing of the request for honest answers and places subsequent questions in the context of those questions, with the aim of increasing the honesty of participants’ responses.

Socially desirable responding and Internet questionnaires

Self-reports are used in a wide range of fields for diverse purposes (Schwarz, 1999), and increasingly, Internet surveys are used to obtain self-reported data (e.g., Reips, 2012), especially when asking questions on personal or sensitive topics (Mohorko, de Leeuw, & Hox, 2013). However, numerous studies have shown that misreporting compromises the accuracy of self-reported data (see, e.g., Huang, Curran, Keeney, Poposki, & DeShon, 2012; Tourangeau & Yan, 2007), and research on the effects of social desirability indicates that a substantial amount of questionnaire data is distorted by socially desirable responding (e.g., Bäckström, 2007; Bäckström, Björklund, & Larsson, 2009; Barrick & Mount, 1996; Hirsh & Peterson, 2008). Tourangeau and Yan (2007) reviewed research on sensitive questions and found that inaccurate responding was quite common and that such reporting tends to be a motivated process in which respondents edit their answers before reporting, in an effort to avoid embarrassment. Answers to sensitive questions can thus shift toward the more socially desirable response options at the response selection stage of the answering process.

Computerized administration of questions seems to lessen the effect of social desirability on the disclosure of undesirable behavior (Gnambs & Kaspar, 2015) and to increase self-disclosure in general (Joinson & Paine, 2007). However, although socially desirable responding seems less prevalent in Internet research, it cannot be assumed that Internet administration of questions eliminates socially desirable responding. This is the case for a number of reasons. First, the increased use of panels for online research suggests that participants will have reduced true anonymity, due to the need to process payments and repeated contact via email. Second, although population-level privacy concerns have yet to translate into substantial behavior change (e.g., Acquisti, Brandimarte, & Loewenstein, 2015), there is increasing evidence that people are sharing less online (e.g., date of birth on social network sites; Stutzman, Gross, & Acquisti, 2013) and that the risk of disclosure via online social networks exerts a “chilling effect” on socially undesirable behaviors in offline life (Marder, Joinson, Shankar, & Houghton, 2016). In laboratory experiments, priming privacy increases the use of “I prefer not to say” as a response option to sensitive questions (Joinson & Paine, 2007), suggesting that the assumption that Internet-administered questionnaires will always benefit from reduced socially desirable responding is dependent on people’s expectations of, and concerns for, privacy. It is therefore important to continue to research methods for reducing socially desirable responding in Internet-administered questionnaires.

Several methods of reducing socially desirable responding in self-reports have been proposed, such as the randomized response technique and the bogus pipeline (see King & Bruner, 2000; Meier, 1994; Nederhof, 1985; Paulhus, 1991; Tourangeau & Yan, 2007, for overviews of methods). No consensus about the best strategy to reduce or eliminate the effects of socially desirable responding has been reached, and many of the previously developed methods are difficult to implement in Internet surveys (since some are restricted to certain types of questions, question formats, or single-item measures, or could raise ethical concerns).

A relatively recent attempt to reduce socially desirable responding that can be used in Internet research is the implicit goal-priming approach developed by Rasinski, Visser, Zagatsky, and Rickett (2005). The idea behind this method is “that the goal of providing honest, accurate answers can be activated implicitly, improving data quality” (Rasinski et al., 2005, p. 322). Rasinski et al. presented participants with a task on “word meanings” (as it was introduced to the participants). Each participant was presented with six such tasks, but for the ones receiving the goal-priming manipulation, four of the target words were intended to prime honesty. In line with Rasinski et al.’s hypotheses, the participants who received the goal priming reported more undesirable behavior than did those who did not. However, researchers have been unable to reproduce the goal-priming effect (Dalal & Hakel, 2016; Pashler, Rohrer, & Harris, 2013), casting doubt on the usefulness of this technique.

The most commonly used method to reduce socially desirable responding is instructing respondents to give honest answers, which is an explicit technique that can easily be implemented with any type of target items and/or scales in any format. There is, however, not much evidence that such instructions increase respondents’ honesty (Meier, 1994). One reason may be that this method (often referred to as “standard instructions”) is usually seen as a baseline (instructions given to the control group) to compare other methods against (e.g., “fake good” instructions), and not as a manipulation in itself (see, e.g., Douglas, Otto, & Borum, 2003). Another reason could be that respondents might not pay much attention to the instructions. This could be especially true in Internet surveys, during which no interviewer is present.

A practical method for reducing socially desirable responding in Internet surveys would need to be simple to implement and not restricted to certain types of questions, question formats, or single-item measures, nor should it raise ethical questions. The two currently existing methods that would be practical in this sense (honesty instructions and implicit goal priming) lack empirical support. Both methods are essentially based on priming respondents to think about honesty, although honesty instructions are meant to explicitly prime respondents, whereas goal priming is an implicit technique. The two methods also differ in the presentation of the honesty message. The honesty message in instructions is usually embedded in other text and does not require any kind of response from the participant. In goal priming, on the other hand, the message is conveyed through a special task that requires a response. In simplified terms, the honesty message in instructions is a direct request presented subtly, whereas the honesty message in goal priming is presented saliently, but the message is indirect. When messages are implicit, it can be assumed that some unknown portion of the sample will not make the association between the message and the following task. This should not be a problem if the message is explicit. However, if the explicit message is presented subtly, it may go unnoticed. Presentation of an explicit message is therefore important, as will be discussed in the following section.

Honesty instructions and the processing of messages

Despite the lack of evidence to support the use of honesty messages, many questionnaires are preceded by some instructions encouraging respondents to respond honestly. In interviewer-administered surveys, the instructions are read to each participant, ensuring that all participants receive the full message (de Leeuw, 2008). In Internet-administered surveys, however, respondents are expected to read the instructions. This could be seen as beneficial, because written messages give the respondent a chance to process the message at his or her own speed, as opposed to audio messages, which are delivered at the speed chosen by the interviewer; thus, written messages give the respondent a greater opportunity to process the content of the message than do audio messages (Petty & Cacioppo, 1981). The greater the processing of a message, the likelier it is that a person will remember that message (Petty & Cacioppo, 1986), which is an obvious prerequisite for following it.

However, the difference between interviewer-administered instructions and Internet instructions is not just whether the message is presented orally or in written form. The presence of an interviewer ensures that all respondents receive the instructions, whereas there is no such guarantee in Internet surveys (de Leeuw, 2008). Because of the lack of environmental control in Internet surveys, a participant can skip straight to the survey questions without ever reading the instructions. Clearly, unread instructions will have no effect on subsequent responding, and thus the honesty message will be lost on the proportion of participants who skip the instructions. The extent to which Internet participants do this, however, is uncertain and needs to be tested, so that the proportion of participants who ignore the honesty message altogether can be estimated.

One way to overcome this problem in Internet surveys is to move the honesty message to the questioning phase of the survey—that is, to pose the honesty message in the form of questions. This would not only make those who skip the instructions read the honesty message, but also increase the processing of that message. Responding to a statement posed as a question requires more processing of its content than does simply reading the statement, because the participant must form a response and map that response to the given response categories (Tourangeau & Rasinski, 1988). The more information is processed, the more likely it is to be remembered and therefore applied (Petty & Cacioppo, 1986), and thus, questions about honest responding should be more effective than instructions. Furthermore, thinking about the honesty of one’s responses puts the subsequent questions in context with the honesty questions.

Context effects

The context in which a question is asked can affect how the question is answered (for more on question context effects, see Reips, 2002; Schuman, Presser, & Ludwig, 1981; Schwarz, 1999; Tourangeau & Rasinski, 1988; Tourangeau, Rips, & Rasinski, 2000). Context can be understood in a broad sense and can refer to diverse aspects of the questioning situation, but most often it is studied in relation to question order, where the content of a previously answered question creates a context in which a subsequent question is answered (Tourangeau et al., 2000). Context effects can take many forms, but with regard to the context of honesty messages, the assumed effect would be a directional context effect. A directional context effect occurs when a preceding question produces a uniform change in the responses to succeeding questions, altering the overall mean of the responses (Tourangeau & Rasinski, 1988; Tourangeau et al., 2000).

The purpose of honesty messages is to get all participants to respond more honestly—that is, to shift responses toward more honest responses, and thus to produce the same change seen with directional context effects. The same applies to honesty messages posed as questions. It can be assumed that most participants respond honestly to begin with, and therefore most respondents will truthfully say that they respond honestly. However, the target group, those who adjust their answers in a socially desirable manner, will also report that they respond honestly, because it is generally seen as undesirable to be dishonest. For this reason, little variability can be expected in response to honesty questions—the change is expected to occur in the subsequent target questions. For such a change to occur, there must, however, be variation in the honesty of responses to the target questions. Sensitive questions are susceptible to dishonest answers due to participants’ unwillingness to give undesirable information (see Tourangeau & Yan, 2007), and thus are well-suited to test whether honesty messages posed as questions affect subsequent questions.

If an honesty message is posed in the form of questions that precede sensitive questions, this may put the sensitive questions in context with the questions on honest responding. Context can trigger the application of a norm, in such a way that the respondent is guided by this norm when forming a response (Tourangeau & Rasinski, 1988). The heightened attention to the honesty message, provided by the context items, may thus trigger the norm of honesty, which then becomes a standard for responses, so that honesty is used as a guideline in the response process. As honesty in responses increases, social desirability should be reduced. Socially desirable responding is presumed to be caused by the editing of responses during the response selection stage of the answering process, just before reporting (Tourangeau et al., 2000). Therefore, if context items on honesty reduce socially desirable responding, it can be assumed that such items influence the response selection process.

Several factors can play a role in context effects (see Tourangeau et al., 2000, for an overview). Two of these factors are question similarity and the positioning of questions. Generally, the more similar the questions are and the more closely together they are presented, the more likely it is that context effects will occur. To intentionally create context effects between questions on honest responding and target questions, it is thus best to place the questions on honest responding right before the target questions and to make them seem as similar to the target questions as possible. The questions on honest responding are intended as a manipulation that triggers the norm of honesty, not as a measure of honesty. One way to make them resemble the target questions is to give them the same response options as the target questions, because response options influence the way in which a question is processed (Schwarz, 1999) and because editing is presumed to occur during response selection. If the purpose of using questions on honest responding is to draw attention to the honesty of answers during the response selection stage, then making participants think about the honesty of their answers within the same frame of responding should increase the likelihood that the questions on honest responding create context effects.

The present research: Questions on honest responding

The method proposed here for dealing with socially desirable responding is somewhat similar to giving instructions and priming. It makes respondents explicitly aware of possible biases in their responses—that is, the adjustment of responses toward how they think others will view their answers with regard to the desirability of the response. Respondents are made aware of this by presenting the honesty message as a survey question, thereby putting other questions in context with questions on honesty. Instead of respondents being asked to answer honestly, they are asked whether or not they do so—priming thoughts of honesty, and therefore the norm of honesty, before they answer questions on other topics. The idea is to produce directional context effects in which all respondents are moved in the direction of more honesty, and thus less socially desirable responding (for more on question context effects, see Schuman et al., 1981; Schwarz, 1999; Tourangeau & Rasinski, 1988).

However, before testing the questions on honest responding, we ran two small pilot studies to estimate the proportion of Internet survey participants who skip the instructions altogether, to see to what extent this is a problem in Internet surveys. This is important, because it is approximately the proportion of the sample that would not receive an honesty message presented as part of the instructions, but who would receive the message in the questioning phase of the survey.

The main purpose of this research was to develop statements about survey participants’ response behavior and attitudes, focusing on respondents’ impression management, honesty, and the influence of other people’s opinions, in order to test whether posing such statements as questions could bring respondents’ attention to the standard request for honest responding and thereby create a context that triggers the norm of honesty and thus reduces socially desirable responding.

Study I describes the process of developing the questions on honest responding, which resulted in a list of nine questions on honest responding that were then tested further by presenting them to half the participants as single items (one item per page) at the beginning of a survey on sensitive topics, to test for group differences in honest responding. In Study II, the effects of the same nine questions on honest responding were again tested in much the same way. However, to reduce response burden, the questions on honest responding were presented in a grid instead of in the single-item format used in Study I. Study III was aimed at further reducing the response burden of the questions on honest responding by reducing their number to three. A between-group comparison of mean item scores was used in Study I, and a between-group comparison of mean scale scores was used in Studies II and III to evaluate the effects of the questions on honest responding. In addition, the effects of the questions on honest responding on the correlational relationships between scales were evaluated in Study III. In all three studies, the evaluation of honesty was based on the attribution of desirable and/or undesirable behavior, with less attribution of desirable behavior and/or more attribution of undesirable behavior being taken to indicate less influence of social desirability.

Pilot studies: Proportion of participants who read survey instructions

Prior to conducting Study I, we conducted a small pilot study with 40 first-year psychology students, to test whether students read the instructions in Internet surveys. The students were first presented with an instruction page and, immediately after clicking the “Continue” button, were asked whether they had read the instructions (with the response options “Yes” and “No”). Eleven of the forty students denied having read the instructions, which amounts to 27.5% of the students.

The same procedure was then tested on a larger sample of students from the University of Iceland. The survey was sent out to 10,187 potential participants who had previously given their consent to receive survey invitations sent out by the university’s Student Registry (Nemendaskrá). Of the 1,812 who opened the survey, 1,505 answered the question of whether they had read the instructions. Four hundred eighteen admitted to not having read the instructions, amounting to 27.8% of those who responded—about the same proportion as in the first pilot study. If this is the case in other Internet surveys, then any message presented in the instructions will be lost on over a quarter of the sample. It should, however, be noted in this context that methods have been developed both to detect respondents who are not paying attention when responding to surveys and to increase respondents’ attention to questionnaire instructions (the seriousness check, e.g., Bayram, 2018; Reips, 2000; and the instructional manipulation check, Oppenheimer, Meyvis, & Davidenko, 2009), though no such methods were used in the following studies.

Study I: Questions on honest responding: Development and testing

Participants who skip straight to the questions in Internet surveys will still read an honesty message if it is presented in question format. Therefore, if a message conveying honest responding reduces socially desirable responding, it should be more effective when posed as questions, both because of the increased attention to the message and because of the increased processing of it. The purpose of Study I was to develop questions on honest responding and to test whether such questions can affect responses to sensitive items on both desirable and undesirable behavior.

The questions on honest responding were generated with the aim of reducing socially desirable responding and were tested under the assumption that less attribution of desirable behavior and more attribution of undesirable behavior would be evidence of reduced social desirability. To ensure that the groups in Study I did not differ in their levels of social desirability at the onset of the survey, respondents’ tendency to give socially desirable answers was assessed with the Marlowe–Crowne Social Desirability Short Form (Vésteinsdóttir, Reips, Joinson, & Thorsdottir, 2017).

Method

Participants and procedure

Participants were recruited through social network sites and email, and by snowball sampling. An invitation was posted on the websites and sent by email to potential participants. The post/email contained a short introduction and a link to the survey. Data were collected within one week, resulting in a convenience sample of 589 participants who answered at least one of the questions on the social desirability measure, the questions on honest responding, and/or the sensitive questions (dependent variables). However, in the present research it was essential that participants in the experimental group answered the questions on honest responding: if they did not respond to the questions, it could not be assumed that the honesty message conveyed by the questions’ content had been processed, and thus it could not be assumed that this message had evoked the norm of honesty (much as a participant in a drug trial who did not take the prescribed drug, or who took a smaller or unknown dose, would not be said to have participated in the trial). Therefore, data from participants in the experimental group who did not respond to the manipulation questions were excluded from the analysis.

Examining missingness, Little’s MCAR test (Little, 1988) showed that omitted responses on the social desirability measure and the sensitive questions (taking age and gender into account) could be assumed to be missing completely at random (i.e., were not dependent on the responses to other variables in the dataset) for both the experimental group (χ2(208, N = 260) = 220.732, p = .260) and the control group (χ2(365, N = 296) = 382.938, p = .249). In addition, less than 5% of values, in total, were missing. Therefore, listwise deletion of missing values was used. This resulted in a convenience sample of 475 participants who completed the survey, 84 men and 383 women (eight did not indicate their gender). The participants’ ages ranged from 18 to 71 years (mean = 33, SD = 12.6). In the experimental group, there were 245 participants, 45 men and 196 women (four did not indicate their gender), from 18 to 71 years of age (mean = 33, SD = 12.2). The control group consisted of 230 participants, 39 men and 187 women (four did not indicate their gender), from 18 to 71 years of age (mean = 32, SD = 12.9).
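For readers who wish to reproduce this kind of missingness screening, the sketch below illustrates how the proportion of missing values could be computed and listwise deletion applied in Python with pandas. The file and column names are hypothetical, and Little’s MCAR test itself is not implemented here (it is not available in pandas and would require a dedicated implementation, e.g., in R).

```python
import pandas as pd

# Hypothetical data frame: one row per respondent, with columns for the ten
# social desirability items (sd_1..sd_10), the seven sensitive questions
# (sq_1..sq_7), and a "group" column (experimental vs. control).
df = pd.read_csv("study1.csv")

item_cols = [f"sd_{i}" for i in range(1, 11)] + [f"sq_{i}" for i in range(1, 8)]

# Overall percentage of missing values on the analyzed items (reported as < 5% above).
pct_missing = df[item_cols].isna().mean().mean() * 100
print(f"Missing values: {pct_missing:.1f}%")

# Listwise deletion: keep only respondents with complete data on all analyzed items.
complete = df.dropna(subset=item_cols)
print(complete.groupby("group").size())
```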

Instruments and research design

Marlowe–Crowne Social Desirability Short Form

The Marlowe–Crowne Social Desirability Short Form (Vésteinsdóttir et al., 2017) is intended for use on the Internet and consists of ten items from the Marlowe–Crowne Social Desirability Scale (Crowne & Marlowe, 1960). All items are true/false, with five keyed in the true direction (attribution items) and five in the false direction (denial items). Responses in the keyed direction are coded as 1 and responses not in the keyed direction as 0. The highest possible score on the Marlowe–Crowne Social Desirability Short Form is therefore 10, and the lowest is 0, with higher scores indicating more social desirability in responses. The mean scale score for the total sample was 3.32 (SD = 2.18), and Cronbach’s alpha was .64, which is low, falling just under the minimally acceptable alpha values for research purposes (.65 and .70) suggested by DeVellis (2012).
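As an illustration of how the short form is scored, the following minimal Python sketch sums the ten keyed (0/1) item responses into a 0–10 scale score and computes Cronbach’s alpha from the standard formula. The data here are random placeholders, so the resulting values will not match those reported above.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) array of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Placeholder responses: ten true/false items already keyed so that
# 1 = response in the socially desirable direction, 0 = otherwise.
rng = np.random.default_rng(0)
scored = rng.integers(0, 2, size=(475, 10))

scale_scores = scored.sum(axis=1)  # 0-10; higher = more socially desirable responding
print(scale_scores.mean(), scale_scores.std(ddof=1))
print(cronbach_alpha(scored))
```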

Questions on honest responding

A list of 25 questions on honest responding was generated, to represent the adjustment of responses to how the respondent thinks others would view his or her answers with regard to the desirability of the response. Thus, the focus of the honesty message conveyed in the questions on honest responding was on the adjustment of responses—that is, the shift from an honest response to a more desirable response—to make the respondent aware that a socially desirable response could deviate from an honest response. The statements were read by five experts in the field, for judgments of clarity and relation to the concept of interest. The 25 statements were also administered in paper-and-pencil format to 143 undergraduate psychology students during class. All items were presented with the same five, fully labeled response categories: strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, and strongly agree, coded from 1 (strongly disagree) to 5 (strongly agree). The list of items was refined on the basis of the expert judgments and analysis of data obtained from the in-class administration.

Refinement of the list resulted in 18 questions on honest responding, which were administered online to 9,758 students from the University of Iceland through the university’s Student Registry. A total of 191 students participated (144 female and 47 male), with a mean age of 32 years. Principal component analysis was used to further reduce the number of items (favoring item diversity instead of similarity). A total of nine questions on honest responding were chosen (see Table 1). These nine statements were used in this study with the same response categories as the sensitive questions (see below).

Table 1 Questions on honest responding, in presentation order

Sensitive questions

Seven sensitive questions were chosen for the study (see Table 2) on the basis of their judged desirability, as rated by two judges familiar with the concept of social desirability. Four of the items (Items 1, 3, 5, and 6) had also been rated as sensitive to social desirability in a previous, unpublished study of sensitive questions. All seven sensitive questions were presented with the same fully labeled response categories (never, seldom, sometimes, and often).

Table 2 Seven sensitive questions, in presentation order

Design

All participants were first presented with the Marlowe–Crowne Social Desirability Short Form and then randomly assigned to either the experimental or the control group (the participants were unaware of this process). The experimental group received the questions on honest responding as single items and then the seven sensitive questions, also as single items. This order was reversed for the control group, which was first presented with the seven sensitive questions and then the questions on honest responding. Both groups were presented with the same background questions and a comment box on the last page.

Results and discussion

Participants’ tendencies to give socially desirable responses, measured with the Marlowe–Crowne Social Desirability Short Form, did not differ between the experimental group (mean = 3.34, SD = 2.24, n = 245) and the control group (mean = 3.28, SD = 2.10, n = 230), t(473) = 0.282, p = .778, d = 0.03. Any difference in socially desirable responding between the experimental and control groups therefore cannot be attributed to preexisting differences in participants’ tendencies to give socially desirable answers.

Participants in the experimental group, who answered the questions on honest responding before answering the seven sensitive questions, gave significantly more socially undesirable answers on all of the sensitive questions except Questions 4 (“I tell the truth even if it gets me into trouble”) and 5 (“I have taken sick leave from work or school even though I wasn’t sick”) (see Table 3).

Table 3 Descriptive statistics and t tests between the experimental and control groups on the seven sensitive questions (SQ) in Study I

To form an index of socially desirable responding, the seven sensitive questions were combined into a sum score by reverse-scoring the items on desirable behavior and summing all items, with higher scores representing more undesirable responses. As expected, the mean sum score in the experimental group (mean = 13.00, SD = 2.47) was higher than that in the control group (mean = 12.20, SD = 2.20), t(473) = 3.740, p < .001, d = 0.34.
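The computation of this index and the group comparison can be sketched as follows in Python (using SciPy). The simulated responses, and which of the seven items are treated as desirable-behavior items, are illustrative only, since the actual item-level data are not reproduced here.

```python
import numpy as np
from scipy import stats

def undesirability_index(items: np.ndarray, desirable_cols: list[int]) -> np.ndarray:
    """Sum the 1-4 frequency items after reverse-scoring the desirable-behavior
    items, so that higher scores reflect more admission of undesirable behavior."""
    scored = items.astype(float)
    scored[:, desirable_cols] = 5 - scored[:, desirable_cols]  # 1<->4, 2<->3
    return scored.sum(axis=1)

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

# Illustrative item matrices (responses 1-4 to the seven sensitive questions);
# the positions of the desirable-behavior items below are made up.
rng = np.random.default_rng(1)
exp_items = rng.integers(1, 5, size=(245, 7))
ctrl_items = rng.integers(1, 5, size=(230, 7))
exp_idx = undesirability_index(exp_items, desirable_cols=[3, 6])
ctrl_idx = undesirability_index(ctrl_items, desirable_cols=[3, 6])

t, p = stats.ttest_ind(exp_idx, ctrl_idx)  # equal-variances t test, as in Study I
print(t, p, cohens_d(exp_idx, ctrl_idx))
```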

This sum score was also used to calculate the correlation between the Marlowe–Crowne Social Desirability Short Form and the seven sensitive questions, to provide an indication of the influence of the tendency to give socially desirable responses in each group. In both groups, the Marlowe–Crowne Social Desirability Short Form correlated substantially with the sum score of the undesirable responses (experimental group, r = –.36, p < .001; control group, r = –.42, p < .001), indicating that this tendency influenced responses in the control group as well as in the experimental group, despite the presentation of the questions on honest responding.

Measuring social desirability with a scale based on the Marlowe–Crowne Social Desirability Scale (Crowne & Marlowe, 1960) implicitly takes the view that social desirability is a tendency of the respondent (the approval motive). Tourangeau and colleagues (Tourangeau et al., 2000; Tourangeau & Yan, 2007), however, have viewed socially desirable responding as situational, in which the questioning situation makes socially desirable responding more or less likely. Thus, socially desirable responding can be seen either as resulting from respondents’ tendency to give socially desirable answers or as a reaction to the questioning situation. These are, however, not necessarily opposing views of social desirability, as was noted by Tourangeau et al. (2000).

Participants with a tendency to respond in a socially desirable manner can be presumed to be responsive to situational cues, such as the questions on honest responding, and thus the tendency alone is not expected to fully account for socially desirable responding, but individual differences in this tendency will make socially desirable responding more or less likely, depending on the situation. In other words, and more in line with the interaction approach from personality theory (see, e.g., Endler & Magnusson, 1976), the association between the response behavior (responses to the sensitive questions) and respondents’ tendency to give socially desirable answers can be expected to be moderated by the experimental situation (the questions on honest responding).

To test whether responses to the questions on honest responding differed between the two groups, a sum score was computed for the questions on honest responding by reverse-scoring Items 1, 2, 4, and 7 and summing all items. Despite the reduction in social desirability in the experimental group, the total scores on the questions on honest responding did not differ between the experimental (mean = 31.74, SD = 3.37, n = 245) and control (mean = 31.98, SD = 3.32, n = 230) groups, t(473) = –0.78, p = .436, d = 0.07. In keeping with the assumption that less desirable answers to the seven sensitive questions are more honest, this means that even though the experimental group answered the seven sensitive questions more honestly, the participants in the control group did not indicate less honesty in their responses to the questions on honest responding.

The questions on honest responding were designed to influence socially desirable responding, not to measure it. Answers to the questions on honest responding can therefore not be taken as an indicator of social desirability, because respondents’ self-evaluations of how honestly they respond are not a good measure of social desirability. Honest responding is desirable, so questions concerning respondents’ honesty would also be affected by social desirability. Therefore, the respondents who were guided by the social desirability of their responses, instead of the norm of honesty, would seem to continue using this strategy when they reached the questions on honest responding at the end of the survey.

Study II: Further testing of the questions on honest responding

In Study I, nine questions on honest responding were chosen from a pool of 25 items and tested as a sequence of single items. Although the questions on honest responding are most salient when presented as single items, placing nine single items at the beginning of a survey can increase response burden, which in turn decreases response quality (Galesic & Bosnjak, 2009; Schuman & Presser, 1996). Therefore, the nine questions on honest responding were presented in a grid in Study II. Grids have the advantage of making questionnaires seem shorter and of reducing redundancy in the questions, because the response options are not repeated for each item, thus reducing respondents’ cognitive effort (i.e., less effort is needed to apply the same response categories to all items than when respondents have to read response labels for each item; Lozar Manfreda & Vehovar, 2008). The effect of the nine questions on honest responding on the admission of socially undesirable behavior (with the sensitive questions also presented in a grid) was then tested.

Method

Participants and procedure

An invitation to participate, with a link to the survey, was posted on social network sites. The duration of data collection was one month, resulting in a convenience sample of 1,211 respondents who answered at least one of the questions in the survey. However, as we explained in Study I, it was imperative that respondents in the experimental group respond to the questions on honest responding (i.e., that the norm of honesty could be assumed to have been evoked by processing of the questions’ content). Therefore, data from respondents in the experimental group who did not respond to the questions on honest responding were excluded from further analysis.

Little’s MCAR test indicated that missing values on the measure of social desirability and the sensitive questions (taking age and gender into account) were missing completely at random (i.e., did not depend on the responses to other variables) in both the experimental group (χ2(502, N = 584) = 500.460, p = .511) and the control group (χ2(566, N = 607) = 531.402, p = .849). Less than 5% of values were missing in total. Therefore, listwise deletion of missing values was used in this study, resulting in a sample of 1,015: 166 men and 838 women, with four in neither category (marking the response option “Does not apply”) and seven who did not respond to the gender question. The participants’ ages were between 18 and 76 years (mean = 34, SD = 11.9). In the experimental group there were 502 participants: 83 men, 414 women, three in neither category, and two who did not indicate their gender. The ages of participants in the experimental group ranged from 18 to 69 years (mean = 35, SD = 12.0). In the control group there were 513 participants: 83 men, 424 women, one in neither category, and five who did not respond to the gender question. The ages of participants in the control group ranged from 18 to 76 years (mean = 34, SD = 11.9).

Instruments and research design

In this study, we used the Marlowe–Crowne Social Desirability Short Form (the mean scale score for the total sample was 3.28, with a standard deviation of 2.11, and Cronbach’s alpha was .62) and the nine items on honest responding generated in Study I. As a dependent variable, we used only sensitive questions on socially undesirable and/or illegal behavior. This was done in order to create a more coherent measure than the single items in Study I. The items were selected as follows: the five undesirable statements from Study I were presented in the same order, plus four additional undesirable statements generated for the present study (see Table 4). Cronbach’s alpha for the scale was .67, which meets the minimum requirements for alpha in research settings (DeVellis, 2012). The mean scale score for the total sample was 17.88, with a standard deviation of 3.90. All measures were presented in grids—a total of three grids on three pages. In response to comments received in Study I, we also added the response option always to the questions on honest responding, and thus also to the nine sensitive questions. A sum score was calculated for the nine sensitive questions, with higher scores representing more undesirable behavior.

Table 4 The four sensitive questions added to Study II, in presentation order

The survey design was the same as in Study I. First, all the participants filled out the Marlowe–Crowne Social Desirability Short Form, and then they were randomly divided into two groups (without their awareness). The experimental group first received the questions on honest responding and then the nine sensitive questions, and the control group received these measures in reverse order.

Results and discussion

As in Study I, social desirability measured with the Marlowe–Crowne Social Desirability Short Form did not differ significantly between the experimental group (mean = 3.33, SD = 2.11, n = 502) and the control group (mean = 3.23, SD = 2.11, n = 513) at the onset of the study, t(1013) = 0.75, p = .456, d = 0.05. A significant correlation between the Marlowe–Crowne Social Desirability Short Form and the sensitive questions was found in both groups (experimental group: r = – .38, p < .001; control group: r = – .42, p < .001). Also consistent with the results for Study I, the participants in the experimental group admitted more socially undesirable behavior (mean = 18.22, SD = 4.08, n = 502) than did those in the control group (mean = 17.56, SD = 3.68, n = 513), t(997.58) = 2.728, p = .007, d = 0.17.

However, unlike the results from Study I, a significant difference was found between scores on the questions on honest responding for the two groups, with the experimental group scoring lower (mean = 39.08, SD = 4.79, n = 502) than the control group (mean = 39.67, SD = 4.43, n = 513), t(1013) = – 2.040, p = .042, d = 0.13. This means that the control group responded in a way that indicated more honest answers than did the experimental group, despite admitting less undesirable behavior. Although the difference is very small, this further indicates that responses to explicit honesty questions should not be taken as an indication of respondents’ honesty.

This finding further suggests that once respondents adopt a socially desirable response strategy, it persists throughout the survey. In contrast, when the norm of honesty is evoked by presenting the questions on honest responding, participants seem to use honesty as a guideline when responding. The questions on honest responding thus seem to shift the respondents’ focus to honesty, which is then used as a basis for forming responses.

Study III: Reducing response burden of the questions on honest responding

The purpose of this study was to test whether presenting fewer items could reduce the response burden of the questions on honest responding, and whether the method of questions on honest responding was superior to the method of presenting standard honesty instructions. The latter issue was tested by presenting all participants with honesty instructions, to test whether presenting the questions on honest responding could reduce socially desirable responding, beyond any reduction in socially desirable responding that could be attributed to standard honesty instructions. A student sample was used in this study, and therefore the measures used to evaluate honest responding were aimed at student behavior: school ambition, student helpfulness, and achievement-striving by delaying short-term gratification. Evaluation of honesty was based on attribution of favorable student behavior, with less attribution of desirable behavior being taken to indicate less socially desirable responding.

In addition, to further explore the usefulness of the method of presenting questions on honest responding, we tested whether correlations between the questionnaires would be affected by the questions on honest responding. Social desirability can influence correlations in many ways. However, in the present study all three dependent measures focused on desirable student behavior, and thus, if responses were influenced by social desirability, answers would be shifted in the same direction. This would produce stronger positive correlations between the measures. If the correlations between the measures were positive to begin with, as would be expected in this case, socially desirable responding would strengthen them. Therefore, if the questions on honest responding reduced the effects of social desirability, the correlations between the measures should be lower in the experimental group.

Method

Participants and procedure

An email invitation to participate in the survey was sent out to 10,149 students from the University of Iceland, who had previously given their consent to receive survey invitations sent out by the university’s Student Registry, upon student or staff request. The duration of data collection was three weeks, with one reminder being sent out 12 days after the original invitation. The email contained a short introduction and a link to the survey.

Data from three participants were removed from the dataset due to straight-lining in the most extreme response categories (choosing only strongly disagree or strongly agree in response to all statements) throughout the survey, and data from one were removed because of extreme responding (jumping between the most extreme options), which resulted in the lowest overall score in the dataset. Data from three additional respondents were removed from the analysis on the basis of their comments at the end of the survey (indicating that the respondent had taken the survey before, had not given consent to receive surveys, or had confused the direction of the response options at some undefined point in the survey). Two respondents also gave highly improbable responses (such as being 114 years old), but their data had already been removed on the basis of one of the above criteria.

Out of the participants who received the questions on honest responding, only those who answered all three questions were assigned to the experimental group in the analysis (since the manipulation is thought to work by evoking the norm of honesty due to processing of the question content, those who did not respond to the questions might also have ignored their content).

After removing data from participants with response patterns that indicated response errors, participants who received the questions on honest responding but did not respond to all three, and participants who left all questions on the dependent variables unanswered (from both groups), the final sample consisted of data from 899 participants. The total sample included 175 men, 711 women, two in neither category (marking the response option Does not apply), and 11 who did not respond to the gender question. The participants were from 20 to 69 years old (mean = 32, SD = 10.60).

Little’s MCAR test indicated that omitted responses to the dependent measures were missing completely at random for both the experimental group (χ2(88, N = 465) = 92.282, p = .357) and the control group (χ2(154, N = 434) = 137.103, p = .832), in the sense that missingness did not depend on other variables in the dataset (Little, 1988). All dependent variables had missing data; however, all variables had less than 5% of the data missing. Therefore, listwise deletion of the missing data should yield unbiased results and would thus be justifiable. However, due to the loss of data when using listwise deletion (especially when comparing more than two variables), the missing values were replaced using multiple imputation based on a linear regression model, with the auxiliary variables age and gender included. The multiple imputation was conducted in SPSS, creating ten datasets with imputed values. The number of imputations needed depends on the amount of missing data (Graham, Olchowski, & Gilreath, 2007), which was small in the present study, and thus ten imputations were deemed sufficient (increasing the number of imputations did not change the results).
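The imputation itself was carried out in SPSS. As a rough Python analogue (not the procedure used in the studies), the sketch below draws ten stochastic imputations with scikit-learn’s IterativeImputer, includes age and gender as auxiliary variables, and averages the group means across the imputed datasets. The file, column, and group names are hypothetical, and the imputation is done at the scale level purely for illustration.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data frame with the three dependent scale scores plus the
# auxiliary variables age and gender (coded numerically), one row per respondent.
df = pd.read_csv("study3.csv")
cols = ["school_ambition", "dgi_achievement", "helpfulness", "age", "gender"]

n_imputations = 10  # ten imputed datasets, as in Study III
group_means = []
for m in range(n_imputations):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    completed = pd.DataFrame(imputer.fit_transform(df[cols]), columns=cols, index=df.index)
    group_means.append(completed.groupby(df["group"])["helpfulness"].mean())

# Pool the point estimates by averaging across the imputed datasets.
print(pd.concat(group_means, axis=1).mean(axis=1))
```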

Instruments and research design

The survey contained questions on school ambition, the Achievement subscale from the Delay of Gratification Inventory (Hoerger, Quirk, & Weed, 2011), and questions on student helpfulness. All questions were presented with the same five, fully labeled response categories: strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, and strongly agree, coded from 1 (strongly disagree) to 5 (strongly agree).

The Delay of Gratification Inventory–Achievement subscale contains seven statements, three of which are reverse-scored (see Hoerger et al., 2011). The maximum score is 35, and the minimum score is 7, with a higher score indicating more delay of gratification on the Achievement subscale. The mean scale score for the total sample was 26.99 (SD = 4.60), and Cronbach’s alpha for the scale was .77.

School ambition and student helpfulness were measured with three statements each, none of which were reverse-scored. For each scale, the sum score was calculated, with higher scores representing more ambition on the school ambition statements and more helpfulness on the student helpfulness statements. The school ambition statements were as follows: (1) I always try to perform outstandingly in all courses I sign up for, (2) I am a productive person, and (3) I always try to do projects as well as I can. The mean score on the scale for the total sample was 11.92 (SD = 2.15), and Cronbach’s alpha was .73. The student helpfulness statements read: (1) I am always willing to help my fellow students, (2) I take time to help those who ask for my assistance, and (3) I unconditionally participate in my fellow students’ experiments and research projects. The mean scale score for the total sample was 11.84 (SD = 2.05), and Cronbach’s alpha was .69.

Participants were randomly divided into two groups. Both groups received the same instructions, followed by the school ambition statements, the Delay of Gratification Inventory–Achievement subscale, and the student helpfulness statements, in that order. The instructions read: “It is important to respond to all statements and that each person responds individually. There are no right or wrong answers, we only ask that you respond honestly and to the best of your knowledge.” In addition, the experimental group received three extra questions on honest responding, designed to mirror the honesty message in the instructions. The three questions on honest responding were as follows: (1) When I answer questions about my behavior, I think about how others behave, (2) I answer survey questions conscientiously, and (3) I am honest in my responses to survey questions. The questions on honest responding all had the same response scale as the school ambition statements, the Delay of Gratification Inventory–Achievement subscale, and the student helpfulness statements; however, their response categories were arranged vertically rather than horizontally. Each of the questions on honest responding was presented individually, right after the instructions and before the other questions. The presence or absence of the questions on honest responding served as the independent variable, with the total scores on the Delay of Gratification Inventory–Achievement subscale, the school ambition statements, and the student helpfulness statements as dependent variables.

Results and discussion

In the following sections, the results obtained with multiple imputation will be interpreted. However, for completeness, the results obtained with listwise deletion of missing values will also be reported.

The manipulation of questions on honest responding had a significant effect on the mean scores of all three measures, with lower mean scores being obtained in the experimental group (see Table 5).

Table 5 Descriptive statistics and t tests between the experimental and control groups on school ambition (SA), Delay of Gratification Inventory–Achievement subscale (DGI-A), and student helpfulness (SH)

Given that school ambition, delay of gratification, and student helpfulness are all favorable characteristics, and that the attribution of such characteristics is socially desirable, lower mean scores for the experimental group can be taken to indicate less social desirability. Drawing attention to the honesty message in survey instructions, by presenting it as questions on honest responding, therefore seems to have reduced social desirability beyond any reduction that could be attributed to standard honesty instructions.

As can be seen from Table 5, the sample sizes are quite discrepant between the experimental and control conditions. The main reason for this is that, of those who opened the link to the survey, those who were assigned to the control group more often left all questions on the dependent measures unanswered (194 in the control group, as compared to 165 in the experimental group). This might be due to a difference in the items on the first page of the survey. Recall that in the two previous studies, all participants began the survey with a measure of social desirability; in this study, however, the experimental group first received the questions on honest responding and then the dependent measures, whereas the control group went straight to the dependent measures. It is therefore possible that presenting the questions on honest responding reduced dropout. However, without further research this is merely speculation.

In addition to the between-group comparison of mean scores, the correlation between questionnaires was also calculated separately for each group. If responses to the questionnaires were affected by social desirability, then the correlations between measures should be higher due to increased similarity in the responses. These results are presented in Table 6.

Table 6 Correlations between dependent measures, for the experimental and control groups separately

As can be seen in Table 6, the correlations between measures are lower in the experimental group, indicating less similarity in responses due to social desirability. When the correlation coefficients of the two groups were compared, the reduction in the experimental group was significant for the correlation between the Delay of Gratification Inventory–Achievement subscale and the student helpfulness statements (z = –2.01, p = .044), marginally significant (two-tailed) for the correlation between the Delay of Gratification Inventory–Achievement subscale and the school ambition statements (z = –1.82, p = .069), and not significant for the correlation between school ambition and student helpfulness (z = –1.07, p = .285).
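The z values above correspond to the standard Fisher r-to-z test for the difference between two independent correlations. A minimal sketch of that comparison follows; the correlation values passed in are made up for illustration, and only the group sizes roughly correspond to those in Study III.

```python
import numpy as np
from scipy import stats

def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Fisher r-to-z test for the difference between two independent correlations."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))  # two-tailed p value
    return float(z), float(p)

# Illustrative call: experimental group (n = 465) vs. control group (n = 434).
z, p = compare_correlations(r1=0.35, n1=465, r2=0.48, n2=434)
print(f"z = {z:.2f}, p = {p:.3f}")
```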

General discussion

The main purpose of this research was to develop a practical method that could reduce socially desirable responding in Internet-administered measures. In three studies, we showed that the questions on honest responding can reduce socially desirable responding. The three studies also differed in their designs, which speaks to the robustness of the findings. The main differences across studies are summarized in Table 7.

Table 7 Summary of the main design differences across studies

As can be seen from Table 7, there are notable differences between the designs of the three studies. First, in Study III a measure of social desirability was not included. It is possible that responding to items meant to capture social desirability makes concerns about social desirability more salient to respondents, which could affect how the questions on honest responding work. Excluding this measure and obtaining results similar to those in the previous studies, however, is evidence that the presentation of a social desirability measure is not an important factor in how the questions on honest responding work. Furthermore, the questions on honest responding reduce socially desirable responding whether presented as single items on separate pages or in grids. The effect is apparent for measures of both socially desirable and socially undesirable behavior, with either frequency or agree/disagree response scales.

Overall, the questions on honest responding are easily implemented and can be used to reduce socially desirable responding in questions on sensitive topics. The method also reduces socially desirable responding beyond any reduction that could be attributed to presenting standard instructions. The results from the three studies conducted on the questions on honest responding show that presenting questions on honest responding can change mean scale scores and the correlations between measures in the expected direction. Moreover, presenting as few as three questions on honest responding can bring about such changes. The overall conclusion is that presenting an honesty message in the form of questions—that is, questions on honest responding—can reduce socially desirable responding, under the assumption that less attribution of favorable characteristics and more reporting of undesirable behavior represent less social desirability.

Practical and theoretical implications

As we noted earlier, research on the effects of socially desirable responding indicates that the substantive results of questionnaire data are distorted by socially desirable responding (e.g., Bäckström, 2007; Bäckström et al., 2009; Barrick & Mount, 1996; Hirsh & Peterson, 2008), and although socially desirable responding seems less prevalent in computerized measures (Gnambs & Kaspar, 2015), the effect is still present, as can be seen from the studies above. Also, as people become more technologically literate, the feeling of privacy that is presumed to reduce socially desirable responding in Internet surveys may fade. In addition, Internet surveys are increasingly used when asking sensitive questions (Mohorko et al., 2013), which are fairly common within the health and social sciences. Research findings based on such self-reports are used to form theoretical and practical assumptions in these fields. Using questions on honest responding is a very simple procedure and can reduce socially desirable responding, which would increase the accuracy of responses and thus the validity of the findings and of any conclusions drawn. Furthermore, many fields within the health and social sciences base their research findings on the interpretation of correlation coefficients, and therefore a significant change in correlations between measures can have a major impact on the conclusions drawn from such research. The questions on honest responding could thus prove beneficial in correlational research with self-reported measures.

An additional practical implication of this research comes from the comparison of mean scores on the questions on honest responding. In Study I, the mean responses to the questions on honest responding did not differ between the two groups, but they did differ slightly in Study II, with higher mean scores for the control group. This finding is not particularly surprising, given that the questions on honest responding were not a measure but a manipulation. What is interesting about these results is that explicitly asking respondents about the honesty of their responses produces uninformative answers. This calls into question the practicality of explicit questions on honest responding, such as the honesty checks (i.e., asking respondents how honest their answers were) sometimes presented at the end of surveys. In light of the assumption that more attribution of undesirable behavior indicates more honesty, the responses to questions on honest responding were inconsistent with how honestly participants answered, and thus respondents’ self-reports of honest responding, measured by questions at the end of a survey, should not be used to draw conclusions about the honesty of responses.

It is also worth considering the process behind the questions on honest responding. In the introduction, we assumed that the questions on honest responding would affect the final stage of the answering process—that is, selection of the appropriate response option—because that is where editing (the mechanism behind socially desirable responding) is presumed to happen (Tourangeau et al., 2000). This is merely an assumption. It is also possible that the context created by the questions on honest responding affects other stages of the answering process—that is, that the norm of honesty is applied to those stages as well. The questions on honest responding could, for example, affect the retrieval stage by motivating respondents to think more carefully about their responses, thus generating more instances (or counter-instances) of the behavior in question before drawing a conclusion. This could lead respondents to a different and less desirable conclusion, because recalling more instances of undesirable behavior could result, for example, in the behavior being reported as more frequent. Other mechanisms could therefore lie behind the influence of the questions on honest responding, and as Tourangeau et al. discuss, the stages of the answering process may not be as distinct and sequential as they are usually described. One way to test this would be to ask respondents to give examples of the target behavior right after answering the target question and to test whether respondents who receive the questions on honest responding before the target question complete the task faster and/or name more examples. It must be kept in mind, though, that respondents may not be willing to reveal such information if the target question is sensitive, so the target question would have to be chosen very carefully.

Limitations and future directions

In three studies, we showed that presenting questions on honest responding before questions on desirable and/or undesirable behavior consistently produced a significant difference between mean scores on the dependent measures. However, this difference was small in all three studies. Small differences can nevertheless have practical implications (Rosenthal, 1986, 1990), the importance of which depends on both the size of the effect and the nature of the research (as when predicting, e.g., health outcomes). In addition, even though the differences in mean scale scores were small, changes in correlation coefficients can have a major impact on the conclusions drawn from correlational studies with self-reported data.
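As an illustration of how such a change in correlations might be evaluated, one conventional approach is to compare the correlation between two measures in a group that received the questions on honest responding with the corresponding correlation in a control group, using Fisher's r-to-z test for independent correlations. The minimal Python sketch below uses hypothetical correlations and sample sizes, not values from the studies reported here.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher z-transform of each correlation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of the difference
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-tailed p from the standard normal
    return z, p

# Hypothetical values for illustration only (not results from the studies above):
z, p = compare_independent_correlations(r1=0.45, n1=300, r2=0.30, n2=300)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```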

Furthermore, self-reports are assumed to be informative but not completely accurate. The total extent of inaccuracy caused by socially desirable responding could not be estimated in the studies conducted here, because we did not know the actual rate of the behavior asked about, and thus we had nothing to compare participants' responses to. Moreover, if the prevalence of undesirable behavior is low in the sample to begin with, the use of questions on honest responding would not produce large changes in the target measures. In addition, the findings presented here are all based on convenience samples, so even if the average prevalence rate of the behavior in question were known, the samples could not be assumed to be representative of the population. It would therefore be informative to test the questions on honest responding in a sample for which the prevalence of the target behavior is known.

Also worth considering is how invested respondents are in the survey. The samples used in the three studies above consisted entirely of people with no vested interest in the survey outcome. Therefore, we do not know whether the questions on honest responding would prove useful in situations in which participants could personally gain from a favorable presentation of themselves (e.g., job applicants). Testing the questions on honest responding on different samples would help clarify the usefulness of the method in other settings.

Another test of the usefulness of the questions on honest responding would be to compare them with other, similar manipulations, such as making the request for honesty more salient by asking respondents to indicate, in a single item, their agreement to respond honestly. Such an item would also draw respondents' attention to the honesty message and might thus produce an effect similar to that of the questions on honest responding. For future research, it would be worth testing whether the questions on honest responding outperform such an explicit instructional item.

A major concern regarding the practicality of any method for reducing socially desirable responding in Internet surveys is the possible drawbacks of implementing it. Including extra questions in surveys adds cost and increases response burden (Galesic & Bosnjak, 2009; Schuman & Presser, 1996). It would therefore be useful to measure response times to each of the questions on honest responding separately, in order to estimate the additional time required to complete a survey that includes them.
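One simple way to obtain such an estimate, assuming the survey software logs when each question is displayed and when it is answered, is to compute per-item response times from those timestamps and sum them. The following minimal Python sketch uses a hypothetical log format; the item names and timestamps are placeholders, not output from any particular survey platform.

```python
from datetime import datetime

# Hypothetical per-item log: (item, timestamp shown, timestamp answered).
# In a real Internet survey these would come from client- or server-side events.
log = [
    ("honesty_q1", "2024-05-01T10:00:03.120", "2024-05-01T10:00:09.870"),
    ("honesty_q2", "2024-05-01T10:00:10.020", "2024-05-01T10:00:15.430"),
    ("honesty_q3", "2024-05-01T10:00:15.600", "2024-05-01T10:00:21.010"),
]

def seconds(shown: str, answered: str) -> float:
    """Return the elapsed time in seconds between two ISO-style timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    return (datetime.strptime(answered, fmt) - datetime.strptime(shown, fmt)).total_seconds()

per_item = {item: seconds(shown, answered) for item, shown, answered in log}
added_burden = sum(per_item.values())
print(per_item)
print(f"Estimated added response time: {added_burden:.1f} s")
```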

It would also be preferable to use as few questions on honest responding as possible, since each response to a question takes some time and effort. In Study III we reduced the number of questions on honest responding from nine to three but did not test whether the number of questions on honest responding could be reduced further. Before simply reducing the number of questions on honest responding, it would, however, be beneficial to know which of the questions on honest responding, or which combination of the questions, works best, and on what types of target questions. Different questions on honest responding might function differently depending on the target topic. It is, for example, possible that some questions on honest responding work better when the question topic is something desirable rather than undesirable. Determining which of the questions on honest responding work best could be done by testing all nine separately against standard instructions. This information could then be used to form combinations of the best items, or to know which items could be used interchangeably (keeping in mind, though, that testing the questions on honest responding by presenting only one question might reduce the salience of the manipulation and change the context).

Identifying interchangeable items could be highly beneficial if the method is to be used more than once with the same participants, either as part of the same study or when administered to panel participants or on marketplaces such as MTurk, where participants could be expected to encounter multiple instances of the questions on honest responding. If a pool of interchangeable items could be created, researchers could choose from a number of items to use in their research, reducing the likelihood of participants receiving the same item multiple times. It is, however, possible that if participants become familiar with this approach, responding to the questions on honest responding would become “routine”; with time, the questions might cease to activate thoughts of honesty, and thus their effect would diminish.
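To make the idea of an item pool concrete, a researcher could draw a small, reproducible subset of honesty questions for each respondent, so that repeat participants rarely encounter the same items. The sketch below is a minimal illustration under that assumption; the item identifiers are placeholders rather than validated item wordings, and the seeding scheme is only one possible choice.

```python
import random

# Hypothetical pool of interchangeable honesty-question identifiers;
# the actual wordings would come from a validated set of items.
ITEM_POOL = [f"honesty_item_{i}" for i in range(1, 10)]

def draw_items(respondent_id: str, k: int = 3, study: str = "study_1") -> list[str]:
    """Draw k items for a respondent. Seeding on respondent and study gives each
    person a stable draw within a study, while different respondents (or the same
    respondent in a later study) tend to receive different items."""
    rng = random.Random(f"{study}:{respondent_id}")
    return rng.sample(ITEM_POOL, k)

print(draw_items("panelist_0042"))                      # three items for one panelist
print(draw_items("panelist_0042", study="study_2"))     # a fresh draw in another study
```

Whether such rotation preserves the effect of the manipulation would, of course, need to be tested empirically, as noted above.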

Because the questions on honest responding were designed as a manipulation rather than as a measure, they cannot be evaluated directly; only their effect on other target questions can be evaluated, and that effect might depend on the content of those questions. In the present research, we focused only on behavior questions, but it would be worth testing whether the questions on honest responding are even better suited to attitude questions, since the idea behind the questions on honest responding is partly based on a framework built around answers to attitude questions (see Tourangeau & Rasinski, 1988; Tourangeau et al., 2000).

In sum, the research presented here is an important step in the evaluation of a new method—questions on honest responding—for reducing socially desirable responding to sensitive questions, and the results are promising. The method is easy to implement, with little added cost or response burden. More research will be needed, however, before general recommendations can be made about how and when it is beneficial to use the questions on honest responding. What can be recommended at this point is to use the method, with a few questions on honest responding in place of standard instructions, in Internet-administered self-report research when the topic is sensitive.