Background

Surveys represent one of the most efficient and inexpensive research methods available to collect representative, high-quality data from large numbers of research participants. They therefore frequently serve as the backbone used to define the scope and magnitude of many potential public health problems. In the United States, for example, large national surveys have been used to estimate what proved at the time to be surprisingly high levels of mental illness within the general population [1], physical violence within families [2], and sexual assault among women [3]. Even the United States Census, which serves as the basis for apportioning Congressional representatives and taxes to each state, is survey-based. Typically, survey data are either collected by interviewers using face-to-face or telephone communication with the participant or via the participant’s own self-report.

Regardless of the topic studied and how the information is collected, scientifically correct, survey-based prevalence estimates require that research participants be representative of the population from which they are drawn, that participants actually answer the questions that are asked of them, and that they answer those questions honestly. On average, research participants disclose sensitive and personal information, such as mental health symptoms, drug misuse, and history of sexual assault, more frequently when responding to self-administered questionnaires than when taking part in face-to-face or telephone interviews [4–7]. Studies suggest that disclosure of sensitive information on self-administered questionnaires is enhanced further when participants respond anonymously instead of confidentially [5, 8–11]. This implies that anonymous, self-administered surveys may be the optimal method for accurately cataloging information about certain public health problems, such as the prevalence of physical or sexual abuse or of mental health symptoms.

Although by no means proven, most survey researchers take the stance that methods that generate higher prevalence estimates for stigmatizing or sensitive information are probably more accurate than methods that generate lower estimates. This stance, however, rests upon the rather unlikely assumption that all people carry the same propensity to participate in survey research. Particularly when a survey topic is sensitive, survey respondents tend to differ substantially from non-respondents [12]. Therefore, three mechanisms might explain why anonymous surveys generate higher prevalence estimates of stigmatizing or sensitive information compared to non-anonymous surveys: 1) propensity to participate in research is in fact equal across all members of a sampling frame, and anonymous methods promote more honest self-disclosure among the participants with stigmatizing experiences; 2) sampling frame members with stigmatizing experiences are more reluctant than others to participate in surveys, but anonymous methods reduce this inherent reluctance (under-selection is reduced); 3) anonymous methods disproportionately increase the propensity of people with stigmatizing experiences to participate in the survey relative to those without such experiences (over-selection is induced). The first two mechanisms reduce bias; the last introduces bias. Without information about non-respondents’ characteristics relative to respondents’, however, one cannot determine which mechanism is at work. Unfortunately, under typical anonymous conditions, such information is unavailable.

Anonymous surveys carry other drawbacks relative to confidential surveys. For example, unlike confidential survey responses, anonymous survey responses cannot be linked to administrative or other non-survey data, thus limiting anonymous data’s richness and utility. Also, unless creative methods are employed, researchers often cannot track non-respondents to anonymous surveys or send them follow-up mailings, resulting in lower response rates (e.g., [13]). While a low response rate does not necessarily imply poor data quality, the risk of non-response bias does increase as response rates fall.

Two methods to bypass the tracking limitation in anonymous surveys have been described. In one, participants return a completed survey and a separately mailed postcard. Only the postcard contains a unique identifier, which is used to track respondents [14–16]. However, this method increases respondent burden, which can reduce response rates. Furthermore, participants may find it confusing and hence return only one item (e.g., the survey or the postcard, but not both). Receipt of equal numbers of postcards and surveys does not necessarily mean the same people returned both. Even when both are returned by the same person, the survey may be received considerably earlier than the postcard. The participant may therefore be subjected to additional mailings until the postcard is received, which may be annoying, and the researcher may incur unnecessary mailing expenses. Finally, unbeknownst to the researcher, some respondents may return more than one survey, leading to the overweighting of those individuals’ responses.

A second approach uses tracking envelopes, which reduces respondent burden, circumvents the problem of postcards and surveys returning at different times, and avoids analyzing multiple responses from a single participant [17]. In this approach, the envelope contains a unique identifier, but the survey does not. The two are returned together but separated immediately upon opening. Received surveys are then intermixed in some random fashion to avoid any possibility of linking them back to their original envelopes. If one participant returns more than one envelope-survey pair, all but the first is discarded. Until the envelope and survey are separated, however, the survey is not truly anonymous. Participants must rely on the researcher’s integrity to maintain anonymity, and they may be less willing to disclose sensitive information relative to the postcard tracking method, where privacy is absolute. Each approach has pros and cons, but the two approaches’ effects on response rates, survey completeness, and disclosure of sensitive information have never been directly compared.

In the present paper we address these issues using a novel technique we developed, the pre-merged questionnaire, which allows comparisons between respondents and non-respondents even under anonymous survey conditions. The study involved a potentially sensitive, self-administered questionnaire asking about several traumatic experiences, including sexual assault during military service. The population of interest was male US Gulf War I era veterans with possible posttraumatic stress disorder (PTSD) who had previously applied for Department of Veterans Affairs (VA) disability benefits. We had reason to believe that sexual assault experiences were particularly high in this population [18]. However, we also feared that traditional rape myth beliefs [19], which may be especially strongly held by military service members socialized into a masculinized subculture, might either deter male sexual assault survivors’ participation in the research or impede their disclosing of such experiences.

Using 3 levels of increasing privacy tied to the tracking methods described above, we hypothesized that response rate and participant representativeness, the number of sensitive questions actually answered by participants, and the proportion of participants disclosing potentially sensitive information would increase in a dose-response manner from the lowest to highest privacy condition. Because higher incentives consistently improve survey response [20], we also tested the impact of two incentives, $10 versus $20, on survey response. We expected the response rate, number of sensitive questions answered, and proportion of participants disclosing sensitive information would be higher among those receiving the $20 incentive compared to the $10 incentive.

Methods

Population and setting

We used simple randomization without replacement to select 324 veterans for survey from the population of 46,824 men who applied for VA PTSD disability benefits prior to June 2007 and had served in the US Armed Forces between August 2, 1990 and July 31, 1991.

Study design and assignment

The study was a 3×2 factorial, randomized controlled trial (Figure 1). Using simple randomization, veterans were assigned to one of 3 tracking/privacy conditions:

Figure 1

Study flow chart.

  1) “Confidential”: Under the least private condition, veterans received a survey with a highly visible, coded, unique identifier affixed to the front page of the survey. This was used for tracking, and individual respondents were potentially identifiable from their surveys.

  2) “Anonymized-Envelope”: Intermediate in privacy, veterans were asked to return their surveys in a study envelope, which had a pre-printed, unique identification number (ID) on it. When the completed questionnaire was returned, study personnel immediately separated it from the envelope. The questionnaire was intermixed with other arriving surveys and set aside. The envelope ID was used to indicate who had returned surveys. Technically, as long as the survey resided within the envelope, respondents could be identified. Thus, this method was not fully anonymous. Once the questionnaire was removed from the envelope, however, there was no longer any way to identify the respondent (hence the term “anonymized”).

  3) “Anonymous-Postcard”: Under the most private condition, veterans returned their surveys in unmarked envelopes. Besides the survey, veterans were also asked to return an enclosed, brightly colored postcard, which had a unique ID to allow tracking. Respondents could not be identified from their surveys or envelopes at any time.

Once veterans were assigned to their tracking/privacy condition, we used simple randomization within each condition to assign them to receive $10 or $20.
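The assignment scheme can be sketched as follows. This is a hypothetical implementation (the paper does not state what software performed the randomization); `assign`, the seed, and the condition labels are illustrative.

```python
import random

# Labels for the 3x2 factorial design described above.
PRIVACY = ["Confidential", "Anonymized-Envelope", "Anonymous-Postcard"]
INCENTIVE = [10, 20]

def assign(veteran_ids, seed=None):
    """Simple randomization: each veteran's privacy condition and incentive
    are drawn independently, with no blocking or forced equal allocation."""
    rng = random.Random(seed)
    assignments = {}
    for vid in veteran_ids:
        privacy = rng.choice(PRIVACY)      # tracking/privacy condition
        incentive = rng.choice(INCENTIVE)  # incentive within that condition
        assignments[vid] = (privacy, incentive)
    return assignments

arms = assign(range(324), seed=1)
```

Because simple randomization makes each draw independent, arm sizes and covariate distributions can vary by chance, which is consistent with the imbalance in past-year VA utilization reported later in the paper.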

Protocol

Data collection

For all groups, the initial mailing included a cover letter describing the study’s risks and benefits, the cash incentive, and a 25-page questionnaire. At two-week intervals, non-respondents were mailed a postcard reminder, a second mailing of the survey, and a final mailing of the survey via overnight mail (Federal Express). Cover letters were printed on Minneapolis VA Medical Center letterhead and listed the study’s funding agency. Veterans were told that they had been selected for survey because they had filed a VA disability claim and had served during Gulf War I. They were also told that the survey would ask about “combat, unwanted sexual attention, and other lifetime and military experiences”. The cover letters also stated in bold-face font, “We would like to hear from you even if you never experienced combat or unwanted sexual attention. We would also like to hear from you even if you were not deployed to the Persian Gulf.” Cover letters were the same across groups, except that they described the incentive, tracking method, and privacy protections that were specific to each group. Copies of cover letters are available upon request.

Pre-merged questionnaires

To our knowledge, we are the first to develop pre-merged questionnaires for use in anonymous surveys. However, pre-merged questionnaires are simply an extension of the common strategy of using different colored paper, say, to collect data from different groups (e.g., green paper for men, yellow for women). Instead of different colored papers, however, we created a sticker that was applied to each subject’s questionnaire just before mailing. The sticker was designed to be as unobtrusive as possible and was thus camouflaged as a return address on the survey’s back page (Figure 2). Just below the address, we embedded an alpha-numeric code into the mailcode, which corresponded to key administrative data associated with each potential subject. When the survey was returned, so was the administrative data—already merged. The sticker code was deliberately intended to be non-exclusive to the subject. For example, a code such as “504ADBY”, indicating a veteran was aged 50 years or older, served 4 years in the Army and received disability benefits from the VA, could apply to hundreds of thousands of veterans.
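A sticker code like this can be produced by a simple encoder/decoder. The sketch below is loosely modeled on the paper’s “504ADBY” example (age 50 or older, 4 years of service in the Army, receiving VA disability benefits); the exact field layout and the extra visit flag are assumptions, not the authors’ actual scheme.

```python
# Hypothetical pre-merged sticker code: each field is deliberately coarse
# so that any given code applies to many veterans, not just one.
def encode(age_50_plus, years_service, branch_army, disability, va_visit_any):
    return "".join([
        "50" if age_50_plus else "49",   # coarse age band, not exact age
        str(years_service),              # years of service (single digit here)
        "A" if branch_army else "O",     # Army versus other branch
        "DB" if disability else "ND",    # receiving VA disability benefits?
        "Y" if va_visit_any else "N",    # any VA visit in the prior year?
    ])

def decode(code):
    """Recover the administrative fields from a returned survey's sticker."""
    return {
        "age_50_plus": code[0:2] == "50",
        "years_service": int(code[2]),
        "branch_army": code[3] == "A",
        "disability": code[4:6] == "DB",
        "va_visit_any": code[6] == "Y",
    }

code = encode(True, 4, True, True, True)  # -> "504ADBY"
```

When the survey comes back, decoding the sticker yields the administrative record “already merged,” with no identifier linking it to a named individual.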

Figure 2

Example of a pre-merged sticker. For the present study, the sticker was placed within a pre-printed box on the last page of the survey. In this example, administrative data begins after the “E” in the Mailcode.

We maintained two separate, but interrelated computerized files: an administrative file containing subjects’ name and administrative codes, which were used to generate the stickers, and a tracking file containing their names and tracking ID. As envelopes, postcards, or confidential surveys were returned, the tracking ID was entered into the tracking file. This action deleted respondents’ name and ID from the tracking file and triggered a simultaneous deletion of their name and administrative code from the administrative file. Thus, by study’s end, only non-respondents’ administrative codes remained in the computerized record. These were then used to compare non-respondents to respondents. Respondents’ administrative codes were recaptured from the sticker on their returned questionnaires and hand-entered back into the analytical frame.
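A minimal sketch of this two-file scheme (assumed data structures; the paper does not describe its actual software) shows the key property: recording a returned tracking ID deletes the veteran from both files, so only non-respondents’ administrative codes survive.

```python
# Two interrelated files: a tracking file (name + tracking ID) and an
# administrative file (name + sticker code), keyed here by tracking ID.
class TrackingSystem:
    def __init__(self, subjects):
        # subjects: {tracking_id: (name, admin_code)}
        self.tracking = {tid: name for tid, (name, _) in subjects.items()}
        self.admin = dict(subjects)

    def record_return(self, tracking_id):
        """Called when an envelope, postcard, or confidential survey arrives."""
        self.tracking.pop(tracking_id, None)  # delete name/ID from tracking file
        self.admin.pop(tracking_id, None)     # triggered deletion from admin file

    def nonrespondent_codes(self):
        # By study's end, only non-respondents remain; names would be
        # stripped before the codes are used for analysis.
        return [code for (_, code) in self.admin.values()]
```

Respondents’ codes are then recovered solely from the stickers on their returned questionnaires, never from the deleted records.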

Measures

Primary outcomes

The primary outcome was response rate, calculated as the number of returned surveys divided by the number of veterans assigned to each arm.

Secondary outcomes

Secondary outcomes included the representativeness of respondents, the percentage of veterans fully completing all sensitive survey items, and the percentage disclosing sensitive information. Information collected by the survey that we thought might be sensitive included veterans’ experiences of sexual abuse, including sexual assault during the time of Gulf War I; other traumatic experiences, including combat and childhood physical abuse; mental health problems, including depression, PTSD, and problem drinking; and veterans’ sexual orientation.

Representativeness of respondents

We used data from the pre-merged sticker to compare respondents to non-respondents. Available data included age greater than or equal to 50 years versus younger, service in the Army versus other branch, VA disability benefit status (receiving versus not), and any VA health care utilization versus none. Specifically, we assessed whether the participant had made a visit to any VA medical facility in the prior year for any reason or had made visits to a VA facility for primary or mental health care. The term “original sample” refers to all veterans selected for the survey, regardless of their response status. “Responders” and “respondents” refer to the subset of veterans from the original sample who returned surveys, and “non-responders”/“non-respondents” refer to the subset of veterans who did not return surveys.

Sensitive information

Sensitive information was collected by the survey and included the following:

Sexual abuse

We used 3 items from the Sexual Harassment Inventory’s criminal sexual misconduct scale [21] plus one additional item [22] to assess sexual assault during the time of Gulf War I, 4 items from the Sexual Abuse subscale of the Childhood Trauma Questionnaire [23] to assess childhood sexual abuse, and one item from the Life Stressor Checklist [24] to assess any sexual assault in the past year. A positive response to any one of these questions indicated a history of sexual abuse.

Other traumatic experiences

Other traumatic experiences included combat exposure, assessed using an adapted version of the Combat Exposure Inventory [25]; childhood physical abuse, assessed using items from the Childhood Trauma Questionnaire’s relevant subscale [23]; and past-year traumas, assessed using an adaptation of the Life Stressor Checklist [24]. Veterans who endorsed any childhood physical abuse item more than “rarely” were considered physically abused.

Mental health problems

We used the 5-item RAND Mental Health Battery [26] to assess depression, the Penn Inventory for PTSD [27] to assess PTSD symptoms, and the TWEAK [28] to assess alcohol misuse.

Sexual orientation

Sexual orientation was assessed using a single survey item that read, “People are different in their sexual attraction to other people. Which best describes your feelings?” Responses ranged from 1 = “Completely heterosexual or ‘straight’” to 5 = “Completely homosexual or ‘gay’”. Responses were dichotomized as “Completely heterosexual” versus “Not completely heterosexual” for analysis.

Power

The study was funded to examine different incentives’ impact on response rate and had 80% power to detect a 10% difference in response rates across incentives, assuming a 60% response rate in the $10 group and two-tailed alpha of 0.05.

Analysis

The study was intended to examine main effects, but interactions were assessed in an exploratory fashion. Results are reported for tracking/privacy condition first; incentive condition second; and, when tested, interactions third. We used χ2 tests to compare outcomes across privacy conditions and incentives and to compare respondents and non-respondents. We used logistic regression to test for interactions between tracking/privacy condition and incentive on outcomes. We used IBM SPSS Statistics (version 19) and SAS (version 9.2) statistical packages for analyses.
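For a 2×2 comparison (e.g., responded versus not, by arm), the χ2 test can be computed directly. The sketch below is illustrative only, since the authors used SPSS and SAS; it relies on the fact that for 1 degree of freedom the p-value is P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table of counts:
    rows = groups, columns = outcome yes/no; cells are [a, b; c, d]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts under independence: (row total * column total) / n
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square, df=1
    return stat, p
```

A perfectly balanced table yields a statistic of 0 (p = 1), while larger departures from independence drive the statistic up and the p-value down.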

Masking, disclosure, and ethical approval

Data collectors and analysts were aware of study group assignment. The Minneapolis VA Medical Center’s Subcommittee for Human Studies approved the protocol.

Results

Response rate

Response rate overall was 60.5% and did not differ significantly across tracking/privacy assignments (Confidential response rate = 56.0%, Anonymized-Envelope response rate = 63.3%, Anonymous-Postcard response rate = 62.3%, p = 0.49). However, the response rate was more than 15 percentage points higher among veterans randomized to receive the $20 incentive (response rate = 68.1%) compared to the $10 incentive (response rate = 52.8%, p = 0.007). While the lowest response rate was obtained from men randomized to the Confidential/$10 incentive group (response rate = 43.6%; see Figure 1), tests for interactions between tracking/privacy and incentives on response rate were not statistically significant (p = 0.46).

Respondent representativeness

As Table 1 shows, randomization failed to evenly distribute the 324 veterans according to their past-year VA health care utilization. Specifically, veterans randomized to the Anonymous-Postcard condition were less likely to have made a VA health care visit of any kind in the past year than were veterans randomized to the Anonymized-Envelope and Confidential groups (67.6% versus 75.4% in the other two conditions). Otherwise, randomization successfully distributed all the remaining administrative characteristics evenly across all the tracking/privacy and incentive conditions.

Table 1 Population characteristics by tracking/privacy condition and incentive; results reported as a percentage (%)

The characteristics of survey responders are shown in Table 2. Responders in the Anonymized-Envelope group had a higher proportion of individuals aged 50 years or older, a lower proportion of white persons, and a lower proportion of persons working for pay compared to the other two groups, but none of these differences were statistically significant (all p’s > 0.18). Consistent with the original sample’s maldistribution, Anonymous-Postcard respondents were less likely than other respondents to have made a visit of any kind to a VA medical facility in the prior year. Compared to the other tracking/privacy conditions, Anonymous-Postcard respondents were also substantially less likely to have made a mental health care visit to a VA medical facility, but this could not be attributed to a maldistribution of the original sample. Compared to the administrative record, all respondents substantially under-reported receiving VA disability benefits.

Table 2 Respondent characteristics by tracking/privacy condition and incentive, results reported as a percentage (%)

Respondents in the $10 incentive arm were significantly older, less likely to be working for pay, and more likely to say they received VA disability benefits than respondents in the $20 incentive arm. Both groups substantially underreported their receipt of VA disability benefits compared to the administrative record. There were no statistically significant tracking/privacy-by-incentive interactions (all p’s > 0.20).

Table 3 presents information for the original sample, stratified by response status and by study assignment. Findings show that Confidential and Anonymized-Envelope respondents differed significantly from their non-respondent counterparts in terms of age and service branch. Compared to their non-respondent counterparts, Confidential respondents were also more likely to be receiving VA disability benefits, and Anonymized-Envelope respondents were more likely to have made VA primary care and mental health visits. There were significant age differences among respondents and non-respondents randomized to receive $10, but respondents and non-respondents did not differ significantly on any available characteristic among those assigned to the Anonymous-Postcard or $20 incentive. There were no significant tracking/privacy-by-incentive interactions.

Table 3 Characteristics of original sample, stratified by response status and by tracking/privacy condition and incentive; results reported as a percentage (%)

Percentage fully completing sensitive items and percentage disclosing sensitive information

As Table 4 shows, with the exception of combat items, respondents answered every item on each of the potentially sensitive scales more than 90% of the time, regardless of tracking/privacy condition or incentive. Twenty-six questions were used to assess combat exposure, which may explain why it had the most skipped items (10.7% overall); even so, respondents were far more likely to skip a combat item than a PTSD item (3.1% overall), despite the PTSD scale also containing 26 questions. The sexual abuse questions were the second most likely to be skipped (7.1% overall). There were no statistically significant associations between tracking/privacy assignment and completion of sensitive survey items. Likewise, higher incentive was not associated with greater completion of sensitive survey items, and there were no interactions between tracking/privacy assignment and incentive.

Table 4 Percentage (%) of respondents fully completing all items in a potentially sensitive scale by tracking/privacy condition and incentive

As Table 5 shows, Anonymized-Envelope respondents were substantially more likely than other respondents to disclose a history of sexual abuse. Several other contrasts appeared numerically large, even though they did not reach statistical significance: Anonymous-Postcard respondents reported more childhood physical abuse (p = 0.06) and had fewer positive depression screens compared to the other tracking/privacy groups (p = 0.09), and Confidential respondents reported more combat (p = 0.09) and had more positive PTSD screens (p = 0.08).

Table 5 Percentage (%) of respondents disclosing potentially sensitive information by tracking/privacy condition and incentive

Main effects in disclosing sensitive information by incentive did not reach statistical significance. However, there was a trend toward statistical significance in the proportion of respondents randomized to the $10 incentive with a positive depression screen compared to the $20 respondents (p = 0.08). Among Anonymized-Envelope respondents, those randomized to the $10 incentive were substantially more likely to screen positive for PTSD than those in the $20 arm (90.6% v. 65.8%; p = 0.05). Otherwise, there were no tracking/privacy-by-incentive interactions.

Discussion

In this randomized controlled trial, more survey privacy was not associated with statistically significantly higher response rates compared to less privacy, nor did tracking/privacy condition affect the proportion of respondents who actually answered our sensitive questions. Instead, each tracking/privacy condition attracted its own unique pool of respondents, which in turn may have influenced our group-specific estimates of sexual abuse, childhood physical abuse, combat, and mental health problems—despite the fact that all participants originated from the same sampling frame. Estimates of sexual abuse, for example, were more than 2 times higher in the Anonymized-Envelope condition than in the other two conditions.

As expected, the higher incentive resulted in a substantially higher response rate than the lower incentive, but there was no association between incentive and the proportion answering our sensitive questions. As with the tracking/privacy manipulation, each incentive appeared to attract its own unique pool of respondents, with the larger incentive attracting younger respondents who were working for pay and who were less likely to say they were receiving disability benefits. Statistically, prevalence estimates for potentially sensitive or stigmatizing material did not differ significantly by incentive, despite some numerically large differences. For example, more than half of respondents randomized to the $10 incentive screened positive for depression, compared to about a third of respondents in the $20 arm.

According to leverage-salience theory [29], individuals attend to different criteria when deciding to return a survey and, further, assign to each criterion different weights and importance. These are known as “leverages”. In the present study, each tracking/privacy and incentive condition appeared to trigger a different set of leverages, so that unique subpopulations selectively participated in each of the study’s arms. When considering sensitive material, therefore, one cannot assume that the survey method generating the highest estimate is most accurate.

Since Anonymous-Postcard respondents did not differ significantly from non-respondents on available measures, one might be tempted to conclude that this tracking/privacy method generated the most representative sample of respondents and hence the most accurate prevalence estimates. If so, one would also have to conclude that the Anonymized-Envelope approach over-recruited sexual abuse survivors. History of sexual abuse was 13.6% among Anonymous-Postcard respondents and 33.3% among Anonymized-Envelope respondents. However, we have shown elsewhere that, even when using Anonymized-Envelopes, survey respondents underreport their military sexual assault experiences by a factor of three [30]. This suggests that the Anonymized-Envelope method either reduces under-selection of veterans with sexual abuse histories or optimizes more “honest” reporting among those who have such histories—or both—compared to the Anonymous-Postcard method. It may do so, however, at the expense of either over-excluding veterans with a history of childhood physical abuse or discouraging “honest” reporting of childhood abuse. In the present study the Anonymized-Envelope method generated a substantially lower, albeit not statistically significant, estimate of childhood physical abuse of 59.4% compared to the Anonymous-Postcard’s estimate of 75.8%.

In general, tracking/privacy condition and incentive level appeared to affect respondent representativeness independently, with incentives’ principal impact being the recruitment of younger and healthier participants. These findings may be reassuring to Human Studies oversight boards, who might otherwise worry that large incentives coerce the sickest and most vulnerable into survey research participation. Halpern et al. [31] have shown that higher payment levels do not override research participants’ risk perceptions when considering whether to enroll in clinical trials, and, furthermore, that poorer, presumably more vulnerable participants are actually less sensitive to higher incentive levels than are wealthier participants. Similar findings have been reported for those deciding whether to respond to a survey [32].

The present study offers proof-of-concept for pre-merged questionnaires’ utility. However, pre-merged questionnaires will prove most powerful when they incorporate administrative information that is highly related to the survey’s topic (e.g., sexual abuse, childhood abuse) instead of basic demographic information. Because we did not have such information for the present study, we cannot say whether our differing estimates for these sensitive data across the three tracking/privacy conditions were a function of reducing or inflating selection biases, a function of enhancing or impeding “honest” reporting, or both. Future research will be needed to explore these issues further. It may well be that different tracking/privacy methods will prove best for different sensitive topics.

We used a computerized system to manage the tracking and administrative data interface in the present study, but the pre-merged questionnaire concept could easily be applied to manual methods. For example, in a study using up to three survey mailings per subject, one could pre-print 3 stickers per subject, file them under each subject’s name, and then discard any remaining stickers once the subject’s postcard or envelope ID was returned. By study’s end, only non-respondents’ stickers would remain.

Pre-merged questionnaires carry important limitations. Researchers must be selective in what data they encode to keep the sticker from becoming uniquely identifying. If too much information is included, participants might become identifiable based on their unique combination of administrative data. We dichotomized age and service branch in the present study for this reason. Pre-merged questionnaires also cannot capitalize on new information. Health care visits occurring after a survey is mailed cannot be linked into a dataset, for example. Nonetheless, the technique offers an advance over usual anonymous methods, particularly in its ability to assess for non-response bias, and it could easily be applied to other sensitive topics.
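The selectivity requirement above can be checked before mailing. The sketch below (illustrative code with assumed data; the authors describe no such tool) counts how many sampling-frame members share each combination of encoded fields, akin to a k-anonymity check: if any combination is held by only a handful of people, the encoding is too fine-grained.

```python
from collections import Counter

def min_cell_size(records, fields):
    """Smallest number of people sharing any one combination of the
    encoded fields; larger is safer for anonymity."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    return min(combos.values())

# With coarse, dichotomized fields (e.g., age >= 50 yes/no, Army yes/no),
# only a few combinations exist, so each cell stays large in a big frame.
frame = [{"age50": a, "army": b} for a in (0, 1) for b in (0, 1)
         for _ in range(3)]
smallest = min_cell_size(frame, ["age50", "army"])
```

Dichotomizing age and service branch, as the study did, is exactly this kind of coarsening: it collapses many distinct people into each cell.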

This study’s strengths include its randomized, controlled design and its demonstration of a unique technique to overcome what has historically been an important limitation of anonymous methods, namely, an inability to evaluate non-response bias. We also compared two tracking methods that can be used in anonymous surveys. Limitations include the study’s relatively small and unique sample. Since we did not have access to verifying information, we cannot say how honestly participants reported their experiences. The findings’ generalizability to other sensitive topics, to non-veterans, or to women is also uncertain. The study was powered to examine main effects of incentives on response rates, and we may have made Type II errors when examining secondary outcomes, effects of the different tracking/privacy conditions, and potential interactions. When findings appeared suggestive, however, we described them in the text. We also made multiple comparisons, which may have inflated our Type I error.

Conclusion

We anticipated that greater privacy and larger incentives would be associated with a higher response rate, better participant representativeness, greater survey completeness, and greater disclosure of potentially sensitive information. Results showed no association between privacy and response rate or survey completeness, supported the association between greater privacy and participant representativeness, and yielded mixed effects for the disclosure of sensitive information. A larger incentive was associated with a higher response rate and better participant representativeness but not with survey completeness. In the intermediate privacy arm, the lower incentive—not the higher—was associated with reporting more PTSD symptoms. Otherwise, we found no statistically significant associations between incentive and disclosing potentially sensitive information.

Having shown that different tracking/privacy conditions yielded different estimates of sensitive information, we cannot, unfortunately, tell which estimate was most accurate. Traditionally, higher disclosure rates of sensitive or stigmatizing information have been interpreted as being more accurate than lower rates, but our data suggest that apparently different disclosure rates may simply be a function of the subpopulations successfully recruited into a survey. This possibility needs greater investigation. Pre-merged questionnaires bypassed many of the limitations historically associated with anonymous survey methods and could be used to explore non-response issues in future research.

Authors’ information

MM, MAP, and MRP are core-investigators; AKB is data manager; and SN is core statistician for the Center for Chronic Disease Outcomes Research at the Minneapolis VA Medical Center. ABS is a former Center for Chronic Disease Outcomes Research data manager and currently works in the Health Economics Program, Minnesota Department of Health, St. Paul, MN. JPG is a former Center for Chronic Disease Outcomes Research statistician.