Key Points

Pretesting is one of several essential stages in the design of a high-quality discrete-choice experiment (DCE) and involves engaging with representatives of the target population to improve the readability, presentation, and structure of the preference instrument.

There is limited available guidance for pretesting DCEs and few transparent examples of how pretesting is conducted.

Here, we present and apply a guide which prompts researchers to consider aspects of the content, presentation, comprehension, and elicitation when conducting a DCE pretest.

We also present a pretesting interview discussion template to support researchers in operationalizing this guide in their own DCE pretest interviews.

1 Introduction

Discrete-choice experiments (DCEs) are a frequently used method to explore the preferences of patients and other stakeholders in health [1,2,3,4,5]. The growth in the application of DCEs can be explained by an abundance of foundational theory and methods guidance [6,7,8,9], the establishment of good research practices [1, 8, 10, 11], and interest in the approach by decision makers [12, 13]. In recent years, greater emphasis has been placed on confirming the quality and internal and external validity of DCEs to ensure their usefulness, policy relevance, and impact [5, 14,15,16].

The value that decision makers place on DCE findings depends in large part on the quality of the instrument design process itself. Numerous quality indicators of DCEs have been discussed in the literature, including validity and reliability [5], match to research question [17], patient-centricity [18], heterogeneity assessment [19], comprehensibility [20], and burden [21]. Developing a DCE that reflects these qualities requires a rigorous design process, which is often achieved through activities such as evidence synthesis, expert consultation, stakeholder engagement, pretesting, and pilot testing [15, 17]. Of these, there is ample guidance on activities related to evidence synthesis [22, 23], including qualitative methods [24, 25], stakeholder engagement [26, 27], and pilot testing [11, 27].

By contrast, there remains a paucity of literature on the procedures, methodologies, and theory for pretesting DCEs; even studies that report having completed pretesting typically provide minimal explanation of their approach. Existing literature on pretesting DCEs has typically reported on pretesting procedures within an individual study, rather than providing generalized or comprehensive guidance for the field. Practical guidance on how to conduct pretesting for all components of a DCE is needed to help establish a shared understanding of, and transparency around, pretesting. Ultimately, this information can lead to improvements in the overall DCE design process and greater confidence in findings from DCE research.

This paper has three objectives. The first objective is to define pretesting and describe the pretesting process specifically in the context of a DCE. The second objective is to present a guide and corresponding interview discussion template which can be applied by researchers when conducting a pretest of their own studies. The third objective is to provide an illustrative example of how these resources were applied to the pretest of a complex DCE instrument aimed at eliciting trade-offs between personal privacy and societal benefit in the context of a police method known as investigative genetic genealogy (IGG).

2 What is Pretesting?

Pretesting describes the process of identifying problem areas of surveys and making subsequent modifications to rectify these problems. Pretesting can be used to evaluate and improve the content, format, and structure of a survey instrument. It generally does this by engaging members of the target population to review and provide feedback on the instrument. Additionally, pretesting can be used to reduce survey burden, improve clarity, identify potential ethical issues, and mitigate sources of bias [28]. Pretesting is considered critical to improving survey validity in the general survey design field [29]. Empirical evidence demonstrates that pretesting can help identify problems, improve survey quality and reliability, and increase participant satisfaction in completing surveys [30].

Pretesting typically begins after a complete survey draft has been designed. It takes place between a participant from the target population and one or more survey researchers. It is typical to explain to the participant that the activity is a pretest and that their responses will be used to inform the design of the survey. Researchers often take field notes during the pretest. After each individual pretest, or at most after a small set of pretests, research teams debrief to review findings and make survey modifications. The survey is iterated throughout this process.

Several approaches can be used to collect data during a pretest of a survey generally, as well as specifically for surveys including DCEs [31]. One approach is cognitive interviewing, which tests the readability and potential bias of an instrument through prospective or retrospective prompts [32]. Cognitive interviewing can ask participants to “think aloud” over the course of the survey, allowing researchers to understand how participants react to questions and how they arrive at their answers, as well as to follow up with specific probes. Another approach is debriefing, wherein participants independently complete the survey or a section of it. Researchers then ask participants to reflect on what they have read, describe what they believe they were asked, and comment on any specific aspects of interest to researchers, such as question phrasing or the order of survey content [33].

In behavioral coding approaches, researchers observe participants as they silently complete the activity, noting areas of perceived hesitation or confusion [33]. This is sometimes done through eye-tracking approaches, wherein eye movements are studied to explore how information is being processed [34]. Pretesting can also occur through codesign approaches, which are more participatory in nature. In a codesign approach, researchers may ask participants not just to reflect on the instrument as it is presented but to actively provide input that can be used to refine the instrument [35]. Across all methods, strengths and weaknesses of the instrument can be identified inductively or deductively.

Pretesting in the explicit context of choice experiments has not been formally defined. Rather, it has been used to describe a range of exploratory and flexible approaches for assessing how participants perceive and interact with a choice experiment [1, 36]. Recently, there has been greater emphasis on the interpretation, clarity, and ease of using choice experiments [37] given their increasing complexity and administration online [38,39,40]. We propose a definition of pretesting for choice experiments here (Box 1).

Pretesting of DCEs is as much an art as it is a science. Specifically, pretesting is often a codevelopment type of engagement with potential survey respondents. This engagement can empower pretesting participants to suggest changes and to highlight issues. The research team (and potentially other stakeholders) works with these pretesting participants to solve issues jointly. As a type of engagement (as opposed to a qualitative study), we argue that it is process-heavy, with the desired outcome of the engagement often being the development of a better instrument. Pretesting may also be incomplete and involve making judgment calls about what may or may not work or what impact certain additions or subtractions may have.

Box 1. Defining pretesting in choice experiments

One of the key stages of developing a choice experiment, pretesting is a flexible process where representatives of the target population are engaged to improve the readability, presentation, and structure of the survey instrument (including educational material, choice experiment tasks, and all other survey questions). The goal of pretesting a DCE is to improve the validity, reliability, and relevance of the survey, while decreasing sources of bias, burden, and error associated with preference elicitation, data collection, and interpretation of the data.

Additional considerations above and beyond those made during general survey design are required when pretesting surveys that include DCEs. Specific efforts should be made to improve the educational material used to motivate and prepare people to participate in the survey. A great deal of effort should be placed on the choice experiment tasks and the process by which information is presented, preferences are elicited, and tradeoffs are made [11, 25, 38, 41]. More than one type of task format and/or preference elicitation mechanism may be assessed during pretesting. It is also important to assess all other survey questions to ensure that data are collected appropriately and to assess the burden and impact of the survey. Pretesting can be aimed at reducing any error associated with preference elicitation and data collection. Information garnered during pretesting may help generate hypotheses or give the research team greater insight into how people make decisions. Hence, pretesting, as with piloting a study, may provide insights that help with the interpretation of the data from the survey, both a priori and a posteriori.

Pretesting is one of several activities used to inform DCE development (Table 1). Activities that precede pretesting include evidence synthesis through activities such as literature reviews, stakeholder engagement with members of the target population, and expert input from professionals in the field. These activities can be used to identify and refine the scope of the research questions as well as to develop draft versions of the survey instrument. Pilot testing typically follows pretesting. Although the terms have sometimes been used interchangeably [42], pretesting and pilot testing are distinct aspects of survey design with unique objectives and approaches. While both methods generally seek to improve surveys, pretesting centers on understanding areas in need of improvement, whereas quantitative pilot testing typically explores results and whether the survey questions and choice experiment are performing as intended.

Table 1 Stages of choice experiment design

3 A Guide for Pretesting Discrete-Choice Experiments

We developed a practical guide to help researchers conduct more thorough pretesting of DCEs. This guide is based on our research team’s practical experience in pretesting dozens of DCE instruments. The guide is organized into four domains for assessment during the pretest of a DCE (content, presentation, comprehension, and elicitation) and poses guiding questions for researchers to consider across each domain (Table 2). These questions are not meant to be asked verbatim of pretesting participants but rather are ones that researchers might ask themselves to help guide their own pretesting process. This guide is relevant to inform the pretesting of DCE materials including background/educational material, example tasks, choice experiment tasks, and any other survey questions related to the DCE, such as debriefing questions. In practice, there is often overlap across questions relevant to different domains.

Table 2 Guide for Pretesting DCEs

The content domain of the guide primes researchers to consider the relevance and comprehensiveness of the DCE. Addressing the content of the DCE includes refining and reframing attributes and levels. Additional considerations in this domain include whether attributes collectively capture the concept of interest, whether attribute levels are presented in an appropriate and logical order, and whether the level values are appropriate. Finally, the elicitation question should be scrutinized for its relevance to the decision context.

The presentation domain of the guide recommends that researchers consider aspects of the DCE such as its visualization and formatting. Addressing the presentation of the DCE could include identifying areas of high response burden and discussing ways to reduce that burden (e.g., altering the organization of the survey, creating a color-coding scheme). Additional aspects for consideration include whether the visual aspects of the DCE, such as color, images, and layout, effectively convey the intended messaging; whether materials, including the introduction and example tasks, are informative; whether materials are logically ordered; whether numeric and risk concepts have been optimally presented; and whether the presentation can be further optimized to reduce time, cognitive, or other burdens on the participant.

The comprehension domain of the guide reminds researchers to consider how well the DCE is being understood by participants. Addressing the comprehension of the DCE could include identifying if key terms are understood and recalled and if materials are consistently comprehensible. Additionally, it is valuable to assess whether participants are able to envision the proposed scenario and decision context in which they are being asked to make choices.

The elicitation domain of the guide refers to the process of making a choice within a given choice task. Addressing the elicitation of the DCE could include identifying whether and/or why tradeoffs are being made, if underlying heuristics are driving decision making, if choices are being made based on outside information, and if participant preferences are reflected by their choices.
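Where teams want to operationalize these four domains during note-taking and debriefs, a lightweight structured representation can help. Below is a minimal sketch in Python under our own assumptions: the domain names mirror the guide, but the illustrative prompts and the hypothetical field_note helper are not part of the published guide or Table 2.

```python
# Minimal sketch: the four pretesting domains from the guide, stored as a
# dictionary so a team can log observations against each domain during
# debriefs. The example prompts are illustrative, not an exhaustive list.

PRETESTING_DOMAINS = {
    "content": [
        "Do the attributes collectively capture the concept of interest?",
        "Are the attribute levels appropriate and logically ordered?",
    ],
    "presentation": [
        "Do color, images, and layout convey the intended message?",
        "Can time or cognitive burden be reduced?",
    ],
    "comprehension": [
        "Are key terms understood and recalled?",
        "Can participants envision the decision scenario?",
    ],
    "elicitation": [
        "Are tradeoffs being made, and why?",
        "Do choices reflect stated preferences, or outside information/heuristics?",
    ],
}


def field_note(domain: str, observation: str, action: str = "") -> dict:
    """Record a pretest observation tagged to one of the guide's domains."""
    if domain not in PRETESTING_DOMAINS:
        raise ValueError(f"Unknown domain: {domain!r}")
    return {"domain": domain, "observation": observation, "proposed_action": action}


if __name__ == "__main__":
    note = field_note(
        "comprehension",
        "Participant could not recall a key term introduced in the video.",
        "Replace video with short text description plus comprehension checks.",
    )
    print(note)
```

A structure like this also makes it easier to see, at debrief, which domains are accumulating unresolved issues and which appear saturated.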

4 Applying the Guide to a Complex DCE

In this section, we describe how we applied the novel pretesting guide to inform the design of a DCE regarding preferences for the use of a new police method called investigative genetic genealogy (IGG). IGG is the practice of uploading crime scene DNA to genetic genealogy databases with the intention of identifying the criminal offender’s genetic relatives and, eventually, locating the offender in their family tree [43]. Although IGG has brought justice to victims and their families by helping to close hundreds of cases of murder and sexual assault, including by identifying the notorious Golden State Killer [44], there are concerns that IGG may interfere with the privacy interests of genetic genealogy database participants and their families. Studies predating IGG have demonstrated that individuals have concerns about genetic privacy, yet they are still sometimes willing to share their genetic data under specific conditions [45,46,47,48].

Understanding the tradeoffs that the public makes when assessing the acceptability of the use of genetic data during IGG has become increasingly important as policy makers consider new protections for personal data maintained in commercial genetic databases and restrictions on the practice of IGG [49,50,51]. To help inform these conversations, we designed a DCE to measure public preferences regarding when and how law enforcement should be permitted to participate in genetic genealogy databases.

4.1 Approach

Our study team followed good practices in choice experiment design. A literature search was conducted to understand the context for the forthcoming choice experiment by exploring salient ethical, legal, and social implications of IGG. We sought expert input through a series of qualitative interviews with law enforcement, forensic scientists, genetic genealogy firms, and genetic genealogists to obtain a technically precise and comprehensive description of current IGG practices and forecasts of its future. Public input on IGG was elicited through eight US-based, geographically diverse focus groups to identify what the general population believes are the most salient attributes of law enforcement participation in genetic genealogy databases. The findings from these activities informed the elicitation question, identification and selection of attributes and levels, and use of an opt-out in the initial version of the DCE.

Pretesting of the DCE occurred from October–November 2022 (Fig. 1). Pretest participants from our target population were recruited through the AmeriSpeak panel, a US-based population panel [52]. All interviews were conducted over Zoom and recorded. One research team member led the pretest interview, while another took field notes on a physical copy of the survey and flagged areas for potential modification. We followed a hybrid cognitive-interview and debriefing-style pretest and set expectations for participants by stating that “sometimes we will ask you to read some sections of the survey aloud, and other times we will ask you to read in your head but to let us know if there is anything unclear.”

Fig. 1 DCE development timeline

We applied a version of our pretesting discussion template (Box 2) with increased specificity regarding the IGG content of the DCE. This discussion template operationalizes the domains and overarching questions posed by the guide and organizes them according to the typical flow of a DCE embedded in a survey. The pretest interview discussion template provides examples of questions to be asked of participants directly when reflecting upon the different sections of the DCE within the survey, including the introduction to the choice experiment, review of attributes/levels, example task, choice tasks, debriefing, and finally format and structure considerations pertinent to the DCE. Interviews explored the domains of the guide for all aspects of the DCE. Tradeoffs were assessed in several ways, including asking participants to think aloud as they completed choice tasks to ascertain how they were making tradeoffs, probing why one profile was chosen over another and whether tradeoffs needed to be made when making that choice, and probing whether any attributes or levels were more impactful in their decision making than others.

Box 2. Pretesting interview discussion template

Introduction to the choice experiment

This is the introduction to the next part in the survey. Do you mind reading this aloud?

Can you read the description out loud? Is anything unclear?

Do you have any issues or questions?

Review of attributes/levels (one attribute at a time)

Can you explain the attribute and its levels in your own words based on what you know so far?

Are the descriptions of the attribute and its levels clear? Would you make any changes to how they are described?

What do you think about the levels of the attribute presented? Are they too similar or too different?

Do you think the levels for this attribute are presented in the right order?

Example task

Can you explain to me in your own words what you think is being asked in this task?

Is the choice we are asking you to make clear? Can you imagine the scenario we are describing here?

Do you think you could answer this question?

Was this example task useful? Do you feel prepared to do the next task on your own?

Choice tasks

Can you think aloud for me as you review this choice task?

Do you like either of these profiles better than the other?

If so, can you explain why?

Was there any tradeoff you had to make in choosing that profile?

Was there something you liked better in the profile you didn’t choose?

Was there anything in the profile you chose that was not ideal?

Were any of the attributes more important in your decision making than others?

Is there any information outside of what we’ve presented here that factored into your choice?

Should there have been an opportunity to choose neither profile? As in, to opt out of making a choice?

How many of these tasks could you do before getting tired? Is there anything we could change to make it easier to do more of these tasks?

[If presenting multiple choice task formats] Which of these formats do you like more?

Debriefing

How consistent were your answers with your preferences?

How easy were the choice tasks to understand?

How easy were the choice tasks to answer?

How long do you think it would have taken you to complete X tasks?

Format and structure

What kind of device do you usually use to take surveys (e.g., phone, computer)?

If by phone, what orientation do you typically use when taking surveys on your phone? Would you flip it sideways (landscape) to accommodate a wide choice task, or would you prefer it to stay up/down (portrait)?

Input from pretest interviews was routinely integrated into new versions of the survey. Suggested modifications for which there was high consensus across the research team (e.g., changes to syntax) were made immediately following a pretesting interview. Lower-consensus modifications and more substantive changes to the instrument were made every three to four pretests, or as soon as the problem and solution became clear to the research team. All activities were approved by the Baylor University IRB (H-47654).

4.2 Results

In total, 17 pretesting interviews were conducted with sociodemographically diverse participants in the USA. Interviews averaged 50 minutes in length (range 30–74 min) for a roughly 20-min survey. Substantial modifications to the DCE were made within each of the four domains of the guide (Table 3). Regarding content, early in pretesting, participants indicated that they had difficulty fully understanding and connecting with each attribute and level introduced. To address this, attributes and levels were reframed to include meaningful potential benefits not initially accounted for, encouraging better understanding of why each attribute and level was being discussed. Following this change, participants expressed that they were able to provide more genuine responses in the DCE.

Table 3 Application of the pretesting guide to a DCE on IGG

Initially, the DCE choice tasks simply listed the appropriate attribute levels for each profile. For all choice tasks, attributes remained in the same position with levels changing according to the experimental design. While pretesting, participants expressed that it was difficult to distinguish differences between the two profiles. To address this, attribute levels were color coded, allowing for easier comparison across attribute levels in each profile. As a result, we observed a decrease in the amount of time it took for participants to complete the tasks and greater certainty in their choices.
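To make concrete how such a layout can be produced, the sketch below shows one way a choice task might be rendered with attributes in fixed rows, level values drawn from a row of the experimental design, and a simple per-level color code to ease comparison across profiles. The attributes, levels, colors, and design row are hypothetical placeholders, not the actual IGG instrument or design.

```python
# Minimal sketch: render one choice task (two profiles) as an HTML table.
# Attribute order is fixed across tasks; only the levels change, according to
# the experimental design. Each level index gets a consistent background color.
# All names and values below are hypothetical placeholders.

ATTRIBUTES = {
    "Crime type": ["Violent crime only", "Any crime"],
    "Waiting period": ["No wait", "1 year", "5 years"],
    "Oversight": ["Court approval required", "No approval required"],
}

LEVEL_COLORS = {0: "#d9ead3", 1: "#fce5cd", 2: "#cfe2f3"}  # one color per level index


def render_task(design_row: dict) -> str:
    """Render one choice task as an HTML table fragment."""
    rows = []
    for attribute, levels in ATTRIBUTES.items():  # attribute position is fixed
        a_idx, b_idx = design_row[attribute]      # level indices from the design
        cells = "".join(
            f'<td style="background:{LEVEL_COLORS[i]}">{levels[i]}</td>'
            for i in (a_idx, b_idx)
        )
        rows.append(f"<tr><th>{attribute}</th>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    # One row of a hypothetical design: level indices for profile A and B.
    example_row = {"Crime type": (0, 1), "Waiting period": (2, 0), "Oversight": (0, 1)}
    print(render_task(example_row))
```

Keeping the color assignment tied to the level (rather than to the profile) is what allows respondents to see at a glance where the two profiles differ.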

At the outset, to improve comprehension, we included a brief video in the survey that provided a general overview of IGG. We anticipated that this video would be more engaging than explaining key information about IGG via text. However, early in pretesting we established that the video was introducing too many key terms that were not featured in the survey itself. We also learned that pretest participants were inclined to skip videos and “get on” with the rest of the survey. To address this, we replaced the educational video with a short, high-level description of only the most relevant terms and included two comprehension questions to confirm understanding. Participants in later rounds of pretesting expressed satisfaction with this approach, and comprehension of key terms appeared to increase.

Attributes and levels evolved over the pretesting process (Fig. 2). Changes included reordering the levels within an attribute, rewording both attributes and levels, and adding/removing an attribute. The order in which levels were introduced was changed for the attribute “how long the police must wait before using IGG”; this change was prompted by a participant's suggestion and made the levels and their descriptions flow in a more sensible manner. Each interview provided insights into how individuals may interpret each of the attributes and levels; this prompted modifications, both major and minor, to the language used to label and describe the concepts, including simplifying language and being intentional with word choice. Changes were also made by taking into consideration how the attribute and level would be displayed in the choice task. The attribute “identification of unknown remains” was dropped after the first version of the survey and later replaced by “victim focus.” This change came about because interpretation and understanding of the initial attribute varied too much from person to person. Simplifying the language allowed for a clearer understanding of the attribute and its levels.

Fig. 2 Major changes in DCE across the pretesting process

The choice task itself also evolved during pretesting (Fig. 3). Each choice task initially contained the two profiles for participants to choose between, as well as the ability to choose that IGG cannot be used at all. Early in pretesting, some participants indicated that they would have a hard time completing the DCE portion of the survey without the ability to opt out of IGG because they “would never support the use of IGG” and therefore felt the profiles would be inconsistent with their preferences. The language used to describe the opt-out was modified throughout pretesting. With the opt-out option incorporated, we observed that participants who were opposed to the use of IGG found it easier to answer the questions and were more willing to complete all tasks in the section.

Fig. 3 Choice task before and after pretesting
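For readers interested in how an opt-out alternative of this kind is often handled at the data level, the sketch below shows one common coding scheme (it is not the authors' analysis code): each task is expanded to three rows, with the opt-out carrying an alternative-specific constant and zeros on the attribute columns. All column names and the example task are hypothetical.

```python
# Minimal sketch: code an opt-out ("IGG cannot be used at all") as a third
# alternative in long-format choice data. The opt-out row carries an
# alternative-specific constant (ASC) and zeros for the attribute columns.
# Column names and the example task are hypothetical placeholders.

import pandas as pd


def task_to_long(task_id: int, profile_a: dict, profile_b: dict, choice: str) -> pd.DataFrame:
    """Expand one choice task into three rows: profile A, profile B, opt-out."""
    rows = []
    for alt, attrs in [("A", profile_a), ("B", profile_b), ("optout", {})]:
        row = {"task": task_id, "alt": alt, "asc_optout": int(alt == "optout")}
        # Dummy-coded attribute columns for the profiles; zeros for the opt-out.
        for col in set(profile_a) | set(profile_b):
            row[col] = attrs.get(col, 0)
        row["chosen"] = int(alt == choice)
        rows.append(row)
    return pd.DataFrame(rows)


if __name__ == "__main__":
    long_df = task_to_long(
        task_id=1,
        profile_a={"wait_1yr": 1, "court_approval": 1},
        profile_b={"wait_1yr": 0, "court_approval": 0},
        choice="optout",  # respondent chose "IGG cannot be used at all"
    )
    print(long_df)
```

Coding the opt-out this way keeps the design matrix intact for the two profiles while letting the ASC absorb systematic opposition to IGG, which is one route to examining the preference heterogeneity noted in the Discussion.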

5 Discussion

In this paper, we explore the concept of pretesting in the context of DCEs, highlighting the unique considerations for pretesting choice experiments that go above and beyond those of survey research more generally. To address these considerations, we present a novel guide for pretesting DCEs. We anticipate that this guide will provide practical direction and help researchers conduct more comprehensive and relevant pretesting. Applying the guide to our own DCE helped identify multiple areas for improvement and resulted in substantial modifications to the instrument.

There is a need for greater transparency around all stages of DCE design, especially pretesting. Fewer than one-fifth of DCE studies report including pretesting in their development [53]. Among those that do report having done a pretest, it is not typical for them to report their specific methods and approaches to pretesting. Increasing transparency of reporting around pretesting will help identify the diversity of methods used for pretesting DCEs, help establish norms for pretesting, and facilitate a better understanding of how to conduct a high-quality pretest.

We hope that this guide will spur future work to establish good practices in pretesting. This guide can be used to promote a common understanding of what pretesting is, as a concept, and what the process entails. There is an array of interpretations of what pretesting is, with inconsistent methods and applications. Continued discussion of pretesting as a concept, as well as procedural guidance, is needed to ensure that pretesting incorporates good research practices. To support the development of good practices, research should seek to answer questions such as: When are certain pretesting approaches most effective? What are indicators of pretesting success? How do we ascertain when pretesting is complete and a survey is ready for pilot testing?

There is not typically a clear indication of when the pretesting phase of instrument design is complete. In the absence of established indicators of completeness, our team’s experience has been that the decision to stop pretesting relies on a range of factors, including budget, timeline, and perceived improvement in instrument quality. In the current application of pretesting a DCE for IGG, we conducted what we believe to be a rigorous pretesting process with 17 pretesting interviews. Our decision to close pretesting was based on our increasing confidence that participants were understanding the survey and the experiment embedded within it, that they were making tradeoffs, and that those tradeoffs reflected their preferences. The success of our pretesting process was ultimately reflected in the subsequent pilot test, which demonstrated that the DCE evoked tradeoffs consistent with those expressed by pretesting participants. Notably, the opt-out, which was included because of input during pretesting, offered meaningful insight into preference heterogeneity.

There are several aspects of this research which limit its generalizability. First, this guide was developed to advise primarily on questions relevant for pretesting DCEs rather than across broad methods of preference or priority elicitation. We anticipate that the domains posed in the guide are broadly applicable to all forms of preference elicitation, but the content of specific guiding questions would likely vary based on the method being applied. For instance, best–worst scaling case 1 does not use levels; therefore, questions about level ordering posed in the current guide are not relevant.

Second, pretesting is a semistructured, rapid-cycle activity that is not meant to be generalizable. The researcher is themselves an instrument in the pretesting process, and unique characteristics such as the researcher’s prior experience and knowledge of the research topic/population will influence how, when, and why changes are ultimately made to the survey. Researchers should be reflexive about their role in the instrument design process and should seek to understand their own motivations and rationales for making changes.

6 Conclusions

Pretesting is an essential but often under-described stage in the DCE design process. This paper provides practical guidance to help facilitate comprehensive and relevant pretesting of DCEs and operationalizes this guidance through a pretesting interview discussion template. These resources can support future activities and discussions to develop good practices for pretesting, which may ultimately facilitate higher-quality preference research with greater value to decision makers.