Background

In the quest to design new interventions to improve health care, health services research is routinely informed by studies and experiments that incorporate elements different from the real-world application. For example, when designing an intervention to reduce ordering of low-value tests in the ICU, the intervention need not be piloted only on ICU physicians within their day-to-day practice; instead, valid responses are expected to be obtained when data are collected outside of their day-to-day practice, from non-ICU physicians, or from medical students. A parallel is often drawn with pharmaceutical trials, where, prior to definitive trials, considerable preparatory research involves many ‘hypothetical’ elements, including animal models, pilot participants (e.g. patients or clinicians who may differ from the ultimate target group), hypothetical decisions (i.e. would you participate in a study like this?), and pilot settings (e.g. laboratories). The mechanisms studied in this preparatory research are expected to generalize to the ultimate clinical setting despite these hypothetical or modeled elements, and such preparatory work is considered essential to the overall goal of designing interventions that will work safely and effectively in real clinical settings.

When developing health services interventions, pilot research can incorporate many hypothetical elements. As a multidisciplinary field that studies how personal, organizational, technological, and systemic factors affect access to, quality of, and cost of health care [1], health services research often seeks to design complex interventions [2] that encourage changes in behaviour and decision making among actors (patients, providers, decision makers) within the system. To aid development of these complex interventions, initial work can include piloting decision support tools on healthy volunteers rather than patients, measuring physician performance in simulated settings, and surveying or interviewing people about how they would behave under various hypothetical circumstances.

Despite these tools at our disposal, health services research interventions have often proceeded to large-scale trials without adequate preparatory or pilot research [2,3,4,5]. The most recent UK MRC Framework for complex interventions [2] explicitly emphasizes the need to pilot these interventions, in part to model the mechanisms by which one expects the intervention to work before proceeding to large, expensive trials. The reasons for this lack of preparatory work in health services research are unclear, and may stem in part from a naïve sense of the ease with which such behaviours and decisions can be changed [5, 6]. The study of the mechanisms underlying how health services interventions work is still relatively new [5, 7, 8]. Perhaps as an implicit reaction to the lack of understanding around this issue, there is a disciplinary distrust of pilot data that involve ‘hypothetical’ elements; systematic reviews often exclude studies involving hypothetical elements [9,10,11] without adequate justification.

We propose that understanding the conditions under which health services studies with ‘hypothetical’ design elements can yield valid results is essential to advancing health services research. With so many elements in these complex interventions, conducting full-scale trials of every permutation is essentially impossible; comparing different combinations in smaller pilot studies with hypothetical elements is inevitable and necessary. While other disciplines (e.g. economics, [12] moral reasoning, [13] social psychology [14]) have explored the conditions under which hypothetical decisions accurately reflect real-world decisions, little of this work has been applied to problems of health services intervention design. As an initial step towards understanding how such factors might be relevant to designing health services interventions, we conducted a systematic concept review of factors that have been shown to be related to the consistency between hypothetical and real-world decisions or behaviours. Based on these findings, we proposed a preliminary framework for those seeking to design a pilot process with hypothetical elements, which summarizes and describes factors that may be related to ultimate validity with respect to real-world behaviours.

Methods

We conducted a systematic concept mapping review, which we define as a review with a systematic search strategy that seeks to delineate the factors related to one or more target concepts; as such, the approach overlaps with systematic reviews and mapping reviews [15]. In this case, we sought to describe and map factors related to ‘consistency’, defined as the association between hypothetical decisions and corresponding real-world decisions or behaviours. In the context of this review, consistency is operationalized liberally as the association between 1) a hypothetical task or pilot task that includes some hypothetical elements, and 2) a corresponding, author-defined ‘real-world’ task, described in the same report. These might include actual real-world tasks or incentivized tasks that the authors claim to represent a ‘real-world’ decision or behaviour. Using the PICO approach to defining studies included in our review, [16] we define our population (P) to include any human study, our interventions (I) to include any factors affecting the relationship between real and/or hypothetical decisions, the comparison (C) to include real vs. hypothetical decisions or behaviours, and the main outcome (O) to be the strength of consistency between those decisions/behaviours.
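To make this operationalization concrete, the sketch below computes consistency for a continuous decision measure as the Pearson correlation between paired hypothetical and real-world responses. This is only one plausible quantification among the many used in the included literatures, and all data and variable names are invented for illustration; it is not drawn from any included study.

```python
# Illustrative sketch only: quantifying 'consistency' as the association
# between paired hypothetical and real-world decisions. All names and
# values are invented; included studies used a variety of designs and
# association measures.
from statistics import correlation  # Python 3.10+

# Willingness-to-pay ($) stated in a hypothetical pilot task...
hypothetical_wtp = [10, 25, 5, 40, 15, 30, 20, 8]
# ...and the corresponding real (incentivized) decisions by the same people.
real_wtp = [8, 18, 5, 25, 12, 22, 15, 6]

r = correlation(hypothetical_wtp, real_wtp)  # Pearson's r by default
print(f"Consistency (Pearson r) = {r:.2f}")
```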

Search strategy

We modeled our reporting on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (the PRISMA Statement) [17]. Because the core issue has been explored in a variety of research areas, our review was designed to obtain information from diverse fields. Two of the authors (TH & JB) hand-searched the literature to identify a set of target articles that could serve as the foundation for the review. The nine target articles all identified multiple factors that could affect the relationship between hypothetical and real tasks; all were indexed in PsycINFO and/or Medline [18,19,20,21,22,23,24,25,26]. A health science librarian helped us develop an initial search strategy that included all target articles and involved keyword and title searches for ‘decision making or behaviour’, ‘hypothetical situations’, and ‘real-world situations’, including synonyms, relevant Medical Subject Headings (MeSH), etc. This search strategy was peer reviewed by a second librarian and modified to produce the final search strategy (see Appendix A). Our search strategy development was guided by the Peer Review of Electronic Search Strategies (PRESS) guideline [27]. We conducted electronic database searches on November 30th, 2015 and March 7th, 2019, a supplemental snowball search on December 9th, 2015, and a reverse citation search using Scopus and Web of Science for studies that cited our target articles.

Study selection

We conducted a title and abstract screen on all records and liberally included those that might yield factors relevant to the framework; any unclear records were included for further screening. Two of three available reviewers (TH, JB, or NH) independently screened the titles/abstracts for eligibility. The reviewers were not blinded to the journals or authors of the studies screened. To be included in the review, an article needed to clearly address the consistency between some type of hypothetical decision and a corresponding real decision or behaviour. Both empirical and commentary articles were included. Only studies published in English or in French were included. Studies were not excluded based on the setting, time frame, or the date of publication.

After title and abstract screening, the same three reviewers independently screened the full texts of the remaining studies. At this stage, studies were included only if they clearly presented a factor relevant to the framework. The reviewers resolved any disagreements through consensus, with JB acting as the final arbiter.

Data extraction

Three reviewers independently extracted data using a standardized data collection form, developed iteratively during the screening and data collection process, and the consensus resolution processes described above. They extracted basic study information (e.g. title, journal, date of publication) and data about each study’s research area, design, and research question. Research area was coded inductively into categories; design and research question were extracted verbatim from the articles. The type of data supporting each factor was coded as 1) a review of multiple articles supporting the relationship (Review); 2) empirical support from a single study or related set of studies (Empirical); or 3) a statement or hypothesis without empirical support (Hypothesis). Due to the heterogeneity of the included work in this broad concept mapping review (which included empirical, review, and theoretical work from many disciplines), we could not assess the risk of bias in individual studies, the quality of empirical support underlying each factor, or the risk of bias across studies.
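As a minimal sketch of this coding scheme, one extraction record might be represented as below; the field names follow the text, but the example values are invented and do not reproduce the actual form.

```python
# Illustrative sketch of one row of the data extraction form described above.
# Field names follow the text; example values are invented.
from dataclasses import dataclass
from enum import Enum

class SupportType(Enum):
    REVIEW = "Review"          # review of multiple articles supporting the relationship
    EMPIRICAL = "Empirical"    # support from a single study or related set of studies
    HYPOTHESIS = "Hypothesis"  # statement or hypothesis without empirical support

@dataclass
class ExtractedFactor:
    title: str
    journal: str
    year: int
    research_area: str     # coded inductively into categories
    design: str            # extracted verbatim
    factor_statement: str  # summarized from extracted quotes
    support: SupportType

example = ExtractedFactor(
    title="(invented) Hypothetical vs real donation decisions",
    journal="(invented) J Decision Res",
    year=2010,
    research_area="behavioural economics",
    design="within-subject comparison of hypothetical and incentivized choices",
    factor_statement="Greater certainty is associated with higher consistency",
    support=SupportType.EMPIRICAL,
)
print(example.support.value)
```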

We identified factors presented in the study by selecting quotes that named and described the relevant factor. Two coders (TH and NH) extracted the quotes from each study to describe how the factor affected the consistency between hypothetical and real decisions. These quotations were then summarized to produce initial factor statements. A third person (JB) supervised and corroborated this coding.

Data analysis and framework development

Our approach to data analysis resembled what Hsieh & Shannon (2005) call a “Conventional Content Analysis” [28]. This inductive approach is useful when existing theory around the phenomenon being described is limited [28]. Based on the extracted study quotations and initial factor statements, we developed standardized statements describing each factor in terms of whether it was predicted to increase or decrease consistency. The coders then made collaborative decisions about whether similar concepts should be combined into a single factor. Where possible, we used the authors’ own descriptions of the concepts to make these decisions.

As part of a preliminary framework development process intended to summarize and categorize the factor statements [29], raters made initial attempts at organizing the different factors into categories. After discussion yielded a mutually agreed-upon set of categories, thought to be largely mutually exclusive and potentially useful in thinking about how to design model studies, two coders (TH and JB) independently assigned each factor to a category; discussion resolved any conflicts. In situations where the sign of the association with consistency depended largely on phrasing (e.g. a positive association between consistency and ‘certainty’ might equally have been coded as a negative association between consistency and ‘uncertainty’), coding was decided based on clarity and the manner of presentation in the original articles.

Results

Figure 1 shows the PRISMA flow diagram for our concept review. After duplicates were removed, the abstracts of 2444 articles were screened; 2344 of these were screened out as unrelated to the topic of consistency between real and hypothetical decisions or behaviours, or as not published in English or French. The remaining 100 articles underwent full-text screening; 24 were excluded for lack of any identifiable factor relating hypothetical and real-world decisions or behaviours, and another 8 were excluded as too ‘context-specific’ (describing factors with limited application to health services interventions, e.g. ‘intention to conduct criminal acts’) or as unrelated to consistency. The remaining 68 articles came from a range of literatures, including behavioural economics (44 articles), the psychology of reasoning/behaviour (14 articles), social psychology (7 articles), health behaviours (4 articles), and neuroscience (5 articles). Together they identified 27 factors purported to modify the relationship between hypothetical and real-world decision making; details on the included articles appear in Appendix B. Our consensus process grouped these factors into 4 categories, described below. Tables 1, 2, 3 and 4 correspond to these categories, and provide the name and definition of each factor, its proposed relationship to consistency, the type of data supporting the relationship, and the corresponding citations.

Fig. 1 PRISMA flow diagram. From: Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009;6(7):e1000097. doi:10.1371/journal.pmed.1000097

Table 1 Decision maker factors
Table 2 Cognitive factors
Table 3 Task factors
Table 4 Matching hypothetical and real-world tasks

Decision maker factors

Decision maker factors are traits or capacities that relate directly to the decision maker themselves. Table 1 describes seven factors of the decision maker studied in relation to the extent to which hypothetical decisions match real-world decisions/behaviours. Relatively little data supported an association with basic demographic factors; for example, we were unable to find any clear associations with sex or ethnicity, although one study reported possible gender differences in its results [38]. More convincingly, another study of willingness-to-pay donation decisions reported greater consistency with

1) Greater age of the decision maker, and

2) Higher education of the decision maker [30].

More work has explored the extent to which capacities of the decision maker affect consistency, including

3) Cognitive control (higher cognitive control associated with lower consistency), and

4) Cognitive ability (higher scores associated with lower consistency); both findings were based on EEG studies in which participants chose between hypothetical or real lottery options [23, 25]. In these studies, those with greater cognitive capacity or control were hypothesized to incorporate a greater number of considerations into their decision making, making them less risk averse in hypothetical than in real situations.

5) Thinking dispositions (e.g. enjoying challenging ideas), where one study argued that such dispositions are related to greater consistency [21].

Several studies also explored apparently complex relationships between personality traits and consistency, including

6) Openness to experience, where higher openness may be negatively related to consistency in the context of moral cooperation decisions; openness to experience predicted real (incentivized) decisions, but not hypothetical ones [31, 32].

7) Neuroticism, agency, and anti-social attitudes, traits whose associations with inconsistency between real-world and hypothetical decisions have been explored [13, 32, 33].

Cognitive factors

Cognitive factors are characteristics of the decision-making process itself. Table 2 describes the ten cognitive factors identified as related to consistency. Several suggested negative associations, including

1) Activation of normative beliefs, where real donation decisions were affected by consideration of what important others (e.g. family members) would think of the decision in a way that hypothetical decisions were not [35];

2) Social desirability, where a review of the literature shows that the wish to be seen favourably by the experimenter is stronger for hypothetical than for real-world decisions [36];

3) Anticipated or forecasted emotions, given the extensive literature showing that people are poor at predicting how they will feel in the future; similar issues are discussed under related terms such as the ‘hot-cold empathy gap’ [19, 40] or ‘predicted vs expected utility’ [39];

4) Deliberative mindset, where individuals making hypothetical decisions may be more likely to carefully weigh pros and cons than those making real-world decisions [14];

5) Abstract construals, where hypothetical decisions are more likely to involve consideration of general rather than specific features of the decision [14];

6) Attribute non-attendance, where decision makers are more likely to consider all relevant attributes in real-world than in hypothetical decisions [44];

7) Risk aversion, where decision makers are often more likely to choose safer courses of action in real-world than in hypothetical situations [24, 45, 46]; and

8) Implicit associations, where a greater number of automatic associations was related to lower consistency [49].

Our review also identified cognitive factors that suggest positive associations with consistency, including

9) Certainty, where decision makers who are more certain of their hypothetical decisions are more likely to be consistent with real-world decisions [25, 50, 53, 54]; and

10) Salience of or concern about the task, where increasing the salience of the decision or task (e.g. by increasing incentives, making the task more interesting, ensuring self-benefit, etc.) can increase consistency [20, 22, 31, 56, 57].

Task factors

Task factors are aspects of the hypothetical decision being made, independent of the match with the real-world decision scenario. Table 3 describes the eight characteristics of the hypothetical task identified as related to consistency:

1) High-stakes rewards, where two reviews of the literature have pointed to high-stakes decisions as negatively associated with consistency: the higher the stakes, the lower the association between hypothetical and real decisions [60, 61].

2) Framing bias (i.e. biases in decisions produced by presenting outcome probabilities in positive vs. negative frames), which is more powerful for hypothetical than for real-world decisions, reducing consistency [62].

3) Explicit statements of outcome uncertainty, where describing the range of uncertainty around outcome estimates in the hypothetical task has been shown to be positively associated with consistency [60].

4) Fundamental attribution errors, where describing the decision maker as the direct actor, as opposed to an observer, in the hypothetical task may be positively associated with consistency [63].

5) Personal relevance, where ensuring that the hypothetical task involves people the decision maker actually knows may be positively associated with consistency [64].

6) Real consequences, where ensuring that the hypothetical task entails actual consequences for decision makers is positively associated with consistency [51, 68].

7) Space for mental simulation (i.e. the degree to which the context of decision making is left to the imagination), which may be associated with lower consistency [18, 70].

8) Self-image, where several studies have explored the notion that moral decisions may have lower consistency, given the tendency to preserve a positive view of oneself (i.e. people are more likely to make positive choices in hypothetical decisions than in real life) [51, 71, 72].

Matching hypothetical and real-world tasks

Table 4 describes two related issues identified as increasing consistency by matching the hypothetical and real-world tasks in different ways. These literatures discussed consistency less directly, and coders were therefore less able to identify specific tests of the relationship between consistency and individual factors; nevertheless, coders felt these issues were core to consistency, hence their inclusion.

1) Matching samples with the real-world population has been discussed extensively in various literatures. Many have argued that representative samples are essential to increasing consistency (e.g. Hainmueller et al., 2015, Kesternich et al., 2013 [56, 73]), and an extensive literature has explored the extent to which specific types of samples yield generalizable results (e.g. Berinsky et al., 2012, Peterson et al., 2014 [86, 87]). One study examining the validity of different survey designs in determining immigrant acceptance decisions demonstrated that samples that demographically reflected the target group matched real-world decisions more closely than did a sample of students [56]. Reviews of the extensive literature on the use of college students as subjects in social science experiments have shown that student samples often do not yield results that are reproducible in broader populations [61, 88]. Note that we did not find any studies describing which patient characteristics need to be matched in order to ensure validity with a real-world health study.

2) Matching study procedures to the real-world decision context has also been explored extensively. Studies varying apparently minor features of the hypothetical decision-making context (e.g. number of cues, order of presentation) have often shown effects on complex decisions; matching on as many of these cues as possible has been argued to increase consistency [76]. For example, considerable work has examined delay discounting, i.e. the rate at which a good (or a health benefit) decreases in value depending on the delay in receiving it; Chapman (2004) [39] discusses discounting in the context of health behaviours such as addiction. While most agree [69, 77] that the rate of delay discounting is generally consistent between hypothetical and real-world situations, [39, 77,78,79,80,81,82,83] matching the decision-reward delay between hypothetical and real decisions improves consistency even further [84] (a minimal worked sketch of delay discounting follows this list). In a study of children’s reactions to social problems, the authors argued that having more time to decide in the hypothetical than in the real situation would reduce consistency [85]. Other authors have argued that matching contextual features of the hypothetical task to the real-world decision as closely as possible is essential for generalizable results [69, 89]. This concept has been taken one step further, with the argument that the overall complexity of the decision environment in real-life situations becomes oversimplified in hypothetical choices, leading to poor choice consistency [74].
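To make the delay discounting example concrete, the following minimal sketch assumes the common hyperbolic model V = A / (1 + kD), in which a reward of amount A delayed by D has present value V given discount rate k. The discount rates, trial values, and the agreement check are all invented for illustration; comparing the choices implied by rates fitted separately to hypothetical and real (incentivized) tasks is only one possible way to operationalize consistency.

```python
# Minimal sketch of hyperbolic delay discounting, V = A / (1 + k * D):
# a delayed reward A received after delay D has present value V, given a
# per-person discount rate k. Comparing choices implied by k fitted from
# hypothetical vs real (incentivized) tasks is one way to check
# consistency. All values below are invented.

def present_value(amount: float, delay_days: float, k: float) -> float:
    """Hyperbolically discounted value of a delayed reward."""
    return amount / (1.0 + k * delay_days)

def prefers_delayed(immediate: float, delayed: float, delay_days: float, k: float) -> bool:
    """Would a decision maker with rate k take the delayed reward?"""
    return present_value(delayed, delay_days, k) > immediate

# Invented discount rates fitted from the two conditions for one participant:
k_hypothetical = 0.010   # per day, from hypothetical choices
k_real = 0.012           # per day, from incentivized choices

# If the two rates imply the same choices across typical trade-offs,
# the hypothetical task is behaving consistently with the real one.
trials = [(50, 100, 30), (50, 100, 180), (80, 100, 30)]  # ($ now, $ later, days)
agreement = sum(
    prefers_delayed(i, d, t, k_hypothetical) == prefers_delayed(i, d, t, k_real)
    for i, d, t in trials
) / len(trials)
print(f"Choice agreement across trials: {agreement:.0%}")
```

Under this model, small differences in fitted rates need not change any actual choices, which is one reason rate-level and choice-level measures of consistency can diverge.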

Discussion

If the health services research community is to systematically implement recommendations for better modelling prior to large-scale interventions, [90] we need to understand how health care decisions and behaviours can most effectively be modelled. Given that most health service interventions seek to change the decisions or behaviours of different actors within the system (e.g. physician test ordering, patient participation decisions), we must design model studies in which hypothetical decisions/behaviours can be valid indicators of their real-world counterparts. In this review, we sought to summarize what is known about factors thought to affect the relationship between hypothetical and real-world decisions. Our review of 68 articles identified 27 factors shown or hypothesized to affect the relationship between hypothetical and real-world decisions/behaviours. Coming from a wide range of literatures, including behavioural economics, psychology of reasoning, social psychology, health behaviours, and neuroscience, these findings underline that much is already known about how to help decisions and behaviours made in hypothetical contexts reflect real-world decisions. Equally clear is that relatively little of this discussion has focused on health behaviours (4 of 68 articles), further underlining the need to explore these issues for health decisions.

Figure 2 summarizes our descriptive framework of the four categories of factors identified as related to consistency, i.e. whether hypothetical decisions will predict real-world behaviours. Above the center line are examples from each category that are positively associated with consistency; below the line are those with negative associations. Decision maker factors are trait-level descriptors that vary between (but usually not within) individuals, and may be positively (e.g. age, education) or negatively (e.g. cognitive ability) associated with consistency between hypothetical and real decisions/behaviours. Cognitive factors describe internal, context-dependent factors (e.g. certainty, risk aversion) that may affect human decision making in general but are particularly relevant to hypothetical-real consistency. Task factors are important aspects of the hypothetical task (e.g. describing the uncertainty of outcomes, involving real consequences) that are related to consistency independent of their relationship to the real-world task. Finally, matching factors identify areas where an overall increase in similarity between the model situation and the real world (sample matching, procedure matching) would be expected to improve consistency; a more fine-grained analysis of these two categories will be required to identify specific factors within the overall complexity of the environment.

Fig. 2 Descriptive framework of the 4 categories of factors identified as related to consistency. *The decision maker category also includes thinking disposition, openness to experience, and other personality traits. The cognition category also includes normative beliefs, forecasted emotions, abstract construals, attribute non-attendance, and implicit associations. The task category also includes framing effect, fundamental attribution error, and personal relevance

We offer this draft framework not as a recipe for optimal design of model health care studies, but as a way of organizing and describing the range of factors that might need to be explored to achieve this end. The extent to which any individual factor will predict consistency in the context of health services decisions/behaviours is almost entirely open to debate at this early stage. Few of these factors have been tested in a health services context (but see Appendix B for examples of matching procedures, [39, 65, 81] real consequences, [65] degree of certainty, [53] and forecasting emotions [39]). The potential for interactions between factors in affecting consistency is almost entirely unexplored. The data supporting the factors are highly variable, ranging from extensive literatures summarized by systematic review to suppositions made without any empirical support. For this initial description, we chose to include all factors regardless of the level of empirical support or potential for bias, in order to provide the greatest range of hypotheses to consider as we push this area forward.

Several limitations of this work warrant consideration. First, while our search strategy sought to encompass as many synonyms for ‘hypothetical’ and ‘real-world’ decisions as possible, there are likely studies touching on this issue that were not captured by our search. For example, our search strategy did not include keywords specific to simulation teaching methods in the healthcare field. While the consistency between real and hypothetical decisions is relevant to the medical education field, that literature focuses on methods to help students make the ‘right’ decision (e.g. how objective structured clinical exams predict correct medical decisions); in contrast, our review focused on aspects of hypothetical decisions and their consistency with a real-world decision, independent of its ‘correctness’. Second, many of the included studies from the behavioural economics literature involved the common practice of using incentives to distinguish hypothetical from real-world decisions; a ‘real-world’ task implied one where participants were incentivized with tangible rewards, while hypothetical tasks involved no incentives. Although using incentives is known to increase motivation for a range of health behaviours, [91, 92] we do not know the extent to which simple incentives can serve as a model for complex, high-stakes, often emotion-laden health care decisions. On a related note, for this initial multi-discipline concept review, we could not assess the degree to which ‘real-world’ tasks were ‘real’ enough; instead, we took the authors’ word that providing a $5 incentive (for example) was an effective approach for modeling real-world decisions. Third, our initial framework is meant to be descriptive and does not attempt to identify the relative importance of the described factors, or the causal relationships and interactions between them (as it does not constitute a theory). Fourth, we cannot make strong claims about the strength of the data underlying any particular factor and its relationship with consistency; while we sought to distinguish factors with considerable empirical support from those without, a stronger assessment of the quality of evidence supporting the individual relationships, and the risk of bias associated with these varied studies, was beyond our resources. Therefore, as new research becomes available, future work should focus on a meta-analytic review of empirical studies to evaluate the risk of bias for the factors we have identified, as well as establishing the statistical significance of these factors in predicting the consistency between real and hypothetical decisions and behaviours. Finally, we note that some of the identified factors (e.g. forecasting emotions, matching sample factors) are supported by substantial literatures and considerable theoretical discussion that provide a level of nuance we could not address in this review. The implications these non-health literatures have for health services research applications are a clear area of future work.

Conclusions

This review identifies a range of factors that may be relevant in determining when hypothetical pilot work can be expected to yield results that are consistent with real-world health services behaviours. We have highlighted four categories that appear to encompass these factors, categories that may be helpful for those designing pilot health services work to consider. Future work can use our list of factors as the range of hypotheses that must be tested to establish which factors are most important in determining consistency in a health services context. In health services research, it is rare for hypothetical work to be reported in the same article as real-world trial results. Compiling health services research programs in which hypothetical pilot work can be matched to reports of real-world outcomes would be a useful step in understanding when and how to maximize the utility of hypothetical health services research.