The paper by van Dongen et al. [1] in this issue is to be commended on many levels. It contributes substantially to the blood donor recruitment and mere-measurement literatures and highlights the importance of replication. Importantly, it opens up the debate on the causes and consequences of non-compliance in randomised controlled trials (RCTs) in behavioural medicine. Non-compliance introduces non-random selection bias to RCTs and has implications for causality, generalizability and policy [2–5]. Therefore, reducing bias caused by non-compliance either methodologically or statistically is important.

Methodological solutions to this bias are less likely to be used than statistical ones, due, in part, to the lack of coherent frameworks for understanding non-compliance. Indeed, as van Dongen et al. [1] highlight, non-compliance is complex, arising from person characteristics (e.g. attitudes to trials and conscientiousness), intervention idiosyncrasies or person/intervention interactions [6]. A first step towards a methodological solution would be to identify the psychological (traits, attitudes, beliefs) and demographic characteristics that differentiate compliant from non-compliant behaviour in the treatment and control arms of trials. A framework developed by Angrist and colleagues offers a useful starting point [2–4]. This model identifies four groups [2–4]. The first group comply with the protocol of whichever arm they are assigned to, treatment or control: these are called compliers. The second group comply only if allocated to the treatment arm, not the control arm: these are the always-takers, who may also be termed treatment-only compliers. The third group comply only if allocated to the control arm, not the treatment arm: these are the never-takers, or control-only compliers. The final group, the defiers, do not comply with the protocol of whichever arm they are allocated to. Using these groups, non-compliance could be studied at a general or trial-specific level. At a general level, these four groups could be identified with respect to stable preferences to comply or not with treatment or control arm protocols across a variety of RCTs. Any stable psychological and demographic differences across these groups could be fed into compliance-enhancing designs. For a specific RCT, it would be possible to explore how different intervention protocols influence who is likely to be a treatment-only or control-only complier, and to use this information to enhance compliance in both treatment and control arms.
For example, in general, financially incentivized questionnaires (treatment arm) increase compliance relative to non-incentivized questionnaires (control arm) [7]. However, for blood donors, financially incentivized questionnaires may be either counter-productive [8] or helpful [9–13] for compliance. Identifying whether incentivizing questionnaires increases the number of control-only compliers (those preferring the non-incentivized questionnaire), and why, will help to inform the intervention design and its interpretation.
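The four groups described above can be written out as a small lookup. This is only an illustrative sketch: the function name `compliance_type` and its two arguments are hypothetical, and they describe what a participant would do under each possible assignment. In a real trial only one of these two behaviours is observed for any participant, so the groups cannot be read off individual data this directly.

```python
def compliance_type(takes_if_treatment: bool, takes_if_control: bool) -> str:
    """Label a participant by hypothetical behaviour under each assignment.

    takes_if_treatment: would take up the intervention if assigned to treatment.
    takes_if_control:   would take up the intervention if assigned to control.
    Both arguments are illustrative assumptions, not observed trial data.
    """
    if takes_if_treatment and not takes_if_control:
        # Follows the protocol of whichever arm they are assigned to.
        return "complier"
    if takes_if_treatment and takes_if_control:
        # Follows the protocol only in the treatment arm.
        return "always-taker"
    if not takes_if_treatment and not takes_if_control:
        # Follows the protocol only in the control arm.
        return "never-taker"
    # Follows the protocol in neither arm.
    return "defier"


if __name__ == "__main__":
    for t in (True, False):
        for c in (True, False):
            print(t, c, compliance_type(t, c))
```

Framed this way, a compliance-enhancing design aims to move participants from the always-taker, never-taker and defier cells into the complier cell.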

If bias due to non-compliance cannot be designed out, it needs to be addressed statistically [2–5]. van Dongen et al. [1] call for the use of instrumental variables (IVs) analysis. IVs are designed to deal with problems of endogeneity (the explanatory predictor is correlated with the error term, leading to threats to causality due to missing variables, reverse causality etc.) by isolating the variability in the predictor that is causally related to the outcome [2–5, 14–18]. An IV has to be highly correlated with the predictor (instrumental relevance) [14, 18], influence the outcome only via the predictor and be orthogonal to the error in the predictive model (instrumental exogeneity); IV analysis also requires large Ns [3–5, 14–18]. Randomization can be used as an instrument to adjust for threats arising from non-compliance [2–5]. While policy questions of effectiveness (What are the benefits from treatment assignment?) and treatment questions of efficacy (What are the benefits of treatment to patients?) can be analysed using intention to treat (ITT) and ‘per protocol’ or ‘as treated’ analysis [5], the IV estimator for an RCT scales the ITT estimate by the proportion actually receiving the treatment, that is, it divides the ITT effect by the between-arm difference in treatment receipt. This may account for why some studies with high compliance (e.g. 82%) [19] have observed a mere-measurement effect with ITT analysis and others have not.
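The relationship between the ITT and IV estimates can be illustrated with a minimal sketch of the simple ratio (Wald) form of the IV estimator, with random assignment as the instrument. The function name `wald_iv_estimate` and the small data set are hypothetical and invented purely for illustration; they are not taken from van Dongen et al. [1] or any other study, and a real analysis would also need standard errors and checks of the IV assumptions.

```python
from statistics import mean


def wald_iv_estimate(assigned, received, outcome):
    """Return (itt, uptake_diff, iv) for a two-arm trial.

    assigned: 1 if randomized to the treatment arm, 0 if control.
    received: 1 if the participant actually received the treatment, else 0.
    outcome:  numeric outcome for each participant.
    """
    treat = [i for i, a in enumerate(assigned) if a == 1]
    ctrl = [i for i, a in enumerate(assigned) if a == 0]

    # ITT effect: difference in mean outcome by assigned arm.
    itt = mean(outcome[i] for i in treat) - mean(outcome[i] for i in ctrl)

    # Between-arm difference in the proportion actually treated.
    uptake_diff = mean(received[i] for i in treat) - mean(received[i] for i in ctrl)
    if uptake_diff == 0:
        raise ValueError("Assignment does not change treatment receipt.")

    # IV (Wald) estimate: ITT effect scaled by the difference in receipt.
    return itt, uptake_diff, itt / uptake_diff


if __name__ == "__main__":
    # Hypothetical data: 8 participants per arm, 6 of 8 treated in the
    # treatment arm, none treated in the control arm.
    assigned = [1] * 8 + [0] * 8
    received = [1, 1, 1, 1, 1, 1, 0, 0] + [0] * 8
    outcome = [1, 1, 1, 1, 1, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0]
    itt, uptake_diff, iv = wald_iv_estimate(assigned, received, outcome)
    print(f"ITT = {itt:.3f}, uptake difference = {uptake_diff:.3f}, IV = {iv:.3f}")
```

In this made-up example the ITT effect is 0.375 and the IV estimate is 0.5: the lower the compliance, the more the ITT estimate understates the effect among those who would actually take up the treatment when assigned to it.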

Finally, the use of IV analysis should be extended beyond RCTs, as it has been in the education, economics and medicine literatures, to infer causality when randomization is not possible, for example from survey data [15, 17, 18, 20]. Surprisingly, IV analyses have not been used in behavioural medicine, where such data are often collected. Therefore, van Dongen et al.’s [1] call is timely, and IV analysis and other related techniques (e.g. directed acyclic graphs, propensity score matching and selection models) should be considered more widely in behavioural medicine [14, 21–23].