Introduction

In 2021, UNAIDS [1] emphasized that progress in the fight against the HIV/AIDS pandemic is slowing down and is even in jeopardy owing to the effects of the COVID-19 crisis on health systems. This statement implies a call for action and the continuation of efforts to curb the pandemic, while recognizing that HIV prevention is still a major public health issue. Despite the progress in biomedical prevention research and tools, it is already acknowledged that biomedical approaches alone are not sufficient to curb the epidemic [2]. Moreover, numerous behavioral and structural interventions have been shown to be effective in improving intermediate outcomes that potentially block the pathway to HIV transmission, such as changes in sexual behaviors. Although such interventions struggle to show any impact on HIV incidence [3, 4], dealing with structural and behavioral determinants of HIV spread alongside the use of biomedical prevention tools is necessary when designing HIV prevention interventions [5, 6]. Therefore, combination HIV prevention, “a dynamic, rights-based approach to providing the right mix of biomedical, behavioral and structural interventions aiming to have the greatest, sustained effort on reducing new HIV infections” [7], is considered the best approach to curb the HIV pandemic [8]. A combination HIV prevention intervention (CHPI) thus refers to any intervention that aims to reduce HIV transmission by using strategies that address behavioral and structural health determinants supplemented by biomedical prevention tools [5]. Whereas biomedical and behavioral components are individually focused approaches, structural components are designed to affect environmental conditions outside the individuals’ control (economic conditions, policies, programmatic vulnerabilities, social inequalities, discrimination, societal norms) [9, 10]. CHPIs combine many components and mobilize all involved parties to account for specific risks and vulnerabilities. By doing so, they take into consideration the contextual needs and conditions of people and communities. CHPIs are expected to prevent HIV transmission on the basis of their components’ effectiveness, relevant hypotheses on how these components interact with one another, a credible program impact pathway, and a program theory that is able to deal with pragmatic issues [11]. Hence, the impact evaluation of CHPIs raises methodological challenges owing to their multicomponent and complex nature.

Impact evaluations contribute to HIV-related decision making by generating evidence from CHPIs about effective strategies to prevent HIV infections. Indeed, impact evaluations are primarily expected to quantify the extent to which the evaluated intervention achieved the intended outcomes. In that sense, they are supposed to establish a causal relationship between the set of activities undertaken during the intervention and improvements in the beneficiaries’ circumstances. Beyond the question of whether an intervention is effective, impact evaluations are expected to provide comprehensive evidence that informs decisions on the implementation and scale-up of an intervention, as well as on its continuation or interruption [11, 12]. Currently, impact evaluations are often based on quantitative methods applied in the framework of a Campbellian validity model, where the impacts of an intervention are quantified in controlled settings (efficacy) and then in real-world settings (effectiveness) [13, 14]. These methods have proven their relevance, which has rooted their use in evaluation practices and has legitimated a form of hierarchy among methods in terms of evidence [15]. However, they have their own limits [14, 16, 17], including for the impact evaluation of CHPIs [11]. The current recommendations for CHPI impact evaluation acknowledge the relevance of diverse quantitative methods and approaches depending on the context of these interventions [11]. For these reasons, this study was conducted to map and critically review the quantitative methods used to assess the efficacy or the effectiveness of CHPIs on HIV transmission. It will help to address the gap between the recommendations about the use of these methods and what is actually done in practice.

Methods

The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [18] (Online Appendix 1). The protocol was registered and published with PROSPERO (CRD42020210825) and details are presented in Ravalihasy et al. [19].

Search Strategy

We searched for original articles in English and French indexed in Web of Science, Scopus and PubMed from inception to August 2022. The joint use of these databases is expected to uncover most of the studies relevant to this review [20]. Database-specific search strategies were developed using key terms associated with (i) HIV transmission (prevalence or incidence), (ii) impact of interventions and (iii) prevention and sexual risk exposure [19]. We added terms related to HIV transmission and target populations in order to improve the search strategy (Online Appendix 2). We conducted the literature search in English and then in French after translating text terms (i.e. not index terms). All records were retrieved on August 25, 2022, and imported into a reference manager library (Zotero).

Eligibility

A first screening step was conducted based on the title and abstract. References were deemed relevant if:

  • The study focused on the impact evaluation of CHPI.

  • The analyses were conducted using data gathered from the intervention beneficiaries during these interventions.

  • The data allowed the assessment of the impact of the intervention, including its behavioral/structural components.

  • HIV incidence, prevalence or averted infections was an intervention outcome.

A second screening step based on full text was then conducted and allowed to exclude all studies that:

  • did not assess an impact on HIV transmission (hereafter, irrelevant outcome).

  • referred to an intervention that did not include any behavioral or structural component (hereafter, irrelevant intervention).

  • relied on data that did not allow the assessment of the impact of the behavioral or structural components (hereafter, irrelevant data).

  • were not intervention impact evaluations (hereafter, irrelevant evaluation study).

  • were based on simulated data.

  • did not report any quantitative method (statistical methods, mathematical modeling or both) to assess the intervention impact on HIV transmission.

At each step, the studies were screened independently by AR and PAA-T. Disagreements were resolved by LKS, MDA or VR.

Data Extraction

The systematic review management software Covidence (www.covidence.org) was used for data management and extraction. General information (authors, title, date of publication, location where the studies were carried out, purpose and results of the studies) was extracted from the included studies. Specific information about quantitative methods was extracted using a grid developed for this purpose [19]. When necessary, other documents referenced in the full texts, such as the study protocol or a pilot study, were used as complementary sources during the data extraction process. In addition, each study was classified as an “efficacy” or “effectiveness” study according to the study objectives as stated in the full texts, according to whether the CHPI was already implemented or scaled up, or using items from the PRECIS-2 tool [21].

Data Analysis

Evaluation Design

A distinction is made between experimental and quasi-experimental evaluation designs. An experimental design relates to studies where researchers assign the participants to different intervention conditions according to a randomization scheme. A quasi-experimental design relates to studies where researchers do not control intervention allocation or do not use a randomization scheme for intervention allocation [22]. Quasi-experimental designs include the posttest design, the pretest–posttest design and its extensions (such as interrupted time series or regression discontinuity), the nonequivalent group design, and any combination of these [23]. Items from the Mixed Methods Appraisal Tool [24, 25] were adapted to extract information about study design reporting (Online Appendix 3).

Statistical Methods

Information about statistical methods was assessed using items developed from the guidelines for Statistical Analyses and Methods in the Published Literature [26]. Information about sample size was assessed through two items, to verify whether the data allowed the detection of the expected gain from the CHPIs (expected effect size) with sufficient precision in the estimates. The three remaining items check whether the studies reported: (i) how the methods fit the data structure (statistical validity condition), (ii) a measure of precision (confidence or credible intervals) alongside the impact measure, and (iii) how the analysis accounted for the evaluation design [19]. We verified whether the studies reported each item individually and all items together.
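
As an illustration of what the two sample size items refer to, the sketch below (not drawn from any included study) computes the number of clusters per arm needed to detect a hypothetical reduction in HIV incidence in an unmatched cluster-randomized design, using the Hayes and Bennett formula for rates. The function name and all numerical inputs are assumptions made for this example.

```python
# Illustrative sketch only: clusters per arm needed to detect a hypothetical
# reduction in HIV incidence in an unmatched cluster-randomized design
# (Hayes & Bennett formula for rates). All inputs are assumed values.
from scipy.stats import norm

def clusters_per_arm(rate_control, rate_intervention, person_years, cv,
                     alpha=0.05, power=0.80):
    """Clusters per arm given the control and intervention incidence rates,
    person-years of follow-up per cluster, and between-cluster coefficient
    of variation (cv) of the true rates."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance_term = ((rate_control + rate_intervention) / person_years
                     + cv**2 * (rate_control**2 + rate_intervention**2))
    return 1 + (z_alpha + z_beta)**2 * variance_term \
               / (rate_control - rate_intervention)**2

# Hypothetical scenario: 2.0 vs 1.2 infections per 100 person-years
# (a 40% reduction), 500 person-years per cluster, cv = 0.25.
print(clusters_per_arm(0.020, 0.012, 500, 0.25))  # about 13 clusters per arm
```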

Mathematical Modeling

Information about mathematical modeling was extracted using items developed from the guidelines for Strengthening the Reporting of Empirical Simulation Studies [27]. Two items cover the model outputs and their precision, two items cover the models’ assumptions, two items cover the data used for modeling, and one item covers model implementation [19]. We verified whether the studies reported each item individually and all items together.

Results

The literature search identified 2335 articles, of which 154 were considered relevant based on their titles and abstracts. A total of 58 articles [28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85] concerning 46 CHPIs satisfied the inclusion criteria and were included in our review (Fig. 1).

Fig. 1 PRISMA flow diagram for study selection process

All 46 considered CHPIs (58 articles) included behavioral components, and eighteen (39.1%) also included structural components. Table 1 reports the characteristics of the included studies. Among the 53 studies where HIV transmission was a primary outcome, 26 (49.1%) reported a significant reduction of HIV transmission. Among the five studies where HIV transmission was a secondary outcome, two (40.0%) reported a significant reduction of HIV transmission. Thirty-six studies (62.1%), including three cost-effectiveness studies, were conducted in real-life settings, of which 21 (58.3%) reported a reduction of HIV transmission as intended. Twenty-two studies (37.9%) were conducted in controlled settings, of which seven (31.8%) reported a reduction of HIV transmission as intended. Most of the included studies targeted female sex workers (39.7%), were published after 2000 (94.8%), and were conducted in Asia (36.2%) or sub-Saharan Africa (44.8%). Data concerning the study characteristics are provided in Online Appendix 4.

Table 1 Characteristics of the 58 included studies

Table 2 presents the evaluation design characteristics of the included studies. Among the included studies, 22 (37.9%) used an experimental design: 12 (54.6%) were cluster-randomized, 9 (40.9%) were individually randomized, and one (4.5%) used multilevel randomization (i.e., a combination of individual and cluster randomization). The remaining 36 (62.1%) studies used quasi-experimental designs, including 6 (16.7%) posttest designs, 6 (16.7%) nonequivalent group designs, 8 (22.2%) pretest–posttest designs and 16 (44.4%) combinations of pretest–posttest and nonequivalent group designs. Items concerning sampling (67.2%) and treatment allocation or exposure (82.8%) were more frequently reported than the other items. Quasi-experimental studies reported how confounders and measurement biases were accounted for in the design more frequently than experimental studies did (50.0% vs 36.4%, respectively), but this difference was not significant (χ2 = 1.02, p = 0.311). In particular, among the experimental studies, eight reported blinded procedures: outcome assessors of all kinds [44, 49, 71], biological outcome assessors [35, 48], and interviewers [72] or investigators [40, 55] were blinded in these studies. Experimental studies more frequently reported information about intervention administration and adherence than did quasi-experimental studies (63.6% vs 13.9%, respectively, χ2 = 15.3, p < 0.001).
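
The two between-design comparisons above can be checked, under the assumption that the cell counts are reconstructed from the reported denominators and percentages (18/36 vs 8/22 for the confounding item, 5/36 vs 14/22 for intervention administration and adherence), with a standard chi-squared test without continuity correction; small discrepancies with the quoted statistics may reflect rounding.

```python
# Check with counts reconstructed from the reported percentages; a chi-squared
# test without continuity correction is assumed here.
from scipy.stats import chi2_contingency

tables = {
    "confounders/measurement biases": [[18, 36 - 18],   # quasi-experimental: reported / not
                                       [8, 22 - 8]],    # experimental
    "administration and adherence":   [[5, 36 - 5],     # quasi-experimental
                                       [14, 22 - 14]],  # experimental
}
for item, table in tables.items():
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    print(f"{item}: chi2 = {chi2:.2f}, p = {p:.3f}")
# yields chi2 close to 1.0 (p ≈ 0.31) and 15.3 (p < 0.001), respectively
```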

Table 2 Proportion of reported design characteristics in the 58 included studies

Among the included studies, 51 (87.9%) used statistical methods to assess to what extent the interventions reduced HIV transmission as intended. Among these, 38 (74.5%) used regression-based methods, 12 (23.5%) used hypothesis testing and one (2.0%) used an analysis of variance. Table 3 presents the statistical methods used by the included studies. The measures of precision and the consideration of the evaluation design were the most frequently reported items (80.4%). The latter involved any method to account for the data generating process [86, 87] (adjustment, stratification, matching, weighting). Confidence or credible intervals were reported in 41 studies (80.4%), while p-values only were reported in five studies (9.8%). Twenty out of twenty-four studies used an expected effect size related to HIV transmission to compute the sample size: 15 were experimental and five were quasi-experimental studies. Overall and for each considered item, statistical methods were more frequently reported in experimental studies.

Table 3 Proportion of reported items among the 51 studies that used statistical methods

Among the included studies, nine quasi-experimental studies (15.5%) used mathematical models to assess to what extent the interventions reduced HIV transmission as intended. Table 4 presents the reported characteristics of the mathematical models in the included studies. Information about model implementation was the least frequently reported.

Table 4 Proportion of accurately reported items among the 9 studies that used mathematical models

Discussion

This study contributes to the literature on CHPI impact evaluation by giving a broad view of the quantitative methods typically being used and how they are reported. Diverse quantitative designs and methods are currently being implemented, reflecting the intrinsic complexity and the contexts of these interventions. To derive an estimate of an intervention's impact on HIV transmission, one can use common procedures, opt for specific procedures [88,89,90,91], as seen in four included studies [36, 38, 44, 50], or develop new procedures when relevant, as seen in one CHPI [92]. Moreover, CHPIs potentially include structural-level activities, raising the demand for more comprehensive impact evaluation studies [11]. In our review, two studies [39, 66] used causal pathway analysis (i.e. accounted for the hypothesized relations between intervention components) to derive estimates of CHPI impact. Some other studies adjusted their estimates for intervention implementation outcomes such as coverage and acceptance, or reported such information [58, 67, 75, 93]. In light of the above, diverse methods are already rooted in CHPI impact evaluation practices, as recommended [11]. Still, some effort should be made to better report the methods underlying the results of these studies in order to best inform theories and practices.

This review identified more quasi-experimental studies than other reviews on behavioral and structural interventions to prevent HIV infection [94,95,96,97,98,99]. This is consistent with the fact that such designs may be more appropriate to assess the impact on HIV incidence or prevalence [11, 100, 101]. Furthermore, ethical, political and resource issues, as well as the nature of CHPIs, make the implementation of randomized designs less feasible [102]. When such designs are feasible, adaptations of features such as randomization or blinding procedures often apply, as reported in our review. Indeed, the complex nature of CHPIs challenges the translation into practice of the theoretical properties of randomized designs, which are meant to ensure unbiasedness and precision [103, 104]. These results illustrate the recommendation that no single methodology should be applied as a gold standard to evaluate CHPIs [11], particularly since confounders may affect experimental as well as quasi-experimental designs [105, 106].

Our review contributes to the literature and stands out for giving insights into the methods used to assess the effectiveness of these interventions. Numerous methods are used to assess the impact of CHPIs. This diversity makes it possible to account for the evaluation contexts and for practically all types of evaluation design. Our results show that although data from one-group posttest designs seem irrelevant for impact evaluation [107], mathematical modeling allows a counterfactual analysis of intervention outcomes. This review highlights that the reported sampling strategy or the data used in most of the quasi-experimental studies, and in a few experimental studies, was not intended for impact evaluation on HIV transmission. In addition, information about how well the statistical methods in use suit the data structure is not frequently reported. Therefore, the reporting of these methods should be improved in order to clarify the relevance of the sample in relation to the methods used to derive impact estimates. Indeed, this will help to better understand the significance of the results, given the diversity of the designs and methods that are actually used.
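
To make the counterfactual logic mentioned above concrete, the sketch below (a deliberately simplified illustration, not a model used in any included study) runs a deterministic susceptible-infected model with and without a hypothetical intervention effect on the transmission rate, and reads the difference in cumulative infections as infections averted. All parameter values are assumed.

```python
# Simplified counterfactual sketch: the same susceptible-infected model is run
# with and without a hypothetical intervention effect on transmission, and the
# difference in cumulative infections is interpreted as infections averted.
# All parameter values are illustrative.
def cumulative_infections(beta, years, pop=10_000, infected0=500, dt=0.01):
    """Cumulative new infections over `years` in a closed SI model."""
    s, i = pop - infected0, infected0
    new_infections = 0.0
    for _ in range(int(round(years / dt))):
        incidence = beta * s * i / pop      # new infections per year
        s -= incidence * dt
        i += incidence * dt
        new_infections += incidence * dt
    return new_infections

beta_baseline = 0.10    # hypothetical transmission rate per year
effect = 0.40           # hypothetical 40% reduction attributed to the CHPI
no_intervention = cumulative_infections(beta_baseline, years=5)
with_intervention = cumulative_infections(beta_baseline * (1 - effect), years=5)
print(f"Infections averted over 5 years: {no_intervention - with_intervention:.0f}")
```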

Implications for Impact Evaluation Studies

This systematic review shows that the availability of diverse approaches, methods and designs allows us to address the complexity of CHPI impact evaluation. The use of pathway analysis may help meet the need for more comprehensive approaches to impact evaluation. To go further, impact evaluations should incorporate contextual and implementation outcomes [108]. In that sense, different tools [109, 110] or approaches, such as theory-driven outcomes evaluation [111, 112], may help to enhance impact evaluation designs. By doing so, impact evaluation may improve the generalizability or the transferability of the findings. Nevertheless, while many approaches are already rooted in impact evaluation practices and the future direction for improvement is identified, poor study reporting may hamper the credibility of the findings. The lack of information on the design and on the implementation of quantitative methods means we cannot firmly rely on the findings. Many reporting guidelines have been developed to enhance the reporting of health research studies [113,114,115], and while their use might be cumbersome, their uptake is critical [116].

The lack of reporting observed in this review may be related to major methodological issues [117], given the items that are reported in the data extraction grid. Our results question the sufficiency and the completeness of the impact evaluation procedures in the included studies, especially concerning the extent to which the data in use and the intervention administration are relevant. Thus, this review points to the need to update some key principles that guide the planning, the conduct, and the reporting of impact evaluation studies in order to take account of the strengths and weaknesses of the designs and methods in use. In light of the above, these principles should address three non-exhaustive but essential questions.

First of all, these principles should address the question of the primary recipients of the evaluation findings (e.g. beneficiaries, stakeholders, funders). Although impact evaluation studies share the aim of establishing a causal relationship between programs and outcomes, they may have different purposes, ranging from testing the relevance of a program within a specific setting to influencing political decisions [12]. Indeed, the evaluation strategy and constraints may differ according to who is interested and involved in the evaluation process. It should be clear whether the evaluation is intended to apply only within the initial program context, and to be reported back there, or to have implications beyond it. Clarifying this point gives concrete indications of the scope of the evaluation outcomes.

Second, impact evaluation studies should be able to provide information about, and account for, the data generating process. Here, accounting for the data generating process means identifying to what extent the data allow impact estimates to be derived and, if not, what kind of adjustments are needed. The data generating process constrains certain methodological aspects of the impact evaluation by shaping the data and sample characteristics, the intervention allocation or exposure, and the confounding and/or contextual factors of the intervention. For example, the intervention allocation may constrain the design or the quantitative analysis methods, depending on whether the data collection was specifically planned to allow an assessment of the outcome of interest, such as HIV incidence. Moreover, some additional quantitative procedures, such as power analysis, should be performed when the data collection was not planned specifically for impact evaluation purposes.
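
As a simple illustration of such an adjustment, the sketch below (synthetic data, hypothetical variable names) weights a comparison of HIV prevalence between exposed and unexposed participants by the inverse of an estimated propensity score, one of the adjustment techniques mentioned earlier (adjustment, stratification, matching, weighting).

```python
# Minimal sketch, on synthetic data, of one way to account for a non-randomized
# data generating process: inverse probability weighting by a propensity score
# before comparing HIV prevalence between exposure groups.
# Variable names and parameter values are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5_000
age = rng.normal(30, 8, n)                                       # confounder
p_exposed = 1 / (1 + np.exp(-(-2.0 + 0.06 * age)))               # exposure depends on age
exposed = rng.binomial(1, p_exposed)
p_hiv = 1 / (1 + np.exp(-(-2.5 + 0.04 * age - 0.5 * exposed)))   # protective effect built in
hiv = rng.binomial(1, p_hiv)

# 1. Propensity score: probability of exposure given the confounder.
ps = (LogisticRegression()
      .fit(age.reshape(-1, 1), exposed)
      .predict_proba(age.reshape(-1, 1))[:, 1])
# 2. Inverse probability of treatment weights.
weights = np.where(exposed == 1, 1 / ps, 1 / (1 - ps))
# 3. Weighted prevalences and the adjusted risk difference.
prev_exposed = np.average(hiv[exposed == 1], weights=weights[exposed == 1])
prev_unexposed = np.average(hiv[exposed == 0], weights=weights[exposed == 0])
print(f"IPW-adjusted risk difference: {prev_exposed - prev_unexposed:.3f}")
```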

Third, some implementation outcomes, especially fidelity, should be assessed alongside impacts. Implementation fidelity is a multidimensional concept [108, 118] that encompasses not only the quality of the delivery or the adherence to the intervention, but also the exposure to the intervention, the beneficiaries’ responsiveness and the differentiation of the program components. This outcome deals with theoretical issues, such as the program theory and pathways, as well as practical issues, such as the stakeholders’ and beneficiaries’ participation. Hence, such an outcome constitutes a key intermediate factor for attaining the expected effects of the intervention.

Taking these three questions into account allows a move towards a more comprehensive way of considering the intervention impact, capturing the efficacy-effectiveness continuum [21]. These questions also allow the introduction of the notion of transferability, which focuses on a more practical way of considering the generalizability of the findings [110, 119], without questioning the necessity of the probability statements on which Campbellian generalizability relies. Indeed, the former deals with the impact variation that depends on the beneficiaries, the context and the implementation, while the latter warrants the relevance of the impact estimates. By addressing these three questions, the evaluation process takes advantage of the intervention theories of action and change [120] and will enable the production of more actionable results.

Strengths and Limitations

To our knowledge, this review is the first to focus on the impact evaluation methodology of CHPIs. As a result, our review included different intervention designs and settings, which limited the possibility of a sharper methodological analysis of impact evaluation. However, while the grid we developed [19] only includes information that is common to every design, it allows a conservative approach to the analysis by focusing on the items that are reported correctly.

Some references may have been missed by the search strategy: we identified few studies published before 2000 and found no study conducted in Western Europe. Nevertheless, we expanded the initial literature search strategy to include index terms and text terms that are expected to increase the likelihood of detecting eligible studies. Furthermore, the focus on CHPIs and HIV transmission, the diversity of the included studies and the fact that almost half of the included studies did not show a significant impact on HIV transmission are reassuring with respect to publication bias [121].

This review also included efficacy as well as effectiveness studies, which may have different purposes. Nevertheless, decision makers often use evidence from both types of study equally [11, 14]. Moreover, the specific context of CHPIs may affect the generalizability of the findings in the same way [122].

Modeling studies may be regarded as complementary approaches to impact evaluation [12]. However, these studies were included because of the common use of mathematical models to measure HIV incidence or prevalence and to examine potential intervention impact [123]. Therefore, our study highlights the relevance of mathematical models as tools for CHPI impact evaluation.

Conclusion

This study highlights that diverse methods are already rooted in CHPI impact evaluation practices. Still, efforts have yet to be made to accurately report these methods to allow a better understanding of the significance of the findings. In addition, CHPI impact evaluation may benefit from more comprehensive approaches such as path analysis or theory-driven evaluation. Such approaches allow the quantification of the impact of these interventions while also taking into account the pragmatic issues and causal theories underlying them. Indeed, the success of a CHPI is supposed to rely on the interaction between the implementations of the intervention components, and so should its impact evaluation. These findings contribute to informing future directions for impact evaluation practices in order to make available more transferable and generalizable insights into CHPIs.