Background

The UNAIDS “Fast Track” strategy to end the AIDS epidemic by 2030 aims to reduce HIV incidence and mortality by 90% over the 2010 to 2030 period [1]. Key milestones for this aim were the 90–90-90 targets for 2020 and the 95–95-95 targets for 2025 [1, 2]. South Africa, home to 20% of people living with HIV globally, did not achieve the 2020 targets, largely due to low antiretroviral treatment (ART) coverage amongst those who have been diagnosed with HIV [3, 4].

Over the years many interventions aimed at reducing the burden of HIV have been proposed and rolled out in South Africa, including condom distribution, voluntary medical male circumcision, HIV testing, and universal ART policies [5]. Decisions around how best to allocate resources across interventions are increasingly informed by mathematical models [6,7,8]. By projecting the course of the epidemic under different scenarios, models estimate the impact of proposed interventions, and this information allows policymakers to better determine how to achieve their targets.

However, models differ in their structures and parameterisations, and this often leads to varied projections. Inconsistencies between model outputs can suggest important research gaps. When outputs are consistent across different models despite differing underlying structures, there is typically greater confidence in their projections. A previous study comparing 12 mathematical models of the HIV epidemic in South Africa found that while the models produced consistent short-term estimates, there was significant variation in their long-term projections [9]. Despite the consistency in short-term estimates, a subsequent analysis comparing the projections to national survey data found that although many important trends were accurately predicted, there were some that most models did not capture (for example, most models underestimated HIV prevalence in adult men and overestimated ART coverage amongst men) [10]. This demonstrates that even when model projections are consistent, caution is still needed when drawing conclusions.

The HIV Modelling Consortium (http://www.hivmodeling.org) is a network of researchers that coordinates and supports HIV modelling, with the aim of informing policy decision-making around HIV programmes. The Modelling to Inform HIV Programmes in Sub-Saharan Africa (MIHPSA) collaboration is a core activity of the Consortium, and this collaboration aims to assess the optimal allocation of the HIV budget in South Africa and other sub-Saharan African countries. Previous HIV model comparison studies that have compared the cost-effectiveness of specific interventions have typically found that results differ mainly due to differences in the baseline estimates of the models, as opposed to assumptions regarding the rollout of interventions and their efficacy [8, 11, 12]. Consequently, in this study, which forms part of the first phase of the MIHPSA project, we aim to compare baseline epidemiological estimates from models of the HIV epidemic in South Africa.

Most previous model comparison studies have presented uncertainty ranges or confidence intervals around individual model estimates [10, 13, 14]. The extent to which confidence intervals overlap is a qualitative indicator of model agreement. While this is useful, it is less helpful in determining which outputs are subject to the greatest uncertainty. An alternative approach is to use a summary statistic to quantify the degree of consistency across model estimates. Several summary statistics have been used in the literature: kappa statistics have been used where outputs are binary, such as when determining if interventions are cost-effective [15]; correlation coefficients have been used when comparing only two quantitative estimates [16]; and ranges have been used where there are more than two estimates [17]. The standard deviation is another natural starting point for a summary statistic. However, with a variety of outputs, estimates will have very different means and units of measurement, making comparisons of ranges or standard deviations across output variables challenging. The coefficient of variation (the standard deviation divided by the mean) is therefore a more informative measure of model consistency that can be compared across outputs. The primary objective of this study is to quantify the consistency of estimates from HIV epidemic models included in the MIHPSA South Africa project. Our secondary objective is to determine the areas where model projections differ and to explore the reasons for these discrepancies.
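As a sketch of the summary statistic described above, the cross-model coefficient of variation for output variable v in year t can be written as follows. The notation (M models, estimate x from model m) and the use of the sample standard deviation with an M − 1 denominator are assumptions made here for illustration; the text does not specify which form of the standard deviation was used.

```latex
\mathrm{CV}_{v,t} = \frac{s_{v,t}}{\bar{x}_{v,t}}, \qquad
\bar{x}_{v,t} = \frac{1}{M}\sum_{m=1}^{M} x_{m,v,t}, \qquad
s_{v,t} = \sqrt{\frac{1}{M-1}\sum_{m=1}^{M}\left(x_{m,v,t}-\bar{x}_{v,t}\right)^{2}}
```

Because the standard deviation and the mean share the same units, the ratio is dimensionless, which is what allows it to be compared across outputs such as population counts, rates, and proportions.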

Methods

Affiliates of the HIV Modelling Consortium were invited to participate in the MIHPSA model comparison study. Modelling groups needed to be able to model the South African HIV epidemic, and to estimate specific outputs to be used in cost-effectiveness analyses planned for subsequent phases of the MIHPSA project. Based on these criteria, five modelling groups were selected. The models – EMOD-HIV, Goals, HIV-Synthesis, Optima, and Thembisa – have been described previously [8, 18,19,20,21,22], and key characteristics are presented in Table 1. Model estimates were produced between August and November 2021, and did not make provision for the impact of the COVID-19 pandemic due to the prevailing uncertainty. Participating groups were asked to produce forecasts in a “status quo” scenario (assuming continuation of current policy and pre-COVID-19 trends in service provision). Where possible, this included the projected rates of HIV testing, ART uptake, ART interruption, high risk sexual and injecting behaviours, and interventions such as voluntary medical male circumcision (VMMC) and condom use. Figure S9 in Additional file 1 shows the trends in select interventions, where recent trends can be seen to be relatively stable. Since the models differ in their structure it was not possible to achieve perfect alignment of the status quo scenario. For example, in the EMOD-HIV, HIV-Synthesis, and Thembisa models ART coverage is determined by the parameters governing ART uptake and treatment adherence, which remain constant after 2020, whereas in the Goals and Optima models the ART coverage proportion itself remains constant after 2020.

Table 1 Characteristics of the five mathematical models of the HIV epidemic in South Africa

Common empirical data used for calibration and parameterisation are listed in Table 2. Particular emphasis was placed on calibration to HIV prevalence amongst males and females aged 15–49 years (in 2005, 2008, 2012, 2016, and 2017) and the total number of people on ART (in the years 2012 through 2020). Cause-of-death data are unreliably reported in South Africa [23], but adult vital registration is reasonably complete [24], so models were provided with data on total deaths. Calibration procedures varied between models, as outlined in Table 1, with some models performing calibration manually, and others using various statistical algorithms (such as likelihood maximisation and Bayesian estimation).

Table 2 Common data for calibration and parameterisation, with ticks indicating where models used at least part of the provided data

Model estimates for the period 1990 to 2040 were compared for the following variables: total adult population (age ≥ 15 years), HIV incidence (per person-year) and prevalence amongst adults aged 15–49, the proportion of those with HIV who were diagnosed, ART coverage, the proportion of adults on ART who were virally suppressed using a threshold of 1000 RNA copies/ml, AIDS deaths, all-cause deaths amongst those aged 20–59, and the proportion of males aged 15–49 who were circumcised. Unless otherwise specified, outputs were compared for the total population, children (< 15 years), adult females, and adult males.

Most models produced estimates for the full period (1990 to 2040) for all variables. The time period for comparison of diagnosis and treatment outputs was limited: for the proportion of people living with HIV who were diagnosed the analysis was restricted to 2003 onward, when estimates for adults were available from all models (except Goals); and ART coverage and viral suppression proportions were only compared from 2005 onward.

Consistency in model estimates was assessed using coefficients of variation. For each variable, we calculated the coefficient of variation across models for each year and assessed the trend over time. Where models reported viral suppression proportions using a threshold of 400 RNA copies/ml, these were standardised to a threshold of 1000 RNA copies/ml following a previously proposed adjustment [25], using the reverse Weibull distribution and a shape parameter of 2.07. Additionally, we compared the cross-model coefficients of variation with those of the individual models: where models reported 95% confidence intervals, these were used to approximate corresponding standard deviations, which in turn were used to calculate coefficients of variation for individual models.
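The two calculations described above can be sketched in Python as follows. This is a minimal illustration and not the study's own code: the function names, the use of the sample standard deviation, the normal approximation used to recover a standard deviation from a 95% confidence interval, and the example numbers are all assumptions.

```python
import numpy as np

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def cross_model_cov(estimates):
    """Coefficient of variation across models, one value per year.

    estimates: array of shape (n_models, n_years); NaN marks a missing estimate.
    The sample standard deviation (ddof=1) is an assumption of this sketch.
    """
    estimates = np.asarray(estimates, dtype=float)
    mean = np.nanmean(estimates, axis=0)
    sd = np.nanstd(estimates, axis=0, ddof=1)
    return sd / mean


def sd_from_ci(lower, upper):
    """Approximate a standard deviation from a 95% interval, assuming normality."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (upper - lower) / (2 * Z_95)


def within_model_cov(mean, lower, upper):
    """Coefficient of variation for a single model's estimate and its 95% interval."""
    return sd_from_ci(lower, upper) / np.asarray(mean, dtype=float)


# Hypothetical prevalence estimates (3 models x 2 years), not taken from the study
prevalence = np.array([
    [0.18, 0.15],
    [0.20, 0.20],
    [0.22, 0.25],
])
covs = cross_model_cov(prevalence)

# Hypothetical single-model estimate of 0.20 with a 95% interval of 0.18 to 0.22
single = within_model_cov(0.20, 0.18, 0.22)
```

Comparing `covs` with values such as `single` for each model mirrors the comparison of cross-model and individual-model coefficients of variation. If an interval is asymmetric or was derived on a transformed scale, the normal approximation used here would be less appropriate.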

Results

Selected outputs from the models are plotted against data in Fig. 1. Most estimates were broadly consistent with the empirical data. In certain cases, models deliberately deviated from routine/survey data. For example, in the Thembisa and EMOD-HIV models the proportion of males who were circumcised (Fig. 1G) was lower than implied by the self-reported survey data, due to concerns about the reliability of self-reporting. In other cases deviations were due to differences in model structures: the increase in the circumcision proportion was delayed in the Optima model, due to the VMMC programme only being modelled to start in 2018; and in Goals the proportion of patients who are virologically suppressed was modelled to be constant at 87.5% (Fig. 1H). An additional file presents estimates stratified by sex, as well as further outputs for children (see Additional file 1).

Fig. 1 Mean model outputs and survey/routine data (with 95% confidence intervals shown as vertical lines). HIV prevalence, circumcision prevalence, and ART coverage data are from the South African National HIV Prevalence, Incidence, Behaviour and Communication Surveys; death data are from Statistics South Africa (adjusted for completeness of reporting); and viral suppression data are based on the Department of Health’s ART programme monitoring.

The coefficients of variation are presented in Fig. 2. For most outputs the coefficients of variation are initially relatively large due to there being limited data available for calibration in the 1990s. As more data became available from surveys and routine monitoring, most estimates became relatively consistent between 2005 and 2025, with coefficients of variation only increasing towards the end of the projection period. Estimates of the total adult population size were very consistent, with the coefficient of variation remaining below 0.05 throughout the latter half of the projection period (Fig. 2A) – i.e. the standard deviation of model estimates was less than 5% of the mean. Estimates of HIV incidence showed greater long-term variation (Fig. 2B): the standard deviation for HIV incidence in 2040 was approximately 33% and 65% of the means for females and males, respectively. Despite the variation in HIV incidence, the emphasis on calibration to prevalence data meant that estimates of HIV prevalence displayed significant consistency, with the coefficients of variation remaining below 0.1 for most of the projection period (Fig. 2C).

Fig. 2 Trends in the coefficients of variation

In terms of the 95–95-95 targets, the models produced consistent estimates (coefficient of variation below 0.1) for the proportion of adults with HIV who were diagnosed (Fig. 2D), the ART coverage in adults (Fig. 2E), and the proportion of adult ART patients who were virally suppressed (Fig. 2F), but there was higher variation in estimates for children. Apart from a brief spike from 2010 to 2013 (due to a sharp decline in the viral suppression proportion modelled by Thembisa from 2006 to 2011), the coefficient of variation for viral load suppression in adults remained below 0.04 throughout the projection period. The slow upward trend in the long term is due to the use of a constant proportion of viral suppression in the Goals model (87.5%), while estimates from other models converged to 94%.

Estimates of AIDS deaths (Fig. 2G) and total deaths (Fig. 2H) were less consistent, and showed steady increases in variability, with long-term standard deviations of approximately 60% and 35% of the cross-model means, respectively. For the proportion circumcised, the coefficient of variation starts rising in 2010 and peaks in 2016 (Fig. 2I) – corresponding with the period during which the rates of circumcision (Fig. 1G) are rising in all models besides Optima, which only models VMMC implementation from 2018 (the version of Optima used in this study did not distinguish between VMMC and traditional circumcision in prior years). The slow increase in later years is due to the plateau in the proportion circumcised in the Goals model.

The 95% confidence intervals available from individual models were generally small relative to the cross-model variability of projections (Fig. 3), although the Optima and HIV-Synthesis models had notably wider confidence intervals than the EMOD-HIV and Thembisa models (the Goals model did not produce 95% confidence intervals for this study). For the 95–95-95 programme indicators the standard deviations of model estimates were below 4% of the cross-model means, and models projected that by 2040, in a status quo scenario, approximately 96% of adults with HIV would be diagnosed, 80% of adults with HIV would be receiving ART, and 92% of adult ART patients would be virally suppressed.

Fig. 3 Comparisons of cross-model coefficients of variation (COVs) with individual COVs calculated from 95% confidence intervals. For certain outputs confidence intervals were not obtained from all models: total adult population (A) and proportion virally suppressed (E) do not include Optima and Thembisa, and total deaths in adults 20–59 years (G) does not include Thembisa.

Discussion

In this study we compared the consistency of epidemiological projections from five models of the HIV epidemic in South Africa, under a status quo scenario. As measured by coefficients of variation, there was reasonable consistency between 2005 and 2025, with increasing variability toward the end of the projection period. The greatest variability was found in the projections of HIV incidence, AIDS-related deaths, and total deaths, where the standard deviations of model estimates were found to be up to 65% of the cross-model means. Despite this variability, all models encouragingly predicted a gradual decline in HIV incidence in the long term. Model projections were notably consistent regarding the 95–95-95 targets amongst adults, and all five models predicted that the main programme ‘gap’ is poor ART coverage. However, there was more variability in estimates of the 95–95-95 targets in children.

We observed wide variation in the level of uncertainty reported by models, which likely arises from differences in the data used for calibration. For example, the HIV-Synthesis and Optima models were not calibrated to antenatal HIV prevalence data, which are a major source of information on trends in HIV prevalence. Confidence interval widths may also reflect the extent to which data uncertainty is incorporated in the calibration process. For example, the Thembisa model does not consider uncertainty in male circumcision rates in model calibration, and therefore produces coefficients of variation close to zero for estimates of male circumcision prevalence. For AIDS-related deaths and total deaths we found that future projections exhibited greater variation across models than uncertainty within any given model.

Multi-model comparisons offer an important opportunity to test the robustness of model predictions to variations in model structure, assumptions, and calibration methodologies. These variations reflect different understandings and beliefs about the dynamics of the epidemic, as well as different views about acceptable trade-offs between simplicity and realism. Although these differences of opinion should be respected, they lead to inter-model discrepancies and, as found in this study, it is particularly important to consider which data sources were used for model calibration. With no previous studies having quantified the consistency of estimates from a set of mathematical epidemiology models, our study establishes a baseline for this quantification. Future model comparison studies may be able to judge their measures of consistency against those obtained in this study.

In prior work by the HIV Modelling Consortium to evaluate the potential impact and cost-effectiveness of “Treat All” HIV guidelines, we found that models produced results that were consistent in terms of policy implications, despite having a wide variety of model characteristics and baseline epidemic projections [26]. However, as model-based HIV analyses become more nuanced in the types of policy trade-offs being examined, differences in baseline projections are likely to have greater influence on the ultimate policy implications of model outputs. In particular, cost-effectiveness estimates have been found to be sensitive to HIV incidence and AIDS-related deaths [8, 11, 12], both of which were found to have significant cross-model variability in this study. Our study lays a foundation for understanding similarities and differences in the outcomes of future policy analyses, such as the relative cost-effectiveness and optimal budget allocation across HIV services in South Africa. Ultimately, these analyses will reflect both the differences in baseline epidemic trajectories analysed here, and differences in model assumptions regarding the costs and impacts of specific policy options. Our analysis presents a first step toward “teasing apart” potential reasons for similarities and differences in policy implications derived from HIV modelling.

This study has several limitations. First, there are other models of the HIV epidemic in South Africa that were not included in this study. This is due to the inclusion criteria requiring specific health economic outputs for a broad range of HIV interventions, planned in later phases of the MIHPSA project. Second, due to differing model structures the status quo scenario varied slightly between models. For example, some models assumed that certain variables (such as the circumcision prevalence, ART coverage, and levels of viral suppression) were constant in the status quo scenario, whereas in others these variables continued to change as they were dynamically modelled processes (albeit with time-constant inputs). Third, numerous sources of calibration data were available, but models used various subsets of these data as appropriate for their structures. While recognising the value of variation in model structures and calibration methodologies, alignment to a common set of calibration data would likely improve the consistency of model projections. Fourth, some models did not produce results for all output variables. This limits our ability to find the areas of greatest uncertainty and refine the modelling of these aspects of the epidemic.

Lastly, the coefficient of variation does have potential pitfalls. Several model outputs that are proportions, such as ART coverage and viral suppression, are projected to approach 100% in the future, leading to coefficients of variation trending to zero. Conversely, certain estimates such as HIV incidence may approach 0%, leading to very high coefficients of variation. In these cases it may be better to apply a logit transformation to highlight differences across models for such outcomes. This was attempted in the current study, but produced distortions in these indicators at earlier timepoints when their values were far below 100%. Alternative transformations of model outcomes, informed by contemporary HIV epidemic goals and benchmarks, may be required to contextualise our findings for future analyses. Future studies may also consider alternative measures of consistency, such as the intra-model coefficient of variation of an ensemble model. However, the construction of the ensemble model’s prediction intervals would need careful consideration to avoid conflating within-model uncertainty with inter-model variation.
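The ceiling effect and the logit transformation mentioned above can be illustrated with a short Python sketch. The proportions below are hypothetical and are not taken from the study, and the helper names are assumptions. The final part shows one possible way a logit-scale coefficient of variation could become unstable for mid-range proportions (the mean of the logits approaches zero near 50%); this is offered as a plausible mechanism only, not as the explanation given by the study.

```python
import math
import statistics


def cov(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


# Hypothetical proportions from different models (illustrative only)
near_ceiling = [0.95, 0.96, 0.97, 0.98]
mid_range = [0.40, 0.50, 0.60]

# Near 100%, the raw coefficient of variation is small even though the
# models disagree about the remaining gap (2% versus 5% not covered).
raw_cov = cov(near_ceiling)

# On the logit scale the same disagreement is much more visible.
logit_sd = statistics.stdev([logit(p) for p in near_ceiling])
logit_cov = cov([logit(p) for p in near_ceiling])

# Around 50%, the logits centre on zero, so dividing by their mean is unstable.
mid_logit_mean = statistics.mean([logit(p) for p in mid_range])

print(f"raw CoV near ceiling:   {raw_cov:.4f}")
print(f"logit SD near ceiling:  {logit_sd:.4f}")
print(f"logit CoV near ceiling: {logit_cov:.4f}")
print(f"mean logit, mid-range:  {mid_logit_mean:.2e}")
```

One design option suggested by this sketch is to report the standard deviation on the logit scale rather than a ratio to the mean, since the logit scale has no natural zero; whether that would suit the outputs compared here is not something the sketch can establish.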

Conclusion

Evaluating consistencies and differences in model projections can help to set the stage for policy analyses and highlight areas for future research. Our study found consistent estimates in population sizes, HIV prevalence, and the 95–95-95 indicators in adults, but observed wider variation in paediatric 95–95-95 indicators, HIV incidence and both AIDS-related and overall deaths. Additional data collection and inclusion in model calibration procedures, particularly regarding HIV incidence and paediatric HIV service coverage, will be necessary to reduce uncertainty in HIV epidemic trends in South Africa.