Association Between Early Sexual Debut and New HIV Infections Among Adolescents and Young Adults in 11 African Countries

We investigated the association between early sexual debut and HIV infection among adolescents and young adults. Using data from nationally representative Population-Based HIV Impact Assessment (PHIA) surveys in 11 African countries, we fitted a multivariate logistic regression model to assess the relationship between early sexual debut and new HIV infections among people aged 10–24 years. The results revealed a significant and robust association: young people who experienced early sexual debut were approximately 2.65 times more likely to contract HIV than those who did not, even after accounting for other variables. These findings align with prior research suggesting that early initiation of sexual activity may increase vulnerability to HIV infection through biological susceptibility and risky behaviors such as low condom use and multiple sexual partners. The findings have substantial implications for HIV prevention strategies: interventions aimed at delaying sexual debut could be an effective component in reducing HIV risk for this population, and targeted sex education programs that address the risks of early sexual debut may play a pivotal role in these efforts. A comprehensive approach of this kind could advance efforts towards ending AIDS by 2030.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10461-024-04343-w.

Note: HIV prevalence is a measure of the proportion of the population currently infected with HIV. The units are percentages (%). The 95% confidence intervals are given in parentheses. The estimates for Total, Male, and Female are across all age groups. The estimates for each age group are for both sexes combined. A few countries have 0 observations for the age group 10–14 due to lack of information on early sex and are not included in the analysis.

Note: The units are percentages (%) per year. HIV incidence is the measure of new HIV infections per year. The 95% confidence intervals are given in parentheses. Two countries (Tanzania and Rwanda) have more than 93% of data missing on indicators of early sex and are not included in the analysis. Note that incidence estimates are based on a small number of recent infections. The data were not powered to estimate HIV incidence at the national level; therefore, these estimates should be interpreted with caution.

Appendix C

C.1 Covariates
The recency of an HIV infection is determined through a combination of tests. This dichotomous outcome variable allows for a straightforward interpretation of the analysis results, as it directly represents the presence or absence of recent HIV infection among the participants. The assignment of these numerical values facilitates the use of statistical models, such as logistic regression, that are specifically designed to handle binary response variables.
The variable of early sexual debut is a dichotomous response (yes, no) to the question "Have you ever had sex?" for participants aged 10–14 years [3]. For those aged 15 years and older, it is determined by the response to the question "How old were you when you had sex for the very first time?" [4]. Specifically, it is defined in this study as sexual initiation before the age of 18 years. We chose this cut-off age on legal grounds. From a legal standpoint, the age of consent in many jurisdictions is set at 18 [4], the threshold at which an individual is legally recognized as mature enough to give informed consent to sexual activity.
In many places, 18 is also the minimum age for obtaining a driver's license without restrictions or supervision [5]. Aligning our definition of early sexual debut with this legal benchmark ensures that our study adheres to widely accepted legal standards.
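The coding rule for early sexual debut described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function name, the argument names, and the treatment of respondents aged 15 or older who report never having had sex (coded here as no early debut) are assumptions made for the example.

```python
from typing import Optional


def early_sexual_debut(
    age: int,
    ever_had_sex: Optional[bool] = None,
    age_at_first_sex: Optional[int] = None,
    cutoff: int = 18,
) -> Optional[bool]:
    """Classify early sexual debut; return None when the needed answer is missing.

    Hypothetical helper: argument names are illustrative, not survey variable names.
    """
    if 10 <= age <= 14:
        # Ages 10-14: use the yes/no answer to "Have you ever had sex?".
        return ever_had_sex
    if age_at_first_sex is not None:
        # Ages 15+: sexual initiation before the cutoff age counts as early debut.
        return age_at_first_sex < cutoff
    if ever_had_sex is False:
        # Assumption: respondents who never had sex are coded as "no".
        return False
    return None  # exposure not reported


print(early_sexual_debut(13, ever_had_sex=True))
print(early_sexual_debut(20, age_at_first_sex=16))
print(early_sexual_debut(20, age_at_first_sex=16, cutoff=15))
```

Keeping the cutoff as a parameter makes it straightforward to re-derive the exposure under alternative definitions of "early".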
The age variable is a continuous measure ranging from 10 to 24 years. Sex is categorised into male or female, and the place of residence is divided into rural and urban areas.
The education level is dichotomised into the categories of having received any education or no education. This simplification is based on the assumption that the majority of individuals under 15 years are unlikely to have completed middle school.
The household wealth is represented by a categorical wealth quintile (1–5), derived from dwelling characteristics and asset variables. This classification follows the guidelines provided by the Demographic and Health Surveys (DHS) [6].
The country of residence is also included as a categorical variable. The covariates are carefully selected to provide a comprehensive understanding of the factors influencing the risk of HIV infection among both adolescents and young adults.

C.2 Data imputation
Some observations in the survey have missing data for some covariates. The primary reason for imputing missing data, rather than discarding incomplete cases altogether, is to mitigate the bias resulting from missingness. Prior work [7] reports that approximately 94% of studies rely on listwise deletion, which discards entire observations and can result in the loss of valuable information. For this research, we impute the missing data using multiple imputation by chained equations (MICE). MICE is a statistical method widely used in health science to handle missing data, and it has outperformed other imputation techniques in some simulation experiments [8,9].
Multiple imputation involves creating multiple imputed datasets to account for the missing data and then applying standard analysis methods (hypothesis testing in our case) to each of these datasets. The multiple sets of results are combined using Rubin's rules [10] to obtain final estimates with standard errors that reflect the uncertainty due to the missing data.
Specifically, our parameter of interest is the coefficient $\beta$ of early sexual debut. From each of the $M$ imputed datasets we obtain an estimate $\hat{\beta}_m$ and its standard error $se_m$. The final point estimate $\bar{\beta}$ is the average of the estimates across the imputed datasets, and its standard error $se_{\bar{\beta}}$ combines the between-imputation variance and the within-imputation variance:

$$\bar{\beta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\beta}_m, \qquad se_{\bar{\beta}} = \sqrt{W + \left(1 + \frac{1}{M}\right)B},$$

where the within-imputation variance is estimated by

$$W = \frac{1}{M}\sum_{m=1}^{M} se_m^{2}$$

and the between-imputation variance is estimated by

$$B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\beta}_m - \bar{\beta}\right)^{2}.$$
The procedures and advantages of MICE are well documented [11]. Following the suggestions of several studies [12,13], we include the outcome when imputing missing predictor values. We implement the imputation of missing data using the mice package [14] in the R statistical software.
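The pooling step of Rubin's rules can be illustrated with a short Python sketch. It assumes the per-imputation coefficient estimates and standard errors are already available; the function name and the example numbers are hypothetical and are not taken from the study. The pooled estimate is the mean of the estimates, and the total variance is the within-imputation variance plus (1 + 1/M) times the between-imputation variance.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool M per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates and matching standard errors")
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates (divides by M - 1).
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log-odds estimates from M = 3 imputed datasets.
beta_bar, se_bar = pool_rubin([0.9, 1.0, 1.1], [0.2, 0.2, 0.2])
print(f"pooled estimate = {beta_bar:.3f}, pooled SE = {se_bar:.3f}")
```

The sketch covers only the point estimate and its standard error; the degrees-of-freedom adjustment used for inference after multiple imputation is left out.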

C.3 Modeling
The statistical analysis for this study uses multivariate logistic regression models, adjusted to take into account the survey sampling design. These models examine the associations between early sexual debut and HIV infection among adolescents and young adults. The models control for potential confounding variables and incorporate a fixed effect to account for variability at the country level. We model the associations between new HIV infections and covariates among adolescents and young adults with Eq. C.1, described below.
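One way to write Eq. C.1 that is consistent with the variable and parameter definitions given in this section is sketched here. Two points are assumptions of the sketch: the logit link is inferred from the description of the coefficients as changes in log odds, and each categorical term (wealth_quintile, country) is shown with a single coefficient as shorthand, whereas a fitted model would typically carry one coefficient per non-reference level.

$$
\operatorname{logit}\big(\Pr(y = 1)\big) = \beta_0 + \beta_1\,\text{early\_sex} + \beta_2\,\text{gender} + \beta_3\,\text{wealth\_quintile} + \beta_4\,\text{educated} + \beta_5\,\text{urban} + \beta_6\,\text{age} + \beta_7\,\text{country} + \epsilon \qquad \text{(C.1)}
$$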
The variables are described as follows:
• Dependent variable:
- y: Represents whether an individual has a recent infection (binary outcome).
• Independent variables:
- early_sex: A binary variable indicating whether the individual has had early sexual debut.
- gender: A binary variable, with female as the reference category, representing the gender of the individual.
- wealth_quintile: Categorical variable with values ranging from 1 to 5, characterizing the individual's household wealth.
- educated: Binary variable indicating whether the individual has received any education.
- urban: A binary variable indicating whether the individual lives in an urban area.
- age: Continuous variable reflecting the age of the individual.
- country: Categorical variable denoting the individual's country of residence, with Zambia selected as the reference category (Zambia has the highest number of recent infections).
• Parameters:
- β0, β1, ..., β7: Coefficients representing the change in the log odds of having a recent infection for a one-unit change in the corresponding variable, holding other variables constant.
- ϵ: Error term, capturing unobserved variability in the dependent variable.
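Because each coefficient is a change in log odds, exponentiating it gives an odds ratio. The Python sketch below shows that conversion together with a Wald-type 95% interval. The function name and the standard error are hypothetical, the coefficient is chosen so that the odds ratio equals the abstract's figure of about 2.65 (treating that figure as an odds ratio is an assumption), and the normal-approximation interval is a simplification that ignores the survey design and the multiple-imputation adjustments.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical inputs: beta reproduces an odds ratio of 2.65; se is made up.
beta = math.log(2.65)
se = 0.25
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```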
Appendix D

SENSITIVITY ANALYSIS
The number of imputations performed in handling missing data might influence the study's results. There has been some debate about how many imputations are needed for good statistical inference. Some research suggests that 3–5 imputations are sufficient to yield excellent results [15]. Other studies show that the statistical power for small effect sizes diminishes as the number of imputations decreases, and recommend performing more imputations than previously considered sufficient [16,17]. In our sensitivity analysis, we vary the number of imputations across 10, 20, and 50. Additionally, we fit the model without data imputation, excluding any observations with missing data (complete-case analysis).
Furthermore, the choice of imputation algorithm can significantly impact the results of a study. Several imputation algorithms exist, each with its strengths and weaknesses [18,19], and the choice between them can alter the outcomes of the research. Therefore, a sensitivity analysis that examines the effects of different imputation algorithms will provide insights into how our findings might change under these different approaches. This will help ensure that our results are not inappropriately influenced by the particular imputation algorithm selected. In our sensitivity analysis, we have imputed the data using three other imputation algorithms: random sample from observed values, classification and regression trees [20], and Bayesian linear regression [21].
The decision to include adolescents as young as 10 to 14 years old in the study could have substantial implications for our findings. Given the sensitive and complex nature of sexual behavior in this age group [22], their inclusion could introduce additional variability and potential bias into our results. Conducting a sensitivity analysis that compares results with and without this age group will allow us to assess the impact of this decision on our overall conclusions.
The definition of "early" sexual debut can vary, and the choice of a cutoff age can influence the study's outcomes. Some studies use 15 as the cutoff age, while others use 16, 17, or 18 [23,24,25]. In our sensitivity analysis, we explore the impact of varying the cutoff age for early sexual debut from 15 to 18 years. This will help us understand how sensitive our model is to the definition of "early" sexual debut.
Gender can play a significant role in sexual behavior and its associated outcomes. Research has shown that males and females often differ in their sexual behaviors, attitudes, and risks. For example, females are more likely than males to have much older sexual partners and to experience forced sex [26,27,28,29]. To account for potential gender differences, we fit separate models for males and females in the sensitivity analysis. This will allow us to identify any gender-specific patterns or biases that may exist in our data.
The effect of early sexual debut might not be uniform across genders due to various factors, including biological differences and social norms. For example, females who experience early sexual debut may be at a higher risk for HIV because of biological vulnerability and societal factors [30,31]. Therefore, we include an interaction term between gender and early sexual debut in our sensitivity analysis. Including this interaction term lets the estimated effect of early sexual debut differ between males and females, giving a more complete picture of its impact.
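As a sketch of what such an interaction adds, suppose the model is extended with a single product term. The coefficient name $\beta_{\text{int}}$ and the ellipsis standing for the remaining covariates are notation introduced for this illustration only.

$$
\operatorname{logit}\big(\Pr(y = 1)\big) = \beta_0 + \beta_1\,\text{early\_sex} + \beta_2\,\text{gender} + \beta_{\text{int}}\,(\text{early\_sex} \times \text{gender}) + \dots
$$

With female as the reference category (gender = 0), the odds ratio for early sexual debut is $\exp(\beta_1)$ among females and $\exp(\beta_1 + \beta_{\text{int}})$ among males. A value of $\beta_{\text{int}}$ close to zero would indicate that the association does not differ by gender.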
Country-level factors, such as HIV prevalence, could confound the relationship between early sexual debut and our outcome variable. In our sensitivity analysis, we include country-level HIV prevalence rates as a covariate while dropping the country variable. This will help us assess whether country-level prevalence can account for the variations observed between countries.
The choice of imputation method can potentially influence the results of a study. To assess the robustness of our findings, we conduct a sensitivity analysis comparing two widely used methods for multiple imputation: MICE and Amelia [32]. Their usefulness may vary depending on the missing data mechanism and the underlying distribution of the data.
By comparing the results obtained using these two different imputation methods, we aim to better understand the impact of early sexual debut on recent HIV infections.
Lastly, censoring can introduce bias and affect the validity of study findings. In our sensitivity analysis, we investigate the impact of dropping observations that are below the cutoff age for early sexual debut, because the exposure of interest (early sexual debut) is not fully observed for these subjects in the study. This will help us assess how sensitive our model is to incomplete data.
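The exclusion rule for censored observations can be written as a simple filter. The Python sketch below assumes each observation is a dictionary with an "age" field; the field name, the function name, and the example records are hypothetical.

```python
def drop_censored(records, cutoff=18):
    """Keep only observations whose age has reached the cutoff for early sexual debut."""
    return [r for r in records if r["age"] >= cutoff]


# Hypothetical records: with cutoff 18, respondents younger than 18 are dropped.
sample = [{"id": 1, "age": 13}, {"id": 2, "age": 17}, {"id": 3, "age": 18}, {"id": 4, "age": 22}]
kept = drop_censored(sample, cutoff=18)
print([r["id"] for r in kept])
```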
In summary, by conducting a sensitivity analysis on these nine parameters, we can gain a more comprehensive and robust understanding of the relationship between early sexual debut and the risk of HIV infection among adolescents and young adults. This will enhance the validity and overall quality of our research findings. In the sensitivity analysis, we vary only one parameter at a time while keeping the other parameters constant. We compare the obtained results with those from the benchmark model.

D.0.3 Whether to include young adolescents
The sensitivity analysis that compares the model results with and without young adolescents in the analysis is shown in Figure D.3.

D.0.5 Gender-specific models
The sensitivity analysis that compares the results of modeling males and females separately is shown in Figure D.5. Note that we have modeled them together in the benchmark model. The graph shows that the effect of early sexual debut is significant for females and not significant for males. The uncertainty interval is much wider for males.

D.0.6 Interaction between Gender and Early Sexual Debut
The sensitivity analysis that compares the model results with and without the interaction between gender and early sexual debut is shown in Figure D.6. Note that we have excluded the interaction in our benchmark model. The graph shows that the conclusions with respect to all covariates remain the same whether or not the interaction is included. The data are shown in Table D.8. Note that the effects of gender and the interaction are not significant after including the interaction. This suggests that the effect of early sexual debut on the outcome might not depend on gender.

Figure B.2: The incidence rate of HIV among people aged 15–24 years old by sex and country.

Figure B.3: The prevalence of early sexual debut among people aged 10–24 years old by sex and country.

Figure B.4: The prevalence of HIV among people aged 10–24 years old by age group and country.

Figure D.1 displays the sensitivity analysis that compares the model results when varying the number of data imputations before modeling. Note that we have used 5 sets of data imputations in our benchmark model. On the X-axis, we have the odds ratios for different covariates. The color of each line represents a different set of imputations. For each covariate,

Figure D.1: Sensitivity analysis comparing the model results of varying the number of data imputations.

Figure D.3. Note that we have included young adolescents in our benchmark model. The graph shows that the conclusions with respect to all covariates remain the same whether or not young adolescents are included. The data are shown in Table D.3.

D.0.4 Cutoff age for early sexual debut
The sensitivity analysis that compares the model results of using different cutoff ages to classify early sexual debut is shown in Figure D.4. Note that we have used 18 years old as the cutoff age in our benchmark model. The graph shows that the conclusions with respect to all covariates remain the same for cutoff ages from 16 to 18 years old. The effect of sexual debut slightly decreases as the cutoff age gets younger. The results are shown in Table D.4.

Figure D.2: Sensitivity analysis comparing the model results of using different data imputation algorithms.

Figure D.3: Sensitivity analysis comparing the model results of whether to include young adolescents (10–14 years).

Figure D.4: Sensitivity analysis comparing the model results of using different cutoff age to classify early sexual debut.

Figure D.5: Sensitivity analysis comparing the model results of whether to model males and females separately.

Figure D.6: Odds ratios from the models of sensitivity analysis of interaction between early sex and gender.

Figure D.7: Sensitivity analysis comparing the model results of whether to include HIV prevalence as covariate.

Figure D.8: Sensitivity analysis comparing the model results of handling censored data.

Table B.2: Incidence rate of HIV among people aged 15–24 years old from the data used.

Table B.3: Prevalence of early sexual debut among people aged 15–24 years old.

Table D.1: Odds ratios from the models of sensitivity analysis of number of data imputations.

Table D.2: Odds ratios from the models of sensitivity analysis of imputation methods. Note: cart: classification and regression trees. norm: Bayesian linear regression. pmm: predictive mean matching. sample: random sample from observed values.

Table D.3: Odds ratios from the models of sensitivity analysis of whether to include data of young adolescents.

Table D.4: Odds ratios from the models of sensitivity analysis of the cutoff age for early sexual debut.

Note: To include HIV prevalence as a covariate, the country variable needs to be dropped due to collinearity.

Table D.6: Odds ratios from the models of sensitivity analysis of handling censored data.
Note: If the cutoff age for classifying early sexual debut is 18, excluding censored data means that observations below 18 are dropped.

Table D.7: Odds ratios from the models of sensitivity analysis of whether to model gender separately.

Table D.8: Odds ratios from the models of sensitivity analysis of interaction between early sex and gender.

Table D.9: Odds ratios from the models of sensitivity analysis of imputation methods.
… .78, 2.96) 1.42 (0.66, 3.08)
Wealth quintile 1.02 (0.79, 1.32) 1.07 (0.74, 1.55)

Table E.1: PAF for males aged 10–24 years in each country.
… reflects the combined uncertainties of both the prevalence and the odds ratio of early sexual debut.

Table E.2: PAF for females aged 10–24 years in each country.