Introduction

Women account for more than half of the estimated 36 million people living with HIV and two million new HIV infections annually worldwide [1]. For the vast majority of women living with HIV (WLHIV), preventing unintended pregnancies is not only paramount for the woman’s health but also important for the prevention of mother-to-child transmission of HIV. Great strides have been made in improving universal access to antiretroviral therapy (ART) and the provision of effective contraception in resource-limited settings [2, 3]. For example, the use of subdermal contraceptive implants, which are the most effective contraceptive method with failure rates < 1%, has risen from a prevalence of 1.7% in 2003 to 18.1% in 2016 among married women in Kenya [4]. Yet options for ART and effective contraceptive methods remain limited in these settings. The recent concern about periconception dolutegravir exposure and a possible increased risk of neural tube defects [5] underscores the limited options for ART and the difficult family planning decisions that WLHIV, their providers, and policymakers face in resource-limited settings.

In light of the possible association between dolutegravir and birth defects, many national HIV treatment programs initially defaulted to recommending efavirenz-containing ART as the best option for women of reproductive potential [6]. However, our prior work, a retrospective analysis of electronic medical record (EMR) data from one program in western Kenya, demonstrated reduced effectiveness of contraceptive implants when used concurrently with efavirenz-containing ART [7]. An increasing number of observational and pharmacokinetic studies have supported these findings [8,9,10,11,12], indicating that the likely cause is a drug-drug interaction in which efavirenz increases the metabolism of the exogenous progestin in the contraceptive implants, leading to reduced concentrations of circulating progestin.

Although the EMR is a valuable source of data for questions involving large numbers of patients or rare outcomes, and surveillance of adverse pregnancy or neonatal outcomes is likely best conducted using routine data systems, there are recognized challenges to using routinely collected EMR data for research purposes, particularly due to data quality concerns [13, 14]. Some studies, including several in the HIV/AIDS literature, have seen dramatic changes in estimates after data validation [15,16,17,18,19]. Nonetheless, significant investments are being, and will continue to be, made in health informatics systems, including in resource-limited settings; for example, Kenya now supports a robust national EMR system [20, 21]. It is critical to know whether the initial findings suggesting a reduced effectiveness of contraceptive implants with concomitant efavirenz-containing ART can be substantiated with higher quality data. Thus, to better estimate the associations between contraceptive use, ART regimens, and pregnancy, we conducted a follow-up study that expanded our study sample to include another large HIV treatment program in western Kenya and incorporated a three-phase validation study, reviewing over 5000 charts and conducting over 1000 phone interviews to establish the accuracy of the EMR for our primary exposures and outcome of interest.

Methods

Study setting, site, and population

We conducted a retrospective analysis of a longitudinal cohort of WLHIV from 15 to 45 years of age followed from January 1, 2011, to December 31, 2015, at two HIV treatment programs in western Kenya affiliated with the East Africa International Epidemiology Databases to Evaluate AIDS (EA-IeDEA). These two President’s Emergency Plan for AIDS Relief (PEPFAR)-sponsored HIV treatment programs, Academic Model Providing Access to Healthcare (AMPATH) and Family AIDS Care & Education Services (FACES), supported care for approximately 72,000 and 50,000 individuals living with HIV in western Kenya, respectively, during the study period. The chart review and telephone interviews were conducted from April 2016 to March 2017.

The Human Subjects Division at the University of Washington; Indiana University Institutional Review Board; Committee on Human Research at the University of California, San Francisco; Institutional Research and Ethics Committee at Moi University/Moi Teaching and Referral Hospital; Ethical Review Committee at Kenya Medical Research Institute; and US Centers for Disease Control and Prevention approved this research.

Observation periods and censoring

A woman’s first observation period began at her first clinical visit captured in the EMR on or after January 1, 2011. A new observation period began whenever her contraceptive method, ART regimen, or both changed or she became pregnant, and her final observation period ended at her last visit on or before December 31, 2015. Thus, each observation period could span multiple clinical visits, and women with only one visit in the EMR did not contribute any person-time to our study. We made no efforts to track women who were potentially lost to follow-up, transferred their care out to another facility, or died. Additional details can be found elsewhere [22].
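The construction of observation periods can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the `Visit` class, the category labels, and the choice to carry the exposures recorded at the start of a period forward until the next change are assumptions made here, and pregnancy-triggered splits are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date

STUDY_START = date(2011, 1, 1)
STUDY_END = date(2015, 12, 31)


@dataclass(frozen=True)
class Visit:
    """One clinic visit (hypothetical structure, for illustration)."""
    when: date
    contraceptive: str  # e.g. "implant", "DMPA", "none"
    art: str            # e.g. "EFV", "NVP", "none"


def observation_periods(visits):
    """Return (start, end, contraceptive, art) tuples for one woman.

    A new period starts whenever the contraceptive method or ART regimen
    differs from the one recorded at the start of the current period.
    """
    in_window = sorted(
        (v for v in visits if STUDY_START <= v.when <= STUDY_END),
        key=lambda v: v.when,
    )
    periods = []
    if len(in_window) < 2:
        return periods  # a single visit contributes no person-time
    start = in_window[0]
    for cur in in_window[1:]:
        if (cur.contraceptive, cur.art) != (start.contraceptive, start.art):
            periods.append((start.when, cur.when, start.contraceptive, start.art))
            start = cur
    last = in_window[-1]
    if last.when > start.when:
        # Close the final period at the last visit inside the study window.
        periods.append((start.when, last.when, start.contraceptive, start.art))
    return periods
```

In this sketch, a woman with four visits who switches from an implant to DMPA at her third visit yields two periods, and a woman with a single visit yields none.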

In the AMPATH dataset, women were considered not to be at risk for a subsequent pregnancy, and hence were censored, for the duration of the current pregnancy as indicated by the pregnancy outcome records (miscarriage, abortion, or preterm or term delivery). In the FACES dataset, however, such information was unavailable; therefore, women who became pregnant were considered not to be at risk for a subsequent pregnancy for 38 weeks starting from the date of likely conception, i.e., the approximate duration of a full-term pregnancy. After the pregnancy, the women were considered to be at risk again and could contribute multiple pregnancies to our dataset.
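The two censoring rules can be expressed as a single helper. This is a minimal sketch under stated assumptions: the function name and arguments are hypothetical, a recorded pregnancy outcome date stands in for the AMPATH outcome records, and a missing outcome date triggers the 38-week rule used for FACES.

```python
from datetime import date, timedelta

FULL_TERM = timedelta(weeks=38)  # fixed window applied when no outcome is recorded


def not_at_risk_window(conception, outcome_date=None):
    """Return the (start, end) interval during which a woman is censored.

    If a pregnancy outcome date is available (as in AMPATH), the window
    ends at that date; otherwise (as in FACES) it ends 38 weeks after the
    estimated date of conception.
    """
    end = outcome_date if outcome_date is not None else conception + FULL_TERM
    return conception, end
```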

Variable definitions

Exposures

Contraceptive method was documented at each clinic visit and then categorized as follows: (1) implants, which may have included information on specific types of etonogestrel-containing (e.g., Implanon®/Implanon-NXT®/Nexplanon®) or levonorgestrel-containing (e.g., Jadelle®) implants; (2) depomedroxyprogesterone acetate (DMPA); (3) oral contraceptive pills (OCPs), including combined oral contraceptive or progestin-only pills; (4) other more effective contraceptive (MEC) methods such as intrauterine devices (IUDs) and permanent methods; (5) less effective contraceptive (LEC) methods, such as male and female condoms and “natural” contraceptive methods (withdrawal and rhythm); or (6) no contraceptive method. When multiple methods were documented at the same visit, the contraceptive method was assigned according to the following hierarchy: MEC over implants over DMPA over OCPs over LEC.
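The hierarchy for visits with more than one documented method amounts to a first-match lookup. The sketch below is illustrative; the category labels and the function name are assumptions, not identifiers from the study's code.

```python
# Priority order stated in the text: MEC > implants > DMPA > OCPs > LEC.
HIERARCHY = ["MEC", "implant", "DMPA", "OCP", "LEC"]


def assign_method(documented):
    """Pick one contraceptive category from those documented at a visit."""
    for method in HIERARCHY:
        if method in documented:
            return method
    return "none"  # no contraceptive method documented
```

For example, a visit documenting both condoms (LEC) and an implant would be classified as an implant visit.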

At the start of the study period, nevirapine- or efavirenz-containing ART were the recommended first-line regimens in Kenya; by early 2013, efavirenz-containing ART became the recommended first-line ART [23], and universal ART was not recommended until 2016 [24]. The ART regimen was documented at each visit and was categorized as follows: (1) efavirenz-containing ART; (2) nevirapine-containing ART; (3) protease inhibitor (PI)-containing ART; (4) nucleos(t)ide reverse transcriptase inhibitor (NRTI)-only ART; (5) a combination ART regimen containing two or more of efavirenz, nevirapine, or PIs; or (6) no ART. We defined an “ART regimen” as at least a three-drug combination of antiretrovirals. Because few person-years were observed in ART regimen categories 3 through 5, observations in these categories were dropped before conducting this analysis. We chose nevirapine-containing ART as the reference category for ART comparisons across contraceptive methods, as the alternative option of no ART is not clinically meaningful in the era of universal ART use.

Outcome

Our primary outcome was incident pregnancy documented in medical records by a clinical diagnosis, through self-report or presentation while gravid. Neither urine nor serum tests were routinely used to confirm clinically suspected pregnancies or prior to implant placement in the study setting. We estimated the date of incident pregnancy as the date of likely conception based on reports of last menstrual period, estimated gestational age, or estimated date of delivery. We assumed the contraceptive method was still being used if the method was a permanent method, was not explicitly noted to have been removed (in the case of implants or IUDs), or had not been replaced by another contraceptive method prior to the pregnancy (applicable to all methods). To identify pregnancies that may have been conceived towards the end of our study period but not yet clinically detected, we tracked reported pregnancies for another nine months past December 2015. Additional details can be found in supplementary text A.

Covariates

A priori we included age, marital status, number of living children under 14 years of age (as a proxy for parity since these data were not directly available), education level, CD4 cell count, WHO clinical stage of HIV disease, body mass index (BMI), use of anti-tuberculosis (TB) medications, calendar time, and program as adjusting variables. Additional details on covariates can be found in supplementary text A.

Data validation via three-phase sampling

We designed a 3-phase sampling scheme for data validation, to overcome potential limitations in data collection and entry errors in the EMR (supplementary figure 1), adapted from 2-phase sampling schemes [25, 26]. The first phase sample consisted of routine data collected from the EMR. Our second phase sample consisted of a manual chart review for a subsample of patient records from the EMR. We randomly selected records for chart review by categorizing records into 32 categories based on combinations of contraceptive method (implant, DMPA, MEC, none), ART exposure (efavirenz, nevirapine, PI, no ART), and pregnancy status (pregnancy, no pregnancy). We over-sampled records from certain categories of particular interest. Charts were reviewed by trained research assistants. Our third phase sample consisted of telephone interviews for a non-random subset of women for whom we completed chart reviews. Priority was given to interviewing women noted to become pregnant while using an implant, women not pregnant while using an implant, women pregnant while using DMPA, and women not pregnant while using DMPA, in that order. Phone interviews were performed by research assistants with a standardized telephone script and after obtaining verbal consent. We removed observation time in the chart review and telephone interview data before and/or after observation time in the EMR for the weighted analyses. Our goal was to validate only the primary exposures (contraceptive method and ART regimen) and outcome (incident pregnancy) of interest in this study, so we ascertained only data pertaining to these three variables in the second and third sampling phases. When differing values were generated for these three variables in the three datasets, we generally gave priority to the values recorded in the telephone interview dataset for use in analyses. 
However, for the ART regimen, when the telephone interview dataset indicated that the woman could not recall or was unsure of her ART regimen, we used the ART regimen values observed in the chart review dataset. Details for each individual phase and its methods are found in supplementary text A, figures 2 and 3, and table 1.
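Two pieces of the validation design lend themselves to a short sketch: the 32 sampling strata and the rule for reconciling discordant values. The code below is illustrative only. The labels, the `UNSURE` sentinel, and the function name are assumptions, and because the text says telephone data were "generally" prioritized, the rule shown is a simplification.

```python
from itertools import product

CONTRACEPTIVE = ("implant", "DMPA", "MEC", "none")
ART = ("efavirenz", "nevirapine", "PI", "no ART")
PREGNANCY = ("pregnancy", "no pregnancy")

# 4 contraceptive x 4 ART x 2 pregnancy categories = 32 sampling strata.
STRATA = list(product(CONTRACEPTIVE, ART, PREGNANCY))

UNSURE = "unsure"  # hypothetical code for "could not recall or was unsure"


def resolved_value(variable, chart_value, phone_value):
    """Reconcile chart-review and telephone values for an interviewed woman.

    Telephone data take priority, except that an unsure ART report falls
    back to the chart-review value.
    """
    if variable == "art_regimen" and phone_value == UNSURE:
        return chart_value
    return phone_value
```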

Statistical analysis

We present frequencies and proportions for categorical variables and median and interquartile range (IQR) for continuous variables. We imputed missing data in the EMR, which ranged in frequency from 0.4 to 25.7% (Table 1), using multiple imputation by chained equations, with all model covariates, contraceptive method, ART regimen, and pregnancy as predictors and, for time-varying variables, including the next and preceding non-missing values. We calculated adjusted incident rate ratios (aIRRs) using Poisson models with interaction terms between contraceptive method and ART categories, program, the various covariates mentioned above, and robust standard errors. All estimates are presented with interaction terms since the interaction terms are central to the study question.
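One way to write down a Poisson model with a contraceptive-by-ART interaction is shown below. This is a sketch in notation introduced here, not the authors' exact specification: $Y_i$ is the pregnancy count and $T_i$ the person-time for observation $i$ (the log person-time offset is an assumption), $C_{ic}$ and $A_{ia}$ are indicators for non-reference contraceptive category $c$ and ART category $a$ (nevirapine is the ART reference), and $X_i$ collects program and the other covariates.

```latex
\begin{align*}
\log \mathbb{E}[Y_i] &= \log T_i + \beta_0
  + \sum_{c} \beta_c C_{ic}
  + \sum_{a} \gamma_a A_{ia}
  + \sum_{c,a} \delta_{ca} C_{ic} A_{ia}
  + \theta^{\top} X_i \\
\mathrm{aIRR}_{\mathrm{EFV\ vs.\ NVP} \mid c}
  &= \exp\left(\gamma_{\mathrm{EFV}} + \delta_{c,\mathrm{EFV}}\right)
\end{align*}
```

Under this parameterization, the efavirenz vs. nevirapine rate ratio within a given contraceptive category combines the ART main effect with the matching interaction term, with $\delta_{c,\mathrm{EFV}} = 0$ for the reference contraceptive category. This is why the estimates cannot be reported without the interaction terms.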

Table 1 General characteristics of women sampled in each phase, based on woman-years contributed to each sample phase (proportion (%) or median (IQR))

We applied generalized raking inverse probability weighting (IPW) to obtain estimates of the aIRRs using the chart review (second phase) and telephone interview (third phase) data that accounted for the over-sampling of women in certain contraceptive-ART-pregnancy categories [27, 28]. Specifically, we first determined the probability of being selected for the manual chart review, denoted as p1, which was empirically estimated based on the observed proportions of women sampled in each of the 32 contraceptive-ART-pregnancy categories described above. Next, given that a woman’s chart was reviewed, we estimated the probability that she was selected for a telephone interview, denoted as p2. This was done using logistic regression models that included as covariates the categories used to define the priority telephone interviewing strategy (described above). For analyses using chart review data only, data from women with chart validation were assigned a weight 1/p1; for analyses using telephone interviews, data from women with both chart validation and phone interviews were assigned a weight 1/(p1p2). To improve the efficiency of our estimates, we applied generalized raking techniques [27], which use auxiliary variables recorded on all women in the EMR to fine-tune these inverse probability weights. Specifically, we calibrated our inverse probability weights using the estimated influence function derived from a Poisson model fit to the unvalidated data [29]. Lastly, we then obtained raking estimates of the aIRRs by fitting a Poisson model to the fully validated data using these calibrated weights and the covariates described above. 
Under assumptions that missing validation data are missing at random (i.e., that selecting a woman for validation depends only on known characteristics) and that models for these probabilities are correctly specified, these generalized raking IPW estimates of the adjusted IRRs are consistent estimates for the full EMR dataset (i.e., accurately approximate what one would get if one validated all records) [30]. Additional details on statistical methods are found in supplementary text A.
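The design weights described above, before any raking calibration, can be sketched as follows. This is illustrative only: the function names and stratum keys are hypothetical, p2 is taken as given rather than fitted by logistic regression, and the generalized raking step is not shown.

```python
def empirical_p1(n_sampled, n_total):
    """Estimate the chart-review sampling probability within each stratum.

    Both arguments map a stratum key to a count of women.
    """
    return {stratum: n_sampled[stratum] / n_total[stratum] for stratum in n_total}


def ipw_weight(p1, p2=None):
    """Inverse probability weight for a validated woman.

    Chart review only: 1 / p1. Chart review plus telephone interview:
    1 / (p1 * p2), where p2 is the interview probability given chart review.
    """
    for p in (p1, p2):
        if p is not None and not 0 < p <= 1:
            raise ValueError("sampling probabilities must lie in (0, 1]")
    weight = 1.0 / p1
    if p2 is not None:
        weight /= p2
    return weight
```

For instance, a woman from a stratum in which half of the charts were reviewed, and a quarter of those women interviewed, would carry a weight of 2 in chart-review analyses and 8 in telephone-interview analyses.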

Data were prepared using SAS version 9.3 (SAS Institute, Cary, NC, USA), and analyses were conducted using R version 3.6.1 (R Core Team, Vienna, Austria). Analysis code is publicly available at https://github.com/gustavodecastro/ImplantEFV.

Results

General characteristics of cohort

In this analysis, 85,324 women (53,711 from AMPATH and 31,613 from FACES) contributed a total of 170,845 women-years (w-y) of observation time with a median of four observations (IQR 2.0 to 7.0) and 2.0 years of total observation time per woman (IQR 0.5 to 3.4; Table 1). Women had a median age of 33.3 (IQR 28.0 to 38.5) years, 38.9% of the time women had completed primary schooling or a higher level of education, 49.0% of the time women were married or co-habiting, 65.0% of the time had at least one living child, and 64.8% of the time women were in WHO clinical stages 1 or 2. These parameters were generally similar among the women in the second and third phase samples.

Women used implants 7.0%, DMPA 13.3%, OCPs 1.3%, MEC 4.0%, LEC 24.1%, and no contraceptive method 48.6% of the total w-y of observation time. WLHIV used nevirapine-containing ART 48.8%, efavirenz-containing ART 24.7%, PI-containing ART 5.9%, and no ART 18.7% of the total w-y of observation time (Table 1). A total of 12,896 incident pregnancies were observed in the EMR among 11,724 women.

Relative incidence of pregnancy

The aIRRs for pregnancy among implant users for efavirenz- vs. nevirapine-containing ART use in the EMR (first phase), chart review (second phase), and telephone interview (third phase) samples were 1.9 (95% CI 1.6, 2.4), 2.2 (95% CI 1.7, 3.0), and 3.2 (95% CI 1.9, 5.4), respectively, in unweighted analyses that ignored the validation sampling strategy (Table 2). Weighted estimates that accounted for the validation sampling strategy were similar: 2.3 (95% CI 1.5, 3.5) with the chart review only and 3.2 (95% CI 1.8, 5.7) including the telephone interview data (Table 2). Adjusted pregnancy incidence by implant type (e.g., etonogestrel, levonorgestrel, or unknown) and ART regimen, for both unweighted and weighted analyses, was largely similar between the types (supplementary table 1).

Table 2 Adjusted pregnancy incident rate ratios per 100 women-years (and 95% CI) by contraceptive method and ART category, individually by each sampling phase ignoring validation sampling strategy (unweighted) and accounting for validation sampling strategy (weighted)

The aIRRs for pregnancy among DMPA users for efavirenz- vs. nevirapine-containing ART use in the EMR, chart review, and telephone interview samples were 1.1 (95% CI 0.9, 1.2), 1.2 (95% CI 0.9, 1.5), and 1.0 (95% CI 0.7, 1.6), respectively, in unweighted analyses and 1.1 (95% CI 0.6, 1.8) with the chart review only and 1.0 (95% CI 0.3, 2.9) including the telephone interview data in weighted analyses accounting for the validation sampling strategy (Table 2).

The aIRRs for pregnancy among implant and DMPA users by ART use, both unweighted and weighted, are depicted in Fig. 1.

Fig. 1

Forest plot of adjusted pregnancy incident rate ratios per 100 women-years by contraceptive method and ART category for each sampling method (reference group is nevirapine-containing ART)

The aIRRs for pregnancy among efavirenz users for DMPA vs. implant use in the EMR, chart review, and telephone interview samples were 1.8 (95% CI 1.5, 2.1), 1.8 (95% CI 1.4, 2.3), and 1.7 (95% CI 1.1, 2.6), respectively, in unweighted analyses and 1.5 (95% CI 0.9, 2.4) for the chart review only and 2.4 (95% CI 1.0, 6.1) including the telephone interview data in weighted analyses (supplementary table 3).

The overall pregnancy incidence and the incidence by contraceptive method, ART regimen, or both in the three samples are found in supplementary text B and table 4.

Discussion

Our study, based on EMR data from over 85,000 women, manual chart reviews of over 5000 records, and telephone interviews with over 1000 women, confirms prior findings of a 2–3 times higher risk of pregnancy associated with concomitant contraceptive implant and efavirenz- vs. nevirapine-containing ART use [7]. We also find that concomitant use of DMPA, the leading alternative contraceptive method, and efavirenz-containing ART carries a similar or higher risk of pregnancy than concomitant use of implants and efavirenz-containing ART. Thus, in settings where efavirenz-containing ART remains a common ART option, ministries of health and programs should reconsider any restrictions on concomitant implant and efavirenz use and ensure that WLHIV have access to all available contraceptive methods. In settings where other ART options are available, such as dolutegravir-containing ART, women currently using or desiring implants and on efavirenz should be prioritized for a switch to dolutegravir-containing ART.

Programmatic relevance of our findings

As dolutegravir-containing ART is increasingly used as first-line ART worldwide, including among women of reproductive potential, the hope is that dolutegravir will avoid any drug-drug interactions with hormonal contraceptives [31]. Early data from a pharmacokinetic study with dolutegravir and contraceptive implants support this suggestion [32]. Nonetheless, for a small subset of WLHIV, regimens containing efavirenz or newer NNRTIs may still be their best option, due to enduring concerns with possible teratogenicity, intolerance to or weight gain with dolutegravir, drug-drug interactions between dolutegravir and other therapeutics, or increasing drug resistance to dolutegravir. Currently, a large number of women worldwide remain on efavirenz. Therefore, policymakers, program staff, and WLHIV need to remain vigilant about known or possible drug-drug interactions between NNRTIs, including efavirenz, and hormonal contraception.

When reduced effectiveness of implants with concomitant efavirenz use was first reported, certain countries, such as South Africa and Malawi, moved swiftly to limit implant use among WLHIV on efavirenz [33]. As more balanced data emerged showing that, despite their reduced effectiveness, implants remained one of the most effective contraceptive methods for WLHIV using efavirenz, many countries reversed course. Similarly, when a possible signal linked peri-conception dolutegravir use with neural tube defects in infants, many countries rushed to limit dolutegravir use among WLHIV of reproductive potential [6]. As more comprehensive data emerged, such as modeling studies showing that dolutegravir use would still lead to fewer maternal or infant deaths and mother-to-child HIV infections [34, 35], coupled with outcry from WLHIV and other advocacy organizations, some countries reversed course again to allow some WLHIV to continue dolutegravir use. There are three common lessons to be learned from both the efavirenz/implant and dolutegravir/neural tube defect issues: (1) to not react dramatically to early reports of potential negative implications, and to appreciate the implications more holistically, accounting for leading alternative options and downstream consequences; (2) to offer WLHIV counseling and options, using a person- or human rights-centered approach; and (3) to bring WLHIV into the decision-making process so that their voices and thoughts are adequately represented in ultimate policies.

Importance of novel statistical techniques used

Our findings suggest that studies around pregnancy using routine clinical data can yield valid estimates and underscore the value of continued investments in health information systems, including in resource-limited settings. We used a 3-phase, largely random sampling design to repeatedly ascertain the exposures (contraceptive method and ART regimen) and the outcome (incident pregnancy) from three data sources: the EMR, chart review, and telephone interviews with WLHIV. Our work went beyond many data validation studies in that we used information learned in subsequent samples to adjust the initial point estimates, potentially improving the robustness of our findings. Additionally, we used state-of-the-art statistical methods, including IPW coupled with generalized raking, to account for our validation sampling strategy while gaining efficiency over traditional methods. The chart review results most closely approximated the EMR results, which is expected since the chart files are the source forms for the EMR data entry. However, our phone interview data yielded a higher overall pregnancy incidence, as well as a higher relative incidence for concomitant implant and efavirenz use compared with concomitant implant and nevirapine use. Possible explanations for this observation may include recall, sampling, or ascertainment biases, all leading to estimates away from the null. First, recall bias may exist as efavirenz is a newer antiretroviral compared to nevirapine, and residual confounding may persist despite our adjusting for calendar time. This would lead to differential misclassification away from the null, with higher pregnancy incidence reported among efavirenz vs. nevirapine and implant users. Second, because we relied on the EMR and chart files for telephone numbers, women on the newer antiretroviral efavirenz may have been more likely to have working numbers, although numbers are updated routinely regardless of the specific ART used. 
Contraceptive implants were also introduced more recently than DMPA, so secular trends may have led to differential sampling. Third, it is possible the research assistants differentially assessed pregnancy among women concomitantly using implants and efavirenz, as they referenced their chart notes to guide their phone interview. Nonetheless, while caution is advised in interpreting the telephone interview data as closest to the “truth,” the validation phases illustrate the overall robustness of the EMR data quality.

Limitations

Despite this being the largest and most robust study to date on this topic, our study has additional potential limitations. First, we assume that data are missing at random, including for the subsampling; however, were data missing differentially by some unmeasured factor associated with both our exposure and outcome categories, this could bias our findings. Second, the phone interview sample was not randomly selected; we purposefully focused on certain combinations of exposure and outcome categories to most meaningfully inform our study objectives, as ascertaining exposure-pregnancy relationships by self-report for some categories, e.g., for OCPs, would not shed greater light on the failure of that method. Third, we asked women to recall information dating back as many as 6 years from the time of the interview; such potential recall bias could be avoided with prospective ascertainment. Fourth, as with all observational studies, our study may be subject to residual confounding, unobserved confounding (e.g., our analyses were limited in the comorbidities or coinfections we could adjust for), and time-varying confounders (e.g., CD4 cell count, BMI, or WHO stage, for which we did adjust) that actually lie on the causal pathway between ART/contraceptive method and pregnancy. Different study designs and future work that develops methods to account for both time-varying confounding and validation datasets would be valuable. Lastly, our study did not ascertain adherence to either ART regimen or contraceptive method, though in this respect it reflects real-world effectiveness. 
Notwithstanding these limitations, the relative comparisons between exposure categories within each sample uphold earlier findings; the two validation phases underscore the overall robustness of EMR data quality; the generalizability of our findings to other resource-limited settings remains high; and both the 3-phase sampling and our statistical approaches add innovation for analyses conducted with EMR data.

Conclusion

First, with more robust data quality, we confirm prior findings of reduced contraceptive effectiveness when contraceptive implants are concomitantly used with efavirenz-containing ART, but equivalent or higher effectiveness compared to leading alternative contraceptive methods such as DMPA. Second, our novel methodology using a 3-phase sampling data validation approach provides an innovative tool for other analyses to improve the robustness of EMR data quality. These findings give policymakers, program staff, and WLHIV greater confidence in their decision-making around ART and contraceptive options.