Database searches identified 27,563 titles, and 17,934 were screened after removing duplicates, resulting in 554 articles that went to full-text review (Fig. 1). Full-text review identified 19 articles and two further articles [12, 14] were identified through scanning bibliographies, giving a total of 21 included articles that reported on 17 unique longitudinal studies. Additionally, 18 authors were contacted, and one provided a single cRR estimate. None were able to provide further aRR estimates. In total, 16 studies reporting 18 independent cRR estimates (or sufficient data to derive estimates), including one study reporting three independent estimates, and five studies reporting five independent aRR estimates were included in the meta-analysis (Fig. 1). Two of the 21 articles were excluded from the analysis because they reported cRR and aRR estimates already reported in other articles [26, 27]. Four studies reported both cRR and aRR, 12 studies reported only cRR, and one study reported only aRR. Five of the 12 studies reporting only cRR estimates also conducted multivariate analyses but did not include RAI in the model.
The main characteristics of the 17 studies included in the meta-analysis are presented in Table 1 (see Table S1 for additional details on included studies). Most studies were conducted in Africa (number of studies [Ns] = 12) [12, 14, 15, 28,29,30,31,32,33,34,35,36,37] and after ART was introduced in 1996 (Ns = 11) [13,14,15, 28,29,30,31,32,33,34, 36, 37], and the most common study design was cohort (Ns = 9) [28, 29, 31, 34,35,36,37,38,39], then RCT (Ns = 6) [12,13,14,15, 30, 32, 33], and serodiscordant couple studies (Ns = 2) [40,41,42]. Most studies were among high-risk women (Ns = 11), including FSWs (Ns = 6) [12, 14, 29, 31, 35, 37, 38], and other high-risk populations (Ns = 5) [13, 32, 33, 40,41,42] such as serodiscordant couples (SDCs) and high-risk HIV-negative women, and the mean or median age in most studies was less than 28 years (Ns = 10) [12, 13, 28, 29, 32, 33, 35,36,37, 40, 41]. Study sample sizes varied widely, ranging from 73 to 8859 women, and length of follow-up ranged from 258 to 8024 person-years (Ns = 15). Almost all studies recorded sexual behaviour data, including RAI, in face-to-face interviews (FTFI) (Ns = 16) [12,13,14,15, 28,29,30,31,32,33, 35,36,37,38,39,40,41,42] and only one (which reported three independent estimates) used audio computer-assisted self-interviews (ACASI). RAI was most commonly measured during follow-up (Ns = 10) [12, 15, 29, 30, 33, 34, 38,39,40,41,42] in the past 1 month (Ns = 4) [12, 29, 30], 3 months (Ns = 4) [15, 33, 34, 38], 6 months (Ns = 2) [40,41,42] and past year (Ns = 1), and only four of these studies analysed RAI as a time-varying covariate [33, 34, 38, 42]. RAI was also measured at baseline (Ns = 7) [13, 14, 28, 31, 32, 35,36,37], ‘ever’ (Ns = 4) [13, 31, 35, 36] and in the past 1 month (Ns = 2) [28, 32]. Two articles reporting on the same study did not report the time frame of baseline RAI practice [14, 37]. Lifetime RAI prevalence ranged from 2 to 43%.
RAI prevalence in the past 6 months ranged from 7 to 16%, 3 months from 2 to 15%, and 1 month from 2 to 42%. Most studies first mentioned RAI in the main text (Ns = 13) [12, 14, 15, 28, 30,31,32,33,34,35,36,37,38, 42], with five mentioning it first in the abstract or title [13, 29, 39,40,41]. No studies reported RAI frequency data. Two studies reporting only cRRs defined exposure as URAI only [13, 15]. A third study controlled for condom use by dividing women into subgroups that ‘always’ or ‘sometimes/never’ used condoms for all sex acts, reporting a cRR estimate for the ‘sometimes/never’ subgroup only, as no seroconversions occurred in the ‘always’ subgroup. All other studies either specified inconsistent condom use during RAI or did not specify condom use. No included studies reported estimates of partner ART use.
The most commonly reported measure was the HRR (number of estimates [Ne] = 7), followed by IRR (Ne = 6), CIR (Ne = 4), and OR (Ne = 1). Three crude IRRs and three crude CIRs, together with their 95% CIs, were self-derived. All five adjusted estimates were HRRs. Two were adjusted for age only [33, 36] and one for herpes simplex virus-2 (HSV-2) infection only. Of the remaining two, one was adjusted for age, sexual behaviour, and sexually transmitted infections (STIs), and the other for condom use, symptoms of HIV/AIDS-related disease in the partner living with HIV, and ART use by the partner living with HIV. One study reported a single crude HRR combining data from four microbicide trials conducted in nine sites in Africa and one site in India, and was therefore included as an African estimate in subgroup analyses.
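The text does not state how the crude IRR and CIR estimates and their CIs were derived. As a minimal sketch, assuming the standard Wald interval on the log scale, a crude ratio can be computed from event counts and denominators as follows. The function names and the counts are hypothetical and are not taken from any included study.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def _wald_ci(ratio, se_log):
    """Return (ratio, lower, upper) using a Wald interval on the log scale."""
    log_ratio = math.log(ratio)
    return (ratio,
            math.exp(log_ratio - Z_95 * se_log),
            math.exp(log_ratio + Z_95 * se_log))


def crude_irr(events_exp, py_exp, events_unexp, py_unexp):
    """Crude incidence rate ratio from events and person-years.

    Assumes at least one event in each group (no continuity correction).
    """
    irr = (events_exp / py_exp) / (events_unexp / py_unexp)
    se_log = math.sqrt(1 / events_exp + 1 / events_unexp)
    return _wald_ci(irr, se_log)


def crude_cir(events_exp, n_exp, events_unexp, n_unexp):
    """Crude cumulative incidence ratio from events and numbers at risk.

    Assumes at least one event in each group (no continuity correction).
    """
    cir = (events_exp / n_exp) / (events_unexp / n_unexp)
    se_log = math.sqrt(1 / events_exp - 1 / n_exp
                       + 1 / events_unexp - 1 / n_unexp)
    return _wald_ci(cir, se_log)


# Hypothetical counts for illustration only.
irr, irr_lo, irr_hi = crude_irr(events_exp=10, py_exp=200.0,
                                events_unexp=40, py_unexp=1600.0)
cir, cir_lo, cir_hi = crude_cir(events_exp=10, n_exp=100,
                                events_unexp=20, n_unexp=400)
```

Because the interval is built on the log scale, the lower and upper limits are symmetric around the point estimate in ratio terms, which is why CIs for ratio measures look asymmetric on the natural scale.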
All estimates received a NOS score between 5 and 8, with 6 the most common score (Ne = 13) [13, 15, 28, 30, 32, 33, 35,36,37,38,39,40,41], indicating most studies were of adequate quality (Table S2). Across all estimates, the most commonly failed criterion was ascertainment of exposure using FTFI rather than more confidential methods. Scores were also lowered by failure to adjust for potential confounders (for cRR) and by failure to adjust for important confounders, including condom use (for aRR). Only one aRR estimate was adjusted for condom use.
Does RAI Practice Increase the Risk of HIV Acquisition Among Women?
The meta-analysis included 18 independent cRR estimates ranging from 0.45 to 24.3 across a total of 31,712 women (Fig. 2a), and five independent aRR estimates ranging from 0.82 to 8.50 across a total of 2176 women (Fig. 2b). Despite substantial heterogeneity across estimates, the pooled cRR (1.56, 95% CI 1.03–2.38, I2 = 72%, N = 18) and aRR (2.23, 95% CI 1.01–4.92, I2 = 70%, N = 5) suggested significantly higher HIV incidence rates among women reporting RAI (Fig. 2).
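The pooled estimates and I2 values above come from combining study-level ratios on the log scale. This section does not name the pooling model, so the following is only a sketch under the assumption of a DerSimonian-Laird random-effects model, with standard errors recovered from reported 95% CIs. The function name and the three input estimates are hypothetical.

```python
import math


def pool_random_effects(rr, ci_low, ci_high, z=1.96):
    """DerSimonian-Laird random-effects pooling of ratio estimates.

    Returns (pooled ratio, lower 95% limit, upper 95% limit, I2 in percent).
    """
    y = [math.log(r) for r in rr]
    # Standard error of each log ratio, back-calculated from its 95% CI.
    se = [(math.log(h) - math.log(l)) / (2 * z)
          for l, h in zip(ci_low, ci_high)]
    w = [1 / s ** 2 for s in se]
    sw = sum(w)

    # Fixed-effect mean and Cochran's Q.
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (method of moments), truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled log ratio, and its standard error.
    w_re = [1 / (s ** 2 + tau2) for s in se]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    # I2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return (math.exp(y_re),
            math.exp(y_re - z * se_re),
            math.exp(y_re + z * se_re),
            i2)


# Hypothetical study estimates for illustration only.
pooled, lo, hi, i2 = pool_random_effects(
    rr=[0.9, 1.4, 3.2],
    ci_low=[0.6, 0.9, 1.5],
    ci_high=[1.35, 2.18, 6.8],
)
```

Under this model, a large I2 widens the pooled CI because the between-study variance is added to each study's own variance before weighting.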
In the subset of four studies reporting both crude and adjusted estimates, the individual cRR and aRR estimates did not differ greatly (average difference = 39%), and the pooled cRR and aRR (pooled cRR = 1.26, 95% CI 0.57–2.80, N = 4; pooled aRR = 1.69, 95% CI 0.82–3.47, N = 4) (Fig. S1) were of similar magnitude to the overall pooled estimates based on all study estimates.
How Do Characteristics of the Study Participants Influence the RR?
In subgroup analyses, crude study estimates varied by world region (p = 0.03) with higher pooled estimates for studies outside Africa (pooled = 4.10, 95% CI 1.36–12.3, Ne = 5, I2 = 79%) than in Africa (pooled = 1.16, 95% CI 0.88–1.54, Ne = 13, I2 = 21%) (Fig. 2a, Table 2a). The small number of studies limited exploration of the heterogeneity across adjusted study estimates. Pooled aRR did not differ by world region (p = 0.90) but differed slightly (although non-significantly) by risk population (p = 0.06) (Fig. 2, Table S3), with the pooled aRR for general-risk women higher (pooled = 8.50, 95% CI 1.90–38.0, Ne = 1) than for high-risk women (pooled = 1.69, 95% CI 0.82–3.47, Ne = 3, I2 = 63%). Pooled cRR and aRR did not differ significantly by other participant characteristics, including mean age or RAI prevalence.
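The subgroup p values compare pooled estimates between strata. The test used is not described in this section; one common choice for two subgroups, assumed here, is a z-test on the difference of the log pooled estimates, which is equivalent to a between-subgroup Q test with one degree of freedom. The sketch below applies it to the pooled cRRs by world region reported above; the function names are hypothetical.

```python
import math


def log_se_from_ci(ci_low, ci_high, z=1.96):
    """Standard error of a log ratio, back-calculated from its 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def subgroup_difference_p(rr_a, ci_a, rr_b, ci_b):
    """Two-sided p value for the difference between two pooled ratios.

    Assumes the two subgroups are independent and the log ratios are
    approximately normal.
    """
    se_a = log_se_from_ci(*ci_a)
    se_b = log_se_from_ci(*ci_b)
    z = (math.log(rr_a) - math.log(rr_b)) / math.sqrt(se_a ** 2 + se_b ** 2)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Pooled cRR outside Africa vs in Africa, as reported in the text.
p_region = subgroup_difference_p(4.10, (1.36, 12.3), 1.16, (0.88, 1.54))
print(round(p_region, 2))  # close to the reported p = 0.03
```

With more than two subgroups, the same idea generalises to a chi-square test on the between-subgroup Q statistic, which needs a chi-square distribution function that the standard library does not provide.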
How Do Study Characteristics and Study Quality Influence the RR?
In subgroup analysis, pooled cRR differed significantly only by interview method (p = 0.04), with lower estimates in studies using ACASI (pooled = 0.88, 95% CI 0.54–1.43, Ne = 3, I2 = 0%) than FTFI (pooled = 1.81, 95% CI 1.11–2.94, Ne = 15, I2 = 73%) (Table 2b). Pooled cRR and aRR did not differ by other study characteristics or quality indicators, including study year, study design, measurement of exposure, definition of RAI, type of measure, and NOS score. In exploring potential publication bias, the pooled cRR was slightly higher for estimates directly reported in the original studies than for those self-derived or retrieved from authors, and slightly higher for studies that reported RAI more prominently (in the abstract or title rather than the main text) (Table 2b). However, these differences were not statistically significant (Table 2b). Similar analyses could not be done for aRR estimates. Funnel plots also showed no evidence of publication bias across cRR estimates (Fig. S2A), but some evidence across aRR estimates (Fig. S2B).
How Do Individual Study Estimates Influence Pooled Estimates?
In leave-one-out sensitivity analyses, the direction of the associations remained unchanged (Fig. S3A–C). Overall, the pooled cRR estimate was mainly influenced by Novak’s estimate among women in the US, one of the two studies defining RAI as URAI only (Fig. S3A). Omitting this estimate slightly lowered the pooled cRR and reduced the I2 value (pooled without Novak = 1.30, 95% CI 0.95–1.77, I2 = 42%) (Fig. S3A). Consistent with the low heterogeneity across study estimates, the pooled cRR for African studies was not influenced by any specific estimate (Fig. S3B). The overall pooled aRR was equally sensitive to most study estimates, which slightly influenced results in either direction, although omission of Ramjee substantially reduced the I2 value (I2 = 34%) (Fig. S3C).
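A leave-one-out analysis re-pools the estimates once per study, each time omitting that study, and compares the resulting pooled ratio and I2. The sketch below illustrates the procedure, again assuming DerSimonian-Laird random-effects pooling, which this section does not confirm. The study labels, ratios, and standard errors are hypothetical; study "D" is deliberately made an outlier.

```python
import math


def dl_pool(y, v):
    """DerSimonian-Laird pooled log estimate and I2 (%).

    y: log ratio estimates; v: their variances.
    """
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, i2


def leave_one_out(labels, rr, se_log):
    """Map each omitted study label to (pooled ratio, I2 %) of the rest."""
    y = [math.log(r) for r in rr]
    v = [s ** 2 for s in se_log]
    results = {}
    for i, label in enumerate(labels):
        pooled, i2 = dl_pool(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        results[label] = (math.exp(pooled), i2)
    return results


# Hypothetical estimates for illustration only.
labels = ["A", "B", "C", "D"]
rr = [1.1, 1.3, 0.9, 6.0]
se_log = [0.2, 0.25, 0.3, 0.4]
loo = leave_one_out(labels, rr, se_log)

for label, (pooled_rr, i2) in loo.items():
    print(f"omit {label}: pooled RR = {pooled_rr:.2f}, I2 = {i2:.0f}%")
```

In this toy data set, omitting the outlying study lowers both the pooled ratio and I2, which mirrors the pattern described above for the most influential estimate.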