Introduction

The vast majority of maternal and newborn deaths occur in settings with the least data on intervention coverage and quality of care [1, 2]. Accurate data on effective intervention coverage, the proportion of individuals experiencing health gains from a service among those who need the service, is key to monitoring and scaling up the delivery of essential interventions to populations in need [3, 4]. Intervention coverage data is routinely used to track progress toward national and global commitments such as Sustainable Development Goal 3 (which includes a target to reduce the maternal mortality ratio to less than 70 per 100,000 live births), Countdown to 2030, and the WHO strategies Ending Preventable Maternal Mortality (EPMM) and the Every Newborn Action Plan (ENAP) [5,6,7,8]. In response to evidence that intervention coverage indicators may overestimate progress due to poor content of care [9,10,11,12], strategies have shifted emphasis from monitoring health care access to quality-adjusted coverage [4, 13].

In resource-limited settings, data on the coverage of maternal and newborn health interventions often relies on women’s reports collected in nationally representative household surveys such as the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) [14]. Self-reported data from population-based surveys, however, assumes that women accurately recall interventions received during the antenatal, intrapartum, and postnatal periods. A growing number of studies have assessed the validity of self-reported maternal and newborn care interventions used (or with the potential to be used) in these surveys [15,16,17,18,19,20,21,22]. Collectively, evidence from these studies demonstrates considerable variability in indicator validity across settings (accuracy metrics defined in Fig. 1), raising the question of what drives this variability [15,16,17, 19, 23].

Fig. 1 Indicator Key Terms
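For reference, the key accuracy metrics used throughout this analysis, expressed in terms of the two-by-two comparison of a woman’s self-report against direct observation (true positive TP, false positive FP, false negative FN, true negative TN), are:

sensitivity (true positive rate) = TP / (TP + FN)
specificity (true negative rate) = TN / (TN + FP)
diagnostic odds ratio (DOR) = (TP × TN) / (FP × FN)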

The validity of self-reported data on maternal and newborn health interventions received may be influenced by a variety of factors. These include women not knowing whether an intervention occurred because they were not aware it was performed (i.e., it was not explained, or in the case of newborn interventions, it was performed outside of the mother’s view). Recall may also be influenced by the nature and timing of questions. Prior research on maternal recall of interventions received in the intrapartum period has found that indicators which include technical terms (e.g., names of medications or diseases), refer to the timing (e.g., whether an intervention occurred immediately or within the first few minutes after birth) or sequence of events (e.g., whether the infant was wrapped before being laid on the mother’s chest), or describe interventions performed within the first hour of birth are unlikely to be recalled with high accuracy [15,16,17, 24]. The recall period may also influence reporting. Both DHS and MICS surveys typically ask women to recall events related to births occurring 2–5 years prior. Previous analysis of the recall accuracy of intrapartum and immediate postnatal care in Kenya has suggested that while accuracy generally declines with time, select interventions that are recalled with high accuracy at facility discharge maintain acceptable accuracy at 13 to 15 months follow-up [17].

Question comprehension related to respondent background characteristics or expectations of care may also influence recall accuracy. For example, if a woman had a positive experience and/or delivered in a facility perceived to be high quality, she may be more likely to indicate that an intervention assumed to be beneficial occurred. Background characteristics may influence reporting if lower education contributes to poor understanding of questions with technical or complex wording, or if higher parity leads to confusion of care with a previous pregnancy. Adolescents may have lower reporting accuracy because they have had less familiarity with the health system. Accurate coverage estimates among adolescents are of particular importance given that infants born to nulliparous adolescent mothers are at higher risk of neonatal and infant mortality than those of any other maternal age group, and that adolescent mothers are more likely to delay care seeking and receive fewer components of maternal health care [25,26,27,28].

Findings from explanatory analyses examining patterns in the accuracy or consistency of reporting by respondent characteristics (e.g., age, education or prior parity) or by infant or facility characteristics have varied by indicator and setting, making it difficult to discern broad patterns [18, 29,30,31]. One study in rural Nepal found that maternal age and place of delivery (facility vs. home) did not influence maternal reporting of infant outcomes, while accuracy related to infant birth size was higher among multiparous mothers [30]. Another study of intrapartum care recall among mothers in Ethiopia found that older women (ages 35–39, relative to ages 10–24) were more likely to report postpartum complications inconsistently, while those who delivered in a health facility were more likely to report inconsistently on newborn immediate thermal care practices [29]. A third study, which assessed recall of facility-based postnatal care interventions among women in Kenya, found no pattern in reporting accuracy by maternal age, education, parity or infant age [18]. Overall, heterogeneity in the types of indicators assessed, study methodology and question wording, together with limited sample sizes for subgroup analysis in some studies, complicates collective understanding.

To better understand how respondent and facility characteristics influence the accuracy of self-reported maternal and newborn care, we synthesized data from five previous validation studies conducted in low- and middle-income country settings. Studies were purposively sampled due to known similarities in question wording and validation design. Using these data, we examine whether respondent characteristics (e.g., age, education, prior parity), facility quality, or intervention coverage consistently predicted recall accuracy.

Methods

Data sources

We synthesize patterns in reporting accuracy from a unique set of validation studies led by the Population Council, each of which used the same validation design to assess comparable indicators of maternal and newborn care in multiple low- and middle-income country settings. We draw on five validation studies of maternal and newborn care reported across two publications [18, 32]. Three studies assessed antenatal care indicators (Bangladesh, Cambodia, Kenya) and five studies (Bangladesh, Cambodia, Kenya (2) and eSwatini) assessed postnatal care indicators for the mother and newborn. Studies were purposively selected from two multi-country intervention studies because each used the same or very similar wording for client questionnaires and observer checklists (Additional Files 1 and 2). Table 1 describes the study context and sample characteristics for each study. In all studies the samples consisted of women of reproductive age who received facility-based care and were interviewed at discharge (exit interview). Women’s self-reports at exit interview were compared against direct observation by a trained third-party observer using a structured checklist (the reference standard).

Table 1 Study characteristics

Data on routine postnatal care in Kenya and eSwatini were drawn from the Integra Initiative, a quasi-experimental study which aimed to strengthen provider capacity to give postnatal care to the (1) infant and (2) the mother, integrated with (3) family planning, (4) HIV counseling, testing and services, and (5) screening for and management of sexually transmitted infections [33]. The study population for the Integra study was women who attended a postnatal check for themselves and/or for their newborn (> 24 h to < 10 weeks of delivery) at a participating study facility and who provided informed consent to be interviewed. There were eight facilities (public health units/MCH-FP) in three regions (Lubombo, Manzini and Shiselweni) of eSwatini and 12 public health facilities located in the former Eastern province (present-day Kitui and Makueni counties) in Kenya. In total, matched exit interview and observer data were available for 545 women in Kenya and 319 in eSwatini.

Data on receiving antenatal or postnatal care in Bangladesh, Cambodia, and Kenya were originally collected as part of an evaluation of a voucher and accreditation intervention (henceforth the “voucher study”), which assessed whether the voucher program improved service quality by verifying service delivery through reimbursements to providers [34,35,36]. The theory of change was that subsidizing service demand stimulates greater service utilization and competition between service providers to improve service quality [37]. Providers were effectively rewarded for quality service delivery through reimbursement of service provision at a contracted level of quality. As such, voucher intervention facilities were used as a proxy for higher quality of care relative to propensity-score matched control facilities. Although voucher intervention status is not a comprehensive measure of facility quality, existing evidence supports the link between such voucher-accreditation approaches and improved facility readiness and quality for reproductive health service delivery [37, 38]. While the influence of voucher schemes on antenatal care has been comparatively less studied, evaluation of the Kenya Safe Motherhood Voucher Scheme found significant improvement in the overall quality of delivered postnatal care components relative to comparable control facilities [39]. Evaluation of the Bangladesh voucher scheme also found some evidence of postnatal care service quality improvements among high-performing voucher facilities relative to control areas; however, differences in antenatal service quality were less substantial [36]. While the Cambodia scheme was found to increase ANC service utilization, no published findings regarding quality are available [40].

Voucher studies in this analysis included a total of 22 government health facilities from six divisions of Bangladesh (Barisal, Chittagong, Dhaka, Khulna, Rajshahi and Sylhet), 40 government facilities from five provinces of Cambodia (Kampong Speu, Kampong Thom, Kampot, Prey Veng, Takeo), and 62 facilities in Kenya, which were a mixture of public (64%), private-for-profit (16%), faith-based (15%) or NGO (5%) and were located in Kisumu, Kiambu, Kitui counties and two informal settlements in Nairobi. Approximately half of facilities in each location were assigned to voucher or propensity-matched control facility status. In total, 3,169 women were interviewed and observed for antenatal care (n = 1,036 in Bangladesh, 957 in Cambodia and 1,176 in Kenya) and 2,462 for postnatal care (n = 208 in Bangladesh, 635 in Cambodia and 1,619 in Kenya).

Indicator selection and data extraction

All comparable indicators with available validation data from at least three studies were extracted, as this was considered sufficient for meta-analysis [41]. For each indicator, two-by-two contingency tables comparing women’s self-report to the observer report (reference standard) were tabulated to obtain the number of true positive, false positive, false negative and true negative responses. “Don’t know” responses were set to missing for the validity analysis but are reported in the tables, as this response type is distinct from that of women who believe they know whether an intervention was received.
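As a minimal illustration of this extraction step, the following R sketch tabulates matched self-report and observation records into a two-by-two table and computes sensitivity and specificity; the vectors and values are invented for illustration and are not drawn from the study datasets.

# Hypothetical matched records for one indicator:
# 1 = intervention reported/observed, 0 = not, NA = "don't know"
self_report <- c(1, 1, 0, 1, NA, 0, 1, 0, 1, 1)
observed    <- c(1, 0, 0, 1, 1,  0, 1, 1, 1, 1)

keep <- !is.na(self_report)                         # "don't know" set to missing
tab  <- table(report = self_report[keep], reference = observed[keep])

TP <- tab["1", "1"]; FP <- tab["1", "0"]            # FP: reported but not observed
FN <- tab["0", "1"]; TN <- tab["0", "0"]            # FN: observed but not reported

sensitivity <- TP / (TP + FN)                       # true positive rate
specificity <- TN / (TN + FP)                       # true negative rate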

Predictors

Predictors were defined a priori as maternal age, maternal education, parity, type of facility (whether facilities were voucher accredited or control) and intervention coverage (observed intervention prevalence in each setting). Predictor selection was informed by prior evidence of factors with the potential to influence reporting accuracy. Age strata were adolescent (ages 15 to 20) vs. adult (ages 21–52). The adolescent age group was inclusive of clients aged 20 to maximize sample size for stratification. Prior parity was defined as first pregnancy (for ANC) or birth (PNC) vs. two or more prior pregnancies or births. Education was defined as less than primary completion vs. primary completion or greater. As described above, whether facilities were voucher accredited was used as a proxy for facility quality. Finally, intervention coverage was calculated as the mean observed indicator prevalence in each study.
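As a sketch, the strata above might be coded as follows in R, assuming a hypothetical respondent-level data frame dat; the variable names are illustrative, not those of the original datasets.

# Cut-points follow the predictor definitions above
dat$age_group <- ifelse(dat$age <= 20, "adolescent", "adult")  # ages 15-20 vs. 21-52
dat$parity    <- ifelse(dat$prior_births == 0, "first", "multiparous")
dat$education <- ifelse(dat$completed_primary, "primary or more", "less than primary")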

Analysis

To examine differences in reporting accuracy by respondent and facility characteristics, we inspected forest plots of sensitivity (true positive rate) and specificity (true negative rate) stratified by the predictors of interest. As a summary benchmark, sensitivity and specificity of 80% or higher were considered high. This threshold was selected based on the empirical distribution of the accuracy of self-reported data related to maternal and newborn care. Stratified forest plots for age are shown, as this was the primary outcome of interest. To statistically test whether predictors were a source of heterogeneity between the primary validation studies, we used fixed effects and bivariate random effects models, as the data allowed.
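Stratified forest plots of this kind can be produced directly from per-study two-by-two counts; below is a sketch using the madad() and forest() functions of the mada package (the package used for this analysis), with invented counts for one indicator within one stratum.

library(mada)

# Per-study 2x2 counts for one indicator, one stratum (invented values)
adolescent <- data.frame(TP = c(40, 55, 30, 62, 21),
                         FN = c(10, 12,  8, 14,  9),
                         FP = c(15, 20,  9, 25,  7),
                         TN = c(35, 42, 25, 30, 18))

forest(madad(adolescent), type = "sens")   # per-study sensitivity with 95% CIs
forest(madad(adolescent), type = "spec")   # per-study specificity with 95% CIs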

Bivariate random effects models were constructed when study indicators were validated in at least five studies, the minimum number required for model estimation [41, 42]. A bivariate random effects approach is the standard in diagnostic test accuracy research because sensitivity and specificity are estimated simultaneously, accounting for the trade-off between the two [43]: as a threshold is varied to increase sensitivity, specificity typically decreases, and vice versa [44]. Bivariate models also account for variation in the diagnostic threshold used across studies (e.g., differences in observer ratings due to variation in training procedures or other factors across studies). Bivariate models accommodate study-aggregate covariates (e.g., intervention coverage, the prevalence of a given indicator in each study) to examine whether a predictor affects sensitivity, specificity, or both. Intervention coverage was examined as a predictor for PNC reporting accuracy only, as a minimum of five studies was required for parameter estimation. Within-study predictors (i.e., individual-level respondent and facility characteristics) are not accommodated in bivariate random effects models and were instead compared by assessing the degree of overlap in summary estimates (and corresponding 95% CIs) from stratified bivariate models.
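A sketch of this covariate analysis using the reitsma() function of the mada package is shown below. The counts and coverage values are invented, and the maximum likelihood fit with a manual likelihood ratio comparison is one plausible way to implement the test described above, not necessarily the exact procedure used.

library(mada)

# Five per-study 2x2 tables plus a study-level covariate (invented values)
d <- data.frame(TP = c(120,  90, 60, 150, 80),
                FN = c( 20,  30, 25,  15, 40),
                FP = c( 30,  25, 40,  60, 20),
                TN = c(100, 110, 70,  50, 90),
                coverage = c(0.70, 0.55, 0.40, 0.80, 0.45))

fit0 <- reitsma(d, method = "ml")                    # intercept-only model
fit1 <- reitsma(d, formula = cbind(tsens, tfpr) ~ coverage,
                method = "ml")                       # coverage as covariate

# Likelihood ratio test for the covariate (2 df: one effect on transformed
# sensitivity, one on the transformed false positive rate)
lr <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
pchisq(lr, df = 2, lower.tail = FALSE)

summary(fit1)   # coefficient signs show how coverage shifts sensitivity and fpr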

Because all ANC indicators, as well as the facility quality predictor, were assessed in only three studies, univariate fixed effects models were constructed for these. Univariate models estimate the diagnostic odds ratio (DOR), which describes the odds of obtaining an affirmative response from a respondent who received the intervention compared to a respondent who did not [45]. To assess whether results varied by level of a predictor, overlap in the summary DOR and corresponding 95% CIs from the fixed effects models was examined. Univariate fixed effects models do not account for the trade-off between sensitivity and specificity or for between-study heterogeneity [41, 42]; however, they give reasonably consistent estimates of the DOR irrespective of variation in diagnostic threshold [46]. Given these limitations, the ANC and facility quality results from univariate fixed effects models are presented in Additional Files 3, 4 and 5, and emphasis is given to results from the bivariate random effects models in the discussion of study results.
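For the three-study case, a pooled DOR can be obtained with the Mantel-Haenszel fixed effects estimator in the mada package; a sketch with invented counts:

library(mada)

# One ANC indicator validated in three studies (invented counts)
anc <- data.frame(TP = c(200, 150, 180),
                  FN = c( 20,  35,  35),
                  FP = c( 50,  40,  75),
                  TN = c( 90, 120,  60))

fit_mh <- madauni(anc, type = "DOR", method = "MH")  # Mantel-Haenszel fixed effects
summary(fit_mh)                                      # pooled DOR with 95% CI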

Finally, indicators based on a small number of true (observed) positive or true negative cases, which resulted in low precision (margin of error greater than 15 percentage points for bivariate models or a diagnostic OR of five or greater for univariate models), are reported in the data tables but not discussed in the text. Results from the bivariate and univariate models were obtained using the mada package [44] in R (RStudio Version 1.1.383, Boston, MA).

Results

Study and sample descriptions

Participant sociodemographic characteristics across studies are presented in Table 2. The pooled sample size was 3,326 women for postnatal care indicators and 3,169 women for antenatal care indicators. Twelve postnatal care indicators and six antenatal care indicators were comparable and had sufficient sample size (no multiple zero cells) in three or more countries.

Table 2 Sample descriptive statistics

Among postnatal care clients, mean age was highest in Cambodia (26.8 years) and lowest in Bangladesh (23.8 years). Higher educational attainment (completion of secondary school or more) was greatest among postnatal clients in eSwatini (73.9%) and lowest among participants in the Kenya Integra study (19.1%). Postnatal clients in the Kenya Integra study were most likely to be primiparous (29.9%).

Among antenatal clients, mean age was slightly lower in Bangladesh (23.5 years) relative to the Cambodia (27.8 years) and Kenya (25.2 years) studies. On average, a higher proportion of antenatal clients in Bangladesh had completed secondary school or more (62.0%) relative to those in Cambodia (38.8%) and Kenya (40.8%). Antenatal clients were most likely to be pregnant for the first time in Bangladesh (39.6%).

Indicator validity across studies

Figures 2 and 3 display PNC indicator sensitivity and specificity across the included studies. In general, indicators of PNC had higher sensitivity than specificity. With few noted exceptions, estimates of sensitivity and specificity demonstrated wide variability by study. One of twelve PNC indicators, whether the infant was weighed, demonstrated a sensitivity of greater than 80% in all five studies. An additional six PNC indicators had a sensitivity of approximately 80% or higher in three of five studies: blood pressure check, breast exam, abdominal exam, discussion of family planning, infant physical exam (undressed), and discussion of breast/infant feeding. In contrast, no PNC indicator achieved a specificity of 80% or higher in all five studies. All PNC indicators reflecting aspects of the maternal physical exam achieved a specificity of approximately 80% or more in three of five studies: blood pressure check, breast exam, abdominal exam, vaginal exam, anemia check/referral and whether the provider asked or checked for excessive bleeding. One counseling-related indicator, whether danger signs for the mother were discussed, also achieved a specificity of 80% or more in three of five studies. Few indicators of newborn PNC achieved a specificity of 80% or higher in any one study. No PNC indicator achieved both sensitivity and specificity greater than 80% in more than one study, underscoring considerable heterogeneity in validity results across settings.

Fig. 2 Postnatal Care (PNC) Indicator Sensitivity (Panel A) and Specificity (Panel B) by Country of Study, Sorted by Indicator Prevalence. Study abbreviations refer to Cambodia Voucher study (CA), Bangladesh Voucher study (BA), Kenya Voucher study (KE), Kenya Integra Study (KE-I) and eSwatini Integra Study (SZI). Indicators are defined in Additional File 2. Grey horizontal lines represent 95% confidence intervals about the estimates. As a benchmark for indicator quality, 80% sensitivity and specificity is shown as a vertical grey line

Fig. 3 Postnatal Care (PNC) Counseling Indicator Sensitivity (Panel A) and Specificity (Panel B) by Country of Study, Sorted by Indicator Prevalence. Study abbreviations refer to Cambodia Voucher study (CA), Bangladesh Voucher study (BA), Kenya Voucher study (KE), Kenya Integra Study (KE-I) and eSwatini Integra Study (SZI). Indicators are defined in Additional File 2. “Dq” abbreviates “discussion of”. Grey horizontal lines represent 95% confidence intervals about the estimates. As a benchmark for indicator quality, 80% sensitivity and specificity is shown as a vertical grey line

ANC indicators performed similarly to PNC indicators, with generally high sensitivity across the three settings and variable specificity (Additional File 3). Three indicators of maternal physical health checks during an ANC consultation had a sensitivity greater than 80% in all three studies: weight taken, blood pressure check and abdominal exam. While no ANC indicator had a specificity of greater than 80% in all three settings, two ANC indicators, urine screen and fetal heart rate monitoring, had a specificity of at least 80% in two or more settings. No ANC indicator had both sensitivity and specificity of 80% or higher in more than one study.

Respondent characteristics: maternal age, education, and prior parity on PNC reporting accuracy

Age-stratified results showed overlap in the 95% CIs for sensitivity and specificity between adolescent and adult strata for all PNC indicators across studies (Fig. 4). Across individual studies, there was no clear pattern indicating that adolescent-reported sensitivity and specificity were better or worse than adult reporting. Wide confidence intervals around individual study estimates precluded detection of significant differences between age groups.

Fig. 4 Postnatal Care (PNC) Indicator Sensitivity (Panel A) and Specificity (Panel B) by Country of Study, Stratified by Respondent Age Group. Study abbreviations refer to Cambodia Voucher study (CA), Bangladesh Voucher study (BA), Kenya Voucher study (KE), Kenya Integra Study (KE-I) and eSwatini Integra Study (SZI). Age group: adolescent (ages 15 to 20 years), adult (ages > 20 years). Grey horizontal lines represent 95% confidence intervals about the estimates; overlapping confidence intervals imply no statistical difference by level of the predictor. As a benchmark for indicator quality, 80% sensitivity and specificity is shown as a vertical grey line

Estimates of sensitivity and specificity across postnatal care interventions (Table 3) stratified by adolescent and adult group also revealed substantial overlap in the 95% CI for all indicators, suggesting no differences by age. Similarly, no systematic differences between stratified bivariate models were observed for the predictors of education or prior parity for any postnatal care indicator examined (Tables 4 and 5).

Table 3 Bivariate random effects model: self-reported PNC indicator accuracy by adolescent vs. adult age group
Table 4 Bivariate random effects model: self-reported PNC indicator accuracy by education
Table 5 Bivariate random effects model: self-reported PNC indicator accuracy by parity

The same general patterns were observed in univariate fixed effects estimates obtained for ANC indicators by age, education, and parity (Additional Files 4 and 5). Although there were some exceptions, for most indicators there were either no differences by subgroup or comparison was not possible due to low precision.

Facility quality

Differences in the accuracy of PNC indicators by facility quality (whether respondents attended a voucher intervention facility or a comparable control facility) were inconsistent (Additional File 6). Of eight indicators with reasonable precision for comparison, two differed by facility quality level, but in mixed directions. The odds of correct reporting on whether the infant was examined (undressed) were greater among respondents who visited non-voucher facilities (proxy for lower facility quality), while whether information on infant danger signs was discussed was more likely to be reported accurately among mothers who attended voucher intervention facilities (proxy for higher facility quality) relative to control facilities.

Intervention coverage

Visual inspection of paired forest plots of sensitivity and specificity for PNC indicators sorted by intervention coverage (Figs. 2 and 3) illustrates that, for most indicators, specificity decreases (more false positive reporting) as intervention coverage increases across studies. The forest plots also provide some evidence that indicator sensitivity improves with increasing prevalence (most apparent for indicators of the maternal physical exam), although this pattern is weaker.

Results of the likelihood ratio test, which compared model fit for a bivariate random effects model incorporating intervention coverage as a study-level covariate against an intercept-only model, confirmed that intervention coverage significantly explained heterogeneity in reporting accuracy between studies for the majority (9 of 12) of indicators (Table 6). Separate tests examining the influence of intervention coverage on indicator reporting accuracy demonstrated that specificity decreased with higher intervention coverage levels for the majority (8) of indicators, implying greater false positive reporting. Sensitivity was positively associated with intervention coverage for half (6) of the indicators, implying less false negative reporting. The relationship between intervention coverage and indicator sensitivity and specificity was variable among ANC indicators (Additional File 6), with only three studies per indicator.

Table 6 Influence of intervention coverage on self-reported PNC indicator accuracy

Discussion

We assessed heterogeneity in self-reported antenatal and postnatal care by respondent and facility characteristics using data from five studies across sub-Saharan Africa and South and Southeast Asia. Results show that no indicator of antenatal or postnatal care achieved both high sensitivity and high specificity (80% or higher) in more than one study, underscoring variability in validity estimates across settings. We also did not find strong evidence that accuracy in self-reported ANC or PNC systematically varied by maternal characteristics, such as adolescent vs. adult age, education or parity, or by facility quality. Higher intervention coverage, however, was associated with reduced specificity (more false positive reporting) and somewhat improved sensitivity (less false negative reporting) for most indicators.

That validity did not systematically vary by respondent characteristics or facility quality is perhaps surprising, although reassuring in terms of approaches to data collection and indicator construction. Our finding is largely consistent with prior studies of the influence of respondent characteristics on reporting accuracy for maternal health services received, which have found that associations vary by both indicator and respondent attribute [30, 31, 47]. In addition, no consistent evidence related to facility quality (voucher intervention or control facility) was observed across indicators. This finding aligns with a study which assessed how the accuracy of women’s perceptions of facility quality predicted their choice of where to receive care in informal settlements of Nairobi, Kenya. That study found substantial evidence of ‘information asymmetry’: a high proportion of women (two in five) were unable to discern which facilities offered the highest technical quality of care prior to using the facility’s services [48]. It may be that inaccurate perceptions of facility quality explain, in part, why facility quality was inconsistently related to reporting accuracy. It is also possible that women value different aspects of care, including the patient care experience, than those typically emphasized in monitoring efforts [49]. Our measure of facility quality may therefore have been an incomplete proxy for how women perceive quality care.

The finding that higher study intervention coverage (i.e., prevalence) is associated with reduced specificity and somewhat improved sensitivity is also in accordance with prior findings and has important implications for efforts to monitor maternal and newborn quality of care. While sensitivity and specificity are independent of prevalence in their mathematical calculation, several studies and reviews have suggested an association [50]. For example, a study by Carter and colleagues, which assessed the reliability of maternal recall of delivery and immediate newborn care indicators in Nepal, also documented an inverse association between indicator specificity and higher intervention coverage [47]. This pattern may be the result of reporting biases in the classification of the reference standard (i.e., the observer report) and/or in women’s self-reports (the ‘test’) [50]. For example, it is possible that in settings where an intervention is commonplace, respondents are more likely to anticipate that it will occur and in turn respond affirmatively. This type of reporting bias would lead to more false positive reporting (lower specificity), implying that monitoring efforts would overestimate coverage in high coverage settings. A high expectation of care could also imply few false negative reports (high sensitivity), which was observed for about half of the indicators in our analysis: in high coverage settings, women who did receive the intervention were unlikely to be undercounted. In low coverage settings, however, underestimation (low sensitivity) may be an issue. The reduced specificity in high prevalence settings and lower sensitivity in low prevalence settings are of public health importance for monitoring progress in the quality of maternal and newborn care. Although descriptive only, our results suggest that monitoring efforts should consider the context of care when interpreting national estimates and time trends in intervention coverage, as mismeasurement may occur in both directions depending on the setting.

A strength of this study is that we were able to synthesize patterns in reporting accuracy across several studies which used exact or very similar question wording and the same recall period, interviewing women at facility discharge from a routine antenatal or postnatal care visit. This addresses a limitation of prior studies on this subject, which have not been able to discern patterns across settings and have had smaller sample sizes for subgroup analysis. The ability to examine validation results across settings both descriptively and with statistical assessment lends robustness to our main findings. However, several important limitations remain. First, few studies have examined the accuracy of maternal reports of antenatal and postnatal care using comparable indicators, and it is possible that a relevant study was missed. Further research to examine variability in indicator accuracy across settings is warranted. The small number of studies assessed contributed to low precision in our analysis, particularly for ANC indicators, which were assessed in only three studies and used fixed rather than random effects models. Results from the fixed effects models should be considered exploratory, as variability by study, correlation between sensitivity and specificity, and heterogeneity attributed to threshold differences across studies are not accounted for. For example, observer training on what constituted an intervention having taken place may have varied across studies. Further, it was possible to incorporate study-aggregate variables (i.e., intervention coverage) only, rather than within-study covariates (e.g., respondent age, education, parity) [41]. To assess variability by individual respondent characteristics we used stratification, which reduces precision. For example, the sub-sample of adolescents across studies was relatively small, despite extending the age category to include respondents aged 20 years. Finally, given data availability, it was not possible to examine facility type (e.g., public sector or not, tier of facility) across studies; this is a topic for future research. We hypothesize that intervention coverage within facility type may, at least in part, contribute to observed differences in validity.

Despite noted limitations, the finding that reporting accuracy does not consistently vary by respondent or facility characteristics is reassuring for efforts to monitor the quality of maternal and newborn care. Evidence of consistently lower reporting accuracy by respondent characteristics such as adolescent age could, for example, have suggested that self-reported data are insufficient to inform country-level interventions, policies and resource allocation for a group at high risk of adverse maternal or infant health outcomes [25, 26]; this was not the case. However, study findings do suggest that caution is warranted when interpreting results, obtained by participant self-report, of interventions to improve the quality of maternal and newborn care in very low or very high prevalence settings, as false negative and false positive reporting, respectively, may be more likely. National monitoring efforts should consider the context of care when interpreting country estimates of the coverage of self-reported quality of care and should triangulate with other available data sources such as facility registries. Further research is warranted to validate indicators in additional study settings and to model the extent to which different intervention coverage levels affect the ability to detect changes in coverage between countries and over time. With sufficient confidence in such models, adjustment factors could be applied to coverage estimates in global monitoring efforts to account for bias attributed to differences in intervention prevalence. At the very least, caution is warranted in the interpretation of coverage estimates from very high or very low prevalence settings.

Conclusions

Results from this study provide no evidence to suggest that the accuracy of self-reported receipt of maternal and newborn health interventions is consistently influenced by respondent characteristics, including adolescent vs. adult age group, education and parity, or by facility quality. Rather, this analysis suggests that accuracy differences across studies are, at least in part, explained by differences in the prevalence of the intervention across settings. High coverage settings may contribute to more false positive reporting (poorer specificity) among women who receive PNC at health facilities, while low prevalence settings may undercount intervention coverage (lower sensitivity). Caution may be warranted when interpreting population-based household survey estimates of quality, or of change in quality over time, in very high or very low prevalence settings.