Background

Universal health coverage (UHC) aims to ensure access to needed services of sufficient quality for everyone without suffering financial hardship [1]. Health financing reforms are one means towards reaching this aim in the most efficient and sustainable way. Results-based financing (RBF) is one of several health financing reforms currently implemented in a number of low- and middle-income countries (LMIC) [2]. RBF is considered a form of strategic purchasing that reimburses healthcare providers (i.e. service suppliers) based on pre-defined quantity or quality outputs [3]. RBF schemes can also include demand-side components related to population coverage (e.g. vouchers, conditional cash transfers) to further complement service supply [4].

While many LMIC have introduced RBF to improve coverage and quality of maternal and child care services, recent evaluations of RBF programs report mixed results, with both quantity and quality indicators responding differently in different contexts [5, 6]. RBF programs are complex health financing interventions in terms of their expected theory of change (i.e. financial incentives align health worker behaviours towards providing services more efficiently and effectively, thus improving both the quantity and quality of healthcare services used by the targeted population). To reflect this complexity, RBF evaluations often assess impact along a number of different dimensions (i.e. health worker motivation, quality of care, service use, crude population coverage) [7,8,9,10]. Given RBF programs’ theoretical impact on both service quality and population coverage, aggregate outcome measures that combine these dimensions might be useful adjuncts in understanding the role RBF plays in the achievement of UHC.

To assess achievements towards UHC, effective coverage (i.e. the proportion of a population in need of care who receive services of sufficient quality) has been suggested as a suitable measure in the evaluation and monitoring of health system interventions [11]. Effective coverage differs from the commonly used measure of crude or contact coverage (i.e. the proportion of a population in need of care able to access a service) in also accounting for the expected or actual effectiveness of received care necessary to produce a desired health outcome [12]. While increasingly applied to both general and impact assessments of maternal and child health programs in LMIC [13,14,15], the use of effective coverage in the evaluation of health financing interventions, such as RBF, has so far not been reported.

Given that the wider focus of RBF programs is not only to intervene in health care supply but also, directly or indirectly, in demand for services and the health system in general [16], the theory of change of RBF closely represents the UHC idea of service coverage (i.e. people in need receive essential health services of sufficient quality) [17]. The aim of this study was therefore to examine the impact of an RBF scheme in Malawi on effective coverage in relation to the provision of facility-based obstetric care services and to gain further insight into the role RBF can play in achieving UHC.

Methods

Study setting

At the time the intervention was designed, maternal and newborn mortality in Malawi was high (maternal mortality ratio in 2013: 636 deaths/100,000 live births for Malawi vs. 210 deaths/100,000 live births globally; newborn mortality rate in 2013: 25.9 deaths/1000 live births in Malawi vs. 20.1 deaths/1000 live births globally) [18, 19]. Obstetric care is offered free of charge through public and contracted not-for-profit health facilities [20]. Yet, 75% of all pregnant women with obstetric complications do not actually receive satisfactory emergency obstetric care (EmOC) [21]. Salaries of publicly employed health workers stem directly from central government budgets, while publicly funded primary care facilities (i.e. health centres and district-level hospitals) receive a mix of financial allocations from both central and local government budgets. Malawi’s health system is further challenged by chronic health worker shortages and system-wide stock-outs of essential drugs and supplies [22].

Intervention

The Malawi Ministry of Health (MoH) introduced the Results-Based Financing For Maternal And Neonatal Health (RBF4MNH) Initiative to four districts (Balaka, Dedza, Mchinji, Ntcheu) in April 2013 to enhance obstetric care provision at facilities designated to eventually fully function as EmOC centres. Together, these four districts have a total of 33 designated EmOC facilities serving a catchment population with about 111,450 expected deliveries per year [21]. District selection was driven by MoH decisions to avoid overlap with other existing or upcoming maternal health or health financing programs in the country. The RBF4MNH’s main objective is to improve both quality of and access to facility-based obstetric care for women and newborns during birth and up to 48 h after delivery [23] through a combination of supply-side mechanisms (performance-based payments to facilities and district health management teams (DHMT)) and demand-side mechanisms (conditional cash transfers (CCT) to pregnant women within catchment areas). Implementation occurred in two phases: an early pilot phase (April 2013 to October 2014) and a later expansion phase (November 2014 onwards). During the early phase, only 18 out of the 33 EmOC facilities (four hospitals, 14 health centres) were empanelled; the program was later expanded by an additional 10 facilities (two community hospitals, eight health centres), with on-going plans for a nation-wide scale-up. Empanelled facilities were selected based on the presence of at least four skilled birth attendants, catchment population size, and number of institutional deliveries.

Upon verification, facilities receive payments for achieving a set of performance targets related to quality of clinical care as well as general service improvement performance indicators [see Additional file 1]. Of these rewards, 40% are earmarked for further investments improving structural working conditions and 60% for individual bonus payments to health workers and auxiliary staff (about 15–25% of staff’s baseline salary). DHMT receive payments related to performance indicators on effective management and support of the entire district [see Additional file 1], also divided into an investment and a bonus portion. CCT portions to women were calculated to defray upfront costs related to childbirth (i.e. transportation, basic childbirth items, stay in maternal waiting home) and the opportunity costs anticipated for an average poor patient or her family in Malawi during a 48-h postpartum facility stay. The maximum payment per woman is about seven Euros. To ensure minimum standards in EmOC delivery, all RBF facilities received some basic infrastructure and/or equipment support (e.g. delivery beds, essential examination, EmOC and sterilization material, renovation or reconstruction of labour rooms, postpartum wards, maternal waiting homes, electricity and water supply) prior to intervention launch.

The early implementation phase consisted of three six-month reward cycles and served as an opportunity for implementers to further fine-tune the intervention in response to unforeseen challenges. Feedback and experiences gained during these initial cycles resulted in some programmatic adjustments prior to its expansion in October 2014, including:

  • Performance verification, initially conducted as reciprocal peer review between districts, was assigned to an external third-party auditor to avoid negative sentiments between peers.

  • ‘Target-based’ (i.e. all-or-nothing) calculation of rewards changed to ‘proportion-based’ (i.e. relative to progress) to increase motivation towards attainment of more challenging targets.

  • Reward cycles reduced from six to three months to increase motivation linking performance and rewards over shorter time intervals.

  • With the demand-side CCT component requiring facilities to directly keep and manage cash, administrative restrictions in setting up bank accounts at RBF facilities had to be overcome. This delayed the initial start of the demand-side component to September 2013 (i.e. six months after the initially planned launch).
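The difference between the ‘target-based’ and ‘proportion-based’ reward calculations listed above, together with the 40%/60% split of facility rewards into investment and bonus portions, can be illustrated with a minimal sketch. The function names, the reward ceiling, and the indicator values are hypothetical, and measuring progress from a baseline value towards the target is an assumption; the program's actual formulas are not given in this section.

```python
def target_based_reward(achieved, target, max_reward):
    """All-or-nothing rule: the full reward is paid only if the target is met."""
    return max_reward if achieved >= target else 0.0


def proportion_based_reward(achieved, baseline, target, max_reward):
    """Reward relative to progress from baseline towards the target.

    Assumption: progress is measured from a baseline value and is
    clipped to the range 0..1 (no penalty, no payment above the ceiling).
    """
    if target <= baseline:
        raise ValueError("target must exceed baseline")
    progress = (achieved - baseline) / (target - baseline)
    progress = min(max(progress, 0.0), 1.0)
    return max_reward * progress


def split_reward(reward, investment_share=0.40):
    """Split a facility reward into investment (40%) and staff bonus (60%)."""
    investment = reward * investment_share
    return investment, reward - investment


if __name__ == "__main__":
    # Hypothetical indicator measured in percent: baseline 50, target 80,
    # achieved 65, with a hypothetical reward ceiling of 1000 units.
    old_rule = target_based_reward(65, 80, 1000.0)
    new_rule = proportion_based_reward(65, 50, 80, 1000.0)
    print("target-based:", old_rule)
    print("proportion-based:", new_rule)
    print("investment, bonus:", split_reward(new_rule))
```

Under the all-or-nothing rule, a facility that covers half the distance to a demanding target earns nothing, whereas the proportional rule pays half of the ceiling, which is the motivational argument given for the change.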

Study design

As part of a larger impact evaluation assessing the effect of the RBF4MNH on MNH service utilization and quality [24], this study followed a non-randomized controlled pre-post-test design including 32 of the 33 facilities targeted by the RBF4MNH intervention (we excluded one facility since it could not be identified as EmOC given lack of a delivery ward at baseline). The 18 facilities empanelled during the early phase served as ‘interventions’. Of the remaining 14 control facilities, five turned into RBF facilities at the start of the expansion phase and were treated as ‘switchers’. Data were collected at three time points: baseline (April–May 2013, before official program launch), midterm (June–July 2014, approximately one year after program launch) and endline (June–July 2015, approximately two years after program launch).

Study samples and data collection

We used three different samples: a facility sample, a service user sample, and an obstetric case sample. During each survey round, we collected three different sets of data: a facility inventory, direct case observations, and a household-level survey.

The facility sample included 32 facilities. During baseline and midterm this sample consisted of 18 intervention and 14 control facilities, and during endline (after expansion in October 2014) of 23 intervention and nine control facilities. Selection of interventions followed the RBF empanelment criteria. Controls included all EmOC facilities initially not included by the RBF. Facility inventories consisted of a structured checklist collecting information on availability of operational equipment, medicines, and supplies essential to routine and basic emergency obstetric care.

The service user sample included 5509 randomly selected women living in catchment areas of sampled facilities who completed a pregnancy within the twelve months preceding each survey date. Women reporting pregnancies that resulted in foetal loss or demise before the third trimester were excluded. A two-stage cluster approach was used to sample eligible women. Structured household-level questionnaires collected information on women’s demographic characteristics, health-seeking behaviour during pregnancy, obstetric care service use at birth, birth outcome, and household-specific socio-economic characteristics.

The obstetric case sample included a total of 383 labouring women who presented to the sampled facilities during the three data collection rounds. Convenience sampling was used to include only cases without obstetric complications to ensure comparability between cases. Case observations consisted of a structured checklist collecting information on birth attendants’ adherence to clinical standard guidelines during routine case management. Content was based on performance standards developed for the Malawi Performance Quality Improvement program [25]. Observations started once a labouring woman was admitted to the maternity unit and lasted up to two hours after delivery.

Outcome variables

We used effective coverage of pregnant women with facility-based basic obstetric care services as the main outcome variable and defined it according to the literature [11, 12, 26] as

$$ {EC}_{FBD}^{preg}={U}_{FBD}^{preg}\mid {N}_{FBD}^{preg}\ast {Q}_{FBD}^{preg}, $$

representing effective coverage (EC) of pregnant women (preg) using facility-based delivery services (FBD) at a designated EmOC facility (U) providing a given level of quality (Q). Here, U ∣ N denotes FBD service use conditional on true need for basic EmOC (i.e. crude or contact coverage). We defined need (N) as any woman carrying a pregnancy beyond the second trimester [27, 28]. We further defined service use (U) as any pregnant woman who used services at any of the designated EmOC facilities included in our sample (vs. non-facility-based care or facility-based care at a non-EmOC facility). To determine the expected quality of care received by a woman using FBD services at a given facility, we created a composite score using a content-of-care approach measuring the extent to which obstetric care was provided in adherence with pertinent standards of care based on a combination of input and process indicators taken from the inventories and case observations. In developing the composite score, we followed a standard approach including weighting, aggregation and uncertainty analysis [29]. A detailed outline of this approach and the underlying indicator mapping is provided in the additional files [see Additional file 1 and Additional file 2].

The resulting composite score ranged from 0 (not meeting any of the measured obstetric care standards) to 1 (meeting all standards). Because none of the studied facilities in the entire sample attained a score of 1 (measured scores were nearly normally distributed with a range from 0.22 to 0.86, median of 0.56 and mean of 0.55), a dichotomous categorization of facilities into ‘effective’ (i.e. a score of 1) and ‘less than effective’ (i.e. a score less than 1) was not practical for our analysis. In addition to measuring effective coverage as the percentage of service use adjusted by the respective quality score, we therefore created binary variables based on different cut-off values within the upper range of the measured scores to assess facilities’ relative achievements towards these sub-levels. These cut-offs were set at scores of 0.5, 0.6, 0.7, and 0.8, representing 50%, 60%, 70%, and 80% of full obstetric care effectiveness. Quality scores were then assigned to each sampled woman based on reported facility use during the previous year, assuming short-term changes in service quality or effectiveness at a given facility to be minimal. Due to missing data for some facilities, we were not able to determine a quality score for each facility during baseline and midterm and consequently could not assign quality information to 141 women, reducing the actual sample available for analysis to 5368 women.
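As an illustration of how these outcome measures relate to one another, the following minimal sketch computes crude coverage, quality-adjusted effective coverage, and cut-off-based effective coverage for a small set of women. The data are invented, the function and field names are hypothetical, and treating effective coverage as the average of use multiplied by the assigned facility score is one reading of the definition above rather than the authors' exact computation.

```python
CUTOFFS = (0.5, 0.6, 0.7, 0.8)


def coverage_measures(women, cutoffs=CUTOFFS):
    """Return (crude, effective, by_cutoff) for women in need of care.

    Each woman is a dict with key "facility_score": the quality score
    (0..1) of the designated EmOC facility she used, or None if she did
    not deliver at a designated EmOC facility.
    """
    n = len(women)
    if n == 0:
        raise ValueError("no women in sample")
    users = [w for w in women if w["facility_score"] is not None]
    # Crude (contact) coverage: share of women in need who used a facility.
    crude = len(users) / n
    # Effective coverage: use weighted by the quality score of that facility.
    effective = sum(w["facility_score"] for w in users) / n
    # Cut-off coverage: share who used a facility scoring at least the cut-off.
    by_cutoff = {
        c: sum(1 for w in users if w["facility_score"] >= c) / n
        for c in cutoffs
    }
    return crude, effective, by_cutoff


if __name__ == "__main__":
    # Invented example: ten women in need, eight of whom used an EmOC facility.
    scores = [0.45, 0.55, 0.55, 0.62, 0.62, 0.71, 0.71, 0.58, None, None]
    sample = [{"facility_score": s} for s in scores]
    crude, effective, by_cutoff = coverage_measures(sample)
    print("crude coverage:", crude)
    print("effective coverage:", round(effective, 3))
    print("coverage by cut-off:", by_cutoff)
```

In this invented sample, crude coverage is 80% while effective coverage is just under 48%, which mirrors the kind of gap between contact and quality-adjusted coverage that the measure is designed to expose.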

Analytical approach

We used descriptive statistics and two-sample t-tests to compare differences in key characteristics between interventions and controls for each sample. We used frequencies to illustrate the distribution of facilities and users by quality score categories over time. To estimate the RBF4MNH impact on crude and effective coverage, we used linear regression in a difference-in-differences comparison [30]:

$$ {Y}_i={\beta}_0+{\beta}_1t{1}_i+{\beta}_2{T}_i+{\beta}_3t{1}_i\ast {T}_i+{\beta}_4t{2}_i+{\beta}_5t{2}_i\ast {T}_i+{\beta}_k{X}_{ki}+{\varepsilon}_i, $$

with Yi representing the outcome (crude or effective coverage), t the time point (t1 = midterm, t2 = endline), T the treatment group, and T*t the interaction between treatment and time point (T*t1 interaction at midterm, T*t2 at endline). Coefficients β3 and β5 represent the effect estimates at midterm and endline, respectively.
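Without the covariates X, the model above is saturated in study arm and time point, so β3 and β5 reduce to simple double differences of group means. The following sketch shows this arithmetic using the unadjusted effective coverage levels reported in the descriptive results (the midterm value of 52% for both arms is approximate). The function name is hypothetical, and the resulting figures are not the adjusted estimates of the regression analysis.

```python
def did_estimates(means):
    """Unadjusted difference-in-differences from group means.

    `means` maps (arm, period) to the mean outcome, with arms
    "intervention" / "control" and periods "baseline" / "midterm" /
    "endline". The returned values correspond to beta_3 (midterm) and
    beta_5 (endline) in a model without covariates.
    """
    def change(arm, period):
        return means[(arm, period)] - means[(arm, "baseline")]

    return {
        period: change("intervention", period) - change("control", period)
        for period in ("midterm", "endline")
    }


if __name__ == "__main__":
    # Effective coverage in percent, taken from the descriptive results.
    means = {
        ("intervention", "baseline"): 43.5,
        ("intervention", "midterm"): 52.0,
        ("intervention", "endline"): 54.0,
        ("control", "baseline"): 45.7,
        ("control", "midterm"): 52.0,
        ("control", "endline"): 46.6,
    }
    for period, estimate in did_estimates(means).items():
        print(period, round(estimate, 1), "%-points")
```

These unadjusted double differences come to roughly 2.2 and 9.6 percentage points; the adjusted estimates differ because the regression additionally accounts for household and facility characteristics, clustering, and facilities switching study arms.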

Models were further adjusted for potential confounders (denoted by βkXki): household characteristics (district location, distance to nearest EmOC, socioeconomic status) when modelling effects on both crude and effective coverage; additional facility characteristics (type, ownership) when modelling effects on effective coverage. Household socio-economic status was measured as a relative wealth index based on assets and living conditions using principal component analysis and described in detail elsewhere [31]. We also adjusted for clustering at catchment area level and for the late phase expansion with five initial control facilities switching treatment arms. Given the relatively small number of catchment area clusters we used bootstrapping to improve the accuracy of standard errors for our effect estimates. Given the limitations of our clustered study design, we were only able to detect effect sizes of 0.25 or larger at a significance level of 5%. STATA version 14.1 was used for all statistical analyses.
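The idea of bootstrapping with a small number of clusters can be sketched as follows. This is an illustrative cluster (block) bootstrap on synthetic data, not the authors' Stata procedure: resampling whole catchment-area clusters within each study arm, the unadjusted estimator, and all names and parameter values are assumptions made for the example.

```python
import random
from statistics import mean, stdev


def did(records, period="endline"):
    """Unadjusted difference-in-differences between baseline and `period`."""
    def cell(arm, per):
        return mean(r["y"] for r in records
                    if r["arm"] == arm and r["period"] == per)

    return ((cell("intervention", period) - cell("intervention", "baseline"))
            - (cell("control", period) - cell("control", "baseline")))


def cluster_bootstrap_se(records, n_reps=500, seed=0):
    """Standard error of the DiD estimate from resampling whole clusters.

    Assumption: clusters are resampled with replacement within each arm,
    so every replicate contains both arms.
    """
    rng = random.Random(seed)
    clusters_by_arm = {}
    for r in records:
        arm_clusters = clusters_by_arm.setdefault(r["arm"], {})
        arm_clusters.setdefault(r["cluster"], []).append(r)
    estimates = []
    for _ in range(n_reps):
        replicate = []
        for arm_clusters in clusters_by_arm.values():
            clusters = list(arm_clusters.values())
            for rows in rng.choices(clusters, k=len(clusters)):
                replicate.extend(rows)
        estimates.append(did(replicate))
    return stdev(estimates)


def make_example_records(seed=1, clusters_per_arm=6, women_per_cell=20):
    """Synthetic data: a shared cluster shift plus a hypothetical 0.07 gain."""
    rng = random.Random(seed)
    records = []
    for arm in ("intervention", "control"):
        for c in range(clusters_per_arm):
            shift = rng.gauss(0, 0.05)
            for period in ("baseline", "endline"):
                treated = arm == "intervention" and period == "endline"
                gain = 0.07 if treated else 0.0
                for _ in range(women_per_cell):
                    y = 0.45 + shift + gain + rng.gauss(0, 0.2)
                    records.append({"cluster": c, "arm": arm,
                                    "period": period, "y": y})
    return records


if __name__ == "__main__":
    data = make_example_records()
    print("DiD estimate:", round(did(data), 3))
    print("bootstrap SE:", round(cluster_bootstrap_se(data), 3))
```

Resampling clusters rather than individual women preserves the within-cluster correlation of outcomes, which is why it tends to give more honest standard errors than treating all women as independent observations.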

Results

Sample characteristics

Table 1 summarizes distribution and characteristics for each of the three samples included in the analysis. Complete information for both facility inventories and corresponding case observations was available for only 26 facilities at baseline, and for 30 and 32 facilities at midterm and endline. Across time points, proportions of health centres and faith-based facilities were higher in the control arm reflecting the RBF4MNH’s mandate to contract primarily public facilities, including the four district hospitals. For the remaining 5368 women sample sizes differed greatly between study arms reflecting the oversampling of district hospital catchment areas in the intervention group. Statistically significant differences in two-sample t-tests between group means existed for household socioeconomic status (p = 0.01 at baseline), with women in the intervention arm on average residing in poorer households. Between 88.6 and 97.0% of women depending on study arm and time point reported facility-based service use, while delivery at any or at catchment area specific EmOC facilities was much lower.

Table 1 Distributions and characteristics of sampled facilities, observed cases, and surveyed women

Obstetric quality scores

As shown in Table 2, facilities scored on average relatively low when assessed against the standards included in our composite. While we observed a continuous increase over time in intervention facilities (from an average score of 0.55 to 0.65), scores in control facilities showed an increase only at midterm (from 0.56 to 0.60), but then dropped below the baseline value at endline. As for the different score levels, more than half of facilities in each study arm reached the 0.5 cut-off (i.e. at least 50% of service effectiveness) at baseline, with more than three-quarters of intervention facilities reaching this cut-off at endline compared to only about two-thirds of controls. About one third of control and even fewer intervention facilities reached the 0.6 cut-off at baseline. While more than half of intervention facilities eventually reached this level at endline, there was no further improvement observed in controls over time. Only single facilities (control) or none (intervention) reached the 0.7 and 0.8 cut-off scores at baseline, with slight improvements over time only in some intervention facilities. Further details on sub-scores for each of the twelve quality categories included in the composite are provided in an additional file [see Additional file 1].

Table 2 Obstetric care quality score and distribution of EmOC facilities by score categories

Table 3 outlines the proportion of pregnant women using services at EmOC facilities in relation to the observed obstetric quality score. At baseline, most women used facilities with a score of 0.59 points or less (75.4% combined in intervention areas, 62.3% combined in control areas). At endline, the largest proportion of pregnant women in intervention areas used services at facilities with scores between 0.60 and 0.79 points (combined 84.1%), whereas the largest proportion of women in control areas used facilities scoring between 0.50 and 0.69 points (combined 72.6%).

Table 3 Women seeking care at any designated EmOC facility by obstetric care quality score (categorized)

Trends in population coverage

Figure 1 illustrates the levels of crude and effective coverage for each study group (solid bars), as well as effective coverage for each of the different effectiveness cut-offs (patterned bars). Crude coverage remained relatively stable throughout the study period at around 80% in both study arms. In comparison, effective coverage at baseline was much lower (43.5% intervention, 45.7% control) and increased in both study arms at midterm to about 52%. At endline, effective coverage in controls declined to near baseline levels (46.6%), but remained elevated in interventions (54.0%). At a 50% cut-off, coverage trends were parallel, with increases between baseline and midterm in both intervention and control arms, and remained stable between midterm and endline. At a 60% cut-off, effective coverage trends were parallel between baseline and midterm and diverged between midterm and endline due to further increases in interventions and a drop in controls. At a 70% cut-off, an upward trend in effective coverage between baseline and midterm occurred only in the intervention arm, with a continuous downward trend in the control arm. At an 80% cut-off, effective coverage remained extremely low for both study arms and was absent at endline.

Fig. 1 Time trends of crude and effective coverage (without and with cut-off levels applied). Data based on descriptive non-adjusted analysis

Effects on population coverage

Table 4 presents the effect estimates attributable to the RBF4MNH intervention modelled by our difference-in-differences analysis both at midterm (β3) and endline (β5). Effects of the RBF4MNH on crude coverage were initially negative and slightly positive at endline, but in both instances without statistical significance (p = 0.28 and 0.83, respectively). Effects on effective coverage in comparison were positive, but statistically non-significant, both at midterm (4.7%-points, p = 0.13) and at endline (7.1%-points, p = 0.07). At a 50% cut-off, effects on effective coverage were only minimal and statistically non-significant. At higher cut-offs, effect sizes were positive and relatively large, with statistically significant increases of 31.1%-points (p = 0.02) at endline for the 60% cut-off and of 25.8%-points (p = 0.04) at midterm for the 70% cut-off. At the 80% cut-off, effects remained positive, but statistically non-significant.

Table 4 RBF4MNH impact on crude and effective obstetric care coverage (adjusted analysis)

Discussion

To our knowledge, this is the first study assessing the impact of an RBF program on effective coverage. Our findings indicate that the RBF4MNH improved effective coverage of pregnant women with facility-based obstetric care by about 7%-points after a two-year implementation period. Defining effectiveness by different quality cut-offs, our analysis further demonstrated that this impact was greatest (31%-point increase) once service effectiveness was defined as meeting at least 60% of the measured quality score. While use of obstetric care services (crude coverage) remained relatively unchanged, a higher proportion of women received higher quality services over time, although none of the studied facilities met all aspects of the quality of care measured by our score.

In LMIC contexts, RBF programs are not only seen as provider payment mechanisms but also as a driver for wider health care reforms addressing good governance, autonomy, competition, and separation of health financing functions [32]. Unlike many other RBF schemes in LMIC, which address a wide range of primary care services, the RBF4MNH kept an explicit focus on obstetric care only, and thus might not be fully comparable with other broader RBF programs. While the RBF4MNH improved quality of care (Table 2), thus allowing more women to attend services of higher quality in general (Table 3), this rather vertical implementation approach might have been less effective in addressing some of the underlying cross-cutting service delivery deficits in the country. Still, the RBF4MNH motivated health workers and management teams to take more responsibility and accountability in their strategic decision-making [33].

Given the relatively short evaluation period of about two years, the observed improvement in service quality by 0.1 points in intervention facilities is quite remarkable, but probably not sufficiently satisfying in light of the rather low scores observed at baseline (around 0.55). Although the score has not been validated, we feel confident that in developing it using a content-of-care approach, we took sufficient precautions in both indicator selection and composite construction to adequately reflect the current standards and guidelines related to obstetric care provision in our study context beyond the program’s performance focus. Thus, we consider the modest improvement in the score to point at wider quality of care deficits not yet sufficiently addressed by the RBF4MNH program, such as challenges related to central supply-chain management, the wider aspects related to the skilled health worker shortages, or Malawi’s declining economic situation.

Given these wider deficits in relation to health service delivery, the large positive net effects on effective coverage (20–30%-points) achieved at 60% and 70% cut-off levels are nevertheless a demonstration of the RBF4MNH’s potential to introduce improvements towards more effective service provision and coverage. However, these effects may have partly resulted from the initial RBF4MNH assignment in favour of more functional facilities. The initial upgrades (i.e. non-conditional inputs) prior to the start of the study might also have contributed to the observed effects; however, this component was considered an integrated part of the RBF4MNH implementation design and was thus evaluated accordingly. Our analytical approach, unfortunately, did not allow us to separate the effects of upgrades from those produced by the performance payments alone.

Another reason explaining the larger effects observed at 60% and 70% effectiveness cut-offs compared to higher levels might be the relatively short study period of about two years, of which the first 18 months were characterized by programmatic changes. It is therefore plausible to assume that the RBF4MNH reached full functionality only once the expansion phase started. Introduction of new purchasing structures in a healthcare setting with few antecedents in setting performance targets, service price negotiations, performance documentation and verification, or distribution and investment of rewards by providers seemingly requires a wider timeframe than the one feasible for this evaluation [34], which limited the extent to which our study was able to assess the full effect of the program.

Evidence from other RBF programs in LMIC suggests the necessity of demand-side components in redirecting women’s choice of place of delivery [9, 10]. However, we observed no substantial effects on crude coverage in this study. We attribute this to the initially relatively constrained incentives towards service use (the demand-side component was only fully implemented during the expansion phase) and the fact that crude coverage was rather high already at baseline. In light of the relatively low obstetric quality observed for many facilities, a phase-in of the demand-side component after successful supply-side implementation would probably have been preferable from an ethical point of view, in order to limit the extent to which women are directed to use services that provide substantially lower quality or ineffective care.

Our study has limitations. First, as already mentioned, the two-year study period might have been too short to assess the full impact of the scheme, considering the general scope of RBF schemes to not only change reimbursement structures, but concomitantly introduce a set of new management and decision-making processes. Second, in the absence of randomized assignment of facilities to the RBF, our study remained limited to a quasi-experimental design. With only one observation point available prior to intervention start, we could not test the parallel trend assumption underlying the difference-in-differences method, so our effect estimates rest on an untested assumption and may be biased if pre-intervention trends differed between study arms. Third, selection of control facilities had to be limited to the four intervention districts. RBF4MNH incentives were not only provided to assigned facilities, but also to each DHMT targeting district-wide activities (e.g. supervisory visits, supply-chain management) for the benefit of all facilities in a district (including our controls). This might have contaminated our measures of service effectiveness in the control arm, leading us to underestimate the overall RBF4MNH effects. Lastly, the content-of-care approach underlying our composite score does not account for wider aspects of healthcare quality, such as patient satisfaction or actual mortality reduction [11]. We therefore cannot draw any conclusions about the RBF4MNH’s effect on elements of effectiveness other than clinical care quality.

Conclusion

Despite these limitations, our study was able to demonstrate that the RBF4MNH program improved effective coverage of pregnant women by improving service quality and allowing a larger proportion of women to receive more effective care in the context of Malawi. We consider our findings as evidence that RBF programs have the potential to address effective coverage and thus can play a role in the progression of LMIC towards UHC. To gain further understanding of the true potential of RBF schemes for both service and population coverage, we therefore suggest a wider use of effective coverage measures in the evaluation of RBF programs, especially in LMIC.