Accreditation as a quality-improving policy tool: family planning, maternal health, and child health in Egypt

Accreditation of healthcare providers has been established in many high-income countries and some low- and middle-income countries as a tool to improve the quality of health care. However, the available evidence on the effectiveness of this approach is limited and of questionable quality, especially in low- and middle-income countries. We exploit the interventions introduced under Egypt’s health sector reform program between 2000 and 2014 to estimate the effect of health facility accreditation on family planning, maternal health, and child health outcomes. We use difference-in-differences fixed-effects and propensity score matching difference-in-differences models. To do so, we spatially link women to their nearest mapped health facilities using their global positioning system coordinates. We find that accreditation had multiple positive effects, especially on delivery care and child morbidity prevalence. The effects appear to weaken over time though. Our findings suggest that facility accreditation can be effective in improving family planning, antenatal care, delivery care, and child health, but stress the need to study how the effects can be sustained.


Introduction
In 1997, Egypt launched a comprehensive Health Sector Reform Program (HSRP) to address fundamental challenges in the healthcare system. The HSRP introduced an integrated package of service delivery and financing interventions to address the means by which primary health care (PHC) is financed, delivered, organized, and managed [1,2]. One of the cornerstones of the HSRP was the facility accreditation program, which was defined by the country's Ministry of Health and Population (MOHP) as a process for evaluating PHC facilities according to a set of standards that define activities and structures that directly contribute to improved patient outcomes. The main aim of the program was to provide the HSRP with a framework for continuous quality improvement [3,4].
Accreditation of healthcare providers has been established in many high-income countries and some low- and middle-income countries (LMICs) as an approach to improve the quality of care that combines the two elements of quality assurance and quality improvement [5,6]. There is consistent evidence that accreditation programs improve the process of care in all settings, especially processes related to the quality of healthcare services [7][8][9][10][11]. However, there is limited evidence on the effectiveness of accreditation on patient-related outcomes in all settings [12][13][14]. One interesting study on patient outcomes in a high-income country is Falstie-Jensen et al. [15], who report a statistically significant association between persistent low compliance with accreditation and higher 30-day mortality and longer length of stay at public Danish hospitals. However, Lam et al. [16] report that U.S. hospital accreditation is not associated with lower mortality and is only weakly statistically associated with lower re-admission rates for some medical conditions. Similarly, accreditation of public health hospitals in Saudi Arabia is found to have no significant effect on mortality rates but significant positive effects on infection and length of stay [17].
Although a number of studies explored the effect of accreditation on the quality of care in LMICs (e.g., [18]), we could only identify one study on the effect of quality improvement through accreditation on patient outcomes [19]. This study concludes that accreditation of Chinese chest pain centers was associated with better in-hospital outcomes for acute myocardial infarction patients. Since accreditation usually entails significant costs, investigating its effectiveness is crucial, especially in settings where resources are extremely constrained. Moreover, in light of evidence on the non-monotonic effect of access to health care on gender inequality [20], it is particularly important to estimate the effect of having access to an accredited facility on disadvantaged groups, such as women and children.
We attempt to fill the gap in the literature by investigating the effect of accreditation of PHC units on the quality of care and patient outcomes in Egypt, a middle-income country. The paper exploits the quasi-natural experiment associated with the introduction of Egypt's facility accreditation program to estimate the effect of having access to an accredited facility on a set of family planning, maternal health, and child health outcomes. To do this, difference-in-differences (DiD) fixed-effects models are used for the period 1992-2014. DiD models are also combined with propensity score matching (PSM) for the specific periods of 2000-2005, 2005-2008, and 2008-2014. The paper is organized as follows: section "Background" provides some background; section "Data" describes the exploited data; section "Estimation methods" explains our empirical strategy; section "Results" presents some descriptive statistics and estimation results; section "Discussion" discusses the paper's findings; and section "Conclusion" concludes. The Appendix contains descriptive statistics, robustness checks, and extensions.

The facility accreditation program
In 1997, the Government of Egypt (GOE) launched the HSRP, a new PHC strategy to reform the health system in phases over a period of 15-20 years. The program came into operation in 2000 [2]. The simultaneous implementation of the program across the country was deemed infeasible due to pre-existing constraints in the healthcare system and the complex nature of the interventions to be introduced. Hence, GOE decided to implement the program in phases. The targeting took place at the district level in the participating governorates, whose master plans relied on a social vulnerability index to target districts with the most vulnerable populations [1]. Early entrants to the program included a group of PHC facilities in the governorates of Alexandria, Menoufia, and Sohag, which represent, respectively, the urban governorates, Lower Egypt, and Upper Egypt, the three major regions of Egypt. Other governorates followed subsequently [1].
One of the key marketing points of the HSRP service provision reform of PHC was that it would improve access to quality care. To ensure this, the facility accreditation program was established by MOHP as a process to monitor and facilitate the quality of services and influence the behavior of healthcare providers. Thus, as part of the HSRP, accreditation became obligatory to all PHC facilities in the districts targeted by GOE to join the program. Technical assistance was provided to develop improvement plans [3].
In the preparatory phase of the facility accreditation program, a PHC facility needed to meet the following criteria: (i) had a process to monitor, evaluate, and improve quality of care; (ii) had a patient record system; (iii) provided a defined package of services including reproductive health obstetrics and gynecology, neonatal care, pediatric and adult medical care, basic emergency care, preventive health services, and ambulatory care; and (iv) had been in operation for at least six months, had appropriate license by MOHP and relevant medical union, and operated within the laws and regulations [21].
The survey was a key step in the facility accreditation program. A site visit to PHC facilities was conducted by a team of experts trained in accreditation using pre-set accreditation survey instruments and tools. The purpose of the accreditation survey was to evaluate the extent to which a facility complied with the nationally established accreditation standards, and accordingly, determine whether it was to be awarded or denied accreditation. The assessment initially included eight categories: patient rights, patient care, safety, management of support services, management of information, quality improvement program, family practice, and management of the facility. Optimal standards in each category focused on key processes, activities, or outcomes that facilities should achieve [2,3,22].
During the accreditation survey, trained surveyors used three approaches to collect data and measure compliance with the established standards: reviewing specific administrative and clinical records; observing the performance of specified tasks in particular areas; and conducting personal interviews. If a facility scored 80% or above of the total survey score, it was granted full accreditation for a period of 2 years. If a facility scored between 50 and 79% of the total survey score, it was granted provisional accreditation for one year, after which a re-assessment survey was conducted to investigate whether the deficits identified in the first round had been addressed. If a facility scored less than 50%, accreditation was denied [22]. The scoring by areas is presented in Tables 3 and 4 in Appendix A: the facility accreditation program.
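The decision rule described above can be written compactly. The following is a minimal sketch, not the MOHP's actual scoring tool: the function name is hypothetical, and treating every score from 50 up to (but not including) 80 as provisional is an assumption about how fractional scores between 79 and 80 were handled.

```python
from typing import Tuple


def accreditation_decision(score_pct: float) -> Tuple[str, int]:
    """Map a total survey score (in percent) to (status, validity in years).

    Thresholds follow the text: >= 80 full (2 years), 50 to below 80
    provisional (1 year, then re-assessment), below 50 denied.
    The handling of scores strictly between 79 and 80 is an assumption.
    """
    if not 0 <= score_pct <= 100:
        raise ValueError("score must be a percentage between 0 and 100")
    if score_pct >= 80:
        return ("full", 2)
    if score_pct >= 50:
        return ("provisional", 1)
    return ("denied", 0)


# Hypothetical facility scores, for illustration only.
for score in (85.0, 62.5, 41.0):
    print(score, accreditation_decision(score))
```

Both full and provisional accreditation count as treatment in the analysis that follows, so only the last category would fall into the control group.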
In terms of contribution to the total score, the quality dimension patient care contributed the most. This dimension measured the extent to which patients received appropriate care, and focused on compliance with clinical practice guidelines and appropriate diagnosis [22]. Details of the patient care assessment process are included in Appendix A: the facility accreditation program (see Box A.1).

Anticipated effect of the facility accreditation program
One important characteristic of healthcare markets is the presence of asymmetric information [23]. It is well known that healthcare providers may act as 'imperfect' agents of patients and over- or under-provide care, or fail to deliver care of adequate quality, which can become a health concern. Different interventions, including accreditation of providers, have evolved in response to these problems. By subjecting healthcare providers to a formal process that makes them meet pre-determined standards, accreditation is expected to minimize variations in medical practice, eliminate medically inappropriate care, control costs, and address the possibility that quality is underprovided [24,25]. Recent studies suggest that accreditation can influence quality of care through three mechanisms: coherence, organizational buy-in, and collective quality improvement action [26], with possible effects on patient-related outcomes. Earlier studies suggest that the accreditation process can stimulate organizational changes that enhance the quality of care (e.g., [27]).
In the context of this study, accreditation is expected to have a direct effect on some maternal health, child health, and family planning outcomes, through improving the quantity and quality of pertinent health services provided, and an indirect effect on other outcomes. Accreditation of health facilities certifies high compliance with standards defining activities and structures that directly contribute to improved patient outcomes. Hence, within the quality dimension patient care, accreditation standards established to measure compliance of facilities in the sub-areas of antenatal care (ANC), integrated management of child illnesses (IMCI), and family planning are expected to have a primary effect on ANC coverage (number of visits), quality of ANC (being informed of signs of pregnancy complications, weight measurement, blood pressure measurement, and urine sample collection), child morbidity prevalence (acute respiratory infection (ARI), fever, and diarrhea), and informed choice of contraceptive methods (knowledge of side effects of contraceptive method used and knowledge of other methods of contraception that could be used). These outcomes reflect some of the standards assessed during the accreditation survey (see Appendix A: the facility accreditation program, Box A.1). We expect improvements in these outcomes in accredited compared to non-accredited facilities.
In parallel, accreditation is expected to have a secondary effect on the utilization of antenatal and delivery care services. Quality improvement in accredited facilities introduces an incentive for individuals to seek care at these facilities. The effect of this incentive is expected to be more significant in the sub-areas of care included in the assessment of the accreditation survey. Thus, we expect having access to an accredited facility to be associated with higher ANC coverage, higher institutional birth-delivery, and higher skilled assistance during delivery. This expectation holds given that accredited facilities were not functioning at full capacity prior to accreditation and can increase supply in the short term.

Evidence on the effect of accreditation
There is a large body of literature on the effects of accreditation as a quality signaling device for firms (a good summary of firm behavior and accreditation can be found in Dranove and Jin [28]). However, less is known about the impact of accreditation on healthcare provision and on patient health outcomes.
The majority of studies on accreditation in health care in LMICs report on its positive effects on process indicators. A recent study by Terra and Berssaneti [11] provides evidence that accreditation promoted healthcare processes in Brazilian hospitals, thus strengthening the culture for healthcare quality and increasing patients' satisfaction. Similarly, in Brazil, Saut et al. [10] indicate that accreditation mainly influenced the internal processes of healthcare organizations. Saadati et al. [29] show that the accreditation of an Iranian teaching hospital improved patient-centeredness, patient safety, logistics and managerial processes, as well as decision-making.
Several studies in LMICs focus on accreditation's effect on compliance with quality standards [30][31][32][33]. A quasi-experimental study in Egypt found that accredited non-governmental health units had higher compliance with quality standards compared with non-accredited units [30]. Another study in Egypt on public clinics found that providers that had earned a Gold Star accreditation were more likely to adhere to higher quality practices in counseling and examination than non-Gold Star facilities [32]. According to a descriptive study in Zambia, a national hospital accreditation program was associated with significant improvement in compliance of accredited hospitals with standards in overall scores and in seven out of 13 functional areas [31]. In South Africa, Salmon et al. [33] used a randomized controlled trial to investigate the effect of an accreditation program on public hospitals and found that the average compliance of accredited hospitals with process and outcome standards improved significantly, while no significant improvement was observed in non-accredited hospitals.
Besides compliance with standards, the majority of the studies report on the effect of accreditation on quality of care measures. These are, for the most part, not patient health outcomes, but downstream process indicators [18,30,32,33,34,35]. Unlike compliance with standards, there is no conclusive evidence on the effect of accreditation on quality of care. While Avia and Hariyati [8], El-Jardali et al. [34], Hong et al. [32], Quimbo et al. [35], and Reisi et al. [18] report positive effects of accreditation on different indicators of quality of care, studies employing more robust study designs report mixed effects. These are Salmon et al. [33] and Al Tehewy et al. [30], which used a randomized controlled trial and a quasi-experimental design, respectively. In a study based on data from hospitals in South Africa, Salmon et al. [33] found limited or no effect of a randomized accreditation program on quality measures apart from increases in perception of quality among nurses. In Egypt, Al Tehewy et al. [30] found a positive effect of accreditation of non-governmental health units on patient satisfaction with respect to all areas of health service (cleanliness, waiting area, waiting time, and staff performance). As for provider satisfaction, the study found a positive effect on the overall satisfaction score, but no significant difference in the mean satisfaction score between accredited and non-accredited units with respect to social environment, administrative environment, and family health model.
In conclusion, the available evidence on the effectiveness of quality improvement through accreditation of healthcare providers is limited and of questionable quality but suggests that accreditation can improve the process of care provided by different providers. However, evidence is limited on the effectiveness of accreditation on patient outcomes. One study found that accreditation of Chinese chest pain centers was associated with better in-hospital outcomes for acute myocardial infarction patients [19]. We could not identify any study on the effect of accreditation on key patient outcomes such as family planning, maternal health, and child health outcomes.

Data
To answer our research questions, we exploit the Egypt Demographic Health Survey (DHS). This survey consists of two questionnaires: one for households (HHs) and the other for ever-married women (EMW), and has a consistent response rate of over 99% for all waves. 1 For this study in particular, we make use of all the relevant data made available by the Egypt DHS on family planning and maternal and child health. We expect these variables to be affected by the changes in quality standards, policies, and procedures, which are the focus of the accreditation assessment. 2 To construct our dependent variables, we collapse the responses of 97,990 women over the period 1992-2014 at the facility level, drawing from six DHS waves: 1992, 1995, 2000, 2005, 2008, and 2014. To do so, we use the Global Positioning System (GPS) coordinates of both interviewed women and health facilities to link each woman to the nearest mapped facility for each wave of the Egypt DHS. The aim is to identify women who live in the catchment area of accredited facilities (treatment group) and those in the catchment area of non-accredited facilities (control group). All eligible PHC facilities across Egypt are used in the spatial joining process. We are confident that women do not bypass their closest PHC facility because they are obliged by MOHP to use the facility in their catchment area.
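The spatial link between women and facilities can be illustrated with a nearest-neighbor search on great-circle distance. This is a minimal sketch under stated assumptions: the haversine formula and the brute-force search are illustrative choices rather than the procedure documented in the paper, and the facility identifiers and coordinates below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_facility(woman_coords, facilities):
    """Return the id of the facility closest to a woman's (lat, lon) position.

    facilities maps a facility id to its (lat, lon) coordinates.
    """
    lat, lon = woman_coords
    return min(facilities, key=lambda fid: haversine_km(lat, lon, *facilities[fid]))


# Hypothetical facilities and respondent location, for illustration only.
facilities = {
    "F_north": (31.2001, 29.9187),
    "F_south": (26.5570, 31.6948),
}
respondent = (30.0444, 31.2357)
print(nearest_facility(respondent, facilities))
```

With many thousands of respondents and facilities, a spatial index would be the more usual choice; the brute-force search is kept here only because it makes the linking rule explicit.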
The gradual uptake of the facility accreditation program by health facilities provides a quasi-natural experiment. Thus, for our main explanatory variable reflecting reform status, we draw on facility-level data from MOHP to categorize facilities as treated (accredited, either fully or provisionally) and non-treated (non-accredited). To ensure that treatment reflects only accreditation, we remove from the sample accredited health facilities that were subject to additional interventions under the HSRP such as performance-based financing (PBF) and introducing user fees.

Dependent variables
Having linked each woman to her respective health facility, we are able to calculate health outcomes at the facility level for each of the Egypt DHS waves and obtain a panel. By construction, our dependent variables are at the facility level: informed choice of contraceptive methods ('family planning'), ANC, delivery care, and child morbidity prevalence, which we describe in detail below.
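The collapse from woman-level responses to a facility-by-wave panel amounts to computing, for each facility and wave, the share of linked respondents with a given outcome. The sketch below assumes a binary outcome coded 0/1 and ignores survey weights and outcome-specific denominators; the function name and the records are hypothetical.

```python
from collections import defaultdict


def facility_shares(records):
    """Collapse woman-level records into facility-by-wave outcome shares.

    records: iterable of (facility_id, wave, outcome) with outcome coded 0 or 1.
    Returns a dict mapping (facility_id, wave) to the share of ones.
    """
    numerator = defaultdict(int)
    denominator = defaultdict(int)
    for facility_id, wave, outcome in records:
        denominator[(facility_id, wave)] += 1
        numerator[(facility_id, wave)] += outcome
    return {key: numerator[key] / denominator[key] for key in denominator}


# Hypothetical records: (facility, DHS wave, received at least four ANC visits).
records = [
    ("F1", 2005, 1),
    ("F1", 2005, 0),
    ("F1", 2008, 1),
    ("F2", 2005, 0),
]
panel = facility_shares(records)
print(panel)
```

Each outcome described below uses its own eligible population as the denominator, so in practice the records would be filtered before this step.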

Family planning
As part of the family planning sub-area of the patient quality of care dimension, the accreditation surveyor checks if the facility has a good information/education/communication (IEC) system. For this sub-area, we include two family planning outcomes that capture the effect of accreditation on the choice of contraceptive methods. First, we calculate the percentage of current users of selected contraceptive methods who were informed of the side effects of, or problems associated with, the method used. 3 Those receiving information on the efficacy and side effects of the contraceptives used tend to have higher continuation rates than those who do not [36]. Second, we calculate the percentage of current users of selected contraceptive methods who were informed of other methods of contraception that could be used. 4 Informed choice emphasizes that women choose the method that best satisfies their personal and reproductive health needs based on a thorough understanding of other methods of contraception they could use.

ANC
As part of the ANC sub-area of the patient quality of care dimension, the surveyor checks if physical examination is performed for all patients. We include six ANC outcomes that capture the effect of accreditation on the quality of ANC. We calculate the percentages of mothers who received the following components of ANC: being informed of signs of pregnancy complications, weight measurement, blood pressure measurement, and urine sample collection. 5 As part of the accreditation survey, the surveyor also checks if the number of ANC visits falls within the clinical guidelines. Therefore, we calculate an ANC coverage indicator (at least four visits). 6 This indicator is the globally preferred measure of access to and use of health care during pregnancy for tracking performance in maternal health programs. 7 A pregnant woman is expected to receive health interventions during antenatal visits that can be vital to her health and the health of her infant.

Delivery care
As part of the ANC sub-area of the patient quality of care dimension assessed by the accreditation survey, the surveyor assesses patients' knowledge and understanding of delivery services provided in the facility. Egypt DHS data allow us to calculate two delivery care outcomes to capture the effect of compliance with the accreditation standards in this regard: institutional delivery 8 and skilled assistance during delivery. 9 The two outcomes are widely advocated for reducing maternal, perinatal, and neonatal mortality. Institutional delivery captures the effect of accreditation on expanding access to childbirth facilities and, more importantly, is a proxy measure of maternal and neonatal morbidity and mortality. 10 The second, and most important, measure of delivery care included in our analyses is skilled assistance during delivery. Empirical literature provides evidence that wider access to professional care during pregnancy and childbirth reduces maternal mortality. Women assisted by skilled health personnel during delivery are less likely to die from any cause related to or aggravated by childbirth [37].

Child morbidity prevalence
As part of the IMCI sub-area of the patient care dimension of quality assessed by the accreditation survey, the surveyor checks whether each child is examined for cough, diarrhea, sore throat, ear infection, and fever. We use the prevalence of childhood ARI, fever, and diarrhea from the Egypt DHS data as three outcomes reflecting morbidity prevalence. 11 ARI is the leading infectious cause of death in children worldwide. 12 Diarrheal diseases are the second leading cause of death in children under age five [38]. The risk of under-five mortality can be diminished substantially through reducing the prevalence of ARI and diarrheal diseases and encouraging women to seek treatment for their children at a health facility and/or from a healthcare provider. 13
3 To do so, we divide the number of women starting a new method of contraception within the 5 years preceding the survey having been informed of its side effects/problems by the number of all women who started a new method of contraception within the same period.
4 We divide the number of women starting a new method of contraception within the 5 years preceding the survey having been informed about other available contraceptives prior to start by the number of women who started the use of a new modern contraceptive method within the same period.
5 An outcome of an ANC component, for instance being informed of signs of pregnancy complications, is calculated as the number of women who received ANC for their last birth within the 5 years preceding the survey who were informed of signs of pregnancy complications divided by the number of women with a birth within the same period who received ANC for their last birth.
6 We divide the number of women who received at least four ANC visits for their last birth within the 5 years preceding the survey by the number of women with a birth within the same period.
7 The World Health Organization recommends that a woman receives at least four antenatal visits during a normal pregnancy to ensure that antenatal complications are detected and controlled at the earliest stage.
8 This is calculated by dividing the number of live births in an institutional setting within the 5 years preceding the survey by the number of all live births within the same period.
9 This is calculated by dividing the number of live births assisted by a medical provider (doctor or nurse/midwife) during delivery within the 5 years preceding the survey by the number of live births within the same period.
10 Women who give birth at a health facility are more likely to receive proper medical attention and care during delivery, as are their infants.
11 These outcomes are calculated by dividing the number of children ill with a cough accompanied by short and rapid breathing (for ARI), or ill with a fever (for fever), or ill with diarrhea (for diarrhea) at any time during the two weeks preceding the interview, by the number of all children under five years of age.
12 Mortality due to ARI accounted for 16% of total deaths among under-five children globally in 2015.
13 We are not able to calculate indicators of child morbidity treatment as observations in our sample are not statistically sufficient.

Explanatory variables
The explanatory variables included in the analyses of this study are a treatment variable that reflects participation in the facility accreditation program, facility-level controls, district-level social and economic controls, and regional dummies to control for regional variation.
In particular, the facility characteristics include labor force, the facility's building condition, and population coverage. For labor force, we incorporate the number of eight types of workers in a health facility: practitioners, specialists, pharmacists, nurses, lab technicians, X-ray technicians, health observers, and social workers. For building infrastructure, a dummy variable that describes the condition of a facility building as 'bad,' 'average,' or 'good' is included. As for population, we control for the size of population in the facility catchment area. This information is obtained from Egypt's MOHP.
At the district level, we use Egypt's 2006 Population and Housing Census to construct eight district-level social and economic controls: illiteracy ratio, unemployment ratio, income dependency ratio, inaccessibility to electricity, inaccessibility to potable water, average family size, HH crowding factor, and population size. In addition, regional dummies are defined for fully urban governorates, urban Lower Egypt, rural Lower Egypt, urban Upper Egypt, rural Upper Egypt, and frontier governorates. These district-level covariates control for both the selection criteria of the HSRP targeting and the demographic variation across districts. The regional targeting of the HSRP followed a socio-economic vulnerability index that was constructed from the eight social and economic indicators outlined earlier.

Difference-in-differences
As the accreditation policy is staggered, we first follow Wooldridge [39] and use a general DiD fixed-effects model to estimate the effect of accreditation by comparing the health outcomes of accredited facilities (treatment group) to those of non-accredited facilities (control group) before and after accreditation (treatment) between 1992 and 2014.
Accredited and non-accredited facilities presumably differ in observed characteristics, such as labor force, and unobserved ones too, such as managerial ability. The DiD method controls for both observed and unobserved characteristics that are time invariant. Fixed effects further eliminate any confounding that might be caused by facility effects, whether observed or unobserved, which are constant over time within each facility. With regard to time-varying unobservable factors, we report in Appendix C: checks and robustness extensions the results of the parallel-trends test, which provide evidence of the absence of unobserved time-varying confounding, establishing the unbiasedness of our DiD estimates.
Treatment self-selection is not a concern in our context as treatment, i.e., the decision of whether or not to obtain accreditation, is exogenous. Accreditation is not a function of some choice of the treated unit, but is rather a function of a policy that differentially affects units based on pre-determined characteristics. As noted earlier, the facility accreditation program is only rolled out in certain geographical areas. Hence, whether one facility can obtain accreditation or not is neither self-selected nor screened.
For each health facility $i$ at time $t$, we estimate the following DiD fixed-effects model:
$$y_{it} = \beta_0 + \beta_1\, policy_{it} + \beta_2\, year_t + \alpha_i + \varepsilon_{it} \qquad (1)$$
where $y_{it}$ denotes a health outcome of interest $y$ for facility $i$ at time $t$. We include outcomes of informed choice of contraceptive methods, ANC, delivery care, and child morbidity prevalence, drawn from secondary Egypt DHS data. The variable $policy_{it}$ equals 1 if facility $i$ is subject to accreditation in year $t$; $year_t$ is a time-period dummy; $\alpha_i$ is the unobserved facility effect; and $\varepsilon_{it}$ is the error term.
Additionally, to compare between the effects of accreditation for the specific periods 2000-2005, 2005-2008, and 2008-2014, we estimate the following DiD model for each period:
$$y_{it} = \beta_0 + \beta_1\, acc_{it} + \beta_2\, d_{post} + \delta\,(acc_{it} \times d_{post}) + \gamma' fac_i + \theta' dist_i + \varepsilon_{it} \qquad (2)$$
where $acc_{it}$ is an indicator variable that takes value 1 if facility $i$ is accredited and 0 if not; $d_{post}$ is an indicator variable for the follow-up year; the interaction term $acc_{it} \times d_{post}$ measures the effect of accreditation in the follow-up year; and $\delta$, our main coefficient of interest, captures the effect of accreditation on the outcome at the facility level.
Finally, to eliminate potential unobserved heterogeneity and account for possible differences between accredited and non-accredited facilities prior to accreditation, the vector $fac_i$ contains facility-level controls that reflect different characteristics of facility $i$ including labor force, the facility's building condition, and population coverage; and $dist_i$ is a vector of district-level controls including social, economic, and demographic characteristics of the district in which facility $i$ is located.
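The logic of the interaction term can be illustrated with the simplest two-group, two-period case. Without covariates or fixed effects, the coefficient on the interaction between the accreditation indicator and the follow-up indicator equals the change in the mean outcome of accredited facilities minus the change for non-accredited facilities. The sketch below computes exactly that quantity; it is a simplification of the models above (no facility or district controls, no facility effects), the function name is hypothetical, and the numbers are invented for illustration.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences from group-period cell means.

    rows: iterable of (acc, post, y), where acc and post are 0/1 indicators
    and y is a facility-level outcome such as a coverage share.
    """
    cells = {(a, p): [] for a in (0, 1) for p in (0, 1)}
    for acc, post, y in rows:
        cells[(acc, post)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(1, 1)] - m[(1, 0)]
    change_control = m[(0, 1)] - m[(0, 0)]
    return change_treated - change_control


# Invented facility-level shares: (accredited, follow-up year, outcome).
rows = [
    (1, 0, 0.50), (1, 0, 0.54), (1, 1, 0.70), (1, 1, 0.74),
    (0, 0, 0.48), (0, 0, 0.52), (0, 1, 0.58), (0, 1, 0.62),
]
effect = did_estimate(rows)
print(round(effect, 3))
```

In this toy data the accredited group improves by 0.20 and the comparison group by 0.10, so the estimate attributes 0.10 to accreditation, provided both groups would otherwise have followed parallel trends.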

Propensity score matching difference-in-differences
The targeting of the reform interventions at the district level under the HSRP followed a socio-economic vulnerability index of the areas around health facilities. As such, the comparison of health outcomes without accounting for this would be biased. To ensure that no bias exists due to targeting, we combine DiD with the PSM approach. 14 Matching on observables mitigates the potential bias by pairing accredited and non-accredited health facilities based on pre-accreditation observable characteristics, which were initially used by GOE for accreditation targeting. Additionally, as a stand-alone method, DiD can be used to identify treatment effects if there is selection based on (time-invariant) non-observables. Thus, while conventional PSM cannot account for non-observables, combining matching with DiD enables us to account for the effect of both observable and unobservable heterogeneity that is constant over time, as well as the targeting policy. To minimize any potential bias due to time-varying unobservable factors, we also control for an extensive set of facility-level characteristics and population coverage of facilities. Reassuringly, we generally find no significant differences in outcomes or characteristics between the population covered by treated and non-treated facilities (see Appendix C: checks and robustness extensions, Table 12). Table 12 suggests that our matching strongly satisfies the requirement of conditional independence.
To obtain the PSM DiD estimates, we follow Rosenbaum and Rubin [40]. We first apply PSM to match facilities and then extend the conventional DiD estimates by defining health outcomes conditional on propensity scores and applying semi-parametric methods to construct the differences. First, we match treated and control health facilities based on pre-treatment observable characteristics and use Kernel functions to assign a weight to the jth control facility matched to the ith treated facility (see footnote 15). As such, in our context, the propensity score is the probability of being targeted by the HSRP intervention given the set of observable social and economic indicators used to construct the socio-economic vulnerability index. Second, we estimate the DiD specification in Eq. (2) with health outcomes defined conditional on the propensity score generated earlier, which yields a Kernel PSM DiD estimate for each treated facility i.
Prior to the DiD estimation, we verify that the common support assumption is satisfied by checking the overlap between the treatment (accredited facilities) and control (non-accredited facilities) groups (see Appendix C: checks and robustness extensions, Figs. 1, 2, 3). Once the matching is applied, we use two-sample t-tests to examine whether there are significant differences in the means of observable characteristics between the two groups [40]. While we use district-level social and economic indicators to estimate the propensity score, facility-level characteristics are used as additional covariates later in the DiD estimations. For each of our health outcomes, we report the results for three study periods: 2000-2005, 2005-2008, and 2008-2014.
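For reference, a standard form of the Kernel-weighted matching DiD estimator, in the spirit of Heckman et al. [49], can be written as below. This is a sketch under assumed notation rather than a reproduction of the authors' own equation: \Delta Y denotes the change in a facility's outcome between the pre- and post-accreditation waves, C is the set of control facilities on the common support, T is the set of treated facilities, and w_{ij} is the Kernel weight with bandwidth a, Kernel function K, and propensity scores P_i and P_j.

```latex
\[
\hat{\tau}_i \;=\; \Delta Y_i \;-\; \sum_{j \in C} w_{ij}\,\Delta Y_j ,
\qquad
w_{ij} \;=\; \frac{K\!\left(\dfrac{P_i - P_j}{a}\right)}
                 {\sum_{k \in C} K\!\left(\dfrac{P_i - P_k}{a}\right)} ,
\qquad
\widehat{ATT} \;=\; \frac{1}{|T|} \sum_{i \in T} \hat{\tau}_i .
\]
```

Under this form, each treated facility is compared with a weighted average of control facilities, with weights that decline as the gap in propensity scores grows and that sum to one for each treated facility.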

Descriptive statistics
14 The mixture of PSM DiD was first proposed by Heckman et al. [49]. PSM DiD estimates are superior to the conventional DiD estimates as no functional form restrictions are imposed when estimating the conditional expectation of the outcome variable using PSM DiD.
15 Using Kernel weights, where a is the bandwidth parameter, K is the Kernel function, and P_i and P_j are the propensity scores for treated and control facilities.

... various ANC components. Women were more likely to be weighed (82%) and get their blood pressure measured (82%) during ANC visits but far less likely to be informed of signs of pregnancy complications (34%). In parallel, Egypt has a fairly high level of delivery care coverage through access to health facilities and skilled health personnel. On average, over 60% of women delivered their most recent birth in an institutional setting and over 70% of births were assisted by skilled health personnel. The country had a higher level of prevalence of childhood fever (23%) than ARI (14%) and diarrhea (13%).
The descriptive statistics of the district characteristics on which targeting was based are reported in Table 8 in Appendix B: data sources and descriptive statistics. The table highlights the differences in district characteristics between accredited and non-accredited facilities. We use the two-sample t-test to check whether the means of the two groups differ significantly. On average, districts to which accredited facilities belong have significantly higher household (HH) overcrowding during the period 2000-2005; significantly lower illiteracy, income dependency, inaccessibility to electricity, and inaccessibility to potable water, smaller family size, and bigger population size during the period 2005-2008; and significantly lower income dependency, inaccessibility to electricity, and inaccessibility to potable water, smaller family size, and lower HH overcrowding during the period 2008-2014. These results suggest that the actual targeting of the HSRP did not strictly follow the socio-economic vulnerability index.

Estimated effects of accreditation
Using the pooled sample covering the period 1992-2014, Table 1 shows that accreditation did not have a significant positive effect on any health outcome except the prevalence of childhood diarrhea. This unexpected finding prompted us to disentangle the observed effects of the program by time period.
For the study comparing the years 2000 and 2005, Table 2 provides evidence that having access to an accredited facility was associated with higher likelihood of being informed of the side effects of contraceptives: the proportion of women who were informed of the side effects of the contraceptives used increased significantly by 16 percentage points (ppts) among those with access to accredited facilities, compared to women with access to non-accredited facilities. This positive effect disappeared in the subsequent periods. The proportion of women with access to accredited facilities, who were informed of other methods of contraception that could be used, increased by 7 ppts in 2005, but this effect was counterbalanced by a decrease of 9 ppts in 2008, again compared to women with access to non-accredited facilities. The effect also disappeared during the period 2008-2014.
With respect to ANC, Table 2 shows that accreditation had a limited positive effect on ANC during the period 2000-2005, specifically on being informed of signs of pregnancy complications (6 ppts); this positive effect is again observed in the period 2005-2008, but vanishes in 2008-2014. The proportion of women with access to accredited facilities who had 4+ ANC visits or had a weight measurement during ANC visits increased slightly but significantly, by 5 ppts and 3 ppts, respectively, in 2005-2008, compared to women with access to non-accredited facilities. We also observe a significant negative effect of accreditation on blood pressure measurement during ANC visits in the period 2000-2005, with no significant effect in the following periods.
In terms of delivery care, institutional delivery and skilled assistance during delivery increased by more than 7 ppts and 11 ppts, respectively, among women with access to accredited facilities in the first period of analysis. Nevertheless, the estimates for both outcomes are statistically insignificant during the periods 2005-2008 and 2008-2014. In parallel, we observe that accreditation had multiple significant positive effects on child morbidity prevalence during the period 2000-2005: accreditation reduced the prevalence of childhood ARI, childhood fever, and childhood diarrhea among children with access to accredited facilities by about 10 ppts, 10 ppts, and 4 ppts, respectively, compared to children with access to non-accredited facilities. We also observe a significant positive effect of accreditation on child morbidity prevalence later, during the period 2008-2014, but we do not observe significant effects on any of the child morbidity prevalence outcomes during the period 2005-2008.
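To make the percentage-point interpretation of these estimates concrete, the basic two-group, two-period DiD arithmetic can be sketched as follows. The function name and the input shares are hypothetical and chosen only for illustration; the estimates in Table 2 come from regression models with fixed effects and controls, not from this raw calculation.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the 2x2 difference-in-differences estimate in percentage points."""
    change_treated = treated_post - treated_pre
    change_control = control_post - control_pre
    return change_treated - change_control


# Hypothetical shares (%) of women informed of contraceptive side effects.
effect = did_estimate(treated_pre=40.0, treated_post=62.0,
                      control_pre=42.0, control_post=48.0)
print(f"DiD estimate: {effect:.1f} ppts")
```

In this illustration the treated group gains 22 ppts and the control group gains 6 ppts, so the change attributed to accreditation is the 16 ppt gap between the two.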

Robustness tests
We test the robustness and plausibility of our results by running several alternative checks, which are discussed in Appendix C: checks and robustness extensions. Mainly, we test the parallel-trends requirement for the acceptable application of DiD; we run placebo models; we verify the common support requirement for the feasibility of the matching; we provide tests on the quality of the matching; and we do several sensitivity analyses on the matching method.
Table 1 Difference-in-differences fixed-effects estimates of the effects of accreditation, 1992-2014
Each column represents a separate specification, i.e., the dependent variable is specified in the second row. Dependent variables are expressed in percentages. The observations are health facilities. Clustered standard errors are reported in parentheses. *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.

Discussion
The findings of this paper suggest that the HSRP's accreditation process in Egypt was associated with significant improvements in child morbidity, family planning, delivery care, and, to a lesser extent, ANC. World Bank [1] shows that the pace of implementation of the program decelerated over time. This is compounded by the extent to which facilities complied with the reform, rather than the rate at which facilities joined the HSRP. A plausible indicator of compliance is the accreditation score. In this regard, Grun and Ayala [41] find that, although more facilities became accredited, accreditation scores increased until 2004 and decreased thereafter. Further, accreditation compliance varied across governorates. Supporting our hypothesis, El-Zanaty and Associates et al. [42] use compound quality indices to demonstrate that the quality of services delivered at accredited facilities improved gradually during the first four years after the reform, after which it decreased and hit its lowest level after 9 years. A high level of political ...
Table notes: Each row represents a separate regression. Dependent variables are expressed in percentages. The observations are health facilities. District-level social and economic indicators as well as regional dummies are included as controls in all estimations. Clustered standard errors are reported in parentheses. *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.
Our results can also be explained by the findings of Braithwaite et al. [44], who compared accreditation programs in LMICs with those in high-income countries and concluded that, regardless of country context, the sustainability of those programs is determined by continued policy support from government, stable program funding, assorted incentives promoting facilities' participation in accreditation, and constant improvement in accreditation agency operations and program delivery.
In Egypt, the HSRP has suffered from a number of institutional sustainability issues. Most critically, political support has weakened in the absence of a robust analysis of the effectiveness and rationale of the interventions proposed under the program. The relationships between parent (existing) organizations and entities newly created under the program have not been clearly delineated either [1]. On the financial front, while the initial investment costs of the HSRP were substantial, a significant fraction of these costs was financed through donor funding. The ability of the HSRP to generate its own revenues from different sources was constrained, making the program financially unsustainable in the long run [1]. In this regard, Mansour et al. [6] emphasize the lack of financial resources as a major challenge to the implementation and sustainability of accreditation programs in limited-resource settings.
The study has a number of limitations. The most notable arises from the fact that Egypt's DHS does not track women and children over time. To overcome this, we collapsed the data of each wave at the facility level, constructed a longitudinal dataset of facilities, and estimated the effects of accreditation at the facility level. Also, our analysis of child health is limited to the prevalence of common early childhood illnesses rather than the treatment received, because the Egypt DHS data on treatments are patchy and incomplete. Further, because the DHS is a survey, some of our calculations rely on self-reported information (number of ANC visits, information on contraceptives received, etc.), which introduces some noise. Related to this, although the response rate is very high for each wave, some outcome variables have a notably higher level of missing values (contraceptive information, for instance), and we are obliged to rely on the responses available. Lastly, some measurement error may arise because the treatment and control groups are defined assuming that women follow the rules and attend their nearest PHC facility, but we expect this disturbance to be very small or nonexistent given the legal obligation to do so.
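The linking and collapsing steps described above can be sketched as follows. This is a minimal illustration, assuming great-circle (haversine) distance and simple unweighted shares; the coordinates, field names, and helper functions are hypothetical, and the actual procedure (for example, the use of survey weights or a different distance measure) may differ.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def nearest_facility(lat, lon, facilities):
    """Return the id of the facility closest to the given coordinates."""
    closest = min(facilities,
                  key=lambda f: haversine_km(lat, lon, f["lat"], f["lon"]))
    return closest["id"]


def collapse_to_facility(women, facilities):
    """Share (%) of women with the outcome, per (facility, survey wave)."""
    cells = defaultdict(list)
    for w in women:
        fid = nearest_facility(w["lat"], w["lon"], facilities)
        cells[(fid, w["wave"])].append(w["outcome"])
    return {key: 100.0 * sum(vals) / len(vals) for key, vals in cells.items()}


# Hypothetical facilities and respondents (outcome: 1 = yes, 0 = no).
facilities = [{"id": "A", "lat": 30.05, "lon": 31.25},
              {"id": "B", "lat": 31.20, "lon": 29.92}]
women = [
    {"lat": 30.10, "lon": 31.30, "wave": 2000, "outcome": 1},
    {"lat": 30.00, "lon": 31.20, "wave": 2000, "outcome": 0},
    {"lat": 31.25, "lon": 29.95, "wave": 2000, "outcome": 1},
    {"lat": 30.06, "lon": 31.24, "wave": 2005, "outcome": 1},
]
panel = collapse_to_facility(women, facilities)
for key in sorted(panel):
    print(key, panel[key])
```

Each (facility, wave) cell then becomes one observation in the facility-level panel, which is what allows repeated cross-sections of women to be analyzed longitudinally.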

Conclusion
This paper contributes to the existing literature by investigating the effect of accreditation as a policy tool to improve quality on key patient outcomes rather than downstream process indicators, as used by most of the related literature.
We exploit six waves of the Egypt DHS to investigate the effect of the HSRP's facility accreditation program in Egypt between 2000 and 2014 on family planning, ANC, delivery care, and child morbidity prevalence. To be able to measure the effect, we first spatially link women to their nearest mapped health facilities using their GPS coordinates. We then use DiD fixed-effects models and also combine DiD with Kernel PSM to correct our estimates for potential endogeneity bias.
The results indicate that accreditation had multiple positive effects, especially on delivery care and child morbidity prevalence. The positive effects of accreditation appear to weaken over time though. Accreditation alone is not sufficient to sustain high quality of care, especially with respect to delivery care.
These results emphasize that a high and sustained level of commitment, which reflects strong political will, is indispensable for the long-run success of quality improvement interventions in LMICs. Decentralization in no way diminishes the need for a high level of commitment from the central government (see Braithwaite et al. [44] and Mansour et al. [6] for a detailed discussion).
Our findings encourage an enquiry into which interventions, if combined with accreditation, are associated with improved patient outcomes. There is evidence that improvements can be achieved, for example, by combining accreditation with properly monitored and well-designed payment or incentive schemes [35].
Acknowledgements This work was sponsored by the Economic Research Forum (ERF) and has benefited from both financial and intellectual support from the ERF. The contents and recommendations do not necessarily reflect ERF's views. We are also grateful for the useful discussions with and feedback from Mylene Lagarde from the London School of Economics and Political Science and Agne Suziedelyte from City, University of London, as well as the anonymous referees, as they have helped us improve this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix A: the facility accreditation program
See Appendix Tables 3, 4.

Box A.1: Quality dimension assessment for the different sub-areas
For the sub-area ANC, patient care focused on the quality of ANC at the facility, i.e., the surveyor assessed whether a comprehensive history and physical examination was performed for all patients. The general physical examination should include weight measurement, height measurement, blood pressure measurement, and measurement of edema of the lower limbs. The surveyor also assessed whether the necessary diagnostic tests (laboratory and radiology) were performed on time to determine the diagnosis. These tests included but were not limited to blood analysis, complete urine analysis, and ultrasound according to clinical guidelines. In addition, the surveyor assessed whether all treatment plans were appropriate according to clinical guidelines. For example, supplementation of iron and folic acid in the first trimester was checked. The surveyor also judged the number of ANC visits according to clinical guidelines and whether educational messages were discussed with the patient. For example, the physician should help pregnant women better understand their immunization status (tetanus toxoid); the importance and the number of visits prior to delivery; alarming signs such as bleeding; and the delivery services in the facility.
The focus of patient care in the sub-area IMCI was the wellbeing of children under five years of age. The surveyor assessed if a comprehensive history and physical examination was performed for all sick children according to age of child (checking for cough, diarrhea, sore throat, ear infection, and fever); if health providers explained to mothers disease classification and treatment using clear and simple language; if diagnostic tests were appropriately referred when needed; and, finally, if the facility provided appropriate prevention and treatment to all sick children according to IMCI guidelines. The assessment of patient care in the sub-area family planning mainly focused on the provision and quality of counseling sessions, i.e., if a comprehensive history and physical examination was performed for all new women according to guidelines; if the facility had a good IEC system such as discussing all family planning methods and the different methods, mode of action, side effects, and costs of each.

The scoring criteria of the accreditation standards range from zero to three. Scores of zero, one, two, and three denote that an accreditation standard is not met, unacceptable (partially met), acceptable (partially met), and fully met, respectively. All the scores from each activity are added to get the aggregate for the accreditation standard. The average score for each standard is calculated by dividing the aggregate scores by the frequency of activities. The scores are weighted at the sub-area score level (level one) and the overall facility score level (level two) as shown in this table.

Dimension total: 12
Management of information: Medical records 2; MIS***/reporting 1; Dimension total 3
Quality improvement program: Quality improvement program 2; Dimension total 2
Family practice model: Prevention and screening 3; Continuity of care 3; Referral 3; Dimension total 9
Management of the facility: Human resource development 1
Also, equipment had to follow international standards in accredited facilities. Thus, if needed, accreditation was accompanied by a series of interventions so that equipment would meet the expected quality standards and staff would be competent in addressing family health needs. Usually, this implied upgrading, renewing, or adding modern equipment such as sterilization ovens, delivery chairs, and dentist chairs in family health units, and ensuring that there were ultrasound and X-ray machines, and hematological and cytological labs, in family health centers. To strengthen staff competence, equipment interventions were accompanied by a comprehensive training package for facility staff. For physicians and nurses, the package focused on family health practice. For other specialists in facilities, such as pharmacists, lab technicians, and social workers, the package focused on subject-specific training. In addition, training was a means to introduce substantial administrative changes in facilities, such as reaching out to and rostering families, and keeping medical records electronically and in family folders.
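The scoring rule described in this box (activity scores from zero to three, averaged per standard and then weighted at the sub-area level) can be illustrated with a short sketch. The weights for medical records (2) and MIS/reporting (1) are taken from the weighting table; the activity scores are hypothetical, and combining sub-areas through a weighted mean is an assumption about how the weights are applied, not a documented MOHP formula.

```python
def standard_average(activity_scores):
    """Average score for a standard: aggregate divided by number of activities."""
    if any(s not in (0, 1, 2, 3) for s in activity_scores):
        raise ValueError("each activity score must be 0, 1, 2, or 3")
    return sum(activity_scores) / len(activity_scores)


def weighted_score(sub_area_scores, weights):
    """Weighted mean of sub-area scores (assumed aggregation rule)."""
    total_weight = sum(weights[name] for name in sub_area_scores)
    weighted_sum = sum(score * weights[name]
                       for name, score in sub_area_scores.items())
    return weighted_sum / total_weight


# Weights from the table; activity scores below are hypothetical.
weights = {"Medical records": 2, "MIS/reporting": 1}
sub_area_scores = {
    "Medical records": standard_average([3, 2, 3, 2]),
    "MIS/reporting": standard_average([1, 2]),
}
dimension_score = weighted_score(sub_area_scores, weights)
print(f"Management of information score: {dimension_score:.2f} out of 3")
```

The same weighted-mean step could be repeated across dimensions to obtain an overall facility score (level two), again assuming that the level-two weights are applied in the same way.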

Appendix B: data sources and descriptive statistics
See Appendix Tables 5, 6, 7, 8.
Table 6 Women's responses, raw data
*Conditional on reporting using any contraceptive method. **Conditional on reporting at least one birth in the last five years. ***Conditional on reporting at least one child aged 5 or under in the HH.

Parallel-trends check
The key identifying assumption of DiD is that health outcomes of accredited and non-accredited health facilities would have followed parallel trends in the absence of the facility accreditation program. We need to verify that this assumption holds for two reasons. First, accreditation may have been targeted at health facilities that were already performing better (or worse) with respect to the health outcomes of interest. Second, the magnitude and even the sign of the DiD effect can be sensitive to the functional form if the outcomes' averages for accredited and non-accredited facilities are significantly different at the baseline. The validity of the DiD estimates depends on the treated and control units being similar at the baseline. In this section, we present a number of diagnostics we ran to assess the validity of the parallel-trends assumption.

Pre-treatment trends in health outcomes
Following [45], we regress the change in health outcomes in the period 1995-2000 (i.e., pre-treatment) on a 'treated' dummy. Table 9 reports the mean changes in health outcomes between the 1995 and 2000 survey waves for facilities that are accredited and those that are not. The table indicates that the 'treated' dummy is not statistically significant for any of the reported health outcomes except skilled-assisted delivery in the period 2005-2008; however, the estimated effect of accreditation on this particular outcome is already insignificant and negative for the period 2008-2014 (see Table 2). As the health outcomes of accredited and non-accredited facilities had moved in tandem before the facility accreditation program started, we are confident that outcomes would have continued to move in tandem in the post-intervention period. Thus, the test for pre-trends supports the validity of the DiD design and the unbiasedness of the reported DiD estimators.
Table 9 Mean difference in health outcomes
Each row represents a separate specification with the dependent variable specified in the second column. Dependent variables are expressed in percentages. The observations are health facilities. The covariates are the facility characteristics, district socio-economic indicators, and regional dummies. Standard errors are reported in parentheses. *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.
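The logic of this pre-trend check can be sketched with a simplified regression of the pre-treatment change in an outcome on a 'treated' dummy. The sketch below is a minimal version without the covariates listed in the table notes and with homoskedastic standard errors; the function name and the data are hypothetical.

```python
import math


def ols_dummy(y, d):
    """OLS of y on an intercept and a 0/1 dummy d.

    Returns (coefficient on d, its homoskedastic standard error).
    With a single dummy regressor, the coefficient equals the
    difference in group means.
    """
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    m1, m0 = sum(y1) / len(y1), sum(y0) / len(y0)
    coef = m1 - m0
    rss = sum((v - m1) ** 2 for v in y1) + sum((v - m0) ** 2 for v in y0)
    sigma2 = rss / (len(y) - 2)  # two estimated parameters
    se = math.sqrt(sigma2 * (1 / len(y1) + 1 / len(y0)))
    return coef, se


# Hypothetical 1995-2000 changes (ppts) in an outcome, by facility.
change = [2.0, 3.0, 1.0, 3.0, 1.5, 2.5, 2.0, 2.0, 1.0, 3.0]
treated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
coef, se = ols_dummy(change, treated)
t_stat = coef / se
print(f"treated coefficient = {coef:.2f}, se = {se:.2f}, t = {t_stat:.2f}")
```

A small and statistically insignificant coefficient on the dummy, as in this illustration, is what one would expect to see if treated and control facilities followed similar trends before accreditation.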

Placebo treatment
We also run a placebo test by defining a 'false' lagged accreditation intervention. If the functional form of the DiD set-up is properly specified, pre-accreditation estimations should yield null results. That is, the facility accreditation program should not have a significant effect on the health outcomes before it was introduced. The placebo estimates suggest that the effects reported in Table 2 only emerged after the introduction of the facility accreditation program, i.e., accreditation caused the effects observed rather than the other way around.

Placebo outcomes
Lastly, we identify some health outcomes that, theoretically, should not be directly affected by the facility accreditation program, although they might be affected indirectly. Examples of these outcomes are modern contraceptive prevalence, ANC by skilled health personnel, tetanus immunization during pregnancy, cesarean section (C-section) rates, and under-five child mortality. If the DiD design is valid, the facility accreditation program should not have any effect on the placebo health outcomes in any study period. We re-estimate the DiD model using these outcomes and report the results in Table 11. None of the estimated effects on the placebo outcomes is statistically significant, which supports the validity of our DiD models.

Common support check
A requirement for matching to be feasible is the common support or overlap condition. Thus, we check the overlap in the distribution of observable characteristics between the treatment (accredited facilities) and control (non-accredited facilities) groups by visually inspecting the densities of the propensity scores of both groups. Figures 1, 2, 3 show that there is a large common support area, i.e., a sufficient overlap in the propensity scores of accredited and non-accredited facilities to produce adequate matches for all study periods. This is expected because the number of non-accredited facilities is significantly larger than that of accredited facilities. This imbalance also explains why the calculated propensity scores do not exceed 0.8. In principle, if there are at least as many control units as there are treated units in the data, all the treated units can be matched, but when a small caliper is used (as in this case), the matching requires that almost all the propensity scores be less than 0.5 [46]. So, as the ratio of accredited to non-accredited facilities increases in the second study period, the propensity scores increase. Plausibly, Fig. 2 shows that the control group has a higher maximum propensity score before matching, but not after matching. Figures 2, 3 also provide evidence that neither group has a higher maximum propensity score than the other after matching.
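A numerical counterpart to the visual overlap check can be sketched with the common min-max rule, under which the common support is the interval between the larger of the two group minima and the smaller of the two group maxima. This rule is swapped in here for illustration (the check described above relies on visual inspection of densities), and the propensity scores below are hypothetical.

```python
def common_support(treated_scores, control_scores):
    """Return (low, high) bounds of the min-max common support region."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high


def on_support(score, bounds):
    """True if a propensity score lies inside the common support bounds."""
    low, high = bounds
    return low <= score <= high


# Hypothetical propensity scores for accredited and non-accredited facilities.
treated = [0.22, 0.35, 0.48, 0.61, 0.74]
control = [0.05, 0.12, 0.25, 0.33, 0.47, 0.58, 0.66]

bounds = common_support(treated, control)
kept_treated = [p for p in treated if on_support(p, bounds)]
kept_control = [p for p in control if on_support(p, bounds)]
print("common support:", bounds)
print("treated kept:", len(kept_treated), "of", len(treated))
print("controls kept:", len(kept_control), "of", len(control))
```

Facilities outside the bounds have no comparable counterpart in the other group and would typically be dropped before matching.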

Quality of matching
To check the extent to which observable characteristics are balanced in the matched sample, we perform the balancing t-test with the weighted covariates. Specifically, we use the balancing two-sample t-test of the difference in means of covariates across matched samples of facilities. Our covariates of interest are the ones used earlier to match treated and control health facilities. The results of the t-test are reported in Table 12. As the table indicates, there are in general no systematic differences at the baseline in the means of observed characteristics between accredited and non-accredited facilities. That is, matching on the propensity score is successful.
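As a complement to the balancing t-test, covariate balance before and after weighting can also be summarized with a standardized mean difference. The sketch below uses this companion diagnostic rather than the t-test itself; the covariate values and matching weights are hypothetical, and the pooled standard deviation is computed from the weighted samples as a simplifying assumption.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_var(values, weights):
    """Weighted (population-style) variance of values."""
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)


def standardized_difference(treated, control, control_weights):
    """Difference in (weighted) means divided by the pooled standard deviation."""
    ones = [1.0] * len(treated)
    diff = weighted_mean(treated, ones) - weighted_mean(control, control_weights)
    pooled_sd = math.sqrt(
        (weighted_var(treated, ones) + weighted_var(control, control_weights)) / 2
    )
    return diff / pooled_sd


# Hypothetical district illiteracy rates (%) and Kernel matching weights.
treated = [30.0, 34.0, 38.0]
control = [28.0, 36.0, 50.0]
matching_weights = [0.5, 0.4, 0.1]

before = standardized_difference(treated, control, [1.0] * len(control))
after = standardized_difference(treated, control, matching_weights)
print(f"standardized difference before weighting: {before:.2f}")
print(f"standardized difference after weighting:  {after:.2f}")
```

A standardized difference that shrinks toward zero after weighting points in the same direction as an insignificant balancing t-test: the weighted control group resembles the treated group on that covariate.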

Sensitivity of results
We further inspect the sensitivity of our results to the type of the Kernel function, the bandwidth of the Kernel function, and the estimation method of the propensity score. To do the Kernel matching, we must first specify the type of the Kernel function. We initially use the Epanechnikov Kernel (the default type) to obtain our main results. In Tables 13, 14, 15, we compare our main estimates with those obtained using other types of Kernel functions, specifically Gaussian, biweight, uniform, and tricube.
Overall, we find that our main estimation results are not sensitive to the type of the Kernel function.
To do the Kernel matching, we must also specify the bandwidth of the Kernel function. The choice of bandwidth implies a trade-off between bias and efficiency. On the one hand, a small bandwidth decreases the bias of estimates as we use only the most similar observations to construct the counterfactual. The characteristics of these facilities are, in general, very similar. However, a small bandwidth decreases the efficiency of estimates as we ignore a lot of information from the sample. The fact that many control facilities are not used for the estimation implies an increase in the imprecision of estimates caused by a higher variance. On the other hand, a large bandwidth increases both the bias and the efficiency of estimates. The bandwidth choice is, therefore, a compromise between a small variance and an unbiased estimate of the true density function. This choice is more important in practice than the choice of the type of the Kernel function (e.g., [47,48]). The default bandwidth of the Kernel function initially used to obtain our main results is 0.06. Alternative bandwidths are tried (bandwidths = 0.05 and 0.1). Table 16 shows our main results of the estimated effects using different bandwidths. We find that our main results are in general not sensitive to the bandwidth parameter.
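The role of the Kernel type and the bandwidth can be illustrated with a compact sketch of Kernel-weighted matching DiD. It is a simplified illustration under stated assumptions, not the estimation code behind Tables 13 to 16: only the Epanechnikov and Gaussian Kernels are implemented, the data are hypothetical pairs of (propensity score, change in outcome), and no covariates or standard errors are computed.

```python
import math


def epanechnikov(u):
    """Epanechnikov Kernel: positive only for |u| <= 1."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def gaussian(u):
    """Gaussian Kernel: positive for every u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_did(treated, controls, kernel=epanechnikov, bandwidth=0.06):
    """Average over treated units of (own change - weighted control change).

    treated, controls: lists of (propensity_score, outcome_change).
    """
    effects = []
    for p_i, dy_i in treated:
        w = [kernel((p_i - p_j) / bandwidth) for p_j, _ in controls]
        total = sum(w)
        if total == 0:
            continue  # no control within the bandwidth: unit is off support
        counterfactual = sum(
            w_j * dy_j for w_j, (_, dy_j) in zip(w, controls)
        ) / total
        effects.append(dy_i - counterfactual)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects)


# Hypothetical (propensity score, change in outcome in ppts) pairs.
treated = [(0.30, 8.0), (0.45, 10.0)]
controls = [(0.28, 2.0), (0.33, 3.0), (0.44, 4.0), (0.50, 5.0), (0.70, 9.0)]

results = {}
for name, kernel in [("epanechnikov", epanechnikov), ("gaussian", gaussian)]:
    for bw in (0.05, 0.06, 0.1):
        results[(name, bw)] = kernel_did(treated, controls, kernel, bw)
        print(f"{name:13s} bandwidth={bw:<5} estimate={results[(name, bw)]:.2f}")
```

In this toy example the estimates move only modestly across Kernels and bandwidths, which is the pattern a sensitivity analysis of this kind looks for; a wider bandwidth draws on more distant controls, trading some bias for lower variance.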
The estimation of propensity scores depends on a parametric specification (commonly logit or probit), which affects the quality of matching and, consequently, the results. As our benchmark uses a probit model, we test whether the results change when we use a logit model instead, and then re-run the PSM DiD models. The results of this exercise are reported in Table 17. We find that the estimates from the two estimation methods match for most outcomes. The previous robustness checks rule out an existing trend that could challenge the PSM DiD identifying assumptions. The robustness checks also provide evidence that our main estimation results are in general not sensitive to alternative types of the Kernel function, bandwidths of the Kernel function, and estimation methods of the propensity score.