National-level, ministry-led health information systems (HIS) are widely touted as a “foundation of public health,” [1] with available, reliable, timely, and valid data accepted as a prerequisite for decision-making and the provision of high-quality health services at all levels of the health care system. The published literature, however, is replete with studies detailing the low quality of routine HIS data in many low- and middle-income countries (LMICs) [2-6]. In addition, failed attempts to use HIS data to monitor or evaluate the effects of health interventions or to conduct operational research are common [7-10].

Groups working in multiple LMICs have recently shown that rapid and effective methods for improving HIS data exist and have been tested [11]. In KwaZulu-Natal, South Africa, a seven-month data quality intervention consisting of three-day trainings, monthly data meetings, and data quality audits (DQAs) at health facilities increased data completeness from 26% to 64% and data accuracy from a correlation of 0.54 to 0.92 [12]. Interventions as simple as implementing quarterly data review workshops and fostering the use of HIS data for decision-making have resulted in improved data quality and coverage in diverse LMIC settings [13,14].

While case studies of short-term data quality interventions have been described previously, no studies have quantitatively evaluated the relationship between health system factors and facility-level heterogeneity in intervention effects over longer time periods. The objective of the present study is to measure the impact of a data quality intervention over three years and to identify factors associated with changes in HIS data concordance over time in Mozambique. Identifying these factors could improve the development and targeting of future interventions to improve HIS data in LMICs.


Study setting and data quality intervention

Funded through the Doris Duke Charitable Foundation’s African Health Initiative, the Mozambique Population Health Implementation and Training Partnership (PHIT) is a comprehensive public health system intervention focused in Sofala Province [15]. One key element of this intervention is to improve routine HIS data through continual assessment of the availability, consistency, and accuracy of HIS data. Beginning in 2010, annual DQAs have been conducted from a sample of 26 health facilities from all districts in Sofala Province. The study setting and profile of the 26 health facilities have been previously described [16]. In terms of the intervention, health facilities are publicly ranked by summary data concordance measures, and facilities with poor data quality receive additional supportive supervision and data training. Additional intervention components include: (1) district-level meetings bringing together front-line health workers and district/provincial managers for data feedback, performance gap identification, solution planning, and action plan monitoring; (2) the development and use of simple data dashboards for easy visualization of secular trends in key health indicators; (3) the development of simple human resource allocation optimization models; and (4) equipment purchase and maintenance. A full description of intervention components and an introduction to the Mozambican HIS have been previously published [15,17].

Variable definitions and statistical analyses

Outcome of interest

Our outcome summarizes the availability and reliability (concordance) of routine HIS data relative to a gold standard data quality audit across four key indicators (outpatient consults, institutional births, first antenatal care visits [ANC1], and third dose of diphtheria, pertussis, and tetanus vaccination [DPT3]) and five levels of health system data aggregation (daily facility paper registers, monthly paper facility reports, monthly paper district reports, monthly electronic district reports, and monthly electronic provincial reports). Following the approach used in similar studies [12,18], data were deemed concordant if the routine HIS number differed from the gold standard DQA number by less than 10%. Each month’s value was compared for all five levels of data aggregation and across the four key indicators listed above, and the results were then averaged. That is, perfect facility concordance would be 16/16, representing four indicators multiplied by four comparisons across the five levels, all achieving <10% error. If data were unavailable, concordance was zero for that indicator/level combination. DQA data teams consisted of trained data collectors external to the Ministry health system, supervised by a data expert. Data were double-entered and managed in an Excel database. If there were discrepancies in abstracted DQA data, data collectors validated their measurements by re-counting registry entries with the help of the expert supervisor.
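The scoring rule above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the example counts, and the choice to treat the audited register count as the reference against which the four reported levels are compared are all assumptions made for illustration. The handling of a zero reference count is likewise an assumption, since the text does not address it.

```python
from typing import Dict, Optional, Sequence, Tuple

ERROR_MARGIN = 0.10  # the "<10% error" threshold stated in the text


def is_concordant(reference: Optional[float], reported: Optional[float]) -> bool:
    """Return True if the reported value is within 10% of the reference.

    Unavailable data (None) score zero, as described in the text.
    """
    if reference is None or reported is None:
        return False
    if reference == 0:
        # Assumption: a zero reference is matched only by a zero report.
        return reported == 0
    return abs(reported - reference) / reference < ERROR_MARGIN


def facility_concordance(
    data: Dict[str, Tuple[Optional[float], Sequence[Optional[float]]]]
) -> float:
    """Percent of indicator/level comparisons that are concordant (0-100)."""
    hits = 0
    total = 0
    for reference, reports in data.values():
        for reported in reports:
            total += 1
            hits += is_concordant(reference, reported)
    return 100.0 * hits / total


# Hypothetical one-month counts: (audited reference, four reported levels).
example = {
    "outpatient": (1000, [1000, 950, 1200, None]),  # 2 of 4 concordant
    "births": (50, [50, 50, 50, 50]),               # 4 of 4
    "anc1": (80, [80, 90, 80, 80]),                 # 3 of 4
    "dpt3": (60, [60, 61, 59, 60]),                 # 4 of 4
}

score = facility_concordance(example)  # 13 of 16 comparisons concordant
```

In this hypothetical month, 13 of the 16 comparisons fall within the margin, so the facility would score 81.25% on the 0-100% scale used in the analysis.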

Predictors of interest

Predictors were selected based on previous research regarding facility-level predictors of stock-outs of essential health products [16] and the realities of data availability. These included: type of health facility; health facility burden measured in number of outpatient consults or ANC1 visits; number of inpatient beds; number of technical staff (doctors, nurses, assistants); number of administrative staff; distance from central drug and equipment distribution center; rural/urban location; and number of health facility drug stock-outs where the drug was available at the district-level drug depository. The relationship between stock-outs and data quality was evaluated for 2011 and 2012 only due to limited stock-out data availability. Detailed methods regarding data collection for drug stock-outs and other key predictors have been previously published [16].

Analysis methods

Mixed-effects linear models were built in Stata 13 with 0-100% data concordance as our outcome of interest and α = 0.05 representing statistical significance using two-tailed tests. Our analysis plan included: (1) local regression across time and clinics to determine functional forms for variable parameterization; (2) crude analyses of data trends; (3) analyses of each explanatory variable and its effect on data quality after accounting for the confounding effect of time using linear splines with yearly knots and random intercepts and slopes for clinics; and (4) fully-adjusted analyses controlling for time with simultaneous adjustment for all predictors. For all models, significance of group variables (health facility type, number of drug stock-outs) was determined by a chunk test prior to interpreting within-group associations. Analyses of residual plots indicated no significant lack of model fit at any step.
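As an illustration of what "linear splines with yearly knots" means for the time variable, the sketch below builds one common parameterization in Python, in which each basis term counts the months elapsed within one yearly segment so that its regression coefficient can be read as the slope for that year. The function name, the use of months since baseline as the time scale, and the knot positions at 12 and 24 months are assumptions for illustration. The sketch covers only the construction of the time terms, not the mixed-effects model itself.

```python
from typing import List, Sequence


def linear_spline_basis(t: float, knots: Sequence[float]) -> List[float]:
    """Split time t into piecewise-linear segments at the given knots.

    Element k is the amount of time elapsed inside segment k, so a model
    coefficient on element k is the slope within that segment.
    """
    edges = [0.0, *knots, float("inf")]
    return [
        min(max(t - lo, 0.0), hi - lo)
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


# Hypothetical yearly knots, in months since the baseline measurement.
YEARLY_KNOTS = [12.0, 24.0]

mid_second_year = linear_spline_basis(18, YEARLY_KNOTS)  # [12.0, 6.0, 0.0]
mid_third_year = linear_spline_basis(30, YEARLY_KNOTS)   # [12.0, 12.0, 6.0]
```

With this coding, a month halfway through the second year contributes a full 12 months to the first segment, 6 months to the second, and nothing to the third, which is what allows the trend to change direction or steepness at each yearly knot.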

Ethics statement

This study was approved by the Mozambican National Institutional Review Board. The University of Washington deemed this study exempt as it focused on program evaluation purposes and was not considered human subjects research under United States federal regulations.


Descriptive statistics, the basic profile of health facilities surveyed, and information about the study setting have been previously published [14,15]. The intraclass correlation coefficient was 0.26 (confidence interval [CI]: 0.14, 0.37). Baseline median concordance in 2009 was 56.3% and concordance increased to 87.5% by 2012 (Table 1). There was no significant change in concordance prior to the data quality intervention (2009–2010). Concordance improved significantly by an average of 1.04% per month (95% CI: 0.60, 1.49) from 2010–2011 and 1.56% (CI: 0.89, 2.22) from 2011–2012 while the DQA intervention was implemented across health facilities. Intervention activities continued in 2012–2013, but no significant increase in data quality was observed.
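A rough arithmetic check, sketched in Python, shows how the monthly slopes relate to the overall change reported above. It assumes that each period spans 12 months, that the 1.56% figure is also a per-month slope, and that the non-significant 2009–2010 change can be treated as zero. None of these assumptions is stated explicitly in the text, and model-based mean slopes need not reproduce a change in medians, so this is a consistency check rather than a reanalysis.

```python
BASELINE_2009 = 56.3  # baseline median concordance (%), from the text

# Monthly slopes (% per month). The 2009-2010 value is an assumption:
# the text reports only that the change was not significant.
MONTHLY_SLOPES = {
    "2009-2010": 0.0,
    "2010-2011": 1.04,
    "2011-2012": 1.56,
}


def implied_concordance(baseline, monthly_slopes, months_per_period=12):
    """Baseline plus the gain accumulated over each period."""
    gain = sum(slope * months_per_period for slope in monthly_slopes.values())
    return baseline + gain


implied_2012 = implied_concordance(BASELINE_2009, MONTHLY_SLOPES)
```

Under these assumptions the two intervention years contribute about 12.5 and 18.7 percentage points respectively, which brings the implied 2012 value to roughly 87.5%, in line with the reported figure.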

Table 1 Crude time trends in data concordance across 26 public-sector health clinics undergoing data quality intervention, 2009–2012, Sofala Province, Mozambique

Each 100-unit increase in first antenatal visits was associated with 3.3% higher (CI: 0.43, 6.2) data concordance, while each additional inpatient bed was associated with 0.94% (CI: −1.7, −0.20) lower data concordance (Table 2). Further, each additional technical staff at the health facility was associated with 0.71% higher (CI: 0.14, 1.3) data concordance.

Table 2 Health facility factors associated with data concordance across 26 public-sector health clinics undergoing data quality intervention 2009–2012, Sofala Province, Mozambique

The factor most strongly associated with concordance was the number of essential drugs stocked out at health facilities while the drug was available at the district headquarters. Compared to those clinics with no drug stock-outs, those with five drugs stocked out had 51.7% (CI: −64.8, −38.6) lower data concordance.


Similar to previous studies in sub-Saharan Africa [11-14], the present study found that an intervention consisting of data audits, equipment/supply purchase and maintenance, supportive supervision to low-performing clinics and feedback from district/provincial levels, data trainings, and district performance enhancement meetings focused on improving data use for decision-making can result in rapid improvements in data concordance in public-sector health facilities. Novel findings from our study in Mozambique are that: (1) improvements in data quality are greatest during the first two years and may plateau at approximately 85-90% mean concordance; (2) improvements in data reliability can be sustained over multiple years given continued intervention activities; (3) higher numbers of human resources for health are associated with larger gains in data concordance; (4) facilities with more antenatal care visits and those with fewer inpatient beds also show greater increases in concordance; and (5) stock-outs of essential medicines for primary health care provision are strongly associated with poor HIS data quality.

Our findings that data improvements were not related to determinants such as facility location (rural/urban, distance from district headquarters) and facility type are promising given that these more “static” infrastructure-related factors are difficult to modify in the short term. Given this, rapid and equitable data improvements appear possible even at rural peripheral health facilities that traditionally have the fewest health resources. These results support past evidence suggesting that management issues centered around motivation and value placed on the quality of routine data collection [14,19], as well as health worker numeracy and training [20], may be significant determinants of poor HIS data quality in LMICs. Our study builds on these previous findings by showing that, controlling for health facility location and type, interventions to improve data quality may be less effective at facilities with few human resources for health or large amounts of high-burden inpatient services. Further research should clarify how facility burden characteristics (number of ANC1 visits, outpatient visits) are related to data improvements because of our counterintuitive findings of a positive relationship between ANC1 visits and data concordance, but no corresponding association with outpatient visits.

Given that HIS data quality gains can be sustained over multiple years (allowing reliable data-driven decision-making), and that relatively simple data improvement interventions have been tested and shown effective in multiple LMIC settings, donors and governments should consider investments in DQAs and other interventions to improve routine data systems. These investments are especially important given recent analyses indicating potentially increasing subnational disparities in health statistics in LMICs [21] and the difficulty that traditional survey designs (Demographic and Health Surveys/Multiple Indicator Cluster Surveys) have in providing health statistics below the provincial level [10,22]. Moreover, our findings further support the idea that quality HIS data are necessary for high-quality service provision, such as supply management of essential medicines and the forecasting of future supply needs to guard against stock-outs.

Our study has a number of limitations. First, without an adequate control group we cannot eliminate the possibility that all clinics in Mozambique are experiencing similar data improvements. Second, significant increases in data concordance do not necessarily mean that data validity has improved – a more difficult metric to evaluate. Third, the key indicators evaluated may not be representative of all HIS indicators essential for program planning and service provision. Last, the present study was conducted in one province of Mozambique and in a subset of clinics and therefore may not be representative of all public health clinics nationally.


We found that an intervention consisting of facility-based data audits, targeted training and supervision, equipment purchase/maintenance, and data audit and feedback meetings was associated with significant increases in public-sector HIS data concordance. Improvements were greater at health facilities with more human resources for health, more antenatal care visits, and fewer inpatient beds. Given the importance of available, reliable, timely, and valid data for decision-making and health care provision – such as effective management of essential medicines – donors and Ministries of Health should consider increased investments in improving HIS data quality. Future studies should aim to identify which data quality intervention components are most effective and to determine the sustainability of data quality interventions over the longer term.