Introduction

Skilled attendance at birth is recommended to reduce the more than 270,000 maternal and three million neonatal deaths that occur annually, most of them in low-income countries [1]. Many factors influence the use of skilled delivery care, including characteristics of the mother and of the service environment [2]. In many rural high-mortality settings, geographic access to skilled delivery care is poor. Although often neglected, the impact of geographic access to skilled care on the choice of delivery place is of a magnitude similar to that of education or wealth [3].

Defining access

Access to health care has been conceptualized and operationalized in many ways. Potential and realized access distinguish between stages in the use of care [4]: potential access refers to the availability of services in a geographic area, and realized access to the actual use of services after barriers have been overcome. The term “spatial access” encompasses two of the five formative dimensions of access described by Penchansky and Thomas: accessibility, the spatial relationship between the location of supply and demand; and availability, the adequacy of provider supply in relation to demand [5, 6]. Accessibility and availability are often combined into a single index of geographic coverage of care, ranging in complexity from a simple ratio of provider supply to patient demand to more complex indices such as the two-step floating catchment area method [7, 8]. This paper addresses the first dimension, accessibility, which we calculate as travel impedance: a measure of the “friction of distance” or the “cost of travel” between locations, expressed in distance or time [9].

Measures of accessibility in public health research

Distance or travel time between place of residence and a health service location are common measures of travel impedance used in public health research. The simplest approximation is straight-line or Euclidean distance between two points. Geographic information systems (GIS) can be used to model more realistic estimates of travel impedance, such as road network distance or travel time.
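As a minimal illustration of the simplest measure, the sketch below computes straight-line distance from one origin to every facility and keeps the closest, similar in spirit to a GIS "near" analysis. It assumes coordinates are already in a projected system measured in metres (e.g. a UTM zone); the facility names and coordinates are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two projected (x, y) points, in metres.
    Only meaningful for projected coordinates, not raw latitude/longitude."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_facility(origin, facilities):
    """Return (facility id, distance) of the facility closest to origin.
    facilities: dict mapping a (hypothetical) facility id to its (x, y)."""
    best_id = min(facilities, key=lambda fid: euclidean(origin, facilities[fid]))
    return best_id, euclidean(origin, facilities[best_id])

# Hypothetical example: a compound at the origin and two facilities.
facilities = {"clinic_A": (1000.0, 0.0), "hospital_B": (0.0, 5000.0)}
print(nearest_facility((0.0, 0.0), facilities))  # ('clinic_A', 1000.0)
```

No road, land-cover or travel-mode data are needed, which is what makes this measure cheap to compute in data-scarce settings.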

In principle, two types of data model can be used, separately or in combination, to model the cost of travel in GIS: vector data models and raster data models [9]. Vector data models represent traversable paths between points as lines (termed edges) in a network, with anything not on the network treated as non-traversable ‘empty space’, whereas raster data models represent travel through a grid of pixels (cells) that covers all space in a defined area. In both models, the cost of travel is determined by an impedance value assigned to each edge (vector models) or to each cell (raster models). Both models can incorporate travel along roads, and raster models usually also include topographic features such as land-cover and void areas that cannot be traversed (e.g. lakes, legally restricted areas). Time or season may modify these impedance values, and space-time models can incorporate changes in topography, land-cover and population over time.
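The vector idea can be sketched as a shortest-path search over a graph whose edges carry an impedance (length or travel time). The sketch below is a minimal illustration, not the implementation of any particular GIS package: it runs Dijkstra's algorithm from one origin and stops at the first facility it settles, which is the closest one by network impedance. Node names and edge values are hypothetical.

```python
import heapq
from typing import Optional

def build_graph(edges):
    """Undirected adjacency list from (node_a, node_b, impedance) tuples."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def closest_facility(graph, origin, facilities) -> tuple[Optional[str], float]:
    """Dijkstra from origin; the first facility settled is the closest."""
    best = {origin: 0.0}
    heap = [(0.0, origin)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale queue entry
        settled.add(node)
        if node in facilities:
            return node, d
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    # Nothing reachable: only off-network 'empty space' separates them.
    return None, float("inf")

# Hypothetical network with impedance in minutes.
roads = [("village", "junction", 10.0), ("junction", "clinic", 5.0),
         ("village", "hospital", 12.0)]
graph = build_graph(roads)
print(closest_facility(graph, "village", {"clinic", "hospital"}))  # ('hospital', 12.0)
```

A raster model replaces the hand-built edges with moves between neighbouring cells, each weighted by the cells' impedance; the search itself is the same.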

In practice, the appropriate impedance measure is largely determined by data availability, geographic context (topography, e.g. water bodies and mountains) and cultural context (e.g. common modes of travel). Information on road networks and land-cover is publicly and commercially available in many high-income countries, making more sophisticated estimates of distance and travel time possible (e.g. United States Census Bureau TIGER/Line, ESRI). Freely available geographic information mapped by crowdsourcing, such as OpenStreetMap, can also be used for routing in areas with high-quality data [10]. Network models are appropriate in high-income settings such as the United States because most travel occurs on roads, whereas in countries with limited infrastructure, travel does not always occur in vehicles or on roads. Additionally, road and land-cover data are not systematically available at a global level and are particularly scarce in low-income countries [11]. Sourcing input data is an important and sometimes difficult task, because the quality of the estimates depends on the resolution, accuracy, currency and completeness of the data [12]. Data often come from multiple sources, and it is important to ensure that the different layers of information are temporally coherent, i.e. refer to the same time period. Researchers working in low-income countries often have to spend considerable time and effort to locate data, usually from multiple sources, or to digitize road network maps themselves [13].

Previous studies

Previous comparisons of different measures of geographic accessibility have suggested that Euclidean distance is a poor proxy for access; however, most comparisons have been in high-income countries, and many did not assess the methods against an outcome (Table 1). Two studies conducted in predominantly rural areas in low- and middle-income countries (LMICs) concluded against Euclidean distance, one in favor of actual travelled distance measured with trackers [14] and one in favor of raster travel time [15]. However, both studies were conducted in more mountainous regions of East Africa, and their findings might not hold in flatter settings.

Table 1 Studies comparing different methods of calculating travel impedance to health services

Study aim and objectives

By comparing different measures of travel impedance, we aim to determine whether Euclidean distance can be used as a reasonable proxy for potential spatial access in LMIC settings, using the Brong Ahafo region of Ghana as a case study.

This study has three objectives:

  1) to investigate the effect of using different geospatial algorithms and data models (vector, raster) on measures of travel impedance (Euclidean distance, network distance, network travel time, raster travel time; Table 2) between population and delivery care;

  2) to assess the potential spatial aggregation error associated with using average population location (village centroid) rather than individually geocoded location (compound of residence) on measures of travel impedance;

  3) to compare how strongly different proxies for spatial access to care are associated with facility delivery (i.e. whether or not a woman delivers in a facility, as a binary outcome), using surveillance data from the study area over a one-year period.

Table 2 Definitions of different impedance measures

Methods

Overview & data sources

Ghana is a West African country with a high maternal mortality ratio, estimated at 328 per 100,000 live births in 2011 [26]. The study area consists of 7 contiguous districts with a population of more than 100,000 women of reproductive age (14-45 yrs), where demographic surveillance was established for several field trials [27–29]. Travel occurs on roads, mainly on foot: in a 2003 national survey, approximately 58% of households reported walking to the closest health facility [30].

A geospatial database of the study area was created, mapping population location in compounds and village centroids, service locations for all health facilities offering delivery care (including higher level facilities with capacity for surgery), and a detailed road network (Figure 1). We included administrative boundaries and topography (land-cover, including water bodies) [24, 25]. We combined data sources in a workflow (Figure 2), and describe the fieldwork in more detail here:

Figure 1

Study area showing topographic cover in Brong Ahafo region, Ghana. First inset shows study area in Ghana with administrative divisions. Second inset shows detail of example village with centroid, compounds, road network and a delivery facility.

Figure 2

Workflow for geospatial analysis.

  1. Road network

    A detailed road network of all roads in the study area was created using GPS trackers. The study area covers approximately 15,302 km² and our road network includes over 1,900 km of roads. Extensive desk work was needed to transform these road tracks into a network dataset suitable for analysis in ArcGIS, with functional connectivity between roads. A tool was developed in a PostGIS geodatabase to validate the connectivity of the network, and the road network was subsequently cleaned in GRASS GIS [31]. The road network was then integrated into the land-cover raster layer at 200 m resolution for raster analysis. Additional information on road condition, surface type, and usability in the rainy season was collected for all roads.

    Travel times by vehicle between village centroids were collected for one study district. A total of 88 journey segments were used to calibrate road speeds, which were assigned with reference to speeds used in the literature [32, 33]. Road speeds ranged from 30 km/h on dirt roads to 90 km/h on good tarmac roads. Very few roads (four in the study area) were reported as impassable during the rainy season, so we modeled only the dry-season scenario.

  2. Health facility census

    We conducted a health facility assessment of all 86 geocoded health facilities in the study area to categorize them according to the availability and quality of maternal and newborn services: 64 facilities offered delivery services and 8 offered comprehensive emergency obstetric care (CEmOC), i.e. they were higher-level facilities capable of cesarean section and blood transfusion [34–36]. Most of the hospitals, health centers, and clinics offering delivery care are publicly owned, and all maternity homes are operated privately by the Ghana Registered Midwives Association.

  3. Surveillance

    Surveillance of all women of reproductive age in the study area through monthly visits was undertaken as part of health and demographic surveillance for several field studies [27, 28]. The surveillance included taking GPS coordinates of 433 village centroids and, in 173 larger villages, coordinates of 47,537 individual compounds (median 450 compounds per village; IQR 258–844; maximum 3,204).

    For the analysis of facility use (objective 3), we included villages and compounds where deliveries occurred in 2009 with known birthplace and compound coordinates, resulting in 169 villages, 8,120 compounds and 9,306 births. There was a median of 96 births per village (range 1–634), and a median of 1 birth per compound (range 1–8).
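Returning to the road network preparation in item 1, two of its steps can be sketched in a few lines: checking that the digitized segments form one connected network, and turning segment lengths into driving times. This is a minimal illustration under stated assumptions, not the PostGIS tool or GRASS cleaning used in the study. Endpoints are snapped to a tolerance grid and a union-find count of connected components flags disconnected pieces (a fully connected network returns 1). Only the two road-class speeds quoted in the text are filled in, and `calibration_ratio` is a hypothetical summary of how modelled times compare with observed journey times.

```python
# Speeds (km/h) for the two road classes quoted in the text; other classes
# would need their own calibrated values, which are not given here.
ROAD_SPEED_KMH = {"dirt": 30.0, "good_tarmac": 90.0}

def edge_minutes(length_km, road_class):
    """Driving time along one road segment at its class speed."""
    return length_km / ROAD_SPEED_KMH[road_class] * 60.0

def snap(point, tol_m):
    """Round a projected (x, y) in metres onto a tol_m grid so nearly coincident
    endpoints share a node key. Points straddling a grid boundary can still be
    missed; a production tool would use a true distance tolerance."""
    return (round(point[0] / tol_m), round(point[1] / tol_m))

def count_components(segments, tol_m=1.0):
    """Union-find over snapped segment endpoints; 1 means fully connected."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in segments:
        ra, rb = find(snap(a, tol_m)), find(snap(b, tol_m))
        if ra != rb:
            parent[ra] = rb
    return len({find(node) for node in parent})

def calibration_ratio(observed_min, modelled_min):
    """Hypothetical check over journey segments: total observed time divided by
    total modelled time; a value above 1 means assigned speeds are too fast."""
    return sum(observed_min) / sum(modelled_min)

# Hypothetical segments: the first two almost touch, the third is isolated.
segments = [((0.0, 0.0), (100.0, 0.0)), ((100.4, 0.0), (200.0, 0.0)),
            ((500.0, 500.0), (600.0, 500.0))]
print(count_components(segments))  # 2
```

A component count above 1 points to gaps that need manual repair before a closest-facility analysis can treat the network as connected.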

Distance measures

All six impedance measures were calculated for two levels of care: to the closest facility offering delivery care and to the closest facility offering CEmOC (Table 2).

In ArcMap version 10.0, we used the Spatial Analyst tool “Near” to calculate Euclidean distances and the Network Analyst tool “Closest Facility Analyst” to calculate network distance and time (ESRI software, California). For the raster-based analyses, we used the cost surface algorithm in GRASS GIS to determine the fastest route (least-cost path) from starting points to given destinations [31, 37].
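The raster approach can be illustrated with a small accumulated-cost computation. GRASS GIS provides modules such as `r.cost` for this; the sketch below is a simplified stand-in, not GRASS's code. It runs Dijkstra's algorithm on an 8-connected grid, where each cell holds the minutes needed to cross it (`None` marks impassable cells such as lakes). The cost of a move is assumed to be the mean of the two cells' values multiplied by the step length in cell units (1, or √2 diagonally), which is one common convention. The grid values are hypothetical.

```python
import heapq
import math

# 8-connected moves (row offset, column offset).
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def accumulated_cost(grid, start):
    """Least accumulated travel time from start to every cell.
    grid[r][c] = minutes to cross the cell, or None if impassable.
    The start cell must be passable."""
    rows, cols = len(grid), len(grid[0])
    cost = [[math.inf] * cols for _ in range(rows)]
    r0, c0 = start
    cost[r0][c0] = 0.0
    heap = [(0.0, r0, c0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r][c]:
            continue  # stale queue entry
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] is None:
                continue
            nd = d + math.hypot(dr, dc) * (grid[r][c] + grid[nr][nc]) / 2.0
            if nd < cost[nr][nc]:
                cost[nr][nc] = nd
                heapq.heappush(heap, (nd, nr, nc))
    return cost

def time_to_closest(grid, origin, facility_cells):
    """Travel time from origin to the nearest of the given facility cells."""
    cost = accumulated_cost(grid, origin)
    return min(cost[r][c] for r, c in facility_cells)

# Hypothetical 3x3 raster: 1 minute per cell, with a two-cell lake in the middle row.
grid = [[1.0, 1.0, 1.0],
        [None, None, 1.0],
        [1.0, 1.0, 1.0]]
cost = accumulated_cost(grid, (0, 0))
print(round(cost[2][0], 3))  # 4.828, the route around the lake
```

Roads can be "burned into" such a grid by giving road cells lower crossing times, which is how a single raster can represent both on-road and off-road travel.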

Analysis

To address objective 1, we used Spearman rank correlation coefficients to compare the six impedance measures within each origin-destination pair (i.e. village centroid to closest delivery facility and village centroid to closest CEmOC facility in a dataset of all villages; compound to closest delivery facility and compound to closest CEmOC facility in a dataset of all compounds; Table 2).
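Spearman's coefficient is the Pearson correlation of the ranks of the two measures, with tied values sharing their average rank. The study used Stata; the sketch below only illustrates that standard definition in plain Python, with hypothetical example values.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical: distance (km) vs. walking time (min); monotonic, so r is 1.
print(round(spearman([0.5, 1.0, 2.0, 8.0], [15, 30, 65, 250]), 6))  # 1.0
```

Because only ranks matter, two measures in different units (kilometres and minutes) can correlate perfectly as long as they order origins the same way, which is why the coefficient suits comparisons across distance and time measures.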

We assessed potential spatial aggregation error (objective 2) in three ways using the surveillance dataset. First, we compared the correlation of the measures calculated from the two origins; second, we checked whether the different measures identified the same facility as closest from both origins for each birth. Finally, we calculated distance deviance, the absolute difference in distance or time between measures starting from the compound and measures starting from the village centroid for each birth. These absolute differences represent the potential error in access estimates that results from using village centroids instead of individual compound coordinates, and they depend on how dispersed compounds are within villages.
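The second and third checks can be summarized per birth as sketched below. The record layout (keys `fac_compound`, `fac_village`, `d_compound`, `d_village`) and the threshold argument are hypothetical choices for illustration. The function returns the share of births for which both origins give the same closest facility, the mean and median absolute deviance, and the share of births whose deviance falls below a chosen threshold.

```python
from statistics import mean, median

def aggregation_error(records, threshold):
    """Summarize compound-vs-village differences, one record per birth.
    Hypothetical keys: 'fac_compound'/'fac_village' (closest facility id from
    each origin) and 'd_compound'/'d_village' (impedance, same units)."""
    n = len(records)
    same = sum(r["fac_compound"] == r["fac_village"] for r in records) / n
    deviance = [abs(r["d_compound"] - r["d_village"]) for r in records]
    return {
        "prop_same_facility": same,
        "mean_abs_deviance": mean(deviance),
        "median_abs_deviance": median(deviance),
        "prop_below_threshold": sum(d < threshold for d in deviance) / n,
    }

# Hypothetical births with Euclidean distances in metres.
births = [
    {"fac_compound": "A", "fac_village": "A", "d_compound": 900, "d_village": 1150},
    {"fac_compound": "A", "fac_village": "A", "d_compound": 400, "d_village": 400},
    {"fac_compound": "B", "fac_village": "B", "d_compound": 2000, "d_village": 1700},
    {"fac_compound": "A", "fac_village": "C", "d_compound": 600, "d_village": 1000},
]
print(aggregation_error(births, threshold=350))
```

Reporting both the share of matching facilities and the size of the deviance matters: a small average deviance can still hide births for which the two origins point to different facilities.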

Spatial access to health care is known to be a facilitator of delivery in a health facility [3]. The impedance measure that is the best proxy of spatial access to delivery care, i.e. that has the least measurement error, should therefore show the strongest association with facility delivery in a regression model (objective 3). For each impedance measure, we modeled its association with whether or not a woman delivered in a facility (a binary outcome) in a logistic regression model including all births in the study area. For ease of comparison between measures in units of distance and time, we standardized the impedance measures to a mean of approximately zero and a standard deviation (SD) of one. To account for clustering of women by village, we used logistic regression models with cluster-robust standard errors. All analyses were done in Stata version 12.0.
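The standardization and village-clustered logistic model can be sketched in plain Python. This is a hypothetical stand-in for the Stata analysis, not its code: the function names are ours, the model has a single predictor, coefficients are found by Newton-Raphson, and the sandwich variance (per-village score sums between two copies of the inverse information matrix) uses a G/(G-1) small-sample factor, where G is the number of clusters. That factor is one common convention. In Stata, the analogous request is the `vce(cluster clustvar)` option of `logit`.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def standardize(values):
    """Rescale to mean 0 and (sample) SD 1, so ORs are 'per SD increase'."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def _prob(b0, b1, zi):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))

def _information(b0, b1, z):
    """Entries of the 2x2 information matrix X'WX for design columns [1, z]."""
    h00 = h01 = h11 = 0.0
    for zi in z:
        p = _prob(b0, b1, zi)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * zi
        h11 += w * zi * zi
    return h00, h01, h11

def logit_cluster(z, y, clusters, iters=25):
    """Fit logit P(y=1) = b0 + b1*z; return (b0, b1, cluster-robust SE of b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):  # Newton-Raphson: b += H^-1 g
        g0 = sum(yi - _prob(b0, b1, zi) for zi, yi in zip(z, y))
        g1 = sum((yi - _prob(b0, b1, zi)) * zi for zi, yi in zip(z, y))
        h00, h01, h11 = _information(b0, b1, z)
        det = h00 * h11 - h01 * h01
        b0, b1 = b0 + (h11 * g0 - h01 * g1) / det, b1 + (h00 * g1 - h01 * g0) / det
    # "Bread": inverse information at the final estimates.
    h00, h01, h11 = _information(b0, b1, z)
    det = h00 * h11 - h01 * h01
    i01, i11 = -h01 / det, h00 / det
    # "Meat": outer products of score sums within each cluster (village).
    scores = defaultdict(lambda: [0.0, 0.0])
    for zi, yi, c in zip(z, y, clusters):
        r = yi - _prob(b0, b1, zi)
        scores[c][0] += r
        scores[c][1] += r * zi
    m00 = sum(s0 * s0 for s0, _ in scores.values())
    m01 = sum(s0 * s1 for s0, s1 in scores.values())
    m11 = sum(s1 * s1 for _, s1 in scores.values())
    g = len(scores)
    v11 = g / (g - 1) * (i01 * i01 * m00 + 2 * i01 * i11 * m01 + i11 * i11 * m11)
    return b0, b1, math.sqrt(v11)

# Hypothetical toy data: binary exposure, facility delivery, four villages.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 1, 0, 1, 0, 0, 0]
villages = ["a", "a", "b", "b", "c", "c", "d", "d"]
b0, b1, se = logit_cluster(x, y, villages)
print(round(math.exp(b1), 4))  # 0.1111, the 2x2-table odds ratio (1/3) / 3
```

With a standardized impedance measure as `z`, `math.exp(b1)` is the odds ratio per SD increase and `math.exp(b1 - 1.96 * se)` to `math.exp(b1 + 1.96 * se)` gives an approximate 95% confidence interval. Clustering leaves the point estimate unchanged and only widens or narrows the interval.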

Ethical considerations

This study uses data collected for the Newhints trial, which was approved by the ethics committees of the Ghana Health Service, Kintampo Health Research Center and the London School of Hygiene and Tropical Medicine (LSHTM) [28]. The additional analyses were approved in an amendment by the LSHTM ethics committee.

Results

Different measures of travel impedance

Median Euclidean distance from compounds to the closest delivery facility was less than 1 km, and less than 10 km to the closest CEmOC facility (Table 3). Network distances were longer than Euclidean distances, partly because this measure includes distances to and from the road network (Table 3). Median mechanized network time (i.e. drive time) from compounds was 13 minutes (0.22 hours) to the closest delivery facility and 23 minutes to the closest CEmOC facility. Raster methods produced longer travel times than the network method, likely because the GlobCover land-cover map assigns higher impedance values (slower speeds) to off-road travel than the fixed 2 km/h off-road speed used in the network model.

Table 3 Impedance measures from compound and village to closest facility using six methods

Median Euclidean distance from village centroids to the closest delivery facility was 5.7 km, whereas from compounds it was less than 1 km (Table 3). However, the median of all villages should not be compared with the median of all compounds: there are many more compounds than villages, and larger villages with many compounds are more likely to also have a health facility, which shortens the median distance and travel time for measures from compounds. A fair comparison of measures from compounds and from village centroids is made in the surveillance dataset of births from the 169 villages with both sets of coordinates (see Spatial aggregation error below).

Distances to the closest CEmOC facility were over 10 km longer and mechanized travel times 2–2.5 times longer than to the closest delivery facility. For instance, median network distance from village to closest CEmOC was 19.4 km compared to 7.3 km to closest delivery facility. Non-mechanized travel times (i.e. walking) were much longer to CEmOC facilities than to delivery facilities in general: walking from one’s compound along the road network to the closest delivery facility would take a median of 30 minutes, whereas walking to the closest CEmOC facility would take nearly 3 hours (Table 3).

With the exception of the mechanized measures, Spearman coefficients showed that distance and travel time measures were highly correlated with each other (r > 0.89; Tables 4 and 5). Correlations between travel times incorporating mechanized travel (i.e. driving) and the other measures were low; Euclidean distance and mechanized raster time from compound to closest delivery facility were the least correlated (r = 0.39). The highest correlation was between network distance and network walking time from village centroid to the closest CEmOC facility (r = 0.99). Correlations between measures showed a similar pattern from both origins (compounds and villages), with slightly higher correlations for the longer distances to CEmOC facilities.

Table 4 Spearman rank correlation coefficients (r) between different impedance measures and same health facility identified as closest using different impedance measures (%) for impedance measures calculated to closest delivery facility
Table 5 Spearman rank correlation coefficients (r) between different impedance measures and same health facility identified as closest using different impedance measures (%) for impedance measures calculated to closest CEmOC facility

Euclidean distance identified the same closest delivery facility as the other measures (except mechanized raster time) for about 80% of village centroids and about 90% of compounds (Table 4). The three network-based measures identified the same closest delivery facility for over 97% of villages and compounds. Mechanized raster time differed most: it agreed with the other methods on the closest facility for the fewest villages and compounds.

Spatial aggregation error

We assessed the influence of spatial aggregation, i.e. how using village centroids (average compound location) differs from using individual population location (compound coordinates) when calculating distance and travel time to health facilities, using the surveillance dataset (9,306 births). Correlation coefficients between the two options were high for most measures (r > 0.82, Table 6). The same delivery facility was identified as closest in over 85% of births and the same CEmOC facility in over 97% of births (Table 6).

Table 6 Spearman rank correlation coefficients (r) and proportion of facilities identified as closest (%) between measures calculated from compound compared to measures calculated from village centroid, n = 9,306 births

The mean absolute distance deviance (the difference between distances from compounds and from village centroids) for the Euclidean measure was 250 m to the closest delivery facility and 300 m to the closest CEmOC facility (Table 7). Network distance estimates differed more: 380 m to the closest delivery facility and 460 m to the closest CEmOC facility. Among the time measures, non-mechanized raster time differed most, by 12 minutes to the closest delivery facility and 14 minutes to the closest CEmOC facility. For 90% of the births in the surveillance dataset, the deviance between measures calculated from village and compound was less than 30 minutes for any of the time measures to the closest delivery facility, and less than 35 minutes to the closest CEmOC facility. The means and standard deviations of the impedance values calculated from village centroid and from compound in the surveillance dataset were almost exactly the same (Table 8).

Table 7 Absolute difference in measures to closest delivery facility calculated from compound compared to measures calculated from village centroid, n = 9,306 births
Table 8 Mean, standard deviation and effect of measures to closest facility on use of facility for delivery, n = 9,306 births

Association with facility use

We modeled the association between each impedance measure and facility delivery as a binary outcome, with the assumption that the best proxy for access to care would show the strongest association with use of a facility for delivery. Effect estimates (odds ratios) for facility use were the same for all non-mechanized impedance measures from compound to closest delivery facility (Table 8): the odds of women delivering in a health facility decreased by 67% per standard deviation (SD) increase in each measure to the closest delivery facility (OR = 0.33). When calculated from village centroid, the effect of non-mechanized network and raster time was slightly smaller than when calculated from compound as origin. There was less evidence of an association and a smaller effect with mechanized measures from both origins (e.g. OR 0.91; 95% CI 0.65–1.27; p = 0.569 for mechanized raster time from village centroid; Table 8).
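The percentage quoted here follows directly from the odds ratio per SD. With $\hat\beta_1$ the logistic regression coefficient of the standardized impedance measure, the percentage change in the odds of facility delivery per SD increase is:

```latex
\mathrm{OR} = e^{\hat\beta_1}, \qquad
\%\,\Delta\,\text{odds per SD} = (\mathrm{OR} - 1) \times 100\%
  = (0.33 - 1) \times 100\% = -67\%
```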

We modeled access to any delivery facility and access to CEmOC separately because of the longer distances to CEmOC facilities, and found that effect estimates followed a similar pattern for both facility types. All non-mechanized measures suggested an approximately 55% decrease in the odds of delivering in a health facility per SD increase in the measure to the closest CEmOC facility (from both origins), with Euclidean distance showing the largest effect (OR = 0.41; 95% CI 0.33–0.50; Table 8). Again, time measures incorporating driving showed a smaller effect.

As a sensitivity analysis, we also modeled the association of these impedance measures with facility delivery using quintiles, log-transformed and binarized measures, as well as adjusting for several potential confounders (age, parity, wealth quintile) with similar results (data not shown).

Discussion

We comprehensively compared six commonly used measures of travel impedance in a predominantly rural area of Ghana’s Brong Ahafo region, as a case study for access measures in certain LMIC settings. We estimated the potential spatial aggregation error of using average population location (village centroid) rather than individually geocoded location (compound). We assessed each measure as a proxy for potential spatial access by modeling the association between facility use and travel impedance to all delivery facilities and to CEmOC facilities, separately. Measures calculated with different methods were highly correlated with each other and, except for mechanized travel times, identified the same facilities as closest for over 80% of villages. Measures calculated from individually geocoded locations (compounds) were highly correlated with measures calculated from village centroids, and identified the same delivery facility as closest for over 85% of births and the same CEmOC facility for approximately 98% of births in the surveillance data. Higher travel impedance was associated with lower facility use for delivery, and this association was of similar magnitude for all travel impedance measures except those incorporating mechanized travel.

In contrast to our results, two previous comparisons of distance measures in predominantly rural LMIC settings favored sophisticated measures over Euclidean distance. A study in Kenya compared models predicting use of specific facilities by febrile children, and found that the Euclidean distance model was less accurate than models using raster measures (called “transport network models” because they incorporated the transport network) [15]. However, the predictive accuracy of the Euclidean and unadjusted raster models was similar (kappa 0.71 for Euclidean vs. 0.73 for unadjusted raster); they differed more substantially when the raster model was adjusted for competition between facilities (kappa 0.83). This suggests that the adjustment for competition between facilities, rather than the access measure itself, may have increased the accuracy of the raster model. In a study region in Ethiopia described as mountainous, with a poor road network and difficult terrain, no association was found between under-five mortality and straight-line distance (p = 0.398), whereas there was a strong association with actual distance travelled (p = 0.016) [14]. As we found that Euclidean distance performed as well as the other measures in predicting the odds of facility use, we cannot conclude that sophisticated measures are always better; rather, the choice depends on context.

Travel time is determined by many factors in addition to distance, including mode of travel. Incorporating mechanized modes of transport into travel time measures makes assumptions about access to and use of motorized vehicles. For our travel time estimates, road speeds were determined empirically, and off-road speeds were taken from a globally available land-cover map. We assumed the same travel speeds for the entire population, so our estimates should be interpreted as averages for the population as a whole. Assuming that access is a good predictor of facility use, the weaker associations between facility use and driving times suggest either that our road speeds were inaccurate or that women in our study area do not travel by vehicle to delivery facilities. An individual woman’s actual journey to a health facility may involve multiple transport modes, such as a combination of walking, public transportation and hired taxis, which we could not fully take into account because we lacked information on individuals’ travel modes.

Obtaining individually geocoded population locations is difficult because of privacy and anonymity concerns, and in high-income countries accessibility is usually measured from aggregate locations, such as census tracts or zip codes. Aggregation error arises from the distribution of individuals within a spatial unit and could affect associations with health outcomes measured at the individual level [20, 38]. Results and interpretations vary widely among studies evaluating aggregation error, and there does not seem to be a generally agreed cutoff for an acceptable error [38]. The authors of a US study comparing individual addresses with zip-code centroids concluded that a deviance below 12 miles (19 km) for 95% of the population is not appreciably large [19], while a study in Canada found a deviance below 1.5 km for 95% of the population and considered the deviance above 1.5 km for the remaining 5% a significant error [39]. Like other studies in Africa [13], we were able to collect individually geocoded locations and compared these with an aggregate location, the village centroid. While there were some extreme deviances (up to 3.2 km and 2.4 hours), the median deviance (e.g. 180 m for Euclidean distance) was much smaller than the median value of the measures themselves (e.g. 910 m for Euclidean distance). However, a different closest facility was identified from the two origins for approximately 13% of the births in our surveillance dataset, which may matter for access if these facilities offer different levels or types of care. The magnitude of this error depends on the size of villages and the dispersion of compounds within them, which vary with context. Whether village centroids are a sufficient proxy for population location should therefore be decided based on the study area context as well as the research question.

As we do not have a gold standard with which to compare the various travel impedance measures to identify the “best” impedance measure, deciding which to use in an analysis is a matter of appropriateness and accuracy which can differ depending on the purpose of the analysis [39]. As all of the non-mechanized measures we included showed almost exactly the same association with facility use from both origins, we conclude that in similarly flat LMIC contexts, simple methods to calculate travel impedance can be reasonably used as proxies for potential spatial access. Euclidean distance is an accurate measure of the distance between two points and does not necessitate making any assumptions regarding mode of travel. For the purpose of comparing relative access to care as opposed to describing detailed journey paths, obtaining accurate estimates may be more important than aspiring to realism.

Conclusion

Apprehension about using Euclidean distance as a proxy for access, and the high cost (in time and money) of sophisticated GIS techniques, should not deter research on access to care in rural low-income settings. Where poor access to care is a major barrier to health, the resulting lack of evidence may have real consequences for policy and populations. Using a case study in rural Ghana, we have shown that Euclidean distances from village centroids can be used as a reasonable proxy for individual potential spatial access, a finding that can be generalized to other settings with similar topographic and cultural contexts (e.g. non-mountainous regions without major water bodies). The accuracy of sophisticated impedance measures depends on the quality of the input data and the validity of assumptions regarding travel mode, and errors in these assumptions may reduce their usefulness. We suggest that for researchers faced with scarce and disparate data sources in relatively flat low-income countries, Euclidean distance from an aggregate measure of population location is an acceptable proxy for access. Furthermore, we suggest that the extra time and effort required for sophisticated and individually geo-referenced methods are justifiable only when a high level of accuracy and completeness of the input data can be assured.