Introduction

Chronic back pain (CBP) prevalence is estimated at approximately 14% in England [1]. Lower back pain (LBP) is the biggest contributor to disability measured in years lived with disability worldwide and disability-adjusted life years in western Europe [2]. In the UK, the direct cost of total healthcare treatment for patients with CBP is almost double that of matched controls, with an estimated annual marginal cost of £1.5 billion [3] and a total cost to the economy of over £10 billion annually [4].

National estimates are prone to misrepresenting what is happening at a small area level by averaging out areas of high and low prevalence. As a result, areas of unacceptably high prevalence of a disease may go unnoticed, hindering equitable health resourcing and the implementation of targeted interventions, which are likely to be more effective.

Whilst it is possible to estimate the burden of a disease at small area level with local surveys [5,6,7], these studies cover relatively few small areas and are resource-intensive. A solution is computational modelling, with an example being regression modelling [8, 9]. A regression approach to producing small area estimates is useful for local planning and comparing CBP prevalence by area, but does not provide data on other variables for each small area, such as physical activity (PA) or body mass index (BMI). Therefore, it is not possible to analyse such associations at a small area level or to estimate the effect of modification of such variables. Spatial microsimulation (SMS) solves this, using individual units, such as people, with the attributes they possess, to build large-scale synthetic spatially disaggregated microdata sets [10], allowing the analysis of other variables of interest as well as the outcome variable. The impact of a change, such as policy implementation, can also then be simulated and analysed without the limitations of using aggregate data.

This study aimed to:

  • Map CBP prevalence at small area level in England using a simulated CBP dataset.

  • Identify associations that may explain spatial variation and help to identify specific populations where interventions are likely to be most effective.

  • Explore counterfactual simulations (‘what-if’ analyses) for increases in PA levels.

Method

Spatial microsimulation

This study used ‘SimObesity’ [11], a previously developed and validated SMS program that has been applied to various health conditions, including obesity, osteoarthritis and cancer [10,11,12]. The algorithms used in SimObesity have been described in detail by Edwards et al. [10]. This study used the two-stage simulation process described in detail elsewhere [12], chosen because it allows for later ‘what-if’ analyses to assess the impact of targeted policy changes. In stage 1, a simulation was run using a Health Survey for England (HSE) dataset including PA to simulate PA at ward-level. The output file was then used as the geographic dataset (geofile) in the stage 2 simulation, to simulate CBP at ward-level (Fig. 1). Wards, also known as electoral wards, are geographical units used in UK government elections and form an essential component of the UK’s administrative hierarchy; their mean population is around 7900 people.

Fig. 1 Two-stage spatial microsimulation process

Data sources

The national-level survey dataset used in this study was the HSE, obtained from the UK Data Service [4]. Four years of the HSE were used (2013 [13], 2014 [14], 2015 [15] and 2017 [16]) to provide data on CBP, PA and control variables. The geographic dataset used to provide demographics of each small area was the 2011 UK Census, obtained from InFuse [17].

Variables

Chronic back pain

The 2017 HSE’s chronic pain questions record the presence or absence of chronic pain at seven sites. “Chronic” is defined as “pain or discomfort for more than 3 months” [18]. The questions distinguish between “back” and “neck” pain but do not recognise thoracic back pain and LBP as separate entities.

Physical activity

All four HSE years used recorded PA using the International Physical Activity Questionnaire (IPAQ) [19], a widely used standardised instrument suitable for making regional and international comparisons of PA patterns [19, 20]. PA is measured in minutes per week of moderate-to-vigorous physical activity (MVPA). Physical inactivity (PIA) is defined as < 30 min MVPA per week [21].

Constraint variables

Variables shared by both datasets were shortlisted if deemed to be predictors of CBP or PA based on a literature review. Variables with inconsistent definitions between the two datasets were removed. Potential constraint variables’ ability to predict CBP and PA was assessed in the input HSE data sets using regression analysis in IBM SPSS Statistics version 27 [22].

Validation

The purpose of validation is to determine whether a simulated dataset is a successful representation of the real population at ward-level. Each simulated dataset was internally validated by comparing it with the real small area census data used as input. Validation results were compared to determine the optimum combination of constraint variables. For each validation, the mean absolute error (MAE) was calculated for each constraint variable, as well as the number of simulated areas with > 5% error (E5) and > 10% error (E10) [23, 24].
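The three fit statistics named above can be illustrated with a minimal sketch. It assumes that percentage error is measured relative to the census count for each area and that areas with a zero census count are skipped; the function name and the toy counts are hypothetical and are not taken from the study or from SimObesity.

```python
import numpy as np

def validation_summary(simulated, observed):
    """Fit statistics for one constraint category across small areas.

    Assumption: percentage error is taken relative to the observed
    (census) count; areas with a zero census count are skipped.
    """
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    abs_err = np.abs(simulated - observed)
    mae = float(abs_err.mean())                 # mean absolute error (MAE)
    nonzero = observed > 0
    pct_err = 100.0 * abs_err[nonzero] / observed[nonzero]
    e5 = int((pct_err > 5).sum())               # areas with > 5% error (E5)
    e10 = int((pct_err > 10).sum())             # areas with > 10% error (E10)
    return {"MAE": mae, "E5": e5, "E10": e10}

# Toy counts for four hypothetical wards (not study data).
census = [400, 250, 1000, 80]
simulated = [404, 270, 1000, 92]
result = validation_summary(simulated, census)
print(result)
```

Running the same summary for every constraint category and every candidate set of constraints would give the kind of comparison described above.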

Extra variables

Other variables of interest (not used as constraint variables) were also included in the datasets for subsequent analyses. These variables are known as ‘extra’ or ‘additional’ variables.

Spatial analyses

Mapping of output

The geographic information system (GIS) software GeoDa 1.18 [25] was used to map the simulated CBP prevalence at ward-level. Global spatial autocorrelation was assessed using Moran’s I index, which indicates whether the data pattern is statistically significantly clustered, dispersed or random. Normalisation of the numerator means that the index values fall between −1.0 and +1.0. Local spatial autocorrelation was assessed using local Moran’s I [a local indicator of spatial autocorrelation (LISA) statistic] [26, 27].
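As an illustration of the global statistic, the following is a minimal sketch of Moran’s I with a permutation-based pseudo p-value. It is not GeoDa’s implementation: the function names, the binary adjacency weights and the four-area example are assumptions made for this sketch.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a vector of area values and an n x n weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random relabellings with I >= observed."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    sims = np.array([morans_i(rng.permutation(values), weights)
                     for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p

# Hypothetical example: four areas in a row, each adjacent to its neighbours.
values = [1.0, 2.0, 3.0, 4.0]
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
observed_i, p_value = permutation_p_value(values, w)
print(round(observed_i, 3), p_value)
```

With this form of pseudo p-value, 999 permutations cannot yield a value below 1/1000 = 0.001.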

Geographically weighted regression

The mapped output was analysed using multiscale geographically weighted regression (MGWR) in MGWR2.2 [28] to assess the relationship between PA and CBP at ward-level allowing for non-stationarity. PA coefficients were mapped to visualise the spatial relationship between PA and CBP.
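Allowing for non-stationarity can be pictured as fitting a separate distance-weighted regression at each ward. The following is a minimal sketch of plain single-bandwidth GWR with a fixed Gaussian kernel. It is not the MGWR 2.2 implementation, which additionally allows a separate bandwidth for each covariate; the function name, kernel, bandwidth and synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Local [intercept, slope] at every location via weighted least squares.

    Assumptions: one predictor, Euclidean distance, fixed Gaussian kernel.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    betas = np.empty((len(y), 2))
    for i, centre in enumerate(coords):
        d = np.linalg.norm(coords - centre, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # nearer areas count more
        XtW = X.T * w                             # equivalent to X' W with W diagonal
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic data (not study data): the slope differs between west and east.
rng = np.random.default_rng(0)
pos = np.arange(40, dtype=float)
coords = np.column_stack([pos, np.zeros(40)])
x = rng.uniform(0.0, 1.0, size=40)
true_slope = np.where(pos < 20, 0.5, 2.0)
y = true_slope * x
betas = gwr_coefficients(coords, x, y, bandwidth=3.0)
print(betas[0], betas[-1])
```

A single global regression would return one slope for all areas; mapping the per-location slopes is what reveals where a relationship is stronger or weaker.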

‘What-if’ analysis

‘What-if’ analysis was performed to simulate and validate the effect of policies that increase PA on CBP prevalence. This was achieved by altering MVPA in the stage 1 HSE dataset to achieve higher area level MVPA values in the PA geo-dataset used for the stage 2 simulation. Three national-level policies were simulated, targeted at individuals not meeting the UK Chief Medical Officers’ Physical Activity Guidelines (71) of at least 150 min of MVPA per week. The scenarios saw those individuals increasing their weekly MVPA by 15, 30 or 60 min.
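The mechanics of such a scenario can be sketched as follows. This is a toy illustration under stated assumptions, not the SimObesity procedure: the category boundaries, the function names and the eight-person sample are hypothetical, with only the < 30 min definition of inactivity and the 150 min guideline taken from the text.

```python
import numpy as np

# Hypothetical category boundaries in minutes of MVPA per week:
# 0 = inactive (< 30), 1 = some activity (30-149), 2 = meets guideline (>= 150).
BOUNDS = [30, 150]

def categorise(mvpa):
    """Map weekly MVPA minutes to a category index."""
    return np.digitize(mvpa, BOUNDS)

def apply_policy(mvpa, extra_minutes, guideline=150):
    """Add extra weekly minutes only for individuals below the guideline."""
    mvpa = np.asarray(mvpa, dtype=float)
    return np.where(mvpa < guideline, mvpa + extra_minutes, mvpa)

def share_inactive(mvpa):
    """Proportion of individuals falling in the inactive category."""
    return float(np.mean(categorise(mvpa) == 0))

# Toy sample (not HSE data): most people do either 0 min or more than 30 min.
sample = np.array([0, 0, 0, 10, 45, 90, 160, 300], dtype=float)
shares = {extra: share_inactive(apply_policy(sample, extra))
          for extra in (0, 15, 30, 60)}
print(shares)
```

Because the simulation works on categories, an individual contributes to a change in the area-level input only when the added minutes carry them across a category boundary.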

Results

Results of internal validation as well as the wider data preparation and model selection process have been detailed previously by Smalley et al. (paper in publication). Briefly, the model performing best on internal validation was selected as the final model. This model comprised the constraints sex (male/female), age (10-year intervals), standard occupational classification 2010 (major group) [29] and MVPA (in stage 2). The extra variables consisted of BMI, ethnicity, smoking status, alcohol intake, disability, anxiety/depression, time sitting and pregnancy status. The fit of the final stage 1 and stage 2 simulations at the national level can be seen below in Figs. 2 and 3. Overall, the simulations were robust and reliable.

Fig. 2 Summary of constraint categories at national level – Stage 1

Fig. 3 Summary of constraint categories at national level – Stage 2

Mapped output

Mapping of the resulting simulated population revealed a clear visual pattern of high CBP prevalence along the east coast and in the South West. Prevalence appeared to be relatively low in the southern central area. See Fig. 4. A similar spatial pattern was seen for PIA prevalence (Fig. 5).

Fig. 4 Map of CBP prevalence in England

Fig. 5 Map of PIA prevalence in England

Spatial autocorrelation

Autocorrelation analysis revealed positive global spatial autocorrelation for CBP (global Moran’s I = 0.525, p = 0.001, 999 permutations; see Fig. 6). Locally, high-high wards (clusters of high prevalence) are seen predominantly in coastal areas, particularly the east coast and the South West of England. Herefordshire on the Welsh border is another notable cluster of high prevalence. There is also a relatively large area containing clusters of high prevalence where the borders of South Yorkshire, Derbyshire and Nottinghamshire meet. Low prevalence clusters can be seen in the south, especially in and around London, as well as in cities of the Midlands and north (Fig. 7).

Fig. 6 Moran scatter plot showing positive global spatial autocorrelation

Fig. 7 CBP prevalence Local Moran’s I LISA cluster map

Univariate geographically weighted regression

The univariate global regression analysis showed PIA prevalence to be a statistically significant predictor of CBP prevalence (Table 1). PIA explained 73.5% of the variation in CBP.

Table 1 Univariate global regression results

The subsequent local univariate model for PIA (Table 2) showed an improvement in the R2 value compared with the global model (0.815 vs. 0.735), suggesting a better fit.

Table 2 Univariate GWR (local) results

Mapping of the GWR coefficients (β) showed higher PIA coefficients in and around cities (Fig. 8).

Fig. 8 Map of the %Inactive coefficients

Multivariate multiscale geographically weighted regression

Multivariate global regression analysis showed that most predictors were statistically significant (the proportion of residents that were: physically inactive, over 60, in low-skilled jobs, female, pregnant, obese, smokers, white or black, disabled). An R2 value of 0.924 was achieved. See Table 3.

Table 3 Multivariate ordinary least squares (global) regression results

No improvement was seen in the R2 between the global and local model (Table 4). In the local model, all bandwidths were close to the total 7678 ward study area, except for those of the prevalence of current smokers, females and those pregnant. This is also reflected in those variables’ broader coefficient standard deviations.

Table 4 Multivariate MGWR (local) results

The mean coefficient of %Inactive fell to 0.070 in the multivariate local model, compared with 0.857 in the univariate local model.

‘What-if’ analysis

The ‘what-if’ analysis showed a detectable reduction in CBP prevalence for increases in MVPA of 30 and 60 min. No detectable change was found for a 15-min increase. 30 and 60-min increases in MVPA resulted in approximately the same magnitude of decrease in CBP prevalence (−2.71%). See Table 5.

Table 5 ‘What-if’ analysis results

Discussion

For the first time, CBP prevalence has been mapped at small area level across England, highlighting areas of high and low prevalence.

Clusters of high prevalence CBP were found predominantly in coastal areas. There are large clusters along the East Coast and in the South West of England, as well as in Herefordshire and where the borders of South Yorkshire, Derbyshire and Nottinghamshire meet. This information could be helpful to public health planners if targeted interventions were to be implemented.

Spatial variation of back pain prevalence in England has been reported previously by Walsh et al. [7]. Their study compared the 1-year period prevalence of LBP in eight areas in Britain, with the lowest in England being 31.9% for Darwen in Lancashire and the highest being 39.7% for Wisbech in Cambridgeshire. CBP prevalence in this study ranged from 6.5% to 23.4%. Due to the small sample of areas in Walsh et al.’s study and the fact that only adjusted prevalence was reported, it is difficult to draw comparisons on the extent of variation across the whole of England. These factors aside, it could be the case that CBP varies more spatially than acute back pain, as factors that predict the occurrence of acute back pain have been shown to also predict the transition from acute to chronic [30].

Spatial variations in back pain prevalence have also been found in other countries. Deyo et al. [31] found that in the US the lifetime prevalence of episodes of LBP of ≥ 2 weeks ranged from 10.9% in the North East to 15.0% in the West. In Sweden, Bjelle et al. [32] found higher back pain prevalence in areas of lower population density. Conversely, a study in Austria by Großschädl et al. [33] comparing back pain prevalence spatially and temporally found no discernible differences in regional prevalence. However, this study only compared East, Central and West Austria. Analysis at this level does not take account of more local variation due to processes acting at a smaller spatial scale. Nevertheless, at the level used they did not find the East–West gradient that is apparent in Austria with many other diseases [33]. Similarly, in England, there is a well-established “North–South divide”. The north generally has poorer health than the south, including greater premature mortality [34, 35]. In our study, whilst central southern England had lower CBP prevalence, high prevalence was seen in the South West. There was not an obvious difference overall when comparing the North and South.

This study also showed a strong positive correlation between an area’s level of PIA and CBP. This is reflected in the similarity of Figs. 4 and 5 (CBP and PIA prevalence maps). The strength of the relationship between PIA and CBP varied by area, from a coefficient of 0.073 in the area with the weakest relationship to 2.623 in the strongest. By mapping the coefficients of the univariate GWR it could be seen that wards with the strongest relationship between PIA and CBP were in and around cities.

This study further explained the spatial relationship between PIA and CBP in the multivariate MGWR analysis. PIA’s mean coefficient declined to 0.07. Controlling for other variables also eliminated the improvement from a global to a local model. This suggests that areas of high prevalence of CBP can largely be explained by geographic variation in confounders (the proportion of residents that are: over 60, in low-skilled jobs, female, pregnant, obese, smokers, white or black, disabled).

Caution should be taken when interpreting the results of the spatial analyses. They should not be treated as individual level results. For example, in the global regression (Table 3) pregnancy is found to have a significant negative coefficient. This should not be interpreted as meaning that individuals who are pregnant are less likely to have back pain. Back pain could still be more prevalent amongst pregnant individuals than non-pregnant individuals. In this case the relationship may be confounded by age, as only age over 60 was controlled for.

Finally, ‘what-if’ policy modelling showed that if interventions were put in place that decreased PIA, then CBP prevalence could be reduced by up to 2.71% (1,164,056 cases). Interestingly, a dose–response was not established, with a ceiling effect being seen at 30 min of MVPA. This may be explained by the combination of two factors. Firstly, SimObesity works using categorical data. This means that a change in CBP prevalence can only be seen if individuals increase their activity enough to move up into the next category. Secondly, the HSE MVPA data is not smoothly distributed, with most individuals either doing 0 min or more than 30 min. This means the number of individuals changing category for each change in MVPA is very inconsistent. However, at an individual level, findings from studies investigating the relationship between PA and back pain vary [36, 37]. Few studies have focused specifically on CBP. A meta-analysis by Shiri et al. [38] found statistically significant differences in CBP risk when comparing inactive groups with other levels of PA. In their study, a relatively large difference was seen between inactive versus active with a diminishing return for higher volumes of PA. This may explain why no difference was seen in our study between policies to increase MVPA by 30 and 60 min. It may be that transitioning from being inactive to active is protective against CBP, but once active increasing MVPA further has little protective effect on CBP.

Strengths and limitations

SimObesity used in this study is a previously validated program with proven usefulness in simulating health data [10,11,12]. The data used for the simulations was obtained from the HSE and Census, two well recognised high-quality datasets. The HSE uses the IPAQ to measure PA, a standardised, validated, instrument suitable for making regional comparisons [20]. A large HSE sample of 32,903 individuals was used.

Due to the lack of differentiation between lower and thoracic back pain in the HSE, this study was unable to simulate to any greater anatomical specificity than “back pain”. Whilst “back pain” prevalence is mostly composed of lower back pain [39, 40], this limits accurate comparison with studies using a strict “lower back pain” definition. The model created is a static SMS model, in which the time period of the simulation is dictated by the datasets used to construct it. In the time since the 2011 Census, small areas’ demographics, and thus CBP prevalence, may have changed. However, at the time of the study the 2021 Census was not yet available.

Conclusion

CBP prevalence varies at ward-level across England. There are significant clusters of high prevalence CBP predominantly in coastal areas and significant clusters of low prevalence CBP predominantly in cities. At ward-level PIA is strongly positively correlated with CBP. The geographic variation in PIA was explained by geographic variation in confounders (the proportion of residents that are: over 60, in low-skilled jobs, female, pregnant, obese, smokers, white or black, disabled). Policies to increase PA in these groups to 30 min MVPA will likely result in a significant reduction in CBP prevalence. To maximise their impact, policies could be tailored to areas of high prevalence, which are identified by this study.