Background

Low back pain is the world’s largest cause of years lost to disability, but it usually has no objective diagnosis or known mechanism [1, 2]. Aberrant intervertebral motion in the lumbar spine as measured in vivo using standardised quantitative fluoroscopic imaging protocols (QF) has been linked to nonspecific low back pain (NSLBP) as a biomarker [3,4,5], and many interventions exist to influence it [6,7,8]. This holds out the prospect of applying lumbar motion parameters as personalised biomarkers for the diagnosis of otherwise ‘nonspecific’ low back pain.

By improving understanding of mechanisms in individual patients, measurement of quantitative imaging biomarkers for back pain that takes advantage of such technologies could accelerate the development of new management approaches and facilitate more personalised care that may help avoid chronicity and/or resort to opioid medications [9]. However, quantitative imaging biomarkers are an emerging science [10], and measuring changes in motion parameters will always involve some error, either because of natural variation in the subject, variation in the measurement process or both [11]. Recommendations for scientific studies and regulatory submissions highlight the requirement to measure change; therefore, it is necessary to establish intrasubject repeatability over a credible intervention period for each parameter [10].

The dynamic measurement of continuous intervertebral motion in vivo is a relatively recent development, and intrasubject variation tests have tended to be limited to regional lumbar range of motion over short periods [12]. This has tended to confine the objective dynamic measurement of intervertebral function to cadaveric studies and computer models [13,14,15,16,17] providing little insight into individual living patients and representing a predicament in spine biomechanics research that has led to calls for in vivo, dynamic measurement methods of the multi-segmental spine and their validation. The hope is to make possible the production of individualised and, if possible, predictive models of functional spinal derangements [18, 19].

To provide such methods and allow them to be used to make valid comparisons between individuals, settings, populations and time points, two-dimensional (QF) systems have been developed that use standardised patient motion protocols to acquire multi-segmental, continuous image sequences from which intervertebral movement can be analysed with minimal behavioural variation. The resulting studies have provided early evidence that excessive intervertebral sagittal plane translation [20, 21], laxity [4], motion sharing inequality (MSI) [5], motion sharing variability (MSV) [22] and instant centres of rotation (ICRs) [23] are in various ways associated with spinal pain. Accuracy and observer repeatability studies have tended to support these parameters, as well as intervertebral range of angular motion (IV-RoM) and anterior disc height [24,25,26]. However, intrasubject repeatability data are lacking.

The intrasubject repeatability of intervertebral kinematic measures is also important when trying to decide whether a given parameter can be used in follow-up studies. This is typically expressed as the minimal detectable change (MDC), or measurement error, which is the change required to exceed the inherent variability in a truly unchanged population [11]. It reflects the smallest within-person variation, or change in score that can be interpreted as real and statistically significant, making it possible to decide in advance whether the degree of change that is of clinical interest can be detected with the technology at hand. This is different to the need to distinguish between subjects, when reliability measures, such as intraclass correlations, are preferred [27].

Aim of study The above parameters can be extrapolated from continuous multilevel intervertebral motion studies using QF. The aim of this study was to determine the intrasubject reliability (ICC) [28] and minimal detectable change (MDC95) [11] of the repeated measurement of kinematic parameters during standardised active weight bearing and recumbent passive lumbar spine motion in flexion, extension, left- and right-side bending from L2-S1 using 2D quantitative fluoroscopy (QF) in healthy individuals over a period of 6 weeks.

Methods

Variables under consideration

Intervertebral range of angular motion (IV-RoM) IV-RoM as measured with QF is the maximum angular rotation of intervertebral motion reached during bending (Fig. 1). In various forms, it is a very common biomechanical measure [29,30,31]. QF has been reported as measuring IV-RoM in the cervical spine with levels of interobserver agreement ranging from 0.3° to 1.0° (SEM) and reliability of 0.92–0.99 (ICCs) [32] and in the lumbar spine with between 0.23° and 0.76° (SEM) and reliability of 0.94–0.99 (ICCs) [33].

Fig. 1

Example of the identification of maximum intervertebral rotational range (IV-RoM) using a standardised lumbar left bending and return QF imaging of L2-S1. Note that the maximum IV-RoM does not necessarily occur at the maximum of motion frame range

Sagittal translation Translation can be calculated for the sagittal plane in vertebral body units (VBU) which are converted to millimetres for presentation by multiplying the result by 35, being the standard chosen for vertebral body depth in millimetres [34]. Intra- and interobserver agreement for translation using QF has been found to be 1.1 mm or less (SEM) with fair-to-substantial reliability (ICCintra 0.53–0.99, ICCinter 0.57–0.93) [35].
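The unit conversion described above can be illustrated with a minimal sketch. The function name and the example value are hypothetical; only the factor of 35 mm per vertebral body unit comes from the text.

```python
# Standard vertebral body depth in millimetres, as stated in the text [34].
VERTEBRAL_BODY_DEPTH_MM = 35.0


def vbu_to_mm(value_vbu: float) -> float:
    """Convert a length in vertebral body units (VBU) to millimetres."""
    return value_vbu * VERTEBRAL_BODY_DEPTH_MM


# Hypothetical example: a translation of 0.04 VBU is about 1.4 mm.
translation_mm = vbu_to_mm(0.04)
print(round(translation_mm, 2))
```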

Laxity Laxity is a kinematic measure that reflects mid-range intervertebral restraint in response to external forces [36]. It is used as a surrogate indicator of dynamic neutral zone length in in vivo studies and is also sometimes known as the initial attainment rate of intervertebral rotational displacement [37]. High values are evidence of disco-ligamentous microstrain or sub-failure and therefore a potential source of nociceptive pain [38]. Laxity is measured as the gradient of intervertebral motion in the initial 10° of global motion from the mid-range position [39] (Fig. 2). The higher the ratio, the less the restraint within the vertebral linkage [40]. Reliability for laxity has been found to range from ICCintra 0.84–0.98 and ICCinter 0.92–0.98 [33].

Fig. 2

Example of laxity (initial attainment rate) as initial gradients for four intervertebral levels
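As an illustration of the laxity definition, the sketch below estimates the gradient of intervertebral rotation against global (motion frame) rotation over the first 10° of global motion. It assumes that both angle series are zeroed at the mid-range position and increase outward, and it uses a least-squares slope; the fitting method used in the original analysis is not stated here, so this choice and the function name are assumptions.

```python
def laxity(global_angle_deg, segment_angle_deg, window_deg=10.0):
    """Least-squares slope of segment angle against global angle,
    restricted to the first `window_deg` degrees of global motion."""
    pairs = [
        (g, s)
        for g, s in zip(global_angle_deg, segment_angle_deg)
        if 0.0 <= g <= window_deg
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two data points inside the window")
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sxx == 0.0:
        raise ValueError("global angle does not vary inside the window")
    sxy = sum((g - mean_g) * (s - mean_s) for g, s in pairs)
    return sxy / sxx


# Hypothetical data: the segment takes up 0.25 deg per degree of trunk
# motion in the first 10 deg, then stiffens.
trunk = [0, 2, 4, 6, 8, 10, 12, 14]
segment = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 2.6, 2.7]
print(round(laxity(trunk, segment), 3))
```

A higher returned value corresponds to less restraint at that level, in line with the interpretation given in the text.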

Anterior disc height Disc height is defined as the sum of the perpendicular distances of the anterior–inferior corner of the cranial vertebra and the anterior–superior corner of the caudal vertebra from the bisectrix between the two vertebral body mid-planes [34] (Fig. 3). Disc height is used to measure the effects of disc degeneration and end plate subsidence in relation to disc prostheses [41]. Anterior disc height, like translation, is also calculated in VBU for flexion and extension and subsequently converted to millimetres. It is calculated as a maximum for extension and a minimum for flexion. Reliability for disc height change for extension has been reported as ICCintra 0.65–0.97 and ICCinter 0.49–0.97, and for flexion as ICCintra 0.24–0.88 and ICCinter 0.64–0.99 [25].

Fig. 3

From Frobin et al. [34]

Measurement of anterior disc height in the (a) neutral and (b) flexed positions based on the sagittal mid-planes of adjacent vertebrae
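The geometric definition of anterior disc height can be sketched as the sum of two point-to-line distances. The example assumes that the bisectrix has already been determined and is supplied as a point plus an orientation angle; how the bisectrix is derived from the vertebral mid-planes is not shown, and all names and coordinates are hypothetical.

```python
import math


def distance_to_line(point, line_point, line_angle_deg):
    """Perpendicular distance from `point` to the line that passes through
    `line_point` at `line_angle_deg` to the x-axis."""
    dx = point[0] - line_point[0]
    dy = point[1] - line_point[1]
    theta = math.radians(line_angle_deg)
    # Magnitude of the 2D cross product of the offset with the unit direction.
    return abs(dx * math.sin(theta) - dy * math.cos(theta))


def anterior_disc_height(cranial_corner, caudal_corner,
                         bisectrix_point, bisectrix_angle_deg):
    """Sum of the perpendicular distances of the two anterior corners
    from the bisectrix."""
    return (
        distance_to_line(cranial_corner, bisectrix_point, bisectrix_angle_deg)
        + distance_to_line(caudal_corner, bisectrix_point, bisectrix_angle_deg)
    )


# Hypothetical coordinates in VBU, with a horizontal bisectrix through the origin.
height_vbu = anterior_disc_height((0.5, 0.15), (0.5, -0.13), (0.0, 0.0), 0.0)
print(round(height_vbu, 2))
```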

Motion sharing inequality (MSI) and motion sharing variability (MSV) Asynchronous intervertebral motion during standardised trunk bending has been found to be greater in patients with nonspecific back pain than in controls and may represent a form of movement impairment [5, 22, 42]. Numerically, MSI is the average range of differences in the sharing of motion by each intervertebral level at each data point throughout the motion and reflects inequality of restraint across levels. MSV is calculated as the square root of the variance (or SD) of these differences throughout the motion. Both variables are derived from continuous proportional angular motion data (Fig. 4), and MSV may be considered to reflect intervertebral motion control. Details of these variables and methodologies have been published elsewhere [5, 42]. However, no observer repeatability statistics have yet been published for MSI and MSV.

Fig. 4

Example of intervertebral proportional motion sharing at four intervertebral levels during outward and return motion. Motion sharing inequality (MSI) is calculated as the average of the maximum distances between levels at all data points and motion sharing variability (MSV) as the square root of their variance
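The MSI and MSV definitions can be expressed as a short sketch. It assumes that each level's share at a data point is its rotation divided by the summed rotation of all levels at that point, that frames with very little total rotation are skipped to avoid unstable ratios, and that the sample standard deviation is used. These are assumptions for illustration; the published methodology [5, 42] should be consulted for the exact procedure.

```python
from statistics import mean, stdev


def motion_sharing(levels, min_total_deg=1.0):
    """Return (MSI, MSV) from per-level rotation series.

    levels: list of equal-length sequences, one per intervertebral level,
            holding rotation in degrees at each data point.
    min_total_deg: hypothetical threshold below which a frame is skipped.
    """
    ranges = []
    for frame in zip(*levels):
        total = sum(frame)
        if abs(total) < min_total_deg:
            continue  # proportions are unstable near the neutral position
        shares = [angle / total for angle in frame]
        ranges.append(max(shares) - min(shares))
    if len(ranges) < 2:
        raise ValueError("need at least two usable data points")
    msi = mean(ranges)   # average range of proportional motion across levels
    msv = stdev(ranges)  # variability of that range over the motion
    return msi, msv


# Hypothetical rotations (degrees) for four levels at three data points.
levels = [
    [1.0, 2.0, 4.0],
    [1.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
]
msi, msv = motion_sharing(levels)
print(round(msi, 3), round(msv, 3))
```

With perfectly equal sharing at every data point, both values are zero; greater inequality raises MSI, and fluctuation of that inequality over the motion raises MSV.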

Instant centre of rotation (ICR) The ICR is conventionally the fulcrum of the arc of rotation of a vertebra with respect to its subjacent neighbour over a predetermined range. Its importance lies in the belief that it represents the centre of reaction force during loaded bending [43]. The more caudal its position, the more translation has accompanied the bend over the chosen range. Unfortunately, it is prone to large errors for small rotations, making it difficult to gather large amounts of change data over time. However, for rotations greater than 5°, QF has substantial-to-excellent reliability (ICCintra 0.63–0.99 and ICCinter 0.62–0.88) [26].

Sample size calculation Sample size was calculated as the smallest number that would allow an assessment of intrasubject repeatability based on recognising a minimal change of 25% of the mean value for each kinematic index [11]. This allows an evaluation of the method to detect changes that are well within the upper reference limits found in previous studies. The width of the 95% confidence interval for the population within-subject standard deviation is given by:

$$1.96\frac{S_{\text{w}}}{\sqrt{2n\left(m - 1\right)}}$$

where Sw is the within-subject standard deviation, m is the number of observations per subject and n is the number of subjects required.

We wished to estimate to a precision of 1.96 SD with two observations per subject and a confidence interval ≤ 0.25 of the mean value of each parameter in healthy controls. Solving for n in the equation below returns n = 30.73.

$$\frac{1.96}{\sqrt{2n\left(2 - 1\right)}} = 0.25$$

With 31 pairs of observations, according to the central limit theorem, the sampling distribution of the mean will also approach a normal distribution, which will allow calculation of the baseline standard deviation for future power calculations. Because each participant was imaged in only one plane (to minimise radiation dosage), 31 participants were needed in each of the coronal and sagittal planes, giving a minimum of 62 participants. As it was planned to recruit 150 participants with these inclusion criteria for a normative database, which is still in progress, this target was exceeded.
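The sample size arithmetic can be checked with a short sketch that rearranges the equation above for n. The function name is hypothetical; the constants 1.96, 0.25 and m = 2 are those given in the text.

```python
import math


def subjects_required(relative_half_width: float, observations_per_subject: int) -> float:
    """Solve 1.96 / sqrt(2 n (m - 1)) = relative_half_width for n."""
    m = observations_per_subject
    return (1.96 / relative_half_width) ** 2 / (2 * (m - 1))


n_exact = subjects_required(0.25, 2)  # about 30.73
n_needed = math.ceil(n_exact)         # rounded up to whole participants
print(round(n_exact, 2), n_needed)
```

Rounding up gives the 31 participants per plane quoted in the text.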

Participant recruitment A convenience sample of 109 healthy control volunteers was recruited from staff, students and visitors of the AECC University College (Bournemouth, UK). Participants were included if they were aged 21–80, BMI < 30, with no history of previous back or abdominal surgery or spondylolisthesis, no medical radiation exposure of > 8 mSv in the previous 2 years and no current pregnancy. Participants also had to have been free of any back pain that limited their normal activity for more than 1 day in the previous year. In order to restrict radiation dosage, within-subject measurements over 6 weeks were only carried out twice. Fifty-four received passive recumbent and active controlled weight-bearing QF investigations to the left and right (coronal plane), and 55 received passive recumbent and active weight-bearing controlled flexion and extension (sagittal plane) investigations of their lumbar spine motion. All participants had these procedures repeated 6 weeks later by the same operators using the same equipment at approximately the same time of the day. Informed consent was obtained from all participants, and ethical approval was obtained from the National Research Ethics Service (South West 3, 10/H0106/65).

Data collection The QF image acquisition and analysis procedures are further detailed in previous studies [5, 21, 22] (Fig. 5a–d). However, in order to minimise radiation dose, participants were allocated to either coronal or sagittal plane sequences.

Fig. 5

a–d Positioning of participants for (a) passive recumbent coronal, (b) passive recumbent sagittal, (c) active weight-bearing coronal and (d) active weight-bearing sagittal imaging

All participants had both recumbent and weight-bearing imaging. For recumbent QF, participants lay on a movable table in which the trunk section was motorised and driven by a controller (Atlas Clinical Ltd.). This produced a bending angle of 40° during separate left and right (coronal plane, subject supine) and flexion and extension (sagittal plane, subject side-lying) motion sequences during fluoroscopic screening. For active controlled weight bearing, participants sat on a stool with their backs against an upright motion frame fitted with arm rests which guided them through 40° of left- and right-side bending. Participants receiving sagittal plane investigations stood with their right side against the motion frame with their pelvis secured and upper limbs supported on a projecting rest which guided them through 60° of flexion angle (and return) using the same controller apparatus as for the recumbent procedure. The motion controllers accelerated at 6° s−2 for the first second followed by a uniform 6° s−1 thereafter. The images were collected as single (not repeated) motion sequences at 15 Hz using a Siemens ARCADIS Avantic digital C-arm fluoroscope (Siemens GMBH) giving approximately 230 frames per sequence. All images were exported to a computer workstation and analysed using manual first image registration and thereafter bespoke frame-to-frame tracking using code written in MATLAB (v2011a, The MathWorks Inc.).

Calculation of kinematic parameters

Maximum intervertebral rotation (IV-RoM), maximum sagittal translation in flexion, sagittal disc height during flexion (maximal in neutral to minimal in flexion), laxity (gradient of segment to trunk motion in first 10°), MSI (average proportional range shared between segments) and MSV (square root of the variance of the proportional range shared between segments) were calculated. Individual-level intervertebral motion data for each orientation (upright or lying) and direction (left, right, flexion and extension) were pooled, whereas multi-segment indices (MSI and MSV) gave single values. Vertebral levels from L2-S1 were analysed in the sagittal plane and from L2-5 in the coronal plane (given the lack of movement of L5-S1 in this plane). All data were pseudonymised and stored on an encrypted database, with access restricted to the chief investigator, the research assistant and the database manager. Image and statistical analyses were conducted by two independent observers who were blinded to each other’s observations. Translation and disc height measures were confined to the sagittal plane, and ICR was excluded due to insufficient segments with rotations above 5°. The study was conducted in accordance with Statistical Methods in Medical Research (SMMR) recommendations [11].

Statistical analysis Data were inspected for distribution and central tendency. Analysis was according to intervertebral level and direction, i.e. left and right from L2-3 to L4-5 (3 levels) and flexion and extension from L2-3 to L5-S1 (4 levels). The association between test and retest and between differences and means was assessed using Kendall’s tau. As no significant and/or substantial associations were found, the data were not transformed. Repeatability was assessed using intraclass correlation coefficients (ICC2,1—two-way random effects, average measures model) and the minimal detectable change (MDC95). To interpret the relevance of the ICC ‘reliability’ level, an ICC score of > 0.80 was considered ‘excellent’, > 0.60–0.80 ‘substantial’, 0.40–0.60 ‘moderate’ and < 0.40 ‘slight’ [44]. This framework is consistent with other reliability studies reporting reliability of spinal posture measurement [45, 46].
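For readers who wish to reproduce this kind of reliability estimate, the sketch below computes two-way random effects, absolute agreement intraclass correlations from the ANOVA mean squares, in both the single measures form, ICC(2,1), and the average measures form, ICC(2,k). Both are returned because the text names ICC2,1 together with an average measures model. The function names and the test-retest values are hypothetical, and the study's own analysis software is not specified here.

```python
def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)) for a subjects-by-occasions table.

    scores: list of rows, one per subject, each holding k repeated measures.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def interpret_icc(icc):
    """Apply the reliability bands used in the text [44]."""
    if icc > 0.80:
        return "excellent"
    if icc > 0.60:
        return "substantial"
    if icc >= 0.40:
        return "moderate"
    return "slight"


# Hypothetical baseline and 6-week values (degrees) for five participants.
test_retest = [[5.1, 5.6], [7.9, 7.2], [3.4, 4.0], [9.8, 9.1], [6.2, 6.5]]
single, average = icc_two_way_random(test_retest)
print(round(single, 2), interpret_icc(single))
```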

The distributions of the differences between baseline and follow-up measures for each level and direction for each variable were checked for normality using the Shapiro–Wilk test and the significance of any differences determined. Repeatability coefficients were calculated using the formula below, where Sw is the within-subject standard deviation. The repeatability coefficient estimates the magnitude of the within-subject change that can be expected 95% of the time and represents the minimum detectable change (MDC95) [11].

$${\text{Repeatability coefficient}}\;\left( {{\text{MDC}}_{95} } \right) = 2.77S_{w}$$
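A minimal sketch of this calculation for two measurements per participant is given below. It assumes the usual paired-data estimate of the within-subject standard deviation, Sw = sqrt(sum of squared differences / 2n), and expresses MDC95 both in the units of the measure and as a percentage of the baseline mean. The function names and data are hypothetical.

```python
import math

# 1.96 * sqrt(2), rounded to 2.77 as in the text.
REPEATABILITY_FACTOR = 2.77


def within_subject_sd(baseline, follow_up):
    """Within-subject SD for paired measurements: sqrt(sum(d^2) / (2n))."""
    diffs = [a - b for a, b in zip(baseline, follow_up)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


def mdc95(baseline, follow_up):
    """Repeatability coefficient (MDC95) in the units of the measure."""
    return REPEATABILITY_FACTOR * within_subject_sd(baseline, follow_up)


def mdc95_percent_of_baseline(baseline, follow_up):
    """MDC95 expressed as a percentage of the mean baseline value."""
    baseline_mean = sum(baseline) / len(baseline)
    return 100.0 * mdc95(baseline, follow_up) / baseline_mean


# Hypothetical IV-RoM values (degrees) at baseline and at 6 weeks.
baseline = [5.0, 6.0, 7.0, 8.0]
follow_up = [6.0, 5.0, 8.0, 7.0]
print(round(mdc95(baseline, follow_up), 2))
print(round(mdc95_percent_of_baseline(baseline, follow_up), 1))
```

Expressing MDC95 relative to the baseline mean shows why parameters with small baseline values can return very large percentage errors even when the absolute error is small.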

Results

The study population consisted of 43 females and 66 males. Their characteristics and allocations to coronal and sagittal plane investigations are given in Table 1. For those participants who undertook coronal plane investigations, the median effective dose was 0.97 mSv (third quartile 1.2 mSv), and for those who undertook sagittal plane investigations, the median effective dose was 0.66 mSv (third quartile 0.78 mSv). This compares favourably with the 1.3 mSv quoted as the typical effective dose expected during a series of X-rays of the lumbar spine for diagnostic procedures [47]. The mean baseline and reference ranges, RMS differences between baseline and follow-up, ICCs (95% CI) and MDC95 in the units of the measures and as a percentage of the baseline scores are given in Table 2 for passive recumbent motion and in Table 3 for active weight-bearing motion.

Table 1 Study populations imaged in each plane
Table 2 Passive recumbent motion: pooled means, RMS differences between baseline and 6-week follow-up, ICCs and MDCs for intervertebral motion parameters in healthy participants
Table 3 Active weight-bearing motion: pooled means, RMS differences between baseline and 6-week follow-up, ICCs and MDCs for intervertebral motion parameters in healthy participants

In general, reference ranges for IV-RoM and laxity were similar to published control studies that used the same measurement methodology [22, 24, 48]. Their weight-bearing and recumbent values were similar when the same trunk bending range was applied. MSI and MSV, however, had higher values during weight bearing than recumbent motion for all directions.

Reliability Reliability was substantial to excellent for repeated measurements of IV-RoM, laxity, flexion translation and disc height during recumbent passive (ICC 0.69–0.96) and active weight-bearing motion (ICC 0.64–0.92), except that it was only moderate for weight-bearing extension translation (ICC 0.55). MSI was moderate to excellent for both positions (ICC 0.43–0.91), and MSV was moderate to substantial for weight-bearing motion (ICC 0.40–0.65), but poor to moderate for recumbent motion (ICC 0.14–0.47).

Measurement error Measurement errors (MDC95) for all variables were high, ranging from 42% of baseline for anterior disc height in passive recumbent extension to 408% for weight-bearing extension MSV, suggesting that degrees of change that would be of interest may not be detected in these ranges (Tables 2, 3). Measures of restraint (IV-RoM and laxity) tended to have lower measurement errors in recumbent passive than active weight-bearing motion. However, of all the measures, anterior disc height had the smallest measurement errors, ranging from 45% of baseline in recumbent extension to 53% in weight-bearing flexion. The measurement error for translation was unacceptably high for both weight-bearing (157–283%) and recumbent (111–209%) tests, possibly reflecting their small baseline values in healthy controls. For MSV, weight-bearing measurement error ranged from 135 to 408% and recumbent from 150 to 208%, while MSI was 78–135% for weight bearing and 91–131% for recumbent. Measurement error for disc height, on the other hand, ranged from 42% for passive extension to 53% for weight-bearing flexion.

Discussion

This is the first report of the intrasubject repeatability of in vivo continuous intervertebral motion parameters measured using controlled motion protocols, and the first time to our knowledge that spine biomechanical measurement error has been calculated over a clinically relevant outcome interval. The results suggest that, irrespective of baseline measurement values, follow-up data would not necessarily be useful as biomechanical outcomes for all measures, simply because some variables have poor repeatability. On the other hand, the acceptable levels of reliability bode well for their use for distinguishing between low back pain patients in relation to biomechanical change [27].

A summary of the magnitudes of reliability and measurement error for all variables is given in Table 4. This shows that for outcome studies that employ QF, the best overall intrasubject reliability and agreement over a 6-week intervention period were found for disc height and IV-RoM and the worst for MSV. The measurement of laxity, MSI and translation has acceptable reliability, but not agreement. The implication for outcome studies is that, for the time being, disc height and IV-RoM are the only variables that could be considered for randomised trials of interventions that might target these as outcomes. With the exception of MSV, the other variables (laxity, MSI and translation) could be considered for investigation as baseline moderators or perhaps correlates or mediators of patient-reported outcomes.

Table 4 Summary table of magnitudes of reliability and measurement error for all variables

Limitations Results for individual-level vertebral data were not calculated in this study as the aim was to address repeatability and the differences between baseline and follow-up measures. In addition, some measures, such as translation, had low values in healthy controls and their changes across time, although small, would be high compared to the baseline itself, giving high percentages but low errors (e.g. 1–2 equivalent mm for translation) which could be quite acceptable in patients with high baseline values. Therefore, patients with high translation or laxity values may have values that are expected to be reduced greatly by an intervention (such as spinal fusion), again making high measurement error more tolerable. For example, the MDC95 for recumbent laxity of between 0.16 and 0.19 is a difference that would be likely to be detected as the upper reference levels are in the region of 0.40.

The variables evaluated in this study may have greater clinical utility as observational measures rather than specific outcomes to detect change over time, especially for recumbent testing, where there was excellent reliability for a number of measures including IV-RoM, laxity, disc height and MSI. On the other hand, recumbent IV-RoM and laxity produced the smallest measurement errors, ranging from 55 to 97%, suggesting that these measures of restraint show some promise for longitudinal testing of change over time. Evaluation of recumbent motion enables spinal motion analysis to be conducted without the influence of muscular control and tends to be much better tolerated by individuals who are in pain. Consequently, variables measured in this position may be biomarkers for LBP [5, 42].

Variables tested during weight bearing generally demonstrated slightly lower reliability scores and higher errors over time compared to recumbent testing. Spinal movement during weight-bearing studies involves active control; thus, muscle activation is likely to play a role in the magnitude of such variables. Future work could therefore include evaluation of the active components of spinal movement, for example, muscle activity using electromyography and muscle oxidation and perfusion to understand potential mechanisms underpinning motor control and muscle metabolism in both the symptomatic and asymptomatic spines during dynamic movement.

Measures of proportional motion inequality (MSI) and variability (MSV) of lumbar motion using QF have shown promise in differentiating between healthy and CLBP populations [22, 42]. MSI has been shown to be significantly greater and, notably, correlated with composite disc degeneration (CDD) in CLBP during recumbent flexion [5]. This suggests greater inequality of motion sharing in NSLBP individuals and intimates a link between in vivo biomechanics of the disc and pain. MSI’s reliability in the current study, as represented by intraclass correlations, was generally acceptable for both weight-bearing and recumbent measures; thus, MSI may be a useful variable of interest for future clinical QF studies.

Although QF protocols were associated with acceptable intrasubject repeatability for some parameters, the poor intrasubject results observed for MSV may be hypothesised to be due to individual changes in the behavioural performance of spinal motion rather than measurement error, although variability of movement is fundamental to motor learning and control, especially in the study of healthy movement and posture [49]. In order to repeatedly achieve a task consistently, variability is required in the motor constituents, to ensure that the individual can respond to altered task demands without performance being compromised [49]. Thus, one could hypothesise that healthy individuals demonstrate unique movement behaviours and may have a range of potential movement patterns available which may explain the high error values obtained for MSV.

Further work The results of this study support previous work that has demonstrated the intra- and interobserver repeatability of these measures [24,25,26, 48, 50]. However, this still needs to be determined for MSI and MSV. We also suggest that the present methodology should be repeated in a stable CLBP cohort, where baseline parameters may be different.

Conclusion

Of the six measurement parameters considered, disc height and IV-RoM were the only variables that could currently be considered for use in randomised trials of interventions that employ these as outcome measures. However, laxity, MSI and translation could be considered as candidates for potential moderators, correlates or mediators of patient-reported outcomes.