Longitudinal multi-modal muscle-based biomarker assessment in motor neuron disease

Background Clinical phenotypic heterogeneity represents a major barrier to trials in motor neuron disease (MND) and objective surrogate outcome measures are required, especially for slowly progressive patients. We assessed responsiveness of clinical, electrophysiological and radiological muscle-based assessments to detect MND-related progression. Materials and methods A prospective, longitudinal cohort study of 29 MND patients and 22 healthy controls was performed. Clinical measures, electrophysiological motor unit number index/size (MUNIX/MUSIX) and relative T2- and diffusion-weighted whole-body muscle magnetic resonance (MR) were assessed three times over 12 months. Multi-variable regression models assessed between-group differences, clinico-electrophysiological associations, and longitudinal changes. Standardized response means (SRMs) assessed sensitivity to change over 12 months. Results MND patients exhibited 18% higher whole-body mean muscle relative T2-signal than controls (95% CI 7–29%, p < 0.01), maximal in leg muscles (left tibialis anterior 71%, 95% CI 33–122%, p < 0.01). Clinical and electrophysiological associations were evident. By 12 months, 16 patients had died or could not continue. In the remainder, relative T2-signal increased over 12 months by 14–29% in right tibialis anterior, right quadriceps, bilateral hamstrings and gastrocnemius/soleus (p < 0.01), independent of onset-site, and paralleled progressive weakness and electrophysiological loss of motor units. Highest clinical, electrophysiological and radiological SRMs were found for revised ALS-functional rating scale scores (1.22), tibialis anterior MUNIX (1.59), and relative T2-weighted leg muscle MR (right hamstrings: 0.98), respectively. Diffusion MR detected minimal changes. Conclusion MUNIX and relative T2-weighted MR represent objective surrogate markers of progressive denervation in MND. Radiological changes were maximal in leg muscles, irrespective of clinical onset-site.
Electronic supplementary material The online version of this article (10.1007/s00415-019-09580-x) contains supplementary material, which is available to authorized users.


Introduction
A significant challenge in motor neuron disease/amyotrophic lateral sclerosis (MND/ALS) research is the facility to track disease changes objectively over manageable time-scales, to reduce the duration and expense of clinical trials. Whilst survival remains a commonly applied outcome measure, slower progressing patients appear relatively over-represented in clinical trials [1,2], and surrogate outcome measures are necessary to detect therapeutic effects in this group. The revised ALS functional rating scale (ALSFRS-R) questionnaire [3] is frequently used, but has well recognized limitations, including inherent subjectivity and influence of symptomatic treatment [4,5]. Objective biomarkers are, therefore, required; imaging and electrophysiology appear promising candidates [6,7].
Clinical heterogeneity in anatomical site of onset, pattern of spread, and rate of deterioration is an important barrier to quantifying progression at group-level, whether using clinical, electrophysiological or radiological measures. Most previous imaging studies have focused on the central nervous system (116 in a recent review [8]), and there are relatively few studies of MND effects on peripheral nerve [9–11] or muscle [9,10,12–17], yet denervation and muscle weakness are cardinal clinico-pathological features. Approximately 25% of MND patients present with bulbar weakness and 70% with either upper or lower limb muscular weakness in similar proportions [18]. It is, therefore, challenging to capture the disparate effects of denervation on an individual's muscles objectively and translate them into a group-level parameter suitable for a trial. This may be addressed by application of clinical scores or electrophysiology to multiple muscles, or by whole-body muscle magnetic resonance (MR) imaging. In previous work, we reported longitudinal relative T2-weighted changes, derived from whole-body MR, in tibialis anterior over 4 months [17]. In this study, we present a new and comprehensive analysis of a wide range of clinical, electrophysiological and radiological muscle measures, including both T2- and diffusion-weighted MR, tested in multiple muscles over an extended follow-up period of 12 months. The aim was to identify individualized muscle denervation patterns in MND, and the objective was to assess the optimal technique to detect group-level change from a variety of clinical, electrophysiological and radiological candidates. We hypothesized that whole-body T2- and diffusion-weighted muscle MR would enable quantification of generalized denervation, regardless of clinical site of onset.

Study population
This was a prospective, longitudinal, observational cohort study. Patients were identified at first presentation to the tertiary referral neuromuscular clinic at the Royal Hallamshire Hospital, Sheffield, UK and were assessed at baseline, 4 and 12 months between October 2013 and May 2016. Inclusion criteria were age > 18 years, a clinical diagnosis of ALS fulfilling El Escorial criteria [19] or progressive muscular atrophy. Participation in interventional studies was recorded. Exclusion criteria were cognitive impairment sufficient to impair consent, contraindications to MR imaging, pregnancy, another neuromuscular disease, or respiratory failure impairing the ability to lie flat in the scanner. Healthy controls were recruited from partners of patients and by advertisement, and assessed at two time-points. Based on the results of our previous study [17], the primary outcome selected was between-group differences in relative T2-weighted MR signal over time and, in order to satisfy the requirement of at least 10-20 observations per degree of freedom for the linear regression models (with age and gender as covariates), a minimum sample size of 30-60 observations was required [20]. Secondary outcomes were between-group differences in clinical, electrophysiological and diffusion-weighted MR measures, inter-modality associations and change over time.
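The sample-size reasoning above can be reproduced as simple arithmetic. The sketch below assumes that the regression models carry three terms (group, age and gender), which is inferred from the stated covariates and the reported 30-60 range rather than given explicitly; the function name is illustrative.

```python
def required_observations(n_model_terms, per_term_low=10, per_term_high=20):
    """Range of observations implied by a rule of thumb of
    10-20 observations per degree of freedom."""
    return n_model_terms * per_term_low, n_model_terms * per_term_high


# Assumption: three model terms (group, age, gender).
print(required_observations(3))  # (30, 60)
```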
reformatted to correspond to axial DWI acquisitions; DWI: TR = 9412 ms, TE = 66 ms, TI = 250 ms, b0, b1000 s/mm², voxel size 2.3 × 2.3 × 5 mm³, eight stations, 50 axial slices. Total acquisition times including localizers and breath-holds were approximately 20 min and 40 min for the T2- and diffusion-weighted acquisitions, respectively. Muscle regions-of-interest were contoured by two observers using standardized anatomical landmarks and a semi-automated spline function (Extended MR Workspace V2.6.3.5, Philips) on single slices for both T2- and diffusion-weighted images (Fig. 1a-i). Prior to analysis, intra- and inter-rater reproducibility was confirmed by coefficients of variability of < 5% for all regions-of-interest on six datasets reassessed after > 24 h. Mean relative T2 estimates were obtained from the following muscles and muscle groups in axial orientation: tongue, splenius capitis, bilateral trapezius, sternocleidomastoid, deltoid, biceps brachii, forearm compartment encompassing brachioradialis, thoracic paraspinal, psoas major, gluteus maximus, quadriceps, hamstrings, tibialis anterior, and gastrocnemius/soleus. Triceps, first dorsal interosseous, thenar and hypothenar eminence were also assessed but on coronal rather than axial T2 images, as anatomical boundaries were more consistently identifiable. Apparent diffusion coefficient (ADC) estimates were obtained from each of the muscles assessed in axial orientation and not from intrinsic hand muscles and triceps. To adjust for coil-loading effects, relative T2 estimates were expressed as a ratio to a bone reference within the same acquisition station [17] (Supplemental Material); no adjustment was made for ADC.
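The bone-referenced ratio can be illustrated with a minimal sketch. It assumes that the mean signal within the muscle region-of-interest is divided by the mean signal of the bone reference from the same station; the exact reference definition is given in the Supplemental Material and is not reproduced here, and all names and values are hypothetical.

```python
from statistics import mean


def relative_t2(muscle_roi_signal, bone_reference_signal):
    """Ratio of mean muscle ROI signal to mean bone reference signal
    taken from the same acquisition station (assumed definition)."""
    reference = mean(bone_reference_signal)
    if reference <= 0:
        raise ValueError("bone reference signal must be positive")
    return mean(muscle_roi_signal) / reference


# Hypothetical pixel intensities, for illustration only.
ratio = relative_t2([100.0, 120.0], [50.0, 50.0])
```

Because both signals come from the same station, station-specific scaling cancels in the ratio, which is the stated purpose of the adjustment.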

Statistical analysis
Stata version 13.1 was used (StataCorp, Texas). For between-group comparisons and associations, p values were reported corrected for age and gender, due to potential influences on muscle parameters [24]. All p values were corrected for multiple comparisons by applying the Benjamini-Hochberg method to each table of results at each time-point [25], and results where significance was retained were asterisked.

Fig. 1 Coronal whole-body T2-weighted acquisition (a); axial slices from relative T2-weighted (b, d, f, h) and apparent diffusion coefficient (c, e, g, i) maps from head and neck station depicting right and left sternocleidomastoid and splenius capitis (b, c); thoracic station depicting right and left thoracic paraspinals (d, e); upper leg station depicting right quadriceps and hamstring groups (f, g) and lower leg station depicting right tibialis anterior and gastrocnemius/soleus groups (h, i). Coronal images from the lower leg station shown to illustrate an increase in relative T2-weighted signal in tibialis anterior and gastrocnemius/soleus groups in an MND patient between baseline (j) and 12 months (k). ADC apparent diffusion coefficient, gastrocs gastrocnemius, SCM sternocleidomastoid, TA tibialis anterior, TP thoracic paraspinal
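The Benjamini-Hochberg correction named in the statistical analysis can be sketched as follows. This is a generic implementation of the step-up procedure, not the Stata routine used in the study, and the false discovery rate of 0.05 is an assumption because the threshold is not stated in this section.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which p values remain
    significant under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_rank = rank
    # Every p value ranked at or below that rank is declared significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[index] = True
    return significant


# Hypothetical p values from one table of results.
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
```

Applying the function once per table and time-point mirrors the description above; the entries flagged `True` correspond to the results that would be asterisked.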

Baseline differences between MND patients and controls
For continuous variables, between-group differences were assessed using multiple regression models, entering each clinical, electrophysiological and radiological variable of interest, in turn, as the dependent variable, and group (patient/control), age and gender entered as independent variables. Between-group differences in categorical variables were assessed using chi-squared tests.
Results were reported as the difference in each parameter between patients and controls, derived from the regression models, expressed as a percentage ratio, with between-group difference the numerator, and control mean the denominator. Ratio 95% confidence intervals were calculated [26]. For ordinal MRC scores, the proportion of patients with weakness in each muscle (MRC < 5) was reported.
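As a worked illustration of the percentage ratio, the sketch below divides a regression-derived between-group difference by the control mean. The input values are hypothetical, and the confidence interval calculation of reference [26] is not reproduced.

```python
def percentage_difference(group_coefficient, control_mean):
    """Between-group difference (regression coefficient for group)
    expressed as a percentage of the control mean."""
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * group_coefficient / control_mean


# Hypothetical example: an adjusted difference of 0.18 units
# against a control mean of 1.00 units.
pct = percentage_difference(0.18, 1.00)
```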

Clinical, electrophysiological and radiological associations
Associations between clinical, electrophysiological and radiological variables were assessed using separate multiple regression models, entering each clinical or electrophysiological variable, in turn, as the dependent variable, and the anatomically corresponding radiological variable, for relative T2 and ADC in each muscle, in turn, as an independent variable. Age and gender were entered into the model as additional independent variables.

Longitudinal changes
For continuous variables, longitudinal changes were modelled using mixed effects linear regression, with each clinical, electrophysiological and radiological variable entered, in turn, as the dependent variable, and time-point (as a categorical variable) and subject entered as independent variables. No assumptions were made on covariance structure. All available data were entered. Separate models were run for each variable, and for patients and controls. For radiological variables, percentage signal change compared to baseline was reported.
In addition to investigating each individual muscle separately, two additional analyses were performed to assess performance of radiological muscle estimates individualized to clinical onset-site to determine whether it was possible to increase sensitivity to detect group-level effects by individualizing damage measures to anatomical site of onset. First, for each subject, a single muscle was chosen to represent onset-site: tongue for bulbar-onset, right or left first dorsal interosseous for upper limb-onset and tibialis anterior for lower limb-onset (chosen because commonly clinically affected) [27]. These signal estimates were specified as a "muscle-of-onset" dependent variable, into a mixed effects regression model, entering time-point and subject as independent variables.
Second, for each subject, mean signal estimates were calculated from all muscles in the region-of-onset (tongue, trapezius and sternocleidomastoids for bulbar-onset; right or left deltoid, biceps, triceps, forearm compartment, first dorsal interosseous, thenar and hypothenar eminence for upper limb-onset; right or left psoas, gluteus maximus, quadriceps, hamstrings, tibialis anterior and gastrocnemius for lower limb-onset). These estimates were specified as a "region-of-onset" dependent variable, into a mixed effects regression model, entering time-point and subject as independent variables.
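The two onset-individualized summaries can be sketched as below. The muscle lists are transcribed from the text; the dictionary keys, the function names and the choice to average only the muscles that were actually measured are assumptions made for illustration. For limb-onset patients, the signals passed in are taken to come from the side of onset.

```python
from statistics import mean

# Muscle sets per onset site, transcribed from the text (key names are illustrative).
REGION_MUSCLES = {
    "bulbar": ["tongue", "trapezius", "sternocleidomastoid"],
    "upper_limb": ["deltoid", "biceps", "triceps", "forearm_compartment",
                   "first_dorsal_interosseous", "thenar", "hypothenar"],
    "lower_limb": ["psoas", "gluteus_maximus", "quadriceps", "hamstrings",
                   "tibialis_anterior", "gastrocnemius"],
}
MUSCLE_OF_ONSET = {
    "bulbar": "tongue",
    "upper_limb": "first_dorsal_interosseous",
    "lower_limb": "tibialis_anterior",
}


def muscle_of_onset(onset_site, signals):
    """Signal of the single muscle chosen to represent the onset site."""
    return signals[MUSCLE_OF_ONSET[onset_site]]


def region_of_onset(onset_site, signals):
    """Mean signal over the muscles of the onset region that were measured."""
    values = [signals[m] for m in REGION_MUSCLES[onset_site] if m in signals]
    if not values:
        raise ValueError("no region-of-onset muscles available")
    return mean(values)


# Hypothetical relative T2 estimates for a lower limb-onset patient.
example = {"psoas": 1.0, "gluteus_maximus": 1.2, "quadriceps": 1.4,
           "hamstrings": 1.6, "tibialis_anterior": 2.0, "gastrocnemius": 1.8}
```

Each summary would then be entered as the dependent variable in its own longitudinal model, as described above.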
To compare these different strategies for detecting longitudinal relative T2-signal change in individuals, plots for each patient were reported for the following measures, selected post-hoc: whole-body muscle summary mean, region-of-onset, muscle-of-onset and a single leg muscle (right tibialis anterior).
The responsiveness of each normally distributed longitudinal outcome measure was reported using standardized response means (mean change between baseline and 12 months divided by its standard deviation); values > 0.8 are considered highly responsive [28].
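The standardized response mean defined above can be computed as in this minimal sketch. It assumes paired baseline and 12-month values from the same patients and uses the sample standard deviation (n - 1 denominator); the data are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the paired changes divided by their standard deviation."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical scores for four patients at baseline and 12 months.
srm = standardized_response_mean([10, 12, 14, 16], [8, 9, 13, 12])
```

The sign simply reflects the direction of change; the magnitude is what is compared with the 0.8 threshold for high responsiveness.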
To quantify within-subject heterogeneity for each measure, variance ratios were reported, derived from regression model outputs, by dividing the variance of the regression model constant (the fixed effects, representing group-level disease effect) by the summed variance of the constant and residual variance (the random effects, representing interindividual variability). Lower values indicate greater relative within-group phenotypic variability.
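Read literally, the variance ratio is the variance of the model constant divided by the sum of that variance and the residual variance. The sketch below implements that reading; the argument names and example values are hypothetical, and extracting the two variance components from fitted model output is not shown.

```python
def variance_ratio(constant_variance, residual_variance):
    """Variance of the model constant as a fraction of the summed
    constant and residual variance."""
    total = constant_variance + residual_variance
    if total <= 0:
        raise ValueError("summed variance must be positive")
    return constant_variance / total


# Hypothetical variance components.
ratio_example = variance_ratio(1.0, 3.0)
```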
Median differences in ordinal MRC scores were assessed using Wilcoxon matching-pairs tests.

Baseline predictors of muscle weakness
To determine whether baseline relative T2-weighted muscle signal predicted development of weakness at four and 12 months, clinical change variables were generated by calculating MRC score differences (four and 12 months minus baseline, respectively). Each of these change variables was entered as the dependent variable in separate regression models with baseline relative T2 from the corresponding muscle group as the independent variable. This analysis was performed only in muscles with corresponding clinical and radiological data, namely splenius capitis, deltoid, biceps brachii, first dorsal interossei, psoas major, gluteus maximus, quadriceps, hamstrings, tibialis anterior, and gastrocnemius/soleus. To determine whether relative T2-signal in clinically strong muscles was associated with development of weakness, the analysis was repeated after excluding muscles with MRC score < 5/5; sample sizes for each muscle are reported in Table 1.
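The construction of the change variables and the subsequent regression can be sketched as follows. A univariable ordinary least-squares fit is assumed, since only baseline relative T2 is named as the independent variable; the function names and data are hypothetical.

```python
from statistics import mean


def mrc_change(baseline_scores, follow_up_scores):
    """Follow-up minus baseline MRC score for each patient."""
    return [f - b for b, f in zip(baseline_scores, follow_up_scores)]


def ols_slope_intercept(x, y):
    """Slope and intercept of a simple least-squares regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data for one muscle: baseline relative T2 and MRC scores.
baseline_t2 = [1.0, 2.0, 3.0, 4.0]
change_12m = mrc_change([5, 5, 5, 5], [5, 4, 3, 2])
slope, intercept = ols_slope_intercept(baseline_t2, change_12m)
```

A negative slope in this toy example means that higher baseline relative T2 goes with a larger subsequent drop in MRC score. Restricting the input to muscles with a baseline MRC score of 5/5 gives the repeated analysis in clinically strong muscles.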

Study population
Twenty-nine MND patients (26 ALS and 3 progressive muscular atrophy) and 22 healthy volunteers entered the study. Follow-up rates are reported in Fig. 2. No patients participated in any interventional research during the course of the study. There were no differences in age, gender and weight

Baseline differences between MND patients and controls
Radiological and clinical differences are reported in Table 1. There were significant differences in relative T2 signal but no significant differences in apparent diffusion coefficient (ADC) between patients and controls. Electrophysiological differences are reported in Table 2.

Clinical, electrophysiological and radiological associations
Associations between relative T2-weighted MR in each tested muscle with clinical power using hand-held dynamometry and MUNIX are reported in Table 3.

Longitudinal changes
Longitudinal clinical, electrophysiological and relative T2-signal changes in MND patients are reported in Tables 4 and 5.
There were no significant changes in healthy controls in any measure.
Individualized plots of longitudinal relative T2-weighted changes summarized for all muscles, by region-of-onset, by muscle-of-onset and for a single leg muscle (right tibialis anterior), are illustrated in Fig. 3.

Discussion
This study represents the most comprehensive longitudinal analysis of muscle-based clinical, electrophysiological and imaging biomarkers in MND to date, combining multimodal assessments across multiple muscles. The key result is that no single technique or muscle fully captured change at group level; different assessment tools were differentially sensitive in different muscles. We hypothesized that whole-body muscle imaging would capture widespread progression of denervation, but instead found that leg muscle changes were the most effective radiological biomarker in this cohort, regardless of clinical onset-site and, importantly, detected changes in slow progressors, an area of need for clinical trials. At baseline, clinical weakness was frequent in left abductor digiti minimi (ADM), bilateral first dorsal interosseous and bilateral psoas. Of these muscles, ADM weakness is perhaps surprising, because this muscle is generally considered relatively spared in MND, at least in terms of wasting (the basis of the split hand phenomenon), whilst involvement of first dorsal interosseous is typical [29]. Patients exhibited greater motor unit loss in abductor pollicis brevis than ADM at baseline, but MUNIX also dropped significantly in ADM over time, and this muscle appeared commonly affected in this cohort. In general, radiological changes were associated with clinical weakness more frequently than with electrophysiological motor unit loss, although associations with both were evident in tibialis anterior. Radiological increases in relative T2-signal likely reflect muscle fluid changes, and later fatty replacement [30], and appear a consistent finding in MND. Qualitative T2 changes have been reported in the tongue [12] and arm muscles [9,10], and quantitative changes in a small cohort in leg muscles [13].
In a very recently published paper, differences between MND patients and healthy controls were demonstrated in leg muscles on T2-weighted short tau inversion recovery imaging evaluated with rater scales, but there were no differences in quantitative fat fraction imaging in either the leg muscles or tongue [31]. In contrast to T2 signal, muscle volume changes appear modest [14,16,17]. Our data suggest that muscle relative T2-signal change may capture aspects of pathophysiology contributing to weakness other than loss of electrophysiological motor units. Associations between high baseline relative T2-signal and development of weakness in some muscles, even when clinically strong, suggest this may occur early, an intriguing finding that merits further investigation.
The difficulties of capturing change in MND with simple clinical measures, such as MRC scores, were illustrated in this study and highlight the challenges of phenotypical heterogeneity. Group-level longitudinal changes were detectable in first dorsal interosseous and tibialis anterior on dynamometry, muscles generally recognized as typically affected in MND [29,32], but this test is effort-dependent [4]; despite its known limitations, ALSFRS-R proved the most responsive longitudinal clinical measure in this study. This is likely to reflect the generally lower variance of ALSFRS-R compared to muscle T2 values outside the leg muscles, as illustrated in Tables 4 and 5, and the mortality-related attrition common to MND studies may also have biased the 12-month SRM estimate for ALSFRS-R. Muscle MR has some advantages over ALSFRS-R not captured by SRM estimates, namely objectivity, independence from potential confounds of therapeutic intervention, and assessment of pathophysiological effects rather than their symptomatic consequences. These assessment methods appear complementary. It is possible that a fully quantitative T2 relaxometry protocol could reduce the error variance and increase the responsiveness of the MR measurements, but this question cannot be answered by the present study.
On objective tests, progressive electrophysiological motor unit loss was evident, as in previous studies [33], especially in tibialis anterior and abductor pollicis brevis. Interestingly, there was only limited evidence of reinnervation on MUSIX, at baseline or longitudinally. We examined the strongest side in patients, which may indicate that MUSIX changes lag behind MUNIX, because subclinical or early motor unit loss had not yet triggered reinnervation. We also pooled weak and strong muscles which may have diluted overall differences in a relatively small cohort. Limitations of MUNIX/MUSIX are that patient effort is required, not all muscles are amenable to study, only relatively few can be assessed in a session, and "floor effects" exist. Our data suggest that tibialis anterior and abductor pollicis brevis represent good targets. Floor effects also exist for clinical measures, such as dynamometry and MRC scores. We did not adjust for this effect in our analysis (for example, by excluding patients with low MRC scores at baseline from further analysis). This could be explored in a larger, adequately powered cohort.
This was the first application of whole-body diffusion-weighted MR to assess muscle tissue integrity in MND. Very few changes were found, either because opposing effects of pathophysiological processes occurred or due to technical factors. It is possible that concurrent effects of myofibrillar cell membrane damage and increased intramuscular fluid increased diffusion, whilst consequent cellular debris and increased fat deposition caused a decrease, resulting in no detectable net ADC change. Alternatively, exponential signal intensity decay at high b values may have resulted in loss of signal. A previous study of muscle denervation in rats applied a lower b value of 600 s/mm² [34], compared to b = 1000 s/mm² used in this study. We conclude that T2-weighted muscle imaging approaches appear more sensitive to MND change than diffusion-weighted MR, at least using the parameters applied.
Leg muscles appear the best target for future fully quantitative T2 studies, although assessment in an independent cohort is necessary to determine whether this finding is generalizable. Whilst an increase in whole-body relative T2 was evident, this did not survive adjustment for multiple comparisons. Longitudinal changes were more readily detectable in the lumbar region, compared with cranial, cervical and thoracic body segments. This does not appear to be attributable to clinical factors; whilst lower limb-onset and progression were quite prevalent in our cohort, this was also the case for arm muscles. Technical factors may have contributed; leg muscles are larger, central within the acquisition field-of-view, with clearly defined anatomical boundaries, and these factors could influence the observed lower regression variance ratios. Measurement error might be reduced by developing fully automated analysis algorithms for whole-body MR in the future. Technical factors also prevented assessment of other muscles of interest, such as the diaphragm, which was not consistently identifiable using the slice thickness applied in this study. Thinner slices are possible but would necessitate longer scan-times.
Despite cohort attrition, typical of longitudinal cohort studies in MND, resulting in lower statistical power, longitudinal relative T2 changes from baseline were more marked at 12 than 4 months. In our previous study, which assessed this cohort to four months using different methodology, longitudinal changes were identified in tibialis anterior (and not in biceps brachii, thoracic paraspinals or the tongue) [17]. In the present analysis, similar results were found, despite a different methodology (assessing axial rather than coronal slices) performed by a different operator. Progressive denervation effects were again only found in leg muscles, including right tibialis anterior at both 4 and 12 months. Although the previously identified increase in relative T2 signal in left tibialis anterior did not reach statistical significance at 4 months in the present analysis (probably due to sampling differences), changes in this muscle were detectable at 12 months. Changes in dynamometry and electrophysiology were again evident in leg muscles. It is interesting to consider whether an MR "floor effect" exists, as for dynamometry and electrophysiology, when no further change is detectable because of complete paresis with absent motor potentials. This would require subgroup assessment in an adequately powered cohort.
A limitation of relative T2-weighted MR is the necessity to adjust measurements to reference tissue within each acquisition station to allow for differential coil-loading effects between participants, because the sequence is not fully quantitative. This could have biased between-muscle comparisons. We sought to minimize bias by reporting percentage T2-signal differences relative to healthy controls. Previous studies using similar sequences have applied qualitative grading scales and expert raters [9,10]. We argue that our approach reduces subjectivity and has the advantage of producing continuous data, but measurement variance will be higher than fully quantitative T2 techniques. Despite these potential limitations, a clear pattern of biologically and clinically feasible results was evident. These considerations illustrate the necessary trade-off between the number of muscles that can be studied concurrently and a feasible scan-time for disabled MND patients. For similar reasons, we could not collect corresponding clinico-electrophysiological data for all muscles investigated with MR, or combine our assessments with other promising muscle techniques, such as electrical impedance myography [35]. Nevertheless, our dataset still represents the most wide-ranging imaging and electrophysiological muscle assessment in MND to date. Our cohort demonstrated the heterogeneity in disease progression rates typical of the ALS population. It would be interesting to assess the utility of muscle biomarkers in a cohort of ALS patients selected for slow progression (> 0.9 ALSFRS-R points/month), where these measures would add most value, in a future study.
In summary, this longitudinal study is the first to demonstrate clinically and electrophysiologically relevant progressive muscle denervation on MR across a wide range of muscle groups over 12 months. Although we hypothesized that whole-body muscle MR would capture generalized changes, our data suggest that leg muscles are sensitive to detect group-level longitudinal changes, irrespective of clinical onset-site, and could represent a biomarker target for future quantitative studies. Relative T2-weighted MR appeared more sensitive to detect denervation than diffusion-weighted MR.