Introduction

Imminent clinical treatment trials for neuromuscular diseases (NMDs) [1, 2] need valid, sensitive and reliable treatment response measures [3]. Conventional outcome measures, including muscle strength, neurophysiology and functional assessment, are insufficiently sensitive [2]: NMDs typically progress slowly against a background of age-related changes [4], with therapies more likely to reduce progression than reverse established injury. There is a pressing need for outcome measures reflecting underlying pathological processes with demonstrable longitudinal sensitivity and applicability in multi-centre trials. Systematic assessment of feasibility, reproducibility and normal variation in healthy volunteers is a logical first step in establishing such measures.

Conventional magnetic resonance imaging (MRI) can delineate both acute and chronic muscle pathology: acute denervation [5] and inflammation [6] cause oedema-related T2-weighted hyper-intensity, typically in early disease and potentially reversible with treatment [7]. Chronic muscle damage, whether caused by a primary myopathy or secondary to a neuropathy, results in atrophy and fatty degeneration [8, 9], causing T1-weighted hyper-intensity, with patterns aiding NMD diagnosis [9].

Quantitative MRI can objectively measure these changes: on T2 relaxometry, muscle T2 is elevated in myotonic dystrophy [10], Duchenne muscular dystrophy [11], juvenile dermatomyositis [6] and amyotrophic lateral sclerosis [12], while skeletal muscle magnetisation transfer ratio (MTR) is decreased in limb girdle muscular dystrophy [13] and Charcot-Marie-Tooth disease [14]. Muscle fat content has been quantified by T1-relaxometry [15], proton spectroscopy (1H-MRS) [16–19], T2 relaxation modelling [20–22] and chemical-shift based Dixon fat-water separation [23], providing maps of the proportion of fat to water, or “fat fraction” (FF) [10, 22, 24–26].

While these reports support the validity of putative NMD MRI outcome measures, little has been published on potentially confounding age, gender or body mass dependencies [17, 25, 27], and while inter- or intra-observer reproducibility has been investigated [10, 15, 28], scan-rescan reproducibility has not been addressed systematically. These factors significantly influence trial statistical sensitivity [3]. Furthermore, studies seldom compare multiple MRI measures in the same subjects, and at both calf- and thigh-levels, having focused mainly on single measures [6, 13, 14, 21, 27, 29, 30], in either lower leg [10, 13, 21, 26] or thigh [6, 31], and generally in a single limb rather than bilaterally.

To establish practical NMD MRI trial outcome-measure protocols, we assessed in healthy volunteers a suite of MRI measures expected to be sensitive to NMD muscle pathology. We tested: reproducibility by quantifying scan-rescan and inter-observer reliability, internal consistency by comparing left- and right-limb values, external consistency by comparison with published data and sensitivity to healthy variation by measuring the dependence of lower-limb muscle T1, pseudo-T2, FF and MTR upon anatomical location and demographic factors, including sex, age and body mass.

Materials and methods

Subjects and MRI examination schedule

With local research ethics committee approval and written consent, 47 healthy volunteers (23 men) were studied: (mean ± SD, range) age 44.4 ± 17.0, 21.5-81.0 years; height 171 ± 9, 150-188 cm; weight 73 ± 16, 44-115 kg; body mass index 25 ± 4.7, 17-41 kg/m²; 15 undergoing repeat imaging after approximately 2 weeks with identical imaging parameters. The subjects, recruited from friends and family of patients participating in MRI research or the host institution staff, underwent clinical screening to exclude neuromuscular disease prior to examination.

MRI sequence selection

Four MRI measures were chosen for investigation according to their likely sensitivity to both acute (T2 and MTR) and chronic (T1 and fat fraction) muscle pathology. Specific measurement pulse sequences and parameters were selected on a pragmatic basis: we selected from standard pulse sequences widely available on routine imaging platforms, with imaging parameters chosen to facilitate accurate quantification. For the purposes of the present study it was necessary to obtain wide anatomical coverage of both limbs at thigh and calf level in a practical examination time: this necessitated certain compromises in the acquisition design, such as precluding the use of a Carr-Purcell-Meiboom-Gill multi-echo T2-measurement sequence.

MRI acquisition

Subjects were examined lying feet-first and supine at 3T (TIM Trio; Siemens, Erlangen, Germany) using a multi-channel peripheral angiography coil (PA Matrix; Siemens) and ‘spine matrix’ coil elements. Before examination, the distance between the anterior superior iliac spine and the superior border of the patella was measured and thigh-level imaging volumes were centred one-third of this distance above the patella superior border. Calf-level imaging volumes were centred on the point of widest lower leg circumference.

Axial-slice matrices and fields of view (FOVs) were 256 × 128 and 400 × 200 mm (410 × 205 mm in some subjects) for thigh-level images and 256 × 120 and 400 × 188 mm for calf-level images, except for FF acquisitions where matrices were 512 × 256 and 512 × 240 pixels respectively. In this healthy volunteer study, fat suppression was not applied in any of the measurements. The total acquisition time was less than 40 min and included the following sequences:

Fat fraction measurement

For Dixon FF measurement [23], three 2D gradient-echo acquisitions were performed with echo-times (TEs) (TE1/TE2/TE3 = 3.45/4.60/5.75 ms, TR = 100 ms, flip angle [α] = 10°, bandwidth [BW] = 420 Hz/pixel, number of excitations [NEX] = 4, 10 × 10-mm slices with 10-mm gap). The maps of the field error term, φ, generated as an intermediate step in Glover and Schneider’s decomposition algorithm [23], underwent phase unwrapping using the PRELUDE tool, which is part of the FSL software (FMRIB, Oxford) [32]. Each limb in the FOV was processed separately on a 2D individual-slice basis using the TE = 3.45 ms magnitude image as a threshold mask. The decomposed fat (F) and water (W) images were then used to calculate FF as FF = 100 % × F/(F + W). The TE = 3.45 ms image was used for region of interest (ROI) placement and as a reference for inter-method image registration using FLIRT (FSL, FMRIB, Oxford).
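The final step of this calculation can be sketched for a single pixel as follows. This is a minimal illustration, assuming the fat (F) and water (W) signals have already been separated by the decomposition; the function name `fat_fraction` and the input values are hypothetical and are not study data.

```python
def fat_fraction(fat, water):
    """Fat fraction in percent, FF = 100 % x F / (F + W).

    `fat` and `water` are the decomposed fat and water signal
    magnitudes for one pixel (hypothetical inputs).
    """
    total = fat + water
    if total <= 0:
        raise ValueError("F + W must be positive to compute a fat fraction")
    return 100.0 * fat / total


# Illustrative values only: a pixel with 2 units of fat and 98 of water.
ff_percent = fat_fraction(2.0, 98.0)
```

In practice the same expression would be applied pixel-wise within the threshold mask, so that background pixels with F + W near zero are not evaluated.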

T1-relaxometry

DESPOT-1 [33] T1-mapping used three 3D fast low-angle shot (3D-FLASH) images S1, S2 and S3 with nominal flip angles α1, α2 and α3 of 5, 15 and 25°, TR/TE = 23/3 ms, and BW = 440 Hz/pixel, acquired in a single, non-selective slab with 80 × 5-mm longitudinal phase-encoded partitions. Flip angles were corrected using B1 maps obtained as described under “B1 mapping” below, and T1 was calculated according to Deoni et al. [33].
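The variable flip-angle calculation can be sketched for one pixel as below. This is a hedged illustration rather than the processing code used in the study: it assumes the standard spoiled gradient-echo signal model, the usual linearisation S/sin α = E1 · S/tan α + M0(1 − E1) with E1 = exp(−TR/T1), and that B1 correction amounts to scaling each nominal flip angle by a relative B1 factor. The names `spgr_signal`, `despot1_t1` and `b1_scale` are hypothetical.

```python
import math


def spgr_signal(m0, t1_ms, tr_ms, alpha_deg):
    """Ideal spoiled gradient-echo signal (T2* decay ignored)."""
    e1 = math.exp(-tr_ms / t1_ms)
    a = math.radians(alpha_deg)
    return m0 * math.sin(a) * (1.0 - e1) / (1.0 - e1 * math.cos(a))


def despot1_t1(signals, nominal_alphas_deg, tr_ms, b1_scale=1.0):
    """Estimate T1 (ms) from signals acquired at several flip angles.

    Fits y = slope * x + intercept by least squares, where
    y = S / sin(alpha) and x = S / tan(alpha); the slope equals
    E1 = exp(-TR / T1). `b1_scale` is an assumed relative B1 factor
    (actual / nominal flip angle) for this pixel.
    """
    alphas = [math.radians(a * b1_scale) for a in nominal_alphas_deg]
    x = [s / math.tan(a) for s, a in zip(signals, alphas)]
    y = [s / math.sin(a) for s, a in zip(signals, alphas)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if not 0.0 < slope < 1.0:
        raise ValueError("fitted E1 outside (0, 1); T1 is undefined")
    return -tr_ms / math.log(slope)


# Noise-free synthetic signals (not study data) for T1 = 1300 ms.
flip_angles = (5.0, 15.0, 25.0)
synthetic = [spgr_signal(1000.0, 1300.0, 23.0, a) for a in flip_angles]
t1_estimate = despot1_t1(synthetic, flip_angles, tr_ms=23.0)
```

With noise-free synthetic input the fit recovers the simulated T1. The sketch also shows why uncorrected B1 error matters: passing the wrong `b1_scale` changes every flip angle used in the fit and therefore biases the estimated T1.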

T2-relaxometry

Dual-contrast turbo-spin-echo (TSE) images (TR/TE1/TE2 = 5,500/16/64 ms, 6,500/13/52 ms or 6,500/16/56 ms; 10 × 10-mm slices with 10-mm gap, parallel imaging factor (iPat) 2, TSE factor 4, BW = 444 Hz/pixel, refocusing flip angle 180°, NEX = 2) were acquired. Pseudo-T2 was calculated from the respective pixel intensities \( I_{TE1} \) and \( I_{TE2} \) from the TE1 and TE2 images as \( T_2 = \frac{TE_2 - TE_1}{\ln\left(I_{TE1} / I_{TE2}\right)} \).
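The two-point calculation follows from assuming mono-exponential decay between the two echoes, so that the ratio of intensities equals exp((TE2 − TE1)/T2). A minimal per-pixel sketch, with a hypothetical function name and synthetic intensities rather than study data:

```python
import math


def pseudo_t2(i_te1, i_te2, te1_ms, te2_ms):
    """Two-point pseudo-T2 (ms): (TE2 - TE1) / ln(I_TE1 / I_TE2)."""
    if i_te1 <= 0 or i_te2 <= 0:
        raise ValueError("intensities must be positive")
    if i_te1 <= i_te2:
        raise ValueError("no signal decay between echoes; T2 is undefined")
    return (te2_ms - te1_ms) / math.log(i_te1 / i_te2)


# Synthetic mono-exponential decay with T2 = 40 ms at TE = 16 and 64 ms.
i1 = 1000.0 * math.exp(-16.0 / 40.0)
i2 = 1000.0 * math.exp(-64.0 / 40.0)
t2_estimate = pseudo_t2(i1, i2, 16.0, 64.0)
```

Because only two echoes are used and no fat suppression is applied, the result is a single effective decay constant for the whole pixel, which is presumably why the text calls it a pseudo-T2.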

B1 mapping

Separate TSE images (TR/TE = 7,000/11 ms, 128 × 64 matrix, 40 contiguous 10-mm slices, BW = 429 Hz/pixel, 1/2 k-space sampling) yielded image intensities V1 and V2 acquired with nominal excitation α1 and α2 of 60° and 120°. B1 deviation was mapped according to \( B_{1\mathrm{Dev}} = \arccos\left(V_2 / 2V_1\right) / \alpha_1 \) [34].
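The double-angle relation can be sketched per pixel as follows. The sketch assumes that image intensity scales with the sine of the actual excitation angle and that the long TR gives full relaxation, so that V2/(2·V1) equals the cosine of the actual first flip angle; any effects of the refocusing train are ignored. The function name is hypothetical.

```python
import math


def b1_deviation(v1, v2, nominal_alpha1_deg=60.0):
    """Relative B1 (actual / nominal flip angle) by the double-angle method.

    v1 and v2 are pixel intensities acquired at nominal excitation
    angles alpha1 and 2 * alpha1.
    """
    if v1 <= 0:
        raise ValueError("V1 must be positive")
    ratio = v2 / (2.0 * v1)
    # Clamp so that noise cannot push the ratio outside arccos's domain.
    ratio = max(-1.0, min(1.0, ratio))
    actual_alpha1 = math.acos(ratio)
    return actual_alpha1 / math.radians(nominal_alpha1_deg)


# Synthetic pixel (not study data) where the actual angle is 90 % of nominal.
actual = math.radians(0.9 * 60.0)
b1_estimate = b1_deviation(math.sin(actual), math.sin(2.0 * actual))
```

A value of 1 indicates that the nominal flip angle was achieved, while values below 1 correspond to the regions of reduced B1 described in the Results.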

Magnetisation transfer ratio

MTRs were calculated from two 3D-FLASH images with (M1) and without (M0) an MT pre-pulse (500° amplitude, 1,200 Hz offset, 10 ms duration) (TR/TE = 65/3 ms or 68/3 ms, α = 10°, BW = 440 Hz/pixel, iPat = 2, 40 × 5-mm longitudinal phase encoding partitions) according to MTR = (M0 − M1)/M0 × 100 percentage units (p.u.). MTR maps were RF-inhomogeneity corrected using B1 maps obtained as described in “B1 mapping” above according to [35] using a mean-over-all-subjects B1 inhomogeneity correction factor of k = 0.0085.
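The uncorrected MTR calculation can be sketched per pixel as below. The B1 correction of [35] is deliberately not reproduced, since its exact form is not given in the text; the function name and input values are hypothetical.

```python
def mtr_percent_units(m0, m1):
    """Magnetisation transfer ratio in p.u.: (M0 - M1) / M0 x 100.

    m0 is the intensity without the MT pre-pulse and m1 the intensity
    with it. No B1 inhomogeneity correction is applied here.
    """
    if m0 <= 0:
        raise ValueError("M0 must be positive")
    return (m0 - m1) / m0 * 100.0


# Illustrative intensities only: a 32 % signal reduction with the pre-pulse.
mtr = mtr_percent_units(100.0, 68.0)
```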

ROI analysis

A single observer (A.F.; a radiologist with 4 years’ post-specialist experience in neuromuscular imaging) defined ROIs outlining the cross-sectional area of each muscle avoiding contamination with fascia or subcutaneous and inter-muscular fat and allowing for minor movement between acquisitions, using ITK-SNAP [36]. The fifth-most superior slice was used in the thigh and the sixth slice in the calf, unless muscles below were not visible, in which case an adjacent slice was selected. In the 15 subjects with repeated imaging, ROIs for the second acquisition were drawn on the slice most similar to that used from the first acquisition.

Left and right limb ROIs were defined for the rectus femoris, vastus lateralis, vastus intermedius, vastus medialis, semimembranosus, semitendinosus, biceps femoris, adductor magnus, sartorius, gracilis, tibialis anterior, peroneus longus, lateral gastrocnemius, medial gastrocnemius, soleus and tibialis posterior muscles (Fig. 1a). The ROIs were transferred to the co-registered parameter maps, minor position adjustments to account for imperfect registration were performed as necessary and the mean value for each muscle ROI was recorded. To provide summary measures, the mean of all individual-muscle ROI-means for each subject was calculated for each measure separately at thigh and at calf level. To assess inter-observer reliability, a second observer (J.M.; a neurologist with 3 years’ experience in neuromuscular imaging) independently defined ROIs using the same method on one acquisition from each of the 15 subjects with repeat examinations. Image data were inspected visually and ROI values originating from areas of gross artefact were excluded from the analysis.

Fig. 1

Sample images from a single volunteer (a 24-year-old man, both thighs and calves). a Unprocessed Dixon acquisition (TE = 3.45 ms) used for definition of ROIs demonstrated on left thigh and calf. b B1 field map demonstrating reduced B1 anteriorly on right and posteriorly on left (arrows). All images are axial with standard orientation (anterior at top of image, subject’s right hand side at left of image). ROI labels in the thigh: RF rectus femoris, VM vastus medialis, VI vastus intermedius, VL vastus lateralis, Sa sartorius, SM semimembranosus, ST semitendinosus, BF biceps femoris (long head), AM adductor magnus, G gracilis. ROI labels in the calf: TA tibialis anterior, TP tibialis posterior, PL peroneus longus, So soleus, MG medial head of gastrocnemius, LG lateral head of gastrocnemius

Statistics

Using SPSS 18 (SPSS, Chicago, IL), inter-muscle differences were assessed using ANOVA with post hoc comparisons using Bonferroni’s method. Inter-scan and inter-observer overall mean value differences were assessed using paired t-tests and reproducibility determined as mean absolute inter-scan and inter-observer differences, displayed on Bland-Altman plots with calculation of limits of agreement [37] and intra-class correlation coefficients (ICCs). Multivariate regression assessed the influence of demographic factors (age, gender, weight, height) on MRI measures: height showed no independent correlation with any MRI measure and was therefore excluded from the model. Pearson’s correlation coefficients between MRI measures were calculated.
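The agreement statistics named here can be sketched for paired measurements as follows. This is an illustration in Python rather than the SPSS analysis used in the study; it assumes the conventional Bland-Altman definition of the 95 % limits of agreement as the mean difference ± 1.96 times the sample standard deviation of the differences. The function name and the input values are hypothetical and are not study data.

```python
import statistics


def bland_altman(first, second):
    """Bias, 95 % limits of agreement and mean absolute difference.

    `first` and `second` are paired measurements of the same quantity,
    for example scan 1 and scan 2, or observer 1 and observer 2.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "mean_abs_diff": statistics.mean(abs(d) for d in diffs),
    }


# Made-up paired values for four subjects, for illustration only.
scan1 = [1.0, 2.0, 3.0, 4.0]
scan2 = [1.1, 2.1, 2.9, 4.3]
agreement = bland_altman(scan1, scan2)
```

For an individual subject, a change between two examinations that falls outside the interval from `lower` to `upper` would be unlikely to arise from measurement variation alone.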

Results

Data quality

The number of images excluded from the analysis was small: nine data-sets were missing or technically non-analysable: FF—thigh 1, calf 1; T1—thigh 2, calf 4; T2—none, MTR—calf 1. In the remaining data, small fractions of individual ROIs were excluded due to local artefact, mostly B1-related signal drop-out: FF—thigh 1.7 % (16/920), calf 2.4 % (13/540); T1—thigh 24 % (219/900), calf 12 % (57/492); T2—thigh 5.4 % (51/940), calf 0.2 % (1/552), MTR—thigh 15 % (142/920), calf 5.2 % (28/540). In all subjects, asymmetric B1 deviations were observed (Fig. 1b) with B1 reduced anteriorly on the right and posteriorly on the left. This was evident at the calf level but more prominent in the thigh, particularly affecting the right rectus femoris and vastus medialis. This artefact prevented measurement within right rectus femoris in 45/47 T1 maps, 41/47 MTR maps and within right vastus medialis in 35/47 T1 maps and 33/47 MTR maps.

Individual muscle values

MRI parameter maps from a representative subject are depicted in Fig. 2. Individual muscle values for each MRI measure in all 47 subjects are shown in Fig. 3. FF and T2 were similar in the left and right limbs, suggesting asymmetric B1 variations did not unduly influence these measures. Between muscles, FF differed significantly (ANOVA, p < 0.001 at both calf- and thigh-level). Group-mean sartorius FF was higher than all other thigh-level muscles (p < 0.01 for semimembranosus, p < 0.001 for all other muscles), whilst the rectus femoris FF was lower than most other thigh muscles (p < 0.01 vs gracilis, vastus lateralis; p < 0.001 vs sartorius, semimembranosus, biceps femoris and adductor magnus). Similarly, in the calf, soleus FF was highest (p < 0.05 vs peroneal, p < 0.01 vs medial gastrocnemius, p < 0.001 vs each remaining muscle), whilst tibialis anterior FF was the smallest (p < 0.01 vs medial and lateral gastrocnemius, p < 0.001 vs soleus and peroneal). However, the absolute inter-muscle differences were small, with FF ranging from 0.6 % in the rectus femoris to 2.9 % in the sartorius. Inter-muscle T2 differences were also significant (ANOVA, p < 0.001 at both calf and thigh-level), with the same muscles (sartorius, semimembranosus and biceps femoris in the thigh; soleus, peroneal in the calf) showing elevated T2 as elevated FF. Whilst tibialis posterior and tibialis anterior T2 times were lowest in the calf, consistent with their low FF, gracilis T2 was lowest in the thigh despite this muscle’s intermediate FF.

Fig. 2

Sample quantitative maps from a single volunteer (a 24-year-old man, left thigh and calf). a Fat fraction map (in %). b T1 map (in ms). c T2 map (in ms). d MTR map (in p.u.). All images are axial with standard orientation (anterior at top of image, right hand side at left of image)

Fig. 3

Individual muscle ROI values at thigh and calf levels for 47 subjects. Bars indicate the median (50th centile) and the 25th and 75th centiles; blue, left limb; green, right limb; lines, range; o, minor outlier; *, major outlier. MTR magnetisation transfer ratio, p.u. percentage units

MTR showed apparent left-right differences in some regions with lower values for right tibialis anterior, right rectus femoris and left semimembranosus, corresponding to the areas of maximum B1 deviation. Excepting these ROIs, MTR was similar across all thigh and calf muscles (range, 31.7-33.2 p.u.). Mean T1 similarly varied between left and right limbs in these muscles suggesting incomplete B1 inhomogeneity correction, but was otherwise consistent across the remaining muscles (1,240-1,370 ms).

Scan-rescan and inter-observer reliability

Scan-rescan reliability values are shown in Table 1, with inter-observer reliability in Table 2. Mean values are shown for both summary measures and individual-muscle ROI values, together with scan-rescan and inter-observer ICCs and limits of agreement for both. ICCs were 0.84-0.99 for inter-observer and 0.62-0.99 for scan-rescan values, and were generally higher for the summary measures than for the individual muscle values. The limits of agreement were consistently narrower for overall mean values and inter-observer comparisons than for individual ROI values and inter-scan comparisons. The limits of agreement were broadly similar when each muscle was analysed separately (ESM Table 1).

Table 1 Inter-scan reliability of MRI measurements from ROIs defined by a single observer for both summary measures and individual muscle ROI values
Table 2 Inter-observer reliability of MRI measurements from identical source images for both summary measures and individual muscle ROI values

Dependence upon age, gender and weight

Results of multivariate linear regression modelling the MRI measures at each level against the assumed explanatory variables age, gender and weight are shown in Table 3 for the all-muscle summary measures, and for individual muscles in ESM Table 2. There were significant positive dependencies of both FF and T2 upon age at both anatomical levels, and upon weight in the thigh but not calf. MTR showed strong negative dependence upon age (p < 0.001) for both thigh and calf (see also Fig. 4, illustrating the univariate Pearson correlation between overall muscle mean MTR and age), and significant correlation with weight and notably gender in the thigh. T1 did not depend significantly upon any demographic parameter, except for an association with weight in the thigh only (p < 0.05). Although FF correlated positively with T2, and negatively with T1 and MTR (Table 4), the MTR-age correlation remained significant when the other quantitative parameters were included as covariates (p < 0.01 thigh, p < 0.001 calf). We also constructed multivariate linear regression models for individual muscles (ESM Table 2), most consistently demonstrating positive correlations between FF or T2 and weight in the thigh, and negative correlations between MTR and age/gender/weight in the thigh, and age in the calf.

Table 3 Multivariate regression analysis of the dependence of mean muscle MRI measures in thigh and calf upon demographic factors in healthy volunteers
Fig. 4

Overall mean thigh (×) and calf (+) MTR is negatively correlated with subject age (p < 0.001)

Table 4 Pearson correlation coefficients between quantitative parameters in individual muscles

Discussion

We demonstrated the reproducibility of 3T MRI lower limb muscle T1, T2, MTR and FF obtained using routinely available acquisition sequences suitable for deployment in NMD treatment trials. With the exception of T1 and MTR in areas of poor B1 homogeneity, we obtained literature-consistent measurements with good internal consistency, and demonstrated dependence upon specific muscle compartment, age and weight in healthy individuals. Since changes in these measures with muscle disease are expected to far exceed the variations in health we report, combinations of these measurements targeted to disease-specific anatomical levels may offer robust trial outcome measures sensitive to pathological change.

Inter-muscle variation and comparison with previous studies

We observed small but significant inter-muscle T2 and FF differences, including hamstring FF exceeding quadriceps FF [22], and increased soleus T2 compared with tibialis anterior, consistent with previous results [22, 25, 26, 38] attributed to differing proportions of type 1 muscle fibres [39] with increased intra-myocellular lipid [38]. For outcome assessment, this anatomical specificity far exceeds that provided by non-imaging outcome measures such as myometry [40] and neurophysiology [41]. Excepting those muscles for which B1 deviations were too severe for effective correction, MTRs were consistent with previous calf-muscle studies [13, 25]. All measurements showed good left-right internal consistency except T1 and MTR in areas of maximum B1 variation where correction was impossible or proved inadequate.

Reproducibility

The inter-scan limits of agreement provide a measure of sensitivity to detect meaningful change; e.g. at the thigh level, a change in the overall mean FF, T2, T1 or MTR of +0.28 %, +1.8 ms, -39 ms or -1.63 p.u. is a significant change at the 95 % level for an individual subject. Rates of change of these with specific NMD progression will be confirmed in future natural history studies, but the detectable change thresholds our data suggest are small compared with cross-sectional disease-dependencies [10, 14, 15, 21, 22] and are in the range of 1-year changes in oculopharyngeal muscular dystrophy [42].

Inter-scan differences exceeded inter-observer differences as a source of variation, the former potentially driven by small scan-scan position inconsistencies. Compliance with a predefined positioning protocol could improve scan-scan consistency [43]. Mean all-muscle summary measures provide superior reliability to individual muscle measures, an approach which would be appropriate in NMD with diffuse rather than specific muscle involvement.

Rather than assessing scan-scan reproducibility in the same session [17], a 2-week rescan interval was chosen to better simulate clinical trial conditions whilst being short enough that a true underlying physiologically-driven change in muscle MRI properties was unlikely. We did not explicitly check for factors such as recent exercise [29, 44] or diet [19], known to influence muscle T2 and fat content respectively. Nevertheless, high reproducibility and the ability to demonstrate subtle age, weight and gender dependencies suggest that, in practice, metabolic perturbations due to typical exercise and diet regimes are small. Thus, these factors are unlikely to confound quantification of muscle pathology, an observation important for experimental trials where such factors may be hard to control.

Age, gender and body-weight dependencies

Correlation of candidate MRI measure values with age, weight and gender is important, firstly, as such factors provide plausible surrogates for disease-related changes, usefully evidencing potential outcome measure validity. Conversely these dependencies, if severe, may confound imaging assessment of outcome by masking changes due to disease. In our healthy volunteers, consistent with age-related impaired muscle strength and neurophysiological performance [4, 45], muscle MTR reduced while T2 and FF increased with age in both thigh and lower leg muscles. Schwenzer et al. [25] also demonstrated increases in calf-level FF and T2 in older subjects, but not MTR. Our contrasting MTR observation may be due to acquisition condition differences, or the advantage of performing B1 correction [35] in our study. MTR was the measure most sensitive to demographic factors, the negative correlation with age being highly significant (p < 0.01) for both overall means, and many individual muscles. The correlation remained significant in a model with T2 and FF included as covariates, suggesting an MTR age-dependence independent of age-related muscle lipid increases, presumably reflecting myofibre quality and density changes. Future studies involving fat-suppressed or IDEAL-based measurement [46] may conclusively identify muscle-tissue water variations independent of lipid content change.

The significant associations between FF, T2 and MTR with weight, and also between MTR and gender, in the thigh, none of which were observed in the calf-level muscle groups, presumably reflect preferential lipid accumulation in the thigh. These quantitative imaging findings are consistent with muscle lipid increases with weight [17, 18] but not gender [27] on 1H-MRS. In any case, these demographically driven differences are smaller than the expected pathological changes in NMDs, and thus too small to act as a significant confound in longitudinal studies. This is in contrast with the typically wide variation present in the healthy population for neurophysiology and myometry outcome measures.

Feasibility/study limitations

To allow for straightforward application in future multi-site trials, we chose to test sequences readily implemented on standard MRI systems with unmodified software, and which can provide reasonable anatomical coverage in practical examination times. This necessarily limited the measurement sophistication, e.g. multi-echo T2 measurement sequences allowing analysis of multiple T2 decay components [29] did not meet the criteria of ready availability and anatomical coverage versus acquisition time. Nevertheless the sequences chosen were adequate to provide sensitive and reproducible measures of FF, T2 and MTR relevant to muscle pathology.

A challenge in lower-limb quantitative MRI is the inherent B1 inhomogeneity, particularly at field strengths of 3T and higher. While the dual-contrast TSE T2-relaxometry and Dixon FF measurements used here were reasonably insensitive to this, even with B1-correction MTR and T1-relaxometry data were compromised in regions of maximum B1 deviation. Despite this we were able to demonstrate strong muscle-MTR dependencies upon age, weight and gender. In this study, T1 was the least reproducible measure, the least sensitive to demographic variations, and did not add explanatory power for these factors. We conclude that lower-limb muscle T1 obtained using the DESPOT-1 relaxometry method may not be useful as an NMD outcome measure.

Although the T1, T2, FF and MTR values and healthy variations we present provide useful reference data to guide the design of future NMD MRI acquisition protocols, the specific absolute values obtained may be partially dependent on sequence design details and field strength. Quality control to ensure consistent inter-site measurement values will be an important first stage in the design of multi-centre trials incorporating MRI outcome measures. The reproducibility and sensitivity to healthy variations we obtained strongly support the potential applicability of these MRI measures to assess longitudinal disease progression.

We demonstrated the feasibility of performing a comprehensive range of MRI measurements at two anatomical levels in both lower limbs. In certain patients such measurements may not provide suitable outcome measures if pathological involvement is minimal, or has already progressed to an end-state severity at these levels. Whole-body muscle MRI applications are increasingly being used for diagnostic purposes [47], and obtaining normative data from all skeletal muscle regions will be a priority in future studies. Natural history studies will identify the anatomical levels where disease progression is actively evolving in specific patient groups, allowing optimally efficient, anatomically-targeted protocols to be tailored to specific trial applications. The resulting reduction in required examination times may be crucial for harmonised use in future multiple-site trials, since long-duration acquisitions may represent a problem in NMD patients with, for example, cardiac or pulmonary involvement.

Conclusions

Lower-limb muscle T2, FF and MTR measures may be obtained using readily implemented methods with sufficient reliability and sensitivity to detect subtle dependencies in health upon biological factors including muscle compartment, age, weight and gender. The observations provide strong suggestive evidence that quantitative MRI can provide practical, anatomically specific outcome measures with less potentially confounding inter-subject variation than current non-imaging measurements.