Automatic quantification of tenosynovitis on MRI of the wrist in patients with early arthritis: a feasibility study

Objectives: Tenosynovitis (inflammation of the synovial lining of the sheath surrounding tendons) is frequently observed on MRI of early arthritis patients. Since visual assessment of tenosynovitis is a laborious task, we investigated the feasibility of automatic quantification of tenosynovitis on MRI of the wrist in a large cohort of early arthritis patients.
Methods: For 563 consecutive early arthritis patients (clinically confirmed arthritis ≥ 1 joint, symptoms < 2 years), MR scans of the wrist were processed in three automatic stages. First, super-resolution reconstruction was applied to fuse coronal and axial scans into a single high-resolution three-dimensional image. Next, 10 extensor/flexor tendon regions were segmented using atlas-based segmentation and marker-based watershed. A measurement region of interest (ROI) was defined around the tendons. Finally, tenosynovitis was quantified by identifying image intensity values associated with tenosynovial inflammation using fuzzy clustering and measuring the fraction of voxels with these characteristic intensities within the measurement ROI. A subset of 60 patients was used for training and the remaining 503 patients for validation. Correlation between quantitative measurements and visual scores was assessed through the Pearson correlation coefficient.
Results: Pearson correlation between quantitative measurements and visual scores across 503 patients was r = 0.90, p < 0.001. False detections due to blood vessels and synovitis present within the measurement ROI contributed to a median offset from zero equivalent to 13.8% of the largest measurement value.
Conclusion: Quantitative measurement of tenosynovitis on MRI of the wrist is feasible and largely consistent with visual scores. Further improvements in segmentation and exclusion of false detections are warranted.
Key Points
• Automatic measurement of tenosynovitis on MRI of the wrist is feasible and largely consistent with visual scores.
• Blood vessels and synovitis in the vicinity of evaluated tendons can contribute to false detections in automatic measurements.
• Further improvements in segmentation and exclusion of false detections are important directions of future work on the path to a robust quantification framework.


Introduction
Initiation of treatment in the early stages of rheumatoid arthritis (RA) has been associated with higher chances of drug-free sustained remission and improved quality of life [1]. Therefore, it is important to recognize patients who are at risk of progressing to RA as early as possible, either in the symptomatic phase of arthralgia, which precedes clinical arthritis, or in the earliest phases of clinically detectable arthritis. Recent studies suggest that MRI-detected inflammation can aid this task [2][3][4], especially in combination with serological markers [2]. Among the different types of inflammation observed on MRI of hands and wrists, it has been shown that tenosynovitis (inflammation of the synovial lining of the sheath surrounding tendons) is independently predictive of RA development, both in patients presenting with early arthritis and with arthralgia [2][3][4][5]. In addition, changes in MRI-detected tenosynovitis may be of interest in treatment response evaluation.
Assessment of tenosynovitis on MRI is commonly done according to the scoring method of Haavardsholm et al [6], in which a reader examines multiple tendon regions and estimates the thickness of peritendinous effusion or synovial proliferation with contrast enhancement. This is a laborious task, which requires the availability of trained, experienced readers. Automating the evaluation of tenosynovitis could offer standardized, high-precision measurements derived directly from the image data and alleviate the time burden and cost associated with visual scoring. To date, limited research is available on this topic. Bowes et al have published a conference abstract on quantifying change in tenosynovitis over time in 34 RA patients receiving treatment [7], but data on single-time-point validation of these quantitative measurements with respect to visual scores are not publicly available.
In a recent study, we developed an automatic framework for measuring bone marrow edema (a strong predictor of radiographic progression in RA patients [8]) on MR images of the wrist [9]. In the work presented here, we sought to extend that framework to measure tenosynovitis of the extensor and flexor tendons of the wrist. Our aim was to investigate the feasibility of tenosynovitis quantification and assess the correlation between quantitative measurements and visual scores in a large cohort of early arthritis patients.

Materials and methods
Patients
A total of 563 early arthritis patients consecutively included in the Leiden Early Arthritis Clinic cohort [10] were studied. Mean age (±SD) was 54.9 (± 15.4) years; 350 patients (62.2%) were female. Inclusion required clinically confirmed arthritis by physical examination in ≥ 1 joint and symptom duration < 2 years. The cohort study was approved by the medical ethics committee of Leiden University Medical Center (Leiden, The Netherlands). All participants provided written informed consent.

MRI scanning and visual scoring
The wrist joint of the most painful side (or the dominant side in cases of equally severe symptoms on both sides) was scanned with a 1.5T extremity MR scanner (GE Healthcare) using a 100-mm coil, with contrast enhancement and frequency-selective fat saturation (T1-Gd). Table 1 summarizes the acquisition parameters. In line with the definitions proposed by Haavardsholm et al [6], tenosynovitis was evaluated in six extensor compartments and four flexor regions within the wrist joint (Fig. 1). Visual scoring was independently performed by two trained readers blinded to clinical data. For each anatomical region, the readers provided a grade on a 0-3 scale based on the estimated maximum width of peritendinous effusion or synovial proliferation with contrast enhancement, as follows: grade 0, normal; grade 1, < 2 mm; grade 2, ≥ 2 mm and < 5 mm; grade 3, ≥ 5 mm. The scoring region was bounded by the distal radius/ulna proximally and the hook of the hamate distally. The intra-reader intra-class correlation coefficients (ICCs) of the two readers for the total tenosynovitis score (sum across all tendon regions), based on 40 MRIs scored twice, were 0.99 and 0.83. The inter-reader ICC for the total tenosynovitis score, based on all 563 MRIs, was 0.87. In what follows, the mean score of the two readers was always considered.

Super-resolution reconstruction
The coronal and axial MR scans complement each other in terms of anatomical detail, since the slice thickness in each of the scans (2 mm in coronal; 3 mm in axial) is much larger than the in-plane spacing between voxels (~0.2 mm). In order for a quantitative framework to make use of all available image data in a compact and efficient manner, it is desirable to fuse the two scans into a single 3D image using super-resolution reconstruction (SRR). The application of SRR to MR images of the wrist has been detailed in our previous work [9]. We applied the method of Poot et al [12] with Laplacian regularization (λ = 0.05).
Table 1 (caption): Acquisition parameters of T1-weighted fast spin-echo sequences with frequency-selective fat saturation obtained after intravenous injection of Gd-chelate (gadoteric acid, Guerbet, Paris, France, standard dose of 0.1 mmol/kg).
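To make the idea of regularized SRR concrete, the following toy one-dimensional sketch in Python (NumPy) fuses two low-resolution "scans" with different sampling intervals (mimicking 2 mm and 3 mm slices on a 1 mm grid) by minimizing the data misfit plus a Laplacian smoothness penalty weighted by λ. This is a generic textbook formulation written for illustration only, not the method of Poot et al [12]; the block-averaging forward model and all function names are assumptions.

```python
import numpy as np


def block_average_matrix(n_hr, factor, offset=0):
    """Assumed forward model: each low-resolution sample is the mean of
    `factor` consecutive high-resolution samples, starting at `offset`."""
    rows = []
    for start in range(offset, n_hr - factor + 1, factor):
        row = np.zeros(n_hr)
        row[start:start + factor] = 1.0 / factor
        rows.append(row)
    return np.array(rows)


def laplacian_matrix(n):
    """Second-difference operator acting as the Laplacian regularizer."""
    lap = np.zeros((n - 2, n))
    for i in range(n - 2):
        lap[i, i:i + 3] = [1.0, -2.0, 1.0]
    return lap


def srr_1d(observations, operators, n_hr, lam=0.05):
    """Minimize sum_k ||A_k x - y_k||^2 + lam * ||L x||^2 via the normal equations."""
    lap = laplacian_matrix(n_hr)
    lhs = lam * lap.T @ lap
    rhs = np.zeros(n_hr)
    for a_k, y_k in zip(operators, observations):
        lhs += a_k.T @ a_k
        rhs += a_k.T @ y_k
    return np.linalg.solve(lhs, rhs)


# Toy example on a 60-sample high-resolution grid.
n = 60
a_cor = block_average_matrix(n, 2)           # "2 mm" sampling
a_ax = block_average_matrix(n, 3, offset=1)  # "3 mm" sampling on a shifted grid
x_true = np.linspace(0.0, 1.0, n)
x_hat = srr_1d([a_cor @ x_true, a_ax @ x_true], [a_cor, a_ax], n, lam=0.05)
```

A linear ramp has zero second difference and is reproduced exactly by both forward models, so it is the unique minimizer in this toy and is recovered exactly; for real signals, λ trades fidelity to the acquired scans against smoothness of the fused image.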

Measurement region of interest
The computation of the ROI required automatically segmenting the tendons, carpal bones, distal radius/ulna, and the image region bounded by skin. The bones and initial landmarks for the tendon regions were obtained using atlas-based segmentation [13]. The atlas consisted of 13 early arthritis patients (separate dataset, excluding patients evaluated visually and quantitatively in this study). For each atlas patient, the tendon regions and bones were manually segmented in the axial T1-Gd images and then extended to SRR space by nearest neighbor interpolation. After spatially mapping every atlas image onto the target image using the Elastix toolbox [14][15][16], a majority vote was applied across all mappings, determining whether a voxel would be labeled as one of the tendons, bones, or neither. It should be noted that all atlas images contained the right wrist joint. For segmentation of the left wrist, atlas images were horizontally mirrored prior to registration. Having obtained initial landmarks for the tendon regions, the tendons were segmented by a similar approach to Chen et al [17] using marker-based watershed segmentation [18][19][20], followed by removal of segmented regions whose intensity was > 75 (tendons are characterized by low image intensities on T1-Gd images) or whose volume was < 0.01 ml. An example of the resulting segmentations is shown in Fig. 2b.
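The label-fusion and post-processing steps of this paragraph can be sketched in Python with NumPy as follows. Registration (Elastix) and the marker-based watershed itself are deliberately left out: the inputs are assumed to be atlas label maps already warped into target space and a region map produced by some watershed implementation. Applying the intensity cut-off of 75 to the mean intensity of each region, the tie-breaking rule, and all function names are assumptions for illustration.

```python
import numpy as np


def mirror_volume(volume, lr_axis=-1):
    """Mirror a right-wrist atlas image or label map left-right.
    Which array axis is left-right depends on the data layout (assumption)."""
    return np.flip(volume, axis=lr_axis)


def majority_vote(propagated_labels):
    """Fuse atlas label maps already warped into target space.

    propagated_labels: integer array of shape (n_atlases, *volume_shape),
    with 0 = background. Ties resolve to the smallest label id.
    """
    stack = np.asarray(propagated_labels)
    n_labels = int(stack.max()) + 1
    votes = np.stack([(stack == k).sum(axis=0) for k in range(n_labels)])
    return votes.argmax(axis=0)


def filter_tendon_regions(regions, image, voxel_volume_ml,
                          max_mean_intensity=75.0, min_volume_ml=0.01):
    """Remove watershed regions that are too bright to be tendon
    (mean intensity above the cut-off) or smaller than the minimum volume."""
    kept = regions.copy()
    for region_id in np.unique(regions):
        if region_id == 0:
            continue  # background
        mask = regions == region_id
        too_bright = image[mask].mean() > max_mean_intensity
        too_small = mask.sum() * voxel_volume_ml < min_volume_ml
        if too_bright or too_small:
            kept[mask] = 0
    return kept
```

For a left wrist, each atlas image and label map would be passed through `mirror_volume` before registration, and the warped label maps stacked along a new first axis for `majority_vote`.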
In order to segment the image region bounded by skin, the entire image extent of the hand was approximated. First, the background was segmented by performing region growing with seeds placed at the four corners of each image slice. Then, the resulting binary image was inverted and the largest connected component was retained.
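A minimal sketch of this skin-bounded mask in Python with NumPy and SciPy is shown below. The text does not state the region-growing criterion, so the sketch assumes that a voxel joins the background when its intensity is at or below a hypothetical `background_threshold` and it is connected (4-connectivity in-plane) to a corner seed; `hand_mask` is an illustrative name.

```python
import numpy as np
from scipy import ndimage


def hand_mask(volume, background_threshold):
    """Approximate the image region bounded by skin.

    volume: array (n_slices, rows, cols). In each slice, the background is grown
    from the four corners (assumed criterion: intensity <= background_threshold).
    The result is inverted and the largest 3D connected component is kept.
    """
    background = np.zeros(volume.shape, dtype=bool)
    last_r, last_c = volume.shape[1] - 1, volume.shape[2] - 1
    for z in range(volume.shape[0]):
        labels, _ = ndimage.label(volume[z] <= background_threshold)
        seeds = {labels[0, 0], labels[0, last_c], labels[last_r, 0], labels[last_r, last_c]}
        seeds.discard(0)  # a bright corner does not seed any region
        background[z] = np.isin(labels, list(seeds))
    foreground, n_components = ndimage.label(~background)
    if n_components == 0:
        return np.zeros(volume.shape, dtype=bool)
    sizes = np.bincount(foreground.ravel())
    sizes[0] = 0  # ignore the background label
    return foreground == sizes.argmax()
```

Because only background connected to the corners is removed, dark structures enclosed by tissue (for example, fat-suppressed regions inside the wrist) remain part of the mask, while small disconnected bright spots are dropped by the largest-component step.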
Finally, for each segmented tendon, a distance transform was performed and voxels within a fixed distance (D) of the tendons were included in the measurement ROI as long as these voxels were not part of other labeled structures. The distal radius/ulna boundary of the ROI was determined by identifying the axial slice where the two bones were closest to each other. The hook of the hamate boundary was determined by searching for the axial slice with the largest number of segmented hamate voxels. An example of the resulting measurement ROI is shown in Fig. 2c. As detailed in the optimization section, the value of distance parameter D was obtained by maximizing correlation with visual scores on a training set of patients.
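The ROI construction and the two axial boundaries can be sketched in Python with SciPy's Euclidean distance transform as follows. Axis 0 is assumed to be the axial slice direction, distances are measured in millimetres through the `sampling` argument, and excluding the segmented tendon voxels themselves from the ROI is an assumption (the text only states that voxels of other labeled structures are excluded); all function names are illustrative.

```python
import numpy as np
from scipy import ndimage


def measurement_roi(tendons, other_structures, hand, spacing_mm, d_mm, z_range):
    """Boolean ROI: voxels within d_mm of any segmented tendon, inside the hand
    mask, not part of another labeled structure, and between the two axial
    boundary slices (inclusive). All masks are boolean arrays (z, y, x)."""
    dist = ndimage.distance_transform_edt(~tendons, sampling=spacing_mm)
    roi = (dist <= d_mm) & hand & ~tendons & ~other_structures
    z_lo, z_hi = sorted(z_range)
    roi[:z_lo] = False
    roi[z_hi + 1:] = False
    return roi


def hamate_boundary_slice(hamate):
    """Axial slice containing the most segmented hamate voxels."""
    return int(hamate.reshape(hamate.shape[0], -1).sum(axis=1).argmax())


def radius_ulna_boundary_slice(radius, ulna, spacing_yx):
    """Axial slice where the radius and ulna segmentations are closest."""
    best_z, best_dist = None, np.inf
    for z in range(radius.shape[0]):
        if not (radius[z].any() and ulna[z].any()):
            continue  # both bones must be present in the slice
        dist_to_radius = ndimage.distance_transform_edt(~radius[z], sampling=spacing_yx)
        gap = dist_to_radius[ulna[z]].min()
        if gap < best_dist:
            best_z, best_dist = z, gap
    return best_z
```

With D = 3 mm, the value selected during optimization, `measurement_roi` would be called with `d_mm=3.0` and `z_range` set to the pair of slices returned by the two boundary functions.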

Assessment of tendon segmentation accuracy
To assess the accuracy of tendon segmentation, a leave-one-out cross-validation was performed. In each of the 13 runs, 12 out of 13 atlas images would constitute the atlas set, and the remaining image would be used as the target image to be segmented. The result was validated against manual segmentation of the axial image. Segmentation accuracy was evaluated by computing precision and recall rates for each of the 10 tendon regions. Here, precision rate refers to the fraction of voxels segmented by the algorithm that overlap with the manual segmentation, while recall rate refers to the fraction of voxels within the manual segmentation that were correctly segmented by the algorithm.
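The precision and recall definitions above, together with the leave-one-out loop, can be sketched in Python with NumPy. The `segment_fn` callback is hypothetical and stands in for the full registration, label fusion, and watershed pipeline; returning NaN precision for an empty (failed) segmentation is a convention chosen for the sketch.

```python
import numpy as np


def precision_recall(auto, manual):
    """Precision = overlap / automatic voxels; recall = overlap / manual voxels.
    Returns NaN precision for an empty automatic segmentation."""
    auto = np.asarray(auto, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    overlap = np.logical_and(auto, manual).sum()
    precision = overlap / auto.sum() if auto.any() else float("nan")
    recall = overlap / manual.sum() if manual.any() else float("nan")
    return float(precision), float(recall)


def leave_one_out(atlases, segment_fn, region_ids):
    """atlases: list of (image, manual_label_map) pairs.
    segment_fn(target_image, atlas_subset) -> predicted label map (hypothetical
    callback). Returns one {region_id: (precision, recall)} dict per run."""
    results = []
    for i, (image, manual) in enumerate(atlases):
        atlas_subset = atlases[:i] + atlases[i + 1:]  # the other 12 of 13 atlases
        predicted = segment_fn(image, atlas_subset)
        results.append({k: precision_recall(predicted == k, manual == k)
                        for k in region_ids})
    return results
```

A failed segmentation of a region, such as those observed for extensor region III, shows up in this sketch as NaN precision with zero recall.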

Tenosynovitis quantification
Tenosynovitis is characterized by high signal intensity on T1-Gd (fat-suppressed) images due to contrast enhancement. Intensity values vary per acquisition, depending on the relative strength of contrast enhancement, the homogeneity of the fat suppression, and the inherent magnetic field inhomogeneities of the MR scanner. To account for these acquisition-specific intensity ranges of tenosynovitis, fuzzy C-means clustering [21,22] was applied to the intensity values of all voxels in each image, assuming two clusters. This yields two probability map images, where each voxel contains the probability of that location belonging to the respective cluster. Let C2 be the cluster whose center value is the higher of the two computed cluster centers. As Fig. 2d illustrates, high probabilities (bright voxels) within the C2 probability map correspond to locations of healthy synovial tissue. Since our focus is on regions of inflammation, where image intensity is expected to be higher compared to healthy synovium, voxels whose intensity was lower than the value of the C2 cluster center were removed, resulting in a one-sided C2 probability map.
Fig. 1 (caption, partially recovered): … (1); ulnar bursa, including flexor digitorum profundus and superficialis tendon quartets (2); flexor pollicis longus (tendon) in radial bursa (3); and flexor carpi radialis (4). Note: the flexor carpi ulnaris does not have a tenosynovial sheath; nevertheless, inflammation around this tendon is also observed, and therefore, enhancement of tissue surrounding this tendon is scored [11].
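The clustering step can be sketched in Python with NumPy as a plain two-cluster fuzzy C-means on voxel intensities, followed by zeroing the C2 membership below the C2 center. The fuzzifier m = 2, the random initialization, the stopping rule, and the function names are assumptions, since the text does not state them. Under these assumptions the C2 membership of a voxel above the C2 center falls monotonically from 1 toward 0.5 as its intensity rises, so a band of probabilities corresponds to a band of intensities above the healthy-synovium center.

```python
import numpy as np


def fuzzy_cmeans_1d(values, n_clusters=2, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means on scalar intensities.
    Returns (centers, memberships), memberships shaped (n_clusters, n_values)."""
    x = np.asarray(values, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0)  # memberships of each voxel sum to 1
    for _ in range(max_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)
        dist = np.fmax(np.abs(x[None, :] - centers[:, None]), 1e-12)
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    um = u ** m
    centers = um @ x / um.sum(axis=1)  # centers consistent with final memberships
    return centers, u


def one_sided_c2_map(image):
    """C2 (higher-center cluster) membership map, zeroed below the C2 center."""
    centers, u = fuzzy_cmeans_1d(image)
    c2 = int(np.argmax(centers))
    p_c2 = u[c2].reshape(image.shape)
    return np.where(image >= centers[c2], p_c2, 0.0), centers[c2]
```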
Tenosynovitis was then quantified by computing the fraction of voxels within the measurement ROI whose one-sided C2 probability values $p_{C2}$ satisfied $T_L \le p_{C2} < T_H$. As detailed below, the numeric values of the lower and upper thresholds ($T_L$, $T_H$) were optimized on a training set of patients to maximize correlation with visual scores.
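In formula form, the measurement is the count of ROI voxels v with $T_L \le p_{C2}(v) < T_H$ divided by the number of ROI voxels. A minimal Python sketch follows; the function name is illustrative, and returning NaN for an empty ROI is an assumption.

```python
import numpy as np


def tenosynovitis_fraction(p_c2_one_sided, roi, t_low, t_high):
    """Fraction of ROI voxels whose one-sided C2 probability lies in [t_low, t_high)."""
    roi = np.asarray(roi, dtype=bool)
    n_roi = np.count_nonzero(roi)
    if n_roi == 0:
        return float("nan")
    p = np.asarray(p_c2_one_sided)[roi]
    in_band = (p >= t_low) & (p < t_high)  # voxels counted as tenosynovitis
    return np.count_nonzero(in_band) / n_roi
```

For example, with the optimized thresholds 0.82 and 0.94 and one-sided probabilities [0.0, 0.85, 0.90, 0.95, 0.5] inside the ROI, two of five voxels fall in the band, giving 0.4.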

Optimization
In order to optimize the thresholds ($T_L$, $T_H$) and the distance parameter D based on correlation with visual scores, a training set of patients was defined. The number of patients with low tenosynovitis (grades 0 and 1) in our early arthritis cohort was much larger than the number of patients with moderate-to-severe tenosynovitis (grades 2 and 3). Therefore, random sampling of the cohort would not guarantee inclusion of patients with severe tenosynovitis in the training sample. In order to produce a more balanced training set representing the full range of tenosynovitis severity, we used a sampling approach similar to that of our study on bone marrow edema [9]. We categorized all 563 patients by the maximum visual score ($V_{max}$) across the scored tendon regions. Three sampling categories were defined, corresponding to three severity intervals within the $V_{max}$ range (0-3): $V_{max} = 0$, $0 < V_{max} \le 1$, and $1 < V_{max} \le 3$. Table 2 lists the defined categories and the number of patients that fall into each category. Next, 20 patients were randomly selected from each category to form a training set of 60 patients. The optimal distance and threshold values were found by computing the quantitative measurement for D = 1, 2, 3, 4, 5, and 6 mm and all possible combinations (step size 0.01) of ($T_L$, $T_H$), and determining which set of parameters maximized the Pearson correlation coefficient r between the total visual score of tenosynovitis (sum across all tendon regions) and the total quantitative tenosynovitis measurement.
Figure caption (partially recovered): Some of the voxels identified in neighboring flexor region 2 corresponded to a low-grade enhancement but were not picked up in visual scoring. Several blood vessels located within the ROI were also included, introducing a number of false detections that counted towards the quantitative measurement.
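The stratified sampling and the exhaustive parameter search can be sketched in Python with NumPy as follows. The `measure_fn` callback is hypothetical: it stands in for building the ROI at distance D and thresholding the one-sided C2 map for every training patient. Restricting the grid to $T_L < T_H$, skipping constant measurement vectors, and the seeded random generator are assumptions made for the sketch.

```python
import numpy as np
from itertools import product


def severity_category(v_max):
    """0: V_max = 0; 1: 0 < V_max <= 1; 2: 1 < V_max <= 3."""
    if v_max == 0:
        return 0
    return 1 if v_max <= 1 else 2


def stratified_training_set(v_max_per_patient, per_category=20, seed=0):
    """Randomly draw `per_category` patient indices from each severity category."""
    rng = np.random.default_rng(seed)
    categories = np.array([severity_category(v) for v in v_max_per_patient])
    chosen = []
    for c in (0, 1, 2):
        idx = np.flatnonzero(categories == c)
        chosen.extend(rng.choice(idx, size=per_category, replace=False).tolist())
    return sorted(chosen)


def grid_search(measure_fn, total_visual, distances_mm=(1, 2, 3, 4, 5, 6), step=0.01):
    """Exhaustive search for (D, T_L, T_H) maximizing Pearson r.

    measure_fn(d_mm, t_low, t_high) -> total quantitative measurement per
    training patient (hypothetical callback).
    """
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    best_r, best_params = -np.inf, None
    for d_mm in distances_mm:
        for t_low, t_high in product(grid, grid):
            if t_low >= t_high:
                continue  # only non-empty probability bands
            q = np.asarray(measure_fn(d_mm, t_low, t_high), dtype=float)
            if np.std(q) == 0:
                continue  # correlation undefined for constant measurements
            r = np.corrcoef(total_visual, q)[0, 1]
            if r > best_r:
                best_r, best_params = r, (d_mm, float(t_low), float(t_high))
    return best_r, best_params
```

With step 0.01 the grid has 101 values, so the search evaluates 5,050 threshold pairs for each of the six distances; precomputing the one-sided C2 map and the distance map once per patient keeps each evaluation cheap.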

Validation
After optimizing and locking the values of D, $T_L$, and $T_H$, the method was validated by computing the quantitative tenosynovitis measurement for the 503 patients that were not part of the training set and evaluating the Pearson correlation coefficient between the total visual score and the total quantitative measurement.

Statistical analysis
When assessing the Pearson correlation coefficient between visual scores and quantitative measurements, p values below 0.05 were considered to be statistically significant. Statistics were computed using MATLAB R2015b (The MathWorks, Inc.).

Results
Assessment of tendon segmentation accuracy
The median and interquartile range (IQR) of recall and precision rates of tendon region segmentation across 13 atlas images are shown in Fig. 3. Flexor regions exhibited high precision rates (median values ranging from 0.92 to 0.97) and moderate-high recall rates (median values ranging from 0.85 to 0.90). The rates were generally lower for extensor regions and exhibited more variability (median precision ranging from 0.78 to 0.94 and median recall ranging from 0.32 to 0.85). The lowest recall (including three failed segmentations) was observed for extensor region III.

Optimization
The highest Pearson correlation value (r = 0.93, p < 0.001) between the total visual score of tenosynovitis and the total quantitative measurement over the 60 training set patients was observed with distance parameter D = 3 mm and threshold values $T_L = 0.82$ and $T_H = 0.94$. As illustrated by the scatter plot in Fig. 4, increasing levels of tenosynovitis severity were fairly consistently matched with increasing values of the quantitative measurement. The measurements of patients with total visual score 0 had a median offset from zero of 0.04 (IQR 0.03-0.05), constituting 14.8% of the largest observed value of 0.27 for the most severely affected patients. Figure 2e shows an example of measurement ROI locations that were counted towards the quantitative measurement.

Validation
Having obtained the optimized parameter values, the quantitative measurement was computed for the 503 validation patients, and correlation was assessed. The resulting Pearson correlation coefficient was r = 0.90 (p < 0.001). The corresponding scatter plot shows that the majority of patients exhibited a consistent trend of increasing quantitative measurements with increasing levels of tenosynovitis severity. The measurements of patients with total visual score 0 had a median offset from zero of 0.04 (IQR 0.03-0.05) (same as in training), constituting 13.8% of the largest observed value of 0.29. Visual inspection of results indicated that blood vessels and synovitis present within the measurement ROI were often mistakenly counted as tenosynovitis by the quantitative measurement, increasing its numeric value. The strongly outlying case of a patient with visual score 0 and a quantitative measurement of 0.15 was caused by a failed tendon segmentation due to an unusually low intensity distribution of healthy synovium.
Table 2 (note): Random sampling across all categories would form a training set that mainly consists of patients with $V_{max} \le 1$. In contrast, randomly selecting 20 patients from category 2, for example, guarantees that the training set will include 20 patients in which at least one tendon region received a visual score greater than 1. Thus, random sampling from each severity category helps ensure that D, $T_L$, and $T_H$ are optimized with respect to the entire range of tenosynovitis severity.

Discussion
In this study, we investigated the feasibility of automatic quantification of tenosynovitis on MRI of the wrist in a large cohort of early arthritis patients. The presented method extended our previously developed atlas-based framework [9] to the extensor and flexor tendons of the wrist, providing the landmarks necessary for tendon segmentation and definition of the ROI in which tenosynovitis was measured. The results exhibited strong correlation between quantitative measurements and visual scores. Quantitative measurements should not be viewed as a replication of visual scoring, and therefore this study assessed consistency and correlation, rather than absolute agreement. The observed correlation is especially encouraging considering that there is an inherent degree of variability within visual scores due to the interval-based definition of the visual grades. These findings indicate that automatic quantification of tenosynovitis on MRI of early arthritis patients is feasible, and that quantitative measurements are largely consistent with visual scoring. However, this study also highlights several challenges pertinent to the quantification task, such as moderate segmentation performance and sources of false detections. As detailed in the following discussion, these are important issues that will need to be addressed on the path to a robust quantification framework. Interestingly, the overall moderate tendon segmentation recall rates did not seem to have a strong adverse effect on correlation between quantitative measurements and visual scores. This can be explained by the fact that even if a tendon is partially segmented, the measurement ROI around the segmentation is still likely to include the tendon's synovial lining.
Although the ROI will then also include voxels inside the tendon, on T1-Gd images tendons are characterized by low image intensities which do not contribute towards the inflammation measurement; one exception is enhancement due to concomitant tendinitis. It should be recognized, however, that in this study, we measured the total inflammation across all evaluated tendon regions, which may have reduced sensitivity to errors made at the individual region level. This is particularly relevant when considering the low recall rates for extensor region III. The 3/13 failed cross-validation cases indicate that reliable quantification of inflammation around this tendon was not always feasible. A likely reason for this is that extensor region III is the smallest of the 10 tendon regions and exhibits higher curvature, making the placement of atlas-based landmarks more challenging. One type of segmentation error that had no effect on total measurements was mislabeling of one tendon region as another; however, future studies must thoroughly assess mislabeling errors if evaluation of tenosynovitis at the individual region level is of interest. More generally, it should be noted that inaccuracies in tendon segmentation do affect the total number of voxels included in the measurement ROI and thereby introduce some variability in the quantitative measurement. Therefore, improving segmentation accuracy is an important direction of future work, both for measurement precision and for accurate evaluation of tenosynovitis at the individual region level.
As illustrated by Fig. 2e, locations counted towards the quantitative measurement did not always include all voxels within the inflammation, but most voxels along the boundary of the inflammation were typically included. One possible reason for this is that the threshold parameters ($T_L$, $T_H$) were optimized with respect to scores that reflect the maximum thickness of peritendinous effusion or synovial proliferation in each tendon region. Maximum thickness is not equivalent to total volume, and therefore, it is plausible that some voxels within the inflammation were not included in the measurement. Figure 2e also illustrates that one drawback of the current method is that blood vessels introduce false detections that contribute towards the quantitative measurement. This observation explains one of the factors behind the consistent offset from zero both during training and validation. Future improvements should include detection of blood vessels and their exclusion from the measurement ROI.
Visual inspection of quantification results indicated that synovitis present within the measurement ROI (for example, between carpal bones and tendons) was mistakenly counted as tenosynovitis by the quantitative measurement. In visual scoring, trained readers employ their expertise and pattern recognition to classify the observed inflammation as either synovitis or tenosynovitis. The presented method did not include such classification, and therefore, it is not surprising that it counted all inflammation detected within the measurement ROI as tenosynovitis. This is another contributing factor to the offset observed in training and validation. Since synovitis is often present in joints in the vicinity of tendons affected by tenosynovitis [11], a more specific definition of the measurement ROI is warranted.
In conclusion, the presented method provides a reference on the path to automatic quantification of tenosynovitis on MRI and lays out possible directions for future improvements. The common presence of tenosynovitis in RA and its association with RA development in arthralgia and early arthritis patients motivate the development of quantitative measurement techniques. These advances would aid clinical researchers by standardizing interpretation and allowing them to dedicate more resources to analysis rather than visual scoring, facilitating both research and potential clinical implementation.