Introduction

Osteoarthritis (OA) is a common chronic medical condition in older adults, leading to progressive structural damage of the joints and subsequent functional disability [1–3]. Although the hand is frequently involved in OA patients, leading to pain and impaired hand function, osteoarthritis research is focused predominantly on the hip and knee [4–7]. Besides patient-reported outcomes (i.e., pain, physical function and patient global assessment), progressive decrease of radiographic joint space width (JSW) is an important parameter in clinical trials on hand OA [8–10]. Hand radiographs are commonly used to diagnose and monitor hand OA because of their wide availability and relatively low cost. Semi-quantitative methods with standard atlases are the bench tools used by clinicians to determine changes in JSW [11–13]. Although scoring of radiographic OA features by these methods is widely performed [14, 15], reproducibility is limited by the difficulty of standardizing the scoring between different readers. The use of an ordinal scale also limits measurement accuracy, which could be improved by assessing structural damage on a continuous metric scale [16].

At the moment, only symptomatic treatment is available [17], and the development of new structure modifying trials on hand OA is hampered by limitations in outcome measures [18]. Since OA is a slowly progressive disease, an accurate and reproducible method is needed to detect subtle changes throughout follow-up, especially when evaluating new therapies.

Currently, radiography is usually digital, facilitating implementation of computerized quantitative JSW measurements. Different software tools have been developed to measure JSW in radiographs [19–25]. These are mainly semi-automatic tools requiring manual detection of articular margins or joint space by the user. A newly developed JSW quantification method automatically detects the interphalangeal (IP) and metacarpophalangeal (MCP) joints and quantifies the JSW in hand radiographs [26]. In a recent cross-sectional study, good agreement was found between measured JSW by this method and the atlas-based ordinal scores according to the OARSI system [26].

Because the decrease in radiographic JSW is an important surrogate marker in clinical trials on hand OA, the primary aim was to assess the accuracy and sensitivity to change in JSW by comparing the automatically determined JSW to the true distance between the bony contours of the finger joints. Since JSW cannot be adjusted in human subjects, this gold standard was obtained by varying the true JSW in an acrylic phantom and in cadaver finger joints, using an attached mechanical micrometer. The second aim of the study was to evaluate the influence of joint location and joint shape on the measured JSW.

Materials and methods

Three experiments were designed to validate the automated quantification method. In all experiments, a specially developed micrometer device was used to define and adjust the true JSW. Plain digital radiographs were acquired by a standard digital X-ray imaging system (Canon Inc., Tokyo, Japan) and the resulting images were analyzed by our developed software [26]. Subsequently, the differences between the measured and true JSW were evaluated.

In the first experiment, the repeatability of the JSW measurement was tested using an acrylic phantom joint, which mimics an MCP joint, attached to the micrometer (Fig. 1) with a fixed JSW. In the second experiment, the sensitivity to progression was tested by varying the JSW in the acrylic phantom and measuring this simulated progression by the automated quantification method. In the final experiment, the MCP, the proximal interphalangeal (PIP) and distal interphalangeal (DIP) joints of human cadaver bones were attached to the micrometer (Fig. 2) in order to study the influence of differences in joint shapes on the sensitivity to progression of Joint Space Narrowing (JSN). All experiments were repeated for different locations of the joint, determined by a standardized hand template, which is used to position the hand in research settings.

Fig. 1 The acrylic phantom joint connected to a micrometer

Fig. 2 The micrometer set-up showing the cadaver metacarpophalangeal joint of the 3rd digit. Middle and distal phalangeal bones of the 3rd and 5th digit were also used in experiment 3

Experiment 1; Repeatability, phantom joint

In order to test repeatability, the acrylic phantom was placed on the hand template at the location of the MCP of the 3rd digit (middle finger), to which the X-ray focus was projected, perpendicular to the receptor plate. In order to simulate conditions comparable to (follow-up) clinical trials, in which user-dependent focus-film distance and (re)positioning differences may occur, ten exposures were made with focus-film distances of 110, 115, and 120 cm (n = 4, 3 and 3, respectively), repositioning the phantom, table and X-ray focus between exposures. The true JSW was set at 1.00 mm. The experiment was repeated using the standard anatomical location of the DIP of the 5th digit on the template, which produces the most angulated projection.

Experiment 2; Sensitivity to progression, influence of joint location

The phantom joint was placed at the positions of the MCP III, PIP III, DIP III, and DIP V on the hand template, where the MCP III location is the centre point of the X-ray beam. True JSW was varied between 0.20 and 2.40 mm. In the intervals [0.20; 0.80] and [1.20; 2.40], this was done with an increment of 0.20 mm. In the interval [0.90; 1.20] a smaller increment of 0.02 mm was used, to simulate subtle progression rates as probably encountered at the onset of OA [22, 23, 26]. A total of 88 measurements were performed, 22 for each joint.

Experiment 3; Sensitivity to progression, influence of joint shape

In order to study the influence of different joint shapes, we used human cadaver matched metacarpal and phalangeal bones of the 3rd and 5th finger, from which all hyaline cartilage and soft tissues were dissected. JSW was varied in the same way as described in experiment 2. The X-ray focus was centered at the location of the MCP III, PIP III, DIP III and DIP V joint. A total of 88 measurements were performed, 22 for each joint.

Image analysis

The automatic quantification method first identifies the individual joints in the standard hand radiographs [26]. Subsequently, the proximal and distal margins and the measurement interval are determined in each joint, thereby defining the joint space. Finally, the JSW is calculated as the average distance between the joint margins enclosed by the measurement interval. In order to analyze the radiographs containing the phantom joints, the first step of the program was omitted and an observer located the position of the phantom joint manually.
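The final averaging step can be illustrated with a short sketch. This is not the published implementation [26]: the function name `mean_jsw`, the representation of each margin as one position per image column, and the pixel spacing of 0.1 mm are all assumptions made for the example.

```python
import numpy as np


def mean_jsw(proximal_y, distal_y, interval, pixel_mm):
    """Average margin-to-margin distance (mm) inside a measurement interval.

    proximal_y, distal_y: margin position in pixels for each image column
    interval: (start, stop) column indices, stop exclusive
    pixel_mm: pixel spacing in mm (assumed known from the image header)
    """
    proximal_y = np.asarray(proximal_y, dtype=float)
    distal_y = np.asarray(distal_y, dtype=float)
    if proximal_y.shape != distal_y.shape:
        raise ValueError("margins must have the same number of columns")
    start, stop = interval
    if not 0 <= start < stop <= proximal_y.size:
        raise ValueError("measurement interval lies outside the margins")
    # Column-wise gap between the two margins, restricted to the interval
    gap_px = np.abs(distal_y[start:stop] - proximal_y[start:stop])
    return float(gap_px.mean() * pixel_mm)


# Hypothetical margins: the gap widens at both edges of the joint
proximal = [10, 10, 10, 10, 10, 10]
distal = [22, 20, 20, 20, 20, 24]

central_jsw = mean_jsw(proximal, distal, (1, 5), pixel_mm=0.1)
full_jsw = mean_jsw(proximal, distal, (0, 6), pixel_mm=0.1)
```

In this toy case the central interval gives a smaller mean width than the full width of the joint, which shows how the choice of measurement interval affects the reported JSW.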

Statistical analysis

In the first experiment, the standard deviation (SD) of the paired differences between measured and true JSW was defined as a measure of repeatability (random error). The smallest detectable difference (SDD) or the smallest detectable change (SDC) is used in OA research as a threshold for detection of JSN, and is defined as 1.96 × SD [27–29]. SDs were compared between DIP and MCP joints with Levene’s test for homogeneity. The mean of the differences gives the systematic error. To test the statistical significance of this systematic error in these clustered data, we used a generalized linear model (GLM), with the error in JSW as dependent variable, and location and exposure number as random factors. Differences in systematic errors between two morphologically different joints like the DIP and MCP joint were tested with an unpaired t test, assuming equal variances. Normal distribution of the differences was confirmed with the Kolmogorov-Smirnov test.
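As a worked illustration of these definitions, the sketch below computes the systematic error, the SD of the differences and the SDD for a handful of repeated exposures. The numbers are synthetic and are not data from this study; the function name `error_summary` and the use of the sample SD (`ddof=1`) are assumptions.

```python
import numpy as np


def error_summary(measured, true):
    """Systematic error, random error (SD) and SDD of paired JSW values in mm."""
    diff = np.asarray(measured, dtype=float) - np.asarray(true, dtype=float)
    sd = diff.std(ddof=1)  # sample SD of the paired differences (assumed)
    return {
        "systematic_error": float(diff.mean()),  # mean of the differences
        "sd": float(sd),                         # repeatability
        "sdd": float(1.96 * sd),                 # smallest detectable difference
    }


# Synthetic example: five exposures of a phantom set to 1.00 mm
measured = [1.04, 1.06, 1.05, 1.03, 1.07]
true = [1.00] * 5
summary = error_summary(measured, true)
```

With these synthetic values the systematic error is 0.05 mm and the SDD is roughly 0.031 mm.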

In the second and third experiments, a Bland-Altman plot was made to investigate whether the systematic error was dependent on the size of the JSW measurement and to calculate the repeatability (SD of differences).
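A minimal sketch of such a check is given below, assuming the differences are plotted against the true JSW as in the figures of this study. The slope of a straight-line fit is used here as a simple indicator of size dependence; this is an illustrative choice, not necessarily the test used by the authors, and the data are synthetic.

```python
import numpy as np


def bland_altman_check(measured, true):
    """Bias, limits of agreement and size dependence of the JSW error (mm)."""
    measured = np.asarray(measured, dtype=float)
    true = np.asarray(true, dtype=float)
    diff = measured - true
    bias = diff.mean()
    sd = diff.std(ddof=1)
    # A slope near zero suggests the error does not depend on JSW size
    slope, _intercept = np.polyfit(true, diff, 1)
    return {
        "bias": float(bias),
        "sd": float(sd),
        "limits": (float(bias - 1.96 * sd), float(bias + 1.96 * sd)),
        "slope": float(slope),
    }


# Synthetic data: constant 0.05 mm offset plus alternating +/-0.01 mm noise
true = np.linspace(0.2, 2.4, 12)
noise = 0.01 * np.where(np.arange(12) % 2 == 0, 1.0, -1.0)
measured = true + 0.05 + noise
result = bland_altman_check(measured, true)
```

Here the bias recovers the 0.05 mm offset and the fitted slope is close to zero, the pattern expected when the systematic error is independent of JSW size.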

Differences in systematic errors between joint locations (experiment 2) and between joint shapes (experiment 3) were analyzed with a GLM, as described above, with location and joint type as random factors, respectively, by testing whether the corresponding coefficients were significantly different from 0. Differences in random errors between locations and joint types were tested with Levene’s test.

To test the statistical significance of the systematic error for the entire group, we tested whether the intercept in the GLM analysis was significantly different from 0.

A significance level of 0.05 was used for all statistical tests.

Results

Experiment 1; Repeatability, phantom joint

The results (Table 1 and Fig. 3) show a systematic error of 0.052 mm (5% over-estimation). The systematic error was independent of focus-film distance and not significantly different between the DIP V and MCP III location. We found a significant difference in the repeatability between the measurement at the DIP V location and the measurement at the MCP III location (both p values 0.046). Highest repeatability was found at the location of the MCP of the 3rd digit.

Table 1 Systematic error and repeatability in the phantom joint at different locations. The true JSW (micrometer) was set at 1.00 mm

Fig. 3 Measured JSW by automatic quantification and true JSW for two joint locations

Experiment 2; Sensitivity to progression, influence of joint location

The results (Table 2 and Fig. 4) show that the systematic and random errors were 0.054 mm and 0.037 mm, respectively, and both were independent of the size of JSW. These errors were slightly higher than in experiment 1. Again highest repeatability was found at the MCP of the 3rd digit, but no statistically significant differences in random errors were found between the four different joint locations (i.e., MCPIII, PIPIII, DIP III and V). Progression of JSN was estimated without any systematic error and with a random error of 0.016 mm (Table 3 and Fig. 5). Therefore progression of 0.032 mm, as defined by the smallest detectable difference, was measured in this phantom experiment.
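One way to see why progression can be free of systematic error even when single JSW measurements are not: a constant offset cancels when a follow-up measurement is subtracted from the baseline measurement. The sketch below uses synthetic values and assumes a constant over-estimation of 0.05 mm; only the 1.1 mm baseline is taken from the text.

```python
import numpy as np

BASELINE_MM = 1.1     # baseline JSW used for the progression analysis
ASSUMED_OFFSET = 0.05  # synthetic constant over-estimation in mm

# Synthetic follow-up series narrowing in 0.02 mm steps from baseline
true_jsw = np.array([1.10, 1.08, 1.06, 1.04, 1.02, 1.00])
measured_jsw = true_jsw + ASSUMED_OFFSET

# Progression = baseline minus follow-up, for true and measured values
true_progression = BASELINE_MM - true_jsw
measured_progression = measured_jsw[0] - measured_jsw

jsw_error = measured_jsw - true_jsw
progression_error = measured_progression - true_progression
```

Each JSW value carries the full offset, whereas the progression error is zero up to floating-point rounding. Random error does not cancel in this way, which is why the SDD remains the relevant threshold for progression.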

Table 2 Systematic error and SDDs in the phantom joint on different locations

Fig. 4 The difference between true and measured JSW against the true JSW

Table 3 Systematic error and SDDs in the measurement of progression in the phantom joint from a true JSW of 1.1 mm at baseline

Fig. 5 The difference between true and measured progression against the true progression, where a true JSW of 1.1 mm was taken as baseline

Experiment 3; Sensitivity to progression, influence of joint shape

The mean systematic error was 0.210 mm (Table 4 and Fig. 6) and there was a significant difference in the systematic errors between the different joints. The systematic error was smallest in DIP V (0.050 mm) and highest in PIP III (0.354 mm). Progression of JSN in the different joints was estimated without any systematic errors (Table 5 and Fig. 7). The overall precision in detecting progression as defined by the smallest detectable difference was 0.031 mm, being smallest in DIP V (0.018 mm) and highest in PIP III (0.047 mm) (Table 5 and Fig. 7).

Table 4 Systematic error and SDDs in the cadaver derived joints

Fig. 6 The difference between true and measured JSW is plotted against the true JSW

Table 5 Systematic error and SDDs in measuring progression in the cadaver derived joints from a true JSW of 1.1 mm at baseline

Fig. 7 The difference between true and measured progression against the true progression, where a true JSW of 1.1 mm was taken as baseline

Systematic error in the phantom experiments

The systematic error of 0.052 mm, found in the phantom studies, did not differ between the various focus-film distances. To test if the over-estimation was caused by the software or by the phantom design we determined the exact shape and fitting of the phantom joint, by scanning the phantom in a micro-CT scanner. This revealed that the distal and proximal surfaces did not fit perfectly, leaving a small asymmetric gap (Fig. 8). This small additional space was being measured by the automatic quantification method.

Fig. 8 Sagittal and coronal view of a Micro-CT scan of the acrylic phantom, showing a small asymmetric gap between ball and socket

Discussion

We validated an automatic method to measure radiographic JSW of the finger joints, for which we previously found a good agreement with the atlas-based ordinal score according to the OARSI system [26]. Results of the current study show that this automatic method has a high accuracy in measuring the JSW. We also found a high repeatability (SDD between 0.021 and 0.032), which varied slightly between the different hand joint locations. Measured systematic errors were between 0.056 mm and 0.047 mm and a progression of JSN between 0.012 and 0.047 mm could be detected. Both systematic errors and precision of progression estimation were dependent on the joint type, implying that the morphology differences between the MCP, PIP and DIP joints influenced the accuracy of JSW measurement.

Repeatability and systematic errors

The results of experiment 1 show that centering the X-ray beam to the location of a particular joint improves the repeatability slightly. The differences in repeatability may be explained by X-ray beam angulation, since the SDD of the measurements on the DIP V location was significantly higher than the SDD of the measurements on the MCP III location. The systematic error of 0.052 mm, found in the phantom studies, was caused by the phantom and not by the measurement software.

A similar phantom experiment was performed by Angwin et al., who studied the sensitivity and reliability of mean computerized JSW measurements in standard clinical hand radiographs in healthy subjects [19]. They used a phantom MCP joint consisting of a gold-plated aluminum ball and socket mounted on a micrometer to investigate the errors of their measurement method. In their phantom experiment an overestimation of JSW of 0.018 mm was found, which is smaller than the overestimation in our experiment. It is likely that their overestimation was also caused by the phantom model design, leaving a gap between the two components. In the same study, Angwin et al. used hand radiographs of healthy subjects to determine the smallest detectable difference, where they consequently assumed that repeatability was independent of the size of the JSW. In our experiment we could confirm that this is indeed the case. We found, however, that repeatability is influenced by the shape of the joint and slightly by the joint location, as shown in experiments 2 and 3.

The results of experiment 3 showed that systematic errors were different between joint types. The automatic quantification software calculates a mean JSW depending on the measured area between two bony contours, whereas the micrometer device is calibrated on the minimal distance between two phalangeal bones. It is likely that the shape of the different joints influences the definition of JSW as implemented in the automatic quantification method compared to the minimal measured space by the micrometer. This may explain the differences in measured systematic errors between the various types of joints. For example, the small systematic error in DIP V may be related to the relatively flat shape of the articular surfaces. Although systematic errors differ between the different joint shapes, this may be of less relevance in future clinical trials, in which progression is being measured and the smallest detectable difference (SDD) would be more important than the systematic error.
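The proposed explanation can be made concrete with a purely geometric toy model. The sketch below is an assumption-laden illustration, not a model of the actual software or of real joint anatomy: it treats a curved joint as two concentric circular arcs, measures the gap vertically at each lateral position, and compares the mean gap with the minimal gap that a micrometer would set. The radius, interval width and function name `gap_profile` are hypothetical.

```python
import numpy as np


def gap_profile(x, radius, min_gap, curved=True):
    """Vertical gap (mm) between two surfaces at lateral positions x (mm).

    curved=True models concentric arcs of radius `radius` and `radius + min_gap`;
    curved=False models two flat, parallel surfaces.
    """
    x = np.asarray(x, dtype=float)
    if not curved:
        return np.full_like(x, min_gap)
    inner = np.sqrt(radius**2 - x**2)
    outer = np.sqrt((radius + min_gap) ** 2 - x**2)
    return outer - inner


# Hypothetical 4 mm wide measurement interval centred on the joint
x = np.linspace(-2.0, 2.0, 41)

flat = gap_profile(x, radius=4.0, min_gap=1.0, curved=False)
curved = gap_profile(x, radius=4.0, min_gap=1.0, curved=True)

# Difference between the mean gap and the minimal gap set by a micrometer
flat_offset = float(flat.mean() - flat.min())
curved_offset = float(curved.mean() - curved.min())
```

In this model the flat joint shows no difference between mean and minimal gap, while the curved joint shows a positive difference that grows with curvature and interval width. This is consistent in direction with the smaller systematic error reported for the flatter DIP V joint, but it does not establish the cause.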

Applicability of the automatic quantification method in clinical trials

In clinical trials, progressive reduction in JSW from a patient given baseline would be assessed by the automated program. We used a micrometer determined baseline of 1.1 mm, based on results of a previous study in which JSW values were between 1.6 mm (MCP healthy subjects) and 0.6 mm (DIP OA patients) [26].

Angwin et al. studied the sensitivity and reliability of mean computerized JSW measurements in standard clinical hand radiographs of healthy subjects and the effect of hand position and joint angulation on measurement reliability [19]. They found that a change > 0.11 mm in JSW in an individual joint would represent an actual physical change in JSW (outside the 95% confidence interval), and that the smallest detectable change decreased to 0.05 mm when different measurements across fingers of a single subject were averaged. In our study we also investigated the influence of joint location, joint shape and JSW size on the smallest detectable difference. The results of our study show that the smallest detectable change in JSW of our quantification method ranges from 0.012 mm to 0.047 mm per individual finger joint. The differences in outcomes between the two studies might be explained by the different measurement methods used, or by the fact that Angwin et al. used digitized images captured from standard film radiographs.

In order to assess the applicability of the quantification method in future clinical trials, an indication of the expected rate of decline in JSW in the normal and OA population is needed. Pfeil et al. [22–24] published a set of normative age-related and gender-specific JSW data in 869 normal (non-OA) patients. Those results show that in the normal population, aged between 20 and 80 years, there is a decrease in JSW between 0.1 and 0.2 mm every 20 years, depending on age and type of finger joint. This corresponds to an annual decrease between 0.005 mm and 0.01 mm. Our SDD results show that normal JSW reduction in non-OA patients could thus be detected in the DIP joints within 2–4 years, especially when measurements of different fingers and hands are averaged. The rate of JSW decline in hand OA patients is not known exactly, but a recently published prospective observational study showed that the semi-quantitative OARSI atlas scoring method was able to detect JSN progression after 2 years in 33 (19.2%) of 172 hand OA patients [30]. Because rapid decrease of JSW is one of the main factors in hand OA, it is to be expected that the first signs of progression can be detected within 1 or 2 years with this automatic quantification method.
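The arithmetic behind the 2–4 year estimate can be reproduced as follows. The text does not state which SDD value underlies that estimate; using the DIP V value of 0.018 mm from experiment 3 is an assumption, as is the simple model of a constant linear decline. The function name `years_to_detect` is hypothetical.

```python
def years_to_detect(sdd_mm, annual_decline_mm):
    """Years until cumulative JSW decline reaches the SDD (linear decline assumed)."""
    if annual_decline_mm <= 0:
        raise ValueError("annual decline must be positive")
    return sdd_mm / annual_decline_mm


# Normative decline: 0.1 to 0.2 mm per 20 years
slow_decline = 0.1 / 20  # 0.005 mm per year
fast_decline = 0.2 / 20  # 0.01 mm per year

SDD_DIP_V_MM = 0.018  # SDD for DIP V in experiment 3 (assumed to be the relevant value)

years_fast = years_to_detect(SDD_DIP_V_MM, fast_decline)
years_slow = years_to_detect(SDD_DIP_V_MM, slow_decline)
```

Under these assumptions the detection time falls between about 1.8 and 3.6 years, in line with the 2–4 years stated above.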

A few limitations apply to our study. The influence of finger joint flexion and extension in the measurement of JSW was not tested. Angwin et al. [19] have demonstrated that this influences JSW measurements slightly. However, in contrast to rheumatoid arthritis, deformity of the hand joints is not as marked in OA. Only in a late stage of OA, full extension of the hand is limited by joint destruction. As stated by Angwin et al., the value of a sensitive measurement method lies in the detection of early progression in order to test possible benefits of newly developed methods to arrest or slow down the OA process. Therefore, it is expected that limited finger extension will not play a significant role in future OA research.

It is possible that in vivo both systematic error and SDD may be different from the values that we found in the cadaver derived bone experiment, since the actual radiographic contrast between bone, cartilage and synovial fluid is lower than between bone and air in our experiment. However, conducting a study in vivo in humans using induced progression of JSN is practically and ethically impossible.

In hand OA research, progression of JSN is one of the most important parameters. The results of this study show that, dependent on the type of joint, a decrease in JSW of 0.01 to 0.05 mm can be detected with our automatic quantification method. It is to be expected that with this method the first signs of OA progression can be detected within 1 or 2 years, making it a sensitive tool in future hand OA research.