Artificial intelligence for automated detection and measurements of carpal instability signs on conventional radiographs

Objectives: To develop and validate an artificial intelligence (AI) system for measuring and detecting signs of carpal instability on conventional radiographs.
Materials and methods: Two case-control datasets of hand and wrist radiographs were retrospectively acquired at three hospitals (hospitals A, B, and C). Dataset 1 (2178 radiographs from 1993 patients, hospitals A and B, 2018–2019) was used for developing an AI system for measuring scapholunate (SL) joint distances, SL and capitolunate (CL) angles, and carpal arc interruptions. Dataset 2 (481 radiographs from 217 patients, hospital C, 2017–2021) was used for testing, and with a subsample (174 radiographs from 87 patients), an observer study was conducted to compare its performance to five clinicians. Evaluation metrics included mean absolute error (MAE), sensitivity, and specificity.
Results: Dataset 2 included 258 SL distances, 189 SL angles, 191 CL angles, and 217 carpal arc labels obtained from 217 patients (mean age, 51 years ± 23 [standard deviation]; 133 women). The MAE in measuring SL distances, SL angles, and CL angles was respectively 0.65 mm (95% CI: 0.59, 0.72), 7.9 degrees (95% CI: 7.0, 8.9), and 5.9 degrees (95% CI: 5.2, 6.6). The sensitivity and specificity for detecting arc interruptions were 83% (95% CI: 74, 91) and 64% (95% CI: 56, 71). The measurements were largely comparable to those of the clinicians, while arc interruption detections were more accurate than those of most clinicians.
Conclusion: This study demonstrates that a newly developed automated AI system accurately measures and detects signs of carpal instability on conventional radiographs.
Clinical relevance statement: This system has the potential to improve detections of carpal arc interruptions and could be a promising tool for supporting clinicians in detecting carpal instability.

surfaces of the following facets were annotated: (a) the third metacarpal and lunate facet of the capitate, (b) the capitate and radius facets of the lunate.
For developing the carpal arc interruption detection component, N.H. labelled for each frontal view radiograph whether any signs of carpal instability were present according to the original radiology reports. Then, polylines of the carpal arcs were generated for radiographs without any reported signs of carpal instability using the developed AI pipeline. This way, a shape model of non-interrupted carpal arcs could be built (see Appendix E7). Finally, the generated carpal arcs were visually inspected by N.H. to check for any significant inaccuracies in either the generated carpal arcs or the derived carpal stability status. This resulted in a selection of 511 radiographs (434 AP/PA view, 77 oblique view) without carpal instability.

Appendix E3. Training Data Selection
Figure E1 shows a flowchart of the radiograph selection for the training dataset (dataset 1). The sample size was determined based on the data required to saturate the performance of the models.
To maximize the variance in the dataset, only one frontal view (including [neutral, ulnar-deviated, clenched fist] AP/PA, and oblique view) and one lateral view radiograph were sampled per patient.
Finger series were excluded during the data selection, as they provide little coverage of the carpal bones and are not suited for conducting carpal instability measurements.

Appendix E4. Measurement Value Distribution of Dataset 2
Figure E2 shows a histogram of the distribution of the SL distance, SL angle, and CL angle measurements for dataset 2 and the observer study subset.

interpolation) method from the OpenCV Python library (version 4.7.0.68, 2022) [5].
b. Save a copy of the original image.
c. Fix the image size to 1600 × 1600 pixels by zero padding or center-cropping.
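The size-fixing step can be illustrated with a minimal sketch. It assumes a single-channel image stored as a list of rows and assumes that padding and cropping are both applied symmetrically around the image center; the function name `pad_or_center_crop` is hypothetical and not part of the described pipeline.

```python
def pad_or_center_crop(image, size=1600, fill=0):
    """Return a size x size copy of a 2D image given as a list of rows.

    Axes longer than `size` are center-cropped; shorter axes are padded
    with `fill` (zero padding) on both sides.
    """
    def fix_axis(n):
        # Returns (source_start, target_start, length) for one axis.
        if n >= size:
            return (n - size) // 2, 0, size  # center-crop
        return 0, (size - n) // 2, n         # zero-pad

    rows = len(image)
    cols = len(image[0]) if rows else 0
    src_r, dst_r, n_r = fix_axis(rows)
    src_c, dst_c, n_c = fix_axis(cols)

    out = [[fill] * size for _ in range(size)]
    for r in range(n_r):
        out[dst_r + r][dst_c:dst_c + n_c] = image[src_r + r][src_c:src_c + n_c]
    return out
```

Each axis is handled independently, so an image that is too wide but too short is cropped horizontally and padded vertically in one call.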

2. Extract regions-of-interest:
a. If a frontal view radiograph is provided, segment the anterior side of the scaphoid, lunate, triquetrum, capitate, and hamate with a segmentation convolutional neural network (CNN) optimized for processing frontal view radiographs. If a lateral view radiograph is provided, segment the lateral side of the scaphoid, lunate, and capitate with another segmentation CNN optimized for processing lateral view radiographs. The architecture and training procedure of the CNNs are described in Appendix E7. The detection threshold was set to 0.5.
b. Smooth the segmentation masks by applying morphological closing with either a 13 × 13 pixel (frontal view) or 15 × 15 pixel (lateral view) disk kernel. A disk kernel is chosen over a standard square kernel to better preserve the organic shapes of the bones.
c. Compute the number of connected components for each segmentation mask and remove all but the largest connected component. This step removes noise and leaves only the largest segmented object, which is expected to be the bone mask.
d. Project the segmentation masks back onto the original input image by reversing the normalization operations conducted in the previous steps.
e. Extract a 33 × 33 millimeter patch surrounding each segmented bone in the original image and mask. To ensure that all bones fit within the patch, the patch size was set to the 99th percentile bounding box width or height of the segmented bones in dataset 1. The image and mask are cropped to a fixed-size patch instead of a bounding box of the segmentation mask to preserve the scale. The angle of rotation of the bone is aligned to either the minor axis (lunate) or major axis (other bones) before extracting the patch (orientation is estimated through ellipse fitting on the mask). A patch with the original bone orientation is also saved.
f. Standardize and enhance the image contrast of each patch using the contrast stretching method. Use the minimum and maximum value in the bounding box of the segmentation mask in the original image as output value range.
g. Standardize the laterality (left or right hand) by processing the scaphoid patch with the laterality detection CNN from Hendrix et al. [6] and flipping the extracted patches if the radiograph depicts a left hand. For compatibility purposes, the scaphoid patch is resized to 299 × 299 pixels using bilinear interpolation and the patch with the original bone orientation is used as input. The pixel values are normalized by first rescaling the values between 0 and 1 via min-max scaling and then zero-centering the values using the per-channel mean and standard deviation of the ImageNet dataset [7]. The laterality is standardized to improve the landmark fitting results of the active appearance models (AAMs) in the next step (see Appendix E7 for more details). The laterality is determined visually instead of using the metadata of the DICOM file, as this information is not always available in the metadata.
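The connected-component filtering of the segmentation masks can be sketched as follows. This is an illustration only: the mask is assumed to be a binary list of rows, and 4-connectivity is an assumption because the connectivity rule is not stated. The name `keep_largest_component` is hypothetical.

```python
from collections import deque


def keep_largest_component(mask):
    """Keep only the largest 4-connected foreground region of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []  # pixel coordinates of the largest component found so far

    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r0][c0] = True
            queue, component = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(component) > len(best):
                best = component

    out = [[0] * cols for _ in range(rows)]
    for r, c in best:
        out[r][c] = 1
    return out
```

In practice an image-processing library would normally be used for this step; the plain-Python version only makes the logic explicit.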

3. Fit landmarks to articular facet joint surfaces:
a. For each segmented bone (with the exception of the scaphoid on the lateral view), localize the anatomical landmarks on the articular facet joint surfaces with patch-based AAMs (see training details in Appendix E7). First, given a certain bone, the bounding box from the segmentation mask is used to align the AAM's reference shape (i.e., mean aligned shape in the training data) for initializing the fitting procedure. Next, an AAM is fitted to the mask. Finally, the fitted landmarks are passed to a multi-scale AAM (two scales: half and full image resolution) and are fine-tuned by fitting this AAM to the image. Following the default settings of the Menpo framework (version 0.10.0, 2021) [8], the AAMs are fitted to the images and masks using Lucas-Kanade optimization with the Wiberg Inverse-Compositional algorithm [9] for a maximum of 20 iterations (15 and 5 iterations at the lowest and highest scale for the multi-scale AAMs). For the multi-scale AAMs, at the lowest scale, 5 shape components and 30 appearance components are used during the fitting procedure. At the highest scale, 20 shape components and 150 appearance components are used. When no multi-scale features are used, the latter setting is applied.
b. Project the fitted landmarks on the original input image by reversing the normalization operations conducted in the previous steps.
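The initialization of the fitting procedure from a bounding box can be illustrated with a small sketch. It assumes that the reference shape is placed by matching the center and diagonal length of its bounding box to those of the mask bounding box (uniform scaling plus translation); the exact alignment used by the framework is not described here, and the helper names are hypothetical.

```python
import math


def bounding_box(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def align_shape_to_bbox(reference_shape, target_bbox):
    """Scale and translate a reference shape so that its bounding box has the
    same center and diagonal length as `target_bbox` (assumed convention)."""
    rx0, ry0, rx1, ry1 = bounding_box(reference_shape)
    tx0, ty0, tx1, ty1 = target_bbox

    ref_diag = math.hypot(rx1 - rx0, ry1 - ry0)
    if ref_diag == 0:
        raise ValueError("reference shape is degenerate")
    scale = math.hypot(tx1 - tx0, ty1 - ty0) / ref_diag

    ref_cx, ref_cy = (rx0 + rx1) / 2, (ry0 + ry1) / 2
    tgt_cx, tgt_cy = (tx0 + tx1) / 2, (ty0 + ty1) / 2
    return [(tgt_cx + (x - ref_cx) * scale, tgt_cy + (y - ref_cy) * scale)
            for x, y in reference_shape]
```

The aligned shape only serves as a starting point; the subsequent fits to the mask and to the image refine the landmark positions.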

4. Conduct measurements:
a. Measure the SL joint distance by connecting the midpoints of the lunate facet of the scaphoid and the scaphoid facet of the lunate. To compensate for any inaccuracies in the localization of the articular surfaces, the selection of endpoints of the estimated articular surfaces is optimized before selecting the median (middle) anatomical landmark. First, all possible quadrangles that can be defined between the landmarks of the opposite articular surfaces are determined. Then, both the area and entropy of the angles (squareness) of each quadrangle are calculated. Finally, the endpoints (corners) of the left and right side of the quadrangles are ranked by a weighted average between the quadrangle area (weight=1) and squareness (weight=2) (descending order), and the highest ranked pair of endpoints is selected. This way, a pair of midpoints is selected on the facets with an optimal trade-off between the proximity to the original median anatomical landmark (quadrangle area) and parallelism of the underlying articular surfaces (quadrangle squareness).
b. Measure the SL angle by first assessing the long axis of the scaphoid through ellipse fitting (major axis) and assessing the midplane axis of the lunate by connecting the mid-points of the capitate and radius facets of the lunate. The angle is then derived from the axes.
c. Measure the CL angle by first assessing the long axis of the capitate by connecting the mid-points of the third metacarpal and lunate facet of the capitate. The angle is then derived from this axis and the lunate axis (obtained in the previous step).
f. Obtain a single detection score by calculating the percentage of significant reconstruction errors. A percentage of reconstruction errors was chosen over the maximum reconstruction error as detection score, because estimating the interruption magnitude using a single point was found to be susceptible to small non-meaningful inaccuracies in the generated polylines of the carpal arcs.
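The geometric core of the distance and angle measurements can be sketched as below. The sketch assumes isotropic pixel spacing, takes the middle landmark of an ordered facet as its midpoint, and reports the unsigned angle between two directed axes; the sign and direction conventions of the actual measurements are not specified here, and all function names are hypothetical. The quadrangle-based endpoint optimization is not reproduced.

```python
import math


def middle_landmark(facet_points):
    """Median (middle) landmark of an ordered list of facet landmarks."""
    return facet_points[len(facet_points) // 2]


def distance_mm(p, q, pixel_spacing_mm):
    """Euclidean distance between two (x, y) pixel positions in millimeters,
    assuming the same spacing along both image axes."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) * pixel_spacing_mm


def axis_angle_deg(axis_a, axis_b):
    """Unsigned angle in degrees (0 to 180) between two directed axes,
    each given as a (start_point, end_point) pair."""
    ax, ay = axis_a[1][0] - axis_a[0][0], axis_a[1][1] - axis_a[0][1]
    bx, by = axis_b[1][0] - axis_b[0][0], axis_b[1][1] - axis_b[0][1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    return abs(math.degrees(math.atan2(cross, dot)))
```

For example, a lunate axis would be formed from the middle landmarks of its capitate and radius facets and then compared with the scaphoid or capitate axis through `axis_angle_deg`.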

Appendix E7. Training Procedure
All experiments were conducted on a system with an Nvidia RTX Titan graphics card and Intel Core i9 9900K CPU. The two carpal bone segmentation CNNs for processing frontal and lateral view radiographs were trained on dataset 1 using the PyTorch machine learning framework (version 1.13.1, 2023) [10]. The original DICOM files were first converted to 16-bit PNG files. A random subset of 100 radiographs per radiographic view was used for validation and the rest of the dataset was used for training (no patient overlap). The CNNs were randomly initialized by applying normal initialization following He's method [11] (intermediate layers) and Xavier's method [12] (output layer) for efficient weight optimization. The architecture was adapted from the scaphoid segmentation model from Hendrix et al. [13], which has a lightweight encoder-decoder structure based on the U-Net architecture [14]. Table E2 provides an overview of the adapted architecture and hyperparameter settings. The number of filters per layer was doubled to compensate for the increased input variance resulting from the added bones and larger rotation augmentations (described below). The ADAM optimizer [15] (β1 = 0.9, β2 = 0.999) was used for weight optimization and minimized the categorical cross-entropy loss over a single image, where M is the number of output masks, N is the number of pixels in the image, and y_{i,j} and p_{i,j} are respectively a binary label and probability indicating whether the pixel belongs to the given mask.
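For reference, a standard form of the categorical cross-entropy that is consistent with these symbol definitions is sketched below. The index convention (i over masks, j over pixels) and the absence of a normalization factor are assumptions and are not taken from the source.

```latex
\mathcal{L} = - \sum_{i=1}^{M} \sum_{j=1}^{N} y_{i,j} \, \log\left(p_{i,j}\right)
```

With this form, only the probability assigned to the correct mask of each pixel contributes to the loss.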
The initial learning rate was set to 1 × 10⁻⁵ and it was reduced to 1 × 10⁻⁶ when the training loss did not decrease for 10 epochs. The training process was ended when the validation loss did not decrease for 10 epochs to prevent potential overfitting. Furthermore, the following data augmentations were applied using the Albumentations image augmentation Python library (version 1.3.0, 2022) [16]: horizontal flipping, horizontal and vertical translation (max. factor 0.0625), scaling (max. factor 0.1, "zoom" in/out), rotation (max. 45 degrees, both directions), grid distortion (max. distortion 0.03, five grid cells per side), and increased brightness and contrast (max. factor 0.3 and 0.4).
The translation, scaling, and rotation augmentations were applied with 80% probability, whereas the other augmentations were applied with 50% probability.
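The learning rate reduction and stopping rule can be sketched as a framework-independent simulation over recorded loss curves. It assumes that "did not decrease" means no improvement over the best value seen so far and that the learning rate is reduced once; the function `simulate_schedule` is hypothetical and does not reflect the actual training code.

```python
def simulate_schedule(train_losses, val_losses,
                      lr=1e-5, reduced_lr=1e-6, patience=10):
    """Replay per-epoch losses and return (learning rates used, stop epoch).

    The learning rate drops to `reduced_lr` once the training loss has not
    improved for `patience` epochs. Training stops at the first epoch where
    the validation loss has not improved for `patience` epochs; the stop
    epoch is None if that never happens.
    """
    best_train = best_val = float("inf")
    stale_train = stale_val = 0
    lrs = []

    for epoch, (train_loss, val_loss) in enumerate(zip(train_losses, val_losses)):
        lrs.append(lr)  # learning rate used during this epoch

        if train_loss < best_train:
            best_train, stale_train = train_loss, 0
        else:
            stale_train += 1

        if val_loss < best_val:
            best_val, stale_val = val_loss, 0
        else:
            stale_val += 1

        if stale_train >= patience:
            lr = reduced_lr  # takes effect from the next epoch
        if stale_val >= patience:
            return lrs, epoch

    return lrs, None
```

Monitoring the training loss for the learning rate and the validation loss for stopping keeps the two decisions independent, as in the description above.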
The AAMs of the articular surfaces were trained on dataset 1 using the Menpo framework (version 0.10.0, 2021) [8]. AAMs are well-investigated statistical deformable models of object shape
The polylines of the articular surfaces were first converted to anatomical landmarks by uniformly resampling a fixed set of points. The optimal number of sampling points per surface (per bone and view) was determined by plotting the number of points against the Fréchet distance (mean and standard deviation) between the subsampled and original polylines. Based on this analysis, 20 sampling points per facet joint surface were selected, except for the surfaces between the bones in the proximal row. For these surfaces, 15 sampling points were selected instead. The order and orientation (laterality and angle of rotation) of the landmarks per surface were normalized, and then the landmarks were exported with the corresponding images using the PTS file format (i.e., raw landmark points in text file) and 8-bit PNG file format. The angle of rotation was normalized by assessing the minor axis (lunate) or major axis (other bones) based on the segmentation mask through ellipse-fitting.
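Uniform resampling of a polyline into a fixed number of landmarks can be sketched as follows. The sketch assumes that "uniform" means equal spacing along the arc length with both endpoints retained; the function name `resample_polyline` is hypothetical.

```python
import math


def resample_polyline(points, n):
    """Resample a polyline at n points equally spaced along its arc length.

    `points` is an ordered list of (x, y) vertices; the first and last
    vertices are always part of the output.
    """
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two vertices and two samples")

    # Cumulative arc length at every vertex.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]

    resampled, seg = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        t = (target - cumulative[seg]) / seg_len if seg_len > 0 else 0.0
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled
```

Calling `resample_polyline(surface, 20)` would, under these assumptions, yield the 20 landmarks used for most facet joint surfaces.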
Next, per carpal bone (except for the scaphoid on the lateral view) two patch-based AAMs were trained on the annotated images. Unlike a holistic appearance representation that covers the entire texture enclosed within the landmarks, a patch appearance representation only covers the texture enclosed within a fixed-size patch sampled on each landmark. A patch representation was chosen over a holistic representation, because only the bone surfaces needed to be modelled and the texture enclosed by the landmarks has little descriptive value for this task. The first AAM used the bone segmentation mask as appearance representation and fitted the landmarks based on shape information only. To compensate for small segmentation errors and to add context information, the second AAM fine-tuned the fitted landmarks while using dense SIFT image features [17] as appearance representation (see also Appendix E6). Dense SIFT features were used instead of the raw pixel data because of their rotation- and scale-invariant properties.
When building the AAMs, the diagonal of the bounding box enclosing the landmarks was normalized to 150 pixels (resizing the shape and image). For the second AAM, the appearance was modelled at two scales, which included the half and full image resolution (lowest and highest scale).
At inference time, the fitting procedure moves from the lowest scale to the highest scale for more robust optimization. The patch dimensions for the first AAM were set to 23 × 23 pixels, and they were set to respectively 15 × 15 pixels (lowest scale) and 23 × 23 pixels (highest scale) for the second AAM. The maximum number of shape and appearance components was set to respectively 20 and 150. These settings were adapted from the provided settings in the documentation of the Menpo framework (version 0.10.0, 2021) [8] and were experimentally found to be already optimal for the task at hand.
The point distribution model (PDM) was trained on shapes of non-interrupted carpal arcs obtained from AP/PA view radiographs in dataset 1 using the Menpo framework (version 0.10.0, 2021) [8]. Similar to the development of the AAMs, the polylines of the carpal arcs were first converted to anatomical landmarks by uniformly resampling a fixed set of points and then they were exported as PTS files. Per arc, 100 points were resampled on the corresponding polyline. The number of active components was set to keep 95% of the variance in order to remove any noise captured by the last components. In contrast to the carpal bone segmentation CNNs and articular surface AAMs, using oblique view radiographs for building the PDM was not found to have performance benefits in terms of carpal arc interruption detection.
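Selecting the number of active components from a variance target can be sketched as below. The sketch assumes that the shape model exposes one eigenvalue (variance) per principal component and that the smallest number of leading components reaching the target is kept; the function name is hypothetical.

```python
def n_components_for_variance(eigenvalues, fraction=0.95):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `fraction`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")

    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= fraction:
            return k
    return len(ordered)
```

Components beyond this count explain little variance and are therefore more likely to capture annotation noise than anatomical variation.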
frontal view radiographs where the lunate was incompletely depicted (n = 4, <50% visible in three radiographs) or where a wrist with severe scapholunate advanced collapse (SLAC) was depicted (n = 1). Two triquetrums were not segmented in frontal view radiographs where the triquetrum was incompletely depicted (<50% visible). One lunate was not segmented in a lateral view radiograph where the wrist was in a cast and contained osteosynthesis material causing overprojection. Even though the segmentation failure rate was low, the findings underline the importance of providing radiographs to the AI system with a complete and non-obstructed depiction of the wrist.

Appendix E11. Anatomical Landmark Localization Results
Table E6 shows the mean (final) fitting error, DSC, and HD (in millimeters) with standard deviation of the anatomical landmark localization results per carpal bone and per radiographic view obtained through a ten-fold cross-validation on a subset of dataset 1. The results are reported with and without the bone orientation normalization and landmark prefitting processing step (i.e., initialize fitting procedure by first fitting an AAM to the segmentation mask, see steps 2e and 3a in Appendix E6). It is important to note that manual segmentation masks were used for all steps in the evaluation procedure (i.e., training, inference, evaluation). The results show that the landmark prefitting step had the most beneficial effect on the performance metrics, which indicates that bone segmentation has added value for localizing the landmarks. The bone orientation normalization step had less effect on the performance metrics, but the effect might be more pronounced in cases of subluxations with angulation. Applying both the orientation normalization and landmark prefitting step led to the best results overall.
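As a reminder of what the two overlap and distance metrics measure, a minimal sketch follows. It assumes that the DSC is computed on sets of foreground pixel coordinates and that the HD is the symmetric Hausdorff distance between two point sets; how the landmarks were converted to such sets for Table E6 is not described here, and the function names are hypothetical.

```python
import math


def dice(pixels_a, pixels_b):
    """Dice similarity coefficient between two collections of pixel coordinates."""
    a, b = set(pixels_a), set(pixels_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def symmetric_hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(source, target):
        # Largest distance from a source point to its nearest target point.
        return max(min(math.dist(p, q) for q in target) for p in source)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

The DSC rewards overall overlap, whereas the HD is driven by the single worst-matching point, which is why both are reported side by side.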

Appendix E12. Bland-Altman Plot Analysis of Measurements of AI and Clinicians
Figures E4-E9 show Bland-Altman plots with 95% CI bands comparing the measurements of the AI system and clinicians with the reference standard for the SL distances, SL angles, and CL angles in the observer study subset.
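The quantities that underlie a Bland-Altman plot can be sketched as follows. The sketch uses the conventional definitions (bias as the mean difference, limits of agreement as bias ± 1.96 standard deviations of the differences); whether the plotted bands were derived in exactly this way is an assumption, and the function name is hypothetical.

```python
import statistics


def bland_altman(measured, reference):
    """Return (bias, lower limit, upper limit) of agreement between paired
    measurements, using the sample standard deviation of the differences."""
    if len(measured) != len(reference) or len(measured) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

In the plot itself, each difference is drawn against the mean of the paired measurements, with the bias and limits shown as horizontal lines.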

d. Generate polylines (i.e., lists of points with straight line segments drawn between consecutive points) of the three carpal arcs by connecting the anatomical landmarks corresponding to the relevant facet joint surfaces.
5. Detect and measure the degree of interruptions in the carpal arcs:
a. Uniformly resample the three carpal arc polylines using 100 sampling points per arc.
b. Reconstruct the expected or hypothetical shape of the carpal arc polylines if non-interrupted. An example of this reconstruction is shown in Figure E3.
c. Determine the reconstruction error per point between the observed and reconstructed hypothetical normal carpal arcs by calculating the pair-wise Euclidean distance.
d. Convert the reconstruction errors to z-scores using the mean and standard deviation of the reconstruction errors corresponding to non-interrupted carpal arcs in the training data (dataset 1). Reconstruction errors with a z-score of two or greater are considered significant.
e. Display the reconstruction errors with vectors and color coding (Fig. E3).
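The conversion of reconstruction errors into a single detection score (the percentage of significant reconstruction errors) can be sketched as below. The sketch assumes that the observed and reconstructed arcs are given as equally long lists of (x, y) points and that the training mean and standard deviation are supplied as scalars; the function name is hypothetical.

```python
import math


def detection_score(observed, reconstructed, train_mean, train_std,
                    z_threshold=2.0):
    """Percentage of arc points whose reconstruction error has a z-score of
    `z_threshold` or greater relative to non-interrupted training arcs."""
    if len(observed) != len(reconstructed) or not observed:
        raise ValueError("arcs must be non-empty and of equal length")

    # Pair-wise Euclidean distance between observed and reconstructed points.
    errors = [math.hypot(ox - rx, oy - ry)
              for (ox, oy), (rx, ry) in zip(observed, reconstructed)]
    z_scores = [(e - train_mean) / train_std for e in errors]
    significant = sum(1 for z in z_scores if z >= z_threshold)
    return 100.0 * significant / len(errors)
```

Because the score counts points rather than taking the maximum error, a single misplaced point on an otherwise smooth arc changes it only slightly.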
and appearance (a.k.a. texture) that can be matched to a new image. Due to the constraints imposed by the shape and appearance modelling, relatively few training examples are required to train an AAM. Since anatomical variation of the articular surfaces is limited and creating annotations of these surfaces is challenging and time-intensive, an AAM was chosen over a deep learning-based model for this task. The training procedure and hyperparameter optimization were conducted as follows.
The metrics are reported with their standard deviation. Not every bone is depicted in every radiograph and therefore the number of masks (n) is reported. Frontal view included neutral, ulnar-deviated, clenched fist anterior-posterior (AP) or posterior-anterior (PA) view and oblique view. DSC = Dice similarity coefficient, HD = symmetric Hausdorff distance, mm = millimeter.

Note. - The articular facet joint surfaces were labelled in 400 radiographs from dataset 1 (200 frontal view, 200 lateral view, equally distributed between hospitals). All bones were (fully) depicted in the radiographs. The metrics are reported with their standard deviation. Per bone and radiographic view, the best values across the configurations are shown in bold (lowest fitting error [output cost function Wiberg Inverse-Compositional algorithm], lowest Hausdorff distance [HD], and highest Dice similarity coefficient [DSC]; lowest standard deviation in case of equal values). "Prefitting" refers to whether the fitting procedure was initialized by prefitting an active appearance model to the segmentation masks (see Appendices E6 and E7).

Figures
Figure E1. Flowchart for the inclusion and exclusion of x-rays in dataset 1 (training). The number of

Figure E2. Histograms of the measurement value distributions of the scapholunate (SL) distance, SL

Figure E3. Example of carpal arcs detected by the artificial intelligence (AI) system (top left) and its

Figure E4. Bland-Altman plots of the measurement agreement between the AI system and the

Figure E5. Bland-Altman plots of the measurement agreement between the junior doctor (Jr Doc)

Figure E6. Bland-Altman plots of the measurement agreement between the hand surgeon (H Surg)

Figure E7. Bland-Altman plots of the measurement agreement between the emergency doctor (ER

Figure E8. Bland-Altman plots of the measurement agreement between the radiologist (Rad) and the

Figure E9. Bland-Altman plots of the measurement agreement between the musculoskeletal

Table E3: Abnormal Measurement Detection Results of the AI System on Dataset 2
Note. - 95% confidence intervals are reported in parentheses. The abnormality thresholds for the measurements are defined in the main text. CL = capitolunate, SL = scapholunate.