Reliability of humeral head measurements performed using two- and three-dimensional computed tomography in patients with shoulder instability

Purpose The aim of the study was to compare two methods of measuring humeral head defects in patients with shoulder instability. Intra- and inter-observer reliability of humeral head parameters was assessed using 2D and 3D computed tomography. Methods The study group comprised one hundred humeral heads measured on preoperative 2D and 3D computed tomography by three independent observers (two experienced and one inexperienced). All observers repeated the measurements after 1 week. The intra-class correlation coefficient (ICC) and the minimal detectable change with 95% confidence (MDC95%) were used for statistical analysis of diagnostic agreement. Results For 3D inter-observer reliability, ICC values were “excellent” for all parameters and MDC95% values were “excellent” or “reasonable.” All intra-observer ICC and MDC95% values for 3D were “excellent” for experienced and inexperienced observers. For 2D-CT, ICC values were usually “good” or “moderate” with MDC95% values higher than 10 or 30%. Conclusions Three-dimensional CT measurements are more reliable than 2D for humeral head and Hill-Sachs lesion assessment. This study showed that 2D measurements, even when performed by experienced observers (orthopaedic surgeons), are burdened with errors. The 3D reconstruction decreased the risk of error by eliminating inaccuracy in setting the plane of the measurements.


Introduction
Shoulder instability is a common condition affecting 25 in every 100,000 people a year. It affects mainly young people, especially men (3:1) [1]. In people under 20 years of age, the risk of recurrent instability after the first dislocation can be up to 90% [2].
During anterior shoulder dislocation, the humeral head is displaced in front of the glenoid and its posterior surface is wedged into the anterior edge of the glenoid. The resulting bone impression, called a Hill-Sachs defect, is a common finding in patients with recurrent shoulder instability. The presence of a Hill-Sachs defect may predispose to a conflict between the humeral head and the glenoid, and consequently to dislocation of the shoulder joint [3].
Diagnosis of the defects of the anterior glenoid rim and the Hill-Sachs defect is important in treatment and facilitates the selection of an appropriate treatment method [4,5]. Recently, significant attention has been paid to the issue of coexistence of humeral bone defects with glenoid defects. Not only the presence of a humeral head defect but also its morphology, location, and its interplay with glenoid bone loss might matter from the biomechanical point of view [6,7].
Currently, the gold standard in the treatment of patients with shoulder instability is arthroscopic Bankart repair, which in patients with normal morphology of the glenoid and humeral head proves to be highly effective, with a low risk of recurrence of instability [4]. However, in the case of glenohumeral bone defects, the effectiveness of soft tissue repair drops significantly. Both glenoid and humeral head defects have been considered the most important risk factors that should guide the selection of the most appropriate technique to stabilize the shoulder in case of instability [8][9][10].
Hence, reliable tools for bone evaluation are needed. The instability severity index score (ISIS) developed by Balg and Boileau relies on a fairly simple X-ray evaluation [8] to define the defects. However, Bouliane et al. found very low inter-rater accuracy with this approach [11,12]. According to Hirshmann et al. [13], conventional X-ray is characterized by poor reliability, two-dimensional computed tomography (2D-CT) by medium reliability, and three-dimensional computed tomography (3D-CT) by high reliability of assessment.
We have been able to show in our previous study that 3D glenoid reconstruction is more reliable for glenoid bone loss assessment than 2D [14]. Therefore, we have focused on the measurement of humeral head defects in shoulder instability.

The aim
The aim of this study was to assess the reliability of the most common methods of measuring the humeral head and its defects in patients with recurrent shoulder instability using 2D-CT and 3D-CT. We hypothesized that 3D-CT would be more reliable than 2D-CT in assessing humeral head deficiency. To test the hypothesis, we performed intra-observer and inter-observer tests with experienced and inexperienced evaluators. This study continues our previous work on the evaluation of glenoid defects [14].

Material and methods
One hundred consecutive CT scans performed on 100 patients (mean age 35.5; SD 15.5; min 17; max 69) diagnosed with traumatic anterior shoulder instability were obtained from our radiology department. The scans were collected by independent orthopaedic surgeons who did not participate in the assessment of method reliability. All the patients underwent a physical examination in order to identify other shoulder pathology like rotator cuff tears, deformations, fractures, or osteoarthritis. In the next stage, a basic radiology assessment showed that 63 of the 100 CT scans displayed signs of shoulder instability (63 glenoid bone loss, of which 47 Hill-Sachs defects).
Each of the CT scans was subjected to appropriate computer processing depending on the type of measurement method used: 2D-CT with multiplanar reconstruction and 3D-CT reconstruction. For both techniques, several of the most frequently measured indices of humeral head and Hill-Sachs lesion were chosen for further assessment (Table 1): circle area of humeral head, Hill-Sachs length, Hill-Sachs depth, humeral head length, humeral head height, anatomical neck width.
All related measurements were performed by three independent observers. Two observers were orthopaedic surgeons specializing in shoulder surgery. The third was an observer with no experience in orthopaedic treatment (not a physician). Each of them performed the measurements twice, with a seven-day interval, without knowledge of the results of the first measurement or of the findings of the other observers.

2D measurement method
The 2D-CT method was based on an analysis of two-dimensional computed tomography with multiplanar reconstruction, using the OsiriX MD 64-bit software (v.8.5). In the first stage of the assessment, the shoulder CT scans were reconstructed in the 3D Curved-MPR module. Then, this image was set in three planes: frontal, sagittal, and transverse (Fig. 1), and measurements were performed.

3D measurement method
3D-CT was based on CT analysis in three-dimensional reconstruction. In the first stage, all scans were reconstructed in three-dimensional space using the 3D Slicer software (3D Slicer ver 4.4). The program allowed conversion of a DICOM file into a Mesh file, which could then be further evaluated using the GOM Inspect (GOM, ver 8) software (Figs. 4 and 5).
The 3D reconstruction was conducted by one observer to avoid any errors associated with the 3D reconstruction itself. Thirty randomly chosen tomograms were reconstructed twice, with an interval of seven days. One CT scan in each pair was marked as "model" and converted into the CAD format, while the second, called "comparative," was converted into the Mesh format. The two scans in each pair were compared with each other using the GOM Inspect (V8) program. The reliability of the 3D reconstruction was positively assessed, as the average differences within pairs did not exceed 0.15 mm.
Next, all measurements were performed on 3D-CT with the use of the GOM Inspect program (Table 1).

Statement of human and animal rights
This article does not contain any studies involving human participants and animals performed by any of the authors.

Statistical analysis
In our study, we relied on two models of reliability testing, intra- and inter-observer reliability, for which we chose three independent researchers with different levels of experience. In designing the study, we relied on existing research on reliability assessment of measurement methods. Moreover, our study is the continuation of a previous study on the reliability of glenoid bone defect measurements on 2D and 3D CT [14]. However, in this study, we decided to add one more observer (an orthopaedic surgeon) to present the reliability of measurements performed by observers with the same level of expertise.
The sample size for reliability tests with the intraclass correlation coefficient (ICC) was calculated to check whether the number of patients included in the analysis allows reliable statistical analysis [20]. For intra-observer reliability, the sample size should be no less than N = 13, or N_drop = 15 in case of 10% dropout (expected ICC 0.92; two repetitions; lowest acceptable ICC 0.7; significance level for a one-tailed test α = 0.05). For inter-observer reliability, the sample size should be no less than N = 38, or N_drop = 43 in case of 10% dropout (expected reliability ICC = 0.92; precision 0.05 with confidence level 95%).
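The dropout-adjusted sample sizes quoted above can be reproduced with a simple inflation rule. The sketch below assumes that the required N is divided by the expected retention (1 minus the dropout rate) and rounded up. This rule is an assumption, since the text does not state how the adjustment was made, and the function name is hypothetical.

```python
import math


def inflate_for_dropout(n_required: int, dropout_rate: float) -> int:
    """Smallest enrolment that still leaves n_required cases after dropout.

    Assumed rule (not stated in the paper): N_drop = ceil(N / (1 - dropout)).
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))


# Values from the text: N = 13 (intra-observer) and N = 38 (inter-observer),
# both with an expected 10% dropout.
print(inflate_for_dropout(13, 0.10))  # 15
print(inflate_for_dropout(38, 0.10))  # 43
```

Under this assumption, the rule returns the N_drop values of 15 and 43 reported in the text.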
Intra-observer and inter-observer reliability were calculated for 100 CT scans for humeral head measurements (circle area of humeral head, humeral head length, humeral head height, anatomical neck width) and for 47 CT scans containing visible Hill-Sachs lesion (Hill-Sachs length, Hill-Sachs depth). All measurements were repeated after seven days by each observer.
Reliability was calculated by means of ICC in which values can range from 0 to 1, where "0" means total non-compliance, and "1" absolute compliance of the measurement.
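As an illustration of how an ICC can be obtained from repeated measurements, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from a subjects-by-observers table. The choice of this ICC form is an assumption, because the text does not specify which model was used. The function name and the example values are hypothetical and are not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject, one column per observer.

    Assumption: two-way random effects, absolute agreement, single measurement.
    The paper does not state which ICC form it used.
    """
    n = len(ratings)       # number of subjects (e.g. CT scans)
    k = len(ratings[0])    # number of observers or sessions
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical humeral head lengths (mm) from three observers on five scans.
example = [
    [44.1, 44.3, 43.9],
    [47.8, 47.5, 48.0],
    [41.2, 41.6, 41.0],
    [50.3, 50.1, 50.6],
    [45.5, 45.2, 45.9],
]
print(round(icc_2_1(example), 3))
```

The same function can serve for intra-observer reliability if the columns hold one observer's first and second measurement sessions.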
The minimal detectable change (MDC) is defined as the minimal amount of change that is required to distinguish a true performance change from a change due to variability in performance or measurement error. MDC with 95% confidence (MDC95%) was calculated as a percentage of the measurement mean and showed real change and repeatability of the test. MDC95% values lower than 30% were assessed as "reasonable" and lower than 10% as "excellent" [23].
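A minimal sketch of the MDC95% calculation follows. It assumes the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM, which are not spelled out in the text. The function names and the input values are hypothetical. Only the 10% and 30% thresholds come from the paragraph above.

```python
import math


def mdc95_percent(sd: float, icc: float, mean: float) -> float:
    """MDC95 expressed as a percentage of the measurement mean.

    Assumed formulas (not given in the paper):
        SEM   = SD * sqrt(1 - ICC)
        MDC95 = 1.96 * sqrt(2) * SEM
    """
    sem = sd * math.sqrt(1 - icc)
    mdc95 = 1.96 * math.sqrt(2) * sem
    return 100 * mdc95 / mean


def rate_mdc95(percent: float) -> str:
    """Apply the thresholds from the text: <10% excellent, <30% reasonable."""
    if percent < 10:
        return "excellent"
    if percent < 30:
        return "reasonable"
    return "above 30% threshold"


# Hypothetical inputs: SD 2.0 mm, ICC 0.96, mean 45.0 mm.
value = mdc95_percent(sd=2.0, icc=0.96, mean=45.0)
print(round(value, 2), rate_mdc95(value))
```

Applied to values reported in the Results, `rate_mdc95(14.00)` returns "reasonable" and `rate_mdc95(2.76)` returns "excellent", matching the labels used there.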

Inter-observer reliability
In the 2D-CT method, ICC values were "excellent" for the parameters circle area of humeral head and Hill-Sachs depth; "good" for Hill-Sachs length, humeral head length, and humeral head height; and "moderate" for anatomical neck width. In the 3D-CT method, ICC values for all parameters were "excellent." For 3D measurements, the MDC95% values were "excellent" for circle area of humeral head, Hill-Sachs length, Hill-Sachs depth, humeral head length, and humeral head height (2.76-2.89) and "reasonable" for anatomical neck width (14.00).
All measurements are presented in Table 2.

Intra-observer reliability
In the 2D-CT method, ICC values for the first experienced observer were "excellent" for Hill-Sachs length and Hill-Sachs depth, "good" for circle area of humeral head and humeral head height, and "moderate" for humeral head length and anatomical neck width.
All ICC values for the first experienced observer in the 3D method were "excellent." MDC95% values were "excellent" for all 3D measurements (2.78-9.46) and "reasonable" for all 2D parameters (11.16-22.47).
For the second experienced observer in the 2D-CT evaluation, ICC values were "excellent" for anatomical neck width and Hill-Sachs depth, "good" for circle area of humeral head and humeral head height, and "moderate" for Hill-Sachs length and humeral head length.
All ICC values for the second experienced observer in the 3D-CT method were "excellent." For the inexperienced observer, in the 2D-CT method, ICC values were "good" for circle area of humeral head, Hill-Sachs depth, and humeral head height and "moderate" for Hill-Sachs length, humeral head length, and anatomical neck width.
All results are presented in Table 3.

Discussion
Based on the results of the study, we have confirmed the hypothesis that computed tomography with 3D reconstruction is more reliable than 2D-CT for evaluation of humeral head parameters and bone defects. Furthermore, we have also shown that 3D-CT evaluation seems to be resistant to bias resulting from the level of the researcher's experience. In all evaluations, ICC values were "excellent" for all 3D-CT measurements. MDC95% values for 3D measurements were "excellent" for almost all parameters; the exception was the inter-observer anatomical neck width measurement, where the MDC95% value was "reasonable" (14.00). For comparison, 2D measurements usually had "good" or "moderate" ICC values and MDC95% values that were "reasonable" or above the 30% threshold.
Bone defects on the lateral surface of the humeral head were first described in 1855 by Malgaigne [24], but only in 1940 did Hill and Sachs fully describe and publish the morphology of these defects [25]. The incidence of the Hill-Sachs defect increases with the number of shoulder joint dislocations. After the first episode of dislocation, a Hill-Sachs defect is found in about 65% of cases, and in patients with recurrent instability in almost 93% of cases [26,27]. In our series, the incidence of humeral head lesions was 47%. The exact assessment depends on the use of an appropriate diagnostic method. The presence of Hill-Sachs bone loss is important because of the risk of a conflict between the humeral bone defect and the anterior glenoid rim. Therefore, an accurate assessment of the morphology of the defect (length, width, depth, and location), which is essential due to its impact on the choice of treatment method, depends on the quality of the examination methodology [28]. The assessment of bone defects of the anterior glenoid rim and humeral head is usually based on a two-dimensional analysis of transverse CT scans.
Currently available software for computed tomography analysis provides a number of useful tools for performing the necessary measurements, such as the length of a straight line joining two points, the surface area of a selected region, or the volume of a space. However, the analysis of a two-dimensional image of an essentially three-dimensional object carries the risk of a mistake resulting from image imperfections or from measurement errors on the part of the person performing the measurement. Some models of two-dimensional image processing available in commercial programs make it possible to reconstruct a stack of individual images into one three-dimensional model, the projection of which can be set in three planes (transverse, frontal, and sagittal). However, the analysis of such projections (application of measurements) is still carried out on only one plane, which does not eliminate the basic errors of the method [29,30]. Referring to this type of "hybrid" image projection as a three-dimensional reconstruction is therefore not fully correct.
The real three-dimensional method assumes the reconstruction of a virtual three-dimensional model of the tested object and gives the possibility to perform such measurements in three planes. When applying measurements within the glenoid or the humeral head, the model can be freely rotated to identify and apply the correct measurement point. This reduces the risk of error arising from faulty setting of the initial measurement projection, as in the two-dimensional method [14].
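The effect of a poorly chosen measurement plane can be illustrated with a simplified geometric sketch. It is not the authors' procedure. It assumes that a 2D measurement captures only the component of a 3D segment that lies in the viewing plane, so any out-of-plane component is lost. The function name and the coordinates are hypothetical.

```python
import math


def in_plane_length(p, q, plane_normal):
    """Length of segment p-q after orthogonal projection onto a plane.

    Simplifying assumption: a 2D measurement sees only the in-plane part of
    the true 3D distance between two landmarks.
    """
    v = [b - a for a, b in zip(p, q)]
    norm = math.sqrt(sum(c * c for c in plane_normal))
    if norm == 0:
        raise ValueError("plane_normal must be non-zero")
    n = [c / norm for c in plane_normal]
    out_of_plane = sum(a * b for a, b in zip(v, n))
    projected = [a - out_of_plane * b for a, b in zip(v, n)]
    return math.sqrt(sum(c * c for c in projected))


# Hypothetical landmark coordinates in mm.
start, end = (0.0, 0.0, 0.0), (3.0, 4.0, 12.0)
true_length = math.dist(start, end)                           # 13.0
seen_in_plane = in_plane_length(start, end, (0.0, 0.0, 1.0))  # 5.0
print(true_length, seen_in_plane)
```

In this deliberately extreme example, the plane misses most of the segment and the measured length falls from 13 mm to 5 mm. A freely rotatable 3D model avoids the problem because the landmarks are picked directly on the surface.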
In our previous study, we analyzed the reliability of 2D and 3D measurement methods for the assessment of anterior glenoid bone loss in patients with anterior shoulder instability [14]. We showed that 3D-CT reconstruction yielded higher ICC values than the 2D method for most measurements. As in the present study, we found that the 3D method allows for more accurate measurement by researchers with different levels of experience. As with measurements of the glenoid defect, different methods of measuring Hill-Sachs bone loss have been described in the literature [31]. Kodali et al. positively assessed the reliability of Hill-Sachs measurement by two-dimensional tomography, measuring the width and depth of the defect in three planes (sagittal, frontal, and transverse) [17]. In contrast, three-dimensional tomography was used by Cho et al. to assess the width and depth of Hill-Sachs defects and their position relative to the articular surface of the humeral head [16]. Ho et al. assessed the reliability of 3D-CT measurements of nine anatomically shaped bone models of Hill-Sachs lesions. There was strong agreement between all raters for all measured parameters (length, width, depth) [32].
Table 3 ICC values for intra-observer measurements by the first and second experienced observers and the inexperienced observer. ICC, intraclass correlation coefficient; 95% CI, 95% confidence interval
One of the most important findings of our study was that the evaluator's experience in interpreting CT images matters when the evaluation is based on 2D images. The spatial view and 3D reconstruction seem to provide more reliable tools that are independent of the experience of the surgeon. To the best of our knowledge, this aspect of measurement methods has not been studied in shoulder imaging evaluation until now (the exception being our previous publication on glenoid defects [14]). Kaup et al. evaluated the impact of radiologists' experience on the diagnostic accuracy of CT and MRI for osteoporotic vertebral compression fractures [33]. In another field of imaging, radiologists' experience was also addressed in the assessment of salivary gland tumours with the use of CT and MRI [34]. In both studies, greater experience resulted in greater reliability.
Traditional X-rays have also been used for the evaluation of humeral head defects and are part of the commonly used ISIS. This score assists surgeons in identifying the risk factors for recurrence of shoulder instability following shoulder stabilization treatment. In the absence of risk factors, arthroscopic Bankart repair has a high potential for effective treatment. Bone defects are the major criteria, and their misinterpretation may lead to underscoring and hence incorrect surgical planning. Burkhart et al. showed that 67% of patients with an inverted-pear glenoid had recurrent shoulder instability after soft tissue repair, with 100% recurrence in patients with Hill-Sachs lesions [4]. Tauber et al. found bone defects in 57% of 41 patients reoperated on for recurrence of instability [9]. Finally, Boileau et al. identified the following risk factors for recurrent instability: attritional glenoid defect (> 25% bone loss) and Hill-Sachs lesion with stretched anterior capsule or laxity [35]. The inexperience of the surgeon, unclear images, and the low value of the instruments could be some of the reasons for such weak assessments. Traditional X-ray allows us to diagnose the presence of a defect in only about 7% of cases after the first dislocation episode. In comparison, computed tomography and magnetic resonance imaging are much more accurate and allow us to determine the presence of a defect in more than 90% of cases [36]. Chalmers et al. reported that linear measurements resulted in the most aggressive treatment recommendations [37]. Stillwater et al. found no significant differences between measurements performed on 3D-CT and 3D-MR postprocessed images [38]. On the other hand, some studies question the accuracy of 3D-CT measurements in comparison with measurements performed arthroscopically [39].
One limitation of the study is that we focused only on humeral head defects. Recently, as studied by Di Giacomo et al. [40] and Yamamoto et al. [7], the position of the Hill-Sachs lesion (HSL), not only its size, and the presence of bipolar lesions have been found to play an important role in so-called engagement. The identification of both seems to be an important factor in choosing the optimal operative technique to stabilize the shoulder. This study is a continuation of our work on glenoid evaluation. An evaluation of the interplay of bipolar lesions would exceed the scope of one research paper and is proposed as a matter for further study.
Another weakness of current diagnostic methods is the complexity of 3D reconstruction measurements. 3D measurement methods with the currently available software are relatively advanced and difficult to use accurately. As a result, they may be troublesome and time-consuming in everyday clinical practice. An automated process could improve the practical applicability of CT-based image reconstruction. Such attempts have already been implemented in surgical planning for arthroplasty. Good examples are the patient-specific instrument (PSI) software packages used in hip, knee, or shoulder replacement (OrthoView software etc.).
To conclude, 3D-CT measurements are more reliable than 2D for humeral head and Hill-Sachs lesion assessment. This study showed that 2D measurements, even when performed by experienced observers (orthopaedic surgeons), are burdened with errors. The 3D reconstruction decreased the risk of error due to inaccuracy in setting the plane of the measurements and might be precise and easy to use for evaluators inexperienced in computed tomography assessment.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.