A rater agreement study on measurements in cross-sectional CBCT images exploring the association between alveolar bone morphology and craniofacial height

Objectives: To investigate rater agreement regarding measurements of height and width of the maxilla and mandible using cross-sectional images from CBCT examinations. Furthermore, to explore the association between vertical craniofacial height and alveolar bone morphology.
Methods: Pre-treatment CBCT scans from 450 patients referred for treatment to a private clinic for orthodontics and oral surgery in Scandinavia were available and of these, 180 were selected. Lateral head images were generated from the CBCT volumes to categorise subjects into three groups based on their craniofacial height. Cross-sectional images of the maxillary and mandibular bodies at three locations in the maxilla and mandible, respectively, were obtained and measured at one height and two width recordings by five raters. One-way analysis of variance with a Tukey post hoc test was performed. A significance level of 5% was used.
Results: Rater agreement was mostly excellent or good when measuring height and width of the maxilla and mandible in cross-sectional CBCT images. For height (of the alveolar bone/bodies), there were statistically significant differences between the low- and the high-angle groups for all the observers when measuring in the premolar and midline regions, both in the maxilla and in the mandible.
Conclusion: The high agreement found ensures a reliable measurement technique and confirms the relation between craniofacial height and alveolar bone height and width.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11282-020-00493-4.


Introduction
Knowledge of the morphological features of the alveolar bone is of great importance for orthodontic tooth movement as well as for the planning and thus, the outcome of dental implant treatment. In orthodontics, movements of teeth in a narrow alveolar bone may cause bone dehiscence, root resorption and gingival recessions, especially in the lower midline region [1]. In implant treatment, sufficient bone volume is a prerequisite for satisfactory outcome and the underlying bone structure plays a key role in the establishment of an acceptable aesthetic result, especially in the anterior maxilla. The bone surrounding an implant site must be of sufficient height and thickness to obtain and keep a harmonious gingival margin [2]. For these clinical situations, assessments and often measurements in radiographs are performed. Before assessment tools can be used for a clinical situation, the reliability of the tool must be established [3] where reliability can be seen as the ability of a measurement to differentiate among subjects or objects. Furthermore, agreement between measurements is an important concept to provide information about the quality of measurements [4].
Several studies have provided evidence that there is a significant association between craniofacial height and the morphology of the alveolar bone [5][6][7]. A majority of these studies have used lateral cephalometric radiographs to analyse the vertical and the sagittal dimensions of the face [8,9] and to some extent, these studies have used radiographs obtained with a posterior-anterior projection [9,10]. The drawbacks of these two imaging techniques are that the three-dimensional structure of an object is imaged two dimensionally which causes a loss of information. The use of tomographic imaging provides the possibility to assess the alveolar bone morphology in three dimensions as well as different regions in detail. Computed tomography (CT) of dry skulls has been used to study the association between mandibular structures and craniofacial type as well as between craniofacial type and bucco-lingual molar inclination [11,12]. Gracco and co-workers used cone beam CT (CBCT) images of patients to investigate associations between the morphology of the upper jaw, the position of the upper incisors, and craniofacial type as well as the association between the morphology of the mandibular symphysis and the various craniofacial types [13,14]. Also, based on measurements in CBCT images, significant relationships have been found between craniofacial height and alveolar bone height and width in different tooth-bearing regions of the maxilla and mandible [15,16]. With regard to measurements of alveolar bone height and width and the association to craniofacial height, some studies present intrarater agreement [11,15]. However, presentation of interrater agreement is infrequent.
The aim of this study was, therefore, to investigate rater agreement regarding measurements of height and width of the maxilla and mandible using cross-sectional images from CBCT examinations. A further aim was to explore the association between vertical facial height and alveolar bone morphology.

Materials and methods
This is a retrospective rater-based study on agreement of measurements of the maxilla and mandible in CBCT images obtained in patients before orthodontic treatment. It was conducted, analysed, and reported in accordance with the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [4]. An initial study protocol was prepared, including data collection, raters and statistical analyses. The protocol was discussed and accepted by the raters.

Subjects
Pre-treatment CBCT scans from 450 patients, females over 15 years and males over 16 years, referred for treatment during 2008-2013 to a private clinic for orthodontics and oral surgery in Scandinavia were available. CBCT scans from individuals with missing permanent teeth other than third molars, periodontal disease visually detected on the radiographs, major asymmetries of the jaws or previous orthodontic treatment were excluded.

Radiography and categorization of subjects
CBCT examinations were performed using an i-CAT CBCT 17-19 (Imaging Sciences International, LLC 1910 N Penn Road, Hatfield, PA 19440, US). The patients were seated in an upright position during scanning. With the aid of laser markers, the midsagittal and occlusal planes were adjusted perpendicular to each other. Field of view (FOV) was set to 16 cm × 13 cm with a voxel size of 0.3 mm. Exposure was set at 120 kVp and 18.54 mAs with a scanning time of 17.8 s. Calibration of this machine was regularly performed according to the manufacturer's requirements twice a year.
Lateral head images were generated from the CBCT scans using the i-CAT software program. Cephalometric analysis of the lateral images was done using the computer software program Total Interactive Orthodontic Planning System [17] (TIOPS, www.tiops.com). Landmarks were identified by mouse-click, and subjects were classified into three groups based on their craniofacial height using the angle of the lower mandibular border (Mandibular line, ML) in relation to the cranial base (Nasion-Sella line, NSL). The angle formed between the NSL and the ML was used to categorize the subjects as follows: low-angle < 27°, average/normal-angle 27-37° and high-angle > 37°. After identifying 60 individuals in the low-angle group, this number of scans was set as the limit for the number to be included in the normal- and high-angle groups for equal comparisons, giving a total of 180 subjects, as described previously [16].
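The grouping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used TIOPS for the cephalometric analysis, the function names and the two-points-per-line representation are hypothetical, and the handling of the exact boundary values (27° and 37° counted as normal-angle) is one reading of the thresholds given in the text.

```python
import math

def angle_between_lines(p1, p2, q1, q2):
    """Acute angle in degrees between line p1-p2 and line q1-q2.

    Each point is an (x, y) tuple in the lateral image plane. Representing
    NSL and ML by two landmark points each is an assumption of this sketch.
    """
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # Absolute values give the acute angle regardless of point order.
    return math.degrees(math.atan2(abs(cross), abs(dot)))

def classify_craniofacial_height(nsl_ml_angle_deg):
    """Map an NSL/ML angle to the three groups used in the text."""
    if nsl_ml_angle_deg < 27.0:
        return "low"
    if nsl_ml_angle_deg <= 37.0:  # boundary handling is an assumption
        return "normal"
    return "high"

# Hypothetical landmark coordinates, not study data.
angle = angle_between_lines((0.0, 0.0), (10.0, 0.0), (2.0, -8.0), (9.0, -4.0))
print(round(angle, 1), classify_craniofacial_height(angle))
```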
Using i-CAT Vision software (Imaging Sciences International, Hatfield, Pennsylvania, USA), a fully reconstructed three-dimensional image with sagittal, coronal, and axial slices was generated.

Raters and rating (measurements)
Five raters performed measurements on the CBCT images. Of the raters, one is a specialist in oral and maxillofacial radiology (with 29 years of experience), and one is a post doc in oral and maxillofacial radiology (with 5 years of experience). The remaining raters were one specialist in oral and maxillofacial surgery (with 16 years of experience), one resident at the same department and one general dental practitioner. All raters were aware of the purpose of the study and performed the same measurements independently of each other. Prior to the measurements, an information session and calibration exercise took place with all the raters, and the assessment instructions were given to all raters both verbally and in writing. All raters were familiar with handling CBCT images.
All measurement sessions took place in the same room and a BARCO (MFGD 1318; BARCO, Kortrijk, Belgium) 18.1-inch greyscale liquid crystal display monitor was used with a luminance of 400 cd/m² and a resolution of 1280 × 1024 pixels. The observation room was dimly lit, with ambient light kept constant below 50 lx as recommended by the American Association of Physicists in Medicine Task Group 18 [18]. The distance to the screen was approximately 50 cm. There was no restriction on the observation time. The raters were allowed to use the zooming tool. All raters were blinded to clinical features such as craniofacial height and sex.
Before beginning to measure, raters had the possibility to adjust for small deviations in the patient's head position during exposure by re-aligning the skull through an adjustment of the images in the sagittal, coronal and axial planes, respectively. The nasion line of the subject was oriented horizontally prior to measurements in the maxilla. For mandibular measurements, the mandibular base line was set horizontally. For every group of patients (low, normal, high angle), three sites (molar, premolar, and midline region) in the maxilla and mandible, respectively, were measured by each rater in rotation (Fig. 1a). Sites were chosen within the three groups of patients to obtain an even distribution between molar, premolar and midline regions. Measurements were performed with one height and two width measurements between the teeth at selected cross-sectional sites (Fig. 1b). The measurements were performed using the measurement tools in the software program i-CAT Vision. For calculation of intrarater agreement, 10% of the sites were randomly selected in IBM SPSS software (version 22.0; IBM Corp Armonk, NY, USA) and measured by all raters in a second session after approximately 2 months.
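The random selection of 10% of the sites for re-measurement was done in SPSS. A comparable draw could be sketched in Python as follows; the function name, the fixed seed and the figure of 360 numbered sites are assumptions made for illustration and are not taken from the study protocol.

```python
import random

def select_for_remeasurement(site_ids, fraction=0.10, seed=0):
    """Draw a reproducible random subset of sites, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    n_selected = round(len(site_ids) * fraction)
    return sorted(rng.sample(site_ids, n_selected))

# Hypothetical site identifiers; 360 sites is an assumed total.
site_ids = list(range(1, 361))
remeasure = select_for_remeasurement(site_ids)
print(len(remeasure))
```

Sampling without replacement ensures that no site is selected twice for the second session.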
The measurements were simultaneously and manually documented in an Excel (Microsoft Office Excel ® 2010; Microsoft Corporation, Redmond, WA) file by the responsible researcher.

Data analysis
All computations necessary for the statistical analysis were performed using IBM SPSS software (Version 22.0; IBM Corp Armonk, NY, USA). For all variables, the three groups (low, normal, high angle) were compared using a one-way analysis of variance with a Tukey post hoc test. A significance level of 5% was used in all comparisons.
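For readers who want to see what the group comparison computes, the F statistic of a one-way analysis of variance can be sketched with the Python standard library. This is not the SPSS procedure used in the study: the function name and the example values are hypothetical, and the p-value and the Tukey post hoc comparisons are left out because they require F and studentized range distributions, which a statistics package (for example SciPy's `f_oneway` and `tukey_hsd`) would supply.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up alveolar height values in mm, not study data.
low_angle = [16.2, 17.0, 15.8, 16.5]
normal_angle = [17.9, 18.4, 17.1, 18.0]
high_angle = [19.6, 20.3, 19.1, 20.0]
f_stat, df_b, df_w = one_way_anova_f([low_angle, normal_angle, high_angle])
print(f_stat, df_b, df_w)
```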
Inter- as well as intrarater agreement of measurements in selected cross-sectional sites was calculated as intra-class correlation coefficients (ICC(2,1)) with 95% confidence intervals (CI). Only measurements from the first measurement session performed by each rater were used to calculate interrater ICC. The level of agreement was interpreted according to the guideline proposed by Koo and Li [3] as follows: < 0.50, poor; between 0.50 and 0.75, fair; between 0.75 and 0.90, good; above 0.90, excellent agreement.
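As an illustration of the agreement measure, ICC(2,1) (two-way random effects, absolute agreement, single rater) can be computed from the mean squares of a sites-by-raters table. The sketch below uses only the Python standard library; the function names and the rating matrix are illustrative and not study data, the confidence interval is omitted because it needs F-distribution quantiles, and the assignment of the boundary values 0.75 and 0.90 to the "good" category is an assumption about how the thresholds above are applied.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per site and one column per rater."""
    n = len(ratings)      # number of measured sites
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-site mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def interpret_icc(icc):
    """Agreement label following the thresholds quoted in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "fair"
    if icc <= 0.90:  # boundary handling is an assumption
        return "good"
    return "excellent"

# Illustrative ratings: six sites (rows) measured by four raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
value = icc_2_1(example)
print(round(value, 2), interpret_icc(value))
```

Because ICC(2,1) measures absolute agreement, a systematic offset between raters (a large between-rater mean square) lowers the coefficient even when the raters rank the sites similarly.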

Results
Characteristics of individuals belonging to the three groups of craniofacial height are shown in Table 1. Height and width of the maxillary and mandibular alveolar bone/bodies were measured by all raters at all selected sites giving a total of 1080 measurements per rater. For calculation of intrarater agreement, re-measurements were performed by all raters at 10% of the sites giving 108 measurements per rater.
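The reported totals can be reproduced with a short calculation. The breakdown below is one reading that is consistent with the numbers in the text, namely that each of the 180 subjects contributed one site in the maxilla and one in the mandible; that per-subject site count is an assumption of this sketch, whereas the three measurements per site (one height, two widths) come from the methods.

```python
subjects = 180
sites_per_subject = 2       # assumption: one maxillary and one mandibular site
measurements_per_site = 3   # one height and two width measurements

total_sites = subjects * sites_per_subject
measurements_per_rater = total_sites * measurements_per_site
remeasured_sites = total_sites // 10   # 10% of the sites
remeasurements_per_rater = remeasured_sites * measurements_per_site

print(measurements_per_rater, remeasurements_per_rater)
```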

Interrater agreement
Overall interrater agreement ICC for height measurements was in general excellent or good and varied between 0.75 and 0.91 (CI 0.67-0.83 and 0.88-0.94) depending on the measured site. The values were higher for measurements in the mandible compared with the maxilla, and the highest value was recorded in the mandibular premolar region (Fig. 2a).
Corresponding ICC values for width measurements were in general lower than for height measurements. In all sites but one, coronal measurements showed higher ICC than apical measurements. Overall interrater agreement ICC for coronal measurements varied between 0.55 and 0.88 (CI 0.43-0.66 and 0.83-0.92) with the highest value in the mandibular premolar region. For apical measurements, ICC varied between 0.50 and 0.78 (CI 0.37-0.63 and 0.70-0.85) with the highest value also being found in the mandibular premolar region (Fig. 2b).
For pairwise interrater agreement, the highest ICC values were also achieved for measurements in the mandible compared with the maxilla. The values for height measurements were in general higher and when comparing coronal and apical width measurements, the highest values were seen for coronal measurements. Taking all pairwise interrater agreements into consideration, 8% was interpreted as excellent, 47% as good, 39% as fair and 5% as poor agreement according to the suggested guidelines for interpretation of ICC values by Koo and Li [3] (Supplementary Table S1).
Fig. 1 a Steps performed by the raters, when selecting sites for measurements, to adjust the volumes for small deviations in the patient's head position during exposure. b Measurements of height and width in cross-sectional CBCT images of the maxilla and mandible. One height and two width measurements at each site were measured. The sites were named according to location of the neighboring teeth, e.g. upper molar = UM

Intrarater agreement
No rater consistently presented the highest or the lowest intrarater agreement, but the CI was somewhat wider for some raters. The highest agreement was found in the molar region in the mandible when measuring the height of the alveolar bone (Fig. 3).

Height measurements
Statistically significant differences between the low- and high-angle groups for all raters were found for measurements in the premolar and midline regions, both in the maxilla and in the mandible. Regarding the molar region in the maxilla, there were significant differences between the low- and high-angle groups for three of the raters. Regarding the height measurement in the molar region in the mandible, there were no significant differences between any of the groups for any of the five raters (Table 2).

Width measurements
Coronal width measurements in the maxilla at molar, premolar and midline regions displayed no statistically significant differences between any of the groups for the five raters. When measuring in the mandible, there were no statistically significant differences between the three facial groups when measuring in the molar region. When measuring in the premolar region, statistically significant differences were seen between the low- and high-angle groups for one rater. When measuring in the midline region, however, there were statistically significant differences between the low- and high-angle groups for four out of five raters (Table 3).
Apical width measurements displayed no statistically significant differences between any of the three craniofacial groups when measuring in the molar, premolar, and midline region in the maxilla. No statistically significant differences were found between any of the craniofacial groups in the molar and premolar region in the mandible. In the incisal mandibular region, however, there were statistically significant differences between the low and high facial groups for the majority of the raters, four out of five (Table 4).

Discussion
The present study demonstrates that it is possible to achieve good agreement between several raters as well as within raters when height and width of the alveolar bones are measured in cross-sectional CBCT images. Furthermore, an association between craniofacial height and alveolar bone height found in this study was statistically significant for all five raters' measurements.
Accuracy is one part of investigating the strength of a diagnostic method; the other is agreement, which is the degree to which scores, or ratings, are identical [4]. Unfortunately, agreement studies are generally neglected and do not appear in the different stages of evaluating studies of diagnostic methods or in studies where diagnostic methods are used to evaluate treatment outcomes [19,20]. The results of a study of an imaging method and clinical problem will be influenced not only by the number of objects and raters but also by the rater selection, e.g. their expertise [21]. The raters in the present study represented professional experience from different fields of expertise and the length of their experience varied; this is also the case with all potential users in a clinical situation. Since raters may differ in prior experience and visual concepts, a study with several raters can be anticipated to give a more reliable result. To avoid influence on the assessments and consequently the result, the raters were blinded to all patient information and to the other raters' measurements.
The raters received instructions to choose an approximal cross-sectional site between first and second molar, between first and second premolar and in the midline at which to perform the height and width measurements. This may have given a lower ICC score and wider confidence intervals than if pre-selected sites had been used. On the other hand, this situation mimics the clinical situation better as "free selection" takes place in a clinical situation. It is important to be able to apply the results in a clinical setting as the external validity would be limited if the results were only applicable in a staged research environment.
Although a standardised calibration procedure to prevent bias was applied prior to measurements, a variation in rater agreement was found. Overall interrater agreement was in general higher for measurements in cross-sectional CBCT images of the mandible compared with the maxilla and it was the highest in the premolar region for both height and width measurements. This indicates that anatomical landmarks in the mandible might be easier to identify. The marginal bone area and basis of the mandible are probably more distinct in an image than the marginal bone area in the maxilla and the borders to maxillary sinus and nasal cavity. Pairwise interrater agreement was the … (Supplementary Table S1) [3]. The lowest agreement was noted when the raters measured in the upper midline region. This might be explained by difficulties in the interpretation of sites where there are anatomical variations such as the incisive foramen.
Interrater agreement was expressed as overall ICC as well as pairwise interrater agreement. The pairwise interrater calculations can detect whether any rater differs considerably from the others, which can then be analysed further. Such a difference might be due to a misinterpretation of the instructions or an unfavourable measurement technique. In this study, no clear deviation was observed for any rater (Supplementary Table S1).
The CI of the ICC extended to negative values for some measurements, which indicates considerable uncertainty. The somewhat low rater agreement can be explained by variations in the steps performed by the raters when adjusting for small deviations in the patient's head positioning and in the selection of sites for measurements as well as difficulties in identifying anatomical structures and handling CBCT volumes. In the study … [15]. However, no confidence interval was reported, and the number of observers was vaguely described as "other orthodontists" without mentioning their exact numbers or professional experience. Regardless of the statistical approach used, confidence intervals as measures of statistical uncertainty should be reported to allow readers to determine, in particular, the lower level of reliability/agreement.
Taking several rater measurements into consideration, the results of this study showed that patients with large craniofacial height (high angle) have significantly higher alveolar bone, both in the maxilla and in the mandible, compared to those with low craniofacial height (low angle). The association between craniofacial height and cross-sectional maxillary and mandibular bone height was most evident in the premolar and incisal regions. These results strengthen the findings of our previous study [16] and are to a certain extent in concordance with the results of the study by Sadek et al. [15] where statistical differences were found in the anterior part of the maxilla. Therefore, it can be concluded that the dentoalveolar compensatory mechanism, via continued tooth eruption, responds in the maxilla and mandible by enlarging the vertical size of the frontal dentoalveolar heights in long-face subjects and, conversely, less tooth eruption will take place in short-face subjects [22].
A further indication of the association between vertical craniofacial height and alveolar bone morphology, especially in the anterior region, is that in this study the coronal and apical width in the midline region of the mandible (LO-MID) was narrower in the group with high craniofacial height compared with the group with low craniofacial height. In the premolar and molar areas, there were no statistically significant differences in either the coronal or the apical width measurements between any of the groups.

Strength and limitations
Rater agreement has been investigated to explore measurement errors and variations in interpretation, which affect the value of measurements in clinical practice [23] and have to be taken into account when evaluating methods for any diagnostic purpose. The number of subjects in the study is larger than that included in other comparable studies [15,  ]. Nevertheless, the study has a retrospective design in which the CBCT examinations were performed before the current study was designed. The subjects included in the study were referred to a specialist clinic for orthodontics and oral surgery, which means that the results may not be generally applicable.

Conclusions and clinical implications
Knowledge of the morphological features of the alveolar bone is of importance when planning orthodontic tooth movement or dental implant treatment. The results from this study show the significant association between craniofacial height and alveolar bone dimensions as explored by several raters and would provide reference data that can be useful prior to orthodontic or dental implant treatment in subjects with different craniofacial types.