Background

The analyses of lateral cephalograms are a fundamental part of orthodontic diagnostics and treatment planning. They are used to determine the skeletal, dental and soft tissue relations, to evaluate treatment effects and to assess the vertebrae [1,2,3,4,5,6,7]. For this purpose, defined landmarks are placed on the radiographs. These can be anatomical, radiological or constructed points. Parameters such as angles, distances or ratios are measured between these landmarks and compared to standard values.

Standardized cephalometric radiographs were introduced into orthodontics by Broadbent and Hofrath in 1931 [8, 9]. The 22-item analysis used today in the Department of Orthodontics at the University of Münster is based on the analyses by Downs, Ricketts, Rakosi and Steiner [1, 2, 10, 11].

Originally, the analyses were performed manually by drawing the landmarks, angles, and distances on the analog lateral cephalogram [12]. The greatest potential for error has always been in the localization of the landmarks [13]. As early as the 1960s, computer-based systems were developed to enable faster and less error-prone cephalometric analyses. Landmark coordinates were initially transferred by hand using a drawing table [14] and later using digital reading systems [15], which gained acceptance only slowly due to their high cost [16]. Meanwhile, methods for digitizing radiographs had already been developed [17], but until the 1990s these were qualitatively inferior to the use of digital readout systems [18]. Nowadays, direct digital x-ray technology eliminates the need for time-consuming and quality-reducing intermediate steps for viewing and tracing cephalograms on a computer. Furthermore, digital radiographs allow the contrast, brightness and size of the image to be changed, so that structures of different translucency can be viewed in detail. Another advantage of digital x-ray technology is the lower radiation dose for the patient [19].

For the diagnosis of digital radiographs, a darkened room and a suitable viewing monitor are required. The use of tablet computers for radiographic analysis was already considered shortly after the introduction of the iPad (Apple, Cupertino, CA, USA) in 2010 [20]. Initial comparisons to conventional liquid-crystal displays (LCD) [21, 22] were promising, but the observer performance on iPads was found to be significantly lower than with calibrated monitors [23]. With the introduction of the high-resolution “Retina” display as part of the third-generation iPad in 2012, there was no longer a significant difference in comparison to calibrated viewing monitors [24], and the American Board of Radiology considered the iPad’s Retina display adequate for examination in all specialties [25]. There was also no significant difference between tablet computers and viewing monitors in terms of reliability of landmark identification [26]. Finally, a 2015 systematic review found that the use of a tablet computer does not generally affect the interpretation of a radiograph [27].

In contrast to a PC with a viewing monitor, the use of a tablet computer allows for more flexible work. One can perform analyses directly in a darkened lecture hall, and even patient-side use is an option, since sterile packaging and disinfection of the device are possible [28, 29].

When using a tablet computer, inputs are made with the finger or a stylus directly on the touchscreen. The reproducibility of cephalometric analyses on tablet computers using a stylus and on desktop computers using a mouse-driven cursor has been studied previously, and no differences in measurements between the two modalities were found for any of the cephalometric parameters [30].

The aim of this study was to investigate the accuracy and tracing time of dental students when identifying landmarks on lateral cephalograms using a tablet or desktop computer. The null hypothesis was that the device used would have no effect on the accuracy or tracing time of landmark identification.

Methods

This prospective study received approval from the Ethics Commission of the Medical Faculty of the University of Münster, Germany (2021-060-f-S). The study took place at the Department of Orthodontics at the University Hospital Münster, Germany.

Software

A web-based application for performing cephalometric analyses of digital lateral cephalograms was developed.

The application was implemented in TypeScript using the React frontend framework. Internationalisation for German and English was realised using the react-intl library to allow for future international use of the software. To import radiographs according to the Digital Imaging and Communications in Medicine (DICOM) standard, a lightweight parser was implemented.

The application allows the brightness, contrast and magnification of the cephalogram to be freely adjusted. The sequence in which the landmarks are placed is suggested by a list representation, but is freely choosable. Placed landmarks can be corrected at any time. To assist the examiner, a small schematic drawing showing the ideal position of the selected landmark and its definition is provided (Fig. 1, Table 1).

Fig. 1

Placement of landmark Nasion in the web-based application. A small schematic drawing at the lower right edge shows the examiner the ideal position of the selected landmark and its definition

Table 1 Definitions of the Landmarks used in the 22-item cephalometric analysis of the University of Münster as shown in the software

The same web-based application was used on both the tablet and desktop computers. Therefore, a software-independent comparison of the cephalometric analysis performed with the two types of computers was possible.

To carry out the analyses on the tablet, each student was provided with an iPad with Retina display (Apple, Cupertino, CA, USA). For the desktop-based analyses, the students used their own desktop computers.

Data acquisition

Of all lateral cephalograms taken at the Department of Orthodontics in 2012-2017, 30 were randomly selected using a random number generator [31]. To obtain the radiographs, the heads of all patients were aligned with the sagittal plane perpendicular to the X-rays and the Frankfurt plane parallel to the floor. The teeth were in maximum intercuspation and the lips closed. After anonymisation of the cephalograms the following exclusion criteria were applied: unerupted or missing incisors, unerupted or missing first molars, malposition of the head in the cephalostat, osteosynthesis plates in situ or a missing scale. Selection was made without regard to gender, type of occlusion or skeletal pattern. After application of the exclusion criteria, 26 radiographs remained. From these, three were finally selected using the random number generator.

Fig. 2

Selection and allocation of the cephalograms with number of cephalograms (\(n_c\)), semesters (\(n_s\)), and analyses (\(n_a\))

One cephalogram was used only to introduce the software to the students. The other two cephalograms (A, B) were analyzed by the students on the tablet and the desktop computer, respectively. Two different cephalograms were used to avoid learning effects. The assignment of the cephalograms (A, B) to the computer type (tablet, desktop) was switched every semester so that an influence of the cephalogram could be assessed separately from an influence of the device (Fig. 2).

Fig. 3

Landmarks used in the 22-item cephalometric analysis of the University of Münster: Nasion (N), Basion (Ba), Orbitale (Or), Porion (P), Pterygoid point (Pt), Sella (S), Anterior nasal spine (Spa), Posterior nasal spine (Spp), A point (A), Condylion (Co), Condylar midpoint (DC), Anterior border of the Ramus (R1), Posterior border of the Ramus (R2), Semilunar incisure (R3), Lower border of the Ramus (R4), Ramus midpoint (Xi), Horizontal tangent point (hT), Menton (Me), Pogonion (Po), B Point (B), Suprapogonion (Pm), Constructed gnathion (Gnk), Upper Incisor edge (UpIe), Upper Incisor apex (UpIa), Lower Incisor edge (LoIe), Lower Incisor apex (LoIa), First Upper Molar mesial apex (1UpMma), First Upper Molar distal contact (1UpMdc), Apex nasi (Ap), Subnasal (Sn), Upper Lip (UpL), Lower Lip (LoL), Pogonion molle (Pom). Figure adapted from [32]

Eligible participants were dental students of one orthodontic course that is part of the clinical curriculum in the seventh semester at the University of Münster. All students received the same education on cephalometric analysis. The course consisted of a lecture on the history, landmarks, planes and measurements of cephalometry combined with practical exercises on manual landmark positioning. The course lasted four hours and was divided into five sessions. It was followed by a 45-minute software demonstration session. The cephalograms used in this study were not used in the teaching or during the demonstration to avoid a learning effect.

Each student performed the 22-item cephalometric analysis of the University of Münster on the tablet computer (using a finger) and desktop computer (using a mouse) in no particular order. The students were instructed to perform the analysis without interruption and in a darkened room. The landmarks used for the 22-item analysis can be found in Fig. 3.

The landmark locations as well as timestamps for the first and last landmark placement were exported from the software in JSON (JavaScript Object Notation) format and submitted for evaluation. The JSON files were pseudonymised and processed using a Python script. The pseudonym was generated from the plain name and a salt (a random string) using the cryptographic one-way function SHA3-256 and subsequent sorting and ranking.
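The pseudonymisation step can be illustrated with a minimal Python sketch. It is not the script used in the study: the function name, the way salt and name are concatenated, the UTF-8 encoding and the use of the 1-based rank as pseudonym are assumptions made for illustration only.

```python
import hashlib


def pseudonymise(names, salt):
    """Map each plain name to a rank-based pseudonym.

    Assumption: the salt is prepended to the name and the result is UTF-8
    encoded before hashing; the study does not specify these details.
    """
    digests = {
        name: hashlib.sha3_256((salt + name).encode("utf-8")).hexdigest()
        for name in names
    }
    # Sort the names by their SHA3-256 digest and use the 1-based rank
    # of each digest as the pseudonym.
    ranked = sorted(digests, key=digests.get)
    return {name: rank for rank, name in enumerate(ranked, start=1)}


# Hypothetical names and salt, for illustration only.
mapping = pseudonymise(["Alice Example", "Bob Example", "Carol Example"], "random-salt")
print(mapping)
```

Because the rank depends only on the sorted digests, the pseudonym reveals neither the name nor the hash itself, and the same salt reproduces the same mapping.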

Exclusion criteria for the submitted cephalometric analyses were use of a cephalogram other than the ones provided, incorrect assignment of the cephalogram to the device type, missing landmarks, and duplicate submissions.

To establish a reference, six experienced orthodontists performed the analysis for each image on a desktop computer with the calibrated medical viewing monitor RadiForce RX220 (EIZO, Hakusan, Ishikawa, Japan) in a darkened room. Mean values for each landmark position were used as the reference (\(x_{i_{ref}}\), \(y_{i_{ref}}\)).
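How such a reference can be derived is sketched below in Python. The data layout (a dictionary per rater that maps landmark names to coordinates) and the function name are hypothetical; only the averaging over raters follows the description above.

```python
def reference_positions(ratings):
    """Average each landmark's (x, y) position over all raters.

    `ratings` maps a rater id to {landmark: (x, y)}; this layout is a
    hypothetical choice for the sketch, not the study's export format.
    """
    raters = list(ratings.values())
    landmarks = raters[0].keys()
    return {
        lm: (
            sum(r[lm][0] for r in raters) / len(raters),
            sum(r[lm][1] for r in raters) / len(raters),
        )
        for lm in landmarks
    }


# Two hypothetical raters and one landmark, coordinates in mm.
example = {
    "rater_1": {"N": (10.0, 20.0)},
    "rater_2": {"N": (12.0, 24.0)},
}
print(reference_positions(example))  # {'N': (11.0, 22.0)}
```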

Students’ accuracy was evaluated as the mean radial error (MRE) in mm (Eqs. 1 and 2), defined as the sum of all Euclidean distances (\(d_i\)) to the reference landmarks divided by the number of landmarks (\(l=33\)).

$$\begin{aligned} d_i = \sqrt{\left(x_{i_{stud}} - x_{i_{ref}}\right)^2 + \left(y_{i_{stud}} - y_{i_{ref}}\right)^2} \end{aligned} \tag{1}$$
$$\begin{aligned} MRE = \frac{\sum _{i=1}^{l}{d_i}}{l} \end{aligned} \tag{2}$$
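Eqs. 1 and 2 translate directly into a few lines of Python. The sketch below assumes that the coordinates have already been scaled to millimetres; the function names and example coordinates are hypothetical.

```python
import math


def radial_errors(student, reference):
    """Euclidean distance d_i (Eq. 1) for every landmark in the reference."""
    return {
        lm: math.hypot(student[lm][0] - x_ref, student[lm][1] - y_ref)
        for lm, (x_ref, y_ref) in reference.items()
    }


def mean_radial_error(student, reference):
    """Mean radial error (Eq. 2): sum of all d_i divided by the landmark count."""
    distances = radial_errors(student, reference)
    return sum(distances.values()) / len(distances)


# Hypothetical coordinates in mm: d_N = 5.0 and d_S = 1.0.
reference = {"N": (10.0, 20.0), "S": (40.0, 25.0)}
student = {"N": (13.0, 24.0), "S": (40.0, 26.0)}
print(mean_radial_error(student, reference))  # 3.0
```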

Timestamps of the placement of the first and last landmark were recorded, and their difference was used as a measure of the students’ tracing time.
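As an illustration, the tracing time could be derived from an exported file as follows. The JSON keys and the representation of the timestamps as epoch milliseconds are assumptions; the actual export format of the application is not described here.

```python
import json


def tracing_time_minutes(export_json):
    """Tracing time in minutes between first and last landmark placement.

    Assumption: the export stores both timestamps as epoch milliseconds
    under the hypothetical keys "firstPlacement" and "lastPlacement".
    """
    data = json.loads(export_json)
    return (data["lastPlacement"] - data["firstPlacement"]) / 60_000


# Hypothetical export: 708 000 ms between first and last placement.
example = '{"firstPlacement": 1600000000000, "lastPlacement": 1600000708000}'
print(tracing_time_minutes(example))  # 11.8
```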

The resulting dataset contained the pseudonym of the student, the identifier of the image (A or B), the computer type used (tablet, desktop), the order identifier (0 if this was the student’s first analysis, 1 otherwise), the time required for identification of all landmarks in minutes and the student’s accuracy as defined above.

Statistical analysis

The reliability of the established reference coordinates was assessed with an intraclass correlation coefficient using a two-way mixed effects model for the absolute agreement of multiple raters (ICC(A,k) according to McGraw and Wong [33]) using the irr package [34] for R [35]. The level of reliability was defined according to Koo and Li [36]: poor reliability (\(<0.5\)), moderate reliability (\(0.5\) to \(0.75\)), good reliability (\(0.75\) to \(0.9\)) and excellent reliability (\(>0.9\)).
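The study computed the coefficient with the irr package for R. The following Python sketch only illustrates the underlying mean-square formula for ICC(A,k) on a complete subjects-by-raters table, together with the reliability levels named above; it is not the irr implementation. The function names, the example table and the treatment of a value of exactly 0.9 as "good" are assumptions.

```python
def icc_a_k(ratings):
    """ICC(A,k) for a complete table with n subjects (rows) and k raters (columns)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Absolute agreement, average of k raters.
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def reliability_level(icc):
    """Reliability level; a value of exactly 0.9 is treated as 'good' (assumption)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical table: the second rater is always 1 unit higher than the first.
table = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
value = icc_a_k(table)
print(round(value, 3), reliability_level(value))  # 0.8 good
```

The constant offset between the two raters lowers the coefficient even though their rankings agree perfectly, which is the behaviour expected of an absolute-agreement measure.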

Descriptive statistics were calculated for the students’ accuracy, tracing time and successful detection rate. A deviation of 2 mm was considered clinically acceptable [37, 38].

Linear mixed effect analysis was performed to test the influence of the device on accuracy and tracing time. Computer type (tablet or desktop), cephalogram (A or B), gender of the student, and order of analysis were considered as fixed effects. A random intercept for subjects was also included. The significance of each fixed effect was tested by a likelihood ratio test of a model with that effect against a null model. In a second step, models with an increasing number of these significant effects were tested against the previous models. Finally, a model with all effects that showed a significant improvement was selected. The linear mixed effects analyses were executed using the lme4 package [39] for R [35].
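The models themselves were fitted with lme4 in R. The arithmetic of a single likelihood ratio test can nevertheless be sketched in Python, assuming two nested models that were fitted by maximum likelihood and differ by exactly one parameter. The function names and the log-likelihood values are hypothetical. For one degree of freedom, the upper tail of the chi-square distribution can be written with the complementary error function, so no statistics library is needed.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def likelihood_ratio_test(loglik_null, loglik_full):
    """Return (statistic, p) for two nested models that differ by one parameter."""
    # Twice the gain in log-likelihood; clipped at zero against rounding noise.
    statistic = max(2.0 * (loglik_full - loglik_null), 0.0)
    return statistic, chi2_sf_1df(statistic)


# Hypothetical log-likelihoods of a null model and a model with one added effect.
stat, p = likelihood_ratio_test(-410.00, -407.23)
print(round(stat, 2), round(p, 3))
```

A statistic of about 5.54 on one degree of freedom corresponds to a p value of roughly 0.02, while a statistic of about 0.98 corresponds to roughly 0.32.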

Results

The study was conducted from 2018 to 2022 over a period of 8 semesters. During this period, 303 analyses were submitted. Of these, 26 analyses had to be excluded due to the exclusion criteria: 16 contained the wrong cephalogram, 5 had a screenshot of the provided cephalogram, 3 had missing landmarks, and 2 were invalid JSON files, ultimately resulting in 277 submissions with a total of 9141 landmarks being included in the study. The resulting study group consisted of 161 (108 female, 53 male) students.

The interrater reliability of the six orthodontists that established the reference coordinates (Table 2) was excellent (\(ICC~>~0.9\)).

Table 2 Reference coordinates for the landmarks as established by six orthodontists with the corresponding interrater reliabilities

Accuracy of students’ landmark identification

The mean landmark deviation of the students was 2.05 mm (SD = 2.63). The landmarks LoIe, UpIe, Ap, Sn, S and N were identified with the smallest deviation. The largest deviations were found for the landmarks R4, Co, R3, P, Ba and R1. The deviations for all landmarks are listed in Table 3 and visualised in Fig. 4. The landmarks as placed by the students are shown in Fig. 5.

Table 3 Accuracy of students’ landmark identification evaluated as the mean radial error and the successful detection rate below different thresholds
Fig. 4

Deviation of the students’ landmarks to the reference in mm

Fig. 5

Positioning of the landmarks by the students on image B

The likelihood ratio tests showed a significant effect of the image (\({\chi }^2(1)~=~19.10\), \(p~<~0.001\)) and students’ gender (\({\chi }^2(1)~=~5.54\), \(p~=~0.02\)) on the accuracy. The type of computer (\({\chi }^2(1)~=~0.98\), \(p~=~0.32\)) and the order in which the analyses were conducted (\({\chi }^2(1)~=~0.11\), \(p~=~0.75\)) had no significant effect. There was no significant interaction between image and gender (\({\chi }^2(1)~=~0.08\), \(p~=~0.78\)).

The resulting model suggested that image B was more difficult to analyse than image A, with an estimated effect of 0.21 mm, and that male students performed better than female students regardless of the image, with an estimated effect of 0.24 mm. The estimates and confidence intervals of the effects are shown in Table 4.

Table 4 Linear mixed effect model for accuracy (deviation in mm) and tracing time (in minutes per analysis)

Successful detection rate

The successful detection rate (SDR) for the clinically acceptable threshold of 2 mm was 68.6% over all landmarks. The SDR for 2 mm was greater than 90% for 8 landmarks and less than 35% for 4 landmarks. The SDRs for all landmarks and different thresholds are listed in Table 3.
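The successful detection rate is the share of landmark placements whose radial error lies within a given threshold. A minimal Python sketch follows; whether a deviation of exactly 2 mm counts as a success is an assumption, and the function name and error values are hypothetical.

```python
def successful_detection_rate(distances_mm, threshold_mm=2.0):
    """Share of radial errors within the threshold, in percent.

    Assumption: a deviation equal to the threshold still counts as a success.
    """
    hits = sum(1 for d in distances_mm if d <= threshold_mm)
    return 100.0 * hits / len(distances_mm)


# Hypothetical radial errors in mm for five landmark placements.
errors = [0.5, 1.9, 2.0, 2.4, 3.1]
print(successful_detection_rate(errors))       # 60.0
print(successful_detection_rate(errors, 2.5))  # 80.0
```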

Tracing time

The median tracing time for the students was 11.80 minutes per analysis (IQR 7.70–20.49), while for the orthodontists it was 5.15 minutes (IQR 4.27–7.24).

Regarding students’ tracing time, the likelihood ratio tests showed a significant effect of the order in which the analyses were conducted (\({\chi }^2(1)~=~19.55\), \(p~<~0.001\)). The image (\({\chi }^2(1)~=~0.08\), \(p~=~0.77\)), type of computer (\({\chi }^2(1)~=~1.53\), \(p~=~0.22\)) and gender (\({\chi }^2(1)~=~0.03\), \(p~=~0.86\)) had no significant effect.

The resulting model suggested that the second analysis was performed faster than the first, with an estimated effect of 11.72 minutes. The estimates and confidence intervals of the effect are shown in Table 4.

Cephalometric measurements

Cephalometric measurements were calculated using both the reference landmarks and the landmarks placed by the students. Significant differences were only found for four of the 22 measurements (facial depth, mandibular plane, relative mandibular length and relative maxillary length) as shown in Table 5.

Table 5 Cephalometric measurements calculated from the reference landmarks and those placed by the students. Descriptive statistics with mean (M) and standard deviation (SD) as well as the results of t tests (assuming heterogeneous variances)

Discussion

The present study focuses on the development and evaluation of a web-based application for performing cephalometric analyses of digital lateral cephalograms. The study results showed no influence of the type of computer (i.e. tablet or desktop) on the students’ accuracy or speed when performing the analysis.

Previous studies on app-based versus manual tracing showed no clinically relevant differences in tracing accuracy [40,41,42]. Recent studies comparing desktop computers to smartphones found comparable results on tracing accuracy [43, 44], but inconsistent results on tracing time [44, 45]. For tablet computers with pen-input, two studies found no significant difference from desktop-computer-based analyses [30, 46] and one study found that the mobile apps were inferior [47]. To our knowledge, there have been no studies comparing computers with touch-input (i.e. smartphone or tablet) with desktop computers, using the same application on both devices.

Most studies comparing the accuracy of tracing methods [30, 41,42,43,44,45, 47] used the cephalometric measurements as a measure of tracing accuracy, while one study from 2015 [46] as well as more recent studies covering neural network based approaches used the landmark location.

The advantage of using landmark locations is that they are easier to compare across studies, since many different, mutually non-comparable measurements can be derived from the same set of landmarks. In addition, angular measurements in cephalometry mask placement errors that occur when the landmark is misplaced along the arms of the measured angle.
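The masking effect of angular measurements can be shown with a small worked example in Python. The coordinates are hypothetical and do not correspond to real landmark positions: one endpoint of an angle is shifted by 5 mm along its own arm, which leaves the angle unchanged although the radial error is clearly above the 2 mm threshold.

```python
import math


def angle_deg(vertex, p, q):
    """Angle at `vertex` between the rays towards p and q, in degrees."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    diff = abs(a1 - a2) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))


vertex = (0.0, 0.0)     # hypothetical vertex landmark
first = (10.0, 0.0)     # hypothetical landmark on the first arm
reference = (3.0, 4.0)  # reference position on the second arm
misplaced = (6.0, 8.0)  # same arm, shifted away from the vertex

radial_error = math.hypot(misplaced[0] - reference[0], misplaced[1] - reference[1])
print(round(angle_deg(vertex, first, reference), 2))  # about 53.13
print(round(angle_deg(vertex, first, misplaced), 2))  # about 53.13
print(radial_error)                                   # 5.0
```

A comparison based on the angle alone would therefore rate this placement as perfect, whereas the landmark-based comparison registers the 5 mm deviation.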

The landmarks identified with the smallest deviation (LoIe, UpIe, Ap, Sn, S and N) are consistent with previous studies on the reliability of cephalometric landmarks [18, 48,49,50,51,52,53,54,55,56,57]. Regarding large deviations, the results are also in agreement with previous studies stating that the identification of landmarks in the petrous temporal region (i.e. Ba, Co and P) is difficult due to superimpositions and that the error is generally larger for landmarks along gradually curved surfaces (i.e. R1, R3 and R4) due to elliptical error distribution [53].

The results of the linear mixed effects model showed that image B was slightly more difficult to analyze, with an increase in mean deviation of 0.2 mm. This could be explained by more structures being superimposed in image B. It was also found that the gender of the students had a significant influence, with male students being more accurate by 0.2 mm.

In the study population, the gender distribution was unbalanced with 108 female and 53 male students. This imbalance is related to the higher prevalence of female students in dental education. In recent decades, the proportion of female students in dentistry has increased, which can be attributed to a higher application rate with comparable admission rates between the genders [58]. Considering the unequal gender distribution and the small effect size found, the gender-specific difference in accuracy should be interpreted with caution.

Regarding the tracing time, the results showed that the students performed the second analysis faster than the first one, with a mean decrease of 11.72 minutes, indicating a learning effect. The fact that the students needed a median of 12 minutes for a cephalometric analysis, while the orthodontists were considerably faster with a median of 5 minutes, suggests that the time needed decreases with increasing experience. The other effects considered (i.e. image, device and gender) had no significant influence on the tracing time.

To assess the clinical performance of the students, cephalometric measurements were calculated for both the reference and student landmarks (Table 5). The variability of the student measurements was comparable to that reported in previous studies [41,42,43,44,45, 47]. A significant difference to the reference was only found for four of the 22 measurements (facial depth, mandibular plane, relative mandibular length and relative maxillary length).

In view of the progress made in the field of automated cephalometry, the question arises as to whether manual landmark positioning is still relevant. Although there has been great progress in the field of automated evaluation of cephalometric analyses in recent years with the availability of open annotated datasets [59] and the continuous development of various neural network architectures [60], recent studies that have evaluated cephalometric analyses by such AI-based systems and those performed by experienced orthodontists could only recommend the use of these systems under supervision [61]. On the other hand, the idea of collaboration between AI-based systems and students seems promising [62] and should be evaluated as an approach to support the teaching of cephalometry.

According to our results, using tablets for cephalometric analyses in orthodontic education can be considered an appropriate approach and can be recommended. Considering that teaching cephalometric landmark identification with a smartphone-based application has been shown to be at least equivalent to lecture-based instruction [63], a fully digital workflow seems feasible.

Strengths and limitations

The prospective nature and the large number of submitted cephalometric analyses can be seen as a strength of the present study. It provides valuable data on what can be expected from beginners in orthodontics in terms of accuracy and tracing time. However, this study has some limitations that need to be considered when interpreting the results.

The cephalograms chosen seemingly had different degrees of difficulty. The analysis of multiple cephalograms was conducted to minimise bias in the results with respect to landmarks that are particularly difficult to locate in the image. Due to the voluntary nature of the participation, the analysis of only two cephalograms per student was possible, resulting in a limited sample size. A larger sample size would increase the generalisability of the findings and provide more statistical power.

The study was conducted at the Department of Orthodontics at the University Hospital Münster, Germany. The findings may not be applicable to other universities with different curricula, as there may be variations in the expertise and techniques employed at different institutions.

The students who performed the cephalometric analyses were aware of the computer type (tablet or desktop) they were using. This lack of blinding could introduce bias and influence their performance.

The students were instructed to conduct the analysis without interruption and in a darkened room, but this could not be controlled and should be taken into account when interpreting the results. In addition, the timestamps were only registered for the entire session and not for individual landmarks, since the order of landmark placement and later corrections of their position could not be tracked.

Each student was provided with an iPad (Apple, Cupertino, CA, USA), whereas the students used their own desktop computers. This must be considered another limitation of the present study, as it contributes to the heterogeneity of the desktop-based analyses and it could not be guaranteed that the respective screens were suitable for x-ray diagnosis.

The overall accuracy of the students was low and the tracing time was high, which was to be expected as the students were taught cephalometry in the semester in which the study was conducted.

The study focused on the students’ accuracy and tracing time as outcome measures. While these measures provide insights into the performance of the web-based application, its clinical validity was not evaluated. Further research is required to identify and address any potential limitations introduced by the software itself and to assess its clinical validity.

Conclusions

No significant influence of the device used to perform a cephalometric analysis was found with regard to accuracy and speed. The use of tablet computers for cephalometric analyses in orthodontic education can be recommended.