Introduction

Anatomically, the airway space comprises the structures from above the plica vocalis to its two openings (the nose and the mouth), which are responsible for nasal respiration and for the growth and development of craniofacial structures [1, 2]. Changes in pharyngeal airway morphology that occur following different treatment modalities have recently garnered attention among ear, nose, and throat (ENT) physicians and surgeons, maxillofacial surgeons, and orthodontists [3]. Severe deformation of airway morphology may result in impaired respiratory function, lower quality of life, and even life-threatening conditions such as obstructive sleep apnea (OSA) [4]. In OSA, blockage of the airway results in inadequate breathing during sleep. OSA affects around 2–7% of adults [5]. However, most OSA cases remain undiagnosed because of the high cost of, and delays in, the diagnostic process [6].

Polysomnography is the conventional technique and remains the gold standard in OSA diagnosis; it uses the apnea-hypopnea index (AHI) to determine severity. However, this technique is expensive, time-consuming, and often unreliable [7]. Efforts were therefore made to develop new techniques that use imaging modalities to directly reflect the status of the upper airway [8]. Although two-dimensional (2D) imaging techniques were used to assess craniofacial morphology, they could not fully capture the complexity of the airways [9]. Later, three-dimensional (3D) computed tomography (CT) imaging techniques were developed to assess changes in the airway extending from the tip of the nose to the upper end of the trachea [10].

Cone beam computed tomography (CBCT) can capture high-quality images that are more reliable for reconstructing 3D airway structures, and it enables accurate analysis of cross-sectional areas and volumetric regions [11]. As a result, various studies have built 3D models from CBCT images to investigate airway anatomy. In 2021, a comprehensive scoping review shed light on the existing evidence on the use of artificial intelligence (AI) and machine learning in orthodontics and its clinical translation [12]. That review found that artificial neural networks (ANN) were the most commonly used machine learning algorithm and that their major domains were diagnosis and treatment planning, followed by landmark and growth assessment. A systematic review published in 2022 analysed the development, application, and performance of AI in automated cephalometric landmark detection and diagnosis [13]. It found that AI-based analyses offered clinically acceptable diagnostic performance, with accuracy levels similar to those of trained orthodontic specialists. The major benefit was a very quick turnaround time for cephalometric diagnosis. Another paper published in 2022 discussed the validity of machine learning models in orthodontics [14]. Using various clinical examples, the authors concluded that orthodontic practitioners must identify the limitations and benefits of AI models, and that algorithms that learn from human mistakes also adopt those mistakes and biases.

In orthodontics, numerous studies have reported a correlation between orthodontic treatment and changes in airway anatomy and function [15]. Orthodontists address joint and skeletal deformities using standard procedures for correcting or concealing jaw discrepancies. These procedures alter the surrounding soft tissues, including the pharyngeal airway, which may lead to OSA. In addition, researchers have found a correlation between a narrower upper airway and incisor retraction distance [16,17,18,19]. Thus, an in-depth understanding of the airway space and its function is required for orthodontic diagnosis and treatment planning. The precise determination of skeletal class from airway and cephalometric landmarks is also necessary, as skeletal class must be taken into account in the comprehensive management of such problems. Typically, skeletal class is determined by interpreting values measured from cephalometric landmarks, and this interpretation may sometimes be inaccurate. To increase the reliability of skeletal class determination based on landmark values obtained from CBCT images, this study aimed to develop a predictive model with an acceptable level of accuracy using airway landmark values measured on CBCT images.

Materials and methods

The study protocol was approved by the Kasetsart University Research Ethics Committee (Study Code: KUREC-SRC66/029). Samples of skeletal anatomical data were retrospectively obtained from the Faculty of Dentistry, University of Puthisastra, Cambodia.

3D Model reconstruction

All samples were acquired with a CBCT scanner (Vatech PAX i3D Green, VATECH Co., Ltd., South Korea) and recorded in the Digital Imaging and Communications in Medicine (DICOM) file format. The DICOM files were used to reconstruct 3D models in 3D Slicer (slicer.org) by thresholding the airway regions on the CBCT images with a Hounsfield unit (HU) range of −700 to 3071 [20, 21]. The thresholded regions in each slice were used to build 3D polygon models (stereolithography file format) of the airway for each sample. Landmarks were then measured on the 3D models across the nasopharynx, the oropharynx, and the hypopharynx. The following landmarks were considered in this study:

  1. Nasopharynx cavity volume (NCV), unit: cm³,
  2. Oropharynx cavity volume (OCV), unit: cm³,
  3. Hypopharynx cavity volume (HCV), unit: cm³,
  4. Length of the soft palate (LSP), unit: mm,
  5. Distance between the soft palate tip to the posterior wall of pharynx (DSP), unit: mm,
  6. Distance between the epiglottis tip to the posterior wall of the pharynx (DEP), unit: mm,
  7. Sella turcica diameter (SDI), unit: mm,
  8. Sella turcica length (STL), unit: mm,
  9. Sella turcica depth (SDE), unit: mm.

These landmarks were selected from published studies that measured the airway for various purposes, such as assessing airway changes related to orthodontic treatment outcomes, airway diagnostics, and correlations between the sella turcica and skeletal malocclusions [22,23,24,25].
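As a rough illustration of the thresholding step and of how a cavity volume in cm³ can be derived from a segmented region, the sketch below counts the voxels inside a HU window and multiplies by the voxel volume. It is not the procedure used in the study, which was carried out interactively in 3D Slicer; the function name `segment_volume_cm3`, the synthetic array, and the 0.5 mm voxel size are all assumptions made for this example.

```python
import numpy as np


def segment_volume_cm3(hu_volume, lower_hu, upper_hu, voxel_size_mm):
    """Threshold a HU array and return (boolean mask, mask volume in cm^3).

    hu_volume: 3D array of Hounsfield units (hypothetical input).
    voxel_size_mm: (dx, dy, dz) voxel spacing in millimetres.
    """
    mask = (hu_volume >= lower_hu) & (hu_volume <= upper_hu)
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    # 1 cm^3 = 1000 mm^3
    volume_cm3 = int(mask.sum()) * voxel_volume_mm3 / 1000.0
    return mask, volume_cm3


# Synthetic volume (assumption): background at -1000 HU with a
# 4 x 4 x 4 voxel block at 50 HU that falls inside the threshold window.
hu = np.full((10, 10, 10), -1000.0)
hu[2:6, 2:6, 2:6] = 50.0

# HU window taken from the text; voxel size is an assumed value.
mask, volume = segment_volume_cm3(hu, -700, 3071, (0.5, 0.5, 0.5))
print(int(mask.sum()), volume)
```

In practice the mask would be restricted to one anatomical region (nasopharynx, oropharynx, or hypopharynx) before counting, so that each landmark volume is reported separately.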

Data inclusion criteria and label preparation

CBCT images were included if the subject was an adult (at least 18 years old at the time of scanning) with either craniofacial normality (Class I) or abnormality (Class II or III). Skeletal Class I, Class II, and Class III were classified based on the ANB angle measured on the lateral cephalometric record derived from the CBCT (ANB 0–4 degrees: Class I; ANB > 4 degrees: Class II; ANB < 0 degrees: Class III) [26]. A total of 300 samples that met the inclusion criteria and were available at the acquisition site were used for supervised learning model development. The skeletal class of every sample was labelled by three authors, who are experienced dental surgeons. The sample included 150 males and 150 females, with an average age of 22 years. The number of samples in each skeletal class is shown in Table 1.
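The ANB labelling rule stated above can be written as a small function. This is only a sketch of the thresholds as given in the text; the function name `skeletal_class_from_anb` is hypothetical, and in the study the labels were assigned by clinicians rather than by code.

```python
def skeletal_class_from_anb(anb_degrees: float) -> str:
    """Map an ANB angle (degrees) to a skeletal class.

    Thresholds follow the text: ANB < 0 is Class III,
    0 to 4 (inclusive) is Class I, and ANB > 4 is Class II.
    """
    if anb_degrees < 0:
        return "Class III"
    if anb_degrees <= 4:
        return "Class I"
    return "Class II"


for angle in (-2.0, 0.0, 2.5, 4.0, 6.0):
    print(angle, skeletal_class_from_anb(angle))
```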

Table 1 Total number of samples in each skeletal class

Data analysis

Seven supervised learning models were considered in this study: Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), k-Nearest Neighbours (k-NN), Logistic Regression, and ANN. Male and female subjects were combined into one dataset for model development, as predicting skeletal class regardless of gender is more practical in clinics. Five-fold cross-validation (CV) was used to train and validate each model. The models were built in Python with scikit-learn (a machine learning library for Python). For all classification models, a range of hyperparameters was investigated to achieve the best results.

For the Decision Tree, the maximum depth was set to 10, 20, and 30. For the Random Forest, the maximum depth was set to 10 and 20 and the number of trees in the forest to 50, 100, and 150; the same settings were used for the Gradient Boosting classifier. The SVM was tested with both a linear kernel and a radial basis function (RBF) kernel. The C parameter of the linear-kernel SVM was set to 0.1, 1, and 10; the RBF kernel used the same values of C together with a gamma parameter of 0.1, 1, and 10. The number of neighbours for k-NN was set to 2, 5, and 7. The C parameter of the Logistic Regression was set to 0.1, 1, and 10. For the ANN, three architectures were tested: a single hidden layer of 10 nodes, a single hidden layer of 20 nodes, and two hidden layers of 10 nodes each. In addition, the maximum number of epochs for the ANN was set to 2000, and early stopping was used to avoid overfitting. The best hyperparameter setting of each supervised learning model was then compared against the others based on accuracy.
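The hyperparameter search described above could be set up along the following lines with scikit-learn's `GridSearchCV`. This is a minimal sketch, not the authors' code: the data are synthetic (generated with `make_classification` to mimic 300 samples, 9 features, and 3 classes), applying the Random Forest depth values to Gradient Boosting is one reading of the text, and settings such as `random_state` and `max_iter=1000` for Logistic Regression are assumptions. Feature scaling is not mentioned in the text and is therefore omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study data (assumption): 300 samples,
# 9 landmark features, 3 skeletal classes.
X, y = make_classification(n_samples=300, n_features=9, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)

c_values = [0.1, 1, 10]
forest_grid = {"max_depth": [10, 20], "n_estimators": [50, 100, 150]}

# Estimators and grids taken from the values listed in the text.
searches = {
    "Decision Tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [10, 20, 30]}),
    "Random Forest": (RandomForestClassifier(random_state=0), forest_grid),
    # Assumption: Gradient Boosting reuses the Random Forest grid.
    "Gradient Boosting": (GradientBoostingClassifier(random_state=0),
                          forest_grid),
    "SVM": (SVC(), [
        {"kernel": ["linear"], "C": c_values},
        {"kernel": ["rbf"], "C": c_values, "gamma": [0.1, 1, 10]},
    ]),
    "k-NN": (KNeighborsClassifier(), {"n_neighbors": [2, 5, 7]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"C": c_values}),
    "ANN": (MLPClassifier(max_iter=2000, early_stopping=True, random_state=0),
            {"hidden_layer_sizes": [(10,), (20,), (10, 10)]}),
}

results = {}
for name, (estimator, grid) in searches.items():
    # cv=5 gives five-fold CV (stratified folds for classifiers).
    search = GridSearchCV(estimator, grid, scoring="accuracy", cv=5)
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)

# Rank the best configuration of each model by mean CV accuracy.
for name, (params, score) in sorted(results.items(),
                                    key=lambda item: -item[1][1]):
    print(f"{name}: {score:.2f} with {params}")
```

The scores printed here come from synthetic data and say nothing about the study's results; the point is only the structure of the comparison, in which each model contributes its best cross-validated configuration.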

Results

The supervised learning model with the best accuracy was the Random Forest, at 0.74; all other models had lower accuracy (Table 2). The classification report and the confusion matrix of the Random Forest are shown in Table 3 and Fig. 1, respectively. For Class I malocclusions, the precision and recall were 0.63 and 0.71, respectively, and the F1 score was 0.67. For Class II malocclusions, the precision was 0.80, the recall 0.69, and the F1 score 0.74. For Class III malocclusions, the precision was 0.75, the recall 0.77, and the F1 score 0.76. Figure 1 reflects the overall accuracy of 0.74 obtained with the Random Forest. As can be seen in Table 3, the recall scores for Class I, II, and III malocclusions were 0.71, 0.69, and 0.77, respectively; recall represents the proportion of actual positive cases that were predicted correctly, so these values indicate a high sensitivity of the model. Table 3 and Fig. 1 together show that the precision levels were high, implying that the model returned more relevant results than irrelevant ones, while the high recall suggests that the model returned most of the relevant results.
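The F1 scores reported above can be checked against the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). The short sketch below recomputes it for each class; the helper name `f1_from` is introduced only for this example.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for the Random Forest.
reported = {
    "Class I": (0.63, 0.71),
    "Class II": (0.80, 0.69),
    "Class III": (0.75, 0.77),
}

f1_scores = {label: round(f1_from(p, r), 2) for label, (p, r) in reported.items()}
print(f1_scores)
# {'Class I': 0.67, 'Class II': 0.74, 'Class III': 0.76}
```

The recomputed values agree with the F1 scores given in the text for all three classes.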

Table 2 Cross-validation models (5-fold CV)
Table 3 Classification report of the Random Forest with parameters max_depth = 10 and number of trees = 100
Fig. 1 Confusion matrix of the Random Forest with parameters max_depth = 10 and number of trees = 100

Discussion

In the past few decades, considerable emphasis has been placed on airway estimation because of its indirect correlation with different skeletal malocclusions. In 2019, the Board of Trustees of the American Association of Orthodontists (AAO) published a white paper drawing on input from experts in dental sleep medicine [27]. The white paper was compiled to guide practicing orthodontists on their role in identifying and managing obstructive sleep apnea. It also recommended that orthodontists meet legal standards not only to manage sleep apnea but also to engage in meaningful research on it to enhance standards of patient care (19). Another paper, published by the American Academy of Sleep Medicine (AASM), set out clinical practice guidelines for the diagnostic testing of obstructive sleep apnea [28]. One of its findings was that there is a lower degree of certainty regarding the outcomes of care strategies for patients with sleep apnea. It also specified that clinicians must make patient decisions considering the available diagnostic tools and treatment options. The problem with the diagnostic tools currently available is that most are in-patient tests, and many of them are required [29]. According to the AASM practice guidelines, multiple tests are needed to conclude whether a patient is at risk of sleep apnea. Even polysomnography, which is often considered the gold standard for diagnosing sleep apnea, may need to be repeated if the first round of tests is inconclusive.

This study was conducted to use measured anatomical landmarks to formulate a supervised learning model with an acceptable level of accuracy, which could be used to detect patients at risk of developing sleep apnea. In the past decade, many AI models and solutions have been developed in healthcare to decrease workload and increase the efficiency of diagnosis. Supervised learning models based on image processing and volumetric measurements have been developed to enhance the detection of healthcare problems [30, 31]. In the dental field, AI has many applications, such as convolutional neural networks (CNN) that classify teeth, detect caries on panoramic images, and identify other oral health problems [31,32,33]. By using AI as a second opinion, dentists can provide a faster and more accurate diagnosis of patients’ healthcare concerns.

Many other studies in the dental field have used only a single model [34,35,36,37]. Although the reported accuracy is acceptable, these studies did not compare it against other models, which might have performed better. This study compared accuracy across the developed models and found that the Random Forest had the best accuracy in skeletal class determination (0.74). An accuracy level of 70% has previously been described as a very good, realistic performance for a model and is in line with industry standards [38]. It was also observed in this study that trust in a predictive model depends on both the stated and the observed accuracy, and that the former may change based on the latter. In orthodontic clinical terms, very few studies to date have achieved a higher accuracy rate [39]. Machine learning models developed with similar sample sizes to evaluate extraction versus non-extraction decisions achieved very similar Random Forest accuracy levels [39]. This also implies that it is much more difficult to achieve high accuracy in healthcare than in fields with a higher proportion of static parameters.

Random Forest is an extension of the Decision Tree model and can be applied to both regression and classification problems. Since this study determined skeletal class from measured values of airway landmarks, it was a classification problem. Random Forest applies bootstrap aggregating (bagging) to decision trees. The benefit of bagging is that the classification results of the individual decision trees, in this case 100 trees, are combined as votes to determine the final result. Random Forest is a well-known and well-researched model in machine learning [36, 37]. Its benefit over other models, such as SVM, is that it draws on several decision trees and randomizes them to ensure diversity of outcomes [34]. Studies comparing Random Forest with Gradient Boosting algorithms have concluded that Random Forest was superior in performance [40]. A previous study comparing Random Forest with Logistic Regression concluded that the former performed significantly better in average prediction performance [41].
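The bagging-and-voting idea described above can be sketched by hand: each tree is fitted on a bootstrap resample of the training data, and the class with the most votes across trees is returned. This is an illustration on synthetic data, not the study's model; the helper names `fit_bagged_trees` and `predict_by_vote` are hypothetical, and `max_features="sqrt"` is an assumed setting that adds the per-split feature randomness typical of Random Forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=100, max_depth=10, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(
            max_depth=max_depth,
            max_features="sqrt",  # random feature subset at each split
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_by_vote(trees, X, n_classes):
    """Return the majority-vote class for each row of X."""
    counts = np.zeros((n_classes, X.shape[0]), dtype=int)
    columns = np.arange(X.shape[0])
    for tree in trees:
        votes = tree.predict(X).astype(int)
        counts[votes, columns] += 1  # one vote per tree per sample
    return counts.argmax(axis=0)


# Synthetic stand-in data (assumption): 300 samples, 9 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=9, n_informative=6,
                           n_redundant=0, n_classes=3, random_state=0)
X_train, y_train, X_test, y_test = X[:240], y[:240], X[240:], y[240:]

trees = fit_bagged_trees(X_train, y_train, n_trees=100, max_depth=10)
y_pred = predict_by_vote(trees, X_test, n_classes=3)
print("held-out accuracy:", float((y_pred == y_test).mean()))
```

Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than by counting hard votes, so its output can differ slightly from this sketch.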

As discussed above, Random Forest models have many advantages and help reduce bias to some extent by splitting the data on randomly selected features rather than always on the most preferred feature. Together with bagging, this randomization leads to an output with very low variation. When provided with data with many features, Random Forest runs multiple decision trees and then aggregates their results to arrive at the final prediction [34].

As more advanced machine learning models are developed, AI-based applications and technologies are bound to play a more important role in orthodontic diagnosis and treatment planning. Although our study reaffirms the power of AI in predicting skeletal malocclusions using selected airway and cephalometric values, many other parameters, including clinical observations, are required to form a complete diagnosis. Additional records are needed to formulate a complete diagnosis for the patient before a treatment plan is established. This type of predictive model can be used in conjunction with an extensive package of AI-based systems to help form a complete diagnostic record of the patient.

Previous studies have been conducted on obstructive sleep apnea prediction, but the majority used 2-dimensional cephalometric records, which do not accurately represent a 3-dimensional space. Other studies have compared pre-treatment and post-treatment outcomes without baseline data to compare against. This study was planned as a pilot study to develop an AI-based predictive model that could be used to determine skeletal class from airway landmark values. These data could serve as a baseline for future studies that aim to improve the accuracy of such models. In some skeletal classes, it may be difficult to acquire CBCT samples. Because the number of samples affects prediction accuracy, collecting more samples can improve classifier performance. In addition to the preprocessing described in this study, principal component analysis (PCA) or supervised feature selection could be used to transform the features into a different feature space with reduced dimensionality, which might increase the models’ accuracy. Such studies can also examine the effects of specific treatment procedures on the airway and observe skeletal changes, if any, based on the airway landmark measurements used in this study.

Limitations

While the confusion matrix provides extensive information about model predictions and their actual outcomes, it may not capture the wider context of the problem domain. Understanding the significance of misclassifications and their real-world impact requires additional domain knowledge. The confusion matrix alone cannot assess whether a model’s predicted probabilities align well with the true probabilities, which is essential for tasks such as healthcare risk assessment. No software can completely replace an orthodontist in the diagnostic and treatment planning stages, as these require years of clinical and theoretical training that enable orthodontists to make patient-specific decisions.

Conclusions

From the results of this study, it can be observed that artificial intelligence has the potential to be a game changer in the field of obstructive sleep apnea. It can help enhance the accuracy and efficiency of diagnosing different skeletal malocclusions using different landmarks. In this study, multiple supervised machine learning models were developed to identify the most accurate one for predictive purposes. The Random Forest model was the most accurate for predicting skeletal malocclusion based on the airway and cephalometric landmarks considered in our study.