Introduction

The prevalence of obstructive sleep apnea (OSA) has increased rapidly in recent years. Approximately, 10–17% of men and 3–9% of women suffer from OSA [1]. Common symptoms reported by patients, such as snoring, lethargy, nonrestorative sleep, and excessive daytime sleepiness, as well as systematic diseases like hypertension [2], coronary artery disease, and stroke [3, 4], are linked to untreated OSA.

Obstruction or collapse of the upper airway, which consists of nasopharyngeal, oropharyngeal, and laryngopharyngeal portions, plays a very important role in the development of OSA. Although surgery for patients with OSA is considered to be an important option, night time ventilation therapy which uses continuous positive-airway pressure (CPAP) remains the standard therapy [5]. Thus, evaluation of the upper airway obstruction of patients is important to develop their therapeutic strategy.

The oropharyngeal level is the most common obstruction site of OSA due to the lack of bony structural support. The soft palate, the base of tongue, the tonsil size, and the collapse of the oropharyngeal airway are places where potential obstruction occurs. Unfortunately, there is no unified scale or score to evaluate the obstruction of the oropharynx, leaving the evaluation to the clinician’s personal experience. Therefore, a comprehensive and objective score for evaluation of oropharyngeal obstruction would be useful.

The main airway assessments at the oropharyngeal level include tonsil size, Mallampati score, nasofibroscopy-assisted Muller’s maneuver (NMM), CT scan of the upper airway, and drug-induced sleep endoscopy [6]. However, all these assessments are either insufficient to make a treatment decision in OSA or are invasive and unfeasible in the outpatient setting. For example, tonsil size grading and Mallampati score for tongue position are the traditional methods used to evaluate the oropharyngeal obstruction, but they can only demonstrate the obstruction in a single dimension for awake patients. CT is a three-dimensional measurement and CT is therefore commonly used for volumetric and linear evaluation of the upper airway. A drawback of CT is that it exposes patients to radiation [7]. NMM is an endoscopic test that a patient can easily receive at clinic. It is performed while the patient is awake [8] and therefore, the morphology of the upper airway during sleep cannot be observed. Drug-induced sleep endoscopy (DISE) is now the most accuate assessment to evaluate the upper airway during sleep. However, DISE is invasive, time-consuming, and costly [9].

Therefore, the objective of this study was to develop a reliable score for evaluation of oropharyngeal obstruction to help identify patients  at-risk for OSA. The oropharyngeal obstruction score was first derived from a sleep clinic cohortand its prediction accuracy for OSA was then validated by ROC analysis.

Materials and methods

Subjects

This retrospective study was approved by the Institutional Review Board of the Beijing Jishuitan Hospital Ethical Committee. All methods were performed in accordance with the relevant guidelines and regulations by the Institutional Review Board of the Beijing Jishuitan Hospital Ethical Committee.

A retrospective analysis was performed on adult patients (> 18 years old) with habitual snoring and obstruction only in oropharyngeal level who were admitted to Beijing Jishuitan Hospital from June 2015 to June 2020. The medical history, age, gender, height, weight, and other basic information of the patients were recorded. Relevant physical examinations and routine examinations of otorhinolaryngology department were conducted. Patients with the following conditions were excluded from this study: (1) patients with severe heart, lung, liver, and kidney dysfunctions; (2) patients with sleep breathing disorders caused by special reasons such as thyroid dysfunction, hypopituitarism, acromegaly (gigantism), narcolepsy, myasthenia gravis, recurrent laryngeal nerve palsy, etc.; (3) patients with central sleep apnea; (4) patients with alcohol dependence or long-term use of psychotropic drugs; and (5) patients who had received previous OSA-related treatment, such as ventilator treatment, surgical treatment (such as palate pharyngoplasty, nasal dilatation, upper and lower jaw surgery), and oral appliance therapy. The flowchart of the screening process is shown in Fig. 1.

Fig. 1
figure 1

The flowchart illustrating the recruitment process and the study design 

Anthropometric and oropharyngeal airway anatomic variables

The following indices were considered in this research:

  1. I.

    Body mass index (BMI, kg/m2) was calculated as the ratio of the weight (in kg) to the square of the height (in m).

  2. II.

    Neck circumference (in cm) was measured just below the laryngeal prominence with the head positioned in the Frankfort horizontal plane.

  3. III.

    Tonsil size grading was adopted in this study as: (1) tonsils were hidden within the pillars; (2) tonsils extended to the pillars; (3) tonsils were beyond the pillars but not to the midline; and (4) tonsils extended to the midline.

  4. IV.

    The modified Mallampati score of each patient was assessed in a sitting position with maximal tongue protrusion (without using a tongue depressor). A scaled score was derived based on the anatomical ratio of tongue to the structures of the oropharyngeal: (1) tonsils, palatal arches, and soft palate were all visualized; (2) uvula and palatal arches were visualized; (3) only soft palate was visualized; and (4) only hard palate was visualized [10].

  5. V.

    NMM was used to evaluate the collapse of the oropharyngeal airway. It was performed with patients in the supine position and the Frankfort plane perpendicular to horizontal. Prior to the procedure, patients were given instructions for the Muller maneuver and given time to practice until they were comfortable performing it. When performing the Muller maneuver, patients were asked to perform a maximal inspiration against a closed airway. Following the administration of topical anesthetic nasal spray (4% lidocaine), the nasopharyngoscope (Olympus Medical Systems Corp) was inserted through the anesthetized nasal cavity to the lower oropharynx. Images were taken during tidal breathing and the Muller maneuver. The position of the scope was kept still when obtaining images, to make consistent magnification among pictures. Visual monitoring during the procedure was required so that an orthogonal view was maintained and captured for subsequent analysis. The lateral pharyngeal walls’ (LPWs) collapse during the Muller Maneuver was assessed and classified into four levels: (1) slight (collapse < 25%); (2) moderate (collapse ≥ 25% but < 50%); (3) substantial (collapse ≥ 50% but < 75%); and (4) predominant (collapse ≥ 75%) [11].

Polysomnogram (PSG) and diagnosis

PSG (CompuMedics, Australia) was completed in the ENT Department of Beijing Jishuitan Hospital. Electroencephalography, submental electromyography, and electro-oculography were recorded with surface electrodes by standard techniques. OSA was defined as a reduction in airflow by more than 90% of the baseline for at least 10 s with effort to breathe. Obstructive hypopnea was an abnormal respiratory event characterized by a reduction of at least 30% in thoracoabdominal movement or airflow, for at least 10 s and with more than 4% of blood oxygen saturation reduction, when compared to the baseline [12]. Apnea hypopnea index (AHI) was defined as the total number of the obstructive apneas and hypopneas within 1 h of sleep under electroencephalography. OSA was diagnosed when the AHI was higher than 5/h. All PSG findings were recorded by physicians.

Statistical methods

All statistical analysis was conducted with SPSS 22.0 for Windows (I SPSS Inc., Chicago, IL). First, Spearman’s correlation coefficient was adopted to identify the relationships between the clinical variables and the AHI. Next, a stepwise multiple linear regression analysis was conducted for the significant variables selected in the first step, to distinguish the independent clinical variables. These variables were then grouped to the oropharyngeal stenosis scoring system. Receiver operating characteristic (ROC) curve analysis was conducted and the area under ROC curve (AUROC) was calculated to quantify the predictability of oropharyngeal stenosis scoring system. The statistical significance of the above analysis was set at P < 0.001.

Results

Among 163 patients, 127 were men. Ages of the patients ranged from 18 to 68 years with a mean of 40.2 years. The average BMI of the patients was 27.2 kg/m2, with the standard deviation of 4.7 kg/m2. The clinical OSA predictors, i.e., anthropometry and oropharyngeal anatomy variables, and AHI values, are shown in Table 1. Seventy-three (45%) patients had simple habitual snoring with an AHI < 5/h. Seventeen patients (10%) who had AHI higher than 5/h but lower than 15/h were considered to have mild OSA; 21 patients (13%) with AHI of 15/h and up to 30/h had moderate OSA; and 52 (32%) had severe OSA with AHI ≥ 30/h.

Table 1 Baseline clinical data

Correlations between clinical OSA predictors and AHI values

The AHI scores differed between sex (P = 0.019; Table 2). Spearman’s correlation coefficients exhibited significant positive correlation between AHI values and multiple clinical predictors, including BMI, NC, tonsil size grading, modified Mallampati score, and NMM grading. Their Spearman’s coefficients are 0.419, 0.399, 0.652, 0.593, and 0.707, respectively.

Table 2 Correlation between AHI and clinical predictor variables

Stepwise multiple linear regression for the model

To find the appropriate parameters to compose the oropharyngeal stenosis scoring system, a stepwise multiple linear regression analysis was performed to determine the significant predictor variables of the regression model (Table 3). The stepwise multiple linear regression analysis excluded NC and sex from the final model. The AHI for patients with oropharyngeal obstruction was a function of four predictor variables, and was written as the form as follows:

Table 3 Stepwise multiple linear regression of AHI grading with predictor variablesa
$$\mathrm{AHI}\hspace{0.17em}=\hspace{0.17em}0.407\hspace{0.17em}\times\hspace{0.17em}\mathrm{NMM}\;\mathrm{grading}\hspace{0.17em}+\hspace{0.17em}0.435\hspace{0.17em}\times\hspace{0.17em}\mathrm{tonsil}\;\mathrm{size}\;\mathrm{grading}\hspace{0.17em}+\hspace{0.17em}0.314\hspace{0.17em}\times\hspace{0.17em}\mathrm{modified}\;\mathrm{Mallampati}\;\mathrm{score}\hspace{0.17em}+\hspace{0.17em}0.043\hspace{0.17em}\times\hspace{0.17em}\mathrm{BMI}\hspace{0.17em}-\hspace{0.17em}2.620$$

From the equation, the coefficients of NMM grading, tonsil size grading, and modified Mallampati score were similar (ranged from 0.314 to 0.435), and the coefficient of BMI was much smaller than the other three variables. It indicated that BMI was much less important in determining the AHI, while NMM grading, tonsil size grading, and modified Mallampati score were of almost equal importance. After checking the variance inflation factor of each predictor variable against one another, no substantial indication of multicollinearity was found. Therefore, these variables were treated as potential independent predictors.

Establishment of the Peng score

In this study, we defined the oropharyngeal stenosis score system consisting of NMM grading + tonsil size grading + modified Mallampati score, and named it as Peng score. We set four points for each part based on the similar coefficients in the regression model, with full score of 12 points. Scoring criteria are listed in Table 4.

Table 4 Scoring system for oropharyngeal plane stenosis consists

Validation of the Peng score

The sensitivity and specificity of the Peng score, along with tonsil size grading, modified Mallampati score, and NMM grading, were used to plot receiver operating characteristic (ROC) curves (shown in Fig. 2) of the three following groups: the presence of OSA (AHI ≥ 5/h), the presence of moderate/severe OSA (AHI ≥ 15/h), and the presence of severe OSA (AHI ≥ 30/h). Accuracy of each grading system was measured by the area under the ROC (AUROC) curve.

Fig. 2
figure 2

Illustrations of ROC curves for the scores (orange lines tonsil size grading, green lines modified Mallampati score, blue lines NMM grading, red lines Peng score). Results concern the ability of the scores to detect at least mild (A), moderate to severe (B), and severe (C) OSA

The value larger than 0.9 often represents extremely good discrimination. Table 5 lists the statistics observed from ROC curve for AHI ≥ 5, including AUROC, sensitivity, specificity, corresponding value for diagnosis, and standard error. The corresponding value of 7.5 for Peng score (sensitivity = 0.836, specificity = 0.692) may be useful to predict the presence of OSA (AHI ≥ 5/h; AUROC = 0.935, 95%CI = 0.900–0.970, P < 0.001). The same corresponding value for Peng score (sensitivity = 0.945, specificity = 0.778, shown in Table 6) may also be useful to predict the presence of moderate OSA (AHI ≥ 15/h; AUROC = 0.939, 95%CI = 0.904–0.973, P < 0.001). An increased cutoff value of 8.5 for Peng score (sensitivity = 0.865 specificity = 0.856) was obtained with ROC analysis to predict the present of severe OSA (AHI ≥ 30/h; AUROC = 0.939, 95%CI = 0.899–0.969, P < 0.001; shown in Table 7).

Table 5 Statistics observed from ROC curve for AHI ≥ 5(Fig. 2A)
Table 6 Statistics observed from ROC curve for AHI ≥ 15 (Fig. 2B)
Table 7 Statistics observed from ROC curve for AHI ≥ 30 (Fig. 2C)

Discussion

In this study, we derived an evaluation score, the Peng score, for assessing the degree of oropharyngeal airway obstruction. This score ranges between 3 and 12, and threshold values of ≥ 8 and ≥ 9 appear to be appropriate to identify patients at an increased risk of at least mild and severe OSA, respectively. From the data obtained in this study, the sensitivity and specificity of Peng score were 83.6% and 69.2% for detecting patient with mild OSA (AHI ≥ 5) and 86.5% and 85.6% for patient with severe OSA (AHI ≥ 30), respectively. Compared with previous OSA-related scores, the Peng score has the potential to offer a low cost and easily accessible strategy in comprehensive and consistent evaluation of oropharyngeal obstruction, and it has the potential to play an important role in future diagnostic and treatment decisions.

Our proposed score has three main advantages: First, the score is based on three parameters which include tonsil size, tongue position, and NMM, assessing the coronal diameters, sagittal diameters, and the collapse of the oropharyngeal airway, respectively.

Second, three variables of the score can be easily implemented during the examination in the outpatient setting and within a short time. The information regarding tonsil size and modified Mallampati score can be gathered by a clinician during routine physical exam. Nasofibroscopy, which is used during NMM, regularly available in an ENT clinic. All these assessments can be safely finished within a few minutes.

Compared with current scores, the proposed score exhibits greater predictive accuracy (AUROC = 0.935). The STOP-Bang Questionnaire [13, 14] and OSA50 [15], which consider snoring, tiredness, observed apnea, high blood pressure, body mass index (BMI), age, neck circumference (NC), and male sex, have the OSA prediction accuracy of 0.626 and 0.712, respectively [16]. The P-SAP [17], OSA-score [18], and DES-OSA [19], which include morphometry parameters like tongue position and tonsil size, have OSAS prediction accuracy range from 0.739 to 0.809 [16, 18]. Kljajić found strong positive association of AHI with modified Mallampati score, as well as positive correlation of AHI with tonsillar size in the multivariate forward stepwise regression analysis in children (range from 2 to 9 years old) [20]. A derived model based on the local clinical findings (modified Mallampati score, tonsil size, adenoid size, age, gender, and body mass index) significantly overlapped with the results of an overnight polysomnography in diagnosing OSA in children. The sensitivity of the tested model was 84%, and specificity was 74% [21]. There are two possible reasons for the high OSA predictive accuracy of the newly proposed score: (1) The score is specifically designed for the evaluation of oropharyngeal obstruction and prediction of risk for oropharyngeal-related OSA, while other scores are applicable for patients with different kinds of OSA. As mentioned before, obstructions in any level of upper airway can lead to OSA, which complicates the ability to sort out the cause and degree of obstruction. (2) NMM mimics the collapse of oropharyngeal airway under obstruction and the proposed scoring systsem is the only system that includes NMM which increases its predictive accuracy.

The current study has limitations: (1) The inter-observer variability of Muller maneuver and MS is a limitation. To minimize inter-observer variability as much as possible, we have employed a strategy of standardized training for our observers prior to data collection. (2) The newly proposed score cannot distinguish moderate OSA (AHI ≥ 15) from mild OSA (AHI ≥ 5) because a threshold value of > 7.5 was chosen for both groups (shown in Table 5 and 6). (3) This study was performed in a sleep center, where the subjects have higher likelihood of OSA than the subjects in the general population; i.e., the distribution of subjects in this study may not reflect the actual distribution in general population. These limitations may serve as sources of bias in our results.

In conclusion, this study describes a new scoring system to evaluate the obstruction of the oropharyngeal airway, based on morphologic characteristics and NMM. The findings suggest that thenew score may be effective in identifying patients at risk of oropharyngeal OSA. Future studies with larger sample sizes and patients with different degrees of OSA severity may be required to validate the model’s predictive ability.