Introduction

The extraction of premolars for orthodontic treatment and their effects on soft tissue are highly debated topics in orthodontics. Extractions are used to solve moderate-to-severe crowding, and to alleviate dental or dentoalveolar protrusion [1, 2], and they are recommended in patients with dental and skeletal sagittal problems, open-bite, and increasing overjet [3].

It has been shown that the two most important parameters influencing the extraction or non-extraction decision are the soft tissue profile and the amount of crowding [4]. In fact, especially in borderline patients, where there are no evident indications of whether extraction is necessary, an appropriate study of patients’ frontal and profile appearance is recommended.

A recent study by Jackson et al. observed a mild decreasing trend in extraction cases, settling at nearly 25% of all treated cases in a university clinic setting from 2000 to 2011 [3]. This change may be due to the evolution of modern orthodontic techniques [4], but also to patients’ reluctance to have healthy teeth extracted. Another reason may be the greater importance given to the facial profile in recent years. In fact, some authors concluded that extractions may straighten the patient’s profile, thus worsening its aesthetics [5, 6], although the literature also offers studies that contradict the association with negative changes in soft tissue appearance [7]. It has been shown that the lips deserve an accurate pre- and post-treatment evaluation because, after orthodontic extraction, the lip profile became more concave than in non-extraction patients; however, this difference was small and clinically irrelevant in most cases [8], and highly dependent on lip thickness [9].

Moreover, many studies in the literature analyse the effects of orthodontic extractions on profile aesthetics, but, to the knowledge of the authors, none has verified the correlation between the torque of the incisors and the aesthetics of the profile. Indeed, previous studies mainly focused on extractions per se, while the mechanics used and their effect on the facial profile were often neglected. This consideration should not be overlooked, because it is common knowledge that the retraction of the anterior teeth is followed by a loss of torque, which leads to a more upright position of the incisors [10]. Therefore, it can be postulated that it is not the extraction treatment itself that causes profile straightening, but rather the use of incorrect space closure mechanics, which results in insufficient torque of the incisors and could consequently be the cause of profile worsening. A closer look at these aspects could provide new and clinically meaningful data in the extraction/non-extraction debate.

Therefore, the aim of the present study was to evaluate the relationship between soft tissue profile aesthetics and incisor torque, as well as the effect of crowding, anchorage, and extraction pattern, in adult patients treated without extraction or with two or four extractions. The null hypothesis was that no interaction exists between the predictors and the response variables.

Materials and methods

This retrospective study and all its procedures were approved by the Ethical Committee of the University of L’Aquila (protocol no. 28954, ID 04/2021) and were conducted in accordance with the Declaration of Helsinki of 1975 and its subsequent revisions. Written informed consent was obtained from the subjects or their legal guardians.

Patients

The sample size was calculated for a multiple linear regression, with the null hypothesis that all coefficients in the model were equal to zero. Considering a medium effect size (f2) of 0.15 [11], a power of 80%, and a type I error of 5%, 77 subjects were needed for the whole sample (G*Power version 3.1.9.2, Franz Faul, Universität Kiel, Germany). Seventy-seven patients were thus selected after screening the records of patients treated at the Orthodontic Clinic, Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, in chronological order from January 2011 to December 2019, using the following inclusion criteria:

  • Patients with permanent dentition;

  • Class I or Class II molar relationship;

  • Orthodontic treatment with extraction of two or four premolars, or non-extraction orthodontic treatment with fixed appliances;

  • Lateral cephalograms taken before and after treatment;

  • Complete clinical documentation concerning the photographic documentation, the study casts and the type of orthodontic treatment received, indicating the type of mechanics used to close spaces and the type of anchorage used.

The exclusion criteria were as follows:

  • Patients in mixed dentition;

  • Extraction of teeth other than premolars;

  • Patients who have undergone orthognathic surgery;

  • Presence of congenital abnormalities regarding dentition or craniofacial growth;

  • Patients with cleft lip.

The study sample was divided into three groups: the first study group was composed of 24 patients treated with four extractions; the second study group was composed of 24 patients treated with two extractions in the upper arch; finally, a control group was composed of 29 subjects treated without extractions. For each patient, the pre-treatment clinical condition was evaluated, defining Angle’s molar class and the amount of crowding. The amount of crowding in mm was calculated for each arch from the dental casts as the total tooth-size arch-length discrepancy, using the method described by Nance. Then, the upper and lower arches of each patient were classified into three categories of severity based on the amount of crowding, according to the criteria proposed by Proffit [10]: mild crowding between 0 and 4 mm; moderate crowding between 5 and 9 mm; severe crowding when greater than 9 mm. In addition, the type of programmed anchorage (maximum posterior anchorage, minimum posterior anchorage, and medium anchorage) for both the upper and lower arches was extracted from the clinical history and recorded.
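The classification step can be illustrated with a short Python sketch. This is not the procedure used in the study: the function names are hypothetical, the sign convention (required minus available space, so that crowding is positive) is an assumption, and because the stated categories leave values between 4 and 5 mm unassigned, the sketch assumes that the discrepancy is rounded half-up to the nearest millimetre before it is classified.

```python
import math


def arch_length_discrepancy(space_available_mm: float, space_required_mm: float) -> float:
    """Tooth-size arch-length discrepancy in mm.

    Assumed sign convention: positive values indicate crowding
    (more space required by the teeth than available in the arch).
    """
    return space_required_mm - space_available_mm


def classify_crowding(crowding_mm: float) -> str:
    """Map a crowding value to the three severity categories.

    Assumption: the value is rounded half-up to the nearest millimetre,
    because the published thresholds (0-4, 5-9, >9 mm) do not say how
    fractional values such as 4.5 mm should be handled.
    """
    rounded = math.floor(crowding_mm + 0.5)  # half-up rounding
    if rounded < 0:
        # Spacing (negative discrepancy) is not covered by the categories.
        raise ValueError("negative discrepancy (spacing) is not classified")
    if rounded <= 4:
        return "mild"
    if rounded <= 9:
        return "moderate"
    return "severe"
```

For example, an arch with 70.0 mm available and 76.5 mm required gives a discrepancy of 6.5 mm, which this sketch labels as moderate crowding.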

Cephalometric analysis

Cephalometric analysis was carried out on digital lateral cephalograms, taken in natural head position, at two different timepoints, before treatment (T0) and post-treatment (T1).

Measurements were performed by one operator using cephalometric software (Orisceph CE, Elite Computer Italia srl, Vimodrone, Italy). Images were calibrated using reference points located at a known distance. To perform the measurements, a customised cephalometric analysis was created beforehand and added to the software’s database.

Skeletal changes were evaluated by assessing the angle formed by the intersection of the Frankfurt plane with the mandibular plane (FMA) and the ANB angle between the A point (the deepest point on the curvature of the maxillary alveolar process, or sub-spinal point of the maxilla), the nasion (N, the most anterior point of the frontonasal suture), and the B point (the deepest point on the curvature of the mandibular alveolar process, or supramental point of the mandible) (Fig. 1).

Fig. 1

Cephalometric template used in the study for the evaluation of skeletal and dental changes. N: nasion, the most anterior point of the frontonasal suture, FH: Frankfurt Horizontal plane, A point: the deepest point on the curvature of the maxillary alveolar process (or sub-spinal point of the maxilla), B point: the deepest point on the curvature of the mandibular alveolar process (or supramental point of the mandible), ANS: the anterior nasal spine, PNS: the posterior nasal spine, U1: long axis of the upper incisor, L1: long axis of the lower incisor, Me: menton point, Go: gonion point

Dental changes were assessed by measuring the interincisal angle (the angle resulting from the intersection of two planes: one passing through the long axis of the upper central incisor and one passing through the long axis of the lower central incisor), the torque of the upper incisor (the angle between the long axis of the upper incisor and the palatal plane passing through the anterior nasal spine and the posterior nasal spine), and the torque of the lower incisor (the angle between the long axis of the lower incisor and the mandibular plane passing through the menton and gonion points) (Fig. 1).
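Geometrically, the angular measurements defined above reduce to two operations on digitised landmarks: the angle at a vertex formed by three points (as for ANB) and the angle between two lines (as for FMA, the interincisal angle, and the incisor torque). The following Python sketch illustrates both. It is not the algorithm implemented in the cephalometric software; the function names are hypothetical and the coordinates in the example are invented. The angles returned are unsigned (0 to 180°), so the clinical sign of ANB and the choice between an angle and its supplement depend on how the landmarks and line directions are ordered.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def _angle_between_vectors(v1: Point, v2: Point) -> float:
    """Unsigned angle in degrees (0-180) between two 2D vectors."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def angle_at_vertex(p: Point, vertex: Point, q: Point) -> float:
    """Angle at `vertex` between the rays vertex->p and vertex->q (e.g. A-N-B)."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    return _angle_between_vectors(v1, v2)


def angle_between_lines(a1: Point, a2: Point, b1: Point, b2: Point) -> float:
    """Angle between the directed lines a1->a2 and b1->b2.

    Reversing the direction of one line returns the supplementary angle.
    """
    va = (a2[0] - a1[0], a2[1] - a1[1])
    vb = (b2[0] - b1[0], b2[1] - b1[1])
    return _angle_between_vectors(va, vb)


if __name__ == "__main__":
    # Invented landmark coordinates in mm, for illustration only.
    nasion, a_point, b_point = (80.0, 100.0), (78.0, 45.0), (74.0, 10.0)
    print("ANB (unsigned):", angle_at_vertex(a_point, nasion, b_point))
```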

To evaluate labial changes, a true vertical line (TVL) was first drawn as the line perpendicular to the Frankfurt plane passing through N. Then, the orthogonal distance between the most protruding point of the upper lip (LU) and the TVL and the orthogonal distance between the most protruding point of the lower lip (LL) and the TVL were measured (Fig. 2).
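Because the TVL is perpendicular to the Frankfurt plane, the orthogonal distance of a lip point from the TVL equals the projection of the vector from N to that point onto the direction of the Frankfurt plane. The Python sketch below illustrates this relationship under stated assumptions: the Frankfurt plane is taken as the line through porion and orbitale (its standard definition, although these landmarks are not named in the text), the function name is hypothetical, and a positive sign is assumed to mean that the point lies on the orbitale side of the TVL, that is, anterior to it.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def signed_distance_to_tvl(point: Point, nasion: Point,
                           porion: Point, orbitale: Point) -> float:
    """Signed orthogonal distance (same units as the input) from `point` to the TVL.

    The TVL passes through `nasion` and is perpendicular to the Frankfurt
    plane (porion -> orbitale), so the distance is the projection of
    (point - nasion) onto the unit vector of the Frankfurt plane.
    Assumed convention: positive = anterior (towards orbitale).
    """
    fx, fy = orbitale[0] - porion[0], orbitale[1] - porion[1]
    length = math.hypot(fx, fy)
    if length == 0:
        raise ValueError("porion and orbitale must be distinct points")
    ux, uy = fx / length, fy / length
    return (point[0] - nasion[0]) * ux + (point[1] - nasion[1]) * uy
```

With a horizontal Frankfurt plane, the result is simply the horizontal offset of the lip point from N; with a tilted plane, the projection accounts for the rotation.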

Fig. 2

Cephalometric template used in the study for the evaluation of soft tissue changes. TVL: line perpendicular to the Frankfurt plane, LU: the most protruding point of the upper lip, LL: the most protruding point of the lower lip, Sn: base of the nose, G: glabella point, located on the foremost point of the forehead, PgC: pogonion cutaneous point

Profile changes were evaluated by assessing the angle of profile convexity (facial convexity angle), which was obtained by connecting the glabella point (G), located on the foremost point of the forehead, the base of the nose (Sn), and the pogonion cutaneous point (PgC) (Fig. 2).

Error of the method

To evaluate the error of the method, the radiographs of 25 subjects, selected randomly using an online tool (www.randomizer.org), were re-traced by the same operator after 2 weeks. The random error was calculated using Dahlberg’s formula [12]. The intra-rater agreement was assessed using an intraclass correlation coefficient.
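Dahlberg's formula estimates the random error from n pairs of repeated measurements as the square root of the sum of the squared differences divided by 2n. The following Python sketch shows the calculation; the function name and the example values are hypothetical and are not taken from the study data.

```python
import math
from typing import Sequence


def dahlberg_error(first: Sequence[float], second: Sequence[float]) -> float:
    """Random error by Dahlberg's formula: sqrt(sum(d_i^2) / (2 * n)).

    `first` and `second` hold the same measurement traced twice on the
    same n subjects; d_i is the difference within each pair.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both tracings must contain the same non-zero number of values")
    squared_differences = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_differences / (2 * len(first)))


if __name__ == "__main__":
    # Invented repeated measurements (mm) for four subjects.
    tracing_1 = [10.0, 12.0, 14.0, 16.0]
    tracing_2 = [11.0, 12.0, 13.0, 16.0]
    print(dahlberg_error(tracing_1, tracing_2))
```

The result is expressed in the units of the measurement (mm for linear variables, degrees for angular ones).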

Statistical analysis

Descriptive statistics for all variables were calculated. A Shapiro-Wilk normality test was computed for all the variables to evaluate the type of data distribution. To verify that the three groups showed comparable characteristics before treatment (T0), a one-way ANOVA was performed on the measured variables, testing data homoscedasticity with a Levene test and using Tukey’s or a Games-Howell post hoc test to perform pairwise comparisons. Similarly, to verify that the distribution of age, gender, the type of crowding (mild, moderate, and severe) and the anchorage used (minimum, medium, and maximum) were comparable between subjects belonging to the three groups, a Fisher’s exact test was performed.

To assess the presence of a statistically significant change between cephalometric variables before and after treatment, a paired t-test or a Wilcoxon signed-rank test, depending on the type of data distribution, was performed for each pair of variables at T1 and T0. To evaluate the differences in the T1–T0 changes among the three groups, a one-way ANOVA was computed.
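The between-group comparison of the T1–T0 changes can be sketched as follows in Python. The example computes only the F statistic of a one-way ANOVA from first principles, using the standard library; a statistics package would normally be used to obtain the p-value and the post hoc tests. The function name and the example values are hypothetical and do not reproduce the study data.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    all_values: List[float] = [x for g in groups for x in g]
    n = len(all_values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


if __name__ == "__main__":
    # Invented T1-T0 changes (degrees) for three groups of three patients.
    changes = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    print(one_way_anova_f(changes))
```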

Finally, to evaluate the effect of the extraction pattern, the amount of initial crowding, the final position of the upper and lower incisors, and the type of anchorage used on the T1 values of the variables related to the soft tissues (LU-TVL, LL-TVL, facial convexity angle), linear regressions were performed, in addition to Durbin-Watson statistics and residual analysis through q-q plots.
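To enter a regression model, categorical predictors such as the extraction pattern, the crowding category, and the anchorage type are usually converted into indicator (dummy) variables, one for each level other than a reference level. The Python sketch below illustrates this coding together with the Durbin-Watson statistic computed on the residuals. It is an assumption-laden illustration: the reference levels (no extraction, mild crowding, minimum anchorage), the function names, and the column order are all hypothetical, since the coding actually used in the study is not reported here.

```python
from typing import List, Sequence

# Assumed level orderings; the first entry of each tuple is the reference level.
EXTRACTION = ("none", "two", "four")
CROWDING = ("mild", "moderate", "severe")
ANCHORAGE = ("minimum", "medium", "maximum")


def dummy_code(value: str, levels: Sequence[str]) -> List[float]:
    """One 0/1 indicator per non-reference level (levels[0] is the reference)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def design_row(torque_deg: float, extraction: str, crowding: str, anchorage: str) -> List[float]:
    """Row of the design matrix: intercept, incisor torque, then the dummy variables."""
    return ([1.0, torque_deg]
            + dummy_code(extraction, EXTRACTION)
            + dummy_code(crowding, CROWDING)
            + dummy_code(anchorage, ANCHORAGE))


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares."""
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return numerator / denominator
```

With this coding, each regression coefficient of a dummy variable is read as the difference from the reference level, so the choice of reference determines which anchorage types appear as separate predictors. Durbin-Watson values close to 2 indicate little first-order autocorrelation in the residuals.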

For all statistical tests, a type I error of 0.05 was set.

Results

Descriptive statistics are reported in Table 1.

Table 1 Descriptive statistics

The random error measured with the Dahlberg formula for linear measurement ranged between 0.23 and 0.48 mm, while the random error for angular measurement ranged between 0.32 and 0.39°. The intra-rater agreement, assessed through an intraclass correlation coefficient, was above 0.9 for all variables.

The gender distribution is reported in Fig. 3; no statistically significant difference was observed between the three groups (Pearson χ2 = 3.64, p = 0.163). The distribution of mild, moderate, and severe crowding for each group and for both the upper and lower arches was calculated (Fig. 5). A significantly different distribution was observed in the control group compared to the two-extraction and the four-extraction groups for both arches (Fisher exact for crowding in the upper arch = 31.3, p < 0.001; Fisher exact for crowding in the lower arch = 22.2, p < 0.001). Similarly, the distribution of minimum, medium, and maximum posterior anchorage for each group was determined only for the upper arch, since no extractions in the lower arch were performed in the two-extraction group, and no statistically significant differences were found (Fisher exact = 3.52, p = 0.193) (Fig. 6).

Fig. 3

Distribution of sex in each group

Fig. 4

Distribution of pre-treatment molar relationship

Fig. 5

A Distribution of the amount of upper arch crowding in each group. B Distribution of the amount of lower arch crowding in each group

Fig. 6

A Distribution of upper arch anchorage type in each group. B Distribution of lower arch anchorage type in each group

Table 2 reports the results of the one-way ANOVA test comparing all the cephalometric variables between the three groups at T0 and T1. No differences were observed for the torque of the upper incisors at T0; however, when considering the results of the post hoc tests (Table 3), the control group was significantly different from the other two groups regarding all other variables, but no differences were present between the two-extraction and the four-extraction groups at T0. At T1, all the cephalometric variables were different between the control group and the two-extraction group and between the control group and the four-extraction group; on the other hand, the only variables that differed at T1 between the two-extraction and the four-extraction groups were the interincisal angle, the L1-GoMe angle, and the facial convexity angle (Table 3).

Table 2 One-way ANOVA test to compare the different variables between the three groups at T0 and at T1
Table 3 Post-hoc pairwise comparisons between all the three groups for the cephalometric variables measured at T0 and at T1

The results of the one-way ANOVA test comparing the T1–T0 variation of all the cephalometric variables between the three groups are reported in Table 4: a statistically significant difference was observed for FMA, the interincisal angle, the U1-ANSPNS angle, and the L1-GoMe angle. The results of the post hoc tests for this comparison are reported in Supplementary file 1.

Table 4 One-way ANOVA test to compare the T1–T0 variation of all the cephalometric variables between the three groups

The changes in the cephalometric variables between T0 and T1 within each group were evaluated using a paired t-test or a Wilcoxon signed-rank test, whose results are reported in Table 5. A significant reduction of nearly 2° in the FMA angle was observed in the four-extraction group. The interincisal angle was significantly reduced in the two-extraction group, while it was increased in the four-extraction group. The lower incisors were significantly proclined, by approximately 5°, in the two-extraction group.

Table 5 Paired samples t-test and Wilcoxon signed-rank test for comparing differences between T1 and T0 in all three groups

To evaluate the influence of the upper and lower incisor positions, the number of extractions, the degree of crowding, and the type of anchorage on the profile, linear regressions were performed (Tables 6, 7, 8, and 9). These tests showed that the changes in LU-TVL and LL-TVL were mostly influenced by the type of anchorage used, in particular by the medium and minimum anchorage (Tables 6 and 7). Similarly, the facial convexity angle was mostly influenced by the type of anchorage, especially when considering the predictors of the upper arch (Table 8), although there was also a statistically significant impact of minimum and maximum anchorage in the lower arch (Table 9). On the other hand, when the model for the facial convexity angle (which, of course, relates to the entire facial profile) included the predictors related to the lower arch, a significant effect of the two-extraction pattern, along with the type of anchorage, was observed (Table 9).

Table 6 Linear regressions of the effect of upper incisor torque, extraction pattern, amount of crowding, and type of anchorage (upper arch) on the position of the upper lip (LU-TVL) at T1
Table 7 Linear regressions of the effect of lower incisor torque, extraction pattern, amount of crowding, and type of anchorage on the position of the lower lip (LL-TVL)
Table 8 Linear regressions of the effect of upper incisor torque, extraction pattern, amount of crowding, and type of anchorage (upper arch) on the position of the facial convexity angle (GSnPgC)
Table 9 Linear regressions of the effect of lower incisor torque, extraction pattern, amount of crowding, and type of anchorage on the position of the facial convexity angle (GSnPgC)

Discussion

The results of the present research, which showed no significant aesthetic differences between the groups treated without extraction or with two or four extractions, are supported by some of the existing literature. A study that evaluated long-term changes in profiles after orthodontic extractions showed that, two years after treatment, the extraction and non-extraction groups were both rated positively by observers, concluding that there were no differences between groups and that all groups were perceived more favourably when compared with the initial observation [13].

Since sagittal and vertical skeletal changes may follow extraction treatment [14], in the present study, the ANB and FMA variables were evaluated at T1 and T0 to identify soft tissue changes that could be related to such skeletal changes. According to recent studies, the influence of premolar extractions on facial profile and vertical dimension is generally overestimated and should not be the main reason for non-extraction treatment [8]. However, in the present study, FMA showed significant changes in the four-extraction group, suggesting that the vertical dimension should be considered during treatment planning. Indeed, FMA showed a T1–T0 reduction of nearly 2° (Table 5) in the four-extraction group, and this effect was statistically significant when compared to the other two groups. This result is in contrast with part of the recent literature, according to which extractions are not necessary to increase the anterior overbite [15]. Despite the FMA changes, the ANB angle did not differ between the three groups: it changed, but not to a statistically significant extent, suggesting that extractions had no effect on the sagittal skeletal variables, as supported by different authors [16, 17].

Despite the presence of T1–T0 changes in FMA, the interincisal angle, and the upper and lower incisor torque (Table 4), no significant differences in soft tissue aesthetics were observed between the three groups. Indeed, the mean value of the incisors’ torque at T1 was in a range of 110° ± 3° for the upper arch, suggesting good control of their position at the end of treatment. Conversely, the position of the lower incisors showed a much larger variability (Table 1) that could be attributed to the different management of crowding in the three groups. The results of this study support the hypothesis that if the incisor torque is well controlled, the profile does not deteriorate. In fact, it is known that the bucco-lingual inclination of the maxillary incisors has a major effect on profile smile attractiveness [17]. Kusnoto et al. demonstrated the presence of a strong correlation between mandibular anterior teeth location and changes in the position of the upper and lower lips [18]. Thus, the use of correct orthodontic mechanics, able to maintain a proper incisor torque, is of great importance [19].

The study results, supported by the aforementioned research, showed that the soft tissue changes in the extraction and non-extraction patients were comparable at the end of treatment. Previous studies related incisor retraction to lip position, finding that the soft tissue response differed depending on the amount of retraction of the anterior teeth [20, 21]. In extraction cases, the available extraction space was almost completely used by the retraction of the anterior segment, regardless of the number of extractions [22]. In fact, some authors analysed the amount of movement necessary for the closure of extraction spaces, concluding that the incisors underwent greater movement than the first molars in premolar extraction groups [23]. This significant retraction could cause the loss of incisor torque if not properly controlled, leading to a worsening of profile aesthetics.

The results of the present investigation showed that the incisor torque was well controlled throughout the treatment in all groups (the greatest T1–T0 changes being −2.6° for the U1-ANSPNS angle in the four-extraction group and 4.8° for the L1-GoMe angle in the two-extraction group), and the soft tissue changes in the three groups were comparable at the end of treatment, albeit with a large standard deviation.

Although no significant differences were observed between the groups in terms of profile aesthetics, when studying the relationship between soft tissues, incisor torque, crowding, and anchorage through linear regressions, it was observed that the incisor torque did not have an impact on the soft tissue profile, while the type of anchorage showed a significant interaction, and that the two-extraction pattern had a significant interaction with the facial convexity angle.

These analyses confirmed that a well-controlled incisor torque did not modify the profile aesthetics.

Moreover, the inclusion of crowding and anchorage in multiple linear regressions was a novelty compared to previous studies. Some authors stated that the prediction of soft tissue changes could be achieved by using multivariable regression analysis [24], but they analysed only some soft tissue variables and related them to the extraction pattern without considering anchorage and crowding, which are two of the most important variables that could modify the treatment outcome [24, 25].

Regarding the type of anchorage used, there were no differences in the upper anchorage between the two-extraction and four-extraction groups, while the lower anchorage was classified only for the four-extraction group. Looking at the LU-TVL and U1-ANSPNS values stratified by anchorage type (Fig. 7), it can be observed that the T1–T0 change of U1-ANSPNS decreased from minimum to maximum posterior anchorage in both the two-extraction and four-extraction groups, but that the T1–T0 change in LU-TVL was not proportional to the incisor torque. On the other hand, a more predictable behaviour could be observed in the lower arch (Fig. 8).

Moreover, in the present study, the position of the upper and lower lips was assessed by measuring their distance from the TVL. Another common cephalometric method for analysing lip position is the distance of the upper and lower lips from Ricketts’ aesthetic line, a cephalometric line connecting the most protruding point of the chin and the tip of the nose [26]. The TVL was preferred because it is well known that the human nasal cartilage changes over the years, thereby introducing potential variability in the reproducibility of the tip of the nose landmark [27, 28].

The linear regressions (Tables 6, 7, 8, and 9) showed that the type of anchorage significantly influenced the facial convexity angle and the position of the upper and lower lips; in particular, while the facial convexity angle was influenced by maximum, medium, and minimum anchorage, the position of the upper and lower lips was influenced only by medium and maximum anchorage. Since the overall measurements of the three groups were used in the linear regressions, the different role of the medium anchorage could be attributed to the distribution of this kind of anchorage across the three groups, and the general effect of anchorage type should be interpreted instead.

Fig. 7

A Relationship between upper arch anchorage and mean of LU-TVL post treatment. B Relationship between upper arch anchorage and difference pre and post treatment of LU-TVL. C Relationship between upper arch anchorage and mean of U1-ANSPNS post treatment. D Relationship between upper arch anchorage and difference pre and post treatment of U1-ANSPNS

Fig. 8

A Relationship between lower arch anchorage and mean of LL-TVL post treatment. B Relationship between lower arch anchorage and difference pre and post treatment of LL-TVL. C Relationship between lower arch anchorage and mean of L1-GoMe post treatment. D Relationship between lower arch anchorage and difference pre and post treatment of L1-GoMe

Furthermore, since extractions have always been proposed as a method of treatment for dental crowding [29], this study also evaluated the influence of the initial amount of crowding on the final aesthetic profile using linear regression, observing non-significant interactions. As shown in Fig. 5, the amount of upper and lower crowding was more severe in four-extraction patients, while in two-extraction patients there was a variable pattern. Interestingly, while the control group showed a significantly lower amount of upper and lower crowding, there were no significant differences between the four-extraction and two-extraction groups. In addition, all studied variables were comparable at T0 between the two-extraction and the four-extraction groups, suggesting that the choice of extraction pattern is a complex decision that involves many parameters, not least the clinician’s sensibility and experience.

It should be noted that all the regression models were able to explain only a relatively small amount of the variation of the dependent variables, around 40%, suggesting that there are many other parameters to consider that were not investigated in the present study, such as the phenotype of the lips and the orthodontists’ preferences. In fact, it has been observed that the correlation between the retrusion of the incisors and the flattening of the lip profile is more significant in patients with thin lips, while it is much less evident in patients with thick lips [9]. Additionally, recent research has determined that orthodontists prefer a more vestibular position of the incisors rather than a lingual inclination [17], in contrast to the opinion of laypeople. Therefore, these features are likely determinants of orthodontic treatment results and should be considered in the final treatment evaluation.

Finally, although care was taken to reduce selection bias by using a rigid chronological criterion, the present investigation was limited by its retrospective design and by the presence of subtle differences between the extraction groups and the control group. However, such differences arose from the impossibility of randomising treatment choice. The need for extractions was determined by several skeletal, dental, and aesthetic considerations; therefore, it was expected that the extraction and non-extraction patients would differ from each other. It would obviously be ethically questionable to deliberately treat a patient without extraction when an extraction is required, and vice versa. Debatable ethical issues would also arise if the clinician treated without extraction a patient who needs an extraction but refuses it. These considerations were confirmed in a historical study by Odenrick et al., who compared the morphology of patients treated with and without extraction, showing differences in dental crowding, the lengths of the maxilla and mandible, and the width of the teeth, which was greater in extraction patients [29].

Conclusions

Similar soft tissue aesthetics were observed after treatment in the three groups. A statistically significant reduction in the FMA angle was observed in the four-extraction group, while a statistically significant modification of the lower incisor torque was observed in the two-extraction group. Considering the importance of soft tissue in orthodontic diagnosis, multiple linear regression revealed that, even though there were no statistically significant differences between groups regarding the soft tissue profile, the upper and lower lip protrusion and the facial convexity angle were affected by the anchorage pattern.