Background

Kyphosis, the convex curvature of the thoracic spine, is considered ‘normal’ between 20 and 40° [1]. Where this exceeds 40°, the curvature is described as hyperkyphosis. This is associated with a higher risk of falling, developing pulmonary dysfunctions, and poor quality of life [2,3,4]. Hyperkyphosis is also associated with a higher risk of all-cause mortality [2,3,4]. A prospective longitudinal study, which followed 610 women for over 13 years, found that people with a greater thoracic kyphosis who had previously sustained a vertebral fracture had a 1.5 times higher risk of death than those with a smaller kyphotic curvature [5]. Consequently, it has been suggested that thoracic kyphosis is an important parameter to monitor, especially in the elderly population, to detect frailer people who may be at higher risk of unfavourable health outcomes [5].

The prevalence of hyperkyphosis increases with age; 20–40% of people older than 60 years of age and 55% of those older than 70 years have a kyphosis exceeding 40° [2,3,4]. Consequently, hyperkyphosis has been associated with ageing [4]. However, the relationship between kyphosis and age has not been systematically investigated. Individual studies show conflicting results [6, 7], and evidence supporting this association is derived from narrative reviews [2,3,4], rather than methodologically rigorous systematic reviews [8].

Despite evidence suggesting that people’s kyphosis often exceeds 40° [2,3,4], this value is widely used in clinical practice as the cut-off for normality [3, 4]. Consequently, clinicians may find that many of their patients present with hyperkyphosis. Several authors have highlighted the need for a more accurate threshold for diagnosing hyperkyphosis [2,3,4], and a recent narrative review proposed moving the cut-off of normality to 50° [2]. The Scoliosis Research Society suggests using a range of 20–60° instead [9]. However, since people of different age groups have different degrees of kyphosis [2, 3], moving the cut-off of normality to a higher value, or expanding its range, may not reduce the risk of misdiagnosis. For these reasons, and because thoracic curvature is important when restoring patients’ sagittal alignment during spinal corrective surgery to avoid post-operative complications such as proximal junctional kyphosis [10], specific age-related reference values of kyphosis may be useful.

Objective

This systematic review aims to investigate the sagittal curvature of the thoracic spine of adults with no health conditions which may affect their thoracic kyphosis and do the following:

  1. Explore the relationship between kyphosis and age
  2. Provide reference values of kyphosis for different age groups
  3. Examine data for differences between genders or ethnic groups

Methods

Protocol and registration

The review’s protocol followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for protocols (PRISMA-P) [11] and was registered on PROSPERO (CRD42020175058). The methods were informed by the Cochrane Handbook [12]. The manuscript adhered to the PRISMA [13] and the Synthesis Without Meta-analysis (SWiM) guidelines [14] for reporting.

Eligibility criteria

The research question was informed by the Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER) tool [15], whose details are in Table 1.

Table 1 Eligibility criteria

Information sources

Two reviewers (MZ/SL) independently searched for eligible articles on MEDLINE, EMBASE and PsycINFO through Ovid, and on AMED, The Index of Chiropractic Literature and CINAHL through EBSCO, from inception to April 2020. The Spine Journal, the reference lists of the studies included in the review, and grey literature on SIGLE, through Open Grey, were also searched. The search was limited to studies published in English.

Search

Keyword selection was informed by a scoping review and researcher expertise (NRH). The search strategy was individualised for each database, combining keywords, Medical Subject Headings, and Boolean operators, following consultation with a librarian. Keywords selected were middle back, dorsal spine, middle spine, mid-back, thoracic spine, kyphosis, hyperkyphosis, Dowager’s hump, hunchback, rounded back, and sagittal curvature (see Additional file 1 for search strategy examples).

Study selection

The screening process was conducted independently by MZ and SL, then agreement was sought. In case of disagreement, a third reviewer (NRH) acted as a moderator. The studies were screened from their title and abstract first, then from their full text [8].

Data collection

The data collection process was informed by the Cochrane Handbook [16]. The data extraction form was piloted with data extraction performed independently by MZ and SL and then cross-checked. If further information was necessary to reach a consensus among the research team, the authors were contacted by MZ.

Data items

Data extraction was informed by the recommendations for reviews in clinical anatomy [8]. This included study title, author’s name, publication year, method for measuring kyphosis, degrees of kyphosis and range, sample size, age, age range, gender, body mass index, the standard deviation (SD) of the measures and ethnicity, defined as a group of people sharing cultural, geographical and social attributes.

Risk of bias in individual studies

The studies’ quality assessment was performed independently by MZ and SL; NRH acted as a moderator in case of disagreement. The Anatomical Quality Assessment (AQUA) tool, devised for assessing the quality of anatomical studies [17], was used. As suggested by Chhapola et al. [18], a supplementary table to improve the tool’s performance was created (see Additional file 2). The AQUA tool is composed of five domains (i.e. objective(s) and subject characteristics, study design, methodology characterisation, descriptive anatomy, reporting of results); each has a specific set of questions whose answers can be yes, no or unclear, to enable readers to evaluate the study’s quality. Currently, guidance exists only on how to evaluate each individual domain of the AQUA tool. To be considered at low risk of bias in a single domain, a study must receive yes answers to all the questions of that domain; otherwise, the study is considered at high risk [17]. Each domain was evaluated following this procedure. However, since no guidance exists on how to classify the overall quality of an evaluated study, the research team agreed that a study would be considered high-quality overall only if it was at low risk of bias in all five domains. Studies at low risk in three or four domains were considered moderate-quality, and all others low-quality. The tool was piloted before study commencement by MZ and SL on five articles, and interrater agreement was computed according to McHugh [19]. Perfect agreement was achieved (κ = 1).
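The overall classification rule and the agreement statistic described above can be sketched as follows. This is a minimal illustration, not the review team's actual workflow: the function names and the pilot labels are hypothetical, and returning 1.0 when both raters use a single identical label (where kappa is strictly undefined) is an assumption made for convenience.

```python
from collections import Counter


def overall_quality(domain_low_risk):
    """Classify a study from five booleans (True = low risk of bias in a domain)."""
    if len(domain_low_risk) != 5:
        raise ValueError("the AQUA tool has five domains")
    low = sum(bool(d) for d in domain_low_risk)
    if low == 5:
        return "high"
    if low >= 3:
        return "moderate"
    return "low"


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # assumption: a single shared label counts as perfect agreement
    return (observed - expected) / (1 - expected)


# Hypothetical pilot on five articles where both raters gave identical ratings.
pilot_a = ["high", "moderate", "moderate", "high", "moderate"]
pilot_b = ["high", "moderate", "moderate", "high", "moderate"]
kappa = cohen_kappa(pilot_a, pilot_b)
```

With identical ratings the observed agreement is 1, so kappa equals 1 regardless of the label frequencies, which matches the perfect agreement reported for the pilot.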

Summary measures

Data were analysed with Microsoft Excel (Microsoft Office 365). Since kyphosis varies depending on the body references used to calculate it [6, 20], analysis was performed by comparing measurements taken from the same body references.

The mean kyphosis and age were used for correlation analysis. Either Pearson’s or Spearman’s correlation coefficient was computed, depending on whether the data were normally distributed. Data distribution was investigated with the Kolmogorov-Smirnov test, and correlation was interpreted as recommended [21].
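The choice between the two coefficients can be illustrated with a short sketch. The review used Microsoft Excel, so this Python version is only an equivalent illustration; the function names and the sample values are hypothetical, and the outcome of the normality test is passed in as a flag rather than computed here.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as Pearson's r on the ranked data."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation(x, y, normally_distributed):
    """Pick the coefficient according to the result of the normality test."""
    return pearson(x, y) if normally_distributed else spearman(x, y)


# Hypothetical study-level means: age in years and kyphosis in degrees.
mean_age = [25, 35, 45, 55, 65]
mean_kyphosis = [30, 33, 31, 40, 44]
rho = correlation(mean_age, mean_kyphosis, normally_distributed=False)
```

Because Spearman's coefficient depends only on ranks, it is less sensitive to a few extreme study means than Pearson's, which is why it is the usual fallback when normality cannot be assumed.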

The means and their precision estimates were used to calculate the reference/normative values, or ranges, of kyphosis for each age group. Since SDs represent the dispersion of the values around their means, whereas confidence intervals are used to assess a treatment’s efficacy [22], SDs were deemed more appropriate for establishing ranges. The mean kyphosis was used for group comparisons. Previous evidence regarding the relationship between kyphosis and age [2, 4, 6] was used to create the groups for analysis. These were people younger than 40 years old (x < 40), people between 40 and 60 (40 < x < 60), people older than 60 (x > 60), people younger than 50 (x < 50), and those older than 50 years old (x > 50). Inferential statistics were performed using the independent two-tailed t-test for two-group comparisons (x < 50, x > 50), or one-way ANOVA for multiple-group comparisons (x < 40, 40 < x < 60, x > 60). Gender and ethnic group differences were investigated by comparing each individual age group using the independent two-tailed t-test. Levene’s test was used to assess the equality of variances between groups. The selected alpha level was 0.05, and the Bonferroni correction was applied for post hoc analysis, after ANOVA, to reduce the chances of type I error [23, 24].
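A reference range built from means and SDs, together with the Bonferroni-adjusted alpha, might be computed along the following lines. This is a sketch under stated assumptions: the text does not say how study-level means and SDs were combined or how many SDs define a range, so the sample-size-weighted pooling formula, the multiplier `k` and all input values are hypothetical choices made for illustration.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) summaries into one overall n, mean and SD.

    Uses the standard formula for merging subgroup summary statistics:
    within-group and between-group sums of squares are added together.
    """
    total_n = sum(n for n, _, _ in groups)
    pooled_mean = sum(n * m for n, m, _ in groups) / total_n
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    between = sum(n * (m - pooled_mean) ** 2 for n, m, _ in groups)
    pooled_sd = math.sqrt((within + between) / (total_n - 1))
    return total_n, pooled_mean, pooled_sd


def reference_range(mean, sd, k=1.0):
    """Range of mean +/- k SDs; k is an assumed multiplier, not from the source."""
    return mean - k * sd, mean + k * sd


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / comparisons


# Hypothetical summaries for one age group from two studies: (n, mean, SD).
n, mean, sd = combine_groups([(10, 30.0, 5.0), (10, 40.0, 5.0)])
low, high = reference_range(mean, sd)
# Three age groups give three pairwise post hoc comparisons.
adjusted_alpha = bonferroni_alpha(0.05, 3)
```

Including the between-study term matters here: two studies with identical SDs but different means produce a pooled SD larger than either, so the resulting range reflects disagreement between studies as well as spread within them.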

Synthesis of results and risk of bias across studies

Since important clinical and methodological heterogeneity was observed during the scoping review, meta-analysis was not performed [25]. Data were synthesised narratively, and descriptive statistics presented [26]. The overall level of evidence was evaluated using a modified Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system [27]. Although the evidence was limited to observational studies, the overall quality was upgraded from low to moderate if the results were consistent (> 80% concordant results) [28], precise, and obtained predominantly from high-quality studies. For correlation analysis, consistency was assessed by evaluating the direction of the correlation (positive or negative). For the reference values and for gender and ethnic group comparisons, statistical significance between groups’ means was used. To be considered precise, a correlation had to be statistically significant, whereas for the normative values and for gender and ethnic group comparisons, the ranges of the groups with statistically significantly different means had to not overlap. Furthermore, their difference had to be greater than the standard error of measurement for the modality employed to calculate kyphosis. These values were 2.4° for the kyphometer [29], 0.4 cm for the flexicurve [30], and 3° for Cobb’s method [7]. If the results were inconsistent, imprecise and coming primarily from low-quality studies, the results’ quality was downgraded to very low.
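The grading rules above can be expressed as a small decision procedure. The sketch below is illustrative only: the function names are hypothetical, and because the text does not define "predominantly" or "primarily", study quality is passed in as boolean flags rather than derived from counts.

```python
# Standard error of measurement per modality, as quoted in the text
# (degrees for the kyphometer and Cobb's method, centimetres for the flexicurve).
SEM = {"kyphometer": 2.4, "flexicurve": 0.4, "cobb": 3.0}


def is_consistent(concordant, total):
    """Results are consistent when more than 80% of them are concordant."""
    return concordant / total > 0.80


def comparison_is_precise(range_a, range_b, mean_a, mean_b, method):
    """Ranges must not overlap and the mean difference must exceed the SEM."""
    overlap = range_a[0] <= range_b[1] and range_b[0] <= range_a[1]
    return (not overlap) and abs(mean_a - mean_b) > SEM[method]


def grade(consistent, precise, mostly_high_quality, mostly_low_quality):
    """Observational evidence starts at 'low' and moves at most one level."""
    if consistent and precise and mostly_high_quality:
        return "moderate"
    if not consistent and not precise and mostly_low_quality:
        return "very low"
    return "low"


# Hypothetical comparison of two age groups measured with Cobb's method.
precise = comparison_is_precise((20.0, 30.0), (35.0, 50.0), 25.0, 42.0, "cobb")
level = grade(is_consistent(9, 10), precise, True, False)
```

Under this reading, a finding that is consistent and statistically significant but whose ranges overlap stays at low, which is one way the normative values could end up graded lower than the correlation.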

Results

Study selection

A total of 12,366 studies were retrieved, and 68 selected for full-text screening. Thirty-eight studies were excluded after the full-text screening, and four added following reference review, resulting in a total of 34 studies included in the review [6, 7, 20, 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61] (Fig. 1).

Fig. 1 PRISMA flowchart

Study characteristics and individual studies results

Details about the included studies are in Table 2. The 7633 participants were aged 18–95 years. Kyphosis was measured between C7–T12 (n = 220), T1–T12 (n = 2154), T2–T12 (n = 212), T3–T12 (n = 101), T4–T12 (n = 1617) and T5–T12 (n = 4018). Most studies used the Cobb angle; only two studies (n = 293) used a flexicurve [47, 52].

Table 2 Study characteristics and individual studies results

Risk of bias within studies

Ten of the studies were high-quality [20, 31, 32, 36, 44,45,46,47, 57, 61], 24 were moderate-quality [6, 7, 33,34,35, 37,38,39,40,41,42,43, 48,49,50,51,52,53,54,55,56, 58,59,60] and none low-quality (see Table 3 for details). The most frequent limitation regarded studies’ methodology, with 12 studies [33, 34, 37,38,39, 42, 43, 48, 52, 53, 59] not reporting the accuracy of their measures. This limitation equally affected all measurement types.

Table 3 Risk of bias within studies

Relationship between kyphosis and age

Only studies measuring kyphosis using Cobb’s method were included in the analysis, because their greater sample size provides greater statistical power [23] and because the studies using a flexicurve included only women, limiting their generalisability. No analysis was performed for C7–T12 and T3–T12 because the data for each came from a single study.

A positive correlation between kyphosis and age was found (see Table 4). The strength of the correlation was moderate for T5–T12 (Spearman 0.52) and low for T4–T12 (Spearman 0.45). The sample size for T5–T12 was more than double that for T4–T12 [25], giving more confidence in the findings for T5–T12.

Table 4 Correlation analysis, normative values and between-group difference

Normative values

Table 4 provides details of the mean kyphosis and normative values of kyphosis for different age groups, as well as between-group mean difference in kyphosis and the sample sizes. The same studies utilised to investigate the relationship between kyphosis and age were also used for calculating the reference values. Only 12 studies divided their sample by age groups [6, 7, 20, 31, 32, 35, 41, 43, 48, 50, 58, 59]. The ranges surpassed 40° in people < 60 years old 58.3% of the time and 75% in those older, questioning the accuracy of the current cut-off for normality.

Gender and ethnic group differences

Fourteen studies specified sample ethnicity [20, 32, 34,35,36,37,38, 41,42,43,44,45,46, 59]; consequently, geographical provenance was the main determinant for ethnic group subdivision. Two studies were excluded from the sub-analysis between ethnicities. One study [60] did not divide its sample by age groups and did not report the SD of the mean, whereas in the other study [48], the sample size was too small to exclude the chance of committing a type II error. Fifteen of the included studies presented their results according to gender [6, 31, 32, 34, 36, 37, 40,41,42,43, 45, 48, 53, 58, 59], and only eight of those divided their sample by age [6, 31, 32, 41, 43, 48, 58, 59]. The results are reported in Table 4. No differences between genders were observed, but North Americans and Europeans showed a greater thoracic curvature than Asians (Fig. 2).

Fig. 2 Ethnic group comparison. Data presented as mean ± standard deviation. *Statistical significance for p < 0.05 (t-test). x < 40, people younger than 40 years old; 40 < x < 60, people between 40 and 60 years old; x > 60, people older than 60 years old; x < 50, people younger than 50 years old; x > 50, people older than 50 years old

Synthesis of results

There is moderate-quality evidence that a moderate positive correlation between age and kyphosis exists and that kyphosis does not differ between genders. The quality of the evidence for the normative values presented, and for the differences in kyphosis observed between ethnicities is low (Table 5).

Table 5 Synthesis of results

Discussion

This is the first review exploring the relationship between kyphosis and age, in addition to providing normative kyphosis values for different ages, ethnic groups and genders. Findings evidence a positive correlation between kyphosis and age, as well as the influence of ethnicity on kyphosis. Gender, instead, does not appear to influence thoracic sagittal curvature.

Relationship between kyphosis and age

Muscle strength, vertebral body shape and intervertebral disc morphology can affect kyphosis angle [3]. However, vertebral body shape and intervertebral disc morphology account for 86–93% of thoracic spine curvature [62]. Disc morphology has a stronger negative correlation with ageing than vertebral morphology [62, 63]. Therefore, the increase in thoracic kyphosis observed with ageing may be related to the changes occurring in intervertebral discs. Most of these changes occur in the middle section of the thoracic spine [64], which may explain why statistical significance was reached only when kyphosis was measured from T4/5. For these reasons, and because of the technical difficulties in visualising the vertebrae above T4 on lateral radiographs [2], measuring kyphosis from T5 may provide more accurate measurements.

Normative values

The normative values surpassed 40° in 65% of the analyses. This finding challenges the accuracy of the current threshold used for defining normality (i.e. 40°). This cut-off was first introduced by Roaf in 1960 [1], but without supporting evidence. Despite subsequent studies showing that healthy children, adolescents and adults could have thoracic curvatures exceeding 40° [6, 65], this value is still used in practice [3, 4]. Some authors suggested moving this cut-off to 50° [2]. However, even this suggestion may not decrease the chances of misclassifying patients, since 35% of the ranges presented in this review surpassed 50°. Using a range of 20–60° [9] may seem more appropriate, since the ranges provided never exceeded 60°. Nonetheless, people in the x < 40 group appeared to have a significantly smaller kyphosis than those in the x > 60 group. Consequently, using the same reference values for both groups may still lead to misclassification. When kyphosis was measured between T4/5 and T12, its value also differed significantly between the x < 50 and x > 50 groups. This may indicate a higher measurement precision when those body references were used. Thoracic kyphosis varied depending on the body references selected to calculate it, with a trend showing that including higher vertebrae leads to greater values. Therefore, using specific reference values, like those presented in this review, which account for age and body references, could be the most accurate alternative for clinicians.

Gender and ethnic group differences

Thoracic kyphosis does not seem to be influenced by gender, since the between-group mean difference never reached statistical significance. Although the precision of the results could have been affected by the small number of studies subdividing their sample by age groups and gender, these findings align with previous evidence [7, 57].

Significant differences in kyphosis between the ethnic groups were seen, with Europeans and North Americans showing a greater kyphosis than Asians. Genetic differences may explain this result. A twin study found that thoracic kyphosis is influenced by genetics and that it also negatively correlates with bone mineral density [66], which is itself related to genetics [67]. However, lifestyle factors, such as sports, could also influence thoracic curvature [68], but no data were available to investigate those relationships. Since only 14 studies specified the sample ethnicity [20, 32, 34,35,36,37,38, 41,42,43,44,45,46, 59], people were grouped according to geography. This can represent a limitation since some areas have inhabitants from different socio-cultural backgrounds. Most of the studies that specified sample ethnicity included people from Asia [32, 34, 35, 37, 38, 41,42,43, 45, 59] or Europe [20, 36, 38, 44], which further affects the reliability of the results for North America.

Strengths and limitations

This review employed rigorous methods, with transparent reporting (PRISMA and SWiM guidelines), and a completed PRISMA checklist for this article can be found in Additional file 3. The main strength of this review lies in the high quality of the studies included and the large sample size used for computing the values presented. These factors strengthen the confidence in study findings. No information about kyphosis measured with a kyphometer or flexicurve was provided because of poor information retrieval, perhaps due to the limited sensitivity of the search tool [15]. The AQUA tool was used to assess study quality, but data regarding its validity and reliability are lacking [17]. Since clinical and methodological heterogeneity can preclude a meta-analysis [25], and concerns exist regarding the reliability of meta-analyses carried out on observational studies [69], the authors considered a narrative synthesis most appropriate. Finally, the sample used to create the normative values presented was not randomly selected from the general population but was created by combining the samples of the individual studies included in the review, and this could represent a form of selection bias. However, the rigorous methodology employed and the size and heterogeneity of the sample may partially mitigate this limitation.

Clinical implications

Surgical interventions aiming to correct adult spinal deformities are recommended in cases with progressive deformities, significant neural compromise, pain or functional limitations that did not respond to conservative management [9]. To help these patients, different surgical approaches are available, from minimally invasive operations, such as laminectomies, to deformity correction and vertebral fusion surgeries. These more invasive interventions may target only a limited and specific number of vertebrae in mild and moderate cases or extensive portions of the thoracic and lumbar spine in more severe cases [70], reaching as high as T3–T4 in some instances [71]. They are associated with a high risk of complications and worse functional outcomes if the surgical correction is suboptimal; thus, careful surgical planning is paramount [70]. The individual patient characteristics to be considered when planning for surgery include the patient’s age [72] and ethnicity [71]; consequently, we believe that the normative values provided in this review, which account specifically for these characteristics, despite being supported by low-quality evidence, may prove beneficial in a clinical context. This information may help clinicians decide on and plan their interventions.

Conclusion

This review provides evidence that a positive correlation between kyphosis and age exists. It also shows that thoracic kyphosis does not seem to be influenced by gender, but varies depending on ethnicity, age, and the body references used to measure it. The normative values of kyphosis currently used in clinical practice may not reduce the chances of misclassifying patients, since they do not account for those characteristics, and they may not be precise enough to correctly inform clinicians when planning and performing corrective spinal surgeries. Therefore, using specific reference values, such as those presented in this study, which account for body reference, age, and ethnicity, when assessing and treating patients may represent the most accurate solution for clinicians.