Introduction

Adverse consequences of hyperkyphosis (excessive thoracic kyphosis) include physical functional limitations [1–4], injurious falls [5], back pain [6], respiratory compromise [7], restricted spinal motion [8], fractures [9, 10], and mortality [11–13]. However, a recent randomized, controlled trial found that hyperkyphosis was remediable, encouraging further study of its prevention and treatment [14].

Impediments to large-scale hyperkyphosis research are the difficulties inherent in obtaining the criterion standard measurement, the modified Cobb angle [15–19], including expense, limited portability of X-ray equipment, X-ray exposure, and the time necessary to procure and read the radiographic image.

To facilitate hyperkyphosis research, investigators have developed inexpensive and X-ray-free kyphosis measures, such as the Debrunner kyphometer and the flexicurve ruler. The Debrunner kyphometer consists of a protractor mounted on two arms, the ends of which are positioned on specified bony landmarks; kyphosis angle is read from the protractor [6, 20]. While those with advanced training may readily recognize the landmarks, other research staff may have a difficult time accurately and reproducibly identifying the correct levels. The flexicurve ruler, gently pressed onto the back, adopts the thoracic and lumbar contours of the participant. The researcher then traces the ruler’s retained shape onto paper and calculates the kyphosis index (Fig. 1) [21]. One can also calculate an inscribed angle of kyphosis from the tracing, using geometric formulae (Fig. 1) [14].

Fig. 1

Three methods of quantifying thoracic kyphosis angles are illustrated. The modified T4–T12 Cobb angle (dotted lines) measures the angle created by lines drawn parallel to the limit vertebrae visualized on a lateral standing thoracolumbar radiograph. In this case, the limit vertebrae are pre-specified at T4 and T12. The Flexicurve kyphosis index and angle are computed using measurements taken from the flexicurve tracing of the thoracic curve, represented here by the solid dark curve posterior to the thoracic vertebral bodies. To calculate the Flexicurve kyphosis index, the apex kyphosis height (E) is divided by the length of the entire thoracic curve (L). The Flexicurve kyphosis angle, Theta (θ), is calculated using lines drawn perpendicular to the short sides of the triangle inscribed by the thoracic curve. This triangle is demarcated by points a (apex), b (at the cranial end of the curve), and c (at the caudal end). Theta equals arctan(E/L1) + arctan(E/L2).

Although the non-radiological kyphosis measures minimize cost and obviate radiation, they have enjoyed limited adoption. One explanation may be that they are not calibrated to the Cobb angle, which limits their clinical interpretation. A metric that translates a non-radiological kyphosis result into an approximate Cobb angle would allow estimation of clinical severity from non-Cobb measures. Demonstrations of the reliability and validity of the non-radiological measures, especially in older persons, have been minimal, a possible second reason for limited use [13, 20, 22–24].

Therefore, we designed this study to describe: (1) the intra-rater and inter-rater reliability of three non-radiological kyphosis measures, the Debrunner kyphosis angle, the flexicurve kyphosis index, and the flexicurve kyphosis angle; (2) the validity of each non-radiological measure using the modified Cobb angle as the criterion standard; and (3) a translational formula that provides an approximate Cobb angle based on results of the non-radiological measures. We used baseline data from the Yoga for Kyphosis trial, during which we performed standing lateral radiographs to assess modified Cobb angle as well as multiple, same-day, intra-rater and inter-rater measures of the non-radiological assessments.

Methods

Participants

The analysis sample came from the Yoga for Kyphosis Trial, a single-masked, randomized, controlled trial (RCT) of yoga intended to improve thoracic hyperkyphosis [14]. The trial enrolled 118 participants aged ≥60 years with Debrunner kyphometer-assessed kyphosis angle ≥40°. Major RCT exclusions were: serious comorbidity; use of an assistive device; or inability to pass a movement-safety screen. Of 118 persons enrolled in the RCT, 113 had a standing radiological Cobb angle and at least one non-radiological assessment of kyphosis at RCT baseline, making them eligible for this analysis.

Kyphosis measurement

All kyphosis measures were made on the same day, within a 4-h window. The modified Cobb angle, based on the technique originally described by Cobb to quantify scoliosis, was measured on standing lateral thoracolumbar radiographs [17–19], specifying the limit vertebrae at T4 and T12 [18]. Because some radiographs did not permit use of specified limit vertebrae (e.g., due to overlying structures), Cobb angles from 20 films were based on eight vertebrae (T4–T11 or T5–T12) and Cobb angles from six films were based on seven vertebrae (T5–T11). Non-radiological measures of kyphosis included the Debrunner kyphometer angle, the Flexicurve kyphosis index, and the Flexicurve kyphosis angle. The upper arm of the Debrunner kyphometer was placed on C-7 and the lower arm on T-12. The circumscribed kyphosis angle was read from the protractor [6, 20]. Debrunner measurements were flagged as problematic in eight cases, because it was difficult to get the base of the arms flush on the landmarks. The Flexicurve kyphosis index was measured using a Flexicurve [21, 25]. The cephalic end of the Flexicurve was placed on C-7, and it was molded to the spine in the caudal direction. The shape was traced onto paper, and the apex kyphosis height was estimated relative to the length of the entire thoracic spine; this is the Flexicurve kyphosis index (Fig. 1). Using geometric formulae, the Flexicurve kyphosis angle was also calculated from the Flexicurve tracing. By definition, this inscribed angle is systematically less than the circumscribed angle (Fig. 1).
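
The two Flexicurve quantities can be illustrated with a minimal Python sketch based on the formulae in the Fig. 1 legend: index = E/L and θ = arctan(E/L1) + arctan(E/L2). The function names are hypothetical. The sketch also assumes that L1 and L2 are the two segments of the base L on either side of the foot of the apex height E, which the legend implies but does not state. Under that reading, θ is the sum of the two base angles of the inscribed triangle.

```python
import math


def flexicurve_kyphosis_index(apex_height, curve_length):
    """Kyphosis index as described in the Fig. 1 legend: E divided by L."""
    if apex_height <= 0 or curve_length <= 0:
        raise ValueError("apex_height and curve_length must be positive")
    return apex_height / curve_length


def flexicurve_kyphosis_angle(apex_height, l1, l2):
    """Inscribed angle in degrees: arctan(E/L1) + arctan(E/L2).

    l1 and l2 are ASSUMED to be the cranial and caudal segments of the
    base, measured from the foot of the apex height to each curve end.
    """
    if apex_height <= 0 or l1 <= 0 or l2 <= 0:
        raise ValueError("all lengths must be positive")
    theta = math.atan(apex_height / l1) + math.atan(apex_height / l2)
    return math.degrees(theta)


if __name__ == "__main__":
    # Illustrative tracing measurements in cm (made-up values, not study data).
    e, l1, l2 = 4.0, 14.0, 18.0
    print(flexicurve_kyphosis_index(e, l1 + l2))  # 0.125
    print(flexicurve_kyphosis_angle(e, l1, l2))
```

Any length unit can be used as long as E, L1, and L2 share it, because both quantities depend only on ratios.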

Training and time required for non-radiological kyphosis measures

Research staff had baccalaureate degrees, but none had formal training in anatomy. Staff training consisted of an initial didactic session and demonstration (with the aid of volunteer subjects) by the Principal Investigator (GAG). It included: review of basic spine anatomy using illustrations; instruction in how to find landmarks by palpation; demonstration of the placement of the kyphometer and how to read the angle from the instrument’s protractor; and demonstration of how to apply the flexible ruler and how to make measurements from it. Each staff member then practiced identifying landmarks and conducting the measures. In aggregate, the didactics and staff practice took approximately 40 min. During the conduct of the study, each Debrunner measurement took between 1 and 2 min to make and record, depending on the degree of difficulty ascertaining landmarks. Each flexible ruler measure took 30 s to make; subsequent tracing of the shape on paper and taking the measurements to calculate the angle and index took 2.5 min.

Intra-rater and inter-rater reliability

Each clinical kyphosis assessment was made three times for each participant (with repositioning) by the same staff person; the average was the primary value. These three measures also permitted evaluation of intra-rater reliability. For inter-rater reliability, immediately following the first set of measures, one other masked research associate made a 4th assessment, with repositioning, in 54 participants. (Inter-rater sample size ranged from 51 to 54 due to missing values.)

Statistical analyses

We examined the within-rater, intra-class correlation coefficients (ICC = between-person variance divided by total variance) for each of the non-radiological kyphosis measures using the three measurements made on each participant by the primary rater. In the 54 participants in the inter-rater subset, who had paired ratings made by a single first and a single second rater, we compared the average of the three measures from the primary rater with the single measure from the secondary rater, calculating inter-rater ICCs. Both intra-rater and inter-rater ICCs were also examined after stratification by kyphosis severity, defined by Cobb angle median split: moderate if <53°, severe if ≥53°. To compare the non-radiological kyphosis measures with the Cobb angle criterion standard, we examined Pearson correlations between each non-radiological measure and Cobb angle. These analyses were repeated after first excluding 26 participants whose Cobb angles did not span T4–T12 and then excluding seven individuals whose Debrunner measurements were flagged as problematic. In each of these samples, correlations were also examined after stratification by kyphosis severity. We created mathematical formulae to convert the non-radiological results to equivalent Cobb angles. Formulae were created by simple linear regression of the Cobb angle on each of the non-radiological measures in the sample that excluded participants whose Cobb angles did not span T4–T12 and whose Debrunner measurements were flagged as problematic. To test if Cobb angles measured using alternate landmarks had systematic error, in the 20 participants whose Cobb angle measurements spanned either T5–T12 or T4–T11, we compared the measured Cobb angle with the Cobb angle predicted by the clinical measures, using the paired t test. Finally, in the sample in which we derived the Cobb angle prediction equations (Table 5), we conducted Bland–Altman analyses. Bland–Altman analysis consists of the examination of two graphs. 
The first graph is an identity plot, a scatter plot of the two measurements along with the line y = x. If the measurements agree closely, then the scatter plot points will line up near to the line y = x. The identity plot was produced only for measured Cobb angle and the measured Debrunner kyphosis angle, because they measure the same thing (circumscribed kyphosis angle) and use the same metric (degrees). The second graph is a Bland–Altman plot, a scatter plot of the variable’s means plotted on the horizontal axis and the variable’s differences plotted on the vertical axis; it includes approximate 95% confidence bands (the confidence bands assume normality of differences). The Bland–Altman plot illustrates the amount of disagreement between the measures being compared. Bland–Altman plots were created for the measured Cobb angle and each of the following: measured Debrunner kyphosis angle; Debrunner-predicted Cobb angle; Flexicurve kyphosis index-predicted Cobb angle; and Flexicurve kyphosis angle-predicted Cobb angle. The scientific importance of these differences is judged qualitatively; however, we also computed the standard deviation of the mean difference between the Cobb angle and each comparator to gauge the magnitude of the error [26].
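
The quantities behind a Bland–Altman plot can be sketched in a few lines of Python. This is a minimal illustration, not the study's code. It assumes the approximate 95% bands are the conventional limits of agreement (mean difference ± 1.96 × SD of the differences), which matches the stated normality assumption but is not spelled out in the text. The function name and the sample values are hypothetical.

```python
import numpy as np


def bland_altman(measure_a, measure_b, z_crit=1.96):
    """Return the points and summary statistics for a Bland-Altman plot.

    means -> horizontal axis, diffs -> vertical axis.
    lower/upper are ASSUMED to be bias +/- z_crit * SD of the differences.
    """
    a = np.asarray(measure_a, dtype=float)
    b = np.asarray(measure_b, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    means = (a + b) / 2.0
    diffs = a - b
    bias = float(diffs.mean())
    sd = float(diffs.std(ddof=1))  # sample SD of the paired differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "sd": sd,
        "lower": bias - z_crit * sd,
        "upper": bias + z_crit * sd,
    }


if __name__ == "__main__":
    # Made-up angles in degrees, for illustration only.
    cobb = [50.0, 60.0, 40.0, 55.0]
    comparator = [48.0, 63.0, 41.0, 50.0]
    result = bland_altman(cobb, comparator)
    print(result["bias"], result["sd"], result["lower"], result["upper"])
```

Plotting `means` against `diffs`, with horizontal lines at `bias`, `lower`, and `upper`, gives the second graph described above. Plotting the two series against each other with the line y = x gives the identity plot.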

Results

The mean age of the study sample was 75.3 years, average body mass index was 26.5, and 80.5% were women. These and other characteristics of the full sample and the inter-rater reliability sample are summarized in Table 1.

Table 1 Baseline demographic, behavioral and medical characteristics of study participants

As shown in Table 2, the mean Cobb angle in the full sample was 53.76°. In the 87 cases with T4–T12 Cobb angles, the mean Cobb angle was 55.43°. The average Debrunner kyphosis angle was similar to the average Cobb angle. As expected, the inscribed flexicurve kyphosis angle averaged about 20° less than the circumscribed Cobb and Debrunner angles.

Table 2 Average values and distributions of standing Cobb angle and non-radiological kyphosis measurements

In the full sample, intra- and inter-rater reliabilities (ICCs) were uniformly high for all kyphosis assessments, 0.96 to 0.98 (Table 3). We also computed ICCs in subsamples, using the median value of the sample Cobb angle to define severity. Restriction of range in subsamples compared to the full sample systematically lowers the ICC value, but ICCs of the two subsamples can be compared to each other: reliabilities were similar in those with moderate and severe kyphosis. We also calculated the inter-rater reliability based on only the first measurement from rater one and the fourth from rater two; results did not differ (data not shown). Analyses excluding eight cases that were flagged for difficult kyphometer placement did not alter the intra- or inter-rater reliability estimates for that device (data not shown).
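
To make the ICC and the restriction-of-range effect concrete, the following Python sketch computes a one-way random-effects ICC from a participants-by-repeats table. The choice of the one-way form is an assumption: the text defines the ICC only as between-person variance divided by total variance and does not name the model. The function name and the data are hypothetical.

```python
import numpy as np


def icc_oneway(ratings):
    """One-way random-effects ICC for a (participants x repeats) array.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB is the
    between-participant mean square and MSW the within-participant one.
    ASSUMPTION: the source does not state which ICC form was used.
    """
    x = np.asarray(ratings, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2 or x.shape[1] < 2:
        raise ValueError("need at least 2 participants and 2 repeats")
    n, k = x.shape
    row_means = x.mean(axis=1)
    grand_mean = x.mean()
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))


if __name__ == "__main__":
    # Made-up angles: same measurement error in every participant.
    full = [[39, 40, 41], [49, 50, 51], [59, 60, 61], [69, 70, 71]]
    restricted = full[:2]  # narrower range of true kyphosis
    print(icc_oneway(full), icc_oneway(restricted))
```

In this toy data the measurement error is identical in both sets, yet the restricted set yields the lower ICC because its between-person variance is smaller. This is why the subsample ICCs are comparable with each other but not with the full-sample values.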

Table 3 Intra- and inter-rater reliabilities of three non-radiological kyphosis assessments

The modified Cobb angle was our criterion measurement; non-radiological measures were compared to it to gauge their validity (Table 4). In the full sample, the Pearson correlations between the non-radiological kyphosis measures and the Cobb angle ranged from 0.62 to 0.69 (95% confidence interval [CI] for each estimate was ±0.184). Correlations between each non-radiological measure and the Cobb angle in the 87 persons with T4–T12 Cobb angles were approximately 0.72, somewhat higher than the correlations based on the entire sample. In the sample that was also restricted to those whose Debrunner measures were not flagged as difficult (N = 80), the Pearson correlations between the clinical kyphosis measures and the Cobb angle were even higher, and ranged from 0.758 to 0.762. In aggregate, there was a trend towards higher correlations as the samples were progressively restricted. Comparing the severity subsamples, correlations between each non-radiological measure and the Cobb angle were somewhat higher in those with severe compared to those with moderate hyperkyphosis, but overlapping CIs did not support a statistically significant difference between them.
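
As an illustration of how a validity correlation and its interval can be computed, the sketch below returns a Pearson correlation with a 95% CI from the Fisher z-transformation. The Fisher method is an assumption and gives asymmetric limits. The text reports a symmetric ±0.184, which is numerically close to 1.96/√113, but it does not state how the interval was derived. The function name and data are hypothetical.

```python
import math

import numpy as np


def pearson_with_ci(x, y, z_crit=1.96):
    """Pearson r with an approximate CI via the Fisher z-transformation.

    ASSUMPTION: Fisher z is a common choice, not necessarily the
    method used in the study.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 4:
        raise ValueError("need two equal-length series with at least 4 pairs")
    r = float(np.corrcoef(x, y)[0, 1])
    if abs(r) >= 1.0:
        return r, r, r  # atanh is undefined at +/-1; no interval to report
    z = math.atanh(r)
    se = 1.0 / math.sqrt(x.size - 3)
    return r, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Made-up paired values, for illustration only.
    nonradiological = [1, 2, 3, 4, 5, 6]
    cobb = [1, 3, 2, 5, 4, 6]
    print(pearson_with_ci(nonradiological, cobb))
```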

Table 4 Validity of three non-radiological measurements of kyphosis compared to the Cobb angle criterion standard

Non-radiological tests were calibrated to the Cobb angle, using linear regression: the T4–T12 Cobb angle was the outcome and each non-radiological kyphosis measure was the predictor (Table 5). The R² was 0.57–0.58 for each of the measures. Except for a systematic bias of about 5°, the Debrunner kyphosis angle was very similar to the Cobb angle: the beta coefficient, or scaling factor, to convert Debrunner angle to Cobb angle was 1.067. As expected, the flexicurve angle was systematically smaller than the Cobb angle; it had to be scaled by 1.53 to get the equivalent Cobb angle. The kyphosis index may also be approximated to the Cobb angle by using the conversion factor (about 315) and an offset of about 5°.
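
The calibration step can be sketched in Python as follows. The first function fits a simple linear regression of Cobb angle on a non-radiological measure. The second applies the approximate kyphosis index conversion quoted above, assuming the offset of about 5° is additive (Cobb ≈ 315 × index + 5). The exact coefficients are in Table 5 and are not reproduced here. The function names and sample data are hypothetical.

```python
import numpy as np


def fit_calibration(nonradiological, cobb):
    """Least-squares line: Cobb = slope * measure + intercept."""
    x = np.asarray(nonradiological, dtype=float)
    y = np.asarray(cobb, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree term first
    return float(slope), float(intercept)


def cobb_from_kyphosis_index(kyphosis_index, slope=315.0, offset=5.0):
    """Approximate Cobb angle (degrees) from a Flexicurve kyphosis index.

    Defaults are the rounded values quoted in the text ("about 315",
    "about 5 degrees"); the additive sign of the offset is an ASSUMPTION.
    """
    return slope * kyphosis_index + offset


if __name__ == "__main__":
    # Made-up index values, for illustration only.
    for ki in (0.08, 0.10, 0.12):
        print(ki, cobb_from_kyphosis_index(ki))
```

As stated in the Discussion, such conversions are approximations meant for interpreting research results, not for translating an individual patient's measurement.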

Table 5 Calibration of non-radiological kyphosis measurements to the T4–T12 Cobb angle (n = 80)

In the 20 individuals with Cobb angle measurements that spanned one fewer vertebral body (i.e., T4–T11 or T5–T12), mean Cobb angle was smaller than the Cobb angle predicted by the clinical kyphosis measures by about 8° in each case (data not shown), indicating that when the Cobb angle measure spans fewer vertebral bodies, the Cobb angle is systematically underestimated.

An identity plot graphically displays the agreement between the measured Cobb angle and the Debrunner angle (Fig. 2a). To graphically portray the disagreement between the kyphosis measures, Bland–Altman plots, scatter plots of the variable means on the horizontal axis and the variable differences on the vertical axis, were created. These plots include approximate 95% confidence bands. We also computed the standard deviation (SD) of the mean difference between the Cobb angle and each comparator to gauge the magnitude of the error. Figure 2b–e displays Bland–Altman plots for the measured Cobb angle and each of the following: measured Debrunner kyphometer angle (SD of mean difference, 11.4); Cobb angle predicted using the Debrunner angle (SD of mean difference, 10.96); Cobb angle predicted using the Flexicurve kyphosis index (SD of mean difference, 11.26); and Cobb angle predicted using the Flexicurve kyphosis angle (SD of mean difference, 10.24).

Fig. 2

Identity plot of the measured Cobb angle and the measured Debrunner angle (a). Bland–Altman plots of the measured Cobb angle and each of the following: measured Debrunner angle (b); Cobb angle predicted using the Debrunner angle (c); Cobb angle predicted using the Flexicurve kyphosis index (d); and Cobb angle predicted using the Flexicurve kyphosis angle (e). Bland–Altman plots include approximate 95% confidence bands and also provide the SD of the difference between the Cobb angle and each comparator. Please see Methods for details.

Discussion

The overarching goals of this study were to calculate the reliability and validity of the Debrunner kyphometer angle, flexicurve kyphosis index, and flexicurve kyphosis angle and to calibrate each to the Cobb angle. Intra- and inter-rater reliabilities for the three non-radiological kyphosis assessments were uniformly high (0.96 to 0.98) and did not differ statistically from each other. Comparing the non-radiological kyphosis measurements to the Cobb angle also yielded validity estimates that were not distinguishable; all correlations were moderate (0.62 to 0.69). Our derived regression equations that scaled the non-radiological kyphosis estimates to the Cobb angle had robust R² values, between 0.57 and 0.58.

This study’s high inter-rater and intra-rater reliabilities of the Debrunner kyphometer and the Flexicurve kyphosis index, based on ICC values, mirrored reliabilities developed in a sample of 26 postmenopausal women with osteoporosis (but whose age range and degree of kyphosis were not specified); in that sample, inter-rater and intra-rater ICCs between 0.89 and 0.99 were found for each test [20]. The present analysis expands upon prior work by including a greater sample size, older subjects (in whom measurements may be more challenging), and a broad range of kyphosis over which reliabilities were assessed. The two studies agree, however: inter- and intra-rater reliabilities are nearly perfect and do not differ between the Debrunner kyphometer and the Flexicurve kyphosis index [27]. Although Ohlen examined reliability of the Debrunner kyphometer in 31 young volunteers and Ettinger tested reliability of the Flexicurve kyphosis index in 75 women aged 65–91 years, these two studies used different statistical methods to quantify reliability than those used in the present study, precluding direct comparison of their reliability estimates to ours [22, 24].

To our knowledge, published work has not reported the validity of the Debrunner kyphometer or the Flexicurve kyphosis index compared to the standing Cobb angle. Based on a sub-sample of 120 women from the Fracture Intervention Trial, Kado et al. calculated an ICC of 0.68 for the kyphosis index compared to a supine Cobb angle; however, the supine position would be expected to lessen the angle of kyphosis and lower the validity estimate [28].

Creating a mathematical formula that approximates Cobb angle based on a non-radiological kyphosis measure is not a novel idea and its value in avoiding radiation and facilitating longitudinal measurement has been recognized [23]. However, cross-calibration has been done only for the Debrunner instrument in an adolescent sample [23]. The present study offers metrics that allow researchers and clinicians to scale the Debrunner angle, Flexicurve kyphosis index, and the newly developed Flexicurve kyphosis angle to a standing radiological Cobb angle in adults with hyperkyphosis. For example, the Flexicurve kyphosis index–Cobb translations could enhance the interpretation of an important finding from the Study of Osteoporotic Fractures (SOF): that greater Flexicurve kyphosis indices predicted higher mortality independently of vertebral fracture [13]. It is now possible to approximate the Cobb angles that these indices represented: using the current study’s metric, the SOF sample’s mean predicted Cobb angle would be 43.8° (standard deviation, 10.7). Thus, the relative mortality hazard per kyphosis index standard deviation developed in SOF can be roughly translated to a 15% increase in mortality per each 10.7° increment in Cobb angle.

This study intended to inform deliberations about which of the three non-radiological tests used in the Yoga for Kyphosis project might be best suited to large observational or interventional kyphosis studies, in which sizable numbers of participants would be evaluated at multiple times. Because these types of studies necessitate multiple raters, the first consideration is the inter- and intra-rater reliabilities. On this basis, all three assessments performed nearly perfectly and equally. A second basis for ranking the three tests is validity, but this also did not discriminate among them. Finally, when compared to the criterion standard measured Cobb angle, Cobb angles predicted using each of the non-radiological measures had similar magnitude errors according to the Bland–Altman plots. Therefore, factors such as simplicity of use and sensitivity to anatomical variability may suggest the most favorable approach. The flexicurve may be easier for research staff without medical training, as it does not require identification of caudal landmarks. The flexicurve traces the contour of the entire spine; the inflection points between the cervical lordosis, thoracic kyphosis, and lumbar lordosis define the spinal curves. In contrast, the Debrunner kyphometer must be placed on palpated landmarks [6]. Despite careful protocols, the inferior landmark can be particularly difficult to discern, especially when lumbar lordosis has reversed [21]. The Cobb and Debrunner angles base their measurements entirely on the two ends of the spinal curve. If there are no problems at these locations (such as endplate tilt of limit vertebrae or difficult Debrunner placement), dependence on the terminal portions of the curve will not be strongly influential [29]. However, when anatomical abnormalities are present, then an instrument such as the Flexicurve, which uses the entire spinal contour, will be more robust because deformities in part of the spine will not introduce large errors. 
In this regard, the Flexicurve is akin to the centroid angle, which computes kyphosis using the midpoints of all vertebral bodies from T1–T12 [29]. Indicative of the error introduced by difficult landmark determination was the trend toward a higher correlation between the Debrunner and Cobb angles when eight individuals with difficult Debrunner measures were omitted from the validity computation (Table 4).

Use of the T4–T12 constrained Cobb angle had merits and limitations. In favor of the constrained Cobb is that the uppermost thoracic vertebrae are often poorly visualized due to overlying tissue density. Another attribute of the constrained technique is that the identification of the most inclined vertebral body, which marks the transition from the thoracic to the lumbar curves, can be difficult, leading to low intra-rater reliability for determination of limit vertebrae, a problem circumvented by using the constrained Cobb technique [30, 31]. It must be acknowledged that the constrained method will misestimate the true kyphosis angle when the transition vertebra is not at the same level as the specified level. In aggregate, the potential measurement errors in the Cobb angle degrade the accuracy of the criterion standard, conservatively biasing this study’s validity estimates.

The reliability and validity estimates of the non-radiological measures of kyphosis calculated from this sample cannot be assumed to apply to all instances in which these measuring devices are used; they are not immutable characteristics of the tests themselves [32]. Deterioration of reliability and validity may occur due to subject characteristics (e.g., obesity hampers landmark location) or to operator characteristics (e.g., staff capability). Because the research associates who performed the measures in the current study had no formal training in anatomy and were likely comparable to other entry-level research or clinical staff, we believe that operator characteristics are unlikely to be influential in other settings.

The metrics developed in this study to scale the non-radiological tests to the standing Cobb angle must be viewed as approximations, intended to give investigators and clinicians a “feel” for what the values of the non-radiological tests mean in Cobb angle terms. They are not intended to translate an individual patient’s non-radiological measures to Cobb angle values in clinical practice. Rather, these approximate conversion formulae are meant to help researchers interpret non-radiological results on the Cobb angle scale, which will inform the general clinical translation of research results.

In summary, in our study sample, we found that the Debrunner kyphometer, the flexicurve kyphosis angle and the flexicurve kyphosis index had strong and similar validity and reliability. Its low cost, ease of use by entry-level research staff, short measurement time, and relative robustness to variations in spine contour and deformity argue for use of the Flexicurve in longitudinal assessments of kyphosis. This study also provides approximate conversion factors that permit translation of results from three non-radiological kyphosis measures to an approximate Cobb angle value, which will assist researchers in interpreting the clinical meaning of the non-radiological tests.