Reliability of vertebral fracture assessment using multidetector CT lateral scout views: the Framingham Osteoporosis Study
- Samelson, E.J., Christiansen, B.A., Demissie, S. et al. Osteoporos Int (2011) 22: 1123. doi:10.1007/s00198-010-1290-6
Two radiologists evaluated images of the spine from computed tomography (CT) scans on two occasions to diagnose vertebral fracture in 100 individuals. Agreement was fair to good for mild fractures, and agreement was good to excellent for more severe fractures. CT scout views are useful to assess vertebral fracture.
We investigated inter-reader agreement between two radiologists and intra-reader agreement between duplicate readings for each radiologist, in assessment of vertebral fracture using a semi-quantitative method from lateral scout views obtained by CT.
Participants included 50 women and 50 men (age 50-87 years, mean 70 years) in the Framingham Study. T4-L4 vertebrae were assessed independently by two radiologists on two occasions using a semi-quantitative scale as normal, mild, moderate, or severe fracture.
Vertebra-specific prevalence of grade ≥1 (mild) fracture ranged from 3% to 5%. We found fair (κ = 56-59%) inter-reader agreement for grade ≥1 vertebral fractures and good (κ = 68-72%) inter-reader agreement for grade ≥2 fractures. Intra-reader agreement for grade ≥1 vertebral fracture was fair (κ = 55%) for one reader and excellent (κ = 77%) for the other, whereas intra-reader agreement for grade ≥2 vertebral fracture was excellent for both readers (κ = 76% and 98%). Thoracic vertebrae were more difficult to evaluate than lumbar vertebrae, and agreement was lowest (inter-reader κ = 43%) for fracture at the upper thoracic levels (T4-T9) and highest (inter-reader κ = 76-78%) for the lumbar spine (L1-L4).
Based on a semi-quantitative method to classify vertebral fractures using CT scout views, agreement within and between readers was fair to good, with the greatest source of variation occurring for fractures of mild severity and for the upper thoracic region. Agreement was good to excellent for fractures of at least moderate severity. Lateral CT scout views can be useful in clinical research settings to assess vertebral fracture.
Keywords: Computed tomography · Lateral scout · Reliability · Scout views · Semiquantitative · Vertebral fracture
Vertebral fractures are the hallmark of osteoporosis and the most common osteoporotic fracture, affecting as many as half of US white women and men in their 80s [1–3]. In Europe, one in eight individuals older than 50 years has a vertebral fracture, and worldwide, the estimated number of clinical vertebral fractures for the year 2000 was 1.4 million. In 2005, incident vertebral fractures were responsible for more than 1.0 billion dollars in direct healthcare costs in the US. Vertebral fractures cause pain, disfigurement, depression, and functional impairment [7–9]. Moreover, vertebral fractures are harbingers of future disabling fractures as well as death [10–12]. Despite the impact of vertebral fracture, fewer than one third of individuals with vertebral fractures receive medical attention, and even fewer are treated [13–15].
Numerous methods have been developed to assess vertebral fracture from conventional lateral radiographs [16, 17]. More recently, these methods have been applied to lateral spine images from dual-energy X-ray absorptiometry (DXA) vertebral fracture assessment (VFA) [18–20], with the intention of improving diagnosis and treatment without increasing cost or radiation exposure. Increasingly, older individuals are undergoing thoracoabdominal computed tomography (CT) examinations that include acquisition of posterior-anterior and lateral scout views, which are 2D digital radiographs used to localize the area to be imaged. Although the resolution of a scout view is lower than that of a conventional radiograph, there is increasing interest in using CT scout views to evaluate vertebral fracture [21, 22]. Whereas the reliability of vertebral fracture assessment using radiographs and VFA has been well studied [16, 17, 20, 23–28], only a single prior study, conducted more than a decade ago, has reported the reliability of vertebral fracture assessment from CT lateral scout views. Since that study, there have been significant advances in CT technology, allowing faster imaging and better resolution.
As part of a large observational study of osteoporosis, we are evaluating CT image data of the spine acquired for participants in the Framingham Study who underwent cardiac imaging studies. Our protocol calls for vertebral fracture assessment from lateral CT scout views using a semi-quantitative method, and we aimed to evaluate reliability of our measurements. Therefore, we conducted a study to determine inter- and intra-reader reliability for vertebral fracture assessment using lateral spine scout views obtained as part of CT examinations for a sample of 100 participants.
Participants for the current study were selected from the Framingham Heart Study Offspring and Third Generation Multidetector Computed Tomography (MDCT) Study. The Offspring and Third Generation Cohorts include the second- and third-generation offspring (and their spouses) of the original cohort, which was established in 1948 [30–32]. In 2002-2005, QCT scans of the chest and abdomen were acquired in 3,529 participants in the MDCT Study (age 40-90 years, mean 51 years) for assessment of coronary and aortic calcium. Exclusion criteria for the MDCT Study included pregnancy, age less than 40 years for women, age less than 35 years for men, and weight greater than 320 pounds.
We selected 100 subjects for a reliability study of vertebral fracture identification. To ensure an adequate number of individuals with vertebral fractures, a clinical investigator experienced in evaluating lateral spine images reviewed the scout images of persons aged 70 years and older and identified 16 individuals (eight women and eight men) with suspected vertebral fracture. We then selected a convenience sample of 84 additional persons, giving a study group of 100 individuals (50 women and 50 men) whose age groups reflected the age distribution of the parent study while giving more weight to the older groups, in which fracture prevalence is greater: seven persons aged 50-54 years, seven aged 55-59 years, 14 aged 60-64 years, 16 aged 65-69 years, 19 aged 70-74 years, and 37 aged 75 years and older.
Computed tomography scan
An 8-slice multidetector computed tomography scanner (Lightspeed Ultra, General Electric Medical Systems, Milwaukee, WI, USA) was used for cardiac imaging of calcification [29, 35]. The scout views consisted of frontal and lateral low-energy 2D scanograms extending from the upper thoracic (T4) to the sacral (S1) vertebral levels.
Study design and assessment of prevalent vertebral fracture
The lateral scout images for the 100 subjects were evaluated twice (“Time 1” and “Time 2”), several weeks apart, by each of two independent readers (“Reader A” and “Reader B”). Thus, four evaluations were independently performed for each subject’s scout view image. The scans for the 100 subjects in the reliability study were randomly ordered within five larger sets of scans being evaluated for the parent study. Details of the reliability study were not revealed to the readers, and images were stripped of demographic and all other identifying information. In addition, the 100 subjects were assigned different identification numbers in each reading set, so that each batch appeared to the readers as a unique set of subjects, and the order of the subject scans was changed for each evaluation.
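The blinding scheme described above (new identification numbers and a new reading order for each of the four evaluations) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's software: the helper `make_reading_batch`, the `S`-prefixed five-digit masked IDs, and the fixed seeds are hypothetical, and the sketch does not model embedding the 100 subjects within the five larger parent-study sets.

```python
import random

def make_reading_batch(subject_ids, seed):
    """Hypothetical helper: build one blinded reading set.

    Shuffles the subjects into a new reading order and gives each a fresh
    masked ID, so repeated sets look like different subjects to the reader.
    Returns (masked IDs in reading order, private key: masked ID -> subject ID).
    """
    rng = random.Random(seed)
    order = list(subject_ids)
    rng.shuffle(order)
    # Draw distinct 5-digit numbers (sampling without replacement) as masked IDs.
    masked = rng.sample(range(10000, 100000), len(order))
    key = {f"S{m}": sid for m, sid in zip(masked, order)}
    return list(key), key  # dicts keep insertion order, i.e. the reading order

# One blinded set per reader and time point (four evaluations per subject).
sessions = {
    (reader, time): make_reading_batch(range(1, 101), seed=i)
    for i, (reader, time) in enumerate([("A", 1), ("A", 2), ("B", 1), ("B", 2)])
}
```

Only the coordinating investigator would hold each `key`; readers would see just the masked IDs in the shuffled order.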
The two readers were experienced radiologists who were trained by a musculoskeletal radiologist who developed a semi-quantitative method to assess vertebral fracture. Prior to this reliability study, CT scout views for 45 subjects randomly selected from the parent study were used, along with a radiographic atlas, as a training set for the readers. We provided readers with an automated system and an instruction manual for entering findings directly into an electronic database.
Readers evaluated 13 vertebral levels from T4 to L4 and visually classified each vertebra as normal (grade 0), mildly deformed (grade 1, ∼20–25% reduction in anterior, middle, and/or posterior height and a 10–20% reduction in area), moderately deformed (grade 2, ∼25–40% reduction in any height and a 20–40% reduction in area), or severely deformed (grade 3, ∼40% or greater reduction in any height and area). Scan quality was recorded as “good” or “poor”. Vertebral levels that could not be adequately visualized were classified as “unreadable”. Vertebral deformities due to causes other than fracture, such as spondylosis and Scheuermann’s disease, were classified as non-osteoporotic deformities and were considered non-fractured in the analysis.
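The grades are assigned visually, but their approximate height cut points can be written as a simple rule. The sketch below is a hypothetical simplification (`sq_grade` is not from the study): it uses only the largest of the anterior, middle, and posterior height reductions, ignores the area criterion and non-osteoporotic deformities, and, where the quoted ranges meet (25% and 40%), assigns the boundary value to the higher grade, which is an arbitrary choice.

```python
def sq_grade(anterior_pct: float, middle_pct: float, posterior_pct: float) -> int:
    """Approximate semi-quantitative grade from percent height reductions.

    Hypothetical helper: uses only the worst height reduction and the
    approximate cut points (~20-25% mild, ~25-40% moderate, ~40%+ severe).
    Readers in the study graded visually, not with this rule.
    """
    worst = max(anterior_pct, middle_pct, posterior_pct)
    if worst >= 40:
        return 3  # severe
    if worst >= 25:
        return 2  # moderate
    if worst >= 20:
        return 1  # mild
    return 0      # normal (less than ~20% reduction)
```

For example, `sq_grade(5, 30, 10)` returns 2, because the 30% middle-height reduction falls in the moderate range.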
Prevalent vertebral fracture was defined as a vertebral body graded at least mildly deformed (grades 1–3). We also used a more restrictive definition of vertebral fracture: moderately to severely deformed (grades 2–3, i.e., at least ∼25% reduction in any vertebral height).
We calculated the frequency of unreadable vertebrae by spinal location for each reader at each of the two time points. Prevalence of fracture, per person and per vertebra, was calculated using two dichotomized definitions of fracture: (1) grades 1-3 (mild, moderate, and severe) versus grade 0 (normal), and (2) grades 2-3 (moderate and severe) versus grades 0-1 (normal and mild). In calculating prevalence, unreadable vertebrae were classified as normal.
The number and severity of fractured vertebrae are given for each of the two readers at each of the two time points. To calculate severity of vertebral fracture per person, individuals with more than one fracture were classified according to their most severe fracture grade (normal, mild, moderate, or severe).
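The two dichotomized definitions and the per-person summaries can be sketched as below, assuming each reading is stored as a list of 13 grades (T4 to L4) per person, with `None` marking an unreadable vertebra. The data layout, function names, and toy data are hypothetical; the rules they encode (unreadable vertebrae counted as normal, each person classified by the most severe grade) follow the text.

```python
from collections import Counter
from typing import Optional

Grade = Optional[int]  # 0-3, or None when the vertebra was unreadable

def is_fracture(grade: Grade, threshold: int) -> bool:
    """Dichotomize one vertebra: fracture if grade >= threshold (1 or 2).
    Unreadable vertebrae (None) count as normal, as in the prevalence analysis."""
    return grade is not None and grade >= threshold

def prevalence(readings: dict[str, list[Grade]], threshold: int) -> tuple[float, float]:
    """(per-vertebra, per-person) prevalence for one reader at one time point."""
    flags = [[is_fracture(g, threshold) for g in grades] for grades in readings.values()]
    n_vertebrae = sum(len(f) for f in flags)
    per_vertebra = sum(sum(f) for f in flags) / n_vertebrae
    per_person = sum(any(f) for f in flags) / len(flags)
    return per_vertebra, per_person

def person_severity(readings: dict[str, list[Grade]]) -> Counter:
    """Number of persons by their most severe grade (unreadable treated as 0)."""
    return Counter(max(g or 0 for g in grades) for grades in readings.values())

# Toy data (hypothetical): three persons, 13 levels T4..L4 each.
example = {
    "p1": [0] * 12 + [1],           # one mild fracture at L4
    "p2": [0] * 10 + [2, None, 0],  # moderate fracture at L2, L3 unreadable
    "p3": [0] * 13,                 # no fracture
}
```

For the toy data, `prevalence(example, 1)` gives 2/39 per vertebra and 2/3 per person, and `prevalence(example, 2)` gives 1/39 and 1/3.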
We estimated chance-corrected agreement for fracture prevalence per vertebra using a simple kappa statistic and associated 95% confidence intervals (CI). We report kappa for intra-reader agreement on fracture prevalence per vertebra between time 1 and time 2, separately for each of the two readers, and kappa for inter-reader agreement on fracture prevalence per vertebra between reader A and reader B, separately for each of the two time points.
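A minimal sketch of the simple (unweighted) kappa for one pair of dichotomized readings, for example reader A at time 1 versus time 2. The text does not state how the 95% CI was computed; the large-sample standard error used here is an assumption, and the function name is hypothetical.

```python
import math

def cohen_kappa(a: list[int], b: list[int], z: float = 1.96) -> tuple[float, float, float]:
    """Simple (unweighted) kappa for two binary readings of the same vertebrae.

    a, b: 1 = fracture, 0 = no fracture, one entry per vertebra.
    Returns (kappa, lower, upper); the CI uses the large-sample approximation
    SE = sqrt(po * (1 - po) / (n * (1 - pe)**2)), which is an assumed choice.
    """
    if len(a) != len(b) or not a:
        raise ValueError("readings must be non-empty and the same length")
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n             # fracture rate in each reading
    pe = pa * pb + (1 - pa) * (1 - pb)          # agreement expected by chance
    if pe == 1:
        raise ValueError("kappa is undefined when both readings are all normal or all fracture")
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, kappa - z * se, kappa + z * se
```

The same function serves both comparisons in the text: intra-reader (one reader, time 1 versus time 2) and inter-reader (reader A versus reader B at one time point).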
We conducted stratified analyses to evaluate potential differences in agreement on fracture prevalence per vertebra by spinal region, categorized as T4-T9, T10-T12, and L1-L4, so that we could compare our results with those of others and have adequate numbers of fractures within each region for meaningful statistical analysis. We calculated kappa (κ) to estimate inter- and intra-reader agreement for fracture prevalence per vertebra beyond that expected by chance and, following Fleiss, considered κ >0.75 as excellent agreement, 0.40-0.75 as fair to good agreement, and <0.40 as poor agreement beyond chance. Finally, we repeated our analysis including only those scans indicated as good quality by readers.
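The regional stratification and the Fleiss cut points can be sketched as follows, assuming each vertebra contributes one `(level, flag_1, flag_2)` record of dichotomized readings. The record layout and function names are hypothetical. A region in which both readings are all normal has chance agreement of 1, so the sketch returns NaN there rather than a kappa.

```python
import math
from collections import defaultdict

# The 13 levels read in the study, T4 through L4.
LEVELS = [f"T{i}" for i in range(4, 13)] + [f"L{i}" for i in range(1, 5)]

def region(level: str) -> str:
    """Map a vertebral level to the spinal region used in the stratified analysis."""
    if level.startswith("L"):
        return "L1-L4"
    return "T4-T9" if int(level[1:]) <= 9 else "T10-T12"

def fleiss_label(k: float) -> str:
    """Fleiss categories as applied in the text."""
    if math.isnan(k):
        return "undefined"
    if k > 0.75:
        return "excellent"
    if k >= 0.40:
        return "fair to good"
    return "poor"

def kappa(a: list[int], b: list[int]) -> float:
    """Unweighted kappa for two binary readings; NaN if chance agreement is 1."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    pe = pa * pb + (1 - pa) * (1 - pb)
    return math.nan if pe == 1 else (po - pe) / (1 - pe)

def stratified_kappa(records) -> dict[str, float]:
    """records: iterable of (level, flag_1, flag_2), one per vertebra,
    where the flags are the two dichotomized readings being compared."""
    groups = defaultdict(lambda: ([], []))
    for level, f1, f2 in records:
        first, second = groups[region(level)]
        first.append(f1)
        second.append(f2)
    return {r: kappa(a, b) for r, (a, b) in groups.items()}
```

Applying `fleiss_label` to each regional kappa reproduces the verbal categories used in the results, for example "fair to good" for the upper thoracic inter-reader κ of 0.43.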
Characteristics of sex- and age-stratified convenience sample for reliability study of vertebral fracture assessment of 100 participants in the Framingham Offspring and Third Generation Multidetector Computed Tomography Study, 2002-2005
Women (N = 50)
Men (N = 50)
N or mean
N or mean
Body mass index (kg/m²)
Number of individuals according to number of vertebral fractures T4 to L4 independently assessed on CT lateral scout views by two radiologists at two time points, Framingham CT Study
Participants (N = 100)
Number of fractures
Individuals with ≥1 fracture
Semi-quantitative grade of prevalent vertebral fracture (T4-L4) independently assessed on CT lateral scout views, per vertebra and per person, by two readers at two time points, Framingham CT Study
Semi-quantitative fracture grade per vertebra, N = 1,300
Grade 0 (normal)
Grade 1 (mild)
Grade 2 (moderate)
Grade 3 (severe)
Total vertebrae grade ≥1 (mild-severe)
Semi-quantitative fracture grade per person (most severe grade), N = 100
Grade 0 (normal)
Grade 1 (mild)
Grade 2 (moderate)
Grade 3 (severe)
Total persons grade ≥1 (mild-severe)
Agreement for grade ≥1 and grade ≥2 prevalent fractures, per vertebra, for two readers at two time points, Framingham CT Study
Vertebrae (N = 1300)
Intra-reader agreement between time 1 and time 2
Inter-reader agreement between reader A and reader B
≥Grade 1 (mild-severe)
≥Grade 2 (moderate-severe)
Agreement for grade ≥1 prevalent fractures, per vertebra, according to spinal region, for two readers at two time points, Framingham CT Study
Intra-reader agreement between time 1 and time 2
Inter-reader agreement between reader A and reader B
Restricting the analysis to scans indicated as good image quality produced little or no improvement in either intra- or inter-reader agreement.
We determined the reliability of a semi-quantitative method to evaluate prevalent vertebral fracture from CT lateral scout views. Four evaluations (by two readers at two time points) were independently performed to classify 1,300 vertebrae in 100 participants from a community-based cohort. We found fair (κ = 56-59%) inter-reader agreement for ≥grade 1 vertebral fracture and good (κ = 68-72%) inter-reader agreement for ≥grade 2 fracture. Intra-reader agreement for ≥grade 1 vertebral fracture was fair (κ = 55%) for one reader and excellent (κ = 77%) for the other, whereas intra-reader agreement for ≥grade 2 vertebral fracture was excellent for both readers (κ = 76% and 98%). Upper thoracic vertebrae were more likely to be unreadable than those of the lower thoracic and lumbar regions, and agreement was lowest (inter-reader κ = 43%) for fracture at the upper thoracic levels evaluated (T4-T9) and highest (inter-reader κ = 76-78%) for the lumbar spine.
Takada and colleagues used methods similar to ours and reported higher inter- and intra-reader reliability than we found in the current study. Their agreement for ≥grade 1 fracture between readers (at time 1) was κ = 0.68, compared with our inter-reader estimates of 0.56 for time 1 and 0.59 for time 2, and their intra-reader agreement was 0.79 and 0.87, compared with our estimates of 0.55 and 0.77. Prevalence of fracture was 40% lower in our study, which may explain, at least in part, our lower estimates of agreement. The subjects in the study of Takada et al. were women with a mean age of 56 years, and more than half were part of a clinical trial that required T-scores of −2.0 or lower. In contrast, participants in our study had a mean age of 70 years and included equal numbers of women and men. While information on fracture severity was not provided by Takada et al., a higher frequency of more severe fractures would be expected in their group, which may also help explain their higher reliability estimates. In fact, inter-reader agreement for ≥grade 2 fracture in our study was 68-72%, comparable to Takada’s findings, and intra-reader agreement for ≥grade 2 fracture was 76-98%, somewhat higher than reported for ≥grade 1 fracture by Takada et al. However, prevalence of ≥grade 2 fracture was low (1-2%) in our population-based study.
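The suggestion that lower prevalence may depress kappa follows from how chance agreement is computed: when almost every vertebra is normal, two readings agree by chance most of the time, so the same number of disagreements removes a larger share of the agreement beyond chance. The counts below are hypothetical, chosen only to illustrate this, and are not data from either study; `kappa_from_counts` is likewise a hypothetical helper.

```python
def kappa_from_counts(both: int, only_1: int, only_2: int, neither: int) -> float:
    """Kappa from a 2x2 table of two dichotomized readings.

    both = fracture in both readings, only_1 / only_2 = fracture in one
    reading only, neither = normal in both readings.
    """
    n = both + only_1 + only_2 + neither
    po = (both + neither) / n           # observed agreement
    p1 = (both + only_1) / n            # prevalence in reading 1
    p2 = (both + only_2) / n            # prevalence in reading 2
    pe = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    return (po - pe) / (1 - pe)

# Hypothetical tables of 1,300 vertebrae, each with the same 20 disagreements
# (98.5% raw agreement) but different fracture prevalence.
higher = kappa_from_counts(100, 10, 10, 1180)  # prevalence ~8.5%
lower = kappa_from_counts(40, 10, 10, 1240)    # prevalence ~3.8%
print(f"{higher:.2f} {lower:.2f}")  # prints: 0.90 0.79
```

Both tables have identical raw agreement, yet kappa falls from about 0.90 to 0.79 as prevalence drops, which is the direction of the effect proposed above.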
We found that the frequency of unreadable vertebrae was highest and intra-reader agreement was lowest for the upper thoracic spine. This was consistent with Takada et al. and with others who used different imaging modalities [20, 39, 40] and fracture classification methods. Reproducibility of readings in the upper thoracic spine may be worse than in other regions because of poorer image quality, in part attributable to the projection of other structures (pulmonary hilus and scapula) over the vertebral bodies. The lower levels of the spine have larger vertebrae, less obliquity, and fewer obscuring lung markings than the upper levels. In contrast, Olmez et al., in a study of 67 postmenopausal women, reported little difference in agreement by vertebral level, assessed using the Kleerekoper method.
Intra-reader agreement for semi-quantitative vertebral fracture assessment using CT scout images in our study was lower than estimates reported for lateral spine images obtained with DXA or conventional radiography. In one study that examined the reliability of DXA for fracture assessment in 203 postmenopausal women (mean age 68 years), intra-reader agreement, as measured by kappa, was 64-77%. Intra-reader agreement is highest using lateral radiographs of the spine, considered the gold standard for vertebral fracture diagnosis, with a reported kappa of 89%. Although image quality is lower for CT scout views than for conventional radiographs, CT scout views offer some advantages. Specifically, CT fan-beam images do not suffer from the parallax distortion inherent in the cone-beam imaging geometry of conventional radiographs. In addition, because a lateral scout view is acquired as part of a CT study performed for purposes other than identifying spinal fracture, assessing vertebral fracture from it exposes individuals to less radiation than conventional radiography, while at the same time identifying candidates for treatment who might not otherwise have been evaluated.
This study is limited by the relatively small number of fractures; however, fracture prevalence is lower in community-based populations such as ours than in subjects recruited from clinical settings.
Another limitation is that we did not perform sagittal midline reformations to improve the sensitivity of fracture detection. The scope of this study was to evaluate, in 100 subjects, the reader reliability of the methods used to assess vertebral fracture in the parent study, the Framingham Heart Study Offspring and Third Generation MDCT Study. Thus, although sagittal spine images can identify vertebral fractures not evident on transverse axial images [43–46], use of reformations was beyond the scope of the current study.
This study is also limited by the lower spatial resolution and higher photon noise of CT scout views compared with other imaging modalities intended for evaluation of vertebral fracture, such as conventional radiography. In contrast to radiographs, however, CT scout views are not affected by parallax distortion, which can reduce accuracy in fracture diagnosis. Additionally, information on the reliability of the method used in this study is needed, since these images allow identification of fracture in patients undergoing CT examinations for other purposes without additional procedures or radiation exposure. Given the high risk of fracture (and death) in older adults with prevalent vertebral fracture, identification of vertebral fracture from CT scout views provides an opportunity for diagnosis and treatment to prevent the pain, loss of function, and medical costs associated with subsequent fracture.
In conclusion, in this community-based sample of women and men, agreement between readers for assessment of vertebral fractures using lateral CT scout views was fair to good (κ = 56-59% for mild to severe fractures and 68-72% for moderate to severe fractures). Agreement within readers for mild to severe fracture was fair to excellent (55-77%), whereas within-reader agreement for moderate to severe fracture was excellent (76-98%). Evaluation of lateral CT scout views using a semi-quantitative method is a reliable approach to identifying prevalent vertebral fractures. Lateral CT scout views can be useful in clinical settings to assess vertebral fracture.
This work was supported by NIH R01AR053986, R01AR/AG041398, K01 AR053118, T32 AG023480, and by the National Heart, Lung, and Blood Institute (NHLBI) Framingham Heart Study (NIH/NHLBI Contract N01-HC-25195).