Abstract
Objective
To test observer agreement and two strategies for possible improvement (consensus meeting and reference images) for the modified Chrispin-Norman score for children with cystic fibrosis (CF).
Methods
Before and after a consensus meeting and after developing reference images, three observers scored sets of 25 chest radiographs from children with CF. Observer agreement was tested for line, ring, mottled and large soft shadows, for overinflation and for the composite modified Chrispin-Norman score. Correlation with lung function was assessed.
Results
Before the consensus meeting agreement between observers 1 and 2 was moderate-good, but with observer 3 agreement was poor-fair. Scores correlated significantly with spirometry for observers 1 and 2 (−0.72<R<−0.42, P < 0.05), but not for observer 3. Agreement with observer 3 improved after the consensus meeting. Reference images improved agreement for overinflation and mottled and large shadows and correlation with lung function, but agreement for the modified Chrispin-Norman score did not improve further.
Conclusion
Consensus meetings and reference images improve among-observer agreement for the modified Chrispin-Norman score, but good agreement was not achieved among all observers for the modified Chrispin-Norman score and for bronchial line and ring shadows.
Introduction
Conventional chest radiography is currently the most commonly used method of monitoring lung structure in cystic fibrosis (CF) patients and annual radiographs are recommended [1]. Several chest radiograph scoring systems to quantify CF-related lung abnormalities have been described in the literature [2–8]. These scoring systems are widely used, both clinically and in (drug) research [9]. Scores correlate significantly with lung function test results and various other clinical parameters [8, 10, 11]. Chest radiographs are abnormal in 85% of children at the age of five years [12] and scores seem to worsen more rapidly over time than spirometry [12–14]. Studies also showed significant treatment effects as measured by chest radiograph scores that were not reflected by spirometric measurements [15]. It is known that high-resolution CT (HRCT) is more sensitive than chest radiographs in detecting structural lung abnormalities in CF and early abnormalities can be seen on HRCT in a substantial number of patients with normal radiographs [16–20]. However, because of the increased radiation dose and costs of HRCT and a non-quantified amount of benefit from more accurate structural HRCT information, chest radiographs are still the most commonly used method for structural lung disease assessment in CF in most centres. Large sets of radiographs are available to (retrospectively) address research questions.
Scores are potentially associated with substantial observer variation and scoring can be more time-consuming than making a clinical report. Our ultimate aim is to develop an automated version of the modified Chrispin-Norman score [2, 6]. In 1974, Chrispin and Norman [2] described their structured methods of semi-quantifying the morphological features that are commonly seen in CF patients. For dose-saving purposes Benden et al. [6] described a modified method for frontal chest radiographs only. The reproducibility between observers of the (modified) Chrispin-Norman scoring system as described in the literature is good with an intraclass correlation coefficient of 0.91 [6], but the reproducibility for the individual scoring system items has not yet been described. Also the illustrations available for the (modified) Chrispin-Norman scoring system are limited. The aim of our study was to determine the reproducibility of the modified Chrispin-Norman score including its components and to test whether a consensus meeting or reference illustrations could improve observer agreement.
Materials and methods
Study population
Chest radiographs were obtained from all 238 children in our CF clinic at the time of annual review. A paediatric pulmonologist (observer 3) selected two sets of 25 children with CF who had a range of disease severity and were able to undergo spirometry. For each patient the most recent annual chest radiograph was used for this study. The retrospective study was approved by the institutional review board and informed consent was waived.
Chest radiograph scoring
The modified Chrispin-Norman score is presented in Table 1. Briefly, the modified Chrispin-Norman score uses the frontal radiograph to quantify structural lung disease in CF. The radiograph is divided into four quadrants and for each quadrant the severity of bronchial line shadows, ring shadows, mottled shadows and large soft shadows is scored on a scale from 0 to 2. Bronchial line shadows are thought to represent bronchiectasis and bronchial wall thickening, ring shadows may also represent bronchiectasis, mottled shadows are most likely to be mucus plugs in small and large airways and large soft shadows represent larger lung consolidations with or without loss of lung volume. In addition to these abnormalities the degree of overinflation is scored on a scale from 0 to 2 by assessing the level of the diaphragm, the degree of hyperlucency and the shape of the thoracic cage.
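As an illustration of the scoring structure described above, the following minimal Python sketch adds up the item scores for one radiograph. It assumes that the composite score is the unweighted sum of the four items in the four quadrants plus the overinflation score; Table 1 remains the authoritative definition. The quadrant labels, the function name and the example radiograph are hypothetical and serve illustration only.

```python
# Hypothetical quadrant labels; the text only states that four quadrants are used.
QUADRANTS = ("right_upper", "left_upper", "right_lower", "left_lower")
ITEMS = ("bronchial_line", "ring", "mottled", "large_soft")
VALID_SCORES = (0, 1, 2)


def composite_score(quadrant_scores, overinflation):
    """Return the sum of all per-quadrant item scores plus the overinflation score.

    quadrant_scores: dict mapping quadrant -> dict mapping item -> 0, 1 or 2.
    overinflation: 0, 1 or 2.
    Assumption: an unweighted sum, which may differ from Table 1 of the paper.
    """
    if overinflation not in VALID_SCORES:
        raise ValueError("overinflation must be 0, 1 or 2")
    total = overinflation
    for quadrant in QUADRANTS:
        for item in ITEMS:
            score = quadrant_scores[quadrant][item]
            if score not in VALID_SCORES:
                raise ValueError(f"{quadrant}/{item} must be 0, 1 or 2")
            total += score
    return total


# Made-up radiograph: mild line shadows in every quadrant and one marked consolidation.
example = {
    q: {"bronchial_line": 1, "ring": 0, "mottled": 0, "large_soft": 0}
    for q in QUADRANTS
}
example["right_upper"]["large_soft"] = 2
print(composite_score(example, overinflation=1))  # prints 7
```

Keeping the item scores per quadrant, rather than storing only the total, is what allows agreement to be analysed for each scoring system item separately.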
Three observers who interpret chest radiographs from CF patients on a regular basis were involved in the study (one radiologist with a special interest in chest imaging, one radiology resident with an interest in chest imaging and one paediatric pulmonology fellow with previous research experience in chest radiograph scoring in CF). There were two datasets of 25 radiographs and three scoring rounds. For round 1 the observers were provided with the relevant literature related to the modified Chrispin-Norman score [2, 6, 21] and scored the first dataset of 25 chest radiographs blinded to clinical characteristics except age and gender. Before round 2 a consensus meeting was organised, after which the three observers scored the second dataset of 25 chest radiographs, again independently. During the consensus meeting several chest radiographs with discrepant scores from round 1 were discussed by the observers. The scoring system items were discussed one by one in consecutive order. For overinflation it was decided to use the level of the posterior rib at the right hemi-diaphragm. For thoracic shape and the degree of hyperlucency no consensus definition was reached and the observers found these abnormalities difficult to define. For ring shadows, bronchial line shadows and mottled shadows the differences mainly concerned the cut-off between 'present but not marked' and 'marked'. Differences between observers with regard to this cut-off were discussed interactively, but no specific definition was developed. For large soft shadows we discussed that these should be scored when the heart borders or diaphragms were not visible due to lung consolidation. After the second round a set of reference illustrations was developed for each scoring system item (Figs. 1, 2, 3, 4, 5 and 6) by relating HRCT findings to chest radiograph findings for individuals in our cohort who had an HRCT and chest radiograph within one month. 
These HRCTs were obtained for clinical indications; no additional HRCTs were made as part of this study. None of the radiographs used for the reference illustrations was part of set 2. The reference images were not developed by the observers, although observer 1 checked the quality of the reference illustrations before they were used in the study. For round 3, the second dataset was scored again by the observers with the help of the reference images. The interval between rounds 1 and 2 and between rounds 2 and 3 was at least 1 month.
Lung function testing
Spirometric measurements were available for all 25 children in each of the two sets of chest radiographs. Spirometry, like the chest radiograph, was obtained during a clinically stable period at the annual check-up. Measurements included forced vital capacity (FVC), forced expiratory volume in one second (FEV1) and mid-expiratory flow at 50% and 75% of VC (MEF50 and MEF75). Measurements were expressed as a percentage of predicted values using the reference values of Zapletal et al. [22] and the FEV1 to FVC ratio was calculated and expressed as a percentage.
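The two derived quantities mentioned above are simple ratios. The short Python sketch below shows the arithmetic. The function names and the example values are hypothetical; the predicted value would in practice come from the Zapletal reference equations, which are not reproduced here.

```python
def percent_predicted(measured, predicted):
    """Express a measured spirometric value as a percentage of its predicted value."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    return 100.0 * measured / predicted


def fev1_fvc_percent(fev1_litres, fvc_litres):
    """Return the FEV1 to FVC ratio expressed as a percentage."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return 100.0 * fev1_litres / fvc_litres


# Made-up values for one child: measured FEV1 1.8 L, predicted FEV1 2.4 L, FVC 2.5 L.
print(f"FEV1 {percent_predicted(1.8, 2.4):.1f}% predicted")
print(f"FEV1/FVC {fev1_fvc_percent(1.8, 2.5):.1f}%")
```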
Data analysis
Reproducibility between and within observers was assessed visually in scatter plots with a line of identity and by using an intraclass correlation coefficient for the modified Chrispin-Norman score and weighted Kappa values for the individual items of the scoring. The intraclass correlation coefficient takes the distance to the line of identity of the observers into account. An intraclass correlation coefficient between 0.6 and 0.8 represents moderate agreement and values above 0.8 represent good agreement. Kappa values of <0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80 and 0.81–1 are generally considered to represent poor, fair, moderate, good and very good agreement, respectively. Correlation between the modified Chrispin-Norman score and lung function was assessed by using a Spearman correlation coefficient. SPSS 15.0 (SPSS Inc., Chicago, IL, USA) and MedCalc 11.2 (Mariakerke, Belgium) were used for data analysis. Data are presented as mean ± standard deviation (range) unless indicated otherwise.
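For readers who wish to reproduce this type of analysis, the sketch below computes a weighted Kappa for two observers scoring an item on the 0 to 2 scale and maps the result to the agreement bands quoted above. It is a minimal illustration, not the code used in the study (the analysis was done in SPSS and MedCalc). Linear weights are an assumption, because the weighting scheme is not stated in the text, and values exactly on a band boundary are assigned to the lower band.

```python
def weighted_kappa(ratings_a, ratings_b, categories=(0, 1, 2)):
    """Linearly weighted Cohen's kappa for two observers on an ordinal scale.

    Assumption: linear agreement weights w = 1 - |i - j| / (k - 1).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Joint proportions of the two observers' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    p_observed = 0.0
    p_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1.0 - abs(i - j) / (k - 1)
            p_observed += weight * observed[i][j]
            p_expected += weight * row[i] * col[j]
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def kappa_label(kappa):
    """Map a kappa value to the agreement bands quoted in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Made-up item scores from two observers for eight radiographs.
observer_1 = [0, 1, 2, 1, 0, 2, 1, 1]
observer_2 = [0, 1, 1, 1, 0, 2, 2, 0]
value = weighted_kappa(observer_1, observer_2)
print(round(value, 2), kappa_label(value))
```

With linear weights a disagreement of one category (0 versus 1) counts as partial agreement, whereas a disagreement of two categories (0 versus 2) counts as none.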
Results
Study population
Age characteristics and lung function test results of the two groups are presented in Table 2. The children ranged in age from 5.3 to 18.6 years and in FEV1 from 33% to 124% predicted. Structural abnormalities were slightly more severe or more common in the first dataset (p = 0.04).
Reproducibility between and within observers of chest radiograph scoring
For all three rounds the kappa values for the scoring system items and the intraclass correlation coefficients for the modified Chrispin-Norman score are provided in Table 3. The within-observer agreement for dataset 2 (round 2 versus round 3) is also presented in Table 3. Before the consensus meeting (round 1) the agreement between observers 1 and 3 and between observers 2 and 3 ranged from poor to fair. After the consensus meeting (round 2) the levels of agreement between observers 1 and 3 and between observers 2 and 3 improved, especially for mottled shadows and large soft shadows (fair to good levels of agreement) and for the modified Chrispin-Norman score, but not for overinflation, ring shadows and bronchial line shadows. Overall the agreement between observers 1 and 2 was slightly lower in round 2 than in round 1, which might be related to the milder structural lung disease in the second dataset, although the lower scores may also be a result of the consensus meeting. In round 3 the agreement between the observers improved from poor-fair to moderate-good for overinflation. Agreement between observers 1 and 2 and between observers 1 and 3 also improved for mottled shadows and large soft shadows. Within-observer agreement was better than between-observer agreement.
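The intraclass correlation coefficients reported for the composite score can be illustrated with the following Python sketch. The form of the coefficient is not specified in the text; the sketch assumes the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)), chosen because an absolute-agreement coefficient penalises systematic offsets from the line of identity. The function name and the example scores are hypothetical.

```python
def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one row per radiograph, one column per observer.
    Assumption: this ICC form; the paper does not state which form was used.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 radiographs and 2 observers, no gaps")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Made-up composite scores: observer 2 always scores two points higher than observer 1.
offset_scores = [[1, 3], [5, 7], [9, 11]]
print(round(icc_absolute_agreement(offset_scores), 3))
```

In the made-up example the two observers rank the radiographs identically, yet the coefficient stays below 1 because of the constant two-point offset, which is the behaviour an absolute-agreement coefficient is meant to capture.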
Correlation with lung function
The correlation between the observers’ modified Chrispin-Norman scores and lung function is presented in Table 4. For round 1 the modified Chrispin-Norman score significantly correlated with the lung function test results for observers 1 and 2, but not for observer 3. In rounds 2 and 3 the modified Chrispin-Norman score for all three observers showed a significant correlation with most lung function tests and the correlation further improved in round 3 when the reference images were used.
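The Spearman coefficient used for these correlations is the Pearson correlation of the ranked values. The Python sketch below shows the calculation with average ranks for ties; it does not compute a P value, for which a statistics package would normally be used. The function names and the example data are hypothetical and do not come from Table 4.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up data: composite scores and FEV1 (% predicted) for six children.
scores = [4, 9, 12, 7, 15, 9]
fev1_percent_predicted = [105, 88, 70, 92, 55, 80]
print(round(spearman(scores, fev1_percent_predicted), 2))
```

A negative coefficient, as in the abstract, means that higher radiograph scores go together with lower lung function.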
Discussion
We demonstrated that when experienced observers use the modified Chrispin-Norman score based on the described literature and published illustrations [2, 6, 21], between-observer agreement can differ substantially and can even be poor to fair, which is contrary to most previous studies. After a consensus meeting we were able to improve the between-observer agreement to more acceptable levels for the overall score and for the items mottled shadows and large soft shadows. Between-observer agreement improved for overinflation when reference illustrations were used and correlations with lung function improved for the modified Chrispin-Norman score. Our results indicate that differences between observers within a routine clinical setting or in research studies might easily occur. Contrary to previous studies we also provide insight into the individual scoring system items. Our data suggest that for bronchial line shadows and ring shadows it is difficult to achieve good agreement among observers.
Several options exist to obtain more consistent chest radiograph interpretation results. Within a specific study a simple consensus meeting can improve observer agreement. In our study overall agreement for the modified Chrispin-Norman score improved after the consensus meeting, which is explained by the items mottled shadows and large soft shadows. The consensus meeting did not improve agreement for bronchial line shadows and ring shadows. The definition agreed for overinflation apparently did not improve between-observer agreement for this item either. We have no good explanation for the differences in improvement between the individual scoring system items, as each item was discussed in the consensus meeting. A second step is the development of a set of ‘reference’ images as has, for example, been done with the International Labour Office classification of radiographs of pneumoconiosis [23]. Because published illustrations for the (modified) Chrispin-Norman score were limited in the literature, we developed a set of these images from children in whom a high-resolution CT was available. With the help of these images, agreement for the modified Chrispin-Norman score did not further improve, although correlations with lung function did improve. The assessment of overinflation in particular also improved.
A further step would be computer-aided or fully automated scoring of chest radiographs for cystic fibrosis-related disease. Such tools have been developed for radiological investigations [24, 25], but to our knowledge such tools do not exist for chest radiographs in cystic fibrosis. We believe that such a tool might help in obtaining more consistent chest radiograph assessments in a clinical setting. Such a quantification tool might also be helpful to analyse (retrospectively) large sets of chest radiographs or large databases, both for clinical and research purposes. We started the process of developing a fully automated method, based on our previous experience in other diseases [26–28]. Although several chest radiograph scoring systems have been developed for CF we chose to use the modified Chrispin-Norman score for two reasons. First, the Chrispin-Norman score has been modified for the frontal radiograph and our software is currently developed for frontal radiographs. Second, a previous study compared six chest radiograph scores [8] and found the modified Chrispin-Norman score to have good reproducibility and correlation with lung function. We believe that although HRCT and magnetic resonance imaging might gain an important clinical role in several centres, many centres all over the world will continue to rely on plain chest radiography for monitoring structural lung disease in CF for many years to come.
Our study has several limitations. First, we included only three observers whereas more observers would have been preferable. However, all observers routinely interpret these images and we expect that the observer disagreements would not have been different if more observers had been included in the study. Second, we studied two datasets of which the second may have had slightly milder chest radiograph abnormalities, although possibly the lower scores were a result of the consensus meeting. It is more difficult to obtain good observer agreement when the range of disease is smaller; therefore, the improvement in agreement in round 2 is real but might have been larger if the second dataset had not had slightly milder structural disease. Third, we were unable to develop a clear definition for the overinflation sub-items chest wall shape and lung fields, although the reference images helped to improve the between-observer agreement for this item. Fourth, we used pulmonary function tests as a reference standard, while HRCT would have been a better comparator for structural lung disease, but we do not obtain HRCT scans routinely in our clinic.
Conclusion
Between-observer agreement for the modified Chrispin-Norman score ranged from poor to good before a consensus meeting was organised and reference images were developed, and it improved thereafter to moderate-good levels. Observer differences can easily occur, as illustrated in this study, and our reference illustrations of the scoring system might improve agreement between observers and centres. In the future a fully automated version of the modified Chrispin-Norman score could be useful to obviate observer variation clinically and in research studies.
References
1. Kerem E, Conway S, Elborn S, Heijerman H (2005) Standards of care for patients with cystic fibrosis: a European consensus. J Cyst Fibros 4:7–26
2. Chrispin AR, Norman AP (1974) The systematic evaluation of the chest radiograph in cystic fibrosis. Pediatr Radiol 2:101–105
3. Weatherly MR, Palmer CG, Peters ME, Green CG, Fryback D, Langhough R, Farrell PM (1993) Wisconsin cystic fibrosis chest radiograph scoring system. Pediatrics 91:488–495
4. Brasfield D, Hicks G, Soong S, Tiller RE (1979) The chest roentgenogram in cystic fibrosis: a new scoring system. Pediatrics 63:24–29
5. Conway SP, Pond MN, Bowler I, Smith DL, Simmonds EJ, Joanes DN, Hambleton G, Hiller EJ, Stableforth DE, Weller P et al (1994) The chest radiograph in cystic fibrosis: a new scoring system compared with the Chrispin-Norman and Brasfield scores. Thorax 49:860–862
6. Benden C, Wallis C, Owens CM, Ridout DA, Dinwiddie R (2005) The Chrispin-Norman score in cystic fibrosis: doing away with the lateral view. Eur Respir J 26:894–897
7. Sawyer SM, Carlin JB, DeCampo M, Bowes G (1994) Critical evaluation of three chest radiograph scores in cystic fibrosis. Thorax 49:863–866
8. Terheggen-Lagro S, Truijens N, van Poppel N, Gulmans V, van der Laag J, van der Ent C (2003) Correlation of six different cystic fibrosis chest radiograph scoring systems with clinical parameters. Pediatr Pulmonol 35:441–445
9. Nasr SZ, Kuhns LR, Brown RW, Hurwitz ME, Sanders GM, Strouse PJ (2001) Use of computerized tomography and chest x-rays in evaluating efficacy of aerosolized recombinant human DNase in cystic fibrosis patients younger than age 5 years: a preliminary study. Pediatr Pulmonol 31:377–382
10. Cleveland RH, Zurakowski D, Slattery D, Colin AA (2009) Cystic fibrosis genotype and assessing rates of decline in pulmonary status. Radiology 253:813–821
11. Li Z, Kosorok MR, Farrell PM, Laxova A, West SE, Green CG, Collins J, Rock MJ, Splaingard ML (2005) Longitudinal development of mucoid Pseudomonas aeruginosa infection and lung disease progression in children with cystic fibrosis. JAMA 293:581–588
12. Farrell PM, Li Z, Kosorok MR, Laxova A, Green CG, Collins J, Lai HC, Makholm LM, Rock MJ, Splaingard ML (2003) Longitudinal evaluation of bronchopulmonary disease in children with cystic fibrosis. Pediatr Pulmonol 36:230–240
13. Terheggen-Lagro SW, Arets HG, van der Laag J, van der Ent CK (2007) Radiological and functional changes over 3 years in young children with cystic fibrosis. Eur Respir J 30:279–285
14. Cleveland RH, Zurakowski D, Slattery DM, Colin AA (2007) Chest radiographs for outcome assessment in cystic fibrosis. Proc Am Thorac Soc 4:302–305
15. Slattery DM, Zurakowski D, Colin AA, Cleveland RH (2004) CF: an X-ray database to assess effect of aerosolized tobramycin. Pediatr Pulmonol 38:23–30
16. Hansell DM, Strickland B (1989) High-resolution computed tomography in pulmonary cystic fibrosis. Br J Radiol 62:1–5
17. Santis G, Hodson ME, Strickland B (1991) High resolution computed tomography in adult cystic fibrosis patients with mild lung disease. Clin Radiol 44:20–22
18. Lynch DA, Brasch RC, Hardy KA, Webb WR (1990) Pediatric pulmonary disease: assessment with high-resolution ultrafast CT. Radiology 176:243–248
19. Bhalla M, Turcios N, Aponte V, Jenkins M, Leitman BS, McCauley DI, Naidich DP (1991) Cystic fibrosis: scoring system with thin-section CT. Radiology 179:783–788
20. Maffessanti M, Candusso M, Brizzi F, Piovesana F (1996) Cystic fibrosis in children: HRCT findings and distribution of disease. J Thorac Imaging 11:27–38
21. van der Put JM, Meradji M, Danoesastro D, Kerrebijn KF (1982) Chest radiographs in cystic fibrosis. A follow-up study with application of a quantitative system. Pediatr Radiol 12:57–61
22. Zapletal A, Samanek M, Paul T (1987) Lung function in children and adolescents. Methods, reference values. In: Zapletal A (ed) Progress in respiration research. Karger, Basel, pp 114–218
23. International Labour Office (2002) Guidelines for the use of the ILO international classification of radiographs of pneumoconioses. International Labour Office, Geneva
24. Zheng B, Lu A, Hardesty LA, Sumkin JH, Hakim CM, Ganott MA, Gur D (2006) A method to improve visual similarity of breast masses for an interactive computer-aided diagnosis environment. Med Phys 33:111–117
25. Moore W, Ripton-Snyder J, Wu G, Hendler C (2010) Sensitivity and specificity for lung nodule detection on chest radiograph with CTA correlation. J Digit Imag. doi:10.1007/s10278-010-9284-7
26. van Ginneken B, Hogeweg L, Prokop M (2009) Computer-aided diagnosis in chest radiography: beyond nodules. Eur J Radiol 72:226–230
27. Arzhaeva Y, Prokop M, Tax DM, De Jong PA, Schaefer-Prokop CM, van Ginneken B (2007) Computer-aided detection of interstitial abnormalities in chest radiographs using a reference standard based on computed tomography. Med Phys 34:4798–4809
28. Loog M, van Ginneken B (2006) Segmentation of the posterior ribs in chest radiographs using iterated contextual pixel classification. IEEE Trans Med Imaging 25:602–611
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
de Jong, P.A., Achterberg, J.A., Kessels, O.A.M. et al. Modified Chrispin-Norman chest radiography score for cystic fibrosis: observer agreement and correlation with lung function. Eur Radiol 21, 722–729 (2011). https://doi.org/10.1007/s00330-010-1972-7