Automatic determination of Greulich and Pyle bone age in healthy Dutch children
Bone age (BA) assessment is a routine procedure in paediatric radiology, for which the Greulich and Pyle (GP) atlas is most commonly used. Manual rating is subject to rater variability, which automatic BA determination eliminates.
To validate the BoneXpert method for automatic determination of skeletal maturity of healthy children against manual GP BA ratings.
Materials and methods
Two observers determined GP BA with knowledge of the chronological age (CA). A total of 226 boys with a BA of 3–17 years and 179 girls with a BA of 3–15 years were included in the study. BoneXpert’s estimate of GP BA was calibrated to agree on average with manual ratings pooled from several studies, including the present one.
Seven subjects showed a deviation between manual and automatic BA in excess of 1.9 years. They were re-rated blindly by two raters. After correcting these seven ratings, the root mean square (rms) error between manual and automatic rating in the 405 subjects was 0.71 years (95% CI 0.66–0.76 years). BoneXpert’s GP BA is on average 0.28 and 0.20 years behind the CA for boys and girls, respectively.
BoneXpert is a robust method for automatic determination of BA.
Keywords: Bone age, Skeleton, Radiography, Automated recognition
The assessment of skeletal maturity or bone age (BA) is a routine procedure in paediatric radiology, for which the Greulich and Pyle (GP) method is by far the most commonly employed technique. The second edition of the GP atlas contains high-quality reproductions of hand radiographs. Greulich and Pyle derived their atlas from Todd’s large study of well-off children from Ohio examined between 1931 and 1942. For each chronological age (CA) they selected an image close to the median maturity to represent the standard for that CA. The children from Todd’s study came from upper-middle-class homes, i.e. they had what Todd would call a better-than-average constitution, and this might explain why the BAs in the GP atlas have been found to be advanced relative to almost all the normal populations studied since then. However, because modern children tend to mature faster, they are slowly catching up with the GP standard.
It is well known that different populations mature at different tempos, so the average GP BA of a given population should not be expected to agree with its CA [4, 5]. Instead, BA assessment should be regarded as a quantification of the aspects of bone morphology that are related to maturation. This measure is conveniently expressed in years by reference to the GP atlas, but the value must be viewed as an arbitrary scale of maturation.
In a clinical context dealing with patients from a particular region, the clinician should ideally establish a local BA reference for healthy children. If the population consists of several clearly distinguishable segments, e.g. different ethnicities, there should be one reference for each segment. As a minimum, one should determine the average BA deviation of the local population relative to the GP scale in a relevant age interval for boys and girls. Thus, if boys are known to be on average 0.6 years behind the GP BA scale, the expected GP BA of a 10-year-old boy is 9.4 years, so an observed BA of 9 years means that his maturity is 0.4 years behind expectation.
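The arithmetic of this example can be captured in a few lines. The following is a minimal sketch: the function name `maturity_deviation` and the sign convention (negative means behind expectation) are our own illustrative choices, not part of BoneXpert or of any published reference.

```python
def maturity_deviation(observed_ba: float, chronological_age: float,
                       local_offset: float) -> float:
    """Deviation (years) of a child's BA from the local expectation.

    local_offset is the average (GP BA - CA) of healthy local children of the
    same sex, e.g. -0.6 if boys are on average 0.6 years behind the GP scale.
    A negative result means the child is behind expectation.
    """
    expected_ba = chronological_age + local_offset
    return observed_ba - expected_ba


# Worked example from the text: boys 0.6 years behind GP, CA 10, observed BA 9.
deviation = maturity_deviation(observed_ba=9.0, chronological_age=10.0,
                               local_offset=-0.6)
print(f"{deviation:+.1f} years")  # prints "-0.4 years"
```

For the Erasmus population, the average delays reported in this study (0.28 years for boys and 0.20 years for girls, i.e. `local_offset=-0.28` or `-0.20`) would be the natural inputs when the BA being interpreted is BoneXpert GP BA.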
We report here the performance of BoneXpert’s GP BA in the Erasmus study, together with the average differences between GP BA and CA in this population.
Materials and methods
The Erasmus study was performed in 1997 in children from the Erasmus Gymnasium in Rotterdam by researchers at the Erasmus Medical Center Rotterdam (EMCR). The younger subjects were children of employees (and their relations) at the EMC institutions. For the initial study, IRB approval was given to obtain radiographs of the left hand in all children, and subsequent use of these data was permitted by the IRB. For all children younger than 12 years of age, informed consent was obtained from the parents or guardians; for children aged 12 years and older, informed consent was obtained from the parents or guardians and from the child. This is in keeping with Dutch guidelines on clinical studies in children. A total of 255 boys (median age 12.4 years, range 3.8–20.1 years) and 276 girls (median age 12.6 years, range 3.8–20.0 years), all Caucasian, were included, giving 531 healthy children in total.
Radiographs of the left hand were recorded on mammography film (Diagnost H; Philips, Eindhoven, The Netherlands) or GTU film (Imation, Oakdale, MN) with Alfa-II Trimax intensifying screens (3M, Maplewood, MN). Radiographic parameters were: small 0.6-mm focus, film–focus distance 1.5 m, 45 kVp, 16 mAs. The images were digitized at 300 dpi with 12 bits per pixel on a Vidar Diagnostic Pro Advantage scanner (Vidar, Herndon, VA) using TWAIN v5.2.
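For orientation, the digitization settings translate into a physical pixel size and a number of grey levels by simple unit arithmetic. This sketch is ours and says nothing about how BoneXpert itself handles image resolution; the helper names are hypothetical.

```python
MM_PER_INCH = 25.4


def pixel_spacing_mm(dpi: float) -> float:
    """Edge length of one scanner pixel in millimetres."""
    return MM_PER_INCH / dpi


def grey_levels(bits_per_pixel: int) -> int:
    """Number of distinct intensity values a pixel can take."""
    return 2 ** bits_per_pixel


spacing = pixel_spacing_mm(300)
levels = grey_levels(12)
print(f"{spacing:.4f} mm per pixel, {levels} grey levels")
# prints "0.0847 mm per pixel, 4096 grey levels"
```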
The films were bone-age rated by two paediatric radiologists, each of whom rated approximately half of the images. The radiologists had knowledge of the CA, which reflects the daily practice of most paediatric radiologists. The intraobserver coefficient of variation of duplicate skeletal age assessments was 2.4% for investigator 1 and 1.5% for investigator 2. We found no significant systematic differences between the two observers regarding variability and levels of measurement, and the interobserver agreement was good.
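The text does not say which estimator was used for the intraobserver coefficient of variation. A common choice for duplicate measurements is the root-mean-square CV, in which each pair contributes a within-subject SD of |a - b|/sqrt(2) divided by the pair mean. The sketch below implements that estimator under this assumption; the function name and ratings are hypothetical, not Erasmus data.

```python
import math
from typing import Sequence, Tuple


def duplicate_cv_percent(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root-mean-square coefficient of variation (%) for duplicate ratings.

    Each pair holds two ratings (years) of the same radiograph by the same
    observer; the within-pair SD is |a - b| / sqrt(2).
    """
    squared_cvs = []
    for a, b in pairs:
        pair_mean = (a + b) / 2
        within_sd = abs(a - b) / math.sqrt(2)
        squared_cvs.append((within_sd / pair_mean) ** 2)
    return 100 * math.sqrt(sum(squared_cvs) / len(squared_cvs))


# Hypothetical duplicate ratings in years.
ratings = [(10.0, 10.5), (12.0, 12.0), (8.0, 7.5), (14.0, 14.5)]
cv = duplicate_cv_percent(ratings)
print(f"CV = {cv:.1f}%")
```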
The first layer of BoneXpert reconstructs the borders of 15 bones – the five metacarpals, the phalanges of fingers 1, 3 and 5, and the radius and ulna, as shown in Fig. 1. The bone reconstruction algorithm is based on a so-called generative model of image analysis, which enables the method to determine to what extent each bone appears normal. Abnormal bones, as well as wrongly posed normal bones, are automatically rejected.
The second computational layer determines bone maturity values, called intrinsic bone ages, for 13 of these 15 bones based on the appearance of each bone. If a bone’s BA value deviates by more than 2.4 years from the average of all the bones, it is deemed unacceptable. If fewer than eight bones are accepted, the image is rejected and no BA values are reported.
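The two acceptance rules can be sketched as follows. This is an illustration under stated assumptions, not BoneXpert code: the text does not say whether the average is recomputed after outliers are removed (we use a single pass against the mean of all bones), nor how accepted bones are combined into one BA (the example simply prints their mean). Function and constant names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence

MAX_DEVIATION_YEARS = 2.4  # threshold stated in the text
MIN_ACCEPTED_BONES = 8     # minimum number of accepted bones stated in the text


def screen_bone_ages(intrinsic_bas: Sequence[float]) -> Optional[List[float]]:
    """Return the accepted intrinsic bone ages, or None if the image is rejected.

    Assumption: one pass, comparing every bone with the mean of all bones.
    """
    overall = mean(intrinsic_bas)
    accepted = [ba for ba in intrinsic_bas
                if abs(ba - overall) <= MAX_DEVIATION_YEARS]
    if len(accepted) < MIN_ACCEPTED_BONES:
        return None
    return accepted


# Hypothetical image: 12 consistent bones and one outlier at 15 years.
bones = [10.0] * 12 + [15.0]
accepted = screen_bone_ages(bones)
if accepted is None:
    print("image rejected")
else:
    print(f"{len(accepted)} bones accepted, mean {mean(accepted):.1f} years")
```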
The third layer transforms the computed intrinsic bone ages to agree on average with GP BA based on a training set of images with manual ratings. (The method can also determine the Tanner and Whitehouse BA, but this was outside the scope of this study.)
The first two layers, which are by far the most complex, were developed from radiographs of Danish and Belgian children (age range 7–17 years), supplemented by radiographs from various sources to extend the age range to 2.5–19 years for boys and 2–18 years for girls; 1,678 images in total. BoneXpert’s accuracy is recognized to be poor for boys above 17 years and girls above 15 years, so the intended age range for clinical use of BoneXpert v1.0 is GP BA 2.5–17 years for boys and 2–15 years for girls; the performance tested in this work was therefore restricted to these age ranges.
The adjustment of BoneXpert v1.0 to GP BA (the third layer) was made by pooling three datasets in order to average over several manual raters: this study (Erasmus study), a study performed in Tübingen, and the GP atlas. Based on these data, a nonlinear transformation of the intrinsic BA into the BoneXpert v1.0 GP BA was constructed. Because the Erasmus data were used both (together with other data) to develop layer C of BoneXpert v1.0 and to validate the accuracy of this version in this study, a careful explanation is required, since such a strategy could potentially weaken the study; this is addressed in the appendix.
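The text states only that the transformation is nonlinear and was fitted to pooled manual ratings. As an illustration, the sketch below pools (intrinsic BA, manual GP BA) pairs from several datasets and fits a quadratic by ordinary least squares. The quadratic form, the equal weighting of every pair, the synthetic data and all names are our assumptions, not BoneXpert’s actual layer C.

```python
from typing import Iterable, List, Sequence, Tuple

Pair = Tuple[float, float]  # (intrinsic BA, manual GP BA), in years


def pool(datasets: Iterable[Sequence[Pair]]) -> Tuple[List[float], List[float]]:
    """Concatenate several rating datasets into x (intrinsic) and y (manual)."""
    xs: List[float] = []
    ys: List[float] = []
    for pairs in datasets:
        for intrinsic, manual in pairs:
            xs.append(intrinsic)
            ys.append(manual)
    return xs, ys


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [c0, c1, c2] of y = c0 + c1*x + c2*x**2."""
    # Normal equations: sum(x**(i+j)) * c_j = sum(y * x**i) for i, j in 0..2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return coef


def to_gp_ba(intrinsic: float, coef: Sequence[float]) -> float:
    """Apply the fitted transformation to one intrinsic bone age."""
    return coef[0] + coef[1] * intrinsic + coef[2] * intrinsic ** 2


def curve(x: float) -> float:
    """Known curve used to generate synthetic 'manual' ratings."""
    return 0.5 + 0.9 * x + 0.01 * x ** 2


# Three synthetic datasets standing in for the pooled studies.
study_a = [(float(x), curve(x)) for x in range(3, 9)]
study_b = [(float(x), curve(x)) for x in range(8, 13)]
study_c = [(float(x), curve(x)) for x in range(12, 17)]
coef = fit_quadratic(*pool([study_a, study_b, study_c]))
print([round(c, 4) for c in coef])  # recovers [0.5, 0.9, 0.01]
```

Pooling raw pairs gives larger datasets more weight; how the actual calibration weighted its three sources is not stated in the text.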
In the main analysis, only the 226 boys and 179 girls whose BA, averaged over the manual and automatic methods, was below 17 years (boys) or 15 years (girls) were used.
BoneXpert was marketed as a medical device in Europe in April 2008.
Quality of films
As described in the previous section, BoneXpert automatically rejects images with poor image quality or abnormal bone structure, but no images were rejected by BoneXpert in the Erasmus study. All radiographs were of good quality – the hands were correctly positioned, the images contained all hand bones and film noise was low.
Analysis of deviations
The agreement of the individual observations was quantified with the rms error rather than with the standard deviation (SD), because the SD hides an overall bias. The agreement was poorer for boys with an average BA above 17 years and for girls with an average BA above 15 years; in accordance with the intended use of BoneXpert, these data were excluded from the results. The rms error between the BoneXpert and manual GP BA ratings was 0.65 years for boys and 0.76 years for girls.
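Why the rms error is preferred follows from the identity rms² = bias² + SD² (with the SD computed using n in the denominator): a constant offset leaves the SD at zero but shows up fully in the rms. The sketch below computes all three quantities after applying the BA-range exclusion of the main analysis; the function name, parameters and example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def agreement_stats(manual: Sequence[float], automatic: Sequence[float],
                    ba_limit: float) -> Tuple[float, float, float]:
    """Return (bias, SD, rms) of the automatic - manual differences in years.

    Only radiographs whose average of manual and automatic BA is below
    ba_limit (17 for boys, 15 for girls) are included.
    """
    diffs = [a - m for m, a in zip(manual, automatic) if (m + a) / 2 < ba_limit]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)  # n in denominator
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    return bias, sd, rms


# Hypothetical boys: automatic rating is a constant 0.5 years above manual;
# the last radiograph (average BA 18) falls outside the intended range.
manual = [10.0, 12.0, 14.0, 18.0]
automatic = [10.5, 12.5, 14.5, 18.0]
bias, sd, rms = agreement_stats(manual, automatic, ba_limit=17.0)
print(f"bias={bias:.2f} SD={sd:.2f} rms={rms:.2f}")
# prints "bias=0.50 SD=0.00 rms=0.50": the SD alone would hide the offset
```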
Retardation relative to GP BA
Standard deviation between BA and CA
Table: The SD between BA and CA for various BA rating methods (computed for CA < 17 years for boys and CA < 15 years for girls). Column: SD between BA and CA (years); rows: original manual rating; original manual rating after correcting seven cases.
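The SD between BA and CA, and the mean BA - CA difference that expresses retardation relative to GP BA, can be computed as below, using the CA cut-offs from the table caption. Whether the study used the sample or the population SD is not stated; the sketch assumes the sample SD (`statistics.stdev`), and the function name and example values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def ba_minus_ca_summary(ba: Sequence[float], ca: Sequence[float],
                        ca_limit: float) -> Tuple[float, float]:
    """Mean and sample SD of (BA - CA) for children with CA below ca_limit.

    ca_limit is 17 for boys and 15 for girls, as in the table caption.
    A negative mean means the group is behind the GP scale.
    """
    diffs = [b - c for b, c in zip(ba, ca) if c < ca_limit]
    return mean(diffs), stdev(diffs)


# Hypothetical boys; the last child (CA 17.5) is excluded by the CA cut-off.
ba = [9.5, 11.0, 12.8, 16.0]
ca = [10.0, 11.2, 13.0, 17.5]
avg, sd = ba_minus_ca_summary(ba, ca, ca_limit=17.0)
print(f"mean BA - CA = {avg:+.2f} years, SD = {sd:.2f} years")
```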
The purpose of the re-rating was to generate BA values close to the “true” values, so a discussion of this concept is relevant. There is no objective reference for BA rating. We define the true BA of a radiograph statistically as the average of the ratings by many qualified raters. Thus we regard intra- and interrater variability as random effects that can be eliminated by averaging a large number of ratings. We consider this “aggregate reading” a better procedure than consensus reading, which is applied in many clinical studies to improve the robustness of the dataset. Aggregate reading also underlies the design of BoneXpert, where we used the ratings of five different raters to pinpoint the transformation in layer C of BoneXpert v1.0.
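The statistical argument behind aggregate reading can be illustrated with a small simulation. It assumes, purely for illustration, that each rater’s error is independent, unbiased and normally distributed with a hypothetical SD of 0.7 years; under that assumption the error of an average of k ratings shrinks roughly as 1/sqrt(k). A bias shared by all raters, such as one arising from knowledge of the CA, would not average out in this way.

```python
import math
import random
from typing import Tuple


def simulate_aggregate_reading(n_radiographs: int = 2000, n_raters: int = 10,
                               rater_sd: float = 0.7,
                               seed: int = 1) -> Tuple[float, float]:
    """Return (rms error of one rater, rms error of the mean of n_raters).

    Errors are measured against a simulated true BA; every rater adds
    independent Gaussian noise with SD rater_sd (an assumption).
    """
    rng = random.Random(seed)
    single_sq = 0.0
    aggregate_sq = 0.0
    for _ in range(n_radiographs):
        true_ba = rng.uniform(3.0, 17.0)
        ratings = [true_ba + rng.gauss(0.0, rater_sd) for _ in range(n_raters)]
        single_sq += (ratings[0] - true_ba) ** 2
        aggregate_sq += (sum(ratings) / n_raters - true_ba) ** 2
    return (math.sqrt(single_sq / n_radiographs),
            math.sqrt(aggregate_sq / n_radiographs))


single, aggregate = simulate_aggregate_reading()
print(f"one rater: {single:.2f} years, mean of 10 raters: {aggregate:.2f} years")
# With these settings, expect roughly 0.7 and 0.7 / sqrt(10), i.e. about 0.22.
```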
It is well known that there is interobserver variation in BA rating – as well as in many other radiological procedures – and that the rating can be biased by various expectations. This is inherent in human nature and as such tends to be accepted in the community as unavoidable. However, with the advent of automatic BA determination, rater variability is eliminated. Analysis of the seven radiographs in which manual and automatic ratings differed by more than 1.9 years was striking: in all cases the error was on the human side, underlining the robustness of BoneXpert.
We hypothesize that the large deviations between BoneXpert and the original manual rating arose from an interpretation bias due to knowledge of the CA. Figure 3 supports this hypothesis. Such an effect was also reported by Berst et al., but our study displayed a more dramatic effect. This is a problem in a PACS-based environment, where in daily clinical routine it is virtually impossible to blind radiologists to the CA. The best remedy seems to be to inform radiologists about the severity of this bias and to encourage them not to look at the CA while rating. The computerized method receives only the image and the sex of the child as input, so by design there is no bias from any other factors.
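To make the hypothesis concrete, the toy model below assumes a rater who shifts the rating a fixed fraction of the way from the true BA towards the known CA, while the automatic method (idealized here as error-free) sees only the image and the sex. The pull fraction of 0.6, the function names and the children are invented for illustration and are not estimated from the Erasmus data; only the 1.9-year screening threshold comes from the study. Under this model the rater’s error is proportional to BA - CA, so only children with markedly retarded or advanced maturity cross the threshold.

```python
from typing import List, Tuple

DEVIATION_THRESHOLD = 1.9  # years; screening threshold used in the study


def ca_biased_rating(true_ba: float, ca: float, pull: float) -> float:
    """Hypothetical rater who moves the rating a fraction `pull` towards the CA."""
    return true_ba + pull * (ca - true_ba)


def flag_large_deviations(children: List[Tuple[float, float]],
                          pull: float) -> List[Tuple[float, float]]:
    """Return (true BA, CA) pairs whose biased manual rating deviates from the
    idealized automatic rating by more than the threshold."""
    flagged = []
    for true_ba, ca in children:
        automatic = true_ba  # idealization: automatic rating equals true BA
        manual = ca_biased_rating(true_ba, ca, pull)
        if abs(manual - automatic) > DEVIATION_THRESHOLD:
            flagged.append((true_ba, ca))
    return flagged


# Hypothetical (true BA, CA) pairs in years.
children = [(10.0, 10.2), (8.0, 11.5), (13.5, 10.0), (12.0, 12.5)]
print(flag_large_deviations(children, pull=0.6))
# prints [(8.0, 11.5), (13.5, 10.0)]: only the very retarded and very advanced children
```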
The susceptibility of raters to bias could be particularly large in BA rating because the result is a continuous value which can easily slide. We have been able to study only the bias from knowing the CA, but there could be other biases. For instance, in a clinical trial where excessive advancement of BA is an undesired effect, the rater could be biased to underestimate the BA. There could also be a bias from looking at the radiograph taken 1 year earlier; if that radiograph was overrated by 1 year there would be a tendency for the new examination also to be overrated. There could be bias from knowledge of the sexual development or height of the child. These biases are undesired because BA rating should be a procedure defined strictly as an isolated interpretation of the hand radiograph without knowing anything other than the sex.
BoneXpert has been calibrated by reference to five different human raters and therefore embodies a well-supported standardization of GP BA rating. The agreement with the two raters of the Erasmus study (Fig. 4) shows the extent to which these raters were consistent with the new BoneXpert standard. In general there was good agreement. The bias for boys at age 6–9 years (where BoneXpert overshot the manual rating) is counterbalanced by an opposite bias in the other studies used for the calibration; for instance, BoneXpert underestimates the nominal BA of the GP atlas at these ages. These biases are, therefore, interpreted as rater idiosyncrasies, which the calibration method diluted through the use of many raters. These bias effects are considerably smaller than the observed rms errors, so we conclude that the participating GP raters and the GP atlas were fairly consistent with each other, and this consistent rating is reflected in BoneXpert’s standardized rating.
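Age-dependent biases of the kind described here, such as BoneXpert overshooting the manual rating for boys aged 6–9 years, are typically found by averaging the automatic minus manual difference within BA intervals. A minimal sketch follows; the 3-year bin width, binning on the mean of the two ratings, the function name and the example data are our assumptions, not the procedure behind Fig. 4.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def binned_bias(manual: Sequence[float], automatic: Sequence[float],
                bin_width: float = 3.0) -> Dict[Tuple[float, float], float]:
    """Mean (automatic - manual) difference per BA interval [lo, lo + bin_width).

    Each radiograph is binned by the average of its two ratings.
    """
    sums: Dict[float, float] = defaultdict(float)
    counts: Dict[float, int] = defaultdict(int)
    for m, a in zip(manual, automatic):
        lo = bin_width * math.floor((m + a) / 2 / bin_width)
        sums[lo] += a - m
        counts[lo] += 1
    return {(lo, lo + bin_width): sums[lo] / counts[lo] for lo in sorted(sums)}


# Hypothetical ratings in years.
manual = [6.0, 7.5, 8.0, 11.0, 12.5]
automatic = [6.5, 8.0, 8.5, 11.0, 12.3]
for (lo, hi), bias in binned_bias(manual, automatic).items():
    print(f"BA {lo:.0f}-{hi:.0f} years: mean bias {bias:+.2f}")
```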
The fact that BoneXpert has been designed to agree on average with manual GP ratings is of great practical importance, because it allows clinicians to adopt the new method while still being able to relate the results to previous manual ratings. BoneXpert’s standardized ratings are particularly useful in multicentre studies, where geographic location then need not constrain the study design, and they could serve as a replacement for “central reading”. Currently BoneXpert is a Windows-based application serving as a PACS node to which PACS stations can send DICOM files for analysis in the Windows program. Full integration into a PACS environment is work in progress.
The validation of the BoneXpert method in the Erasmus study showed that BoneXpert was able to analyse all images, and blind re-rating of the seven subjects with the largest deviations from the manual rating showed that in these subjects the manual rating was wrong. These subjects had a very retarded or advanced BA, revealing a bias on the part of the radiologists, who were aware of the CA. After correcting the ratings of the seven outliers, and omitting boys with an average BA above 17 years and girls with an average BA above 15 years, the rms deviation between manual and automatic BA was 0.65 years and 0.76 years for boys and girls, respectively. The deviation for both sexes combined was 0.71 years (95% CI 0.66–0.76 years).
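The paper does not state how the confidence interval for the combined rms deviation was obtained. One standard large-sample approximation treats the standard error of an rms (or SD) estimate from n differences as roughly rms/sqrt(2n), assuming approximately normal differences. As the sketch shows, this happens to reproduce the reported interval for n = 405, but it should not be read as the authors’ method; the function name is hypothetical.

```python
import math
from typing import Tuple


def rms_confidence_interval(rms: float, n: int,
                            z: float = 1.96) -> Tuple[float, float]:
    """Approximate two-sided CI for an rms error estimated from n differences.

    Uses the large-sample standard error rms / sqrt(2 n), which assumes
    roughly normally distributed differences.
    """
    half_width = z * rms / math.sqrt(2 * n)
    return rms - half_width, rms + half_width


low, high = rms_confidence_interval(0.71, 405)
print(f"95% CI {low:.2f}-{high:.2f} years")  # prints "95% CI 0.66-0.76 years"
```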
Studies in healthy children have become rare because they are difficult to justify ethically. The Erasmus study is therefore of unique value for establishing a GP BA reference for modern Western European children. The study showed that boys and girls in this population are expected, on average, to have a BoneXpert GP BA 0.28 years and 0.20 years below the CA, respectively.
The study has the limitation that the Erasmus data were used both to adjust the overall BA scale and to validate a range of other aspects of the same system. As discussed in the appendix, this does not, in our opinion, significantly reduce the strength of the study. However, it does make the presentation of the study more complicated. A more serious limitation is that this study included only healthy Caucasian children recorded on high-quality radiographs. It is, therefore, appropriate to mention that the study is complemented by the Tübingen study, where the children had various endocrine disorders and the image quality was more typical.
Novo Nordisk is acknowledged for providing access to the film scanner.
The BoneXpert technology is proprietary to Visiana, a company owned by H. H. Thodberg. BoneXpert has been marketed as a medical device. A patent application on BoneXpert has been filed.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
1. Greulich WW, Pyle SI (1959) Radiographic atlas of skeletal development of hand and wrist, 2nd edn. Stanford University Press, Stanford
2. Todd TW (1937) Atlas of skeletal maturity. Part I. Hand. Kimpton, London
3. Tanner JM (1981) A history of the study of human growth. Cambridge University Press, Cambridge
6. Tanner JM, Whitehouse RH, Cameron N et al (1975) Assessment of skeletal maturity and prediction of adult height (TW2 method), 2nd edn. Academic Press, London
9. Thodberg HH, Kreiborg S, Juul A et al (2008) The BoneXpert method for automated determination of skeletal maturity. IEEE Trans Med Imaging (in press). doi: 10.1109/TMI.2008.926067
10. Martin D, Deusch D, Schweizer R et al (2007) Validation of automatic Greulich and Pyle bone age on GHD, UTS, SGA and Silver-Russell syndrome children. Horm Res 68 [Suppl 1]:69