Brain MR imaging is essential in the assessment of Chiari II malformation in clinical and research settings concerning spina bifida. However, the interpretation of morphological features of the malformation on MR images may not always be straightforward. In an attempt to select those features that unambiguously characterize the Chiari II malformation, we investigated the interobserver reliability of all its well-known MR features.
Brain MR images of 79 children [26 presumed to have Chiari II malformation, 36 presumed to have no cerebral abnormalities, and 17 children in whom some Chiari II malformation features might be present; mean age 10.6 (SD 3.2; range, 6-16) years] were blindly and independently reviewed by three observers. They rated 33 morphological features of the Chiari II malformation as present, absent, or indefinable in three planes (sagittal, axial, and coronal). The interobserver reliability was assessed using κ statistics.
Twenty-three of the features studied turned out to be unreliable, whereas the interobserver agreement was almost perfect (κ value > 0.8) for nine features (eight in the sagittal plane and one in the axial plane, but none in the coronal plane).
This study presents essential features of the Chiari II malformation on MR images by ruling out the unreliable features. Using these features may improve the assessment of Chiari II malformation in clinical and research settings.
Introduction
Chiari II malformation is a complex developmental malformation of the central nervous system. It is characterized by a small posterior fossa and downward displacement of the cerebellum and brainstem through an enlarged foramen magnum (hindbrain herniation). Chiari II malformation is almost uniquely associated with open spinal dysraphism. McLone and Knepper hypothesized that leakage of cerebrospinal fluid through the spinal anomaly reduces the distension of the embryonic ventricular system. The decreased inductive pressure on the surrounding mesenchyme results in an abnormally small posterior fossa. Approximately one third of the patients with Chiari II malformation develop signs and symptoms of brainstem compression. The mortality in this symptomatic group is 15 to 35 % [5, 6].
Usually, Chiari II malformation is clinically diagnosed with the help of MR imaging. On MR images, the malformation is characterized by a constellation of morphological features (Table 1). Most of these features were originally derived from post-mortem examinations [7–10] and computed tomography studies [11–14]. With the introduction of MR imaging, most features were simply adopted to evaluate MR images [15–19]. However, the interpretation of features as seen on MR images may not always be straightforward. First, the malformation is heterogeneous in itself and in its relation with spinal dysraphism. Second, an abundance of features exist, which may obscure unambiguous assessment of Chiari II malformation. Third, the definitions of some features are equivocal and reviewers may interpret features differently. Although most features are typical for Chiari II malformation, knowledge about the reliability of rating these features on MR images is lacking.
Still, brain MR imaging plays a substantial role in clinical decision making regarding the management of children with spina bifida [18, 20]. On the one hand, the discussion on selective treatment of severely affected newborn infants is still ongoing. On the other hand, fetal imaging and prenatal surgery are becoming more important every day. Recently, a randomized controlled trial showed important improvement of hindbrain herniation following prenatal surgery for spina bifida. However, the assessment of Chiari II malformation may be even more complicated in prenatal MR imaging. A discrepancy of 41 % was seen in judgment of the degree of cerebellar herniation in prenatal MR imaging studies. When choices have to be made about pre- and postnatal treatment options, it is important to have consensus about the morphological features that unambiguously characterize Chiari II malformation. As a proper reference standard is not available, however, testing the validity of different features is unattainable. The next best method to appraise these features is to evaluate interobserver reliability.
Therefore, we initiated a study to investigate the interobserver reliability of morphological features of Chiari II malformation on MR images. The purpose of this study was to select those features among the abundance of features that are essential for the diagnosis of the malformation, hypothesizing that several features would be too unreliable to adequately characterize Chiari II malformation.
Material and methods
Brain MR images of 79 children [mean age 10.6 (SD 3.2; range, 6-16) years] were evaluated. Of these children, 26 had open spinal dysraphism, while 17 children had closed spinal dysraphism (13 with lipomyelomeningocele and four children with other types of closed spinal dysraphism). The children with open spinal dysraphism were presumed to have Chiari II malformation, while children with closed spinal dysraphism might have some features of hindbrain herniation according to the literature [24, 25]. The latter group was included to reduce context bias. The majority of these children with spinal dysraphism (n = 36) were recruited at the outpatient clinics of Pediatric Neurology of the Radboud University Nijmegen Medical Centre (RUNMC) as part of a prospective research program dedicated to outcome and prognosis of spina bifida. MR images of the remaining seven children were obtained retrospectively from the archives of the Department of Radiology of the RUNMC, from which we also obtained MR images of 36 children without spinal dysraphism, who were presumed to have no cerebral pathology. Although MR imaging in these 36 children was performed with suspicion of or to rule out cerebral pathology, the images had been assessed as normal by an independent radiologist in a clinical setting before the start of the study.
All MR images were acquired using a 1.5-T MR imaging unit (Siemens Avanto; Siemens Medical Solutions, Erlangen, Germany) with a standard head coil. MR imaging in the 36 children who were part of the prospective research program consisted of T1-weighted images in the sagittal plane and T2-weighted images in the axial and coronal planes. The retrospectively obtained MR images were acquired using comparable sequences. For different reasons, MR images were not acquired in three planes for all 79 children. Images in the sagittal plane were available for 69 children (41 with spinal dysraphism), images in the axial plane for 58 children (32 with spinal dysraphism), and images in the coronal plane for 51 children (37 with spinal dysraphism).
The Regional Committee on Research involving Human Subjects approved the study protocol. Prior to inclusion in the study, written informed consent was obtained from the parents of all 36 children and all children above 12 years of age taking part in the prospective research program.
All MR images were blinded for demographic and diagnostic information. The MR images were mixed and arranged by plane into three data sets: a sagittal set, an axial set, and a coronal set. These three data sets were reviewed consecutively and independently by three observers: a junior pediatric neurologist (N.G.) with 6 years of experience in reviewing pediatric brain MR images, a senior pediatric neurologist (R.A.M.), and a senior neuroradiologist (T.V.), both with more than 20 years of experience in reviewing pediatric brain MR images. A few weeks separated the reviews of the three datasets to prevent bias by recognition of images from a former set as much as possible. The images were available on compact disks and were reviewed on an Agfa workstation or on a personal computer using Agfa software (Impax Client, release 4.5).
The morphological features of Chiari II malformation to be assessed were selected from the literature and incorporated in a review protocol (Table 1). First, the feasibility of the protocol was evaluated in a pilot study (n = 10), resulting in a final set of study features with their definitions. The observers rated all features as being present, absent, or indefinable.
For each feature, the ‘present’, ‘absent’, and ‘indefinable’ ratings were tallied up per observer. First, the ‘indefinable’ ratings were evaluated to assess the applicability of each feature. If two or three observers rated a feature as indefinable in more than 5 % of the MR images, it was qualified as non-applicable and subsequently excluded from the further analyses.
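The applicability rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_applicable`, its parameters, and the example ratings are hypothetical and are not part of the study protocol or its SAS analysis.

```python
def is_applicable(ratings_by_observer, max_percent=5, min_observers=2):
    """Return True if a feature stays in the agreement analysis.

    ratings_by_observer maps an observer label to that observer's ratings
    ("present", "absent", or "indefinable"), one rating per MR image.
    A feature is non-applicable when at least `min_observers` observers
    rated it as indefinable in more than `max_percent` % of the images.
    """
    observers_over_limit = 0
    for ratings in ratings_by_observer.values():
        indefinable = ratings.count("indefinable")
        # Integer comparison avoids floating-point edge cases at exactly 5 %.
        if indefinable * 100 > max_percent * len(ratings):
            observers_over_limit += 1
    return observers_over_limit < min_observers


# Hypothetical ratings for one feature on 20 images (not study data).
example = {
    "A": ["present"] * 6 + ["absent"] * 14,
    "B": ["present"] * 5 + ["absent"] * 13 + ["indefinable"] * 2,  # 10 %
    "C": ["present"] * 4 + ["absent"] * 15 + ["indefinable"] * 1,  # exactly 5 %
}
print(is_applicable(example))  # True: only observer B exceeds the 5 % limit
```

In this toy example the feature remains applicable because only one observer exceeds the limit; a second observer above 5 % would exclude it.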
Interobserver agreement analyses were performed for the applicable features using only the ‘present’ and ‘absent’ ratings. The percentages of agreement were obtained from contingency tables. Based on these tables, κ values for multiple observers were calculated to measure the extent of agreement among the three observers. To comprehend possible sources of disagreement, κ values were also calculated for pairs of observers. We considered a feature reliable when the κ value was above 0.8, which denotes almost perfect agreement. The analyses were performed using SAS software version 8.2 (SAS Institute).
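For readers who want to reproduce this type of analysis, the following Python sketch computes a multi-observer κ and pairwise κ values. It assumes Fleiss' κ as the multi-observer statistic and Cohen's κ for pairs; whether the SAS analysis used exactly these estimators is an assumption. It also assumes that images with an 'indefinable' rating from any observer have already been dropped. The ratings are invented for illustration and are not study data.

```python
from itertools import combinations

CATEGORIES = ("present", "absent")


def fleiss_kappa(rows, categories=CATEGORIES):
    """Fleiss' kappa; each row holds the ratings of all observers for one image."""
    n_subjects = len(rows)
    n_raters = len(rows[0])
    totals = {c: 0 for c in categories}
    agreement_sum = 0.0
    for row in rows:
        counts = {c: row.count(c) for c in categories}
        for c in categories:
            totals[c] += counts[c]
        # Proportion of agreeing observer pairs for this image.
        agreement_sum += (sum(v * v for v in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_subjects
    p_e = sum((totals[c] / (n_subjects * n_raters)) ** 2 for c in categories)
    # Undefined (division by zero) if every rating falls in one category.
    return (p_bar - p_e) / (1 - p_e)


def cohen_kappa(a, b, categories=CATEGORIES):
    """Cohen's kappa for two observers rating the same images."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of one feature on 10 images by three observers.
ratings = {
    "A": ["present"] * 3 + ["absent"] * 7,
    "B": ["present"] * 3 + ["absent"] * 7,
    "C": ["present"] * 2 + ["absent"] * 8,
}
rows = list(zip(*ratings.values()))

overall = fleiss_kappa(rows)
pairwise = {
    (x, y): cohen_kappa(ratings[x], ratings[y])
    for x, y in combinations(ratings, 2)
}
print(f"overall kappa = {overall:.2f}, reliable = {overall > 0.8}")
for pair, value in pairwise.items():
    print(pair, f"{value:.2f}")
```

In this toy data set, a single discordant rating by observer C lowers both pairwise values involving C while leaving the A-B value at 1, which is the pattern that pairwise κ values are meant to expose.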
Results
For each feature, the percentages of ‘present’ and ‘indefinable’ ratings are summarized per observer in Table 2. All observers rated most features in the sagittal plane as present in 20–35 % of the MR images, whereas the percentages of ‘present’ ratings in the axial and coronal planes varied substantially among features and among observers. In general, observer C rated features as ‘present’ less often than the other two observers did, whereas observer B rated features as ‘indefinable’ more often than the other two observers did. In the sagittal plane, all but one feature (Stenogyria) turned out to be applicable. In contrast, in the axial and coronal planes more than half of the features turned out to be non-applicable (Table 2). One observer rated Enlarged massa intermedia in the axial plane as indefinable in all but one MR image. The ratings of features in children with open or closed spinal dysraphism or without spinal dysraphism are presented in Table 3. With a few exceptions, features were quite common in children with open spinal dysraphism and hardly seen in the other children.
The interobserver agreement of the applicable features is presented in Table 4. The right panel of the table shows the percentages of agreement and disagreement, while the left panel shows the κ values. The interobserver agreement among all three observers was almost perfect (κ value > 0.8) for the following features in the sagittal plane: Downward herniation cerebellum, Downward herniation tonsil, Downward displacement medulla, Downward displacement fourth ventricle, Medullary kinking, Abnormal width fourth ventricle, Hypoplastic tentorium, and Beaked tectum (Fig. 1). Only one feature in the axial plane (Small fourth ventricle) showed almost perfect agreement, while none of the features in the coronal plane did. The overall κ values for the remaining features ranged from 0.50 (Cerebellum wrapped around brainstem) to 0.75 (Downward displacement pons), except for a very low κ value for Enlarged massa intermedia (0.10). Table 4 also lists the κ values for pairs of observers. For seven features, the κ values differed substantially among pairs of observers: Downward herniation vermis, Upward herniation cerebellum, Downward displacement pons, and Abnormal course straight sinus in the sagittal plane; Cerebellum wrapped around brainstem in the axial plane; and Indentation and Gyral interdigitation in the coronal plane. In general, the agreement between observers A and B was stronger than the agreement of each of them with observer C.
Discussion
On brain MR images, Chiari II malformation is generally assessed based on a constellation of morphological features. The current study reports on the reliability of these features leading to the identification of essential features that may improve consensus on the diagnosis of Chiari II malformation.
In this study, reliable features were distinguished from unreliable features, with reliable features predominantly being found in the sagittal plane. This in itself is not surprising, as most of the morphological abnormalities are best shown in the midsagittal plane, which is usually used to assess Chiari II malformation. Still, a substantial number of features in the sagittal plane (six out of 14) showed less than perfect or poor reliability and most features in the axial and coronal plane were non-applicable. These results support our assumption that the MR interpretation of Chiari II malformation is not always straightforward. The unreliability of features may be explained by their qualitative nature and the fact that the distinction between normal and abnormal brain development is not defined by an unambiguous cutoff point. Judgment of the features is further complicated by the morphological diversity of the malformation and the fact that MR images capture features to various degrees. These general explanations mainly apply to features with random disagreement, that is to say, when the overall κ value and all pairwise κ values are low (e.g., Upward herniation cerebellum, Flattened pons, and Gyral interdigitation; Table 4).
On the other hand, the results for pairwise agreement showed systematic disagreement for some features; i.e., stronger agreement between observers A and B than the agreement for each of them with observer C. Perhaps, reappraisal of some definitions may further improve reliability, for instance, for Cerebellum wrapped around brainstem and Indentation (Figs. 2 and 3).
The systematic disagreement for Downward herniation vermis is of special interest. Blurred cerebellar contours in a crowded posterior fossa and partial volume effects may hamper precise demarcation of the vermis and may make it difficult to distinguish the vermis from the tonsil and from medullary kinking (Fig. 1). This is in agreement with previous studies that reported that the vermis could not be clearly delineated in about 50 % of children with Chiari II malformation [15, 16]. On the other hand, systematic disagreement may have resulted from different concepts about the morphology of Chiari II malformation. Observer C, in contrast to the other two observers, considered Downward herniation vermis to be present more often than Downward herniation tonsil (Table 2). Yet, from post-mortem studies, it is known that herniation of the vermis without herniation of the tonsils does not occur. Therefore, we recommend assessing downward herniation of the cerebellum irrespective of this being herniation of the vermis or herniation of the tonsils.
One of the limitations of this study was the possibility of context bias, i.e., knowledge from other sources that exaggerated interobserver agreement. To deal with this phenomenon, we mixed the images expected to show Chiari II malformation with images expected to be without abnormalities and with images in which some features of hindbrain herniation could be present. However, observers may have tended to rate a feature according to the general appearance of the cerebellum, as complete blinding of each solitary feature was impossible. Another potential source of bias was the ratio between present and absent ratings, as an excess of one of the two affects the κ value. In the current study, the proportion of present ratings per feature generally ranged from 25 to 35 % (Table 2). Within this small range, κ values can be safely compared among features. Yet, a few features were rated as present in considerably lower proportions. As the κ value will underestimate agreement in case of low proportions, reliability of the features in question may be better than expected from the actual κ values. Furthermore, response bias may have decreased κ values [29, 30]. This is particularly relevant when a rating is ambiguous. Although the observers had the opportunity to rate ambiguous features as indefinable, response bias was not completely avoided, since observers A and B generally rated features more often as present than observer C. As this was clearly the case for Downward displacement pons and the κ value was just below the cutoff point of 0.8, underestimation of agreement may be relevant for this feature. Potential institutional bias may be another limitation of the study. All observers worked at the same academic hospital, which might have increased agreement. However, the observers differed in terms of experience and educational and professional background. These differences might have reduced the interobserver agreement.
On the other hand, the participation of senior and junior specialists with different backgrounds implies that the results are particularly useful for radiologists and other specialists who might be less familiar with reviewing brain MR images.
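The dependence of κ on the proportion of 'present' ratings, mentioned among the limitations above, can be illustrated with a small Python sketch. The two 2 × 2 tables are invented for illustration (100 images, two observers) and are not study data; the function name `kappa_2x2` is hypothetical. Both tables have the same observed agreement of 90 %, but the feature is common in one and rare in the other.

```python
def kappa_2x2(both_present, only_first, only_second, both_absent):
    """Cohen's kappa from the four cells of a two-observer 2 x 2 table."""
    n = both_present + only_first + only_second + both_absent
    p_o = (both_present + both_absent) / n
    first_present = (both_present + only_first) / n
    second_present = (both_present + only_second) / n
    # Agreement expected by chance from the observers' marginal proportions.
    p_e = first_present * second_present + (1 - first_present) * (
        1 - second_present
    )
    return (p_o - p_e) / (1 - p_e)


# Feature rated 'present' in 30 % of images by each observer.
common = kappa_2x2(both_present=25, only_first=5, only_second=5, both_absent=65)
# Feature rated 'present' in 7 % of images by each observer.
rare = kappa_2x2(both_present=2, only_first=5, only_second=5, both_absent=88)

print(f"common feature: kappa = {common:.2f}")
print(f"rare feature:   kappa = {rare:.2f}")
```

With identical observed agreement, κ is considerably lower for the rare feature, which is why κ values of features with low proportions of 'present' ratings may understate their reliability.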
Nevertheless, this study showed that among all features that are evaluated while diagnosing Chiari II malformation, only a subset seems to be reliable. Although the Chiari II malformation seems to be a clear entity, clinicians and researchers should be aware of the different interpretations of its features among observers. The use of reliable features may facilitate plain communication about Chiari II malformation in clinical and research settings. In the management of individual patients, decisions about treatment options should be based on clinical signs and symptoms in combination with reliable MR findings. Although Chiari II malformation is almost uniquely associated with open spinal dysraphism, there might be exceptions. In such cases, the reliable features presented might be useful. In discussions on prenatal surgery and postnatal selective treatment of spina bifida, this study provides clinicians and researchers with features that unambiguously describe the Chiari II malformation.
In addition to the qualitative method, a morphometric approach quantifying the morphological distortions may be helpful to overcome the problems of unreliable features. Morphometric measures are less subjective and may be less liable to interobserver variability. They may also provide cutoff points that distinguish between normal and abnormal brain development. The reliability and diagnostic performance of morphometric measures are the subject of the second part of our study on MR assessment of Chiari II malformation.
In conclusion, the following morphological features can reliably be used to assess Chiari II malformation on MR images: downward herniation of the cerebellum, downward displacement of the medulla, pons, and fourth ventricle, medullary kinking, abnormally shaped fourth ventricle, hypoplastic tentorium, and beaked mesencephalic tectum. The use of these essential features may improve MR assessment of Chiari II malformation by providing a solid basis for consensus on the diagnosis.
References
1. Barkovich AJ (2005) Congenital malformations of the brain and skull. In: Barkovich AJ (ed) Pediatric neuroimaging, 4th edn. Lippincott Williams & Wilkins, Philadelphia, pp 374–384
2. Chiari H (1891) Ueber veränderungen des kleinhirns infolge von hydrocephalie des grosshirns. Deut Med Wochenschr 17:1172–1175
3. McLone DG, Knepper PA (1989) The cause of Chiari II malformation: a unified theory. Pediatr Neurosci 15:1–12
4. Stevenson KL (2004) Chiari type II malformation: past, present, and future. Neurosurg Focus 16:E5
5. McLone DG (1992) Continuing concepts in the management of spina bifida. Pediatr Neurosurg 18:254–256
6. Oakeshott P, Hunt GM (2003) Long-term outcome in open spina bifida. Br J Gen Pract 53:632–636
7. Peach B (1965) Arnold-Chiari malformation: anatomic features of 20 cases. Arch Neurol 12:613–621
8. Emery JL, MacKenzie N (1973) Medullo-cervical dislocation deformity (Chiari II deformity) related to neurospinal dysraphism (meningomyelocele). Brain 96:155–162
9. Variend S, Emery JL (1976) Cervical dislocation of the cerebellum in children with meningomyelocele. Teratology 13:281–289
10. Variend S, Emery JL (1979) The superior surface lesion of the cerebellum in children with myelomeningocele. Z Kinderchir 28:328–335
11. Naidich TP, Pudlowski RM, Naidich JB, Gornish M, Rodriguez FJ (1980) Computed tomographic signs of the Chiari II malformation. Part I: Skull and dural partitions. Radiology 134:65–71
12. Naidich TP, Pudlowski RM, Naidich JB (1980) Computed tomographic signs of Chiari II malformation. II: Midbrain and cerebellum. Radiology 134:391–398
13. Naidich TP, Pudlowski RM, Naidich JB (1980) Computed tomographic signs of the Chiari II malformation. III: Ventricles and cisterns. Radiology 134:657–663
14. Naidich TP, McLone DG, Fulling KH (1983) The Chiari II malformation: Part IV. The hindbrain deformity. Neuroradiology 25:179–197
15. Wolpert SM, Anderson M, Scott RM, Kwan ES, Runge VM (1987) Chiari II malformation: MR imaging evaluation. AJR Am J Roentgenol 149:1033–1042
16. El Gammal T, Mark EK, Brooks BS (1988) MR imaging of Chiari II malformation. AJR Am J Roentgenol 150:163–170
17. Just M, Schwarz M, Ludwig B, Ermert J, Thelen M (1990) Cerebral and spinal MR-findings in patients with postrepair myelomeningocele. Pediatr Radiol 20:262–266
18. Kawamura T, Morioka T, Nishio S, Mihara F, Fukui M (2001) Cerebral abnormalities in lumbosacral neural tube closure defect: MR imaging evaluation. Childs Nerv Syst 17:405–410
19. Miller E, Widjaja E, Blaser S, Dennis M, Raybaud C (2008) The old and the new: supratentorial MR findings in Chiari II malformation. Childs Nerv Syst 24:563–575
20. Mitchell LE, Adzick NS, Melchionne J, Pasquariello PS, Sutton LN, Whitehead AS (2004) Spina bifida. Lancet 364:1885–1895
21. Barry S (2010) Quality of life and myelomeningocele: an ethical and evidence-based analysis of the Groningen Protocol. Pediatr Neurosurg 46:409–414
22. Adzick NS, Thom EA, Spong CY et al (2011) A randomized trial of prenatal versus postnatal repair of myelomeningocele. N Engl J Med 364:993–1004
23. Mangels KJ, Tulipan N, Tsao LY, Alarcon J, Bruner JP (2000) Fetal MRI in the evaluation of intrauterine myelomeningocele. Pediatr Neurosurg 32:124–131
24. Tubbs RS, Bui CJ, Rice WC et al (2007) Critical analysis of the Chiari malformation type I found in children with lipomyelomeningocele. J Neurosurg 106:196–200
25. Milhorat TH, Bolognese PA, Nishikawa M et al (2009) Association of Chiari malformation type I and tethered cord syndrome: preliminary results of sectioning filum terminale. Surg Neurol 72:20–35
26. Egglin TK, Feinstein AR (1996) Context bias. A problem in diagnostic radiology. JAMA 276:1752–1755
27. Fleiss JL, Levin B, Paik MC (2003) The measurement of interrater agreement. In: Shewhart WA, Wilks SS (eds) Statistical methods for rates and proportions, 3rd edn. Wiley, New York, pp 598–626
28. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174
29. Ker M (1991) Issues in the use of kappa. Invest Radiol 26:78–83
30. Hoehler FK (2000) Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. J Clin Epidemiol 53:499–503
Conflict of interest
The authors declare that they have no conflicts of interest.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Cite this article
Geerdink, N., van der Vliet, T., Rotteveel, J.J. et al. Essential features of Chiari II malformation in MR imaging: an interobserver reliability study—part 1. Childs Nerv Syst 28, 977–985 (2012). https://doi.org/10.1007/s00381-012-1761-5
Keywords
- Chiari II malformation
- Spina bifida
- Brain MR imaging