Volume Estimation of the Thalamus Using Freesurfer and Stereology: Consistency between Methods
- Cite this article as:
- Keller, S.S., Gerdes, J.S., Mohammadi, S. et al. Neuroinform (2012) 10: 341. doi:10.1007/s12021-012-9147-0
Freely available automated MR image analysis techniques are being increasingly used to investigate neuroanatomical abnormalities in patients with neurological disorders. It is important to assess the specificity and validity of automated measurements of structure volumes with respect to reliable manual methods that rely on human anatomical expertise. The thalamus is widely investigated in many neurological and neuropsychiatric disorders using MRI, but thalamic volumes are notoriously difficult to quantify given the poor between-tissue contrast at the thalamic gray-white matter interface. In the present study we investigated the reliability of automatically determined thalamic volume measurements obtained using FreeSurfer software with respect to a manual stereological technique on 3D T1-weighted MR images obtained from a 3 T MR system. Further to demonstrating impressive consistency between stereological and FreeSurfer volume estimates of the thalamus in healthy subjects and neurological patients, we demonstrate that the extent of agreeability between stereology and FreeSurfer is equal to the agreeability between two human anatomists estimating thalamic volume using stereological methods. Using patients with juvenile myoclonic epilepsy as a model for thalamic atrophy, we also show that both automated and manual methods provide very similar ratios of thalamic volume loss in patients. This work promotes the use of FreeSurfer for reliable estimation of global volume in healthy and diseased thalami.
Keywords: FreeSurfer; Diffusion tensor imaging; Fractional anisotropy; MR image analysis; Stereology; Volume
The thalamus is of central interest in many disorders of the nervous system (Andreasen 1997; Dom, et al. 1976; Lee and Marsden 1994; Meador-Woodruff, et al. 2003; Speedie and Heilman 1983; Williams 1965; Xuereb, et al. 1991). The functioning of the thalamus is crucial to many sensory, motor and cognitive systems, and therefore has also been subject to a great deal of investigation in the cognitive neurosciences (Basso, et al. 2005; Engelborghs, et al. 1998; Herrero, et al. 2002). It is in these capacities that analysis of thalamic structure and function is a continually researched theme in neuroimaging investigations, particularly using magnetic resonance imaging (MRI). Analysis of volume or shape using MRI techniques may provide important information with respect to the involvement of the thalamus in neurological and neuropsychiatric disorders, including generalized (Du, et al. 2011; Pulsipher, et al. 2009) and partial (Gong, et al. 2008; Pulsipher, et al. 2007) epilepsy, schizophrenia (Adriano, et al. 2010), Huntington’s disease (Douaud, et al. 2006; Kassubek, et al. 2005), Parkinson’s disease (McKeown, et al. 2008), and Alzheimer’s disease (de Jong, et al. 2008). Reliable measurement of thalamic structure is, however, notoriously difficult to achieve, particularly given the typically poor between-tissue MR contrast of the thalamic nuclei and adjacent white matter (Amini, et al. 2004). It is therefore important to develop new methodologies that provide thalamic metrics, and to improve and validate existing ones. As for other subcortical brain structures, there are several freely available approaches to estimate thalamic volume from MR images. At either end of the MR image analysis spectrum, there are manual and automated approaches; manual approaches are user-dependent and time consuming, but are considered to be the gold standard of MR image analysis techniques (Bonilha, et al. 2004; Collins and Pruessner 2010; Crum, et al. 2001; Pruessner, et al. 2000).
Automated approaches remove the need for an expert anatomist, are dependent on computer algorithms, and are time efficient, but require a great deal of validation against manual methods to determine the specificity and validity of measurements (Chupin, et al. 2009; Morra, et al. 2008). The primary goal of the present study was to evaluate the validity of thalamic volume measurements obtained from a frequently used automated approach with respect to a reputable manual approach widely used in the imaging, anatomical and histological sciences.
The fully automated approach investigated in the present study was the subcortical segmentation and volume estimation techniques (Fischl, et al. 2002) incorporated into FreeSurfer software (http://surfer.nmr.mgh.harvard.edu/), which provide observer-independent volumes for individual subcortical nuclei from conventional MR images. Similarly to other methods that automatically segment and estimate subcortical volume such as FIRST (Patenaude, et al. 2011) incorporated into FSL software (http://www.fmrib.ox.ac.uk/fsl/first/index.html), there has been a recent proliferation of studies using FreeSurfer methods for volumetric studies, some of which have included comparison with manual methods, most notably for the hippocampus (Cherbuin, et al. 2009; Dewey, et al. 2010; Morey, et al. 2009; Pardoe, et al. 2009; Shen, et al. 2010; Tae, et al. 2008). To our knowledge, there has been no independent comparison between manual methods and FreeSurfer methods for volume estimation of the thalamus. The manual approach used to evaluate FreeSurfer-based thalamic volumes in the present study was the Cavalieri method of design-based stereology in conjunction with point counting (Gundersen and Jensen 1987; Gundersen, et al. 1999; Mayhew 1992; Roberts, et al. 2000), which is a 100 % investigator-interactive technique that requires manual determination of sampling density for a given brain structure (i.e. the stereological parameters necessary to produce a reliable volume estimate) and investigator decisions on whether or not sampling probes (i.e. points) intersect the brain region-of-interest (ROI). Stereology requires the use of a human anatomist with expert knowledge of anatomical boundaries that divide legitimate (i.e. thalamic) and illegitimate (i.e. non-thalamic) brain tissue. Manual approaches such as stereology are considered gold standard because it is assumed that human knowledge and perception are superior to computer algorithms that determine regional brain boundaries.
We examined the consistency between manual and automated thalamic volume estimation in two ways. Firstly, we compared thalamic volume estimates in a sample of neurologically and psychiatrically healthy subjects. Secondly, we compared the methods in their sensitivity in detecting thalamic atrophy in patients with juvenile myoclonic epilepsy (JME). JME is an electro-clinical syndrome that by definition is non-lesional and without abnormality on conventional magnetic resonance imaging (MRI) (Berg, et al. 2010; ILAE 1989), but has previously been shown to be associated with thalamic structural alterations (Deppe, et al. 2008; Kim, et al. 2007; Mory, et al. 2011; Pulsipher, et al. 2009), and is generally considered to be intimately associated with thalamic dysfunction (Holmes, et al. 2010). We therefore compared morphometric approaches for volume estimation of both healthy and diseased thalami.
We studied a neurologically and psychiatrically healthy control group that was composed of 62 adult volunteers (32 females, mean age 27.9 ± 4.3 SD, range 21–43), all of whom had normal neurological examination and normal MRI (T1-, T2-weighted, and FLAIR). We also studied ten patients (6 females, mean age 28.6 ± 8.8 SD, range 19–42) with JME. Clinical information for these patients can be found elsewhere (Deppe, et al. 2008; Keller, et al. 2011a). There was no statistical difference in age between patients and controls (t = 0.38, p = 0.70). All subjects gave written informed consent and the local ethics committee approved this study.
Magnetic Resonance Imaging
The Cavalieri method of design-based stereology in conjunction with point counting (Gundersen and Jensen 1987; Gundersen, et al. 1999; Mayhew 1992; Roberts, et al. 2000) was used as an unbiased estimator of the volume of the left and right thalamus in all subjects. By using the Cavalieri method, volume is directly estimated from equidistant and parallel MR images of the brain with a uniform random starting position. A second level of sampling is required to estimate the section area from each image by applying point counting within the ROI. The mathematical justification and implementation of the methodology are simple and it can be applied to structures of arbitrary shape (Garcia-Finana, et al. 2009). This technique has been frequently applied to reliably estimate brain volume and surface area on MR images (Acer, et al. 2010; Bas, et al. 2009; Cowell, et al. 2007; Eriksen, et al. 2010; Hallahan, et al. 2011; Howard, et al. 2003; Jelsing, et al. 2005; Keller, et al. 2009; Keller, et al. 2007; Keller, et al. 2002a; Keller, et al. 2009b; Keller, et al. 2002b; Lux, et al. 2008; Mackay, et al. 1998; Mackay, et al. 2000; Ronan, et al. 2006; Salmenpera, et al. 2005; Sheline, et al. 1996), and more widely applied to study other aspects of anatomy with and without the use of MRI. Stereology has been shown to be at least as precise as tracing and thresholding volumetry techniques and substantially more time efficient, with validation relative to post-mortem measurements (Garcia-Finana, et al. 2003; Garcia-Finana, et al. 2009; Keller and Roberts 2009; Keshavan, et al. 1995). Windows-compatible Easymeasure software (Keller, et al. 2007; Puddephat 1999) was used for point counting on MR images.
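The two sampling levels described above can be illustrated with a minimal sketch of the Cavalieri point-counting estimator: the volume estimate is the section spacing, multiplied by the area represented by each grid point, multiplied by the total number of points counted within the ROI. The function name, the grid and section spacings, and the point counts below are hypothetical and chosen only for illustration; they are not the stereological parameters used in this study, and this is not the Easymeasure implementation.

```python
from typing import Sequence


def cavalieri_volume(point_counts: Sequence[int],
                     section_spacing_mm: float,
                     grid_spacing_mm: float) -> float:
    """Cavalieri point-counting volume estimate in cubic millimetres.

    point_counts       -- points hitting the ROI on each sampled section
    section_spacing_mm -- distance between consecutive sampled sections
    grid_spacing_mm    -- distance between neighbouring points of a square grid
    """
    if section_spacing_mm <= 0 or grid_spacing_mm <= 0:
        raise ValueError("spacings must be positive")
    if any(count < 0 for count in point_counts):
        raise ValueError("point counts cannot be negative")
    # On a square grid, each point represents an area of grid_spacing^2.
    area_per_point_mm2 = grid_spacing_mm ** 2
    return section_spacing_mm * area_per_point_mm2 * sum(point_counts)


if __name__ == "__main__":
    # Hypothetical counts for one thalamus sampled on seven sections.
    counts = [4, 9, 14, 17, 15, 10, 5]
    volume_mm3 = cavalieri_volume(counts, section_spacing_mm=3.0,
                                  grid_spacing_mm=3.0)
    print(f"Estimated volume: {volume_mm3 / 1000:.2f} cm^3")
```

A denser grid or closer sections increase the number of points counted and reduce the variance of the estimate, at the cost of more rater time.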
FreeSurfer software (http://surfer.nmr.mgh.harvard.edu/) was used to obtain thalamic volumes for all subjects using an observer-independent approach, which could be contrasted with the manual stereological measurements of the thalamus. Thalamic segmentations are based on the assignment of neuroanatomical labels to each voxel in an MR image based on the probabilistic information automatically estimated from a manually labelled training set. The methods of the automated volumetric approach have been described in detail previously (Fischl, et al. 2002), and the accuracy of automated labelling and volumetry of subcortical structures has been independently validated with respect to ‘gold standard’ manual volumetric techniques, predominantly for the hippocampus (Cherbuin, et al. 2009; Dewey, et al. 2010; Morey, et al. 2009; Pardoe, et al. 2009; Shen, et al. 2010; Tae, et al. 2008), and also for the amygdala (Dewey, et al. 2010; Morey, et al. 2009) and striatum (Dewey, et al. 2010). To our knowledge, there has been no independent comparison of the automated thalamic volumetry offered by FreeSurfer and a manual volumetric method (although thalamic tracings were compared with the performance of FreeSurfer in the original methods paper by Fischl et al. (2002)). Figure 1 shows the comparison of automated labelling of the thalamus (and extra-thalamic structures) in an individual control subject using FreeSurfer relative to stereological volume estimation of the thalamus in the same subject.
FreeSurfer analyses were performed on a Mac Pro (Version OS X 10.6.6, 32 GB, 2 × 2.93 GHz 6-Core Intel Xeon (HT)), which permitted the FreeSurfer ‘recon-all’ function (for cortical reconstruction and brain segmentation; http://surfer.nmr.mgh.harvard.edu/fswiki/recon-all) to complete 23 participants in less than 20 h. After the ‘recon-all’ function, the neuroanatomical labels were inspected for accuracy in all patients and controls. Although FreeSurfer permits manual editing to improve subcortical segmentation, no obvious errors in the automatic labelling were observed for any subject, and so all data obtained from FreeSurfer analyses were 100 % automated and not influenced by manual intervention.
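Once ‘recon-all’ has finished, the subcortical volumes are typically read from the per-subject statistics table that FreeSurfer writes (stats/aseg.stats in the subject directory). The following is a minimal sketch of how the left and right thalamic volumes might be pulled from such a table; it is not the procedure reported by the authors. It assumes the usual whitespace-separated column order (Index, SegId, NVoxels, Volume_mm3, StructName, ...) and that the thalamus is labelled either Left-/Right-Thalamus-Proper or Left-/Right-Thalamus depending on the FreeSurfer release, so both should be checked against your own files. The function names and the sample rows are hypothetical.

```python
from typing import Dict, Iterable


def read_aseg_volumes(lines: Iterable[str]) -> Dict[str, float]:
    """Map structure name -> volume (mm^3) from aseg.stats-style lines."""
    volumes: Dict[str, float] = {}
    for raw in lines:
        line = raw.strip()
        # Header and metadata lines in the stats table start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed column order: Index SegId NVoxels Volume_mm3 StructName ...
        volumes[fields[4]] = float(fields[3])
    return volumes


def thalamus_volumes(volumes: Dict[str, float]) -> Dict[str, float]:
    """Return left/right thalamic volume, trying both assumed label names."""
    result: Dict[str, float] = {}
    for side in ("Left", "Right"):
        for name in (f"{side}-Thalamus-Proper", f"{side}-Thalamus"):
            if name in volumes:
                result[side] = volumes[name]
                break
    return result


# Hypothetical excerpt; the numbers are made up for illustration only.
SAMPLE_ASEG = """\
# ColHeaders  Index SegId NVoxels Volume_mm3 StructName normMean normStdDev normMin normMax normRange
  5  10  7012  7012.4  Left-Thalamus-Proper  88.5  8.3  44.0  110.0  66.0
 23  49  6890  6890.1  Right-Thalamus-Proper  87.9  8.1  40.0  109.0  69.0
"""

if __name__ == "__main__":
    print(thalamus_volumes(read_aseg_volumes(SAMPLE_ASEG.splitlines())))
```

With real data, the same functions can be applied to an open file handle, for example `with open(path_to_aseg_stats) as handle: read_aseg_volumes(handle)`, where `path_to_aseg_stats` is a placeholder for the subject's stats file.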
Two-way mixed intra-class correlation coefficients for absolute agreement (Shrout and Fleiss 1979) were used to determine agreement between two human raters using manual stereology, and between each rater and FreeSurfer, for volume estimation of the thalamus in ten randomly selected controls using the statistics software SPSS (Version 18, www.spss.com). Intra-class correlations were subsequently performed between stereological volumes obtained by one human rater and FreeSurfer volumes for the entire sample of patients and controls (n = 72). Univariate ANOVAs were used to investigate patient-control differences in volumes, and corrected for multiple comparisons using Statistica version 9.1 (StatSoft Inc., www.statsoft.com).
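For readers who wish to reproduce this kind of agreement analysis outside SPSS, the sketch below computes an absolute-agreement intra-class correlation from the mean squares of a two-way ANOVA without replication. The text does not state whether single- or average-measure coefficients were reported; the sketch assumes the single-measure form, often written ICC(A,1). The function name and the example volumes are hypothetical.

```python
from typing import Sequence


def icc_absolute_agreement(ratings: Sequence[Sequence[float]]) -> float:
    """Single-measure, absolute-agreement ICC from an n x k table.

    Rows are subjects; columns are measurement methods or raters.
    Requires at least two rows, two columns and some between-subject variance.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((value - grand) ** 2 for row in ratings for value in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between subjects
    ms_cols = ss_cols / (k - 1)              # between raters/methods
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


if __name__ == "__main__":
    # Hypothetical thalamic volumes (cm^3): [stereology, FreeSurfer] per subject.
    volumes = [[6.9, 7.1], [7.4, 7.3], [6.2, 6.4], [7.8, 7.7], [6.6, 6.9]]
    print(f"ICC(A,1) = {icc_absolute_agreement(volumes):.3f}")
```

Because the absolute-agreement form includes the between-method mean square in the denominator, a systematic offset between methods lowers the coefficient even when the two sets of volumes are perfectly correlated.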
Stereology vs FreeSurfer: Consistency between volumetric measures
Inter-rater intra-class coefficients for volumetric measures
Stereology vs FreeSurfer: Identification of thalamic atrophy in JME
The volume of the thalamus is a notoriously difficult metric to estimate reliably given the low contrast between thalamic gray matter and adjacent white matter on T1-weighted MR images, which is a particular challenge for automated MR image analysis methods (Amini, et al. 2004). Only by comparing such automated methods with manual investigator-intensive methods can we establish the reliability of volume estimates. The present study provides important data indicating the specificity and validity of automated thalamic volume estimation using FreeSurfer software. In particular, further to demonstrating consistency between stereological and FreeSurfer volume estimates of the thalamus in healthy subjects and neurological patients, we demonstrate that the extent of agreeability between stereology and FreeSurfer is equal to the agreeability between two human anatomists estimating thalamic volume using stereological methods.
FreeSurfer software is now a frequently used tool for the estimation of subcortical structure volume. At the time of writing, a PubMed search using “Freesurfer” and “volume” yields 87 articles (October 2011). The vast majority of these articles are application studies, particularly in neurological disorders, and only a few have sought to evaluate the validity of volume measurements. Various levels of consistency between FreeSurfer and manual ROI methods have been reported for the hippocampus (Cherbuin, et al. 2009; Dewey, et al. 2010; Morey, et al. 2009; Pardoe, et al. 2009; Shen, et al. 2010; Tae, et al. 2008), amygdala (Dewey, et al. 2010; Morey, et al. 2009) and striatum (Dewey, et al. 2010). Dewey et al. (2010) performed a series of comparisons between the fully automated techniques of FreeSurfer and Individual Brain Atlases using Statistical Parametric Mapping (IBASPM) with auto-assisted manual tracings of the hippocampus, amygdala, putamen and caudate. The authors report that FreeSurfer segmentations exhibited significantly higher mean spatial overlap with auto-assisted tracings in all structures compared to IBASPM using Dice coefficients. We were not in a position to perform spatial overlap analyses of the thalamus given that stereology and FreeSurfer are two inherently distinct MR image analysis approaches: stereology yields a volume estimate from point counts and does not label voxels. However, this is one of the primary strengths of the data presented here, insomuch that a reliable volume estimate obtained using a gold-standard (non-voxel labelling) manual approach on MR images without automated spatial transformations (i.e. in native space) is comparable to a fully automated approach that requires spatial transformations in order to label an ROI and obtain a volume. Our interest was with respect to the reliability of the volume estimate of the thalamus.
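For reference, the Dice coefficient mentioned above is twice the number of voxels shared by two segmentations divided by the sum of their sizes. The sketch below is illustrative only: no such overlap analysis was performed in the present study, and the function name and voxel coordinates are hypothetical.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(seg_a: Iterable[Voxel], seg_b: Iterable[Voxel]) -> float:
    """Dice overlap between two sets of labelled voxel coordinates."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        # Two empty segmentations are treated as identical by convention here.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical voxel sets for a manual tracing and an automated label.
    manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    automated = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
    print(f"Dice = {dice_coefficient(manual, automated):.3f}")
```

The measure requires both methods to produce voxel labels in the same space, which is why it cannot be applied to a point-counting volume estimate.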
To our knowledge, the present study is the first to independently provide data validating the application of FreeSurfer to obtain automated volumes of the left and right thalamus. Based on the congruence between the data obtained from FreeSurfer and manual stereology (the latter of which is considered to represent the ‘gold standard’ approach due to the requirement of an expert anatomist), we recommend the use of FreeSurfer software for accurate volumetric quantification of the thalamus using high-resolution T1-weighted MRI. The removal of an expert anatomist for volumetric analyses is cost effective and time efficient, particularly in large-scale volumetric studies. Importantly, we demonstrate that the automated technique is as sensitive as stereology in detecting pathological alterations of the thalamus, which promotes the use of FreeSurfer in neurological contexts.
There are two additional issues that should be highlighted. Firstly, measurements made in the present study were of global thalamic volume. The thalamus is composed of lamellae that segregate multiple nuclei with distinct connections and functions, which are likely to be differentially affected in various neurological and neuropsychiatric disorders. For example, in disorders where the thalamus is implicated in patients also exhibiting deficits in frontal lobe functioning, such as in JME (Pulsipher, et al. 2009), it would be expected that anterior thalamic nuclei that project to the frontal lobe would be preferentially affected (Deppe, et al. 2008). In such circumstances it will be interesting to investigate structural alterations of different thalamic subregions, which are measures that the techniques applied in the present study cannot provide. There are other techniques that may provide the basis for quantitative measurements of thalamic subregions based on DTI and quantitative T1 and T2 imaging (Behrens, et al. 2003; Johansen-Berg, et al. 2005; Traynor, et al. 2011). Secondly, the global estimates of thalamic volume using FreeSurfer in the present study were obtained on a Philips Intera T30 3 T MRI system, requiring no additional manual edits for (obvious) incorrect labelling of the thalamic ROI after the application of our in-house image inhomogeneity and resampling algorithm. Different MRI systems and head coils may have different image contrast characteristics that can potentially affect the performance of automated MR image analysis techniques. However, reproducibility of FreeSurfer estimated thalamic volume from serially acquired MR images on the same MR system is high (Jovicich, et al. 2009; Morey, et al. 2010), and MR system manufacturer has been shown to have little effect on volume estimates (Jovicich, et al. 2009).
In summary, this study provides convincing evidence for the reliability of global thalamic measurements using FreeSurfer in healthy and damaged thalami. The use of this software is cost effective and particularly advantageous in large-scale cross-sectional studies and longitudinal investigations in neurological settings.
Information Sharing Statement
FreeSurfer software is publicly and freely available from the FreeSurferWiki resource (http://surfer.nmr.mgh.harvard.edu/fswiki/FreeSurferWiki), which is developed and maintained at the Martinos Center for Biomedical Imaging (http://www.nmr.mgh.harvard.edu/martinos/noFlashHome.php). All software, information and support are provided online at the FreeSurferWiki webpage. Easymeasure software for volume estimation using stereology is freely available from the authors of this manuscript upon request. Dr. Mike Puddephat developed Easymeasure software at the University of Liverpool, UK. Further information can be found at http://www.easymeasure.co.uk/.
This work was supported by the Transregional Collaborative Research Centre SFB/TR 3 (Project A8) of the Deutsche Forschungsgemeinschaft (DFG). EBR acknowledges support from the Neuromedical Foundation (Stiftung Neuromedizin), Münster. SM acknowledges support from the Wellcome Trust; open access to the paper was supported by the Wellcome Trust grant number 091593/Z/10/Z.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.