A Fast Method for the Segmentation of Synaptic Junctions and Mitochondria in Serial Electron Microscopic Images of the Brain
DOI: 10.1007/s12021-015-9288-z
- Cite this article as:
- Márquez Neila, P., Baumela, L., González-Soriano, J. et al. Neuroinform (2016) 14: 235. doi:10.1007/s12021-015-9288-z
Abstract
Recent electron microscopy (EM) imaging techniques permit the automatic acquisition of a large number of serial sections from brain samples. Manual segmentation of these images is tedious, time-consuming and requires a high degree of user expertise. Therefore, there is considerable interest in developing automatic segmentation methods. However, currently available methods are computationally demanding in terms of computer time and memory usage, and to work properly many of them require image stacks to be isotropic, that is, voxels must have the same size in the X, Y and Z axes. We present a method that works with anisotropic voxels and that is computationally efficient allowing the segmentation of large image stacks. Our approach involves anisotropy-aware regularization via conditional random field inference and surface smoothing techniques to improve the segmentation and visualization. We have focused on the segmentation of mitochondria and synaptic junctions in EM stacks from the cerebral cortex, and have compared the results to those obtained by other methods. Our method is faster than other methods with similar segmentation results. Our image regularization procedure introduces high-level knowledge about the structure of labels. We have also reduced memory requirements with the introduction of energy optimization in overlapping partitions, which permits the regularization of very large image stacks. Finally, the surface smoothing step improves the appearance of three-dimensional renderings of the segmented volumes.
Keywords
Three-dimensional electron microscopy, Automatic image segmentation, Cerebral cortex, Mitochondria, Synapses
Introduction
The availability of technologies such as combined Focused Ion Beam milling/Scanning Electron Microscopy (FIB/SEM) and Serial Block-Face Scanning Electron Microscopy (SBFSEM) for the study of biological tissues permits the automated acquisition of large numbers of serial sections from brain samples (see for example (Denk and Horstmann 2004; Knott et al. 2008; Merchan-Perez et al. 2009)). These three-dimensional samples contain invaluable structural information that must be extracted from the stack of serial images. Electron micrographs of nervous tissue typically show a large variety of structures, such as neuronal and glial processes with their corresponding cytoplasmic organelles (e.g., vesicles, tubules, filaments and mitochondria) and synapses. From a practical point of view, manual segmentation of these structures is a difficult and time-consuming task that requires a high degree of expertise. As a consequence, much effort has been devoted to the development of automated algorithms.
Brain images produced by electron microscopy (EM) are very complex and noisy with strong gray-level gradients that do not always correspond to region boundaries. Moreover, different neuronal structures may have similar local image appearance. Hence, it is extremely difficult to develop a fully automated segmentation algorithm. Although automated image processing techniques have addressed the problem of membrane detection and dendrite reconstruction (Turaga et al. 2010), standard computer vision algorithms used for the segmentation of textures (Haindl and Mikes 2008) or natural images (Martin et al. 2001) perform poorly, and standard techniques for the segmentation of biomedical images such as contour evolution (Jurrus et al. 2009) cannot handle the abundant image gradients.
Among the various structures visualized with EM, mitochondria and synaptic junctions are of particular interest to neuroscience. Indeed, most information in the mammalian nervous system flows through chemical synapses. Thus, the quantification and measurement of synapses is a major goal in the study of brain synaptic organization in both health and disease (DeFelipe 2010). Mitochondria are organelles that produce most of the cell’s supply of adenosine triphosphate (ATP), which transports chemical energy within cells for metabolism. In addition to supplying cellular energy, mitochondria are involved in many other crucial cellular physiological tasks (e.g., McBride et al. 2006) and their alterations have been associated with a number of diseases such as Alzheimer’s disease (e.g., Santos et al. 2010). Therefore, substantial effort has been put into developing methods for accurate segmentation of synapses and mitochondria in the brain.
Although there are good practical synapse segmentation approaches relying on semi-automated tools (Morales et al. 2011), recent research has focused on machine learning approaches to diminish the degree of user interaction. (Becker et al. 2013) introduced a synaptic junction segmentation approach specifically designed for isotropic resolution image stacks, that is, stacks where voxel dimensions were identical in all X, Y and Z-axes. This method is based on a boosting algorithm that discovers local context cues related to the presence of the synaptic junction. The local context around potential synapse-like regions is also used in (Jagadeesh et al. 2013). However, the approach of Jagadeesh et al. relies on a computationally demanding set of image features that require up to 12 hours of computing time in a 32-node cluster. An alternative way of detecting synapses is by selectively staining them with ethanolic phosphotungstic acid (Navlakha et al. 2013), although this obscures other subcellular details and the tissue preservation is not appropriate for detailed ultrastructural analysis. Finally, (Kreshuk et al. 2011) used the Ilastik toolbox to segment synaptic junctions.
Several algorithms have been specifically designed to segment mitochondria in EM images. A texton-based approach comparing K-NN, SVM and AdaBoost classifiers was proposed (Narasimha et al. 2009). Lucchi and colleagues (Lucchi et al. 2012) later introduced an algorithm using as input 3D supervoxels and assuming almost isotropic image stacks. A different approach has been presented by Giuly and colleagues (Giuly et al. 2012). Their method performs the segmentation of mitochondria in anisotropic stacks of images. However, it is computationally very expensive and requires long processing times.
Consequently, our aim was to develop a method that does not require isotropic voxels and that is computationally efficient to allow the interactive segmentation of large image stacks that are now available. Moreover, our approach also involves image regularization and surface smoothing techniques to improve the segmentation.
Material & Methods
Thus, our stacks had anisotropy factors ranging from 5.41 to 1.36. To make our method comparable to others, we used an additional stack of SBFSEM images available online with an anisotropy factor ρ = 5.
Description of the Segmentation Algorithm
Our automatic segmentation algorithm has three steps: feature extraction, voxel-wise classification and regularization. An optional fourth step, smoothing, enhances the visual appearance of the segmentation when it is rendered in 3D.
Feature Extraction
Feature extraction is performed on all voxels in the stack. The features of a voxel are a vector of real numbers that concisely describe the relevant visual information in the vicinity of that voxel. A feature extractor is a function from the space of EM stacks to the space of feature stacks. We have developed two feature extractors, F2D and F3D, which aggregate visual information around each voxel at several scales, and are rotationally invariant and robust to the noise present in EM images. F3D is a feature extractor that takes into account three-dimensional neighborhoods around each voxel. It is adequate for isotropic stacks. F2D, on the other hand, extracts a feature vector for each pixel in an image of the stack considering visual information of a neighborhood of the pixel in that slice and ignoring the information in other slices. F2D is a feature extractor that is suitable for anisotropic stacks. In the paragraphs that follow, we first describe F2D and then introduce F3D as a generalization.
The complete feature vector for each pixel is the concatenation of several partial feature vectors. We apply this procedure at n different scales {σ_{0},…, σ_{n−1}}, producing a feature vector with 4n components for each pixel in I. The set of scales should match the size of the structures that have to be detected in the images. In practice, the user only sets the initial scale σ_{0}, which we call the base scale, and the rest of scales are given by \(\sigma _{i} = 2^{\frac {1}{2}i}\sigma _{0}\). For example, if we use n = 4 scales and set the smallest scale to σ_{0} = 4 pixels, our feature vectors will have 16 dimensions and they will range from 4 to 11.31 pixels in scale.
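The scale rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `feature_scales` is hypothetical, and the factor of four components per scale is taken directly from the text.

```python
def feature_scales(base_scale, n_scales):
    """Return the scales sigma_i = 2**(i/2) * sigma_0 for i = 0 .. n-1."""
    return [2 ** (i / 2) * base_scale for i in range(n_scales)]


scales = feature_scales(base_scale=4.0, n_scales=4)
feature_dim = 4 * len(scales)  # four components per scale, as stated in the text

print([round(s, 2) for s in scales])  # [4.0, 5.66, 8.0, 11.31]
print(feature_dim)                    # 16
```

Each scale is larger than the previous one by a factor of √2, so every second scale doubles the size of the neighborhood.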
Classification
A classifier uses the feature vectors to determine the probability that a voxel belongs to each label. This classifier has to be trained with labeled data to learn the relationship between feature vectors and labels. Here we briefly present how the classifier is trained and how a trained classifier can be used with new unclassified data.
In the training step these parameters are estimated from training data. The user manually labels a few voxels of the stack. During training, the voxels labeled with label y are used to estimate μ_{y}, i.e., the mean of the feature vectors of voxels labeled as y, and Σ_{y}, i.e., the covariance matrix of these feature vectors.
When the dimension of the feature vectors is large, the training data often falls in a proper subspace of the complete k-dimensional feature space, producing a singular or near-singular covariance matrix Σ_{y}. We avoid this problem by first performing Principal Component Analysis (PCA)-based dimensionality reduction on the cloud of all feature vectors. The dimensionality after the PCA is chosen to retain 99 % of the variance.
P(y) is the a priori probability of the label y. It is learned from the user-provided data in the training step. In short, the training step consists of estimating the parameters μ_{y} and Σ_{y} of the conditional distribution (after a PCA-based dimensionality reduction) and the prior P(y) from the user-provided data with the interactive tool.
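The training and prediction steps described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`fit_pca`, `train`, `predict_proba`) and the synthetic data are hypothetical, and the class-independent constant of the Gaussian density is dropped because it cancels when the posterior is normalized.

```python
import numpy as np


def fit_pca(X, retained=0.99):
    """PCA on all feature vectors; keep enough components for `retained` variance."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, retained)) + 1, len(eigvals))
    return mean, eigvecs[:, :k]


def train(X, y, retained=0.99):
    """Estimate mu_y, Sigma_y and the prior P(y) in the PCA-reduced space."""
    mean, W = fit_pca(X, retained)
    Z = (X - mean) @ W
    params = {}
    for label in np.unique(y):
        Zl = Z[y == label]
        cov = np.atleast_2d(np.cov(Zl, rowvar=False))
        params[int(label)] = (Zl.mean(axis=0), cov, len(Zl) / len(Z))
    return mean, W, params


def predict_proba(X, model):
    """Posterior P(y | x) for each row of X under the Gaussian classifier."""
    mean, W, params = model
    Z = (X - mean) @ W
    labels = sorted(params)
    log_post = np.empty((len(Z), len(labels)))
    for j, label in enumerate(labels):
        mu, cov, prior = params[label]
        diff = Z - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        log_post[:, j] = -0.5 * (maha + logdet) + np.log(prior)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    return labels, post / post.sum(axis=1, keepdims=True)


# Synthetic stand-in for user-labeled voxels: 4-dimensional feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)),   # label 0 (background)
               rng.normal(3.0, 1.0, size=(50, 4))])   # label 1 (structure)
y = np.array([0] * 200 + [1] * 50)

model = train(X, y)
labels, P = predict_proba(np.array([[0.0] * 4, [3.0] * 4]), model)
print(labels, P.argmax(axis=1))
```

Training only requires a mean, a covariance matrix and a count per label, which is why it can be repeated quickly each time the user adds labeled voxels.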
In preliminary experiments, we tested other classifiers such as support vector machines. Although these methods improve on the results obtained with the Gaussian classifier, the improvement is only marginal and comes at the expense of much higher computation time (on the order of hours vs. seconds), which makes them unsuitable for operation in real time.
Regularization
If voxels are assumed to be independent of each other, it is possible to segment the stack by simply assigning to each voxel i the label y^{∗} with the highest probability, i.e., y^{∗} = argmax_{y}P(y∣x). However, this offers far from optimal results, since the resulting segmentation is noisy and shows many sparse pixels, grainy regions and small holes (Fig. 3c, d).
Finding the best segmentation with the probability distribution of the CRF, that models some degree of dependency among neighboring voxels, requires maximizing the probability P(Y∣x; θ) for a fixed observation x. This is equivalent to minimizing the energy E(Y).
There are two kinds of potentials in this graph. The first kind of potential is associated with the terms ϕ_{i}(y_{i}, x). We will call them unary terms, since they only depend on a single variable y_{i} for a fixed observation in the energy function. The second kind of potential is related to the terms ϕ_{ij}(y_{i}, y_{j}). In an analogous way, we will call them pair-wise terms, since they depend on pairs of label variables.
The unary and pair-wise terms have to be defined in such a way that they provide lower values for good, probable inputs. The unary terms ϕ_{i}(y_{i}, x) are responsible for introducing the observed data into the energy value. It is customary to define the unary terms as the minus logarithm of the probability that our trained classifier provides: ϕ_{i}(y_{i}, x) = −logP(y_{i}∣f_{i}(x)). This definition is justified since, in the absence of the pair-wise terms, the CRF would lead to the segmentation given by the classifier acting on each voxel separately.
The role of the pair-wise terms ϕ_{ij}(y_{i}, y_{j}) is twofold. First, they regularize the segmentation results by penalizing the change of labels between neighboring voxels. This prevents the occurrence of isolated pixels and small holes that could appear (Fig. 3c, d). Second, they serve to introduce some extent of high-order knowledge about the structure of the stack. For example, we could impose the condition that synaptic junctions and mitochondria cannot touch each other, by setting a very large penalty to that label change in our experiments.
Once we have defined our energy, we need to find the segmentation that minimizes the energy function, Y^{∗} = argmin_{Y}E(Y). This is in general an NP-hard optimization problem. However, it is known that when the terms are up to order two, i.e., there are only pair-wise (order 2) and unary (order 1) terms, the number of labels is two and the pair-wise terms are submodular, i.e., ϕ(y_{i}, y_{i}) + ϕ(y_{j}, y_{j}) ≤ ϕ(y_{i}, y_{j}) + ϕ(y_{j}, y_{i}), then a max-flow/min-cut algorithm finds the global optimum of the energy in polynomial time.
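The effect of the pair-wise terms can be illustrated on a toy problem. The sketch below is not the max-flow/min-cut algorithm: it minimizes the energy by brute force over a chain of six voxels, which is only feasible for tiny examples. The Potts-style penalty (a constant `theta` per label change) and the probability values are assumptions made for illustration.

```python
import itertools
import math


def energy(labels, probs, theta):
    """Unary terms -log P(y_i | x) plus a penalty theta for each label change."""
    unary = sum(-math.log(probs[i][y]) for i, y in enumerate(labels))
    pairwise = sum(theta for a, b in zip(labels, labels[1:]) if a != b)
    return unary + pairwise


def is_submodular(phi):
    """Check phi(0,0) + phi(1,1) <= phi(0,1) + phi(1,0) for a binary pair-wise term."""
    return phi[0][0] + phi[1][1] <= phi[0][1] + phi[1][0]


# Hypothetical classifier output (P(y=0), P(y=1)) for six voxels in a row.
probs = [(0.9, 0.1), (0.8, 0.2), (0.4, 0.6), (0.9, 0.1), (0.2, 0.8), (0.1, 0.9)]
theta = 1.0

# Independent labeling: each voxel takes its most probable label.
independent = [max((0, 1), key=lambda y: p[y]) for p in probs]

# Regularized labeling: global minimum of the energy over all 2**6 labelings.
regularized = min(itertools.product((0, 1), repeat=len(probs)),
                  key=lambda Y: energy(Y, probs, theta))

print(independent)        # [0, 0, 1, 0, 1, 1]
print(list(regularized))  # [0, 0, 0, 0, 1, 1]
print(is_submodular([[0.0, theta], [theta, 0.0]]))  # True
```

The isolated label at the third voxel is weakly supported by the classifier (probability 0.6), so the two label changes it would require cost more than accepting the slightly less probable label.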
The graph-cut techniques needed for regularization require a considerable amount of computer memory. For a reasonably sized stack, the required memory usage usually becomes too big. Therefore we need to regularize parts of the full stack separately and merge them together at the end.
A simple approach is to divide the stack into disjoint (i.e., non-overlapping) substacks and regularize them separately. This method works well for the inner areas of each substack, but it produces artifacts at their boundaries, since the CRF does not have enough context there to determine the correct labels. This is visually noticeable as abrupt terminations of mitochondria and synaptic junctions at these boundaries.
Determining the optimal size of the margin is a problem beyond the scope of this paper. However, we have found that a margin of 10 voxels in each direction offers very good results in practice.
Finally, the size of the substacks is limited by the available memory. As a rule of thumb, the regularization process takes 5 or 6 times the memory used by the original substack being regularized.
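A possible way to lay out overlapping partitions along one axis is sketched below. It assumes that each substack is regularized together with its margin and that only the inner (core) region is kept when merging; the function names, the block size of 256 and the one-byte-per-voxel figure are illustrative assumptions. The memory estimate simply applies the rule of thumb above with the upper factor of 6.

```python
def overlapping_blocks(length, block, margin):
    """Split [0, length) into cores of size `block`, each extended by `margin`.

    Returns (core_start, core_stop, ext_start, ext_stop) tuples, where the
    extended range is what gets regularized and the core is what is kept.
    """
    blocks = []
    for start in range(0, length, block):
        stop = min(start + block, length)
        blocks.append((start, stop,
                       max(start - margin, 0), min(stop + margin, length)))
    return blocks


def regularization_memory(shape, bytes_per_voxel=1, factor=6):
    """Rule-of-thumb memory (bytes) needed to regularize a substack."""
    voxels = 1
    for n in shape:
        voxels *= n
    return factor * voxels * bytes_per_voxel


for core_a, core_b, ext_a, ext_b in overlapping_blocks(700, block=256, margin=10):
    print(f"regularize [{ext_a}, {ext_b}) -> keep [{core_a}, {core_b})")

print(regularization_memory((256, 256, 50)))  # 19660800 bytes
```

Applying the same split to each axis yields the three-dimensional substacks; the block size would be chosen so that the estimated memory fits the available RAM.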
Segmentation Smoothing
Note that this smoothing only affects the estimated surfaces and, therefore, the rendering of these surfaces. The segmentation volume and the numerical results extracted from it are not affected by this procedure. Therefore, the quantitative comparisons offered in the Section “Results” are computed with no smoothing.
Results
As explained in the previous section, we conceived our algorithm to be used interactively. However, to evaluate its performance we have used two datasets that have been fully segmented manually. We need these manual segmentations as ground-truth data to validate our results and compare them to others. Moreover, given that we have enough training data, we use them to find optimum values for the base scale σ_{0} and the regularization parameter θ_{XY}.
- True positive rate (TPR): $$ \text{TPR} = \frac{\text{TP}}{\text{TP}+\text{FN}} \tag{20} $$
- False positive rate (FPR): $$ \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} \tag{21} $$
- Accuracy (ACC): $$ \text{ACC} = \frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}} \tag{22} $$
- Jaccard index (JAC): $$ \text{JAC} = \frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}} \tag{23} $$
- Volume error (VOE): $$ \text{VOE} = \frac{\left|\text{FP} - \text{FN}\right|}{\text{TP} + \text{FN}} \tag{24} $$
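The five metrics can be computed directly from the voxel-wise confusion counts. The following sketch uses hypothetical function names and a made-up eight-voxel example; it covers the binary case (one label against the rest).

```python
def confusion_counts(predicted, truth):
    """Return (TP, FP, TN, FN) for two equally long sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    return tp, fp, tn, fn


def segmentation_metrics(tp, fp, tn, fn):
    """Metrics of Eqs. 20-24 from the confusion counts."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "JAC": tp / (tp + fp + fn),
        "VOE": abs(fp - fn) / (tp + fn),
    }


predicted = [1, 1, 0, 0, 1, 0, 0, 0]
truth     = [1, 0, 0, 0, 1, 1, 0, 0]

counts = confusion_counts(predicted, truth)
metrics = segmentation_metrics(*counts)
print(counts)   # (2, 1, 4, 1)
print(metrics)
```

In this example one false positive and one false negative cancel out, so VOE is 0 even though the Jaccard index is only 0.5. VOE measures the error in total segmented volume, not the voxel-wise overlap.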
Unless otherwise stated, all running times were obtained on a Linux system with an Intel Xeon at 2.40GHz with no GPU processing. Our algorithm is mostly implemented in Python using the NumPy library. The inner parts of the CRF regularization are written in C++. Our implementation runs in a single thread.
Mitochondria Segmentation
We used a stack of serial EM images from the mouse cerebellum to test our method for the segmentation of mitochondria. This stack is available online in the Cell Centered Database^{1} with ID 8192. We have selected this stack to make our method comparable with Cytoseg, an automatic segmentation tool proposed by (Giuly et al. 2012). These researchers provide the raw images as well as a manual segmentation of mitochondria at the Cytoseg web page.^{2} The stack has a size of 700×700×50 voxels and a voxel size of 10×10×50 nm (anisotropy factor ρ = 5). We have applied our method to automatically detect the mitochondria in this stack.
From the results of the cross-validation process, we choose σ_{0} = 6 as it offers a good trade-off between the considered metrics. For the regularization parameter θ_{XY}, we select two different values: 10 and 20. The parameter \(\theta _{Z}=\frac {\theta _{XY}}{\rho }\) is set to 2 and 4, respectively.
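As a quick check of the relation between the two regularization parameters, the values quoted above follow from dividing θ_{XY} by the anisotropy factor. The helper name `theta_z` is hypothetical. The last line pairs θ_{XY} = 4 with ρ = 1.36, the lower end of the anisotropy range given in Material & Methods; associating that factor with a particular stack is an assumption.

```python
def theta_z(theta_xy, rho):
    """Regularization weight along Z for an anisotropy factor rho."""
    return theta_xy / rho


print(theta_z(10, 5))               # 2.0
print(theta_z(20, 5))               # 4.0
print(round(theta_z(4, 1.36), 2))   # 2.94
```

Dividing by ρ weakens the penalty between voxels in adjacent slices, which are physically farther apart than neighbors within a slice.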
Method | TPR | FPR | ACC | JAC | VOE |
---|---|---|---|---|---|
Ours, σ_{0} = 6; θ_{XY} = 10; θ_{Z} = 2 | 0.78 | 0.018 | 0.96 | 0.68 | 8.26 % |
Ours, σ_{0} = 6; θ_{XY} = 20; θ_{Z} = 4 | 0.81 | 0.024 | 0.96 | 0.67 | 2.51 % |
Cytoseg | 0.80 | 0.02 | 0.97 | N/A | N/A |
Ilastik | 0.77 | 0.02 | 0.96 | 0.66 | 5.45 % |
We also applied the software Ilastik ((Sommer et al. 2011), www.ilastik.org) to segment mitochondria in this dataset. The quantitative results obtained in Ilastik (see Table 1) are comparable to those of the other methods. However, Ilastik took 56.5 minutes for processing the full stack using 8 threads, resulting in a total of 452 minutes of CPU. This is about 500 times slower than our method.
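The CPU-time figures above can be reproduced with simple arithmetic. The sketch assumes that the normalized running time of 2.15 s·CPU per megavoxel reported for our method in the running time comparison applies to this stack; the variable names are illustrative.

```python
megavoxels = 700 * 700 * 50 / 1e6      # stack size: 24.5 megavoxels

ilastik_cpu_min = 56.5 * 8             # wall-clock minutes x threads
ours_cpu_s = 2.15 * megavoxels         # assumed normalized time x stack size

ratio = ilastik_cpu_min * 60 / ours_cpu_s

print(ilastik_cpu_min)       # 452.0 minutes of CPU
print(round(ours_cpu_s, 1))  # about 52.7 seconds of CPU
print(round(ratio))          # about 515, i.e. roughly 500 times slower
```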
Other methods for mitochondria segmentation are even less suitable for large anisotropies. Supervoxel segmentation with learned shape features (Lucchi et al. 2012) aims to learn non-local shapes of the target objects to segment. It uses a combination of supervoxels, 3D ray features and structured prediction. 3D ray features are especially affected by anisotropy, since both the edge detector and the length of the rays are highly dependent on the orientation. The achievable segmentation accuracy (i.e., the highest accuracy that can be achieved using supervoxels assuming perfect classification of each supervoxel) drops significantly with anisotropy. Moreover, the structured prediction requires training with a large portion of the stack fully labeled in order to infer the terms of the pairwise interaction. As a consequence of these factors, the method from (Lucchi et al. 2012) required more training data (half of the stack) to work properly, and provided rather unsatisfactory results with low Jaccard indices (<0.48). The running times were also higher (>21 minutes), mainly due to the cost of extracting the ray features.
Mitochondria and Synaptic Junctions Segmentation
Quantitative results for the simultaneous segmentation of mitochondria and synaptic junctions
Method | TPR | FPR | ACC | JAC | VOE |
---|---|---|---|---|---|
Ours, mit-vs-rest, σ_{0} = 1, θ_{XY} = 4, θ_{Z} = 2.94 | 0.84 | 0.02 | 0.97 | 0.60 | 15.72 % |
Ours, syn-vs-rest, σ_{0} = 1, θ_{XY} = 4, θ_{Z} = 2.94 | 0.60 | 0.004 | 0.99 | 0.26 | 87.17 % |
Ours, mit-vs-rest, σ_{0} = 2, θ_{XY} = 4, θ_{Z} = 2.94 | 0.78 | 0.014 | 0.97 | 0.63 | 0.01 % |
Ours, syn-vs-rest, σ_{0} = 2, θ_{XY} = 4, θ_{Z} = 2.94 | 0.42 | 0.004 | 0.99 | 0.23 | 24.00 % |
Ilastik, mit-vs-rest | 0.68 | 0.009 | 0.97 | 0.61 | 21.58 % |
Ilastik, syn-vs-rest | 0.36 | 0.002 | 0.99 | 0.29 | 41.43 % |
Running Time Comparison
Absolute and normalized running times for different methods. Absolute times are given in seconds of CPU (s ⋅CPU), and normalized times (in parentheses) are given in seconds of CPU per megavoxel \(\left (\frac {\mathrm {s}\cdot \textrm {CPU}}{\text {Megavoxel}}\right )\)
There is an important difference in the normalized running times of our method between the two datasets (2.15 vs. 15.9 s·CPU per megavoxel). This large difference is due to the regularization with >2 labels, where a single graph cut is not applicable and slower iterative algorithms such as αβ-swap are required.
Counting Structures
Estimating the number of structures from the results of an automatic segmentation process is still an open problem with plenty of ongoing research. As an approximate, simple solution, it is commonly assumed that each connected component of the segmentation is one structure. This is the approach we use. Despite its simplicity, it has several drawbacks, namely, a group of structures close to each other often merge in a single connected component, and large structures are sometimes split into two or more connected components. Also, when spatial regularization is not present, false positive detections result in many small connected components that bias the counting estimations. To alleviate these problems, we discard the connected components smaller than a given threshold. Setting the threshold is not trivial, as it might greatly affect the counts depending on the quality of the segmentation. A good segmentation is expected to be more robust to different thresholds than a bad one, i.e., estimations should be close to the real value and should be stable for a large range of thresholds.
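The counting procedure can be sketched in pure Python as follows. The choice of 6-connectivity, the function names and the toy segmentation are assumptions for illustration; the text does not specify the connectivity used.

```python
from collections import deque

# Face neighbors in 3D (6-connectivity).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def component_sizes(voxels):
    """Sizes of the 6-connected components of a set of (z, y, x) voxels."""
    remaining = set(voxels)
    sizes = []
    while remaining:
        queue = deque([remaining.pop()])
        size = 0
        while queue:
            z, y, x = queue.popleft()
            size += 1
            for dz, dy, dx in NEIGHBORS:
                n = (z + dz, y + dy, x + dx)
                if n in remaining:
                    remaining.remove(n)
                    queue.append(n)
        sizes.append(size)
    return sizes


def count_structures(voxels, min_size):
    """Number of connected components with at least `min_size` voxels."""
    return sum(1 for s in component_sizes(voxels) if s >= min_size)


def mean_absolute_count_error(voxels, true_count, thresholds):
    """Average |estimated - true| count over a range of size thresholds."""
    sizes = component_sizes(voxels)
    errors = [abs(sum(1 for s in sizes if s >= t) - true_count)
              for t in thresholds]
    return sum(errors) / len(errors)


# Toy segmentation: a 2x2x2 cube, a 3-voxel line and one isolated voxel.
cube = {(z, y, x) for z in range(2) for y in range(2) for x in range(2)}
line = {(5, 5, x) for x in range(3)}
noise = {(9, 9, 9)}
segmentation = cube | line | noise

print(sorted(component_sizes(segmentation)))         # [1, 3, 8]
print(count_structures(segmentation, min_size=2))    # 2
print(mean_absolute_count_error(segmentation, 2, [1, 2, 4]))
```

With a true count of 2, the estimate is correct only for the threshold of 2 voxels: a lower threshold counts the isolated voxel, and a higher one discards the short line. A segmentation whose estimate stays close to the true value over a wide range of thresholds is the more robust one.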
Average absolute error of estimations of the number of mitochondria and synaptic junctions over all thresholds in the range [10,2000] voxels
Method | CCDB-8192 | Mit&Syn, Mit | Mit&Syn, Syn |
---|---|---|---|
Ilastik (Sommer et al. 2011) | 12.12 | 68.44 | 166.17 |
Ours | 10.71 | 54.97 | 161.76 |
Discussion
Concerning the segmentation of mitochondria, Lucchi and colleagues (Lucchi et al. 2012) have recently used ray descriptors and the gray-level histogram as the key features to classify 3D image supervoxels. The result of this classification is further regularized using graph cuts to minimize an energy function involving learned potentials. They used stacks of FIB/SEM images from the hippocampus and striatum that had isotropic resolution. In their method, isotropy is an essential requirement for the computation of the 3D supervoxel over-segmentation. Alternatively, Giuly and colleagues (Giuly et al. 2012) segment mitochondria in anisotropic stacks of images obtained by SBFSEM. They use a random forest classifier to label 2D image patches. The result of this initial segmentation is further refined using 2D contour classification across images and 3D level-set surface evolution. Their method, however, is computationally intensive, requiring long processing times.
Regarding synapses, the popular Ilastik toolbox (Sommer et al. 2011) used by (Kreshuk et al. 2011) to segment synaptic junctions uses a random forest classifier with a set of differential image features. They use a simple regularization strategy based on Gaussian smoothing. Overall, the resulting algorithm is also very demanding in terms of computing power.
Our method does not require isotropic voxels, so it can be applied to image stacks that have been acquired with different resolutions in the X, Y and Z axes. The results obtained with our method were similar to or better than those obtained with the Cytoseg process (Giuly et al. 2012) for mitochondria only, and those obtained with Ilastik for both mitochondria only and simultaneous mitochondria and synaptic junctions. Other approaches such as the one from (Lucchi et al. 2012) are not ready to work with anisotropic stacks and therefore our method outperforms them. Unlike Cytoseg, which focuses on mitochondria segmentation, our method is not tied to a specific type of cellular structure but can be used to segment a variety of structures. When compared to Ilastik, we obtained better visual results thanks to the regularization and surface smoothing techniques described above.
Moreover, our method is much faster than any other approach we have tried. The speed-up comes from the Gaussian classifier, which can be trained in O(Nk^{2} + k^{3}), where N is the number of data points and k is the dimension of the feature space. For comparison, the complexity of training random forests is O(MNkd), where M is the number of trees and d is the average depth of the trees. We found in our experiments that the classifier was the main bottleneck of the Ilastik approach. In our approach the most expensive computation was the regularization step, which Ilastik omits. On the other hand, we found no significant difference in speed for feature extraction, which took only a small fraction of the total processing time in all compared methods.
For the case of segmentation of 2 labels, a speed of 2.15 seconds per megavoxel in a single thread is fast enough to enable interactive segmentation of the large image stacks that are now available, providing real-time feedback to the user. Of course, parallelization of the proposed approach is straightforward, and it would make it even faster. To our knowledge, no other previous work provides state-of-the-art performance while running in an interactive setting.
Conclusions
We have presented an algorithm that can be trained to segment a variety of structures in anisotropic EM stacks. In this work we have focused on its capabilities for the segmentation of synaptic junctions and mitochondria. It features some important properties that are not available in other methods in the literature. It uses a graph cut-based image regularization procedure that not only provides better segmentations, but also introduces high-level knowledge about the structure of labels. We have solved the limitation of graph cuts in terms of memory requirements with the introduction of energy optimization in overlapping partitions. This allows the regularization of very large stacks. The surface smoothing step introduces smoothness priors on the segmentation that improve the appearance of three-dimensional renderings of the segmented volumes. Finally, and most importantly, we have also shown that our approach is much faster than any other competing method with a state-of-the-art quantitative segmentation performance.
Information Sharing Statement
The automatic segmentation method described in this paper is available as a plugin of the image processing software Espina. The software and instructions for installing it can be found at http://cajalbbp.cesvima.upm.es/espina.
This software provides an efficient multi-thread implementation of the presented algorithm together with an intuitive user interface. After activating the Automatic Segmentation plugin, the user has to segment a few voxels of the target objects manually and receives almost real-time feedback of the results. Additional manual segmentations can be performed until the user is satisfied with the final results. Quantitative data regarding the segmented objects are then obtained with standard Espina tools.
Acknowledgements
This work was supported by funding from the Spanish Ministry of Economy and Competitiveness (grants TIN2013-47630-C2-2-R to L.B. and BFU2012-34963 to J.DF.), CIBERNED (CB06/05/0066 to J.DF.), the Cajal Blue Brain Project, Spanish partner of the Blue Brain Project initiative from EPFL (to J.DF. and L.B.) and the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project) to J.DF.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.