1 Introduction

Studying the structure and function of the brain requires dedicated imaging techniques, allowing to map the highly complex nerve fiber architecture both with high resolution and over long distances. The neuroimaging technique Three-dimensional Polarized Light Imaging (3D-PLI) [1, 2] was designed to reconstruct the three-dimensional orientations of nerve fibers in whole brain sections with micrometer resolution.

To remove noise and artifacts in 3D-PLI images, Independent Component Analysis (ICA) has successfully been used [10,11,12]. However, the ICA has only been applied to mesoscopic images with a resolution of 64 \(\upmu \)m pixel size and not to microscopic images with a resolution of 1.33 \(\upmu \)m pixel size so far. In order to resolve single nerve fibers, e.g. in the cerebral cortex, such a microscopic resolution is required. Light scattering, thermal effects, inhomogeneity of optical elements, or simply dust on the used filters are noise sources, which combined with the weak birefringence 3D-PLI signal in cortical areas inevitably lead to a low signal-to-noise ratio (SNR) and reconstruction errors.

Identifying and removing these noise components in microscopic 3D-PLI images is very challenging. The amount of data that has to be processed is extremely large and the sampling has to be done differently as compared to mesoscopic images. When applying the developed ICA method on microscopic images, the characteristic differences of the signal strengths in white matter and gray matter brain regions need to be taken into account. As the birefringence 3D-PLI signal of densly packed nerve fibers (i.e., fiber bundles) of the white matter proceeding within the sectioning plane are very strong and show a higher SNR than the less dense fiber tracts present in the gray matter, the denoising procedure needs only to be applied in regions of gray matter, which massively reduces the required computing time.

Here, we present an automatic ICA denoising procedure for gray matter areas in microscopic 3D-PLI images. It consists of an automatic segmentation of gray matter, followed by a data-parallel ICA artifact removal with automatic classification of noise and signal activations.

2 Methods

2.1 Preparation of Brain Sections

Brain sections from a Wistar rat (3 months old, male) and a vervet monkey (2.4 years old, male) were selected for evaluation.Footnote 1 The brains were removed from the skull within 24 h after death, fixed in a buffered solution of 4 % formaldehyde for several weeks, cryo-protected with 2% DMSO and a solution of 20% glycerin, deeply frozen, and cut along the coronal plane into sections of 60 \(\upmu \)m with a cryostat microtome (Polycut CM 3500, Leica, Microsystems, Germany). The resulting brain sections were mounted onto a glass slide each, embedded in a solution of 20% glycerin, cover-slipped, sealed with lacquer, and measured with 3D-PLI up to one day afterwards.

2.2 Three-Dimensional Polarized Light Imaging (3D-PLI)

3D-PLI reconstructs the nerve fiber architecture of the brain with micrometer resolution. By transmitting linearly polarized light through unstained histological brain sections and analyzing the transmitted light with a circular analyzer, the birefringence of the brain section is measured, thus providing information about the three-dimensional orientations of the highly birefringent nerve fibers (myelinated axons) in the tissue [1, 2]. The 3D-PLI measurements were performed with the same setup as described in [23] (LMP-1, Taorad GmbH, Germany), using incoherent green light with a wavelength of about 550 nm. During the measurement, the direction of polarization of the incoming light was rotated by \(\rho = \{0^{\circ },10^{\circ }, \dots , 170^{\circ }\}\) and the transmitted light behind the circular analyzer was recorded by a CCD camera (Qimaging Retiga 4000R) for each rotation angle, yielding a series of \(N=18\) images. The pixel size in object space was about 1.33 \(\upmu \)m. For each image pixel, the measured intensity values form a sinusoidal light intensity profile (PLI-signal, \(I(\rho )\)). The average value of the signal, i. e. the polarization-independent transmitted light intensity, is called transmittance and is a measure for tissue absorption and scattering (highly scattering tissue components such as nerve fibers appear dark in the transmittance image). The amplitude of the normalized signal is called retardation and indicates the strength of birefringence of the tissue. It is related to the out-of-plane angles of the nerve fibers in the brain section (in-plane nerve fibers show very high birefringence, while out-of-plane nerve fibers show much less [22]). The phase of the signal indicates the in-plane direction angle of the nerve fibers. Combining in-plane and out-of-plane angles, 3D-PLI allows to reconstruct the full three-dimensional orientations of the nerve fibers.

2.3 Segmentation of White and Gray Matter

Morphologically, brain tissue consists of two different tissue types: Gray matter and white matter. Gray matter contains various components, such as neuronal cell bodies, dendrites, synapses, glial cells, blood capillaries as well as myelinated and unmyelinated axons. Most of the gray matter regions are located at the outer surface of the brain (cortex), but also inner parts of the brain (i.e., sub-cortical nuclei) contain islands of gray matter. White matter is mainly composed of myelinated and unmyelinated axons. The largest portion of myelinated axons is located in the white matter.

For the ICA method presented here, it is necessary to consider gray matter regions separately from white matter regions.Footnote 2 In the following, we present a fully automated procedure to generate masks of white and gray matter.

As nerve fibers (myelinated axons) are highly birefringent, all regions with high birefringence signals (i. e. large retardation values \(r > r_{\text {thres}}\) in the 3D-PLI measurement, cf. Fig. 1(a) in orange) can be considered as white matter. (The determination of threshold values such as \(r_{\text {thres}}\) will be described below.) On the other hand, regions with low birefringence signals (\(r < r_{\text {thres}}\), Fig. 1(a) in blue) do not necessarily belong to gray matter, because regions with a small number of myelinated fibers, crossing nerve fibers, and nerve fibers that point out of the brain section (out-of-plane fibers) also yield low birefringence signals [22].

Studies by Menzel et al. [20] have shown that regions with crossing nerve fibers and regions with in-plane parallel nerve fibers yield similar transmittance values \(I_{\text {T}}\), while regions with out-of-plane nerve fibers show lower transmittance values. Gray matter regions, on the other hand, show notably higher transmittance values. Hence, we can use the transmittance value in the region with maximum retardation \(I_{\text {rmax}}\) as a reference value (we expect that this region contains mostly in-plane parallel nerve fibers) and can then define all regions with similar or lower transmittance values as white matter (\(0 < I_{\text {T}} \lesssim I_{\text {rmax}}\), containing crossing and out-of-plane fibers). All other brain regions are considered to be gray matter. To separate brain tissue from background, we make use of the fact that the transmittance in the recorded images is expected to be much higher outside of the tissue than within the tissue (\(I_{\text {T}} > I_{\text {lower}}\), cf. Fig. 1(b) in gray).

To enable an automated segmentation into white and gray matter regions, we consider the retardation and transmittance histograms (consisting of 128 bins, see Fig. 1 on top). Before computing the transmittance histogram, the values are normalized to [0, 1] and a median filter with circular kernel (radius of 10 pixels) is applied to the image to reduce noise. While the retardation histogram shows usually only a single peak at very low retardation values (caused by background and gray matter), the transmittance histogram shows one peak for low transmittance values (white matter), another peak for larger transmittance values (gray matter), and a third peak for high transmittance values (background).

To compute the threshold value \(r_{\text {thres}}\) (\(I_{\text {upper}}\)), we determine the point of maximum curvature behind (before) the biggest peak in the retardation (transmittance) histogram, i. e. the position for which the angle difference between two neighboring data points becomes minimal. To ensure that the point of maximum curvature belongs to the onset of the biggest peak (and not to some other peak or outlier), we take the full width at half maximum (FWHM) of the peak into account and only search within \(10\times \) FWHM behind the retardation peak and \(2.5\times \) FWHM before the transmittance peak, taking the different forms of the histograms into account.

To compute \(I_{\text {rmax}}\), the region with the maximum retardation value is determined. To ensure that the region belongs to a white matter region and is not an outlier caused by noise, we use the Connected Components algorithm from the OpenCV library [4] (block-based binary algorithm using binary decision trees [6]) with eightfold connectivity. We mark all pixels with maximum retardation value and count the number of pixels in the largest connected region. If the number is at least 512, we select this region as reference. If the number is lower, we reduce the maximum retardation value iteratively by 0.01, until we find such a region. In this reference region, we compute the average value in the normalized transmittance image (\(I_{\text {rmax}}\), see red vertical line in Fig. 1(b)). This value can be used as first estimate to separate white from gray matter. To define the border more precisely, we use the point of maximum curvature between \(I_{\text {rmax}}\) and \(I_{\text {upper}}\) as new threshold value \(I_{\text {lower}}\).

Taking all this into account, we can compute the masks for white and gray matter as follows:

$$\begin{aligned} \text {White Matter:} \ \&(0< I_{\text {T}} < I_{\text {lower}}) \vee (r > r_{\text {thres}}), \end{aligned}$$
(1)
$$\begin{aligned} \text {Gray Matter:} \ \&(I_{\text {lower}} \le I_{\text {T}} \le I_{\text {upper}}) \wedge (r \le r_{\text {thres}}). \end{aligned}$$
(2)

All image pixels that fulfill Eq. 1 (Eq. 2) are considered as white (gray) matter, see Fig. 1(b) in white. All other image pixels are considered as background.

Fig. 1.
figure 1

Mask generation for white and gray matter, shown exemplary for a coronal vervet monkey brain section. (a,b) Top: Histograms of retardation and transmittance images obtained from a 3D-PLI measurement (128 bins in [0,1]); the determined threshold values are marked by vertical dashed lines. Bottom: Image pixels with values belonging to the orange, blue, and gray shaded regions in the histograms are marked in the respective colors. (c) Masks for white and gray matter, computed using the threshold values defined in the histograms on top. (Color figure online)

2.4 Independent Component Analysis (ICA)

The ICA belongs to the group of Blind Source Separation (BSS) techniques and can be used for data decomposition to find statistically independent components in a mixture of signals [17]. ICA has been applied to various artifact removal tasks, e. g. ocular artifact removal in electroencephalography [13], cardiac artifact removal in magnetoencephalography [5], and noise-signal-discrimination in functional magnetic resonance imaging [21].

In 3D-PLI, a data set consists of a series of N images (one for each rotation angle). To avoid that the background interferes with the decomposition, each image is divided into background and our region of interest (ROI) with the latter containing M pixels. The measurements are flattened and centered to obtain a zero-mean data array X with the dimension \(N \times M\). The decomposition into sources S with the shape \(N \times M\) requires that the data can be represented as a linear mixture of independent signals without additional additive noise, that there exist sufficient samples for every extracted feature (general advise is to keep \(k\cdot N^2 \ge M\) with \(k \in \left\{ 1, 2, 3,\dots \right\} \)), and that the distribution of the sources is non-Gaussian. With these prerequisites, the problem can be stated as

$$\begin{aligned} X = AS, \end{aligned}$$
(3)

where A is the so-called mixing matrix with the dimension \(N \times N\) and is yet unknown. Because S and A are both unknown, it is impossible to make a prediction about sign or amplitude of the basis vectors of A. Furthermore, we have no knowledge about the number of components in our data set, so we assume that the complexity of the data can be mapped by N features.

Prior to performing the ICA, the data array X is whitened by making use of a Principle Component Analysis (PCA) [24] to lower the degrees of freedom to \(N(N-1)/2\) [17]. The ICA then estimates \(W \approx A^{-1}\) by maximizing the entropy as in Infomax-based ICA [3] or by maximizing a measure of non-Gaussianity as in FastICA [15, 16]. We then obtain

$$\begin{aligned} WX = C \approx S, \end{aligned}$$
(4)

with the component vector C. We find the activation profiles of the Components in \(W^{-1}\) as basis vectors. It was shown that 3D-PLI signals contain sub- as well as super-Gaussian independent components, therefore FastICA or Extended Infomax [19], an extension of the Infomax algorithm, can be used. This work uses the Extended-Infomax Implementation of the MNE-Toolbox [14].

2.5 Automatic Noise Removal with ICA

The activation profiles given by the ICA can be distinguished into two categories: noise activation profile and signal activation profile. Because we know the PLI-signal shape from theory, we know the shape of the basis vectors we are looking for in our mixing matrix \(W^{-1}\approx A\). A simple classification problem is visualized in Fig. 2. The sinusoidal shaped activations are the ones to keep and we want to drop the activations that resemble random distributions.

The automatic identification is realized by fitting the expected (theoretical) function to each of the N activations. As identification measure, the mean squared error (MSE) is calculated for every fit and compared to the mean of all MSE values. When the MSE of the i-th fit is smaller than \(\nicefrac {1}{10}\) of the mean of all MSE values, we assume that the activation belongs to a signal component. Otherwise, we assume that the activation belongs to a noise component.

Fig. 2.
figure 2

Archetypal signal discrimination for ICA artifact removal: (a) sinusoidal PLI-signal activation, (b) random PLI-noise activation from e. g. thermal noise or light scattering.

After the detection of all noise activations, we construct a denoised mixing matrix \(W_{d}^{-1} \approx A_{d}\) by just zeroing out the respective column:

(5)

The denoised data array \(X_d\) can then be obtained by remixing the previously unmixed components:

$$\begin{aligned} W_{d}^{-1} C = W_{d}^{-1} W X = X_d. \end{aligned}$$
(6)

Estimation of Signal Enhancement. The noise reduction of the artifact removal is measured with a weighted chi-square statistic introduced by [10]. It is based on the reduced chi-squared statistic defined by

$$\begin{aligned} \chi ^{2} = \frac{1}{\nu } \sum _{i=1}^{N} \frac{(I(\rho _{i}) - f(\rho _i))^{2}}{\sigma (x,y,\rho )^{2}}, \end{aligned}$$
(7)

with \(\nu \) for the degrees of freedom, N the number samples (here measurement angles), \(I(\rho _i)\) the measured light intensity for an angle, \(f(\rho _i)\) the expected function, and \(\sigma (x,y,\rho )\) the standard error for every image point and angle. This statistic is applied to both noisy and denoised data leading to \(\chi ^{2}_{\text {raw}}\) and \(\chi ^{2}_{\text {ICA}}\). We denote the quotient of these two measures by relative Goodness of Fit (rGoF). In case one component is missing and the denoised signal is inherently different, an additional weighting factor \(\omega \) defined by

$$\begin{aligned} \omega = \frac{1}{\nu } \sum _{i=1}^{N} \frac{(f(\rho _{i}) - f^{*}(\rho _i))^{2}}{\sigma (x,y,\rho )^{2}} \end{aligned}$$
(8)

is included in the denominator, which penalizes large deviations between the expected function of the noisy signal \(f(\rho _i)\) and the expected function of the denoised signal \(f^{*}(\rho _i)\). The so obtained measure is denoted by weighted relative Goodness of Fit (wrGoF):

$$\begin{aligned} \text {wrGoF} = \frac{\text {rGoF}}{\omega } = \frac{\chi ^{2}_{\text {raw}}}{\chi ^{2}_{\text {ICA}} \cdot \omega }, \end{aligned}$$
(9)

where wrGoF \(> 1\) is associated with a signal improvement while wrGoF \(< 1\) is associated with a signal degradation.

Parallelization Concept. The parallelization is implemented by distributing the workload of the ICA problem equally in N parts to N workers via the Message Passing Interface (MPI) with mpi4py [7,8,9]. Every n-th element is sent to the n-th worker. After converging the result of all workers is collected and fused to the end result.

3 Results

The denoising procedure was applied to 22 brain sections (14 coronal rat brain sections and 9 coronal vervet brain sections). Every section was masked with a gray matter mask (as described in Subsect. 2.3) to remove background and white matter areas. In all cases, three components of interest were found, each with a sinusoidal activation function. The signal activations were kept and the noise activations were automatically removed. The resulting components and activations are shown exemplary for rat brain section no. 100 in Fig. 3.

The amount of wrGoF values are in all sections greater than one (\({>}99.9\%\)) and mostly greater than ten (\({>}99\%\)). A spatial distribution of the values is shown in Fig. 4 for the rat brain section (right) and a vervet brain section (left). In Fig. 5, three selected intensity profiles are shown for the rat brain section. Each individual profile shows an improvement and the denoised profile describes the measurement in a smooth way and is not influenced by outliers.

Fig. 3.
figure 3

Independent Components \(C_i\) for rat brain section no. 100 with their associated activations \(A_i\). The noise components are in the first and second row, the signal components are in the last row.

Fig. 4.
figure 4

Signal enhancement in a brain section visualized via the wrGoF measure. Left: wrGoF-map of vervet brain section no. 627. Right: wrGoF-map of rat brain section no. 100.

Fig. 5.
figure 5

Signal enhancement by the ICA denoising procedure shown for rat brain section no. 100: The image on the upper left shows the transmittance of the brain section for gray matter (background and white matter are displayed in black). The graphs show the intensity profiles of three selected areas (colored dots in transmittance image) before and after the denoising procedure.

Fig. 6.
figure 6

Scaling behavior of the ICA denoising procedure. Left: Intra-node scaling behavior for 1–24 Cores. Right: Global scaling behavior.

The alternating parallelization approach achieves a linear speedup of up to 12 cores (i. e. half of the cores on a node on the JURECA supercomputer [18]). The usage of all cores on the node only improves the speedup by two additional units as seen in Fig. 6. Overall, a weak scaling can be observed. While four nodes give a speedup of factor 2 with respect to one node, six nodes only offer a speedup of 2.5. For a complete vervet brain section (sample size \(\sim 10^8\) pixels), the run time of the denoising routine for a single worker is about five hours. Using a whole node lowers this to half an hour, and using four nodes lowers the run time to 15 min.

This scaling behavior is the same for the rat and the vervet brain sections. Furthermore, the number of workers and therefore the number of partial ICA problems do not interfere with the quality of the denoising process. The percentage of wrGoF values greater than one (\({>}99.9\%\)) or greater than ten (\({>}99\%\)) are not influenced by the amount of parallelism.

4 Discussion

In this work, an automatic denoising procedure for 3D-PLI data based on Independent Component Analysis (ICA) for high-resolution PM data (with 1.33 \(\upmu \)m pixel size) was presented. Previous works studied the denoising of low-resolution LAP data (with 64 \(\upmu \)m pixel size) [10,11,12], but the application on PM data was limited due to computational and memory constraints and was not fully automatic. Furthermore, the existing solutions were not suitable for a high-throughput workflow because masks for tissue had to be manually created or adjusted.

To overcome these limitations, three key steps had to been taken: The first step was to develop an automatic segmentation of brain tissue into white and gray matter, so that the ICA can work targeted on the noisy gray matter. The second step was an automatic detection of signal components in the ICA activations. The investigated brain sections showed good separability by a simple MSE measure. The zeroing of noisy components was straightforward to implement. Due to the fast convergence and easy separability, there was no need for constraints which would only complicate the procedure and add expensive hyperparameter training as in [10]. The third step was to parallelize the ICA in a pleasingly parallel manner to evenly distribute the workload and ensure that each worker receive similar statistics. This showed a weak, but significant scaling as shown in Fig. 6.

The obtained results for the wrGoF measure were consistently better than the ICA denoising for LAP data presented in [10, 11]. The values were not influenced by the amount of parallelism. The amount of wrGoF values are in all brain sections greater than one (\({>}99.9\%\)) and mostly greater than ten (\({>}99\%\)). Overall, the results are very promising for high-throughput denoising of high-resolution 3D-PLI sections.