1 Introduction

In 1965, Hakim and Adams [1] first proposed the concept of normal pressure hydrocephalus (NPH), that clinical symptoms are gait disorder, urinary incontinence, and dementia; the pressure of the cerebrospinal fluid during lumbar puncture is normal; the imaging manifestations are communicating hydrocephalus [2, 3]. In most parts of the world, the number of elderly and dementia patients is increasing [4]. Studies have shown that the prevalence of NPH is as high as 5.9% among the elderly over 80 [5]. As a kind of dementia disease that can be treated in the elderly [2], NPH is of increasing clinical importance [6]. On the one hand, early diagnosis and surgical treatment may increase the likelihood of a good prognosis for patients with NPH [7]. On the other hand, NPH has a spectrum of disease development, and radiological signs precede clinical symptoms [8]. Morphological evaluation of CT or MRI is essential for screening and diagnosing patients with NPH. Enlargement of the ventricle is an imaging feature of NPH [2]. More importantly, similar to NPH, enlarged ventricles are also related to cognitive and gait disorders [9]. Therefore, it is very necessary to evaluate the ventricles of patients with NPH.

In the past, researchers manually segmented the ventricle to calculate the volume. But this method needs to be based on professional knowledge [10, 11] and is very time- and energy-consuming [12,13,14]. More importantly, it is prone to human errors [15]. Therefore, it is precise because of these shortcomings that manual calculation of ventricular volume is often only used in clinical research with small samples and is difficult to apply in clinical practice with large samples [16, 17]. With the development of science and technology, artificial intelligence methods [18, 19] and the field of medical imaging are getting closer and closer [20, 21]. In recent years, some researchers have proposed using artificial intelligence methods to automatically extract ventricle features and calculate ventricular volume [22,23,24] through medical image analysis techniques [25, 26]. However, these methods for automatically measuring the volume of the ventricle are single mode. What's more, these methods cannot be applied to CT and MRI images at the same time, let alone calculate ventricular volume (VV) and intracranial volume (ICV) at the same time.

So, our purpose is based on CT and MRI images of NPH patients, using machine learning methods, to achieve efficient and accurate automatic measurement of the ventricular volume of NPH patients. This will lay a solid foundation for the development of a large sample of the ventricular volume of NPH patients, and it will also help clinicians quickly and accurately assess the ventricular condition of NPH patients [12, 14, 27].

2 Method

2.1 Patient and instrument information

By strictly following the guidelines [28], we retrospectively extracted CT and MRI images of the brains of 143 definite NPH patients in Shenzhen Second People's Hospital from January 1, 2014 to December 31, 2020. Among the 143 patients with definite NPH, 38 patients had only CT images, 46 patients had only MRI images, and 84 patients had both CT images and MRI images. Therefore, we obtained the brain CT images of 122 patients (38 + 84 = 122) with definite NPH and the brain MRI images of 130 patients (46 + 84 = 130) with definite NPH. Their characteristic information is listed in Table 1. In this study, a total of three MRI equipments and two CT equipments are included. Their specific parameters are described in Table 2. The flowchart and network structure of the whole process are shown in Fig. 1.

Table 1 Characteristic data of the definite NPH patients
Table 2 Scan parameters of CT and MRI
Fig. 1
figure 1

Flowchart and network structure of this research

2.2 Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study passed the ethical approval of The First Affiliated Hospital of Shenzhen University's bioethics committee (approval no. KS20190114001), and the researchers all signed the informed consent form.

2.3 Manually label the ventricles and the intracranial volume

First, a radiologist with more than 5 years of clinical work experience manually marked the VV and ICV. Then, a neurosurgeon with more than 10 years of clinical work experience adjusted the manually marked results. Finally, an anatomy expert with more than 20 years of work experience reviewed the manual annotation results.

2.4 Computer processor information

All of the experiments were conducted on a machine with 48 cores Intel Xeon Platinum (Cooper Lake) 8369 processor and 192 GB memory. For training the model, the PyTorch framework was utilized with 4 Nvidia Titan RTX GPUs.

2.5 Image preprocessing

In the data preprocessing stage, we first normalized images using the z-score normalization through subtracting its mean and divided by its standard deviation. For handling the anomaly pixels in scanned CT/MRI volumes, we clipped them within the range of 2.5-quantile and 97.5-quantile. Then, we augmented the data with random horizontal flipping with 0.5 probability and random scaling of hue, saturation, and brightness within the range [0.8, 1.2]. Besides, we scaled the images and masks using the bicubic interpolation method and nearest interpolation, respectively.

2.6 Machine learning process

First, we randomly select 20% of the data from the dataset as the test set, and the rest are regarded as the training set. Then, the multimodal segmentation model is described in the following part.

In real-world scenarios, the thick-slice images can be easily obtained since they do not require heavy human annotations, whereas annotating thin-slice images is labor-intensive. Besides, the distribution of thick-slice images and thin-slice images is distinctive, which leads to the domain shift problem that hinders the ability of deep learning models. Moreover, CT and MRI images are easy to obtain, to leverage these two modalities of images. We propose a segmentation model that can automatically segment the CT images and MRI images regardless of the thickness. In our previous research, the feasibility of this model was verified [21, 29].

Our goal is to utilize the thick images of different modalities to minimize the performance gap between thick-slice CT and MRI images. We denote the thick slices as \({D}_{S}=\left\{\left({x}_{s},{ y}_{s}\right)\right| {x}_{s}\in {R}^{H\times W\times 3},{ y}_{s}\in {R}^{H\times W}\}\) and the thin-slice images as \({D}_{T}=\{{x}_{t}|{x}_{t}\in {R}^{H\times W\times 3}\}\). For the image extraction, we use the ResNet-34 that is pretrained on the ImageNet dataset as the encoder. For the decoder, we adopt the sub-pixel convolution to upsample and construct the images without heavy computations while bringing additional information to the prediction. In specific, the sub-pixel convolution can be derived as:

$$F^{L} = {\text{SP}}\left( {W_{L} *F^{L - 1} + b_{L} } \right),$$

where \(SP(\cdot )\) operator arranges a tensor shaped in \(H\times W\times C\times {r}^{2}\) into a tensor shaped in \(rH\times rW\times C\). \({F}^{L}\) and \({F}^{L-1}\) are the feature maps of the image.

For training the model, we use the thick-slice images and thin-slice images as the inputs of the model and optimize the model with the following objective function:

$$L\left( {x_{s} ,x_{t} } \right) = L_{S} \left( {p_{s} ,y_{s} } \right) + L_{T} \left( {p_{t} } \right),$$

where \({p}_{s}\) and \({p}_{t}\) are the model’s predictions, and \({y}_{s}\) are the label of thick-slice images. More concrete, the \({L}_{S}\) is the cross-entropy loss which is computed as:

$$L_{S} = - \frac{1}{{{\text{HW}}}}\mathop \sum \limits_{n = 1}^{{{\text{HW}}}} \mathop \sum \limits_{c = 1}^{C} y_{s}^{n,c} \log p_{t}^{n,c} ,$$

and \({L}_{T}\) is the entropy loss that guides the segmentation of the thin slices, which is obtained by:

$$L_{T} = - \frac{1}{C}\mathop \sum \limits_{c = 1}^{C} f\left( {Cp_{t}^{n,c} } \right); f\left( x \right) = x^{2} - 1.$$

During model training, we iteratively optimize the above loss functions. For testing, we feed the image slices as the input into the trained model and get the predicted segmentation.

Compared to the previous medical segmentation approaches, they tend to train the networks from scratch within end-to-end manner. However, it has been widely proved that using the pretrained model on the large-scale dataset, e.g., ImageNet can enable the network to learn the textual and shape prior effectively in the early stage of training. Therefore, we incorporated the ImageNet pretrained ResNet as the initialization of the encoder for obtaining powerful image representation during the first stage of training.

2.7 Calculation of VV and ICV

For this part, we present a method how to automatically calculate the VV and ICV using the segmentation method described in the previous sections. We first extract the ratio of the image from the scanned MRI or CT files; we denote this ratio \(r=({r}_{x}, {r}_{y}, {r}_{z})\)(mm3/pixel). Given an input MRI/CT volume, we first used the ventricle segmentation model to predict each type of ventricle, i.e., left lateral ventricle, right lateral ventricle, third ventricle, and fourth ventricle. For the i-th slice of the volume \({\varvec{V}}\), the prediction of each pixel is \({{\varvec{V}}}_{{\varvec{j}}{\varvec{k}}}^{{\varvec{i}}}\). Therefore, for the ventricle categorized as c, we calculate its \(V{V}_{c}\) as follows:

$$VV_{c} = r_{x} r_{y} r_{z} \mathop \sum \limits_{i} \mathop \sum \limits_{j} \mathop \sum \limits_{k} 1[{\varvec{V}}_{{{\varvec{jk}}}}^{{\varvec{i}}} = c],$$

where \(1\left[x=y\right]\) is the indicator function.

Therefore, given a scanned CT/MRI volume, we can use the above equation to estimate the VV of each ventricle. Similarly, we can estimate the volume of the whole brain by training a whole-brain segmentation model as we have stated in the previous section. Then, the volume of the brain \(ICV\) can be calculated. We use the trained whole-brain segmentation model to predict the pixel. For the i-th slice of the volume \({\varvec{V}}\), the prediction of each pixel is \({{\varvec{V}}}_{{\varvec{j}}{\varvec{k}}}^{{\varvec{i}}}\). With the whole-brain region categorized as 1, we can calculate the corresponding ICV as follows:

$${\text{ICV}} = r_{x} r_{y} r_{z} \mathop \sum \limits_{i} \mathop \sum \limits_{j} \mathop \sum \limits_{k} 1[{\varvec{V}}_{{{\varvec{jk}}}}^{{\varvec{i}}} = 1],$$

Then, according to the definition, the VV/ICV can be obtained as follows:

$$\frac{{{\text{VV}}}}{{{\text{ICV}}}} = \frac{{\mathop \sum \nolimits_{c} {\text{VV}}_{c} }}{{{\text{ICV}}}} \times 100\% .$$

2.8 Statistical analysis

Combining statistical methods used in previous literature [22, 30], we use Dice’s similarity coefficient (DSC), intraclass correlation coefficients (ICC), Pearson correlation, and Bland–Altman analysis to evaluate the spatial overlap, reliability, correlation, and consistency between the results of automatic and manual ventricle segmentation.

2.9 Implementation detail

For the implementation, we first trained the model on the thick-slice datasets with SGD optimizer for 200 epochs. The initial learning rate was set to 1e-3 with linear decay schedule. Then, we used the pretrained model as the initialization of the model and applied both thick slice and thin slices on the proposed objective function with the initial learning rate 1e-4 for 100 epochs. The weight decay factor was 1e-5 for training. For the training time, the pretrain stage took about 5 h on the machine with 4 NVIDIA TITAN RTX GPUs, and the main training took 10 h on the same machine.

3 Results

3.1 The processing results of CT image

The DSC, ICC, and Pearson correlations of the VV generated by our model for automatic segmentation and the VV generated by manual segmentation are 0.95, 0.99, and 0.99, respectively. The DSC, ICC, and Pearson correlations of ICV generated by automatic segmentation and manual segmentation are 0.96, 0.99, and 0.99, respectively (Table 3 and Fig. 2). Bland–Altman analysis shows that manual and automatic segmentation bias mean ± standards deviations of VV and ICV are 4.2 ± 2.6 and 6.0 ± 3.8 (Fig. 3). It takes 3.4 ± 0.3 s for our model to automatically segment the VV and ICV of a patient (Table 4).

Table 3 The DSC, ICC, and Pearson correlation of validation set manual and automatic measurement results
Table 4 Validation set measurement results
Fig. 2
figure 2

Pearson correlation analysis diagram of manual and automatic segmentation results. Whether it is the ventricle volume (VV) and intracranial volume (ICV) of the CT image or MRI image; the Pearson correlation between automatic and manual segmentation results is 0.99, and there is the statistical significance (P < 0.01)

Fig. 3
figure 3

Bland–Altman analysis diagram of manual and automatic segmentation results. In the CT image, the Bland–Altman analysis shows that manual and automatic segmentation bias mean ± standards deviations of VV and ICV are 4.2 ± 2.6 and 6.0 ± 3.8. In the MRI image, the Bland–Altman analysis shows that manual and automatic segmentation bias mean ± standards deviations of VV and ICV are 2.0 ± 0.6 and 7.9 ± 3.8

3.2 The processing results of MRI image

The DSC, ICC, and Pearson correlations of the VV generated by automatic segmentation and manual segmentation are 0.94, 0.99, and 0.99, respectively. The DSC, ICC, and Pearson correlations of ICV generated by automatic segmentation and manual segmentation are 0.93, 0.99, and 0.99, respectively (Table 3 and Fig. 2). Bland–Altman analysis shows that manual and automatic segmentation bias mean ± standards deviations of VV and ICV are 2.0 ± 0.6 and 7.9 ± 3.8 (Fig. 3). It takes 1.9 ± 0.1 s for our model to automatically segment the VV and ICV of a patient (Table 4).

3.3 Processing result display

For qualitatively evaluating the effectiveness of our proposed method, we visualize the segmentation results as well as their corresponding 3D reconstruction results for both MRI and CT samples in Fig. 4. As depicted in the figure, the right lateral ventricle is colored in red; the left lateral ventricle is colored in green; the yellow-colored region represents the third ventricle; and the blue region represents the fourth ventricle. Besides, the 3D segmentation results are visualized from the axial plane, the coronal plane, and the sagittal plane. We can notice that our method could not only predict the two ventricles mentioned above but also segment the third ventricle and the fourth ventricle well. By comparing the results of our method with other methods in the automatic segmentation of the ventricle of the validation set, we can see that our method is superior to the other two methods in automatically segmenting the ventricle (Table 5).

Fig. 4
figure 4

The visualization of the 3D brain ventricles and whole-brain segmentation results with two different modalities (CT/MRI) and the corresponding three-dimensional visualization of the predictions. The right lateral ventricle is colored in red; the left lateral ventricle is the green one; the third ventricle is colored with yellow; and the blue region indicates the fourth ventricle

Table 5 Comparison results (DSC) of our method with other methods for automatically segmenting the ventricle of the validation set

4 Discussions

The results produced by the multimodal automatic ventricle segmentation method established through the brain images of NPH patients and the results produced by manual segmentation have excellent spatial overlap, reliability, correlation, and consistency. Therefore, the artificial intelligence method can realize the efficient and accurate assessment of the ventricular condition of NPH patients.

Volume measurement is obtained by segmentation, which refers to the process of describing the structure in imaging research [23]. The segmentation of the ventricle provides a quantitative measurement for the changes of the ventricle, forming important diagnostic information [31]. For manual segmentation, it took about 30 min to obtain the VV and ICV of a patient by manual segmentation [17]. There is no doubt that this will hinder the evaluation of the ventricles of large-scale samples [13]. Because of this, manual segmentation is often impractical in large-scale clinical practice, and more automated methods are urgently needed to complete it [32, 33]. So, the automated brain image segmentation method is a research hotspot in recent years [16]. More importantly, the development of accurate, fast, and easy-to-use ventricular volume segmentation methods is of great significance for further research and evaluation of the standardized use of ventricular volume in NPH patients [27]. The automatic ventricle segmentation method can overcome the limitations of the manual segmentation method [10]. It is an efficient and rapid ventricle segmentation method [12, 23]. The most important thing is that it can significantly shorten the operation time [24]; this lays a solid foundation for the direct measurement of the ventricle volume in large-scale clinical practice. However, due to the difference between CT and MRI images, automatic ventricle segmentation methods based on CT images are often difficult to process MR images [34]. On the other hand, the existing automatic ventricle segmentation methods for NPH patients only segment the ventricle structure [24]. Compared with the volume of the ventricle, the VV/ICV can reflect the situation of the ventricle in the whole skull [22] and take into account the differences in the volume of each subject caused by changes in anatomy [35].

Compared with current semi-supervised segmentation techniques, Li et al. [36] proposed a generative method that adopts StyleGAN2 with an augmented label synthesis branch, which utilizes partially labeled images to predict the out-of-domain images. During the inference time, it requires to fine-tune the network to find the best reconstruction with their proposed objective function for the target image. By contrast, our method fine-tunes the network with labeled and unlabeled images together to preserve the generalization ability for both thick-slice and thin-slice images under both CT and MRI modalities.

This study has the following limitations: first of all, this is a retrospective study; it is based on existing images and clinical information. Next, we will collect more comprehensive clinical information and imaging data of NPH patients through prospective research; we will also pay more attention to the follow-up process of NPH patients and the situation after surgery; this will help us to understand NPH disease more comprehensively. Secondly, our study is a single-center study; patients are all from a single area. Previous studies mentioned that the ventricles of different races are different [17]. Then, we will conduct multicenter research to further understand the ventricles of NPH patients and improve the applicability and application value of the automatic ventricle segmentation method. Thirdly, we only performed routine brain imaging analysis. Functional imaging is also very important for the diagnosis and treatment of NPH patients [37]. Therefore, in a follow-up research, we will use conventional imaging and functional imaging to further understand the changes in brain structure and function of NPH patients. The last but most important thing is that both the medical field and the artificial intelligence field are constantly evolving [18, 20]. Our approach needs to fine-tune the network on the thin slice and thick slice together for preserving the generalization ability on both types of images, which requires to access the thick-slice images when training for the thin-slice images although it works for multiple modalities. This would be impractical when it comes to the privacy of the data. Subsequently, we will continue to optimize the algorithm of the model to meet the actual needs of clinical practice. Besides, since numerous artificial intelligence algorithms have been successfully deployed to the real-world system [19], we also consider integrating our algorithm into the out-of-the-box healthcare system directly. To better integrate artificial intelligence and medicine, complement each other and make progress together.

5 Conclusion

In summary, we have established a multimodal and high-performance automatic ventricle segmentation method to achieve efficient and accurate automatic measurement of the ventricular volume of NPH patients. It can not only process CT and MRI images at the same time but also calculate the ventricular volume and relative ventricular volume at the same time. The whole process is relatively fast compared to the traditional method. Besides, the thickness agnostic ventricle and whole-brain segmentation can handle the samples generated by different scanners as well as the thickness of slices. This is an effective combination of the medical field and the field of artificial intelligence. It not only lays a solid foundation for the subsequent analysis of large samples of NPH patients, but also helps clinicians quickly and accurately understand the situation of normal pressure hydrocephalus patient's ventricles, which can help clinicians in the diagnosis of NPH patients, the follow-up process, and the evaluation of surgical effects.