1 Introduction

Multiple sclerosis (MS) is a chronic, degenerative disease of the brain and spinal cord with a very heterogeneous clinical presentation, which can vary greatly between patients in severity and symptoms [1]. The clinical course of MS is also unpredictable: most patients are initially diagnosed with relapsing-remitting MS, characterized by inflammatory attacks separated by variable periods of remission and recovery. After this first phase, the majority of patients transition into a progressive phase consisting of an unremitting accumulation of disability. There is currently no cure for MS, and existing therapies focus on symptomatic management and prevention of further damage, with variable effectiveness, though recent advances are promising. The origins of MS are not well understood, but characteristic signs of tissue damage are recognizable, such as white matter lesions and brain atrophy (shrinkage due to degeneration). These signs can be observed by MRI, which has become a key tool for following up MS patients, with limited invasiveness that is mainly due to the use of specific contrast agents (CA). In fact, focal lesions in the brain and spinal cord are primarily visible in the white matter on structural MRI, where they appear as hyperintensities on T2-weighted (T2-w) images, proton-density (PD) images, or fluid-attenuated inversion recovery (FLAIR) images, and as hypointensities, or “black holes”, on T1-weighted (T1-w) images [2]. These imaging procedures are all performed in a single MRI examination, and the corresponding images (thousands), collected both before and after CA administration, are all used for MS monitoring and follow-up. Identifying the white matter lesions and computing their count and volume by MRI have become well-established protocols for assessing the progression of MS and the effect of treatment. 
For this reason, MRI is currently used routinely in clinical practice, though it is not well correlated with the progression of clinical disability. This is due to the presence of different forms of disability (cognitive impairments can occur besides physical ones), to neuroplasticity and to the effects of nerve demyelination, a critical effect of MS that is not observed by MRI (white matter can appear normal even though its myelin is reduced). Moreover, imaging markers can capture volumetric changes, but they cannot indicate brain changes or the spatial dispersion of the lesions. In addition, MS patients routinely undergo MR imaging with CA every 6–12 months to assess their response to medication, but evidence of tissue deposition of contrast agents has recently been provided [3], calling the long-term safety of CA into question. Since it has been shown in [3] that CA provides no added benefit over and above that of increased lesion burden, and since only a small proportion of individuals have a worsening lesion load, it could be argued that there is no need for CA administration in those with stable disease.

In what follows, we present a framework to increase the precision and objectivity of MRI analysis in MS monitoring. It improves lesion identification and the comparison of the current examination with those collected previously, in order to establish whether new lesions have appeared and whether old lesions have expanded or changed. Moreover, the framework is intended to optimize the use of CA by avoiding it when the disease is stable. To the best of our knowledge, a system that merges the advantages of an automatic MS lesion identification/segmentation strategy with those of registering data collected at different times, so as to compare lesions numerically, is new.

The manuscript is structured as follows: Sect. 2 reviews the related work, Sect. 3 details the proposed framework and its lesion segmentation step, Sect. 4 describes the image registration, Sect. 5 presents promising, though preliminary, results and Sect. 6 concludes the paper.

2 Related Work

MRI is considered the gold standard among imaging modalities for the identification and evaluation of MS lesions affecting white matter. This is thanks to its richness of imaging parameters, which make it possible to highlight the shape of these lesions with respect to healthy tissue, to the use of CA to establish the status of the lesions (active or chronic) and to the new perspectives offered by the evolution of MRI [4]. The thousands of MRI images composing a single examination are usually analyzed by expert radiologists: the operation is time consuming, subjective and difficult to carry out without errors, due to the huge number of evaluations required for each of the identified lesions. Moreover, additional evaluations and comparisons are required between the current examination and previously collected data in order to follow up the disease. This has motivated the automation of both the registration of data from different examinations and the lesion identification process.

Regarding the automatic segmentation of MS lesions in MRI images, several attempts have been made with success, though the huge variability of MS lesions in size, shape, intensity and location makes automatic and accurate identification and segmentation really challenging [5,6,7]. Though classical, shape-based segmentation techniques can be effective [8], particular attention must be paid to deep neural networks, due to their accuracy in solving computer-vision tasks with less manual intervention than other approaches. The great advantage of deep learning is that the feature set is no longer defined by the user but learned directly by the system from the training images. This is a useful property, because it is often difficult for people to characterize the features that best separate healthy tissue from MS lesions. From the perspective of deep learning applications, the high dimensionality of MR images, the difficulty of obtaining reliable ground truth and the high accuracy required for clinical practice all contribute to making white matter lesion segmentation a worthy test application. Convolutional neural networks (CNNs) have demonstrated groundbreaking performance in brain image segmentation as well [9,10,11]. In particular, Yoo et al. [9] were the first to propose an automated learning approach for MS lesion segmentation. Besides the architecture of the system, the interesting innovations were that 3D patches of the MRI volume were used and that segmentation favored combinations (co-registration) of T2-w and PD images, because these were shown to carry more information than MRI images from other modalities and more information than T2-w or PD taken individually. In 2015, Vaidya et al. [10] proposed a method that used 3D CNNs to learn features from different datasets of the same patient: T1-w, T2-w, PD and FLAIR MRIs. The method proposed in [11] has been shown to use the information carried by different MRI modalities efficiently, reducing the number of parameters (and hence the size of the training set) through two CNNs in cascade, trained separately. To date, the method presented in [11] is one of the benchmark architectures for MS lesion segmentation.

Regarding registration techniques, the problem has been addressed since the early days of medical imaging [12], due to the need to match images from different modalities. In the following years, the approach has been refined, and the problem has been effectively solved by recently proposed learning-based deformable strategies and optimization, specifically designed for MRI of the brain [13, 14].

3 The Proposed Framework

The framework we propose uses both the data collected in the current examination and those collected in the previous examination, each composed of MRI data and the corresponding lesion identification/segmentation. A sketch of the framework is shown in Fig. 1.

Fig. 1.

Framework description. Two temporal controls (examinations) contribute to the evaluation of disease progression. The recent control is first classified to identify lesions. Its data are then registered with those of the previous control (which was also classified previously) and logically compared with them in order to evaluate the status of the identified lesions.

After the acquisition of the current MRI data, lesions are identified and segmented by using the method described below. Following lesion segmentation, data from the current and the previous exams undergo a 3D registration (also summarized below). After volume registration, the binary images, containing ones where lesions are present and zeros elsewhere, are combined through binary operations. The resulting binary images indicate whether a lesion was present in both examinations (chronic lesion), whether a lesion present in both examinations has grown over time (increased volume) or, finally, whether a lesion is present in the current examination but was absent previously (new and, potentially, acute and active lesion). Moreover, several objective numerical quantities can be calculated, such as: the number of lesions (calculated as the number of connected classified regions); the volume of each lesion; the global volume occupied by lesions; the brain volume in the current examination with respect to the previous exam; or any other modification occurring in the brain that can be calculated numerically.
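The binary operations and the global volume calculation described above can be illustrated with a minimal sketch. It assumes that the two lesion masks have already been registered onto the same voxel grid; the function name `compare_lesion_masks`, the dictionary keys and the `voxel_volume_mm3` parameter are hypothetical and only serve the illustration.

```python
import numpy as np


def compare_lesion_masks(current, previous, voxel_volume_mm3=1.0):
    """Compare two registered binary lesion masks voxel by voxel.

    Hypothetical helper: names and returned keys are illustrative only.
    """
    current = np.asarray(current, dtype=bool)
    previous = np.asarray(previous, dtype=bool)
    if current.shape != previous.shape:
        raise ValueError("masks must be registered onto the same grid")

    chronic = current & previous      # lesion voxels in both examinations
    new = current & ~previous         # only in the current examination
    vanished = previous & ~current    # only in the previous examination

    def volume(mask):
        # Volume = number of lesion voxels times the volume of one voxel.
        return float(mask.sum()) * voxel_volume_mm3

    return {
        "chronic": chronic,
        "new": new,
        "vanished": vanished,
        "current_volume_mm3": volume(current),
        "previous_volume_mm3": volume(previous),
        "volume_change_mm3": volume(current) - volume(previous),
    }
```

A positive `volume_change_mm3` together with a non-empty `new` mask would indicate that the lesion load has increased between the two examinations.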

3.1 Lesion Segmentation

Since it is a benchmark method, we used the supervised paradigm presented in [11], extending it so that the parallel pipeline involves, besides T1-w, T2-w, PD-w and FLAIR images, also the linear combination T2-w + PD. The reason for also using T2-w + PD is that T2-w and PD carry more information about MS lesions than the other modalities [9]. Moreover, the linear combination contains more information than each single modality (in particular, it increases the contrast of the lesions with respect to the background). In this way, we provided the system with a simpler segmentation task, thus increasing the segmentation accuracy while reducing the size of the labeled training dataset. In MS lesion segmentation, this remains a critical point because the number of available labeled images is usually low [5]. A scheme of the assembly used is shown in Fig. 2.
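As an illustration of the additional input, the following minimal sketch builds a T2-w + PD volume from two co-registered volumes. The min-max rescaling of each volume before and after the sum, the weights `w_t2` and `w_pd` and all function names are assumptions made for this example; the text only specifies that the two volumes are summed.

```python
import numpy as np


def rescale01(volume):
    """Min-max rescale a volume to [0, 1] (assumed preprocessing step)."""
    volume = np.asarray(volume, dtype=float)
    span = volume.max() - volume.min()
    if span == 0:
        return np.zeros_like(volume)
    return (volume - volume.min()) / span


def combine_t2_pd(t2w, pdw, w_t2=1.0, w_pd=1.0):
    """Linear combination of co-registered T2-w and PD volumes.

    With the default weights this is the plain sum T2-w + PD of the
    rescaled volumes, rescaled again to [0, 1].
    """
    t2w = np.asarray(t2w, dtype=float)
    pdw = np.asarray(pdw, dtype=float)
    if t2w.shape != pdw.shape:
        raise ValueError("volumes must be co-registered on the same grid")
    return rescale01(w_t2 * rescale01(t2w) + w_pd * rescale01(pdw))
```

Voxels that are bright in both modalities, as MS lesions are, end up with the highest values in the combined volume.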

Fig. 2.

Two-stage CNN architecture used for identifying and segmenting MS lesions. The inputs of the system are the volumes collected with different imaging modalities and a linear combination of some of them. CNN2 is trained with a separate dataset.

The method is based on a cascade of two CNNs. Though computer vision architectures used for object recognition in natural images usually require up to hundreds of layers [15], the low contrast variation in MRI images allows the use of smaller networks, thus reducing the size of the training set. The method uses a 7-layer architecture for each of the two CNNs. Each network consists of two stacks of convolution and max-pooling layers with 32 and 64 filters, respectively. The convolutional layers are followed by a fully connected layer of size 256 and a soft-max fully connected layer of size 2, whose output is the probability that each voxel belongs to a lesion. For a complete specification of the parameters used, please refer to [11]. In the proposed approach, MS lesions are computed using 3D neighboring patch features from the different input modalities. The 3D patches are cubic, 11 × 11 × 11 voxels. Splitting the model into two CNNs makes it possible to divide the training procedure in two, which reduces the number of parameters without reducing accuracy. To balance the training data, that is, to equalize the number of “positive” patches (those containing lesions) and “negative” patches (those containing no lesions, which are far more numerous), the training dataset consisted of all the positive patches and an equal number of randomly selected negative, healthy patches. The first network (CNN1) was trained on the resulting balanced dataset and then tested on the whole dataset, yielding, for each voxel of each patch, the probability of being “positive” (part of a lesion). After that, a second balanced dataset was created from these test results by considering as positive all patches containing voxels whose probability was greater than 0.5. As for the first balanced training dataset, negative patches (those in which all voxels had probability < 0.5) were randomly selected so as to equal the number of “positive” patches. The second network (CNN2) was trained from scratch on this dataset. Once the whole pipeline is trained, new, unseen MRI volumes can be processed using the same two-stage architecture. The data are first decomposed into patches, and all patches are then evaluated by CNN1. CNN1 discards all voxels with low probability (< 0.5). The remaining voxels, included in their corresponding patches, are re-evaluated by CNN2 to obtain the final probabilistic lesion mask. The resulting binary masks (ones where lesions are present, zeros elsewhere) are computed by thresholding the probabilistic lesion masks (voxels with probability > 0.5 are considered lesions).
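The balancing of the training patches and the two-stage evaluation can be sketched as follows. This is not the implementation of [11]: the two networks are represented by hypothetical callables `cnn1` and `cnn2`, each assumed to take an array of patches and return one lesion probability per patch (that of its central voxel), and the function names are illustrative.

```python
import numpy as np


def balanced_indices(is_positive, rng):
    """Keep all positive patches plus as many random negative ones."""
    is_positive = np.asarray(is_positive, dtype=bool)
    pos = np.flatnonzero(is_positive)
    neg = np.flatnonzero(~is_positive)
    n_neg = min(len(pos), len(neg))
    # Negative patches are drawn at random, without repetition.
    chosen_neg = rng.choice(neg, size=n_neg, replace=False)
    return np.concatenate([pos, chosen_neg])


def cascade_predict(patches, cnn1, cnn2, threshold=0.5):
    """Two-stage evaluation: CNN2 only re-evaluates CNN1 candidates.

    `cnn1` and `cnn2` are stand-ins for the trained networks.
    """
    patches = np.asarray(patches)
    p1 = np.asarray(cnn1(patches), dtype=float)
    candidates = p1 > threshold            # voxels kept by the first stage
    prob = np.zeros(len(patches), dtype=float)
    if candidates.any():
        prob[candidates] = cnn2(patches[candidates])
    return prob, prob > threshold          # probabilities and binary labels
```

In this sketch, voxels discarded by the first stage keep a final probability of zero, so only the candidates selected by `cnn1` can end up in the binary mask.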

Finally, an additional false-positive reduction is performed by discarding binary connected regions with a very low number of positive voxels (this number is calculated with respect to the minimal volume of the lesions used for testing). The proposed method, trained with the same dataset used in [11], had an average score of about 90% (about 3% greater than the original method) without using any artificial strategy for increasing the training dataset of patches. The improvement is probably due to the use, among the other inputs, of the volume composed of T2-w + PD, which simplifies the identification/segmentation process.
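The false-positive reduction can be illustrated with the following minimal sketch, which labels connected regions with a breadth-first search and discards those below a size threshold. The 6-connectivity, the `min_voxels` parameter and the function name are assumptions of this example, since the text does not specify them.

```python
from collections import deque

import numpy as np


def remove_small_regions(mask, min_voxels):
    """Discard connected regions of a 3D mask smaller than `min_voxels`.

    Returns the cleaned mask and the number of regions kept.
    Assumes 6-connectivity (face neighbours only).
    """
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros_like(mask)
    cleaned = np.zeros_like(mask)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_kept = 0
    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        # Breadth-first search collecting one connected region.
        visited[seed] = True
        region = [seed]
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not visited[nb]:
                    visited[nb] = True
                    region.append(nb)
                    queue.append(nb)
        if len(region) >= min_voxels:
            n_kept += 1
            cleaned[tuple(np.array(region).T)] = True
    return cleaned, n_kept
```

The number of regions kept also gives the lesion count, understood as the number of connected classified regions.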

4 Image Registration

Let us assume that we want to compare and register two volumes composed of slices. Some points R(x, y, z) (current examination) are the reference points, and some points M(x, y, z) (previous examination) are those to be registered to them. The main goal of image registration is then to find a geometric transformation T such that T(M(x, y, z)) is as close to R(x, y, z) as possible. Mathematically, the image registration problem can be formulated as a maximization problem:

$$ T_{opt} = \underset{T \in \Omega_{T}}{\arg\max}\; S\left( R, T\left( M \right) \right) $$
(1)

where \( T_{opt} \) denotes the optimal transformation, S is a selected similarity metric and \( \Omega _{T} \) is the space of all possible transformations [16].

A conventional registration process is performed by applying the optimization (1) after having selected a similarity metric S. One way to solve the maximization (1) is to avoid using the whole datasets to find the optimal T and instead to select a series of N corresponding fiducial points (landmarks, or control points) in both examinations and search for the T that best matches the two point sets. More specifically, let \( \left( x_{i}, y_{i}, z_{i} \right), i = 1, 2, \ldots, N \) and \( \left( X_{i}, Y_{i}, Z_{i} \right), i = 1, 2, \ldots, N \) be the two point sets in R(x, y, z) and in M(x, y, z), respectively. Then the task of mapping M to R becomes the problem of finding a transformation T such that \( T\left( X_{i}, Y_{i}, Z_{i} \right) \) are close to \( \left( x_{i}, y_{i}, z_{i} \right) \). This transformation can be regarded as a coordinate transformation that maps the coordinates of the N points in M to those of the N points in R. By applying T to all the points of M, we can also use it as an interpolation strategy, as in [17, 18].

We can indicate the transformation T as its representation in homogeneous coordinates:

$$ T = \begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} & t_{1,4} \\ t_{2,1} & t_{2,2} & t_{2,3} & t_{2,4} \\ t_{3,1} & t_{3,2} & t_{3,3} & t_{3,4} \\ 0 & 0 & 0 & 1 \end{bmatrix} $$
(2)

The transformation T is considered to be affine because different MRI examinations could be performed with different equipment, different imaging parameters and different resonators, which could produce, besides translations and rotations, also scale variations and shear (scaling is produced by setting a different field of view or a different resolution, and shear can be produced by magnetic field inhomogeneities [19]). The optimization problem is to find the coefficients of T that best fit M to R. The N points can be selected manually or automatically by a computer (we used manual selection).

The similarity metric S that we have used in our optimization (1) is the least-squares metric, so that maximizing S corresponds to minimizing the sum of squared residuals:

$$ \min_{T \in \Omega_{T}} \sum_{i = 1}^{N} \left[ R\left( x_{i}, y_{i}, z_{i} \right) - T\left( M\left( X_{i}, Y_{i}, Z_{i} \right) \right) \right]^{2} $$
(3)

We could have chosen among different metrics [20], but we opted for the least-squares metric (LSM) for two reasons: the examinations we aimed to register were both MRI of the same subject, though at different times, and therefore had a high degree of correlation; and LSM is the classical, simple and most widely employed metric.
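For landmark coordinates, the least-squares problem (3) with the affine form (2) has a closed-form linear solution, which can be sketched as follows. The function names `fit_affine` and `apply_affine` are hypothetical, and the sketch assumes at least four non-coplanar landmark pairs; it illustrates the technique rather than reproducing the code used in this work.

```python
import numpy as np


def fit_affine(moving_pts, reference_pts):
    """Least-squares affine T (4 x 4, homogeneous) mapping M points to R.

    moving_pts:    N x 3 landmarks (X_i, Y_i, Z_i) in the previous exam.
    reference_pts: N x 3 landmarks (x_i, y_i, z_i) in the current exam.
    """
    moving_pts = np.asarray(moving_pts, dtype=float)
    reference_pts = np.asarray(reference_pts, dtype=float)
    n = len(moving_pts)
    if n < 4 or reference_pts.shape != moving_pts.shape:
        raise ValueError("need at least 4 corresponding 3D landmarks")
    # Homogeneous coordinates: each row is (X_i, Y_i, Z_i, 1).
    A = np.hstack([moving_pts, np.ones((n, 1))])
    # Solve A @ X ~ reference_pts in the least-squares sense (X is 4 x 3).
    X, *_ = np.linalg.lstsq(A, reference_pts, rcond=None)
    T = np.eye(4)
    T[:3, :] = X.T          # first three rows of Eq. (2)
    return T


def apply_affine(T, pts):
    """Apply a homogeneous 4 x 4 transformation to N x 3 points."""
    pts = np.asarray(pts, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (homogeneous @ T.T)[:, :3]
```

With exactly four non-coplanar landmark pairs the fit is exact; with more pairs, the residuals of (3) give an indication of how well the manually selected landmarks agree.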

5 Preliminary Results

The proposed framework has been tested on data collected at 4 different times (4 consecutive examinations: 2010, 2011, 2017 and 2018, compared in pairs: 2010–2011 and 2017–2018) from a 55-year-old male patient with a GE Healthcare Signa 1.5T system (https://www.gehealthcare.com/en/products/magnetic-resonance-imaging/1-5t). Some representative results are shown in Figs. 3 and 4 (2010–2011) and Figs. 5 and 6 (2017–2018).

Fig. 3.

One slice of the volume collected with each of the MRI imaging modalities (columns) for the examinations performed at consecutive times (2010, first row, and 2011, second row) for the same patient. The exception is the FLAIR image, which is shown as a sagittal section to better identify the lesions. The last column contains the image obtained by summing the T2-w image and the PD image (not directly collected by the MRI equipment).

Fig. 4.

The slice corresponding to Fig. 3 after processing with the proposed framework (left) and after the administration of CA (right). Different colors (left) indicate different information: green is used for chronic lesions, red for new lesions and blue for lesions segmented in the previous examination but absent in the following one (outliers or reabsorbed edema). Both images refer to the situation at the time of the second control (2011, in this case).

Fig. 5.

One slice of the volume collected with each of the MRI imaging modalities (columns) for the examinations performed at consecutive times (2017, first row, and 2018, second row) for the same patient. The figure has the same meaning as Fig. 3. No exception has been made for the FLAIR image because this was not found to be relevant for the lesions present.

Fig. 6.

The figure has the same meaning as Fig. 4. No red or blue regions are present, indicating the absence both of new lesions (red) and of old lesions (blue) that appeared in the old control but were not detected in the recent control by the classification strategy. Both images refer to the situation at the time of the second control (2018, in this case).

In particular, Fig. 3 shows one of the slices, after registration, for the two controls (2010 in the first row, 2011 in the second row), collected with different imaging modalities (columns). An exception is made for the FLAIR image, to better highlight the extent of the lesions. The last column contains T2-w + PD. All the reported volumes were inputs of the segmentation method. Figure 4 shows the lesion mask calculated by the proposed framework from the masks obtained by the segmentation procedure on both controls (left) and the image obtained after CA in the second control, in 2011 (right). In particular, the final mask contains the logical AND between: the two masks (green); the second mask and the negation of the first (red); the first mask and the negation of the second (blue). Green indicates old, chronic lesions (present in both examinations); red indicates new, possibly acute, lesions (present in the most recent examination but not in the previous one); blue indicates lesions present in the old examination but not in the most recent one (outliers or old, reabsorbed edema). It is important to note that, in the mask image (Fig. 4, left), the relevant information is represented by the colored regions (the brain image is shown only for reference and is not part of the mask). From the left image of Fig. 4, it can be deduced that some lesions appeared in the time between the two controls. If CA had not been used (that is, if the right image of Fig. 4 were unavailable), the framework could not decide on the status of the red lesions (active or chronic). By also using the information from the right image in Fig. 4 (T1-w with CA), the status of the new lesions in the red regions could be better defined: the one in the right hemisphere was active, the other was not. In this case, CA was useful to better ascertain the disease progression. In any case, since new lesions had appeared between the controls, the framework results supported the decision to administer CA.

Considering the most recent pair of consecutive controls (Figs. 5 and 6), it can be deduced that the disease remained stable (only green lesions were present). In this case, the framework results (Fig. 6, left) suggest that radiologists avoid CA administration, since CA would add nothing. In fact, the information gathered by using CA (Fig. 6, right) confirms the hypothesis suggested by our framework (CA was, in that case, unnecessary). Using the proposed framework before CA administration would have avoided CA administration in the 2018 control.
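The color coding used in Figs. 4 and 6 and the resulting indication about CA can be sketched as follows for a single registered slice. The function names, the optional grayscale `background` and the simple rule "new lesion voxels present" are assumptions of this example; they illustrate the logic discussed above rather than a validated clinical criterion.

```python
import numpy as np


def lesion_overlay(current_mask, previous_mask, background=None):
    """RGB image: green = chronic, red = new, blue = only in old exam."""
    cur = np.asarray(current_mask, dtype=bool)
    prev = np.asarray(previous_mask, dtype=bool)
    rgb = np.zeros(cur.shape + (3,), dtype=float)
    if background is not None:
        # Optional anatomical slice, shown in gray for reference only.
        bg = np.asarray(background, dtype=float)
        span = bg.max() - bg.min()
        gray = (bg - bg.min()) / span if span > 0 else np.zeros_like(bg)
        rgb[...] = gray[..., None]
    rgb[cur & prev] = (0.0, 1.0, 0.0)     # present in both examinations
    rgb[cur & ~prev] = (1.0, 0.0, 0.0)    # present only in the recent one
    rgb[prev & ~cur] = (0.0, 0.0, 1.0)    # present only in the old one
    return rgb


def new_lesions_present(current_mask, previous_mask):
    """True if at least one lesion voxel is new in the current exam."""
    cur = np.asarray(current_mask, dtype=bool)
    prev = np.asarray(previous_mask, dtype=bool)
    return bool(np.any(cur & ~prev))
```

In the 2017–2018 case described above, only green regions were present, so a check such as `new_lesions_present` would return `False`, consistent with the suggestion to avoid CA.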

6 Conclusion

We have presented an automatic framework to analyze and evaluate the progression of MS by evaluating the status of the lesions. The framework is based on the separate classification of data collected with MRI, in different modalities, from two consecutive controls, on the registration of the data and on the logical comparison of the binary masks containing the lesions. The framework can identify the status of the lesions (chronic or new). Preliminary results have shown that it could be possible to ascertain the relevance of the disease progression and that, when the progression is irrelevant, the framework makes it possible to avoid CA administration. Future work will be dedicated to an extensive evaluation and characterization of the system in order to perform an accurate quantitative analysis, to calculate other important numerical parameters and to optimize the overall running time, which could allow its use during data acquisition to help radiologists make the correct decision regarding CA administration. Moreover, we will study how to use the results of the framework to improve the performance of the segmentation algorithm, in particular to reduce outliers.