1 Introduction

Change detection systems provide crucial information for damage assessment after natural disasters such as floods, earthquakes and landslides, and for detecting long-term trends in land use, urban development, glacier dynamics, deforestation, and desertification [1,2,3,4,5,6,7]. In recent years, the development of heterogeneous or multimodal change detection methods has made it possible to relax the assumption of homogeneous and co-calibrated measurements. However, despite its undeniable potential, there is still limited research on heterogeneous change detection in the fields of computer vision, pattern recognition and machine learning. In [8], copula theory is exploited to build local models of dependence between unchanged areas in heterogeneous images and to link their statistical distributions. In [9], joint distributions of heterogeneous images are obtained by transforming their marginal densities into meta-Gaussian distributions, which provide simple and efficient models of multitemporal correlations. In [10, 11], a method based on evidence theory is proposed, which fuses clustering maps of the individual heterogeneous images and then detects the “change” and “no-change” classes from the transition probabilities between clusters. In [12], the physical properties of the considered sensors and, especially, the associated measurement noise models and local joint distributions are exploited to define a “no-change” manifold.

The capability of processing data from heterogeneous sources within the same application opens up a much larger amount of information. With respect to time series, the temporal resolution can be increased and the overall time window can be extended. Nonetheless, new issues arise. Different sensors are sensitive to distinct physical conditions, and comparing their measurements may produce false detections due to inconsistencies in sensor behaviour rather than actual changes in the monitored entities. As the complexity of the fused data set increases, more flexible and complicated statistical models may be required; these are harder to fit to data, may be characterized by larger uncertainty in the parameter estimates, and incur a higher computational cost. Finally, detecting and characterizing changes in heterogeneous images is not as straightforward as in the homogeneous case, where a change corresponds simply to a difference in the signal values.

In this work, we propose a novel cluster-based approach for change detection in heterogeneous data. We design an unsupervised method to be as general as possible, i.e. application-independent. The proposed method processes pairs of images, acquired at different times from different sensors. In particular, one image comes from an optical sensor, whereas the second is a synthetic aperture radar (SAR) image. The images must be co-registered in a pre-processing step, so that spatial misalignment between the images is not misclassified as a change. Moreover, a third image is considered, whose elements are obtained by stacking the optical and SAR images. A clustering method is executed independently on each of the three data sets. The clusters identified in the first two data sets are then matched against those from the third data set, in order to determine whether the clusters from the first image split or merge in the second image. We associate changes with the occurrence of such splits and merges.

In this preliminary study, the problem has been defined, a possible solution suggested, and experiments performed to assess the capability of the proposed methodology. Making the whole process automatic is the next step, which will be treated in a future extension of this work.

2 Background

This work leverages the information delivered by distance-based cluster analysis on image data. To select proper distance measures, we first need to identify suitable statistical models to represent the data. Since we process optical and SAR images, we consider only models commonly used with these two data types.

A simple probability distribution that describes optical images well is the Gaussian distribution [13, 14]. Specifically, a sensor with n channels yields feature vectors \({\varvec{x}}_{opt} \in \mathbb {R}^n\), which are modelled by the multivariate Gaussian probability density function (pdf)

$$\begin{aligned} f({\varvec{x}}_{opt}|\varvec{\mu }_{i},\varvec{\varSigma }_{i})=\frac{1}{(2\pi )^{n/2}\left| \varvec{\varSigma }_{i}\right| ^{1/2}}\exp \left( -\frac{1}{2}({\varvec{x}}_{opt}-\varvec{\mu }_{i})^{t}\varvec{\varSigma }_{i}^{-1}({\varvec{x}}_{opt}-\varvec{\mu }_{i})\right) , \end{aligned}$$

which compactly reads as \({\varvec{x}}_{opt}|\omega _{i}\sim N(\varvec{\mu }_{i},\varvec{\varSigma }_{i})\). Here \(\varvec{\mu }_{i}\) and \(\varvec{\varSigma }_{i}\) are the mean vector and the covariance matrix associated with cluster \(\omega _{i}\), respectively.

Concerning SAR images in single polarisation, the gamma distribution is a simplistic, yet effective option [15]:

$$\begin{aligned} f(x_{SAR}|\theta _{i},L)=\frac{1}{\theta _{i}\varGamma (L)}\left( \frac{x_{SAR}}{\theta _{i}}\right) ^{L-1}\exp \left( -\frac{x_{SAR}}{\theta _{i}}\right) . \end{aligned}$$

This is denoted by \(x_{SAR}|\omega _{i}\sim \varGamma (\theta _{i},L)\). \(\varGamma (L)\) is the gamma function, while L and \(\theta _{i}\) are the shape and the scale parameter, respectively. Since L (the number of looks) is the same for all the clusters, these can be fully characterised by their mean \(\mu _{i}=L\theta _{i}\).

The log-normal distribution is an alternative to the gamma pdf. It fits data reasonably well under most circumstances and, contrary to the gamma pdf, it allows modelling heavy-tailed SAR intensity data [15]. A positive-valued random variable \(X|\omega _{i}=e^{Y}\) follows a log-normal distribution if \(Y|\omega _{i}=\log (X)\sim N(\mu _{i},\sigma _{i})\). The pdf reads

$$\begin{aligned} f(X|\mu _{i},\sigma _{i})=\frac{1}{X\sqrt{2\pi \sigma _{i}^{2}}}\exp \left( -\frac{(\log (X)-\mu _{i})^{2}}{2\sigma _{i}^{2}}\right) , \end{aligned}$$

denoted by \(X|\omega _{i}\sim logN(\mu _{i},\sigma _{i})\). The first two moments of random variables X and Y are related according to

$$\begin{aligned} \mu _{X|\omega _{i}}&= \exp \left( \mu _{i}+\frac{\sigma _{i}^{2}}{2}\right) ,\;\; \sigma _{X|\omega _{i}}^{2} =\mu _{X|\omega _{i}}^{2}\left( e^{\sigma _{i}^{2}}-1\right) . \end{aligned}$$

To conclude, if the statistical behaviour of a SAR image can be described by log-normal distributions, then the logarithmically transformed image can be modelled by a Gaussian distribution. This property will be useful to process the stacked data \({\varvec{x}}_{st}\), which combines all features of the optical and the SAR image into one stacked feature vector, associated with each pixel.
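To make this concrete, the following sketch (with NumPy, on synthetic values; the image size, number of channels and distribution parameters are illustrative assumptions, not taken from the paper) builds the stacked feature vector \({\varvec{x}}_{st}\) from a simulated optical image and log-transformed SAR intensities, and numerically checks the moment relations above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 100x100 optical image with n = 3 channels and a
# single-channel SAR intensity image over the same grid.
h, w, n = 100, 100, 3
x_opt = rng.normal(loc=10.0, scale=1.0, size=(h, w, n))

# Log-normal SAR intensities: log(x_SAR) ~ N(mu, sigma).
mu, sigma = 2.0, 0.5
x_sar = rng.lognormal(mean=mu, sigma=sigma, size=(h, w))

# Empirical check of the moment relations:
# mu_X = exp(mu + sigma^2 / 2),  sigma_X^2 = mu_X^2 * (exp(sigma^2) - 1).
mu_x = np.exp(mu + sigma**2 / 2)
var_x = mu_x**2 * (np.exp(sigma**2) - 1)
assert abs(x_sar.mean() - mu_x) / mu_x < 0.05
assert abs(x_sar.var() - var_x) / var_x < 0.15

# Stacked feature vector x_st = [x_opt, log(x_SAR)] for each pixel: after the
# log transform, every channel is (approximately) Gaussian.
x_st = np.concatenate([x_opt, np.log(x_sar)[..., None]], axis=-1)
```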

As distance measures, we use the Mahalanobis distance [16] for multivariate Gaussian distributed data and the Hellinger distance [17] for gamma distributed data. A well-known drawback of clustering methods is the dependence of their results on initial conditions, such as the initialization of the cluster centers and the ordering of the data. Additionally, the desired number of clusters or the scale parameter (used in methods such as hierarchical or density-based clustering) is often unknown. Ensemble clustering methods tackle these issues by providing more stable results, at the cost of a higher computational complexity [18,19,20]. Ensemble methods can identify clusters of nontrivial shape and with different densities, handle noise and outliers, and provide an estimate of the optimal number of clusters. In our case, this number is unknown and we therefore perform cluster analysis with an ensemble approach based on Fuzzy C-Means (FCM) [21,22,23]. The ensemble procedure consists in running the FCM several times, each time initialized with a number of clusters k drawn from a discrete uniform distribution. FCM is implemented with the distance measures mentioned above. The FCM algorithm is an iterative approach which returns a partition matrix U at each iteration. The membership values \(\mu _{ij}\) contained in U are exploited to evaluate the covariance matrix of each cluster as:

$$\begin{aligned} \varvec{\varSigma }_{i}=\frac{\sum \limits _{j=1}^{N}\mu _{ij}({\varvec{x}}_{j}-{\varvec{c}}_{i})({\varvec{x}}_{j}-{\varvec{c}}_{i})^{t}}{\sum \limits _{j=1}^{N}\mu _{ij}}, \quad i=1,\,\ldots ,\,k. \end{aligned}$$

When multivariate Gaussian distributed data are involved, the Mahalanobis distance computed at the following iteration employs these updated covariance matrices. As a possible future development, we plan to examine the partition matrix to identify the most reliable clustering results, in order to improve the post-clustering analysis.
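The membership-weighted covariance update above can be implemented directly. This is a minimal sketch (not the authors' implementation), with hypothetical helper names:

```python
import numpy as np

def fuzzy_covariances(X, U, C):
    """Membership-weighted covariance of each cluster, implementing
    Sigma_i = sum_j u_ij (x_j - c_i)(x_j - c_i)^t / sum_j u_ij.

    X : (N, d) data matrix; U : (k, N) partition matrix; C : (k, d) centers.
    """
    covs = []
    for w, c in zip(U, C):
        diff = X - c                                  # (N, d) deviations
        outer = diff[:, :, None] * diff[:, None, :]   # (N, d, d) outer products
        covs.append((w[:, None, None] * outer).sum(0) / w.sum())
    return np.stack(covs)                             # (k, d, d)

def mahalanobis(x, c, cov):
    """Mahalanobis distance, used with the updated covariance matrices
    at the next FCM iteration."""
    d = x - c
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))
```

With crisp (0/1) memberships this reduces to the ordinary per-cluster sample covariance, which is a convenient sanity check.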

Fig. 1.

First step of the proposed methodology: construction of the stacked image and of the three partitionings.

3 Recognition of Cluster Splits and Merges

Given two heterogeneous images of the same geographical area, captured at times \(t_1\) and \(t_2\) respectively, we want to detect whether a change occurred during the time lapse. Each image is clustered by using the distance measure that captures its statistical properties. The clustering ensemble procedure on each image provides the partitions

$$\begin{aligned} P_{opt}=c_{opt}^{(1)} \cup \quad \dots \quad \cup c_{opt}^{(N_{opt})} \end{aligned}$$
$$\begin{aligned} P_{SAR}=c_{SAR}^{(1)} \cup \quad \dots \quad \cup c_{SAR}^{(N_{SAR})} \end{aligned}$$

where \(N_{opt}=\left| P_{opt}\right| \) and \(N_{SAR}=\left| P_{SAR}\right| \) are the numbers of clusters in each partition. Then, if the SAR data \(x_{SAR}\) are assumed to follow a log-normal distribution, the logarithm of their intensities can be modelled by Gaussian pdfs. Since the optical data \({\varvec{x}}_{opt}\) are also modelled by Gaussian pdfs, the stacked vector \({\varvec{x}}_{st}=\left[ {\varvec{x}}_{opt}, \log \left( x_{SAR}\right) \right] \) can be thought of as a realization of a multivariate Gaussian random variable. Accordingly, we compute a third partition \(P_{st}\) on the stacked data, as shown in Fig. 1. Once the three partitions are obtained, we check whether a cluster from the image at time \(t_{1}\) splits into two or more clusters in the stacked image, or whether two or more clusters from the stacked image merge into one cluster of the image at time \(t_{2}\). Instead of comparing directly the clusters from time \(t_{1}\) and time \(t_{2}\), our method leverages the information contained in the covariance matrix of the stacked image, which captures the cross-correlation between the original images. Moreover, it may provide a regularization that filters out the effect of the speckle noise on the clustering results. The proposed methodology is depicted in Fig. 2.
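As an illustrative sketch of this first step, the code below clusters the optical image, the log-transformed SAR image and the stacked image separately. A minimal Lloyd's k-means stands in for the FCM ensemble with Mahalanobis and Hellinger distances actually used in the paper, and all names and default parameters are hypothetical:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means; a stand-in for the paper's FCM ensemble."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then update the centers.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(0)
    return labels

def three_partitions(x_opt, x_sar, k_opt=2, k_sar=2, k_st=3, seed=0):
    """Compute P_opt, P_SAR and P_st from an (h, w, n) optical image and an
    (h, w) SAR intensity image."""
    h, w, n = x_opt.shape
    f_opt = x_opt.reshape(-1, n)
    f_sar = np.log(x_sar).reshape(-1, 1)        # log-normal -> Gaussian
    f_st = np.hstack([f_opt, f_sar])            # stacked vectors x_st
    return tuple(kmeans(f, k, seed=seed).reshape(h, w)
                 for f, k in ((f_opt, k_opt), (f_sar, k_sar), (f_st, k_st)))
```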

Fig. 2.

Proposed methodology: it is possible to recognize changes as splits (a) and merges (b) by the comparison with the partitioning of the stacked image.

In Fig. 2(a), a region in the optical image at time \(t_1\) is fully contained in a cluster \(\mathbf {c}_{opt}^{(1)}\). The same region is divided into two clusters, \(\mathbf {c}_{st}^{(1a)}\) and \(\mathbf {c}_{st}^{(1b)}\), in the stacked image. This means that in the SAR image at \(t_2\) the region is split into two clusters as well, \(\mathbf {c}_{SAR}^{(a)}\) and \(\mathbf {c}_{SAR}^{(b)}\), denoting that a change occurred in the time lapse \(t_2-t_1\). In Fig. 2(b) instead, the region that corresponds to two clusters \(\mathbf {c}_{st}^{(1a)}\) and \(\mathbf {c}_{st}^{(2a)}\) in the stacked image merges into a single cluster \(\mathbf {c}_{SAR}^{(a)}\) in the SAR image at time \(t_2\). This indicates another type of change from \(t_1\), where the region is characterized by the two clusters \(\mathbf {c}_{opt}^{(1)}\) and \(\mathbf {c}_{opt}^{(2)}\) in the optical image.

4 Experiments and Results

In this section the proposed approach is applied, showing the potential and the limitations of ensemble clustering and of an analysis of splits and merges. Consider the toy example in which two distinct ground cover classes are arranged as vertical stripes at time \(t_1\) (Fig. 3(a)). At time \(t_2\), after the occurrence of a change event, they are arranged as horizontal stripes (Fig. 3(b)). This case encompasses all the possible transitions between the two classes: (i) class 1 is unchanged; (ii) transition from class 2 to class 1; (iii) transition from class 1 to class 2; (iv) class 2 is unchanged.

The image in Fig. 3(a) emulates an optical acquisition with a single spectral channel. It is a field of uncorrelated Gaussian variables with mean \(\mu _{1} = 8\) for the left stripe, mean \(\mu _{2} = 12\) for the right stripe, and standard deviation \(\sigma = 1\) for both. The image in Fig. 3(b) is a plausible SAR acquisition, generated as a field of uncorrelated gamma variables with mean \(\mu _{1} = 8\) for the upper stripe and mean \(\mu _{2} = 12\) for the lower stripe. The number of looks is \(L_{1} = L_{2} = 9\) for both clusters.
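The toy images can be simulated as follows (a NumPy sketch; the random seed and image size are arbitrary choices, while the means, standard deviation and number of looks match the values above):

```python
import numpy as np

rng = np.random.default_rng(0)
h = w = 100  # arbitrary image size

# "Optical" image at t1: two vertical stripes of Gaussian pixels.
opt = np.empty((h, w))
opt[:, : w // 2] = rng.normal(8.0, 1.0, size=(h, w // 2))    # class 1, mu = 8
opt[:, w // 2 :] = rng.normal(12.0, 1.0, size=(h, w // 2))   # class 2, mu = 12

# "SAR" image at t2: two horizontal stripes of gamma pixels with L = 9 looks
# and mean mu = L * theta, i.e. scale theta = mu / L.
L = 9
sar = np.empty((h, w))
sar[: h // 2, :] = rng.gamma(L, 8.0 / L, size=(h // 2, w))   # upper, mu = 8
sar[h // 2 :, :] = rng.gamma(L, 12.0 / L, size=(h // 2, w))  # lower, mu = 12

# The four quadrants realise all transitions:
# unchanged class 1, 2 -> 1, 1 -> 2, unchanged class 2.
assert abs(opt[:, : w // 2].mean() - 8.0) < 0.5
assert abs(sar[: h // 2, :].mean() - 8.0) < 0.5
```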

Fig. 3.

Simulated optical (a) and SAR (b) acquisitions, before and after the event. Four possible combinations between the two classes (c). Common clustering result on the stacked data (d).

The proposed method is applied on the stacked data, to check whether it reveals the same configuration as the one shown in Fig. 3(c). Figure 3(d) shows a typical output obtained on this toy data set. In each corner of Fig. 3(d), most of the pixels are clustered together, whereas the ones clustered differently count as errors. The accuracy is calculated as \(1 - \frac{\#\text { of errors}}{\#\text { of pixels}}\). Fifty experiments were run for each of the nine combinations of noise strength. The average accuracy is presented in Table 1.

Table 1. Accuracy of the clustering result of the stacked data for different levels of noise in the two images.

Although related to the state of the art, these results cannot be compared directly with those found in the literature: first of all, the problem is faced here in a different manner and in a different framework. Most importantly, the presented method is completely unsupervised, so there is no similar work to compare with. What can be said, however, is that the proposed approach is able to detect the behaviour of splits and merges between the clusters when the noise remains under a reasonable level (e.g. \(\sigma =1\), \(L=9\)).

Fig. 4.

Gloucester before the flood event – optical image (a) – and after it – SAR image (b). In (c), the ground truth of the change.

The method is also applied to real satellite acquisitions. The images in Fig. 4 represent the countryside at the periphery of Gloucester, Gloucestershire, United Kingdom, before and after a flood. Since the speckle noise affecting the latter was too strong, a 7-by-7 enhanced Lee filter [24] was applied to attenuate the noise while preserving the details contained in heterogeneous areas. The analysis is carried out by dividing the presented images into smaller, non-overlapping windows of \(50 \times 50\) pixels, and then by looking for changes inside each window separately. In this way, it can reasonably be assumed that the pixels group into a limited number of clusters, making the clustering process easier and more accurate regardless of the spatial nonstationarity of the image data. Processing smaller windows also reduces the computational cost, which scales quadratically with the window size and the number of clusters. The FCM algorithm has been iterated 20 times, drawing a different number of clusters each time from the discrete uniform distribution \(U\{4,\ldots ,7\}\).
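The window-based processing can be sketched as follows (illustrative Python; the helper name and the skipping of partial border windows are assumptions not specified in the paper):

```python
import numpy as np

def iter_windows(image, size=50):
    """Yield the top-left coordinates and the pixels of each non-overlapping
    size-by-size window; partial windows at the borders are skipped."""
    h, w = image.shape[:2]
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            yield (r, c), image[r : r + size, c : c + size]

# Each window is then clustered 20 times, drawing k uniformly from {4,...,7}:
rng = np.random.default_rng(0)
ks = rng.integers(4, 8, size=20)   # high endpoint exclusive -> values in 4..7
assert ks.min() >= 4 and ks.max() <= 7
```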

4.1 First Experiment

The region selected for the first experiment is shown in Fig. 5. It contains some agricultural fields and a river in its lower part. As seen in Fig. 5(d), the clusters relative to different parts of the image are very well separated.

Fig. 5.

First experiment: (a) window of the optical image, (b) window of the SAR image, (c) window of the ground truth mask, (d) clustering result on image a, (e) clustering result on image b, (f) clustering result on the stacked image.

Concerning the SAR acquisition, from Fig. 5(b) we observe that the flooded area covers the majority of the window. This area is correctly identified by the large black cluster in Fig. 5(e). Comparing the three clustering results, two clear examples of cluster merging and cluster splitting can be spotted. The big cluster representing some fields, which runs from the upper part of the optical image down to the right, has split into two different clusters in the SAR image (the light grey one and the black one); this is highlighted by the presence of the light grey and the dark grey clusters in Fig. 5(f). Then, the dominant cluster of Fig. 5(e) is the result of the merging of several clusters of the optical image: the white cluster (the river), the dark grey cluster (some fields close to the river), a good percentage of the black cluster (the boundaries around the river and the fields), and one part of the above-mentioned big cluster which has split. All these clusters are visible in the result obtained with the stacked data, where they correspond respectively to the dark grey cluster (the river), the grey cluster close to it (the fields close to the river and the boundaries), and the light grey cluster (the part involved in the split).

4.2 Second Experiment

The region selected for the second experiment is displayed in Fig. 6. In this case, the different areas are not well separated (Fig. 6(d)), especially in the center and in the lower right corner of the window, mainly because these parts of the image present miscellaneous ground covers. For example, some of the central pixels in Fig. 6(a) look darker, so the clustering algorithm erroneously clusters them together with the ones belonging to the river, as happened in the first experiment. Instead, the bare soil field presents some brighter pixels close to the river and some darker pixels far from it, and these two groups are divided. Moving on to the image in Fig. 6(b), it can be seen that it still looks noisy and muddled, even after being filtered. Consequently, the clustering in Fig. 6(e) does not yield the same quality as in the first experiment. Comparing with Fig. 6(c), a more accurate delineation of the changed areas would have emerged if the grey and black classes had been grouped together and, most importantly, if some of the agricultural fields in the lower right corner had been grouped differently. But this is not a fault of the ensemble clustering, as these last areas are very similar to the flooded portion of the region, due to the characteristics of this specific kind of field and its SAR signature. Recognizing the flooded area in Fig. 6(b) by visual inspection and without prior knowledge is also very difficult. It is worth noting that the available ground truth itself is only partially accurate (for example, the sharp edge on its right side is unlikely), but it still gives an idea of the location of the affected areas.

Fig. 6.

Second experiment: (a) window of the optical image, (b) window of the SAR image, (c) window of the ground truth mask, (d) clustering result on image a, (e) clustering result on image b, (f) clustering result on the stacked image.

The quality of the partitioning in Fig. 6(f) is heavily influenced by the speckle noise, which is a fundamental issue in the field of SAR data analysis. Under these conditions, it is not trivial to recognise splits and, most of all, merges, due to the amount of noise in the SAR image at time \(t_2\). This case study highlights that an approach for change detection from an optical and a SAR image based on cluster splits and merges is limited by the clustering results. The latter are affected, in turn, by the characteristics of the input data (noise ratio, contrast, etc.), by the adopted clustering algorithm, and by the selection of its hyperparameters.

5 Conclusions and Future Works

In this paper, we studied the challenging problem of change detection in multitemporal and heterogeneous images by means of a cluster-based technique. Our study focused on the case of image pairs, relative to the same area, captured by heterogeneous sensors at different times. We evaluated to which extent a completely unsupervised approach can be successful in addressing change detection from heterogeneous image sources. The proposed idea is that changes on the ground can be related to clusters of the image at time \(t_1\) splitting and/or merging into the clusters of the image at time \(t_2\). The possibility to model SAR intensity as log-normally distributed and optical data as Gaussian allowed us to apply a multivariate Gaussian model in the joint domain of the optical channels and of the log-transformed SAR data. This, in turn, allowed us to apply the clustering algorithm also on a stack of the two images, to improve the chances of identifying splits and merges. Experimental results were obtained on a toy example and on real heterogeneous satellite images. These experiments confirmed the potential of the clustering approach with respect to the problem of change detection from heterogeneous sources and suggested the effectiveness of the ensemble clustering approach. However, the limitations were also highlighted. In particular, the relationship between cluster splits/merges and changed/unchanged areas does not always hold. This limitation can be addressed if prior, ancillary, or application-specific information is used to constrain that relationship.
For example, possible improvements might result from: (i) providing some a priori information to the system, such as the most probable changing parts according to their position; (ii) introducing the hypothesis that the changing parts are the majority or the minority of the image, according to the particular application; (iii) indicating the particular class that represents the sought changed areas, e.g. water for floods, bare soil for forest fires, etc. The detection of cluster splits and merges carried out in this work is based on visual inspection and human interpretation, but it could be automated. A possible solution would be to overlay the mask of each cluster from the image at time \(t_1\) on the stacked image, in order to identify the areas where splits occur. Likewise, this procedure could be applied to the clusters of the image at time \(t_2\) to recognise merges. Once splits and merges are identified, one could rely on prior information (if available) to improve the accuracy of change detection. For example, if the location of the river in the image is provided, one could focus the search for flooded areas within the clusters close to its position. Alternatively, one could leverage the statistical characteristics of water in SAR images to identify the areas of interest. On the one hand, this approach would provide an automatic tool to improve cluster interpretation and to identify relevant splits and merges associated with the changes of interest. On the other hand, the necessity of prior information confirms the extreme difficulty of performing automatic and unsupervised change detection in heterogeneous data.
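The overlay-based automation suggested above could be sketched as follows (hypothetical helper names; the 20% overlap threshold is an arbitrary illustrative choice, not a value from this work):

```python
import numpy as np

def overlap_fractions(mask, labels):
    """Fraction of the pixels under the boolean `mask` falling into each
    cluster of the integer label map `labels`."""
    vals, counts = np.unique(labels[mask], return_counts=True)
    return dict(zip(vals.tolist(), (counts / counts.sum()).tolist()))

def detect_split(mask_t1, labels_st, threshold=0.2):
    """Flag a time-t1 cluster as split if its mask overlaps at least two
    stacked-image clusters, each holding a substantial share of its pixels."""
    fractions = overlap_fractions(mask_t1, labels_st)
    return sum(f >= threshold for f in fractions.values()) >= 2
```

Merges could be recognised symmetrically, by overlaying each stacked-image cluster mask on the partition of the image at time \(t_2\).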

A significant improvement is also expected if polarimetric SAR images are used instead of SAR acquisitions in a single polarisation, as they carry much more intrinsic information, which would enhance the capability of the clustering results to identify natural classes in the feature spaces associated with SAR observations. Obviously, this would force us to consider different and more complicated models and distance measures. The parallelisation of the proposed approach, which is favoured by its window-based formulation and would benefit from current cluster- or GPU-based architectures, represents another possible and interesting future development. Last, but not least, the research can be extended to the multitemporal case in which more than two images are considered, exploiting the proposed method for the analysis of long-term trends such as deforestation, glacier dynamics, desertification, land use change and urban development.