1 Introduction

Understanding the evolution of land use (e.g., construction activity and changes of the land cover) is crucial in fields like urban planning, agriculture, natural resource management, anticipating housing market prices, and even autonomous driving and flying. For the latter, visual positioning systems that do not rely on GNSS are a concrete application example (Daedalean 2021). One of the critical components of a purely vision-based positioning system is an up-to-date map of the environment. Especially in emergencies, a map with maximal confidence is crucial to find safe landing sites; hence, regular map updating missions must be conducted, often involving resource-intensive survey flights. A system that is able to anticipate changes could help to prioritize the locations of future survey flights and potentially save unnecessary flights in regions with no change.

Predicting urban transformations is a complex endeavor that requires advanced image understanding. Nonetheless, current research predominantly employs traditional, non-data-driven methods or relies on pixel-wise multi-layer perceptrons (MLPs). In response, we strive to bridge this research gap by introducing a data-centric methodology that enables the effective training of fully convolutional neural networks specifically designed to anticipate urban change.

In the present work, we aim to forecast where and when changes in building footprints will happen. Change forecasting is a binary segmentation problem with labels “change” and “no change”, where the forecasting range (i.e., the time span between the acquisition of the query image and the actual change) is defined implicitly by selecting training samples with a fixed forecasting range. We use satellite images as the primary input data source because they provide global, uniform coverage. The SpaceNet7 dataset provides an adequate compromise between these requirements. The publicly available, annotated subset comprises 60 locations with a total extent of 960 km\(^{2}\) and a ground sampling distance (GSD) of 4 m. The dataset consists of up to 24 image time steps per location, with a temporal resolution of 1 month, which allows us to analyze the model’s performance with respect to different time ranges. We develop a 2-stage training strategy (as shown in Fig. 1) that is centered around a U-Net segmentation backbone (Ronneberger et al. 2015).

Fig. 1 Modular model architectures and general workflow. In Stage 1, we train a feature extractor (a U-Net with ResNet50 encoder) in a Siamese setup to detect the pixel-wise urban changes \(\hat{p}_d\) from two satellite images. In Stage 2, the feature extractor is repurposed for the change forecasting task, which is to forecast the change as \(\hat{p}_f\) and also to predict when it will happen as \(\hat{p}_e\). The “Head”-CNN is not transferred, but trained separately for each stage

  • In Stage 1, we train a change detection network with a Siamese layout based on a U-Net backbone, using as input pairs of satellite images of the same location, acquired at different times.

  • In Stage 2, we keep the backbone of Stage 1 and use it as a feature extractor for change forecasting. Moreover, we slightly adapt the architecture to also produce a time range forecast, in which the model needs to anticipate within which time window the change will occur. We implement this task via a multi-task learning setup, where the model predicts, in addition to the binary change label, an ordinal label that indicates when a change is expected to happen.

2 Related Work

Change detection is a well-explored task in remote sensing. Early approaches relied on hand-crafted workflows to detect changes in satellite images, while later, classical machine learning approaches were used to automatically classify hand-crafted features (Singh 1989; Hussain et al. 2013; Le Saux and Randrianarivo 2013; Metzlaff 2015; Wessels et al. 2016). With the rise of deep learning, researchers started to employ neural networks that learn the features themselves (El Amin et al. 2017; Zhu 2017; Liu et al. 2020). Siamese networks that share the same feature extractor across multiple images turned out to be a suitable inductive bias for several computer vision tasks, including stereo matching (Zagoruyko and Komodakis 2015) and object tracking (Bertinetto et al. 2016). In remote sensing, too, researchers employed Siamese feature extractors (Zhan et al. 2017), often as part of an end-to-end trainable network (Daudt et al. 2018b, a; Arabi et al. 2018). Recent studies have extended this concept to multi-task scenarios (Liu et al. 2019), network architectures based on attention mechanisms (Chen et al. 2020) and multi-scale features (Yang et al. 2021).

There have been a few attempts to predict future land cover with traditional machine learning. Iacono et al. (2012) make the rigid assumption that the land use/land cover (LULC) class is a discrete state that depends solely on its previous state. In this way, they are able to apply Markov chains to model state changes over time. However, this relatively strong assumption may not always be valid, and it limits the use of auxiliary data. Land transformation models (Pijanowski et al. 2002; Tang et al. 2005; Newman et al. 2016; Pijanowski et al. 2020) are established methods for LULC forecasting that allow the inclusion of additional social, political, and environmental drivers and process them via an MLP. Chu et al. (2010) used Markov chains (MC) to forecast land use changes, while later, Nguyen et al. (2020) proposed an approach that employs satellite imagery as a driving variable to forecast LULC changes with MLP Markov neural networks.

To the best of our knowledge, there exists no published research about deep learning models for change forecasting, apart from one notable exception: Rußwurm et al. (2020) employ a recurrent neural network (RNN) to model time series data and forecast low-resolution satellite observations—e.g., the MODIS NDVI—in an autoregressive manner. That method does not directly predict urban changes; rather, it predicts future satellite observations that may, or may not, enable a subsequent change detection.

3 Methodology

The question of which pixels in a satellite image will change is ill-posed: change is only defined w.r.t. a finite time interval, but a single image does not delimit a time interval (contrary to conventional change detection, where the interval is the time between the two acquisition dates). Therefore, we must define a time horizon and pose the simpler and more meaningful question of whether or not a change will occur within a given, fixed time frame. In a machine learning context, this can be accomplished by training only on image pairs with the relevant time window, thus implicitly establishing the forecasting range. The limitation to a fixed forecasting range comes with a disadvantage, namely that one can only use the subset of the overall data that has the appropriate temporal spacing for training. In other words, one ignores much of the possibly available data. To still exploit all samples, we pretrain the backbone on a change detection task (Stage 1), where different time spans can be mixed. Subsequently, we fine-tune the pretrained backbone for the change forecasting task (Stage 2), as shown in Fig. 1 and described in the following subsections.
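
To make this concrete, the following is a minimal sketch of how such a fixed-gap pair selection could look; the data structure and function names are our own illustration, not the released code.

```python
# Illustrative sketch: the forecasting range is fixed implicitly by keeping
# only image pairs with the desired temporal gap. Names are hypothetical.
from itertools import combinations

def make_pairs(acquisitions_by_location, forecasting_range_months=None):
    """Build temporally ordered (earlier, later) image pairs.

    acquisitions_by_location: {location: [(month_index, image_path), ...]},
    with each per-location list sorted by month_index.
    If forecasting_range_months is None, all temporal gaps are allowed
    (Stage 1, change detection); otherwise only pairs with exactly that
    gap are kept (Stage 2, change forecasting).
    """
    pairs = []
    for location, acquisitions in acquisitions_by_location.items():
        for (m1, img1), (m2, img2) in combinations(acquisitions, 2):
            gap = m2 - m1
            if forecasting_range_months is None or gap == forecasting_range_months:
                pairs.append((location, img1, img2, gap))
    return pairs
```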

3.1 Stage 1: Change Detection

In this stage, we train a conventional Siamese network (Daudt et al. 2018a; Arabi et al. 2018; Yang et al. 2021) to detect changes in temporally ordered pairs of satellite images. The network follows the U-Net architecture with a ResNet50 encoder (He et al. 2016), with shared weights in the two branches and an output feature dimension of 16 per branch. The feature maps of the two U-Net branches are concatenated and fed into a classification head with two hidden convolutional layers with \((3\times 3)\) kernels and depth 16, to obtain the final pixel-wise predictions. This stage has two advantages: we can use all the available training pairs, and the pretraining already adapts the network to the satellite image domain, in contrast to the traditional pretraining that is usually performed on ImageNet. As a loss function, we use the binary cross-entropy loss.
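
The following PyTorch sketch illustrates this Siamese layout. The U-Net backbone is abstracted as an arbitrary module that maps an RGB image to 16 feature channels; the final 1\(\times\)1 convolution producing a single logit per pixel, like all names in the snippet, is our assumption rather than the released implementation.

```python
import torch
import torch.nn as nn

class SiameseChangeNet(nn.Module):
    """Siamese change detector: shared backbone, concatenated features, CNN head."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 16):
        super().__init__()
        # Backbone, e.g., a U-Net with ResNet50 encoder and 16 output channels.
        # The same module (and thus the same weights) serves both branches.
        self.backbone = backbone
        # Head: two hidden 3x3 conv layers of depth 16, then one logit per pixel.
        self.head = nn.Sequential(
            nn.Conv2d(2 * feat_dim, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, img_t1, img_t2):
        f1 = self.backbone(img_t1)          # weights shared across both branches
        f2 = self.backbone(img_t2)
        fused = torch.cat([f1, f2], dim=1)  # (B, 32, H, W)
        return self.head(fused)             # pixel-wise change logits

# Training with binary cross-entropy on the logits:
# loss = nn.BCEWithLogitsLoss()(model(img_t1, img_t2), change_mask)
```

In Stage 2, the single-image forecasting model reuses the same backbone, with a new head of identical structure operating on one set of feature maps.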

3.2 Stage 2: Forecasting

For the forecasting tasks, the pretrained U-Net backbone of Stage 1 is employed as the base. A new classification head with the same structure as in the change detection task is trained from scratch to perform binary change forecasting and time range forecasting, as described in Sects. 3.2.1 and 3.2.2.

3.2.1 Change Forecasting

The objective of this task is to predict whether a change will occur within a specific, fixed forecasting range. To this end, our model has a single output per pixel, namely a score \(\hat{p}_f\) that indicates how likely a building footprint change is to occur. To cover different forecasting ranges, we train nine separate classifiers for ranges of 1, 3, 6, 9, 12, 15, 18, 21, and 24 months, using in each case only the corresponding subset of the training data. We minimize the standard binary cross-entropy loss defined as

$$\begin{aligned} \mathcal {L}_{binary} = BCE(\hat{p}_f, y_c), \end{aligned}$$
(1)

where \(y_c\) is the binary “change” / “no change” label that results from comparing the built-up masks of two time stamps.

3.2.2 Time Range Forecasting

The objective of this model is to classify pixels as belonging to one of two categories: “early change” (i.e., changes that occur within 1–12 months) or “late change” (i.e., changes that occur within 13–24 months). We employ a time range forecasting model that has three logit outputs: \(q_e\) for “early”, \(q_l\) for “late”, and an auxiliary output \(q_0\) for the “no change” class. We use the auxiliary output to obtain additional supervision signals, as described below.

When directly applying multi-class cross-entropy to the problem of classifying pixels into the “no change”, “early change”, and “late change” categories, the “no change” class will typically dominate the learning process due to its much higher relative frequency. To address this issue, we split the problem into two sub-problems.

The first sub-problem is a binary decision between an “early change” (within 1–12 months) and a “late change” (within 13–24 months). The predicted score \(\hat{p}_e\) measures the likelihood that a given pixel will undergo an “early change” and is trained with a cross-entropy loss w.r.t. the ground truth (GT) label \(y_e\) (1 for early change, 0 for late change). Note that this loss is only calculated on pixels that exhibit a change as per the GT label.

$$\begin{aligned} \hat{p}_e = \frac{\exp (q_e)}{\exp (q_e) + \exp (q_l)} \end{aligned}$$
(2)
$$\begin{aligned} \mathcal {L}_{time} = BCE(\hat{p}_e, y_e). \end{aligned}$$
(3)

The second sub-problem is the 24-month version of the change forecasting task described in Sect. 3.2.1, since it aims to classify whether a change will occur within 24 months at all. The predicted score \(\hat{p}_c\) measures the likelihood that a given pixel undergoes a change within the maximum time interval. The loss is again a standard cross-entropy between the prediction and the binary “change” / “no change” label \(y_c\).

$$\begin{aligned} \hat{p}_c = \frac{\exp (q_e+q_l)}{\exp (q_e+q_l) + \exp (q_0)} \end{aligned}$$
(4)
$$\begin{aligned} \mathcal {L}_{binary} = BCE(\hat{p}_c, y_c). \end{aligned}$$
(5)

Finally, we merge the two losses with the mixing weight \(\lambda\) to obtain the overall loss for the model. Empirically, \(\lambda \approx 10^3\) is a suitable value.

$$\begin{aligned} \mathcal {L} = \mathcal {L}_{time} + \lambda \mathcal {L}_{binary} \end{aligned}$$
(6)

Splitting the task into two sub-problems, each with its own loss, results in a better-defined optimization problem. The time range forecasting term can only be calculated on a small portion of the available pixels (the ones that indeed exhibit a change as per the GT), but it provides an almost perfect class balance between “early” and “late” changes. The loss provided by the binary change label, on the other hand, is calculated over the complete set of pixels but suffers from class imbalance. Overall, the combination of the two loss terms improves the performance of the resulting model.
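
As a minimal sketch, the combined loss of Eqs. (2)–(6) can be computed from the three per-pixel logits as below. Note that the softmax of Eq. (2) reduces to a sigmoid of the logit difference, and Eq. (4) to a sigmoid of \(q_e + q_l - q_0\); the function and variable names are ours.

```python
import torch
import torch.nn.functional as F

def forecasting_loss(q_e, q_l, q_0, y_c, y_e, lam=1e3):
    """Combined loss of Eqs. (2)-(6) from three per-pixel logit maps.

    q_e, q_l, q_0: logits for "early", "late", and "no change".
    y_c: binary change label (all pixels); y_e: early(1)/late(0) label,
    only meaningful on pixels with y_c == 1.
    """
    # Eq. (2): p_e = exp(q_e) / (exp(q_e) + exp(q_l)) = sigmoid(q_e - q_l)
    # Eq. (4): p_c = sigmoid(q_e + q_l - q_0)

    # Eq. (5): binary change loss, evaluated on all pixels.
    loss_binary = F.binary_cross_entropy_with_logits(q_e + q_l - q_0, y_c)

    # Eq. (3): time range loss, restricted to pixels with a GT change.
    mask = y_c > 0.5
    if mask.any():
        loss_time = F.binary_cross_entropy_with_logits((q_e - q_l)[mask], y_e[mask])
    else:
        loss_time = q_e.new_zeros(())

    # Eq. (6): weighted combination; lambda ~ 1e3 was found to work well.
    return loss_time + lam * loss_binary
```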

3.3 Label Imbalance

Change detection, and by extension also change forecasting, typically suffers from severe label imbalance. We thus employ two balancing strategies, in both stages of the training workflow.

First, we oversample change examples. Image patches i are sampled with probability

$$\begin{aligned} p_{i} = \frac{a + N_{i}}{\sum _{k=1}^{M} (a + N_{k})}, \end{aligned}$$
(7)

where \(N_i\) is the number of changed pixels in patch i, M is the number of samples, and a is a distribution smoothing constant, empirically set to \(a \approx 50\).
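
In PyTorch, this sampling scheme maps directly onto a WeightedRandomSampler, which normalizes the given weights internally so that Eq. (7) is realized exactly; the helper below is an illustrative sketch with hypothetical names.

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_change_sampler(changed_pixel_counts, a=50.0, num_samples=None):
    """Oversample patches with many changed pixels, following Eq. (7).

    changed_pixel_counts: list with N_i, the number of changed pixels in
    patch i. The smoothing constant a ensures that patches without any
    change (N_i == 0) are still drawn with non-zero probability.
    """
    weights = torch.tensor(changed_pixel_counts, dtype=torch.float) + a
    # WeightedRandomSampler draws index i with probability
    # weights[i] / weights.sum(), i.e., p_i = (a + N_i) / sum_k (a + N_k).
    return WeightedRandomSampler(weights,
                                 num_samples=num_samples or len(weights),
                                 replacement=True)
```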

Second, we noticed that thresholding the scores at 0.5 (respectively at 0.33 for the 3-class time range forecasting) results in a poor precision-recall trade-off with a heavy bias towards the majority class “no change” (we further elaborate on this effect in Sect. 5). To counteract the bias, we determine the threshold in a data-driven manner: separately for every training batch, we find the threshold that maximizes the F1 score. To reduce the effect of stochasticity in the training mini-batches, we compute the final threshold value as a moving average over the last 500 training batches. Empirically, the approximation is very good: the discrepancy between the threshold estimated in this manner and the oracle threshold determined from the test set is vanishingly small.
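
A possible implementation of this moving-average threshold is sketched below; the candidate grid, window size handling, and class names are our own choices under the stated 500-batch window.

```python
from collections import deque
import numpy as np

class AdaptiveThreshold:
    """Track the F1-optimal decision threshold as a moving average over batches."""

    def __init__(self, window=500, candidates=np.linspace(0.01, 0.99, 99)):
        self.history = deque(maxlen=window)  # per-batch optimal thresholds
        self.candidates = candidates

    def update(self, scores, labels):
        """scores, labels: flat numpy arrays for one training batch."""
        best_t, best_f1 = 0.5, -1.0
        for t in self.candidates:
            pred = scores >= t
            tp = np.sum(pred & (labels > 0.5))
            fp = np.sum(pred & (labels <= 0.5))
            fn = np.sum(~pred & (labels > 0.5))
            f1 = 2 * tp / max(2 * tp + fp + fn, 1)  # F1 of the "change" class
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        self.history.append(best_t)
        return self.threshold

    @property
    def threshold(self):
        # Moving average over the stored window of per-batch optima.
        return float(np.mean(self.history)) if self.history else 0.5
```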

3.4 Implementation Details

Our method is implemented in PyTorch (Paszke et al. 2019) and is publicly available (see Footnote 1). To train the model, we use the Adam optimizer (Kingma and Ba 2014) with default parameters and a base learning rate of \(10^{-4}\). We augment the samples by random cropping, small affine transformations, mirroring, and color jittering. We use a batch size of 16, except for the largest forecasting ranges of 21 and 24 months, where we found a batch size of 4 to perform best, likely as a consequence of the small size of the corresponding data subsets.

For all models in Stage 2, we first freeze the backbone and train the head for 5000 batch iterations, then we reduce the learning rate by a factor of 10 to \(10^{-5}\) and train the entire model end-to-end.
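
This two-phase schedule can be sketched as follows, assuming a forecasting model with `backbone` and `head` submodules as in Stage 1; the helper names are hypothetical.

```python
import torch

def run_batches(model, loader, opt, criterion, n_batches, device="cuda"):
    """Train for a fixed number of batch iterations, re-iterating the loader."""
    seen = 0
    while seen < n_batches:
        for img, target in loader:
            if seen == n_batches:
                break
            opt.zero_grad()
            loss = criterion(model(img.to(device)), target.to(device))
            loss.backward()
            opt.step()
            seen += 1

def train_stage2(model, loader, device="cuda"):
    criterion = torch.nn.BCEWithLogitsLoss()

    # Phase 1: freeze the pretrained backbone and train only the new head
    # for 5000 batch iterations at the base learning rate of 1e-4.
    for p in model.backbone.parameters():
        p.requires_grad = False
    head_opt = torch.optim.Adam(model.head.parameters(), lr=1e-4)
    run_batches(model, loader, head_opt, criterion, 5000, device)

    # Phase 2: unfreeze everything and fine-tune end-to-end with the
    # learning rate reduced by a factor of 10.
    for p in model.backbone.parameters():
        p.requires_grad = True
    full_opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    # ... continue training with full_opt until convergence ...
```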

4 Data and Experimental Setup

To validate our method, we make use of the SpaceNet7 dataset. It was published by Van Etten et al. (2021) and created for the building tracking competition featured at NeurIPS 2020. The dataset consists of 60 globally distributed labeled locations (see Fig. 2), each containing a series of 24 Planet Labs RGB satellite image mosaics of 4\(\times\)4 km\(^{2}\), with consecutive mosaics acquired one month apart. The ground sampling distance of the images is 4 m, and the total covered area is 960 km\(^{2}\). The dataset also contains a set of manually labeled building footprints, where each image of the time series is labeled individually.

Fig. 2 Spatial distribution of the SpaceNet7 dataset and train/validation/test split

For this work, we derive a dataset that considers image pairs from the same locations but at different time steps and obtain change masks by subtracting the corresponding built-up area masks from each other. We note that, for simplicity, we have omitted cases where buildings have been removed: our goal was a dataset that showcases urban development, whereas permanent building removal is far less common and involves completely different visual cues. Moreover, during visual inspection, we found that many of the apparent destruction labels were actually caused by misalignment errors between the manually digitized footprints at different times, rather than by actual building destruction. Using this approach, we obtain about 16,000 unique image pairs, which we further split into 264,000 non-overlapping pairs of training patches of size 224\(\times\)224 pixels. For the task of change forecasting, however, one implicitly defines the forecasting range by choosing a training sub-dataset with a consistent forecasting range. For example, for the smallest possible range (i.e., one month), the subset of pairs that are one month apart contains 5000 samples, whereas for the largest forecasting range of 24 months, we only obtain 200 samples. Moreover, the dataset exhibits a severe label imbalance, which is quite common for change detection datasets (Daudt et al. 2019). The average fraction of changed pixels across all samples amounts to 0.3%, and only one in seven patches has >0.5% positive labels. Figure 3 summarizes the size and imbalance of the sub-datasets.
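
In code, this label derivation reduces to a mask difference that keeps only newly built-up pixels; the snippet below is an illustrative sketch.

```python
import numpy as np

def change_mask(builtup_t1: np.ndarray, builtup_t2: np.ndarray) -> np.ndarray:
    """Binary change mask from two built-up masks of the same location.

    Only newly built-up pixels count as change; removals (built-up at t1
    but not at t2) are deliberately ignored, since many such labels stem
    from misaligned footprint digitizations rather than actual demolitions.
    """
    return builtup_t2.astype(bool) & ~builtup_t1.astype(bool)
```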

To ensure a representative and balanced evaluation, we split the dataset stratified with respect to the individual continents, resulting in a training set with 42 locations (70%), a validation set for parameter tuning with 6 locations (10%), and a test set for the final metrics with 12 locations (20%). The exact division of the geographical areas is illustrated in Fig. 2.

Fig. 3 Comparison between the number of available samples and the balance of the dataset, depending on the size of the chosen forecasting range

5 Results and Discussion

5.1 Change Forecasting

Fig. 4 Performance of the change detection and change forecasting models in terms of F1 score of the foreground class

In Fig. 4, we plot the F1 score of our method [denoted as “Forecasting (ours)”] w.r.t. the forecasting range and compare it to a baseline pretrained in single-image mode on ImageNet [“Forecasting (ImageNet)”] rather than on the satellite image change detection task. Moreover, we also provide the change detection performance trained from scratch [“Detection (vanilla)”] and pretrained on ImageNet [“Detection (ImageNet)”] as an upper bound for what the forecasting model can be expected to achieve: it would be unreasonable to expect the model to forecast changes from a single image better than it can detect changes when a second image is also available. Note that there is only one change detection model for all ranges, whereas separate change forecasting models are fine-tuned for the different prediction ranges.

Our model exhibits a consistent gain of 2–3 percentage points over the baseline. The advantage is most pronounced for the 1-month range, where our model outperforms the baseline with an F1 score of 8.0% vs. 1.1%. This is a particularly challenging forecasting range for learned models because it suffers from the most severe label imbalance: there are very few positive pixels, as within 1 month only few construction projects proceed to a point where a building is clearly recognizable. It appears that a better initialization of the network weights yields higher robustness against such extreme label distributions, such that the forecasting performance actually matches that of the change detection model. With increasing prediction range, the changes become more substantial, meaning that the detection task gets easier, whereas forecasting in the absence of a second image remains equally hard. Consequently, the change detector benefits from less imbalanced labels and improves significantly as the temporal range grows, while the performance of change forecasting remains at a respectable 10% across most of the range, increasing to 15% for the 24-month range.

Fig. 5 Precision and recall of the foreground class of our change forecasting model w.r.t. the prediction range. The shown numbers correspond to the F1 score of the “ours” model in Fig. 4

Fig. 6 Precision-recall curve for the change forecast model with time horizon 24 months

Fig. 7 Comparison of model prediction for the 24 month prediction range and the corresponding true change, overlaid on the panchromatic input image. Even with a single input image, the model finds many locations of building changes in the 24 months prediction range

Fig. 8 Typical failure cases: a significant amount of false positives are caused by areas that resemble construction sites (orange markers). Understandably, our model struggles when there is no indication of imminent construction at all—producing false negatives (blue marker). Moreover, our model fails to detect small, but distributed changes (yellow markers). Samples are from the 24 months prediction range

In Fig. 5, we show the precision and recall scores that correspond to the F1 scores discussed above. The trade-off between precision and recall for each interval is influenced by the nature of the individual sub-datasets and by the classification threshold. We hypothesize that the observed trend of higher precision for the first 12 months and higher recall for the second 12 months may be attributed to the reduced class imbalance at longer forecasting ranges. Under the assumption that the model focuses to a significant degree on detecting construction sites, it seems reasonable that the recall would gradually increase: construction sites foreshadow urban changes, and with increasing forecasting range these sites are more likely to reach a point where new buildings have been erected, leading to more positive labels and thus higher recall.

Moreover, we provide the precision-recall trade-off curve in Fig. 6 for the classifier fine-tuned to the 24-month prediction range. The curve exhibits a bias toward recall over precision: a significant amount of recall must be relinquished to push precision beyond 20%. This may indicate that in some cases the image evidence is sufficient to anticipate an imminent change, but not to localize it. Intuitively, this makes sense, as grading and earthworks in early stages do reveal the intention to construct, but not the locations of the individual buildings within the plot.

Figure 7 illustrates successful predictions by our model. The model tends to identify the rough location of future changes correctly, mostly on the basis of detecting construction sites. The shape of the model predictions generally does not align exactly with the ground truth. This is not surprising given the inherent ambiguity of the task—from the early earthworks and preparations it is not possible to determine the precise outlines of the future buildings. Moreover, CNNs by construction tend to produce blurry outputs in the presence of uncertainty. Fortunately, knowing the location and the approximate extent of the land cover change is sufficient for many downstream tasks.

In Fig. 8 we display typical failure cases, which further help to understand which visual cues the model relies on for its predictions. It is apparent that the model anticipates new buildings at early-stage construction sites, but it seems to also have acquired a rudimentary understanding of urban development and sprawl, as it tends to predict the construction of new buildings in cluttered or empty wastelands that lie in the vicinity of existing buildings.

5.2 Time Range Forecasting

To isolate the time range prediction from the binary change forecasting, we restrict the following evaluation to pixels that do exhibit a change according to the ground truth labels. In Table 1, we present a direct comparison of our pretraining method with standard ImageNet pretraining in terms of accuracy (Acc) and average F1 score (aF1). The results show that our custom pretraining approach indeed improves performance by 3% in both accuracy and F1 score, further supporting the effectiveness of the proposed methodology.

We present the confusion matrix for time range forecasting in Fig. 9. It shows that our setup makes it possible, in principle, to classify future change events into a group that will happen sooner and another that will only happen later. We note that the “early” class exhibits a precision of 60.0%, while the precision of the “late” class amounts to 68.2%, suggesting that later changes might be easier to detect than earlier ones. Table 2 presents a comparison of our approach to models specifically trained for time range forecasting and change forecasting, respectively. For the time range forecasting, we use the same evaluation procedure as before, i.e., restricting the evaluation to pixels that do exhibit a change according to the ground truth labels. Moreover, we report the accuracy and average F1 score over the “early” and “late” changes. Additionally, we display the F1 score, precision (Pre), and recall (Rec) for the “change” class of the change forecasting task. The results indicate that our multi-task approach is beneficial for the time range task, but not for binary change forecasting, where it trades recall for precision.

Table 1 Comparison of our pretraining approach against the ImageNet pretraining approach on the time range forecasting task
Fig. 9 The confusion matrix of our approach for time range forecasting with classes early changes (1–12 months) and late changes (13–24 months), which equates to an accuracy of 64.0%. The precision for the “early” class amounts to 60.0%, while the one for the “late” class amounts to 68.2%

Additionally, we present an empirical examination of the mixing weight \(\lambda\) in Table 3. The analysis shows that the model performs best at \(\lambda = 1000\), but breaks down for values <10 and >1000.

Table 2 Combining both loss functions boosts the time range forecasting task, while it deteriorates the performance in the change forecasting task
Table 3 Analysis of the mixing weight parameter \(\lambda\)

6 Conclusion

The main contribution of this work is a model to forecast where new buildings are likely to appear in the near future. Our goal has been to present a contribution to this little-explored topic in the light of modern deep learning technology. Besides setting a first baseline for change forecasting with deep convolutional networks, we have designed a 2-stage transfer learning procedure that employs change detection from paired images as a proxy task for learning features tailored to the analysis of high-resolution satellite images of urban and periurban regions. We have shown that such a pretraining improves change forecasting across a range of time horizons and that it is particularly helpful for a short horizon of 1 month, where the imbalance between unchanged and changed areas is particularly extreme. Moreover, we have also shown that it is possible, to some degree, to forecast how far into the future a change is going to happen.

Clearly, the presented approach does not perfectly solve the problem, mainly because forecasting a future construction event from a single image is an ill-posed and very challenging problem. While there obviously are visual cues for future construction activity, none of these cues are guaranteed or unambiguous. For instance, earthworks may point at future construction, but they can also take place for other reasons, e.g., landscaping. Furthermore, if a new estate will be constructed, there may already be signs like access roads or earthworks 2 years before, but there could also still be grassland or even agricultural fields. Besides these conceptual limits, there are also technical challenges, like the difficulty of obtaining a large enough foreground set for comparatively infrequent events such as new construction, and the associated imbalance of the available labels.

We consider our work as a first attempt and hope that it may encourage further research and development toward more powerful and sophisticated forecasting methods. While we have tried to pioneer the use of deep learning for this type of forecasting, our standard convolutional design only scratches the surface of what is possible. Especially if multiple pre-event images are available, it would seem natural to explicitly model the temporal evolution of urban development—for instance with recurrent or attention-based architectures—to better exploit the temporal characteristics of the data. One way to overcome the scarcity of data may also be to synthetically introduce changes in images if one manages to bridge the domain gap between real and synthetic examples. Going for longer-term forecasts significantly beyond the next 2 years is a formidable challenge in terms of both data and methods, but would open up opportunities for a whole new set of applications such as long-term infrastructure planning tasks.