Modelling urban networks using Variational Autoencoders
A long-standing question for urban and regional planners pertains to the ability to describe urban patterns quantitatively. Cities’ transport infrastructure, particularly street networks, provides an invaluable source of information about the urban patterns generated by people’s movements and their interactions. With the increasing availability of street network datasets and the advancements in deep learning methods, we are presented with an unprecedented opportunity to push the frontiers of urban modelling towards more data-driven and accurate models of urban forms. In this study, we present our initial work on applying deep generative models to urban street network data to create spatially explicit urban models. We based our work on Variational Autoencoders (VAEs), which are deep generative models that have recently gained popularity due to their ability to generate realistic images. Initial results show that VAEs are capable of capturing key high-level urban network metrics using low-dimensional vectors and generating new urban forms of complexity matching the cities captured in the street network data.
Keywords: Variational autoencoders; Urban modelling; Street networks
Temporal and spatial patterns of human interactions shape our cities, making them unique, but at the same time create universal processes that make urban structures comparable to each other. A long-standing effort of urban studies focuses on the creation of quantitative models of the spatial forms of cities that would capture their essential characteristics and enable data-driven comparisons. There have been several attempts at studying urban forms using quantitative methods, typically based on complexity theory or network science (Arcaute et al. 2016; Barthélemy and Flammini 2008; Murcio et al. 2015; Buhl et al. 2006; Cardillo et al. 2006; Masucci et al. 2009; Strano et al. 2013). These approaches create an abstract representation of an urban form to derive its key quantitative characteristics. Although theoretically robust, the abstractions might often be too simplistic to capture the full breadth and complexity of existing urban structures.
With the increasing availability of urban street network data and the advancements in deep learning methods, we are presented with an unprecedented opportunity to push the frontiers of urban modelling towards more data-driven and accurate urban models. Street networks are a ubiquitous element of every urban area and a robust proxy for population density, jobs and housing accessibility, and environmental features (Zhao et al. 2016; Levinson 2012; Boeing 2018; Peponis et al. 2007). Also, street networks are often part of a superimposed pattern developed by local and regional governments. In that sense, the approach in this paper could give urban planners the capability to create not one, but thousands of street configurations, on which different actors can test a variety of urban scenarios.
In this study, we present our initial work on applying deep generative models to urban street network data to create spatially explicit models of urban networks. We based our work on Variational Autoencoders (VAEs) trained on images of street networks. VAEs are deep generative models that have recently gained popularity due to their ability to generate realistic images. VAEs have two fundamental qualities that make them particularly suitable for urban modelling. Firstly, they can condense high-dimensional images of urban street networks to a low-dimensional representation, which enables quantitative comparisons between urban forms without any prior assumptions. Secondly, VAEs can generate new realistic urban forms that capture the diversity of existing cities. In this work, we use an image representation of street networks since images encode both topological and spatial network information. Street network images could be parsed to graphs, if desired, using road parsing algorithms (Li et al. 2018; Chu et al. 2019; Máttyus et al. 2017).
In the following sections, we show our experiments based on urban street networks from OpenStreetMap (OSM). The results indicate that a VAE trained on the OSM data is capable of capturing critical high-level urban metrics using low-dimensional vectors. The model can also generate new urban forms with structure matching the cities captured in the OSM dataset. All code and experiments for this study are available at https://github.com/kirakowalska/vae-urban-network.
Methodology and dataset
A variational autoencoder consists of an encoder, a decoder, and a loss function. The encoder is a neural network. Its input is a datapoint x, its output is a hidden representation z, and it has weights and biases θ. The goal of the encoder is to ‘encode’ the data into a latent (hidden) representation space z, which has far fewer dimensions than the data. This is typically referred to as a ‘bottleneck’ because the encoder must learn an efficient compression of the data into this lower-dimensional space. The encoder is denoted by qθ(z|x).
The decoder is another neural network. Its input is the representation z, its output is a datapoint x, and it has weights and biases ϕ. The decoder is denoted by pϕ(x|z). The decoder ‘decodes’ the low-dimensional latent representation z into the datapoint x. Information is lost in the process because the decoder translates from a smaller to a larger dimensionality. How much information is lost? The information loss is measured using the reconstruction log-likelihood log pϕ(x|z). The measure indicates how effectively the decoder has learned to reconstruct an input image x given its latent representation z.
The loss function of the variational autoencoder is the negative log-likelihood with a regularizer. For the i-th data point x_i, it can be written as

$$ l_i(\theta, \phi) = -\mathbb{E}_{z \sim q_\theta(z|x_i)}\left[\log p_\phi(x_i|z)\right] + \mathrm{KL}\left(q_\theta(z|x_i)\,\|\,p(z)\right). $$

The first term is the reconstruction loss or expected negative log-likelihood of the i-th data point. This term encourages the decoder to learn to reconstruct the data. Poor reconstruction of the data x from its latent representation z will incur a large cost in this loss term. The second term is a regularizer that we introduce to ensure that the distribution of the latent values z approaches the prior distribution p(z), specified as a Normal distribution with mean zero and variance one. The regularizer is the Kullback-Leibler divergence between the encoder’s distribution qθ(z|x) and p(z). It measures how close q is to p. The regularizer ensures that the representations z of each data point are sufficiently diverse and distributed approximately according to a normal distribution, from which we can easily sample.
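The two loss terms described above can be sketched numerically. The following is a minimal illustration and not the authors' implementation. It assumes a Bernoulli likelihood over pixel intensities in [0, 1] and a diagonal Gaussian encoder parameterised by a mean `mu` and a log-variance `log_var`. Neither choice is stated in the text, and the function names are hypothetical. The expectation in the reconstruction term is approximated with a single decoded sample `x_hat`.

```python
import math


def reconstruction_nll(x, x_hat, eps=1e-7):
    """Negative Bernoulli log-likelihood of pixels x given decoder output x_hat."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var):
    """Single-sample estimate of the per-datapoint loss: reconstruction + regularizer."""
    return reconstruction_nll(x, x_hat) + kl_to_standard_normal(mu, log_var)


# Toy example: a 4-pixel "image" and a 2-dimensional latent code (assumed values).
x = [1.0, 0.0, 1.0, 0.0]
x_hat = [0.9, 0.1, 0.8, 0.2]
mu = [0.5, -0.5]
log_var = [0.0, 0.0]
print(vae_loss(x, x_hat, mu, log_var))
```

The regularizer is zero exactly when the encoder outputs the prior itself (zero mean, unit variance) and grows as the encoded distribution moves away from it.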
The variational autoencoder is trained using gradient descent to optimize the loss with respect to the parameters of the encoder and decoder θ and ϕ.
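As a rough illustration of gradient descent on this kind of objective, the sketch below minimises only the regularizer term for a single encoded point, using hand-derived gradients. It assumes a diagonal Gaussian encoder output (`mu`, `log_var`), and all names are hypothetical. Actual training would also include the reconstruction term and would update the network weights θ and ϕ, typically with automatic differentiation in a deep learning framework.

```python
import math


def kl_term(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def kl_gradients(mu, log_var):
    """Analytic gradients of kl_term with respect to mu and log_var."""
    grad_mu = list(mu)  # d/dmu of 0.5 * mu^2 is mu
    grad_log_var = [0.5 * (math.exp(lv) - 1.0) for lv in log_var]
    return grad_mu, grad_log_var


def gradient_descent(mu, log_var, learning_rate=0.1, steps=200):
    """Plain gradient descent: repeatedly step against the gradient."""
    mu, log_var = list(mu), list(log_var)
    for _ in range(steps):
        g_mu, g_lv = kl_gradients(mu, log_var)
        mu = [m - learning_rate * g for m, g in zip(mu, g_mu)]
        log_var = [lv - learning_rate * g for lv, g in zip(log_var, g_lv)]
    return mu, log_var


start_mu, start_log_var = [2.0, -1.5], [1.0, -1.0]
end_mu, end_log_var = gradient_descent(start_mu, start_log_var)
print(kl_term(start_mu, start_log_var), kl_term(end_mu, end_log_var))
```

With the regularizer alone, the parameters drift to zero mean and unit variance, which is the prior. In the full loss, the reconstruction term pulls against this so that different inputs keep distinguishable latent codes.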
Street network data
Urban networks comparison
The trained autoencoder learnt a mapping from the space of street network images (64 × 64 pixels, or 4,096 dimensions) to a lower-dimensional latent space (32 dimensions). The latent representation stores the information needed to approximately reconstruct the original image of the street network, so it is effectively a condensed representation of the street network that preserves its main connectivity and spatial information. In the absence of well-defined similarity metrics for urban networks, this paper uses the condensed representations as vectors of street network features. Hereafter, we call the vectors urban network vectors. Urban network vectors can be used to measure the similarity between different street network forms and to perform further similarity analysis, such as clustering.
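A minimal sketch of how urban network vectors might be compared and clustered is given below. It assumes Euclidean distance as the similarity measure and k-means (Lloyd's algorithm) as the clustering method. Neither is specified in this section, and the toy 4-dimensional vectors stand in for the 32-dimensional vectors produced by the encoder.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two urban network vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical latent vectors for six cities (toy values, not from the dataset).
vectors = [
    [0.0, 0.1, 0.0, 0.1],
    [5.0, 5.1, 5.0, 4.9],
    [-5.0, 5.0, -5.1, 5.0],
    [0.1, 0.0, 0.1, 0.0],
    [5.1, 5.0, 4.9, 5.0],
    [-4.9, 5.1, -5.0, 4.9],
]
labels, centroids = kmeans(vectors, k=3)
print(labels)
```

Seeding the centroids with the first k vectors keeps the example deterministic. A practical analysis would more likely use randomised or k-means++ initialisation with several restarts.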
Table: Average network metrics of urban street networks in the three clusters in Fig. 6a. Metrics reported: number of nodes, number of edges, average node degree, total edge length, average edge length.
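For reference, the five metrics named in the table can be computed from a graph representation of a street network as sketched below. The sketch assumes an undirected graph with straight-line edges and node coordinates in metres. The function name and data layout are hypothetical, and the authors' exact procedure for computing the metrics is not described in this section.

```python
import math


def network_metrics(nodes, edges):
    """nodes: {node_id: (x, y)} in metres; edges: list of undirected (u, v) pairs."""
    lengths = [math.dist(nodes[u], nodes[v]) for u, v in edges]
    n, m = len(nodes), len(edges)
    return {
        "number_of_nodes": n,
        "number_of_edges": m,
        # Each undirected edge contributes to the degree of two nodes.
        "average_node_degree": 2 * m / n if n else 0.0,
        "total_edge_length": sum(lengths),
        "average_edge_length": sum(lengths) / m if m else 0.0,
    }


# Toy street network: a single 100 m x 100 m city block.
nodes = {"a": (0.0, 0.0), "b": (100.0, 0.0), "c": (100.0, 100.0), "d": (0.0, 100.0)}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(network_metrics(nodes, edges))
```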
Urban networks generation
In the “Urban networks comparison” section, we used the autoencoder to compress real street images to low-dimensional vectors, which we then used to make quantitative comparisons. This employed one strength of variational autoencoders: the ability to encode high-dimensional observations as meaningful low-dimensional representations. The second strength pertains to the ability to generate realistic urban street forms that match the complexity of urban forms across the globe. This ability could potentially advance the current state of the art in simulations of urban forms and of socio-economic processes taking place on urban networks.
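The generation step can be sketched as follows: draw a latent vector from the standard normal prior and pass it through the decoder. The decoder below is an untrained stand-in (a random linear map followed by a sigmoid), used only to show the shapes involved. It is an assumption for illustration and will not produce realistic street networks. In practice the trained decoder network would take its place.

```python
import math
import random

LATENT_DIM = 32  # latent size reported in the text
IMAGE_SIDE = 64  # images are 64 x 64 pixels


def sample_latent(rng, dim=LATENT_DIM):
    """Draw z from the prior p(z) = N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def make_standin_decoder(rng, latent_dim=LATENT_DIM, side=IMAGE_SIDE):
    """Untrained placeholder for the decoder: random linear map plus sigmoid."""
    weights = [
        [rng.gauss(0.0, 0.5) for _ in range(latent_dim)]
        for _ in range(side * side)
    ]

    def decode(z):
        flat = [
            1.0 / (1.0 + math.exp(-sum(w * zi for w, zi in zip(row, z))))
            for row in weights
        ]
        # Reshape the flat pixel list into side x side rows.
        return [flat[r * side:(r + 1) * side] for r in range(side)]

    return decode


rng = random.Random(42)
decode = make_standin_decoder(rng)
z = sample_latent(rng)
image = decode(z)  # nested list of pixel intensities in [0, 1]
print(len(z), len(image), len(image[0]))
```

Because the regularizer keeps encoded points close to the prior, vectors sampled this way fall in the region of latent space that a trained decoder has learnt to map to plausible images.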
Discussion and conclusions
This study is an early exploration of how modern generative machine learning models such as variational autoencoders could augment our ability to model urban forms. With the ability to extract key urban features from high-dimensional urban imagery, variational autoencoders open new avenues to integrating high-dimensional data streams in urban modelling. The study considered images of street networks, but the proposed methodology could be equally applied to other image data, such as urban satellite imagery.
Variational autoencoders were selected among deep generative models (Moosavi 2017; Albert et al. 2018) because of two capabilities: first, condensing images into low-dimensional representations; second, generating new, previously unseen images that match the complexity of observed images. The first capability enabled us to extract key urban metrics from street network images; the second gave us the power to generate realistic images of previously unseen urban networks.
Our results, based on 12,479 city images across the globe, showed that VAEs successfully condensed urban images into low-dimensional urban network vectors. This enabled quantitative similarity analysis between urban forms, such as clustering. What is more, VAEs managed to generate new urban forms with complexity matching that of the observed data. Unfortunately, the resolution of the generated images was low, which was attributed to the small size of the dataset. Future work will repeat model training on a much larger corpus of images to improve the generative quality. Moreover, further work will fine-tune the generative quality by investigating the impact of the size of the latent space (currently fixed to 32 dimensions) and the training objective used (e.g. Wasserstein distance instead of KL divergence).
Despite the promising results, the study opens essential questions for future work. The first question pertains to the black-box nature of deep learning models that lack comprehensive human interpretability. This limitation is already receiving much attention in the deep learning literature (Ribeiro et al. 2016; Shrikumar et al. 2017; Lundberg and Lee 2017). In this study, the limitation manifests itself in our lack of understanding of how latent space representations of urban networks relate to established network metrics (Newman 2010). A related question refers to the ability to evaluate the quality of model outputs, i.e. latent representations and synthetic images. Again, quality assessment of deep generative models is a hot topic in the broader deep learning research community (see for example Wu et al. (2017)). Future work could address the problem from the perspective of urban network science. Finally, before this type of generative model could be part of any urban planning cycle, we need to reflect on how we might develop these tools further through designing a structured set of experiments that include, for example, population densities or environmental features.
The authors would like to thank Szymon Zareba and Adam Gonczarek (Alphamoon Ltd) for advice on deep generative models during the course of the project.
KK designed and implemented the methodology, executed the computer runs, and wrote the initial version of the article. RM prepared street network data and extensively revised the article. Both authors read and approved the final manuscript.
KK is a lecturer in geospatial machine learning at the Bartlett’s Centre for Advanced Spatial Analysis, University College London, UK and a machine learning researcher at Alphamoon, PL. She develops machine learning algorithms for urban modelling and sensor data mining. Her research interests include geospatial data mining, sensor data fusion and machine learning for sensor networks.
RM is a senior research fellow at the Bartlett’s Centre for Advanced Spatial Analysis, University College London, UK. His academic interests include urban complex networks, information transfer in social systems, spatial interaction models and pedestrian flows. One of his main research topics is the application of multifractal measures to different urban aspects, such as street networks and social inequality.
No specific funding was received for the study.
The authors declare that they have no competing interests.
- Albert, A, Strano E, Kaur J, González M (2018) Modeling urbanization patterns with generative adversarial networks In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, 2095–2098. IEEE.
- Boeing, G (2018) A multi-scale analysis of 27,000 urban street networks: Every US city, town, urbanized area, and Zillow neighborhood. Environment and Planning B: Urban Analytics and City Science:2399808318784595.
- Chu, H, Li D, Acuna D, Kar A, Shugrina M, Wei X, Liu M-Y, Torralba A, Fidler S (2019) Neural turtle graphics for modeling city road layouts In: Proceedings of the IEEE International Conference on Computer Vision, 4522–4530.
- Dangeti, P (2017) Statistics for Machine Learning. Packt Publishing Ltd, Birmingham.
- Kingma, DP, Welling M (2014) Auto-encoding variational bayes.
- Krizhevsky, A, Sutskever I, Hinton GE (2014) Imagenet classification with deep convolutional neural networks In: Neural Information Processing Systems, 1097–1105.
- LeCun, Y, Bengio Y, et al. (1995) Convolutional networks for images, speech, and time series. Handb Brain Theory Neural Netw 3361(10):1995.
- LeCun, Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD (1990) Handwritten digit recognition with a back-propagation network In: Adv Neural Inf Process Syst, 396–404. NIPS.
- Li, Z, Wegner JD, Lucchi A (2018) Polymapper: Extracting city maps using polygons. arXiv preprint arXiv:1812.01497.
- Lundberg, SM, Lee S-I (2017) A unified approach to interpreting model predictions In: Advances in Neural Information Processing Systems, 4765–4774, NIPS.
- Máttyus, G, Luo W, Urtasun R (2017) Deeproadmapper: Extracting road topology from aerial images In: Proceedings of the IEEE International Conference on Computer Vision, 3438–3446. IEEE.
- Moosavi, V (2017) Urban morphology meets deep learning: Exploring urban forms in one million cities, town and villages across the planet. arXiv preprint arXiv:1709.02939.
- Peponis, J, Allen D, French S, Scoppa M, Brown J (2007) Street connectivity and urban density In: 6th International Space Syntax Symposium, 1–12. Citeseer, Istanbul.
- Ribeiro, MT, Singh S, Guestrin C (2016) “Why should I trust you?”: Explaining the predictions of any classifier In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016.
- Shrikumar, A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences In: Proceedings of the 34th International Conference on Machine Learning.
- Witten, IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington.
- Wu, Y, Burda Y, Salakhutdinov R, Grosse R (2017) On the quantitative analysis of decoder-based generative models.
- Zhao, F, Sun H, Wu J, Gao Z, Liu R (2016) Analysis of road network pattern considering population distribution and central business district. PloS ONE 11(3):0151676.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.