1 Introduction

Today, 55% of the world's population lives in urban areas, and the urban population is projected to increase from 4 billion people to 6 billion by 2050 [48]. Mainly as a result of migration from rural areas, cities are growing in both inhabitants and urban area, and new residential areas form outside or further away from the city core. The speed of urbanization presents challenges such as meeting the growing demand for transport infrastructure and affordable housing. Urban zones take different forms, and urban growth patterns differ among regions as a result of socio-economic, cultural, historical and environmental differences. For example, in the US, people tend to live in low-density, single-family houses and commute to work by car. In Japan, by contrast, high-rise residential buildings dominate and workers commute by public transportation (mostly rail-based) [2]. In order to identify the most promising city development policy, it is of primary interest to assess the relations between network infrastructure, socio-economic indicators and transport system performance based on experience from existing cities; understanding how cities are shaped by their transport priorities can help to achieve sustainable mobility objectives [36].

The relationship between transport infrastructure expansion and population growth, spatial expansion and land-use change has been highlighted in many works [1, 5, 46]. A tight relationship between transport and urban development has been shown in earlier works [34, 37]. The imbalance between travel demand and transport infrastructure supply as a reason for the increase in congestion has been studied by Aljoufie et al. [1]. High congestion levels cause significant costs to society: it has been estimated that exposure to traffic congestion reduces welfare in the US by $557 million per year [17], and the cost of congestion to the UK economy is estimated at approximately £13 billion per year, forecast to increase to £21 billion per year by 2030 [44]. Congestion also impedes the proper functioning of more sustainable transport modes such as bus services or cycling; as a consequence, existing bus services can meet neither the growing transport demand nor the demands of the cities’ economic development [45]. Because of these negative impacts, the congestion level is a good candidate for a transport performance indicator. More specific relations between infrastructure expansion and various transport indicators have been found in the studies cited below.

The expansion of the road network generally leads to lower population density in cities: Baum-Snow et al. [4] have shown that the combined effects of ring roads and highways in Chinese cities caused 25% of central-city inhabitants to move to surrounding zones. The empirical estimates of Baum-Snow [3] further show that each highway expansion within the urban center of a US metropolis causes an average 18% drop in the number of inhabitants of the city center. An analysis in Wisconsin for 1980–1990 demonstrated that highway expansions caused population increases in suburban areas and fueled urban sprawl [13]. Similar results have been shown by analyses in California between 1980 and 1994 [12]. By contrast, several studies have shown that rail network expansion increases population density near urban rail stations or tracks, thereby strengthening the compactness of urban areas [6, 28, 31].

A strong correlation between road infrastructure expansion and growth of vehicle ownership has been determined for 50 countries and 35 cities [26]. A positive relationship between highway expansion and car usage has been shown for the US between 1982 and 2009 [32]. A negative correlation between transit ridership and highway length has been found for the Montreal region [15]. A sharp rise in car ownership in cities with low railway intensity and, on the other hand, a relatively slow rise in car ownership in cities with high railway intensity have been shown for six Asian megacities located in China, Japan and Thailand [27]. US cities with rail lines experienced larger declines in car usage than cities without rail infrastructure between 2000 and 2009 [25]. Similar modal shifts have been shown to exist in Europe: averaged over 14 LRT systems, approximately 11% of car drivers have changed to rail [24]. Amid growing concerns over traffic congestion and pollution from motorized vehicles, Dill and Carr [18] have indicated a positive correlation between bicycle usage and bicycle infrastructure expansion in 43 US cities based on data from the Bureau of the Census. This finding has been confirmed and quantified based on a survey of 13 European cities [40].

In summary, an extension of the road network tends to decrease urban population density and the effectiveness of road-based public transport, conditions that favor an increase in car ownership. A consequence of these effects is a further increase in private road transport demand, often referred to as “induced demand” [30]. Rail and bike networks, by contrast, have been shown to relieve congestion.

The choice of suitable and relevant indicators for the analysis of transport policies is not obvious. Different definitions of “accessibility” have been used as indicators. Geurs and Van Eck [22] have described various components of “accessibility”: land-use, temporal, individual and transport. In an extensive review, Geurs and Wee [23] identified four types of possible accessibility measures: infrastructure-based, location-based, activity-based and utility-based accessibility.

Based on these findings and conditioned by the availability of accessible data, this study uses the length of transport infrastructure per person to quantify the amount of available transport infrastructure. This quantity is known as infrastructure accessibility [21]. Transport performance is quantified by congestion levels.

The aim of the present study is to shed more light on the relations between transport-related and socio-economic indicators on the one hand and transport performance indicators on the other. The data used are thought to be comparable across all selected cities, allowing an absolute, global evaluation of the transport performance indicator. Compared with previous studies, the number of comparable cities is larger and the data are more recent. Concrete transport policies are addressed by answering the question: under which conditions do more railways and bicycle infrastructure reduce congestion levels?

The next section motivates the data collection for this work and explains the principal data processing steps. The analysis and results are presented and discussed in Sec. 3, while Sec. 4 summarizes the main findings.

2 Data collection and processing

The general approach of this work is to collect, process, correlate and model publicly available and comparable data from a large number of cities around the world. In this section, all indicators are defined and the different data sources are described.

2.1 Socio-economic data-collection of cities

Socio-economic data have been sampled from a variety of regions around the world: the data set covers 151 cities distributed over 51 countries. Data from at least two consecutive population censuses, as well as the administrative area of each urban area, were extracted from City Population [16]. Population estimates are used where local census data were not available. Recent GDP per capita data for each urban area have been sourced from the Organization for Economic Co-operation and Development (OECD) database [38]. All GDP values are expressed in US dollars, as an average over the years 2010–2014. Missing OECD data have been completed from different sources [8, 14, 20, 33, 41, 43]. GDP per capita data are available for 139 cities. Population density is calculated as population divided by area, in inhabitants per sq. km. Errors may arise from mixing GDP data from the OECD database with data from other sources; this type of error predominantly concerns smaller cities. More general error sources are that boundary definitions of urban areas are not unified and that GDP data stem from different years. Both issues can lead to compatibility problems with the other data (performance and infrastructure indicators).

2.2 Performance indicator data

The central performance indicator used for this study is the congestion level in terms of average daily extra travel time (ADETT), which is the extra travel time in a day with respect to the free-flow traffic scenario, averaged over all monitored traffic participants of a distinct urban area. Comparable data on congestion levels are retrievable through the TomTom database. TomTom data come from more than 6 million connected GPS devices, and traffic is additionally monitored by many million GSM probes and millions of government-owned road sensors [42]. As TomTom’s methodology is sufficiently accurate and unified all over the world, it is a suitable data source for the present study. However, errors may occur for several reasons: the TomTom data are not produced by a representative selection of the population; the spatial distribution of the measurements may not be homogeneous; and the coverage may differ from city to city and may also differ from the urban boundaries used in Sec. 2.1.

2.3 Infrastructure related indicators from cities

The infrastructure accessibility (IA) is expressed as infrastructure length per inhabitant (in meters of infrastructure per 10 inhabitants). The network infrastructure length is determined for each infrastructure type of a city from the OpenStreetMap (OSM) database, using the OSMnx software package [10, 11]. OSM is a crowd-sourced, unified and publicly available map of the world. OSM infrastructure data appear trustworthy for many cities, although micro-level details still need improvement; the data quality seems sufficient for macro-level analyses. OSM consists of three basic components: nodes, ways and relations [39]. Each component has various characterizing attributes, called tags. For instance, the tags of a way can be used to identify the type of infrastructure.

The Python software package OSMnx extracts OSM network data for the desired location and converts it into a directed transport graph (a graph object of the Python NetworkX package), performing some topological corrections and a simplification by node clustering. The links of the graph retain the tag information of the ways. It is therefore possible to generate sub-graphs for each type of transport infrastructure (ordinary roads, bikeways and rail), and OSMnx provides options to generate and analyze each of these sub-graphs.
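The tag-based separation of infrastructure types can be illustrated with a minimal sketch. It assumes edges in the form of `(u, v, data)` tuples, which is what iterating `G.edges(data=True)` on a NetworkX graph yields, and it assumes that each edge carries a `length` attribute in meters, as OSMnx graphs do. The function name `length_by_tag`, the sample edges and the chosen tag values are hypothetical; the text does not state which tag values were used to define roads, bikeways and rail.

```python
def length_by_tag(edges, key, values):
    """Sum the lengths (in meters) of all edges whose tag `key` matches `values`.

    `edges` is an iterable of (u, v, data) tuples, where `data` is a dict of
    edge attributes. A tag may be a single string or a list of strings
    (ways merged during graph simplification can carry several values).
    """
    wanted = set(values)
    total = 0.0
    for _u, _v, data in edges:
        tag = data.get(key)
        tags = tag if isinstance(tag, list) else [tag]
        if wanted.intersection(tags):
            total += data.get("length", 0.0)
    return total


# Hypothetical toy network; lengths in meters.
edges = [
    (1, 2, {"highway": "residential", "length": 120.0}),
    (2, 3, {"highway": ["primary", "secondary"], "length": 300.0}),
    (3, 4, {"highway": "cycleway", "length": 80.0}),
    (4, 5, {"railway": "rail", "length": 1000.0}),
]

road_m = length_by_tag(edges, "highway", {"residential", "primary", "secondary"})
bike_m = length_by_tag(edges, "highway", {"cycleway"})
rail_m = length_by_tag(edges, "railway", {"rail"})
```

In a directed graph, a two-way street is typically stored as two edges, so a real analysis would have to decide whether to count both directions or to work on an undirected copy of the graph.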

The area of the retrieved transport graph can be specified either by providing the polygon surrounding the area or by the name of the city. In the latter case, the administrative boundaries of the desired city are retrieved from OpenStreetMap’s Nominatim database. In most cases, official boundaries were available on Nominatim; only in rare cases were boundaries defined manually. The statistics module of OSMnx has been used to determine the length of each sub-graph, i.e. road length, rail length and bikeway length. Finally, the infrastructure accessibility IA is determined for all infrastructure types using the population data (see Sec. 2.1). BRT infrastructure length is sourced from www.brtdata.org [9], and BRT IA is determined in mm per 10 inhabitants. Errors in the infrastructure data are due to incomplete OSM networks or road attributes wrongly specified by volunteer contributors.
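As a worked illustration of the IA definition, the following sketch computes infrastructure length per 10 inhabitants. The function name and the numerical values are hypothetical and are not taken from the study's data set; the BRT case only changes the length unit from meters to millimeters.

```python
def infrastructure_accessibility(length, population, per_inhabitants=10):
    """Return infrastructure length per `per_inhabitants` inhabitants.

    The result has the same length unit as `length` (m for road, rail and
    bikeway IA; mm for BRT IA in the convention of the text).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return length * per_inhabitants / population


# Hypothetical city: 3,500 km of roads and 1 million inhabitants.
road_ia_m = infrastructure_accessibility(3_500_000, 1_000_000)

# Hypothetical city: 50 km of BRT lanes and 5 million inhabitants,
# with the length converted to millimeters before the division.
brt_ia_mm = infrastructure_accessibility(50_000 * 1000, 5_000_000)
```

With these made-up numbers the road IA is 35 m per 10 inhabitants and the BRT IA is 100 mm per 10 inhabitants, which shows why the smaller unit is convenient for BRT.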

3 Analysis and results

In this section, different analyses are performed and their results are discussed.

3.1 Correlations within city groups

To make the comparison between cities more meaningful, the cities are divided into three sub-groups according to criteria explained in [19]: cities with a population under 800,000 are defined as “small cities” (51 cities), cities with a population between 800,000 and 3 million as “mature cities” (56 cities), and cities with a population over 3 million as “metropolises” (44 cities). The distribution of the considered cities, with their respective group, is shown on the world map in Fig. 1.
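The grouping rule can be written as a small function. This is only a sketch: the function name is hypothetical, and the treatment of the exact boundary values (800,000 and 3 million are assigned to the middle group here) is an assumption, since the text does not specify it.

```python
def city_group(population):
    """Assign a city to one of the three size groups used in the analysis."""
    if population < 800_000:
        return "small city"
    if population <= 3_000_000:  # boundary handling is an assumption
        return "mature city"
    return "metropolis"


# Hypothetical populations.
groups = [city_group(p) for p in (350_000, 1_200_000, 8_500_000)]
```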

Fig. 1 Distribution of analyzed cities (white = small cities, green = mature cities, red = metropolises)

The Pearson correlation coefficients between different indicators, together with the number of samples, are shown for the different city sizes in Table 1. The software IBM SPSS 25 has been used for the Pearson correlation analyses, at a 95% confidence level. Low correlations, with coefficients of absolute value below 0.2, are not shown. Note that the indicator correlations of small cities are often low, probably due to their heterogeneous sizes, land use and transport networks.
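For readers who wish to reproduce this type of analysis without SPSS, the Pearson coefficient can be computed directly from its definition. The sketch below is not the procedure used in the study; the function name and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical road IA (m per 10 inhabitants) and ADETT (minutes) values.
road_ia = [12.0, 20.0, 35.0, 60.0, 90.0]
adett = [48.0, 41.0, 30.0, 22.0, 15.0]
r = pearson_r(road_ia, adett)

# Mirror the reporting rule of the text: coefficients with |r| < 0.2 are omitted.
reported = r if abs(r) >= 0.2 else None
```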

Table 1 Pearson correlation coefficient and number of samples (N) between different indicators

The clearly positive correlation between spatial city area and population growth rate for metropolises, mature cities and all cities is trivial, as the number of newborns is proportional to the population size. The fact that congestion levels (ADETT) increase with higher population density is also not surprising; it confirms that cities are struggling to keep transport infrastructure in pace with increasing traffic intensity (trips per sq. km). More interesting is the negative relationship between population density and GDP per capita, suggesting that economically weaker cities experience more congestion; this is particularly true for metropolises. The correlation between GDP per capita and road infrastructure accessibility (IA) is strong for metropolises and a little weaker for mature cities. The relationships between GDP per capita and rail IA and between GDP per capita and cycle IA are less pronounced.

The strong relationship between road IA and ADETT is clearly seen for all city sizes. For metropolises, an increase in rail infrastructure shows a decongesting effect similar to that of an increase in road infrastructure, while for small cities rail infrastructure is less correlated with congestion. One hypothesis is that smaller cities are less congested, so that there is less pressure to change from car to rail. These results confirm the previously mentioned finding that rail infrastructure has a relaxing effect on road traffic in metropolises [7, 29, 47], presumably by shifting car trips to rail. Combining the relations between road/rail IA, congestion and GDP per capita, it could be hypothesized that economically strong metropolises can afford to expand road, rail and bicycle infrastructure and are more successful in reducing congestion.

3.2 Statistical models

As IA and ADETT are generally well correlated, some statistical models have been calibrated on the entire set of cities as well as on specific subsets. The best fit between road infrastructure accessibility (RIA) and ADETT for all cities is achieved with an exponential function of the form:

$$ \mathrm{ADETT} = a\,\exp\left(b\,\mathrm{RIA}\right) \qquad (1) $$

However, the fitting errors of a linear model are only slightly larger. The results of this calibration are shown in Table 2. Despite the high noise levels in the data, the coefficient b is negative, which means that congestion decreases with increasing road IA. This model has also been applied to the three city sub-groups and plotted together with the data points in Figs. 2, 3 and 4.
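One simple way to calibrate the coefficients a and b of Eq. (1) is ordinary least squares on the logarithm of ADETT, since ln(ADETT) = ln(a) + b RIA is linear in RIA. The sketch below uses this log-linear approach as an assumption; the text does not state the fitting procedure, and a non-linear least-squares fit on the untransformed data would weight the errors differently and give somewhat different coefficients. The function name and the synthetic data are hypothetical and do not reproduce Table 2.

```python
from math import exp, log


def fit_exponential(ria, adett):
    """Fit ADETT = a * exp(b * RIA) by least squares on ln(ADETT).

    Returns the tuple (a, b). All ADETT values must be positive.
    """
    if len(ria) != len(adett) or len(ria) < 2:
        raise ValueError("need two samples of equal length >= 2")
    if any(value <= 0 for value in adett):
        raise ValueError("ADETT values must be positive for the log transform")
    n = len(ria)
    log_adett = [log(value) for value in adett]
    mean_x = sum(ria) / n
    mean_y = sum(log_adett) / n
    sxx = sum((x - mean_x) ** 2 for x in ria)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ria, log_adett))
    b = sxy / sxx
    a = exp(mean_y - b * mean_x)
    return a, b


# Synthetic, noise-free data generated with a = 50 and b = -0.03.
ria = [10.0, 20.0, 30.0, 40.0, 80.0]
adett = [50.0 * exp(-0.03 * x) for x in ria]
a, b = fit_exponential(ria, adett)
```

A negative fitted b corresponds to the behavior described above: congestion decreases as road IA increases.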

Table 2 Calibration results of the exponential model Eq. (1) for all cities. R² = 0.515, sample size N = 147
Fig. 2 Multi-variant diagram of metropolises. Congestion level (ADETT) over Road IA; bubble size is proportional to the population density; fill color indicates Train IA, bubble border color indicates Cycle IA, and the color of starred city labels indicates BRT IA. For color scaling, see Table 5. The dotted line represents the fitted exponential curve from Eq. (1)

Fig. 3 Multi-variant diagram of mature cities. Congestion level (ADETT) over Road IA; bubble size is proportional to the population density; fill color indicates Train IA, bubble border color indicates Cycle IA, and the color of starred city labels indicates BRT IA. For color scaling, see Table 5. The dotted line represents the fitted exponential curve from Eq. (1)

Fig. 4 Multi-variant diagram of small cities. Congestion level (ADETT) over Road IA; bubble size is proportional to the population density; fill color indicates Train IA, bubble border color indicates Cycle IA, and the color of starred city labels indicates BRT IA. For color scaling, see Table 5. The dotted line represents the fitted exponential curve from Eq. (1)

A further model is built that includes both road infrastructure accessibility (RIA) and train infrastructure accessibility (TIA):

$$ \mathrm{ADETT} = c + d\,\mathrm{RIA} + e\,\mathrm{TIA} \qquad (2) $$

As RIA and TIA have the same unit, the coefficients d and e are directly comparable: they quantify the change in traffic congestion per unit change in road infrastructure and train infrastructure, respectively. The interesting question is how the coefficients d and e behave in cities with high and low population densities. Table 3 shows the calibration results of coefficients d and e for cities with a high population density (above 1500 inhabitants per sq. km), while Table 4 shows the same calibration for cities with a low population density (below 1500 inhabitants per sq. km). The division at 1500 inhabitants per sq. km has been chosen arbitrarily; the main idea has been to isolate the extremely space-oriented cities in the US and Australia. However, the threshold can be varied within reasonable bounds without changing the core message of the results, as detailed below.
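The calibration of Eq. (2) is an ordinary least-squares problem with two predictors, which can be solved in closed form from the centered normal equations. The following sketch shows this for illustration only; the study reports SPSS for its correlation analyses and does not describe the regression code, and the function name and synthetic data are hypothetical.

```python
def fit_two_predictors(ria, tia, adett):
    """Least-squares fit of ADETT = c + d * RIA + e * TIA.

    Returns the tuple (c, d, e). Raises ValueError if RIA and TIA are
    perfectly collinear, in which case d and e are not identifiable.
    """
    n = len(adett)
    if not (len(ria) == len(tia) == n) or n < 3:
        raise ValueError("need three samples of equal length >= 3")
    mean_r = sum(ria) / n
    mean_t = sum(tia) / n
    mean_y = sum(adett) / n
    # Centered sums of squares and cross products.
    srr = sum((r - mean_r) ** 2 for r in ria)
    stt = sum((t - mean_t) ** 2 for t in tia)
    srt = sum((r - mean_r) * (t - mean_t) for r, t in zip(ria, tia))
    sry = sum((r - mean_r) * (y - mean_y) for r, y in zip(ria, adett))
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(tia, adett))
    det = srr * stt - srt ** 2
    if det == 0:
        raise ValueError("RIA and TIA are collinear")
    # Cramer's rule on the 2x2 normal equations.
    d = (sry * stt - sty * srt) / det
    e = (sty * srr - sry * srt) / det
    c = mean_y - d * mean_r - e * mean_t
    return c, d, e


# Synthetic, noise-free data generated with c = 60, d = -0.5, e = -2.0.
ria = [10.0, 20.0, 30.0, 40.0]
tia = [1.0, 3.0, 2.0, 5.0]
adett = [60.0 - 0.5 * r - 2.0 * t for r, t in zip(ria, tia)]
c, d, e = fit_two_predictors(ria, tia, adett)
```

In this synthetic example e is four times more negative than d, so one extra unit of train IA lowers the modeled ADETT four times as much as one extra unit of road IA. Running the fit separately on the cities above and below the density threshold gives one pair of coefficients per group.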

Table 3 Calibration results of the linear model Eq. (2) for cities with population densities above 1500 per sq. km. R² = 0.269, sample size N = 88
Table 4 Calibration results of the linear model Eq. (2) for cities with population densities below 1500 per sq. km. R² = 0.638, sample size N = 39

The results for high-density cities in Table 3 show that e is considerably more negative than d (four times more negative) and that both coefficients are statistically significant. This result means that an increase in train infrastructure per person reduces congestion more than the same increase in road infrastructure per person. One probable reason why rail lines combat congestion more effectively is that rail infrastructure has been implemented primarily along the most congested corridors of the city. Therefore, the model result does not imply that extending the rail network beyond the main traffic corridors will continue to reduce traffic congestion.

The situation for low-density cities, shown in Table 4, is less clear: e is only slightly more negative than d, and e is not statistically significant (high P value). This means that building railways appears less effective in reducing congestion in low-density cities than in high-density cities.

3.3 Multi-variant comparison

In an attempt to pursue a holistic approach, the relations between five different indicators are shown in a bubble-type graph in which each bubble represents a city: the x-axis represents Road IA, the y-axis represents ADETT, the fill color indicates Train IA, the bubble border color indicates Cycle IA, and the color of starred city labels indicates BRT IA. The color scaling is summarized in Table 5. The bubble graph has been generated for each of the city groups: metropolises in Fig. 2, mature cities in Fig. 3 and small cities in Fig. 4. For each city group, the model from Eq. (1) has been calibrated, as the exponential curve showed the best fit. The regression curve and the R² value are also indicated in each bubble graph.

Table 5 Scaling of multi-variant graphs Figs 2-4

The regression analyses for all city groups (Figs. 2-4) show R² values between 0.4 and 0.6, which indicate a good fit considering the many error sources mentioned in Sec. 2 and the diversity of street layouts, public transit service characteristics and mobility cultures. In the figures of all three city groups, the cities can be divided into two groups at a Road IA of approximately 35 m per 10 inhabitants: most cities below this threshold have a higher population density than the cities above it. It is evident that many cities with low population densities have built large road networks and have succeeded in reducing congestion. On the other hand, cities with higher population densities appear to have space constraints and cannot extend their road networks.

Looking closer at cities with higher population densities, it is apparent that those cities with a more extensive train network per person (red and orange color) generally have lower congestion levels. This result is consistent with the models in Sec. 3.2. However, there are also many exceptions: Dublin and Bucharest have high Train IA but also high congestion levels, while Madrid and São Paulo have low Train IA and low congestion levels. Furthermore, the small cities give a less clear picture regarding Train IA and congestion. Some of the small cities with higher population density stand out for their low congestion level, most likely due to the presence of a high level of cycling infrastructure; examples are Malmö, Zwolle and Fresno. However, there are not enough example cities with a high level of cycling infrastructure to show a general trend.

4 Conclusions

In the past, the limited availability and variety of comparable data on socio-economics, transport infrastructure and transport performance of cities prevented a holistic analysis with many indicators. These limitations have been overcome by analyzing OSM data, TomTom data and data from centralized internet databases. Until now, no systematic worldwide infrastructure analysis based on OSM data had been performed. Using the Python package OSMnx, it has been possible to extract different network types from the OSM data, downloaded for different urban areas of the world. The 151 analyzed cities are distributed over 51 countries. The cities have been analyzed as a whole and within subgroups of cities with distinct population sizes (small cities, mature cities and metropolises). Relationships between socio-economic indicators, infrastructure accessibility and congestion level have been investigated.

Clear correlations between infrastructure accessibility, socio-economic indicators and congestion levels have been found, and the regression models show a reasonable goodness of fit. The analyses have shown that cities with higher GDP per capita have built more infrastructure, which in turn is associated with lower congestion levels. The relation between infrastructure accessibility and congestion levels has been quantified using regression models. For cities with low population density (below approximately 1500 inhabitants per sq. km), more roads per inhabitant lead to lower congestion levels. Metropolises and mature cities with high population density have in general lower congestion levels where rail infrastructure per person is higher. There is statistically significant evidence that, for high-density cities, an increase in train infrastructure accessibility reduces congestion more than an increase in road infrastructure accessibility.

The available data could be further exploited to determine the transport-related energy consumption in cities, updating the worldwide comparison of Newman and Kenworthy [35]. However, this would require more information on modal split and trip distances, data that are more difficult to retrieve in a consistent manner.