The impact of urban form on commuting in large Chinese cities

Studies on cities in Europe and the United States have demonstrated that travel behaviour is influenced by urban form. Based on these findings, policies steering the shape of cities have been proposed to reduce urban transport emissions and limit congestion. Such policies can also be relevant for the rapidly growing and motorising Chinese cities. Yet, empirical evidence on the relationships between urban form and car usage is scarce for the specific Chinese context, which is characterised by high densities, fast development and strong government steering. Using novel crowd-sourced datasets, we study the impact of several urban form variables (city size, urban density, land-use mix, polycentricity and spatial clustering) on the cost of commuting expressed in time and distance. The results show that city size and spatial clustering are important determinants of commuting: large cities without clear clusters of businesses and other facilities have longer average commuting times and distances. Increased prosperity also leads to longer commutes in both time and distance. Spatial planning measures that maintain or reinforce high-density clusters can help limit commuting distance and time. Current sprawled urban development may have long-term, negative consequences for the accessibility and liveability of Chinese cities and could hamper their economic potential.


Introduction
China's rapid economic growth, averaging 10% over 30 years (National Bureau of Statistics China 2015), has led to an unprecedented move from the countryside towards cities. This migration, the largest in human history, has not yet ended: the projection for 2025 is that 70% of the Chinese population will live in cities with a population of over 1 million (McKinsey Global Institute 2009). This means that by that year, 350 million people will be added to the Chinese urban population: more people than live in the United States today (McKinsey Global Institute 2009).
Given the long lifetime of urban structures, the way the resulting urban expansion develops will have long-lasting socio-economic and environmental consequences for aspects such as accessibility, liveability, energy consumption and related emissions. One of the major contributors to a city's energy consumption and greenhouse gas emissions is the transport sector. Currently, this is the fastest growing sector worldwide in energy consumption and cities account for the vast majority of this growth (Yan and Crookes 2009). In China this growth in transport emissions is especially daunting due to the rapid increase in motorised private transport: under a business-as-usual scenario, 35% of future automobile sales growth is expected to come from this country. Urban development patterns have notable impacts on residents' transport mode choice, travel time, travel length and travel frequency (Cervero 1989; Barrett 1996). The striking differences in transport-related carbon emissions between industrialised cities can be illustrated by a comparison between Barcelona and Atlanta. In Barcelona more people (2.8 million) live on an area 26 times smaller than that of Atlanta, Georgia (2.5 million), and its inhabitants consume 11 times less transport energy per person (Lefèvre 2009).
To limit car travel, cities can adopt policies that significantly influence residents' travel behaviour by steering land-use patterns, densities and urban design. Examples of such policies are zoning, building codes and city growth boundaries (Sevtsuk and Amindarbari 2012). Mitigating greenhouse gas emissions through sustainable urban design is currently considered a top priority for city policy makers because of the daunting effects of climate change (World Bank 2015). Therefore, knowledge on how urban form influences transport-related emissions is important to inform policy makers on which city structures can result in lower emissions. Considering the rapid growth of the urban built environment in China, this question is especially urgent here, as the urban structures that are currently being built may have long-lasting impacts on the future sustainability of urban areas (Lefèvre 2009).
Many studies have analysed the effects of urban form on travel behaviour (e.g. Bento et al. 2005; Boarnet and Sarmiento 1998; Ewing and Cervero 2010; Gordon et al. 1989; Lin et al. 2015; Khattak and Rodriguez 2005; Krizek 2003; Schwanen et al. 2004; Stead 1999) and on mitigating transport CO2 emissions (e.g. Grazi et al. 2008; Hickman and Banister 2007; Ma et al. 2015; Qin and Han 2013). The studies show that mixed land use and high urban density correlate with shorter distances, less motorised travel and less transport-related CO2 emissions. Higher densities and mixed land use reduce trip length and the number of motorised trips by concentrating residences, employment and services (Cervero 1989). This in turn influences mode choice and reduces the share of car usage in relation to non-motorised modes of transport and public transportation (Barrett 1996). Commuting patterns are not only influenced by concentration, but also by the form of cities. This urban form can be either polycentric, with clustered development in suburbs, or monocentric, with development concentrated in one main centre (Levinson and Kumar 1994; Gordon et al. 1989).
Previous studies on urban form and mobility concentrated on Western cities with a fairly long history of industrial and infrastructure development. Consequently, empirical evidence for less developed countries, including China, is scarce (He et al. 2013). Recently, some studies have addressed specific aspects of travel behaviour in individual Chinese cities (e.g. mode choice in Nanjing by Feng et al. 2014), but empirical studies that link travel behaviour to changes in the urban fabric exist only for Jinan (Jiang et al. 2014) and Beijing (Yang 2006; Ma et al. 2015; Zhao et al. 2010, 2011). These studies are all based on case studies for a single city, mostly Beijing, and focus on the importance of neighbourhood characteristics at the individual city level, rather than aggregate city-wide characteristics. A more complete assessment of the importance of urban form characteristics for commuting in Chinese cities is lacking, and it is not clear whether the relationships found in developed countries also hold for China, as the urban development processes are rather different in at least three respects. First of all, the urban density levels in China are much higher (compare the average density of 1200 persons per square kilometre in the US and 3200 in Europe to 6100 in China). Second, the pace of China's urban expansion is not comparable to that of developed countries. The current urban population growth in developing countries is five to eight times greater than in developed countries, which is faster than ever before in world history (United Nations 2007). This has led, amongst other things, to different dynamics in terms of the balance between residences and jobs. For the cities of Beijing and Guangzhou, jobs remained concentrated in the city centre while residential booms occurred in the outskirts, resulting in longer commutes for residents (Zhou et al. 2013).
Third, there are institutional differences, as urban growth in China was to a large extent centrally planned (Wu et al. 2007). Chinese urban planning is traditionally stronger than in Western countries because the state owns the land, which may result in different urban patterns because of the more controlled supply of land and residential location choice (Zhao et al. 2009). In line with the former Danwei system of socialist work units, the Chinese government used to encourage locating jobs and housing close to each other to reduce travel distances. With the current trend towards a more market-oriented economy, however, these land-use features are disappearing. Land-use and housing reforms in which real estate agencies are replacing the socialist work units lead to a spatial separation of jobs and residences (Yang 2006).
To study how the specific characteristics of Chinese cities have influenced their relationships between urban form and mobility we set up an analysis for 30 cities with more than three million inhabitants using novel crowd-sourced datasets. We study the impact of several urban form variables (city size, urban density, land-use mix, polycentricity and spatial clustering) on commuting time and distance. The focus is on commuting because this is an important part of people's travel patterns and because it is a common theme in transport policy due to its regular spatial as well as temporal patterns and the relation with people's choices on where to live and work (Lin et al. 2015).

Determinants of commuting behaviour
Different perspectives are applied to analyse commuting behaviour depending on the scale and objective of the study at hand as well as the disciplinary background of the researchers. Lin et al. (2015) offer an interesting framework that tries to reconcile these different approaches. They show that commuting patterns are influenced by a combination of socioeconomic factors that are active at the city and the individual level and the urban form that defines the spatial relationships between jobs and households (Fig. 1).
At the city level, differences in historical and cultural factors, institutional factors or economic factors are relevant to define the general context for travel options and behaviour, while at the individual level income, education, age, gender and individual preferences and attitudes are important determinants (Lin et al. 2015). Following the seminal work of McFadden (1973, 1974), individual residents are assumed to base their travel decisions on the utility they derive from an option (e.g. choice of route or transport mode) in relation to the utility derived from alternative options. This utility will depend on individual preferences for specific options, the costs of these options and budget constraints (Liu and Shen 2011) and is the topic of a wealth of disaggregate studies on travel behaviour. In such studies the dependent variable (e.g. commuting) is measured at the individual level (e.g. Boarnet and Sarmiento 1998; Crane and Crepeau 1998; Giuliano and Narayan 2003; Handy et al. 2005; Bento et al. 2005).
Urban form describes the spatial distribution and concentration of jobs and residences and thus affects commuting distances. To some extent urban form is the result of socio-economic forces acting over a prolonged period. The aggregate behaviour of many individuals within a specific city context changes the urban fabric. As we want to study aggregate commuting patterns for a cross-section of cities and are less interested in the choice behaviour of individuals, we focus on the importance of urban form. In our approach we build on previous aggregate studies (e.g. Newman and Kenworthy 1989a, b, 1999; Cervero and Gorham 1995; Kenworthy and Laube 1999; Schwanen 2002; Veneri 2010; Yang et al. 2012) that analysed inter-city differences in travel behaviour in Europe and the US. The different dimensions of urban form and their relations with commuting behaviour described in these and other studies are briefly discussed in the following sections.
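The utility comparison described above is commonly operationalised in disaggregate studies with a multinomial logit model, in which the probability of choosing an option grows with its utility relative to the alternatives. The sketch below is illustrative only: the mode names and utility values are hypothetical and are not taken from any of the cited studies.

```python
import math


def logit_choice_probabilities(utilities):
    """Return multinomial logit choice probabilities for a dict of option -> utility."""
    # Subtract the maximum utility before exponentiating for numerical stability.
    u_max = max(utilities.values())
    exp_u = {option: math.exp(u - u_max) for option, u in utilities.items()}
    total = sum(exp_u.values())
    return {option: value / total for option, value in exp_u.items()}


# Hypothetical systematic utilities for three commuting modes (assumed values).
utilities = {"car": -1.0, "bus": -1.5, "bicycle": -2.0}
probabilities = logit_choice_probabilities(utilities)
for mode, p in probabilities.items():
    print(f"{mode}: {p:.3f}")
```

The option with the highest utility receives the largest choice probability, and the probabilities sum to one across the alternatives.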

Urban form and the spatial relationship between jobs and houses
Urban form can be considered as the spatial configuration of human activities (Anderson et al. 1996) and a factor that shapes travel patterns because it provides the spatial context in which houses and firms are located (Hickman and Banister 2007). The jobs-housing balance is the level of heterogeneity among workers' residences and employment locations in a given area where a balanced mix of jobs and housing is perceived to be positive in that shorter trips are facilitated (Cervero 1989). An indicator for the jobs-housing balance is the theoretical minimum commute: the minimum amount of travel needed to move workers from their homes to their workplaces in an urban region (White 1988). Studies on US cities showed that there is a relation between this theoretical minimum commute of a city and its observed commute length: similar sized cities can have significantly different minimum commutes (Giuliano and Small 1993; Horner 2002, 2007). The authors infer that this is caused by differences in the urban structure. This is the problem of excess commuting: non-optimal commuting distances and times resulting from the spatial arrangement of residential and workplace locations in a city (White 1988). Excess commuting can be significant: studies on Polish and US cities showed that it varies from 48 to 67% (Horner 2002; Niedzielski 2006).
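The notion of excess commuting can be made concrete with a toy example. The sketch below assumes one worker per home and one job per workplace, finds the theoretical minimum commute by brute force over all assignments, and expresses excess commuting as the share of the observed commute that exceeds this minimum. The distance matrix and the observed assignment are hypothetical; applied studies typically solve the same minimisation as a linear transportation problem on zone-level data.

```python
from itertools import permutations


def minimum_commute(distance):
    """Smallest total commute over all one-to-one assignments of homes to jobs."""
    n = len(distance)
    return min(
        sum(distance[home][job] for home, job in enumerate(assignment))
        for assignment in permutations(range(n))
    )


def excess_commuting(observed_total, minimum_total):
    """Share of the observed commute (in %) that exceeds the theoretical minimum."""
    return 100.0 * (observed_total - minimum_total) / observed_total


# Hypothetical home-to-job distances in km (rows: homes, columns: jobs).
distance = [
    [2.0, 9.0, 6.0],
    [7.0, 3.0, 8.0],
    [5.0, 6.0, 4.0],
]
# Hypothetical observed pattern: home 0 -> job 1, home 1 -> job 0, home 2 -> job 2.
observed = [1, 0, 2]
observed_total = sum(distance[h][j] for h, j in enumerate(observed))
minimum_total = minimum_commute(distance)
excess = excess_commuting(observed_total, minimum_total)
print(minimum_total, observed_total, round(excess, 1))
```

Brute force is only feasible for a handful of locations; it is used here because it keeps the definition of the minimum commute visible.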

Urban form dimensions and their impact on commuting
Previous studies on the relation between urban form and commuting patterns often conceptualised urban form as residential density, because this is considered a good proxy for how accessible employment, goods and services are (Giuliano and Narayan 2003). In general, characteristics of urban form that have been taken into consideration in empirical research can be summarised as measures concerning city size, density, the spatial structure pattern and diversity.
Many early studies build on the monocentric city model developed by Alonso (1964), Mills (1972) and Muth (1969) that uses the assumption of a central business district (CBD) and a surrounding residential area. According to this model, commuting distances increase linearly with distance from the CBD, as employment is assumed to be concentrated there. Later research emphasised that many cities do not follow this monocentric model; as cities grew bigger they started to take on a more polycentric form in which employment is located in sub-centres outside of the CBD. Those sub-centres emerged as places where substantial parts of the urban population live and became increasingly independent from the original centre (Lefèvre 2009). Therefore, from the 1980s onwards, scholars have attempted to include measures and models that allow for polycentricity as well.
The well-known comparative aggregate study by Newman and Kenworthy (1989a, b) found a strong negative association between urban density and distances in developed cities. However, one should be careful in interpreting aggregate-level measures, as the relations found may not hold at the disaggregate level; a phenomenon which is called the modifiable areal unit problem (Zhang and Kukadia 2005). Newman and Kenworthy's study therefore attracted considerable criticism, because it did not take into account other factors, besides density, that may explain travel distances. Following on this research, several disaggregate studies investigated to what extent urban form (mostly measured as urban density) affects individual travel behaviour. Here the issue of self-selection is apparent: households that have a car or like driving may choose to live in lower density areas. Therefore, researchers have used instrumental variable (IV) techniques to control for the effect of self-selection.
Most of this research found that the endogeneity problems with urban density are small and that the effect of urban density on distances is significantly negative (e.g. Grazi et al. 2008). Ewing and Cervero (2010) found in their synthesis of a large number of empirical studies (which are mostly on the US) that Newman and Kenworthy's conclusion generally holds; urban form has in most studies a significant effect on commuting distances and higher urban density leads to shorter distances. Most empirical research has focused on city size, density, polycentricity, and land-use mix and these findings are summarised below.

City size
First of all, the size of the city is an important aspect to take into account when analyzing differences in urban spatial structure in relation to the jobs-housing balance and commuting. According to standard economic theory (e.g. Mills 1972), commuting distances become larger with distance from the city centre. This implies that when a city grows bigger, average distances increase. This is a simplified approach as it does not take into account the spatial structure of the city (e.g. degree of polycentricity or density changes). However, Gordon et al. (1989) found that mean travel times are also longer in bigger, polycentric cities. According to Levinson and Kumar (1997) the number of opportunities in, for example, jobs and housing choices increases when cities grow bigger and therefore commuting distances increase. This implies that the choice of travel destinations is not limited by what is available nearby, but by what is available in the region (Yang et al. 2012). Hence, larger cities are expected to have longer commutes, but this increase was found to be fairly minimal in a recent study on 40 metropolitan areas in the US where commute times were 7% longer in areas twice the size of smaller ones (Angel and Blei 2016).
Transportation (2018) 45:1269-1295

Density
Urban density is the most widely applied metric of urban form in empirical studies. Many studies on US and European cities concluded that urban density is the main determinant of travel times (e.g. Newman and Kenworthy 1989a, b; Schwanen 2002). In general, there is a consensus in the literature that dispersion leads to longer commuting distances and more use of cars (Dunphy et al. 1997).
However, urban density can also affect travel times in the opposite direction (Yang et al. 2012). A study by Levinson and Kumar (1997) on US cities showed that a residential density of 7500-10,000 persons per square mile results in the lowest commuting times. At higher densities, the congestion effect offsets the decrease in distance. Note that most Chinese cities have densities far over 10,000 persons per square mile. Kenworthy and Laube (1999) suggested in their study that it is therefore questionable whether further densification of Asian cities will have similar effects as in Europe or the US.

Polycentricity
Gordon et al. (1989) noted that the relation between residential density and mean travel times is not straightforward, because it also depends on whether the city is monocentric or polycentric. They argue that in a polycentric city model, people have more opportunities to live closer to their work place, a phenomenon known as 'co-location' that reduces commuting distances (Gordon and Wong 1985; Handy 1996). The co-location hypothesis builds on the assumption that employees and jobs follow each other and that labour and housing markets are perfect (Zhao et al. 2011). Under these conditions residents base their location choice on a trade-off between housing prices and commuting costs. The assumption is that, the closer to the job, which according to the monocentric city model is in the CBD, the higher the house price.
Another factor to take into account is the effect of polycentricity on congestion. A polycentric city structure is expected to lead to less congestion than a monocentric structure, which will reduce commuting times (Levinson and Kumar 1994). According to Gordon et al. (1989), in monocentric cities, an increase in residential density is generally expected to decrease travel times. However, when density becomes too high, the congestion effect may take over which will increase travel times.
Based on a review of existing empirical studies on the relation between polycentricity and commuting, Tsai (2001) concludes that there is no uniform relation. Some studies found that polycentric structures lead to lower average distances than monocentric structures (e.g. Gordon et al. 1989; Song 1992; Giuliano and Small 1993; Spence and Frost 1995; Alpkokin et al. 2008) whereas other studies found opposite results (Baccaini 1997; Cervero and Wu 1997; Ewing 1997; Schwanen et al. 2003; Aguilera 2005). In their study on 50 metropolitan areas in the U.S., Yang et al. (2012) found that the relation depends on the threshold of population density that is used to define polycentricity: high-density polycentricity shortens commuting times whereas moderately high-density polycentricity lengthens commuting times.

Land-use mix and spatial clustering
Mixed land use is expected to result in activities that are located closer together and thus reduce commuting distances (Cervero 1996). Some studies found that mixed neighbourhoods have better accessibility and therefore more people walk or cycle than in monofunctional neighbourhoods (e.g. Cervero 1989;Stead 1999). Giuliano and Small (1993), however, argue that those effects are very small, and therefore improving the land-use mix will only have a limited effect on commuting distances.
Spatial clustering is to some extent the opposite of land-use mix, but it may be beneficial for limiting commuting as it allows for the development of efficient transport infrastructure. Especially clusters with high densities of jobs and residences may be able to limit the need for lengthy commutes. Since spatial clustering within a city can occur in one or more subcentres there is no direct relation with its degree of polycentricity. Moreover, we plan to apply an approach relying on statistics of spatial association in combination with local-level data that are likely to highlight different clusters than more formal methods that characterise polycentricity.
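The statistic of spatial association is not named at this point in the text, so the following is only an illustration of the general idea, using global Moran's I as one widely used measure. Values near +1 indicate that similar values sit next to each other (clustering), while negative values indicate a dispersed pattern. The six-parcel layout, the binary neighbour weights and the density values are all hypothetical.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    values: list of observations (e.g. development density per parcel).
    neighbours: dict mapping each index to the indices of its neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Cross-products of deviations over all neighbouring pairs (w_ij = 1).
    cross = sum(deviations[i] * deviations[j] for i, js in neighbours.items() for j in js)
    total_weight = sum(len(js) for js in neighbours.values())
    variance_sum = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / variance_sum)


# Six hypothetical parcels along a street; each parcel neighbours the adjacent ones.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
clustered = morans_i([1, 1, 1, 0, 0, 0], chain)    # high values grouped together
alternating = morans_i([1, 0, 1, 0, 1, 0], chain)  # high and low values alternate
print(round(clustered, 2), round(alternating, 2))
```

In practice the significance of such a statistic is assessed against a random spatial arrangement, which this sketch omits.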

Socio-economic factors at city level
In addition to urban form and jobs-housing balance, socio-economic factors are important to understand commuting behaviour at city level. These city level differences are especially pronounced when comparing commuting patterns worldwide, as specific historical, cultural, economic and institutional factors have produced very different urban systems. Several general differences between Chinese cities and European and US cities were already discussed in the introduction. Due to a lack of data we are not able to include many different socioeconomic factors, but we do include a proxy for local income as many studies show that more prosperous cities have longer commuting distances (Lin et al. 2015). This relation, however, does not hold for commuting times, where the opposite is often the case. This probably relates to the fact that inhabitants may switch to using cars when their income increases, allowing them to locate further away from the city centre. Consequently, this results in longer commuting distances but not in longer commuting times (Schwanen 2002). In Chinese cities, the current rapid economic growth leads to changing trade-offs between location centrality and commuting costs. Furthermore, the location of dominant economic sectors may be of importance. For example, service industries are usually associated with shorter commutes than manufacturing (Crane and Chatman 2003).

Data and empirical methodology
Commuting distance and time
Data on commuting behaviour are derived from a large-scale survey by Baidu (the Chinese equivalent of the Google search engine) among a population of over 3 million of their users in 300 Chinese cities. The data are from 2014 and concern average one-way commuting distances in kilometres and travel times in minutes. Respondents were asked to mark the location of their home and their workplace in the Baidu map service app on their smartphone, and the commuting distance and time were automatically calculated (Baidu 2014). For this study all cities with more than 3 million inhabitants were selected. Those cities are geographically dispersed throughout the most densely urbanised parts of China, as can be seen from Fig. 2.
Beijing is the city with the longest commute in terms of both time and distance. The average commute takes 52 min and is 19.2 km long, which is more than two times that of the average commute of the city on rank 30 (Fig. 3). China's largest city (Shanghai) ranks second in the list. These two cities have the largest number of commuters who live outside of their territory; many commuters come from surrounding cities, covering distances of over 50 km (Liu 2015).
The average commuting distance in the sample is 12.5 km and the average commuting time is 35 min. This average commuting time is longer than in European and US cities. The average distance for commuting is also longer than in European cities, but comparable to US cities. In European cities, the typical commute is 10 km and takes 28 min (Schwanen 2002), whereas in US cities, the average commuting distance is 12.5 km (Brookings Institution 2011) and takes 24.7 min (US Census Bureau 2009).
The variation in average commuting distances is slightly larger than in commuting times, as the coefficients of variation show; these are 0.22 and 0.21, respectively. This is a common finding; a possible reason is that the time people are willing to spend on commuting is rather constant, in contrast to the distance travelled. Marchetti (1994) found that, across history, the average journey-to-work time of 30 min is similar across cities around the world and has remained stable for around six centuries (and is indeed also similar to the average of 35 min in this dataset). He explains how cities can sprawl as long as they stay within the one-hour daily travel boundary. This is not the case for historical records of travel distance; there is a large variation in average travel distances across cities around the world and over time, due to differences in preferred mode choice (e.g. car-oriented versus public-transit-oriented cities).
It can furthermore be observed from Fig. 3 that commuting time and distance of each city do not completely follow each other; the correlation coefficient is 0.97. This may be explained by differences in congestion. For example, Chengdu has a higher commuting time than can be expected from its commuting distance, which is most likely due to the fact that this is one of China's most severely congested cities (TomTom Traffic Index 2014).
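As a reminder of how the coefficient of variation makes the spread of two differently scaled variables comparable, the sketch below divides the standard deviation by the mean. The city averages are hypothetical and do not reproduce the study's data, and the use of the sample standard deviation is an assumption, as the text does not state which variant was applied.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical city averages, not the values from the Baidu survey.
distances_km = [8.0, 10.0, 12.0, 14.0, 19.0]
times_min = [28.0, 31.0, 35.0, 38.0, 52.0]

cv_distance = coefficient_of_variation(distances_km)
cv_time = coefficient_of_variation(times_min)
print(f"CV distance: {cv_distance:.2f}, CV time: {cv_time:.2f}")
```

Because the measure is unitless, kilometres and minutes can be compared directly.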

Metrics of urban form
To calculate these urban form metrics, two datasets are used: a crowd-sourced database on development density and land-use mix and a dataset on population based on Chinese population census data. First, these two databases are described and secondly, the way the urban form metrics are calculated using a Geographic Information System (GIS) is explained.

Development density and land-use mix per parcel
Development density and land-use mix are characterised at parcel level in a dataset of 297 Chinese cities that is based on data from crowd source platforms (Long and Liu 2013). This dataset is publicly available from the Beijing City Lab website and applies street networks derived from OpenStreetMap to delineate the parcel geometries and Point-of-Interest (POI) data from China's leading online business catalogue (Sina Weibo) to add parcel-level information such as land-use mix and development density.
In total 82,645 urban parcels were identified in 297 Chinese cities (of which 30 are used in this analysis). A parcel is a spatial unit that is comparable to a 'city block' in the US. It is defined as a 'continuously built-up area bounded by roads'. Using this definition, the parcels were automatically identified by a vector-based cellular automata model. An extensive description of the method is provided by Long and Liu (2013). The quality of the identified parcels has been tested against other available datasets, such as remote-sensing and survey datasets, and the parcel identification turns out to be reasonably accurate. The main difference is that the automatically generated parcels are generally somewhat larger than in the manually generated datasets, which is mainly due to the lack of information on small roads in OpenStreetMap (Long and Liu 2013).
[Figure caption fragment, map data sources: … (Pesaresi et al. 2016); water from the CCI Land Cover dataset (www.esa-landcover-cci.org); and country boundary from the GADM database of Global Administrative Areas (www.gadm.org)]
For each parcel, the dataset contains information on, amongst others, development density and land-use mix, derived from POI data from a business catalogue of Sina Weibo. The business catalogue lists business establishments, leisure locations, transport facilities and housing options throughout China (Website Sina Weibo). The development density is measured as the ratio of POIs per parcel to the parcel area, and is generally high: for example, for Beijing there are on average 10.6 POIs per 200 square metres (Lian and Xie 2011). The POIs are divided into eight categories: commercial sites, office buildings, transport facilities, government, education, residence communities, green space and others. Land-use mix is defined as the degree of mixture of different POIs from different categories, which is further explained in section "Calculated metrics" (Long and Liu 2013). The quality of the POI data on development density and land-use mix was tested against another data source on floor area ratio in Beijing. The correlation coefficient between development densities calculated with these two different methods is 0.86, which suggests that POI data can be used as proxy for urban density (Long and Liu 2013). However, a drawback of this dataset is that it uses the quantity rather than the quality of the POI data: this means that, for example, a large department store is treated the same as a small shop.

Inhabitants per Jiedao
To describe the spatial distribution of population a dataset is obtained from Beijing City Lab. This dataset is available as GIS point feature data at Jiedao (city sub-district) scale for the whole country (Mao et al. 2015). The population data are derived from the 2010 population census of China and include migrant workers residing in the Jiedao (National Bureau of Statistics of China 2010).

Calculated metrics
City size
City size is one of the most fundamental dimensions of urban form and provides the basis for many other urban form metrics. The most common metrics of city size are population size and total built-up area. This study uses population size (obtained from the 2010 Census) as a proxy for the total number of commuters, in line with studies of Angel and Blei (2016), Tsai (2001) and Yang et al. (2012), whereas total built-up area is included in our assessment of density. The population size of the cities in our sample ranges from 3 to 23 million inhabitants. See Table 1 for summary statistics of these and other metrics and "Appendix" for a complete overview of the calculated metrics per city.

Density
Urban density is the urban form metric most often used in studies on the effect of urban form on commuting times and distances. There are multiple ways of defining density. We apply the most widely used measure of urban density: population density defined as the average number of residents per square kilometre. The data are derived from the dataset on inhabitants per Jiedao. However, as these are point data and the boundaries of the Jiedao were not available, their areas were approximated by Thiessen polygons. These are polygons whose boundaries define the area that is closest to each point relative to all other points. The average population density was then calculated as the total number of inhabitants divided by the total area of all Jiedao per city. The average population density in this sample is 6712 inhabitants per square kilometre, a bit higher than the average of 6100 for Chinese urban areas (Demographia 2015). "Appendix" provides three example maps of population density.

Polycentricity
Definitions of polycentricity usually refer to either morphological or functional polycentricity. Morphological metrics describe the territorial distribution and sub-centres within individual cities, while functional polycentricity is usually assessed at the inter-urban level focusing on flows of e.g. traffic within an urban region (Meijers 2008; Sevtsuk and Amindarbari 2012). In this study, a morphological definition of polycentricity is used: the degree to which employment or inhabitants are concentrated in the city's subcentres (Sevtsuk and Amindarbari 2012). The degree of polycentricity will in this case be based on inhabitants, as no employment data was available. The drawback of defining polycentricity in terms of population is that nothing can be inferred about the jobs-housing balance. Consequently, there is the implicit assumption that when there are clearly defined sub-centres in terms of population, this is also the case for employment.
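The Thiessen polygon approximation can be sketched without GIS software by assigning every cell of a fine grid to its nearest point and summing the cell areas per point. This is a simplified stand-in for the GIS operation described above: the coordinates, populations and the rectangular study area are hypothetical, and bounding the outer polygons by a rectangle is an assumption, as the text does not state how they were delimited.

```python
def thiessen_areas(points, bbox, cells_per_side=200):
    """Approximate Thiessen (Voronoi) polygon areas on a regular grid.

    points: list of (x, y) coordinates in km.
    bbox: (xmin, ymin, xmax, ymax) of the assumed study area in km.
    Each grid cell is assigned to its nearest point; a polygon's area is the
    summed area of the cells assigned to that point.
    """
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) / cells_per_side
    dy = (ymax - ymin) / cells_per_side
    areas = [0.0] * len(points)
    for row in range(cells_per_side):
        cy = ymin + (row + 0.5) * dy
        for col in range(cells_per_side):
            cx = xmin + (col + 0.5) * dx
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - cx) ** 2 + (points[k][1] - cy) ** 2,
            )
            areas[nearest] += dx * dy
    return areas


# Two hypothetical Jiedao points in a 10 km x 10 km study area.
points = [(2.5, 5.0), (7.5, 5.0)]
population = [300_000, 200_000]
areas = thiessen_areas(points, (0.0, 0.0, 10.0, 10.0))

# City-wide density: total inhabitants divided by total polygon area.
city_density = sum(population) / sum(areas)
print([round(a, 1) for a in areas], round(city_density))
```

The grid resolution controls the accuracy of the approximation; an exact Voronoi construction would be preferable for real data.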
The polycentricity index used in this study was developed by Sevtsuk and Amindarbari (2012). The method consists of two steps: first the centres are defined, and second the polycentricity index is calculated. The definition of centres is based on three criteria: 1. the centre's density is higher than the city-wide mean density; 2. adjacent polygons above the specified threshold are grouped together; 3. centres must contain $10/\sqrt{\text{population}}$ % or more of the total city population. 5 Once the centres are determined, the polycentricity index (PC) is calculated from three components: N, the number of centres; $R_c$, the ratio of the population living in the identified centres to the total population size; and HI, the homogeneity index. HI is based on the entropy index by Limtanakool et al. (2009) and is computed from $Z_i$, the ratio of the population living in centre i to the population living in all identified centres. HI returns a value between 0 and 1: it equals 1 when the population is equally divided over all centres in a city and 0 when all inhabitants concentrate in one centre. The polycentricity index is thus based on the relative size of the centres, the number of centres and the balance in size between the centres. Using this method, 14 cities were identified as polycentric (PI > 0), of which Beijing, with a PI value of 0.94, is the most polycentric city.
5 Note that this third step is an important addition compared to studies that just look at density levels relative to the average city density. Such simpler criteria result in polycentricity indices that are sensitive to the threshold density values used to define centres, as was discussed by Yang et al. (2012).
Notes to Table 1: a. For the polycentricity index, 0 indicates that only one centre is present (polycentric cities have PI > 0). b. Dummy variable: 1 = significant spatial clustering of high-density areas, 0 = random or dispersed spatial pattern.
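The homogeneity index described above can be illustrated with a short Python sketch. It assumes that HI takes the normalised entropy form $HI = -\sum_{i=1}^{N} Z_i \ln Z_i / \ln N$, which is consistent with the stated properties (1 for equally sized centres, 0 for a single centre) but is an assumption, since the equation itself is not reproduced in this text. The function names and example populations are hypothetical, and the combination of N, $R_c$ and HI into the polycentricity index is deliberately not sketched.

```python
import math


def homogeneity_index(centre_populations):
    """Normalised entropy of centre sizes (assumed form of HI).

    Z_i is the share of centre i in the population of all identified centres.
    Returns 0.0 when fewer than two centres are present.
    """
    n = len(centre_populations)
    if n < 2:
        return 0.0
    total = sum(centre_populations)
    shares = [p / total for p in centre_populations if p > 0]
    entropy = -sum(z * math.log(z) for z in shares)
    return entropy / math.log(n)


def centre_share(centre_populations, city_population):
    """R_c: population living in the identified centres over total population."""
    return sum(centre_populations) / city_population


# Hypothetical examples
print(homogeneity_index([500_000, 500_000]))  # two equal centres
print(homogeneity_index([900_000, 100_000]))  # one dominant centre
print(homogeneity_index([1_000_000]))         # a single centre
print(centre_share([900_000, 100_000], 4_000_000))
```

Under this assumed form, two equally sized centres give the maximum value, a dominant centre lowers the index, and a single centre gives zero, matching the behaviour described in the text.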
Land-use mix
Average land-use mix is derived from the dataset on development density and land-use mix by taking the city average of land-use mix. Land-use mix for each parcel is calculated by the mixed index (M), where n is the number of POI types and $p_i$ is the proportion of POI type i among all POIs in the parcel (Long and Liu 2013). This metric provides information on the proportion of different categories of POIs on an average square kilometre. It can be argued that the land-use mix indicator would be more meaningful if it accounted for the probability of a certain land use occurring (as, for example, residential land uses are more common than educational uses) and for the probability of certain land uses mixing together (Sevtsuk and Amindarbari 2012). However, the available data were not detailed enough to calculate such more advanced metrics. The Appendix provides example maps of land-use mix.
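The parcel-level mixed index and its city average can be illustrated as follows. The sketch assumes that M takes the Shannon entropy form $M = -\sum_{i=1}^{n} p_i \ln p_i$ over the POI types in a parcel; this form is an assumption, as the equation is not reproduced in this text. The function names, POI categories and parcels are hypothetical.

```python
import math
from collections import Counter


def mixed_index(poi_types):
    """Entropy-based land-use mix of one parcel (assumed form of M).

    poi_types is a list with one POI category label per POI in the parcel.
    A parcel with no POIs or a single POI type scores 0.
    """
    counts = Counter(poi_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts.values()]
    # Adding 0.0 turns the -0.0 of a single-type parcel into 0.0
    return -sum(p * math.log(p) for p in proportions) + 0.0


def city_average_mix(parcels):
    """Unweighted mean of the parcel-level mixed index over all parcels."""
    return sum(mixed_index(p) for p in parcels) / len(parcels)


# Hypothetical parcels with labelled POIs
parcels = [
    ["shop", "shop", "shop", "shop"],            # single use
    ["shop", "office", "school", "restaurant"],  # evenly mixed
    ["shop", "shop", "office"],                  # partly mixed
]
for parcel in parcels:
    print(round(mixed_index(parcel), 3))
print(round(city_average_mix(parcels), 3))
```

Because the city-level metric is a plain average over parcels, local concentrations of highly mixed land use can be masked by large numbers of single-use parcels elsewhere in the city.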
Spatial clustering
Spatial clustering within a city indicates the degree to which phenomena such as buildings, population, jobs and businesses are concentrated. This concept bears some relation to average density but also considers the spatial distribution of density differences within a city. We follow the approach of Tsai (2005) and test for the presence of clusters of high-density areas. To determine an aggregate metric for each city, a hot-spot analysis was conducted with the development density data of Long and Liu (2013) as input. The Appendix shows development density hot-spot maps for three cities. The hot-spot analysis resulted in a Getis-Ord statistic for each city that measures the degree of clustering of either high or low values of, in this case, development density, controlling for the size of the parcels (Getis and Ord 1992). Among the included cities, a random pattern is more common (20 cities) than significant high clustering 6 (10 cities), and significant low clustering is not observed. As significant clustering of high-density areas in our case indicates the presence of concentrations of, for example, businesses, facilities and housing options, it can also be considered a proxy for urban compactness. Tsai (2005) showed in a simulation analysis that higher spatial autocorrelation coefficients correspond to more compact metropolitan areas. Since spatial clustering within a city can occur in one or more sub-centres, there is no direct relation with its degree of polycentricity.
The effect of the different urban form variables on commuting time and distance is tested in two separate regression models using ordinary least squares (OLS). City per capita GRP is included to control for city-level differences in prosperity. These data were obtained from Starmass (2015) and relate to 2013. The number of public transport vehicles per 10,000 inhabitants is included to control for city-level differences in public transport provision. These data were obtained from the China City Statistical Yearbook (National Bureau of Statistics China 2014). Table 1 shows the summary statistics of the included variables; the complete dataset is listed in the Appendix. The correlation among the included explanatory variables is below 0.45 for all variables (see the Appendix).

Results
Table 2 presents the results of the two regression models that describe the effect of several urban form metrics on commuting distance and time. Both models perform rather well in explaining commuting differences between cities, with an explained variance of around 0.8. In general, the effects on commuting distance and time are fairly similar when taking into account the ratio between average commuting time and distance. 7
City size (expressed as population size) has a considerable and significant effect on commuting distance and time. When population increases by 1%, commuting distance increases by 31 m and commuting time by 6 s. This impact of city size is stronger than in US cities, where a 1% increase in population leads to an increase in travel distances of 5 m (Gordon et al. 1989) and an increase in travel times of 1.8 s (Yang et al. 2012). In European cities, a 1% increase in city size leads to a 0.36 s increase in commuting time (Schwanen 2002). While the observed impact may seem small, it should be considered substantial given the large variation in city size in our sample: when population doubles, commuting distance increases by 2.2 km and commuting time increases by 6.6 min (compare an increase of only 2.5 min as suggested by the study of Angel and Blei 2016 8 ). The large impact may be a result of the fast growth of many cities, which has resulted in residential booms in the outskirts of cities whereas employment remained concentrated in their centres. Due to the resulting deterioration of the jobs-housing balance, average distances may increase substantially for larger cities (Zhou et al. 2013).
GRP also has a significant and substantial impact if we consider the large variation within our sample: if GRP per capita doubles, commuting distance increases by 2.7 km and commuting time by 5.7 min. That cities with a higher GRP per capita have longer commuting distances confirms the hypothesis that wealthier people travel further between home and work. That commuting time also increases contradicts findings for European cities, where commuting distance also increases with prosperity but commuting times decrease, probably due to increased use of faster modes of transport, such as cars (Schwanen 2002).
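The per-1% and per-doubling effects reported for city size can be related to each other with a short calculation. The sketch below assumes a level-log specification, $y = a + b \ln(x)$, in which a change of x by a factor k changes y by $b \ln(k)$. This specification is an assumption made for illustration, not a statement of the exact model in Table 2, and the coefficients are backed out from the reported doubling effects rather than taken from the regression output. Function names are hypothetical.

```python
import math


def implied_log_coefficient(effect_of_doubling):
    """Coefficient b in y = a + b*ln(x) implied by the effect of doubling x."""
    return effect_of_doubling / math.log(2)


def effect_of_percent_increase(beta, percent):
    """Change in y when x increases by the given percentage."""
    return beta * math.log(1 + percent / 100)


# Reported effects of a doubling of population: +2.2 km and +6.6 min
beta_distance_km = implied_log_coefficient(2.2)
beta_time_min = implied_log_coefficient(6.6)

# Implied effects of a 1% population increase, in metres and seconds
distance_m = effect_of_percent_increase(beta_distance_km, 1) * 1000
time_s = effect_of_percent_increase(beta_time_min, 1) * 60
print(f"{distance_m:.1f} m, {time_s:.1f} s")
```

Under this assumption, the implied effects of a 1% increase are roughly 31.6 m and 5.7 s, close to the reported 31 m and 6 s; the small differences are what one would expect from rounding of the reported figures.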
Higher average densities do not necessarily result in shorter commutes, contrary to the findings on European and US data (e.g. Newman and Kenworthy 1989a, b). A possible explanation is that average densities in China are much higher than in western cities and therefore marginal increases in average density do not lead to the same decrease in commuting. This suggests that the congestion effect may offset the decrease in distance at densities above a certain threshold, as was found by Levinson and Kumar (1997) and suggested by others (e.g. Yang et al. 2012).
Interestingly, the presence of high-density clusters of, for example, businesses, facilities and housing options does significantly affect commuting. The spatial distribution of densities and the presence of high-density areas may thus be more important for explaining commuting behaviour than average population density values. We find that commuting distance decreases by 1.6 km and commuting time decreases by 4 min when high-density clusters are present. The latter result is less significant, indicating that this relation is more complex and that it may also depend on local factors such as road network and travel modes (Feng 2014).
The average level of polycentricity does not have a significant influence on commuting distance and time. This underlines the ambiguous relationship between polycentricity and commuting. Studies on European and US cities have shown both positive and negative correlations between the two. The fact that we are not able to establish a significant relation also suggests that both positive and negative relations exist in our sample. The impact of polycentricity may thus depend on additional city-level characteristics (such as jobs-housing distribution) that we could not incorporate in our analysis.
The number of public transport vehicles per 10,000 inhabitants does not significantly increase or decrease commuting distance or time in our analysis. This may be due to the ambiguous relation public transport has with commuting behaviour. Public transport commutes generally require more time to cover the same travel distance than automobile commutes (Vandersmissen et al. 2003; Lee et al. 2009; Yang et al. 2012). But public transport provision and urban structure are also interdependent: larger cities with compact forms can more efficiently sustain a dense public transport network (Masanobu and Hanaoka 2003). The contrasting impacts of city size (leading to longer commutes) and spatial clustering (shorter commutes) that we find in our analysis may obscure the impacts of the higher public transport densities associated with these urban characteristics. This suggests that the impacts of size and compactness outweigh those of public transport provision. 9
Finally, the average mix of land use does not have a significant influence on commuting behaviour, which contrasts with previous findings that mixed land use may bring activities closer together and therefore reduce commuting distances. However, it may be the case that averaging land-use mix over the often large urban areas of Chinese cities does not capture the benefits that local concentrations of land-use mix may have for limiting commuting times and distances. Another explanation may be that the land-use mix is already very high in Chinese cities compared to European or US cities due to the Danwei system (Feng 2014). Therefore, marginal increases in land-use mix may not have a considerable impact on commuting distances.

Conclusion and discussion
Due to data limitations, knowledge on the effects of urban form on commuting distances and times in Chinese cities is scarce and especially inter-city comparisons are lacking. Knowledge on this issue is of crucial importance for China, as Chinese cities are currently growing rapidly (McKinsey Global Institute 2009) and the way this urban expansion will develop may have long-lasting effects on their accessibility, liveability and greenhouse gas emissions (He et al. 2013).
The current research takes a first step towards a better understanding of the relationship between urban form and commuting for a sample of 30 large Chinese cities by making use of three novel, open-access and crowd-sourced datasets. Notwithstanding the limitations of crowd-sourced and open data sources (see e.g. Sui et al. 2013), the included datasets provide a unique opportunity to include several metrics of urban form for a large range of Chinese cities. The analysis indicates that commuting behaviour in these cities responds differently to some urban form characteristics than was expected from earlier studies on cities in Europe and the US.
The results show that spatial clustering (presence of high-density clusters of, for example, businesses, facilities and housing options), city size (measured by population size) and per capita GRP are important determinants of commuting distance and time. Cities with such clusters have lower average commuting distance and lower average commuting time, while larger and wealthier cities have longer commutes in terms of both distance and time. Spatial planning measures that maintain or reinforce high-density clusters can thus help limit commuting distance and time. Striving for higher average population densities, as suggested by Newman and Kenworthy (1989a, b) based on data for developed countries, does not seem relevant for Chinese cities, as our analysis did not find a clear relationship between average densities and commuting times or distances in the sampled Chinese cities. Neither the degree of polycentricity nor land-use diversity was found to significantly affect a city's average commuting behaviour.
The main differences between our findings and studies on Europe and the US are that we did not find a significant impact of average population density and that we find a much stronger effect of city size on commuting. These differences may be explained by the incomparability of the average densities and urban growth rates. Average densities in Chinese cities are two to five times higher, which may explain why further densification will not lead to much shorter travel times, as any gain may be offset by an increase in congestion. Furthermore, the very rapid urban growth (five to eight times higher than in developed countries) has led to residential booms in the outskirts of cities, whereas most of the jobs stayed concentrated in the centres. This may explain the large effect of an increase in city size on commuting times and distances.
The results should, however, be interpreted with caution, as this study has several limitations. First, only GRP per capita and public transport vehicles per 10,000 inhabitants were included as control variables for city-level characteristics. Other sociodemographic or economic characteristics of cities, such as age structure, culture or dominant industries, could help explain part of the observed differences in commuting distances and times. Secondly, including data on the spatial distribution of employment could improve the analysis by providing insights into the effects of the jobs-housing balance on commuting. Such data could also be used to define polycentricity based on employment centres, as was done in other studies (e.g. Schwanen 2002; Giuliano and Small 1991; Muñiz and García-López 2009). Thirdly, transport modes were not included in this analysis as such data were not available. Information on selected modes of transport could especially help to better understand commute times and allow for establishing the link between travelled kilometres and environmental indicators such as greenhouse gas emissions.
Notwithstanding its limitations, this study sheds some new light on the effect of urban form on commuting in a cross-section of large Chinese cities, which is important to support evidence-based spatial planning policy development. In contrast to the US and Europe, China has a large potential for implementing spatial planning policies because the state still owns most of the land and steers investment decisions (Yang 2006). Our results suggest that spatial planning should be directed towards maintaining or reinforcing high-density clusters. Current developments, fuelled by increasing urbanization and prosperity, seem to lead to more sprawled cities (Deng and Huang 2004; Zhao 2010; Wu et al. 2015) and thus pose the risk of increased congestion, traffic-induced emissions and a further deterioration of the urban environment.
This study thus suggests that we cannot base our understanding of the relation between urban form and commuting solely on studies on Europe and the US. The findings of those studies do not seem directly applicable to other countries, which have different urban development patterns in terms of, amongst others, urban growth rates and average population densities. Recommendations for future work are to expand the analysis to other developing countries and to include additional data, as suggested above, on city-level characteristics, employment and transport modes.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix
See Tables 3 and 4