Spatial association techniques for analysing trip distribution in an urban area
- Cite this article as:
- Mazzulla, G. & Forciniti, C. Eur. Transp. Res. Rev. (2012) 4: 217. doi:10.1007/s12544-012-0082-9
Urban processes and transportation issues are intrinsically spatial and space dependent. Spatial statistics techniques can be applied to analyse the spatial pattern of urban and transportation features. This paper applies spatial association statistics to mobility data, in particular the daily trips made by people from home to work and study places (commuter trips).
In the last few years, urban analysis has been supported by the adoption of Geographic Information Systems (GIS). Using GIS, statistics of global autocorrelation (Getis-Ord General G and Global Moran’s Index I) and statistics of local autocorrelation (Gi* and Local Moran’s I) were computed.
The application of spatial association statistics made it possible to find clusters and to identify possible hot spots in the mobility data set. The results showed that the spatial distribution of trips among the census parcels displays spatial dependence.
The work shows that the daily trips variable is spatially autocorrelated, which is informative about the spatial distribution of commuter trips.
Keywords: Spatial association, Daily commuter trips, GIS
Urban processes and transportation issues are intrinsically spatial and space dependent. An urban spatial structure is the spatial arrangement of a city that results from the interaction over time between land markets, topography, infrastructure, taxation, regulations and urban policy. Railways, road networks, civil and industrial buildings, and other constructions are built on the territory to meet people’s needs. In particular, transport demand is influenced by the location of dwellings and economic activities; it is therefore strongly dependent on their spatial distribution.
To uncover the processes behind spatial distributions, it is necessary to handle a large amount of spatial data about urban areas using spatial analysis techniques. The notion of spatial analysis can include any operation performed on geographical data. Spatial analysis techniques make it possible to study the shape of the spatial aggregation of variables and their spatial relationships. They support objective considerations about spatial patterns: understanding whether a spatial pattern is random or represents a definite aggregation; establishing the causes of a spatial distribution; discovering whether the observed values are sufficient for analysing a spatial phenomenon; and exploring the heterogeneity of the areas in the region of study.
Over the last few years, the adoption of Geographic Information Systems (GIS) has supported urban analysis. A GIS allows the spatial relationships among the variables to be studied, because it integrates common tasks performed on the database, such as statistical analysis, with the advantages of graphical representation of data and geographic analysis offered by maps. Using GIS, researchers can manipulate a large amount of data and visualize urban affairs .
This paper presents an application of spatial association techniques to mobility data for the Cosenza-Rende urban area. The aim is to understand the spatial distribution of the mobility data and to identify possible spatial patterns.
The paper is organized as follows: in the next section some spatial association techniques are described; Section 3 presents a brief literature review about spatial association; in Section 4 the case study is described; Section 5 presents the outcomes of the application of global and local techniques of spatial association and concluding remarks are contained in Section 6.
2 Spatial association techniques
Spatial statistics comprises a set of techniques for describing and modelling spatial data. Unlike traditional (non-spatial) statistical techniques, spatial statistical techniques use space (area, length, proximity, orientation, or spatial relationships) directly in their mathematics.
There are some technical issues in spatial statistics. Among these, spatial association or spatial autocorrelation is the tendency of variables to display some degree of systematic spatial variation. In urban studies, this fact often means that data from locations near to each other are usually more similar than data from locations far away from each other. Spatial association may be caused by a variety of spatial processes, including interaction, exchange and transfer, and diffusion and dispersion. It can also result from missing variables and unobservable measurement errors in multivariate analysis . The advantages of the study of spatial autocorrelation are manifold : to provide tests on model misspecification; to determine the strength of the spatial effects on the variables in the model; to allow for tests on assumptions of spatial stationarity and heterogeneity; to find the possible dependent relationship that a realization of a variable may have on other realizations; to identify the role that distance decay or spatial interaction might have on any spatial autoregressive model; to help to recognize the influence that the geometry of spatial units under study might have on the realizations of a variable; to allow for identifying the strength of associations among realizations of a variable between spatial units; to give the means to test hypotheses about spatial relationships; to give the opportunity to weigh the importance of temporal effects; to provide a focus on a spatial unit to better understand the effect that it might have on other units and vice versa (“local spatial autocorrelation”); to help in the study of outliers.
The elements of the model are: a vector Y (n×1) of observations of the objective variable; a matrix X (n×K) of observations of the independent variables, including the usual constant; a vector β (K×1) of parameters corresponding to the K independent variables. The scalars ρ and λ are parameters of spatial association corresponding to the objective variable and the error term ε, respectively, while μ is a vector of independent and possibly homogeneous error terms. W is the spatial lag operator, an (n×n) matrix containing weights wij that describe the degree of spatial relationship (contiguity, proximity and connectivity) between units of analysis i and j. Considering physical contiguity, in the matrix W a weight of 1 is assigned to pairs of zones sharing a border and 0 otherwise. Connectivity can be given in terms of travel between pairs of origins and destinations. Alternatively, proximity can be defined in terms of distance or various accessibility measures, such as travel time or generalized costs.
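A specification consistent with these definitions can be written as follows. This form is an assumption: it is the standard general spatial process model, chosen because it matches the elements listed above (Y, X, β, ρ, λ, ε, μ, W).

```latex
\begin{aligned}
Y &= \rho W Y + X\beta + \varepsilon, \\
\varepsilon &= \lambda W \varepsilon + \mu .
\end{aligned}
```

Under this form, setting λ = 0 gives a spatial lag model, setting ρ = 0 gives a spatial error model, and setting ρ = λ = 0 reduces it to the usual non-spatial linear regression.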
In general, the modelling process is preceded by exploratory spatial data analysis (ESDA), a phase associated with the visual presentation of the data in the form of graphs and maps, which leads to the identification of spatial dependency patterns in the phenomenon under study. ESDA is a collection of techniques to visualize spatial distributions, identify atypical locations or spatial outliers, discover patterns of spatial association, clusters or hot spots, and suggest spatial regimes or other forms of spatial heterogeneity.
In the study of local patterns of association, several statistics of spatial association make it possible to detect places with unusual concentrations of high or low values (‘hot’ or ‘cold’ spots). In the last few years, two statistics have been used in many applications: the Gi(d) statistics [10–12] and Local Indicators of Spatial Association (LISA) such as Local Moran’s I.
Depending on the value at zone i and the values at its neighbouring zones j, Local Moran’s I (Ii) identifies four types of local association:
High-high association: the value xi is above the mean and the values xj at neighbouring zones are generally above the mean; the statistic is positive;
Low-low association: both values are below the mean; the statistic is positive;
High-low association: the value at i is above the mean and the values at neighbouring zones are, in general, below the mean; the statistic is negative;
Low-high association: the value at i is below the mean and the weighted average of the neighbouring values is above the mean; the statistic is negative.
These four types correspond to the four quadrants of a Moran scatterplot. Combining LISA with a Moran scatterplot provides information on the different types of spatial association at the local level.
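The four association types can be illustrated with a minimal sketch. It assumes the common form of Local Moran’s I in which deviations from the mean are scaled by their second moment; the function name `local_moran`, the toy weights matrix and the values are hypothetical and are not taken from the case study.

```python
def local_moran(x, w):
    """Return (I_i values, quadrant labels) for values x and weights w.

    x: list of observed values, one per zone (e.g. trips per census parcel).
    w: n x n list of lists of spatial weights, with w[i][i] == 0.
    Assumed form: I_i = (z_i / m2) * sum_j w_ij * z_j, with z = x - mean.
    """
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                  # deviations from the mean
    m2 = sum(zi * zi for zi in z) / n            # second moment of the deviations
    # Weighted sum of the neighbours' deviations for each zone.
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    i_local = [z[i] * lag[i] / m2 for i in range(n)]
    # First letter: zone i above (H) or below (L) the mean.
    # Second letter: its weighted neighbours above (H) or below (L) the mean.
    # Values exactly at the mean are labelled L in this simplified sketch.
    labels = [("H" if z[i] > 0 else "L") + ("H" if lag[i] > 0 else "L")
              for i in range(n)]
    return i_local, labels


# Hypothetical example: four zones in a row, each adjacent to the next one.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
values, quadrants = local_moran([10, 8, 2, 1], chain)
print(quadrants)  # ['HH', 'HH', 'LL', 'LL']
```

HH and LL zones have a positive Ii and HL and LH zones a negative one, which matches the signs listed above. The sketch does not test significance, which a full LISA analysis would add.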
3 Literature review
In the literature, many studies deal with the application of spatial analysis but in different fields. For example, Anselin  applied measures of spatial association to investigate the spatial patterns of conflict in Africa, whereas a study by Anselin et al.  established the utility of exploratory spatial data analysis in uncovering interesting patterns of child risk, considering rates for infant mortality, low birth weight and prenatal care as social indicators. In both cases, the exploration of spatial patterns clearly demonstrated the presence of significant spatial clusters of high and low values, as well as some interesting spatial outliers.
Spatial association has also been studied to analyse land-use data, which tend to be spatially autocorrelated, as land-use changes in one area tend to propagate to neighbouring regions. Aguiar et al. built spatial regression models to assess the determining factors of deforestation, pasture, and temporary and permanent agriculture in the Amazon. The goal of that paper was to explore intra-regional differences in land-use determining factors.
Over the last decades, there has been considerable interest in the analysis of urban spatial structures using spatial analysis techniques to describe and explain the distribution of population, land values, employment and other structural variables in a city. Some studies concern exploratory spatial data analysis. Among these, Páez et al. applied ESDA techniques to analyse land price data in Sendai City, a medium-sized Japanese city with a population of about 1 million. The application of global statistics such as Moran’s Index I showed that all variables present a high degree of positive spatial autocorrelation, meaning that observations with similar values tend to form clusters. To complement the global analysis, the authors resorted to local spatial association statistics. The localised exploratory data analysis showed that the distribution of land prices in Sendai City follows an essentially monocentric pattern, with only two spatial regimes: the CBD area and the periphery. Baumont et al. used ESDA to analyse the intra-urban spatial distributions of population and employment in the agglomeration of Dijon (regional capital of Burgundy, France). The aim was to study whether this agglomeration has followed the general tendency of job decentralization observed in most urban areas or whether it is still characterized by a monocentric pattern.
In other studies, spatial association techniques were applied to analyse housing prices. Tse suggested a stochastic approach that is able to correct autocorrelation bias in the hedonic house price function due to spatial dependence. The model, using data from Hong Kong, incorporates adjustments reflecting net floor area ratio, age, floor level, views, transport accessibility and amenities such as the availability of recreational facilities.
Spatial autoregressive (SAR) models were used to estimate the impact of locational elements (such as proximity to a shopping facility or a recreational amenity) on the price of residential properties sold during 1995 in the Greater Toronto Area. The first step was to estimate Moran’s I to determine the extent of spatial autocorrelation in housing values. This research found that SAR models offered a better fit than non-spatial models, and that, in the presence of other explanatory variables, locational and transportation factors were not strong determinants of housing values.
The analysis of spatial association is beginning to be applied to model transportation processes and land use and transportation interaction. Bolduc et al.  analysed travel flows and modal split using a regression model of spatial association. In this model an error components specification with spatial error autocorrelation was introduced. Application of the model to a case study shows that the spatial model gives a better fit to the data compared to non-spatial models.
Berglund and Karlström used Gi statistics (local spatial association) for applications with flow data, and demonstrated their usefulness in two applications. They explored non-stationarities and identified underlying geographical patterns. The authors concluded that localised statistics make it possible to address how relationships between variables vary over space.
A study by Shaw and Xin implements a temporal GIS, coupled with an exploratory analysis approach, to allow a systematic and interactive way of analysing land use and transportation interaction among various data sets and at user-selected spatial and temporal scales. Although the identified interaction patterns do not necessarily lead to rules that can be applied to different geographic areas, the results of the exploratory analysis provide useful information for transportation modellers to re-evaluate the current model structure and to validate the existing model parameters.
Another application of spatial association is in traffic safety. That study aimed at identifying accident hot spots by means of a local indicator of spatial association (LISA), more specifically Moran’s I. For applications in traffic safety, Moran’s I was adapted because road accidents occur on a network. The authors indicated that an incorrect use of the underlying distribution would lead to false results.
The analysis of the literature showed that spatial analysis techniques were initially applied to the study of socio-economic and demographic variables. Only more recently have these techniques been applied to the analysis of urban areas, and there are still few applications in the field of transport and mobility. Researchers in the field of transportation, however, have shown a growing interest in applying these techniques to the analysis of mobility, because there is a strong spatial component in the processes of trip generation and distribution.
This work therefore investigates the presence of spatial autocorrelation in data on trip distribution in an urban area.
4 The case study
The case study focuses on the urban area of Cosenza, located in the Calabria Region (southern Italy). Cosenza, the provincial capital in the north of the region, forms a single urban area together with Rende, which lies to its north.
This urban area is the most important centre of attraction for all the towns of the province because it performs some administrative functions and offers different services and job opportunities. Furthermore, Rende is home to the University of Calabria (UniCal). The campus has affected the mobility characteristics of all the urban centres of the province. Nowadays the University represents one of the major centres of attraction of the urban area; over 33,000 students and about 2,800 members of staff attend the campus. Thanks to the university, Rende has changed considerably in recent decades, with the construction of new residential areas and new infrastructure.
To provide a preliminary characterization of the two cities analysed in this work, some information about population and economic activities is reported.
Population and housing data
Total population (inh.)
Male population (inh.)
Female population (inh.)
Population younger than 15 years (inh.)
Population between 15 and 65 years (inh.)
Population older than 65 years (inh.)
Families with 1 member (nr.)
Families with 2 members (nr.)
Families with 3 members (nr.)
Families with 4 members (nr.)
Families with 5 members (nr.)
Families with 6 or more members (nr.)
Surface area (km²)
Total housing (nr.)
Empty housing (nr.)
Population density (inh./km²)
Housing density (nr. hous./km²)
The population of the urban area is almost equally split between males (48 %) and females (52 %). About 68 % of the urban area population belongs to the intermediate age class (between 15 and 65 years old), which represents persons of working age; about 18 % of people are older than 65 years and about 14 % are younger than 15 years. The city of Rende is characterized by a younger population than Cosenza: only 12 % of people living in Rende are older than 65 years, against 20 % for the city of Cosenza; in addition, 15 % of people living in Rende are younger than 15 years, against 13 % for the city of Cosenza. These results can be confirmed by calculating the old-age dependency ratio, which is the ratio of the number of elderly persons of an age when they are generally economically inactive (over 65 in this case) to the number of persons of working age (conventionally 15–65 years old). Specifically, the ratio has a value of 0.26 for the urban area and 0.31 for the city of Cosenza; the value for the city of Rende (0.16) is about half of the ratio for Cosenza.
In the urban area there are about 40,000 families; 70 % of these families live in Cosenza. About 26 % of the families living in the urban area have one member; about 23 % have two members; more than 40 % have three or four members; finally, only 10 % of families have five or more members.
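As a check, the old-age dependency ratio (written ODR here, a label introduced for this illustration) can be recomputed from the rounded age shares quoted above. Because the inputs are rounded percentages, the results match the reported ratios only approximately. The working-age share of 73 % for Rende is inferred as 100 − 12 − 15 and is an assumption, not a figure from the census table.

```latex
\mathrm{ODR} = \frac{P_{>65}}{P_{15\text{--}65}}, \qquad
\mathrm{ODR}_{\text{urban area}} \approx \frac{0.18}{0.68} \approx 0.26, \qquad
\mathrm{ODR}_{\text{Rende}} \approx \frac{0.12}{0.73} \approx 0.16
```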
Resident employment data
Resident labour force
Resident employed persons
Resident persons employed in agriculture
Resident persons employed in industry
Resident persons employed in services
Obviously, these percentages are related to population size. Therefore, in order to compare the employment data of the two cities and to give more specific information about the levels of employment, some rates can be calculated.
As an example, the regional employment rate gives an idea of the level of employment by expressing employed persons as a percentage of the population. In this case study, the employment rate is equal to 31 % for the urban area, 29 % for the city of Cosenza, and 34 % for Rende; therefore, Rende has a higher share of employed people in its total population than Cosenza. Analogously, the regional unemployment rate can be calculated by expressing unemployed persons as a percentage of the economically active population (labour force). The urban area has an unemployment rate of about 21 % and Cosenza of about 23 %, while Rende has the lowest value, equal to 18 %. As for employment by sector in the study area, persons employed in services represent 84 % of total employed persons, about 14 % of resident persons work in industry, and only 2 % in agriculture. Finally, 76 % of employed persons are employees.
Number of persons employed in the private and public enterprises
Persons employed in agriculture
Persons employed in industry
Persons employed in services
Persons employed in business activities
Persons employed in other private services
Persons employed in public services
4.1 Daily trips characteristics
Population census data also provide information on the daily trips made by people from home to work and study places (commuter trips). The trips are divided into trips with destination in the place of residence (internal trips) and trips with destination outside the place of residence (external trips).
However, some of the trips from Cosenza have their destination in Rende and vice versa, and these are internal trips for the urban area. To quantify them, information collected by previous surveys is taken into account, specifically a survey carried out for the drafting of the urban traffic plan of Cosenza. The survey, conducted in May 2000, covered 649 households (2,014 members) out of 28,499 resident households. According to the survey data, residents of the city make 32,852 trips per day (for all purposes) with destination in other places, and a substantial part of these (17,924 trips, or 54.6 %) have their destination in Rende. This percentage can be used to estimate the number of commuter trips with origin in Cosenza and destination in the urban area.
Analogously, the survey carried out for the drafting of the urban traffic plan of Rende estimated 7,293 trips per day (for all purposes) made by persons resident in Rende with destination in other places. In this case too, a substantial part of the trips (5,272, or 72.3 %) had their destination in Cosenza. This percentage can be used to estimate the number of commuter trips with origin in Rende and destination in the urban area.
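The two shares can be reproduced from the survey figures quoted above. The sketch below is only illustrative: `share_pct` and `intra_area_trips` are hypothetical helper names, and the census counts to which the shares would be applied are not reproduced here.

```python
def share_pct(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


def intra_area_trips(external_commuter_trips, pct):
    """Hypothetical helper: the part of a city's external commuter trips
    that stays inside the urban area, given the survey share in percent."""
    return external_commuter_trips * pct / 100.0


# Survey figures quoted in the text (all-purpose trips per day).
cosenza_to_rende = share_pct(17_924, 32_852)
rende_to_cosenza = share_pct(5_272, 7_293)

print(round(cosenza_to_rende, 1), round(rende_to_cosenza, 1))  # 54.6 72.3
```

Applying shares measured on all-purpose trips to commuter trips assumes that commuters choose destinations in the same proportions as travellers in general, which is the simplification adopted in the text.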
Daily trips for work and study purposes
Trips with destination in Cosenza
Trips with destination in Rende
However, census data refer only to trips made for work and study purposes, while a substantial part of daily trips is made for other purposes. For example, the same survey carried out for the drafting of the urban traffic plan of Cosenza shows that, out of 5,075 home-based trips made by a sample of residents in Cosenza, 1,924 (38 %) are trips for work and study purposes, while 3,151 (62 %) are trips for other purposes. Therefore, we can assume that the 47,471 commuter trips registered by the census represent only 38 % of the total trips made in a day. Taking into account the complementary share (62 %), a realistic estimate of daily home-based trips is 124,924. This value could be further increased to account for non-home-based trips.
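The expansion from commuter trips to total home-based trips can be written out as a worked calculation using only the figures quoted above (the symbol T is introduced here for illustration):

```latex
\frac{1{,}924}{5{,}075} \approx 0.38, \qquad
T_{\text{home-based}} \approx \frac{47{,}471}{0.38} \approx 124{,}924 \ \text{trips per day}
```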
5 Spatial techniques application
Clustering techniques have emerged as a potential approach for analysing complex spatial data in order to determine whether or not inherent geographically based relationships exist. The measures of global and local spatial autocorrelation defined in Section 2 were implemented in a GIS environment to analyse the spatial association of the internal and external daily trips made in the urban area of interest. The ArcGIS software provides tools suited to understanding broad spatial patterns and trends.
5.1 Global statistics of spatial association
The purpose of applying global techniques is to understand the spatial distribution of trips among the census parcels in the entire urban area. The tools used for calculating global statistics in ArcGIS are High/Low Clustering and Spatial Autocorrelation.
High/Low Clustering measures the degree of clustering of either high values or low values. It calculates the Getis-Ord General G statistic and the associated Z score, which is a measure of statistical significance. The null hypothesis to reject is “there is no spatial clustering”. When the absolute value of the Z score is large, the null hypothesis can be rejected. The larger the absolute value of the Z score, the stronger the intensity of the clustering. A Z score near zero indicates no apparent clustering within the study area, whereas a positive Z score indicates clustering of high values and a negative Z score indicates clustering of low values. This statistic is very useful for understanding the pattern of daily trips in the urban area of Cosenza and Rende.
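The observed and expected General G values can be sketched as follows. This is a minimal illustration of the usual definition (a weighted sum of cross-products of values over all pairs i ≠ j, divided by the unweighted sum), not the ArcGIS implementation; the function name and the toy data are hypothetical, and the Z score, which also needs the variance of G, is left out.

```python
def general_g(x, w):
    """Observed and expected Getis-Ord General G.

    x: list of non-negative values, one per zone.
    w: n x n list of lists of spatial weights (pairs with i == j are ignored).
    """
    n = len(x)
    weighted = 0.0    # sum of w_ij * x_i * x_j over i != j
    unweighted = 0.0  # sum of x_i * x_j over i != j
    w_total = 0.0     # sum of w_ij over i != j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            weighted += w[i][j] * x[i] * x[j]
            unweighted += x[i] * x[j]
            w_total += w[i][j]
    observed = weighted / unweighted
    expected = w_total / (n * (n - 1))
    return observed, expected


# Hypothetical example: four zones in a row, high values at one end.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
g_obs, g_exp = general_g([10, 8, 2, 1], chain)
print(g_obs > g_exp)  # True: high values sit next to each other
```

An observed G above its expected value points towards clustering of high values, and one below it towards clustering of low values; whether the difference is significant is what the Z score reports.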
General G Summary for daily internal trips (table rows: Observed General G, Expected General G)
General G Summary for daily external trips (table rows: Observed General G, Expected General G)
Spatial Autocorrelation measures the Global Moran’s I, which evaluates whether the analysed pattern is clustered, dispersed, or random. A Moran’s I value near +1.0 indicates clustering, whereas a value near −1.0 indicates dispersion. The Global Moran’s I function also calculates a Z score that indicates whether or not to reject the null hypothesis: “there is no spatial clustering”. To determine whether the Z score is statistically significant, it is compared to the range of values for a particular confidence level. When the p value is small and the absolute value of the Z score is large enough to fall outside the range for the desired confidence level, the null hypothesis can be rejected.
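A minimal sketch of the Global Moran’s I computation and of the conversion from a Z score to a two-sided p value is given below. It uses the usual definition of the index and assumes a standard normal reference distribution for the Z score; the function names and toy data are hypothetical, and the variance needed to obtain the Z score itself is omitted.

```python
import math


def global_moran(x, w):
    """Observed and expected Global Moran's I for values x and weights w."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                       # deviations from the mean
    s0 = sum(sum(row) for row in w)                   # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))  # weighted cross-products
    observed = (n / s0) * cross / sum(zi * zi for zi in z)
    expected = -1.0 / (n - 1)                         # value under no autocorrelation
    return observed, expected


def two_sided_p(z_score):
    """Two-sided p value of a Z score under a standard normal distribution."""
    return math.erfc(abs(z_score) / math.sqrt(2.0))


# Hypothetical example: four zones in a row, similar values side by side.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
i_obs, i_exp = global_moran([10, 8, 2, 1], chain)
print(i_obs > i_exp)  # True: positive spatial autocorrelation in this toy case
```

For instance, `two_sided_p(1.96)` is close to 0.05, which is why an absolute Z score above 1.96 is commonly read as significant at the 95 % confidence level.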
Global Moran’s I Summary for daily internal trips
Global Moran’s I Summary for daily external trips
The Getis-Ord General G and Moran’s Index I give similar results for internal trips but dissimilar ones for external trips. For internal trips, the first statistic indicates clustering of low values, and the second confirms the presence of spatial patterns. For external trips, instead, the General G statistic indicates that the distribution of the data is random, whereas Moran’s I indicates a clustered pattern.
5.2 Local statistics of spatial association
The global measures of spatial association refer to the entire area and give no indication of where the clusters are located. The local statistics of spatial association are useful for detecting places with unusual concentrations of high or low values (hot or cold spots). The ArcGIS tools used in this work for applying the local statistics are Hot Spot Analysis and Cluster and Outlier Analysis.
Hot Spot Analysis calculates the Getis-Ord Gi* statistic. The output of the Gi* function is a Z score, which represents the statistical significance of clustering for a specified distance and must be compared to the range of values for a particular confidence level. A high Z score for a feature indicates that its neighbours have high attribute values, and vice versa. A Z score near zero indicates no apparent concentration.
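The Gi* Z score can be sketched as follows, assuming the standardized form of the statistic in which each zone counts as its own neighbour. This is an illustration rather than the ArcGIS code; the function name, the weights and the values are hypothetical.

```python
import math


def gi_star(x, w):
    """Getis-Ord Gi* Z scores, one per zone.

    x: list of values, one per zone.
    w: n x n list of lists of weights; w[i][i] should be non-zero so that
       zone i is included in its own neighbourhood (the '*' in Gi*).
    A zone whose neighbourhood covers every zone equally has a zero
    denominator and is not handled in this sketch.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)  # std. deviation (1/n form)
    scores = []
    for i in range(n):
        w_sum = sum(w[i])
        w_sq = sum(v * v for v in w[i])
        # Local weighted sum minus what the global mean would predict.
        numerator = sum(w[i][j] * x[j] for j in range(n)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical example: five zones in a row, each neighbouring itself
# and the adjacent zones; high values at the left end.
weights = [[1, 1, 0, 0, 0],
           [1, 1, 1, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 1, 1, 1],
           [0, 0, 0, 1, 1]]
z_scores = gi_star([10, 9, 2, 1, 1], weights)
print(z_scores[0] > 0 > z_scores[-1])  # True: hot end versus cold end
```

Positive scores mark zones surrounded by high values and negative scores zones surrounded by low values; as in the text, each score still has to be compared with the critical value for the chosen confidence level.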
Cluster and Outlier Analysis measures the Anselin Local Moran’s I and identifies clusters of points with values similar in magnitude and clusters of points with very heterogeneous values.
A positive value for I indicates that the feature is surrounded by features with similar values. A negative value for I indicates that the feature is surrounded by features with dissimilar values. The tool also provides a Z score for each observation. A group of adjacent features with high Z scores indicates a cluster of similarly high or low values. A low negative Z score for a feature indicates that the feature is surrounded by dissimilar values. Finally, the tool distinguishes between a statistically significant (0.05 level) cluster of high values (HH), a cluster of low values (LL), an outlier in which a high value is surrounded primarily by low values (HL), and an outlier in which a low value is surrounded primarily by high values (LH). The Anselin Local Moran’s I output can be displayed by mapping these four patterns of spatial association.
Comparing the output of Hot Spot Analysis and of Cluster and Outlier Analysis, a certain similarity emerges: both statistics indicate approximately the same locations for the hot and cold spots.
The application of the spatial association statistic to commuting trip data introduced new aspects which merit further consideration, as said in . Moreover, the used measures can improve understanding of the strengths and weaknesses of the estimated models in terms of a spatial analysis. This understanding can be incorporated into improved and more comprehensive models.
The purpose of this paper is to investigate spatial association patterns in the distribution of daily trips made by people from home to work and study places (commuter trips). The trips have been divided into trips with destination in the place of residence (internal trips) and trips with destination outside the place of residence (external trips). Exploratory spatial data analysis was conducted by applying both global and local techniques of spatial association. The main contribution of ESDA is to highlight potentially interesting features in the data and to guide the modelling process.
The statistics were computed using GIS, which allows the outcomes to be obtained with automated procedures; this facilitates the application of the techniques to large data sets. The application of spatial analysis has become easier with the recent advances in computing and GIS, which have revolutionized the development of planning support systems to study and simulate the future of travel demand in urban areas.
The results showed that the spatial distribution of trips among the census parcels displays clusters of similar values and that there is spatial dependence in the data set. This means that spatial regression models are needed to model the phenomenon, because non-spatial regression models can lead to wrong results.
The work presented in this paper is a step towards a wider study of the Cosenza-Rende case. Future developments will concern the analysis of the interaction between land-use and transportation systems and the development of spatial regression models, and will also cover the transportation supply system, the location of dwellings and economic activities, and the territorial features. Further work will also check whether the results can be generalized to urban contexts with characteristics similar to those of the area studied.
This article is published under license to BioMed Central Ltd. Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.