Introduction

The extent of an individual’s knowledge of their local environment plays an important role in the spatial decisions taken by that individual at that location. A relatively poor understanding of the configuration of local areas restricts, and potentially degrades, decision-making ability, as alternatives available to better-informed travellers are overlooked. The importance of spatial knowledge in shaping behaviour has been well recognised in behavioural geography and cognitive science for many years [13]; however, little attention has been given to this aspect of behaviour within many conventional urban and transportation models. As recognition grows of the importance of individual behaviour - and population-wide heterogeneity - in shaping urban phenomena, the ability to understand and model the extent and structure of spatial knowledge becomes all the more pertinent.

Conventional interpretations of spatial knowledge recognise the influence of salient objects and hierarchies in distorting an individual’s perception of their surroundings. Notable early research by Kevin Lynch into the relationship between cognition and urban spaces differentiated five elements - paths, edges, districts, nodes, and landmarks - as anchors upon which spatial knowledge is constructed [16]. Subsequent studies identified how individuals build a subjective cognitive map of surrounding urban space through understanding spatial relationships between objects [5, 1]. Siegel and White’s seminal work identified that spatial knowledge is centred around a comprehension of the locations of landmarks (or salient objects), with a topological knowledge of route connectivity being a product of movement between landmarks [24]. Similarly, although in a different domain, research has established that the brains of mammals show sharply increased activity around specific locations, indicative of the use of these points in navigation [20]. Building on this and additional findings [12, 21, 22], the anchor-point theory of spatial knowledge [7] asserts that knowledge is ordered hierarchically, with certain objects appearing more salient or more easily recalled from the individual’s spatial memory. The individual nature of this hierarchical organisation of spatial knowledge is a product of both personal activity and the salience of an object [4].

Despite a wealth of well-established research describing the nature and structure of spatial knowledge, there has been limited extension of these models within the context of urban transportation modelling. While models of spatial knowledge and learning have been developed [14, 9, 2], their application within real-world contexts has been minimal, neither linking real-world features to a cognitive representation of space, nor capturing the heterogeneity in experience and knowledge known to exist among an urban population. The approach outlined in this paper tackles both of these shortfalls, developing a methodology for modelling spatial knowledge within a large-scale real-world context that also considers the nature and extent of individual-level heterogeneity.

In seeking to understand and model heterogeneity in spatial knowledge, one must understand the core drivers of variation in the experience of urban space. Central to this is individual engagement in activities across the city, a theme at the core of research into activity spaces. Activity spaces describe the spatial extent of the regions within which individuals conduct their day-to-day activities. According to Golledge and Stimson, an individual’s activity space is constructed around three elements: movements near to and around the home location, movements to and from regular activity locations (such as work, shopping or socialising), and movements near to and around the locations of these activities [8]. It is these regular movement behaviours that influence an individual’s experience of their environment [23] and contribute towards the formation of spatial knowledge.

An array of concepts and methods has been developed to represent the spatial extents of activities. One stream of research builds on the time geography concept introduced by Hägerstrand [10], focusing on the probable extents within which an individual’s travel is limited by their available time. The potential path area (PPA) model incorporates activity locations and travel costs in deriving a planar space that estimates the likely limits of travel [17, 15]. Alternative approaches have been introduced elsewhere, experimenting with different structures in the generalisation of activity spaces. Schönfelder and Axhausen [23] implement three approaches that incorporate observed activity patterns: confidence ellipses - elliptical regions that aim to encapsulate a given percentage (95 % is referenced) of all trips; kernel densities - constructed over multiple activity location points; and shortest path networks - encompassing the minimum distance routes between all locations. The authors favour the latter approach, given its reduced tendency to overgeneralise the total area of activity space.

Despite the demonstrable progress made in modelling activity spaces, none of the aforementioned approaches appears to align satisfactorily with conventional understandings of how spatial knowledge is built, structured and maintained. Spatial knowledge is built through physical interaction with the urban environment, with these interactions and spatial relations encoded mentally. A model of spatial knowledge must therefore replicate the relationship between activity and the extent of modelled space. It cannot be assumed that knowledge extends continuously across space that has not been experienced, nor can path-like structures realistically represent how knowledge is encoded and used by the individual.

In this paper a new framework is introduced for the estimation of spatial knowledge across a heterogeneous population within a real-world context. The model aims to advance the current state of the art in activity space modelling, replicating the process by which individuals undertake activities and expand their knowledge of space in doing so. The model focuses on the spatial knowledge of vehicle drivers, with a view to providing a methodology for the future integration of bounded spatial knowledge within conventional transport and urban models. So that the approach can be integrated with other models in a real-world context, the model is constructed, where possible, using commonly available datasets.

Models of spatial knowledge are, within this framework, disaggregated by both space and individual experience. Across space, variation in the distribution of activities, based on proximity and attractiveness, is estimated. With respect to experience, the model aims to capture how spatial knowledge develops over time, estimating how experience of space is expanded and deepened through increased exposure. The final result will be a set of representations of spatial knowledge, varying by location and by the number of trips undertaken.

The approach is made up of three core elements, introduced sequentially in this article, describing models of space, activity and learning. The next section outlines a model of urban space that aims to better encapsulate the role of nodes and landmarks in the formation of spatial knowledge. The structure is constructed using real-world GIS data sets, and forms the basis for the construction of heterogeneous models of spatial knowledge. Following this, in the third section, a set of spatial interaction models is presented that model heterogeneity in activity generation across the urban area. These models are calibrated and validated using survey data. The fourth section describes the combination of the models of space and activity within a framework for the generation of spatial knowledge: estimated activities are undertaken, and in doing so the spatial extent of knowledge is deepened. Following this section, the real-world application of the approach is outlined, drawing on a range of case studies. The article concludes by outlining the benefits and limitations of this approach, as well as areas for potential future research and extension of the model.

Topological Space

The first stage in developing a model of spatial knowledge involves establishing the model of space that best reflects the relationship between the urban environment and individual driver cognition. Within this context, urban space is not considered planar, and the model should reflect how spatial knowledge is non-continuous across this space, restricted by experience, and skewed by specific features and locations [8].

The representation developed in this section draws upon the extensive research into cognition and spatial behaviour in cities outlined in the Introduction. The literature identifies how particular locations (usually considered to be buildings, landmarks, or other point-like features) are core to the neurological construction of spatial knowledge [19], in addition to being integral to decision-making and communication about urban space [24, 21, 7, 26, 29]. The nature of these features in urban space is less well defined, with the general consensus indicating only that these locations are visually or subjectively salient [8]. However, when considering the use of such features within a quantitative model of urban space, visual and subjective salience are difficult concepts to extract reliably from conventional datasets. While attempts can be made to generate quantitative classifications of the locations of landmarks and salient features, no predefined dataset or methodology exists, limiting the wider application of any model derived in this fashion.

When assessing the spatial features of most importance to drivers, it is relevant to consider the role of the road network in shaping spatial knowledge. Across any road network, the features with the greatest potential for recognition and salience are road junctions. These locations are positioned prominently within the wider urban area, they are locations through which many individuals pass (more than any single road feature), and they require increased cognitive engagement from individuals, prompting a decision regarding route choice.

Within the literature there is strong appreciation for the importance of the road junction. As Lynch [16] confirms, junctions hold ‘compelling importance for the city observer. Because decisions must be made at junctions, people heighten their attention at such places and perceive nearby elements with more clarity’. Lynch also highlights the strong relationship between road junctions and widely-known salient features, such as city landmarks, squares and public transport stations. Passini [21] supports these notions, stating that junction locations are often centres of urban activity - be it transportation, economic or social activity - and represent points at which route decisions must be made.

Junctions furthermore lend themselves to a natural hierarchy, as their proximity to certain routes determines their likely prominence within the road network. This is an important factor in view of previous research indicating that individuals rank spaces hierarchically [12] and maintain knowledge with a granularity dependent on experience [24, 18].

The remainder of this section outlines the process of node definition, using junction point data, for London, United Kingdom. The approach takes a conventional GIS data source describing the point locations of road junctions and aggregates these points into clusters, from which a hierarchy of core junctions is extracted. Following this stage, the method by which nodes are connected within a topological representation of space is outlined. This second step incorporates observed route patterns to ensure the specification of realistic inter-nodal movements.

Node Definition

The topological network representation was prepared using the British Ordnance Survey Integrated Transport Network (ITN) dataset, a GIS dataset that contains representations of all UK roads to a fine level of detail. In this dataset, individual road segments are represented as sections of road between two road junctions. Junctions are modelled as separate node entities, and two or more road segments may be connected via a shared node. It is through these nodes that the network topology is constructed.

The ITN road network provides a hierarchy of road classification, specified by transportation authorities. This hierarchy presents an opportunity to introduce inter-nodal variation with respect to the prominence of a specific junction. Variation in this respect is important, as it has already been established that certain nodes attract more traffic, and by virtue of this a greater number of routing decisions, than others. This ranking also provides a reasonable pathway for generalising heterogeneity in spatial knowledge, whereby major junctions may be more likely to be known by individuals than less prominent junctions.

As nodes are not themselves ranked according to any specific classification, a node classification is formed based on the ranking of connecting roads. This ranking takes the four-level hierarchy shown below, defined according to the incoming and outgoing road segment classifications. In order to capture only the most important features on the road network, only the highest ranking classifications are considered within this representation. All remaining nodes - for example, those linking only to Local Streets or Alleys - are not considered explicitly within this topology. This choice rests on the assumption that such nodes are not used explicitly during the route choice decision process. As will be demonstrated later, however, this does not preclude movement along these streets.

  1. Junctions between only Motorways and A-Roads

  2. Junctions linking B-Roads to Motorways or A-Roads

  3. Junctions between B-Roads

  4. Junctions between Minor Roads and Motorways, A-Roads or B-Roads
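The four-level hierarchy can be expressed as a small classification rule. The sketch below is a minimal illustration, not the authors' implementation: the road-class strings are hypothetical labels rather than actual ITN attribute values, classes outside the ranked set (such as Local Streets and Alleys) are assumed to be ignored, and where a junction satisfies more than one level (for example, where an A-Road, a B-Road and a Minor Road meet) it is assumed to take the highest-ranked level.

```python
from typing import Iterable, Optional

# Hypothetical road-class labels; real ITN attribute values may differ.
MAJOR = {"Motorway", "A Road"}
RANKED = MAJOR | {"B Road", "Minor Road"}


def junction_level(road_classes: Iterable[str]) -> Optional[int]:
    """Return hierarchy level 1-4 for a junction node, or None if it is unranked."""
    # Assumption: unranked classes (e.g. Local Streets, Alleys) are ignored.
    classes = set(road_classes) & RANKED
    has_major = bool(classes & MAJOR)
    has_b = "B Road" in classes
    has_minor = "Minor Road" in classes
    if has_major and not has_b and not has_minor:
        return 1  # only Motorways and A-Roads meet here
    if has_major and has_b:
        return 2  # a B-Road links to a Motorway or A-Road
    if has_b and not has_major and not has_minor:
        return 3  # only B-Roads meet here
    if has_minor and (has_major or has_b):
        return 4  # a Minor Road joins a Motorway, A-Road or B-Road
    return None  # e.g. a junction of Minor Roads or Local Streets only
```

For instance, `junction_level(["A Road", "Minor Road"])` falls to level 4, while a junction of Local Streets alone is left out of the topology.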

The node entity itself does not, however, provide a suitable object on which to build a conceptual topology of route choice. As Fig. 1 shows for the nodes ranked according to the above definitions, in their present form there remain too many distinct objects. Individuals are unlikely to base route selections on this volume and density of objects, especially given their somewhat arbitrary nature as simple elements within a GIS dataset. Drivers are more likely to base decisions on the fuzzy location of a junction, something that these node entities can help to capture.

Fig. 1

Example of all junction nodes classified within the four-level hierarchy

To move towards a simpler representation of fuzzy decision point locations, junction nodes are clustered according to their similarity in classification and proximity. This process consists of a bespoke two-stage spatial clustering method: iterating through each feature, forming clusters where certain rules are satisfied, then generalising the location of each cluster.

  1. Initial Cluster Formation: Each node is checked for proximity to other nodes of the same ranking classification. Where a matching node is identified within a 250-metre radius (see Footnote 1), the node is designated to be clustered with the search node. The outcome depends on the current cluster status of the matching node:

     a) If the matching node is not already attached to a cluster, a new cluster is created containing the two nodes, with a tentative cluster location at the midpoint between them; or,

     b) If the matching node is already assigned to a cluster, the search node is added to that cluster, and the cluster location is recalculated as the centroid of all nodes within the cluster.

  2. Cluster Optimisation: Iterating through the node set, each node is tested against all cluster centroids generated during the first stage that lie within a 250-metre radius. Where a more favourable clustering is found, the node is moved to that cluster and the cluster centroids are recalculated. The process continues iteratively, checking all nodes until no further changes to any cluster location are made.
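The two-stage clustering procedure can be sketched in Python as follows. This is a minimal sketch under several assumptions not stated in the text: coordinates are projected and measured in metres, the nearest same-rank node within the radius is taken as the match, a search node already in a cluster is skipped, unmatched nodes form single-node clusters, a "more favourable" clustering is taken to mean a strictly closer same-rank centroid, and centroids are recalculated once per pass rather than after every individual move. The function name `cluster_junctions` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _centroid(points: List[Point], idx: List[int]) -> Point:
    """Mean position of the nodes listed in idx."""
    xs = [points[i][0] for i in idx]
    ys = [points[i][1] for i in idx]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def cluster_junctions(points: List[Point], ranks: List[int],
                      radius: float = 250.0, max_passes: int = 50) -> List[int]:
    """Return a cluster label for each junction node (coordinates in metres)."""
    n = len(points)
    label = [-1] * n  # -1 marks a node not yet assigned to any cluster
    next_id = 0

    # Stage 1: initial cluster formation among same-rank neighbours.
    for i in range(n):
        if label[i] != -1:
            continue  # assumption: already-clustered search nodes are skipped
        candidates = [j for j in range(n) if j != i and ranks[j] == ranks[i]
                      and math.dist(points[i], points[j]) <= radius]
        if not candidates:
            label[i] = next_id  # isolated junction forms its own cluster
            next_id += 1
            continue
        j = min(candidates, key=lambda k: math.dist(points[i], points[k]))
        if label[j] == -1:  # case (a): start a new cluster with both nodes
            label[i] = label[j] = next_id
            next_id += 1
        else:  # case (b): join the matching node's existing cluster
            label[i] = label[j]

    # Stage 2: move nodes to a strictly closer same-rank centroid within the radius.
    for _ in range(max_passes):
        members: Dict[int, List[int]] = {}
        for i, c in enumerate(label):
            members.setdefault(c, []).append(i)
        cents = {c: _centroid(points, idx) for c, idx in members.items()}
        moved = False
        for i in range(n):
            best = label[i]
            best_d = math.dist(points[i], cents[best])
            for c, ctr in cents.items():
                d = math.dist(points[i], ctr)
                if ranks[members[c][0]] == ranks[i] and d <= radius and d < best_d:
                    best, best_d = c, d
            if best != label[i]:
                label[i] = best
                moved = True
        if not moved:
            break  # no node changed cluster, so centroids are stable
    return label
```

Because a node only moves when a strictly closer centroid exists, the total squared distance from nodes to their centroids falls with every pass that changes anything, so the optimisation stage terminates; `max_passes` is simply a safeguard.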

Figure 2 presents a snapshot of the centroid locations of finalised clusters along with the original junction points, focussing on the same area shown in Fig. 1.

Fig. 2

Example of spatially clustered nodes (shown in green) formed from junction nodes (indicated in grey)

This node creation process was conducted on the London road network. The formation of the node clusters was complete after six iterations, establishing a total of 3087 clusters for the entire London road network. Of these, 816 nodes were classified at level 1 ranking, 475 at level 2, 110 at level 3 and 1686 at level 4. The distribution of nodes and their associated classifications is shown in Fig. 3 for the central London region.

Fig. 3

Topological representation of space in central London with nodes indicated by classification

Edge Definition

The definition of edges between nodes completes the construction of a topological representation of space. While the nodes form the basis for spatial knowledge, upon which spatial decision-making is undertaken, the edges describe relationships between nodes. Thus, as an individual arrives at a node location, the connectivity indicated by the edges at that node dictates which movements are available to that individual.

The edges are defined in this case using observed patterns of movement across the London road network. Observed movements are derived from a dataset of the routes of nearly 700,000 minicab journeys across London. Iterating through each route, the nodes traversed during each journey are recorded, and directed edges are constructed between each successive pair of nodes. During this process no account is taken of the exact path followed by the minicab between nodes, only that the node-to-node movement has taken place.

To account for the hierarchical structure of nodes, each route is assessed four times. In the first pass, only movements between level 1 nodes are considered; in the second, level 1 and level 2 nodes are included; in the third, level 3 nodes are also incorporated; and in the final pass, movements between all nodes are considered. Extracting four topological models in this way allows nodes from lower levels of the spatial hierarchy to be included or excluded, reflecting limitations in spatial knowledge. The next section outlines the method by which spatial knowledge is limited, as influenced by activity.
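The edge-extraction step, including the four hierarchical passes, can be sketched as below. It assumes, hypothetically, that each minicab route has already been converted upstream into the ordered sequence of cluster node identifiers it passes through, and that `rank` maps each node to its level (1 being the most prominent). Consecutive repeats of the same node are skipped, since they do not represent a node-to-node movement; the function name `build_topologies` is illustrative only.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Edge = Tuple[int, int]


def build_topologies(routes: Iterable[Sequence[int]], rank: Dict[int, int],
                     levels: Sequence[int] = (1, 2, 3, 4)) -> Dict[int, Set[Edge]]:
    """Directed node-to-node edges observed in routes, one edge set per hierarchy level."""
    edges: Dict[int, Set[Edge]] = {lvl: set() for lvl in levels}
    for route in routes:
        for lvl in levels:
            # Keep only nodes ranked at this level or above (1 is the highest rank).
            seq = [node for node in route if rank[node] <= lvl]
            for a, b in zip(seq, seq[1:]):
                if a != b:  # repeated visits to one cluster are not a movement
                    edges[lvl].add((a, b))
    return edges
```

Dropping lower-ranked nodes before pairing means that a level 1 topology links two major nodes directly whenever a route passes between them via lesser junctions.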

Activity Estimation

Individuals gain a great deal of their spatial knowledge through necessity. In their day-to-day lives, individuals must travel to certain locations to carry out activities, predominantly including shopping, socialising and working. Location plays an important role in this process: individuals will travel to different locations to carry out various activities, but they will generally not seek to travel too far from their home location to do so. In identifying the activities of individuals at a given location, one must therefore understand the attraction exhibited by nearby locations for a range of tasks. During this phase of the modelling process, this relationship is estimated - in lieu of a comprehensive data set describing such activity - through spatial interaction modelling.

Town Centre Attraction

Spatial interaction models are an established technique for identifying inter-zonal spatial relationships. By modelling some concept of zonal attractiveness relative to the travel cost between zones, one can build a broad understanding of how much interaction is likely between two zones. Within the implementation of spatial interaction models here, four different types of zonal attractiveness are incorporated, each relating to the utility of an area for a specific type of activity. As such, in assessing the likelihood of interaction between zones, the model implemented here incorporates an additional element indicating the type of activity being undertaken. This amended model of spatial interaction therefore takes the following form [11]:

$$ A_{ijk} = F_{jk}^{\alpha_k} \exp\left(-\beta_k d_{ij}\right) $$

where $A_{ijk}$ is the attraction from location $i$ to zone $j$ for activity $k$, $F_{jk}$ is the area of floorspace in zone $j$ relating to activity $k$, $d_{ij}$ is the travel cost between location $i$ and zone $j$, and $\alpha_k$ and $\beta_k$ are parameters governing the relative influence of $F$ and $d$ in decision-making for activity $k$. Calibration of these parameters, and how they vary between activities, is discussed in detail in the next section. Given that only vehicular travel is considered within this representation of spatial interaction, the travel cost used in this work is simply the road network distance.

Given an attraction value $A_{ijk}$ for each location and activity, one can then estimate the number of trips, from the complete set of trips for activity $k$, that will be attracted to each zone. This is a simple proportional calculation, taking the following form:

$$ T_{ijk} = T_{i\cdot k} \frac{A_{ijk}}{\sum_{j'} A_{ij'k}} $$

where $T_{ijk}$ represents all trips between location $i$ and zone $j$ for activity $k$, and $T_{i\cdot k}$ represents all trips from $i$ for activity $k$.
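The two equations above can be sketched directly in Python for a single origin $i$ and activity $k$. The function names are hypothetical, and the distances are assumed to be road network distances in the same units for which $\beta_k$ was calibrated.

```python
import math
from typing import List, Sequence


def attraction(floorspace: Sequence[float], dist_from_i: Sequence[float],
               alpha: float, beta: float) -> List[float]:
    """A_ijk = F_jk ** alpha_k * exp(-beta_k * d_ij) for every Town Centre j."""
    return [f ** alpha * math.exp(-beta * d) for f, d in zip(floorspace, dist_from_i)]


def distribute_trips(total_trips: float, floorspace: Sequence[float],
                     dist_from_i: Sequence[float], alpha: float, beta: float) -> List[float]:
    """T_ijk = T_i.k * A_ijk / sum_j A_ijk: share origin i's trips among all zones."""
    attr = attraction(floorspace, dist_from_i, alpha, beta)
    total = sum(attr)
    return [total_trips * a / total for a in attr]
```

With equal floorspace, a Town Centre 1 km away receives more trips than one 2 km away, and the distributed trips always sum back to $T_{i\cdot k}$.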

The set of origin locations $i$ used within this model consists of the 822 wards of London, with the attractor zones $j$ designated as the 166 ‘Town Centres’ of London. These locations, shown in Fig. 4 and specified by the Greater London Authority (GLA), represent areas of clustered commercial activity (see Footnote 2) and are a central feature in strategic planning for various organisational bodies in London. Alongside these spatial definitions is survey data on the commercial capacity and activity of each Town Centre. These data include floorspace estimates for a range of activities at each Town Centre, and it is these estimates that are incorporated within the model of spatial interaction and constitute $F_{jk}$.

Fig. 4

Locations of Town Centres in London, with centres classified by the GLA as ‘International’ or ‘Metropolitan’ indicated

For each Town Centre, floorspace estimates relating to four types of commercial activity are extracted from the Town Centre data sets. These classifications of commercial activity are commonly used in retail and marketing studies. The data for each Town Centre are taken to represent its relative attraction for each of the following activities:

  • Convenience Goods Shopping - This activity consists of shopping for day-to-day items such as food produce, drinks, tobacco, newspapers and non-durable household goods. It is hypothesised that individuals take these trips regularly, with a higher preference for nearby retail locations rather than necessarily high floorspace (indicative of greater range).

  • Comparison Goods Shopping - Shopping for items bought less regularly, including items such as clothing, homeware, jewellery, books, music, electricals and furniture. Individuals would be expected to be willing to travel further for these types of item, to areas with higher floorspace of outlets selling comparison goods, so naturally exhibiting a greater range of choice.

  • Commercial Leisure - Activities relating to trips to the cinema, theatre or sports venues. Again, individuals are likely to engage in these activities less regularly than convenience goods shopping; however, larger retail zones are unlikely to offer a positive utility when selecting a region. It is expected that individuals are more likely to favour nearby locations for these types of activity.

  • Eating and Drinking Leisure - These activities refer to trips to locations with higher concentrations of food and drink outlets. Individuals are again considered less likely to visit regularly, but more likely to prefer nearby locations with a reasonable range of options.

These classifications broadly encompass most types of individual retail-based activity, and are classes that have been widely employed in other studies of commercial behaviour. The calibration of the model to estimate the role of floorspace and distance in selecting locations for each of these activities is described below.

It should be noted at this point that the activities described here clearly do not represent all types of individual travel behaviour. One particularly prominent absence from this model is work-driven activity. Inter-zonal relationships in this respect are less straightforward than for commercial activities, with varying distributions of workplace and worker types across zones leading to more complicated linkages. Likewise, the model is unable to capture the influence of non-commercial leisure and socialising activities without a level of model complexity that is unnecessary at this stage. These limitations are discussed in more detail later. It is expected, however, that by including four different types of readily predictable activity, a considerable share of regional interactions can nevertheless be captured.

Parameter Specification

As described above, each activity undertaken by an individual involves different preferences with respect to willingness to travel or attraction to larger, more diverse Town Centres. The diversity of these activities sees individuals move to different parts of their environment, extending their knowledge of space. These preferences are captured within the spatial interaction model by the α and β parameters, and this section details the calibration and validation of these parameters for each activity.

The parametrisation process uses household surveys of retail and leisure activity in two London boroughs, carried out on behalf of the two borough councils by Roger Tym and Partners, a town planning and development economics consultancy. These studies were carried out in Hackney and Camden in 2005 and 2008 respectively [27, 28]. The Roger Tym surveys - which, from the reports’ descriptions, follow the same methodology - investigate the attractiveness of various locations for different activities, differentiated into the same four behaviours described above: convenience retail, comparison retail, commercial leisure, and eating and drinking. Within each borough a subset of the population was surveyed on their preferred areas for carrying out these four activities. In Hackney, the study surveyed 1,200 individuals; in Camden, 1,001 individuals were questioned, with an even spread between spatial zones (representing clusters of two or three wards) within each borough.

Questions focussed on asking individuals to pick their most preferred option, for each activity, from a range of locations. For the most part, these locations either corresponded directly with Town Centre definitions, or could be manually matched based on close proximity (e.g. ‘Hackney Tesco, Morning Lane, Hackney Central’ assigned to the Mare Street region). Where matches were not possible (in the case of large, standalone supermarkets) these locations were recorded in an additional category of ‘Others’.

Calibration

The initial calibration of the α and β parameters was established through a comprehensive sweep of parameter combinations. A wide range of parameter pairs was implemented within the spatial interaction model described above, and the ability of each pair to predict the attractiveness of different locations was assessed. This process was repeated separately for each activity. At this initial phase, only matches with the Hackney data set were explored, leaving the Camden data set for validation purposes. The application of this model aimed to replicate the process used in the Hackney study. As such, the borough was divided into the same 12 spatial areas used in the study, with 100 trips generated from each area, representing the 100 surveyed individuals in that zone. Within each zone, these trips were distributed between constituent wards (26 wards in total), yielding an even spatial spread of trips within each zone. For each activity, this process generated 1,200 trips across the entire borough, providing an equivalent data set for comparison against the survey results.

Following an initial investigation into parameter ranges, two ranges were selected with the aim of covering all plausible parameter combinations. For the α parameter, values ranged from 0.1 to 2.0 at intervals of 0.1, yielding 20 values. For the β parameter, 20 values were again tested, ranging from 0.0001 to 0.002 at intervals of 0.0001. The full combination of these ranges yielded 400 parameter pairs. These 400 pairs were applied across the 26 study wards, assessing the attractiveness of each of the 166 Town Centres for each of the four activities. In total, the process required the calculation of approximately 6.905 million spatial interaction estimates.
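The quoted total follows directly from the grid and study dimensions (parameter pairs × study wards × Town Centres × activities):

$$ 20 \times 20 = 400, \qquad 400 \times 26 \times 166 \times 4 = 6{,}905{,}600 \approx 6.905 \times 10^{6} $$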

Each parameter combination was assessed through a direct comparison of the model-estimated trips with the findings of the study. A cumulative difference was calculated between the model output and the surveyed data, and the three best-performing models by this measure were identified. This process was conducted separately for each of the four activities. These twelve models were then passed forward for further assessment during the validation phase. The selected models are shown in Table 1 alongside their level of performance.
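The calibration sweep and the selection of the three best parameter pairs for one activity might look like the following sketch. The cumulative difference is assumed here to be the sum of absolute differences between estimated and surveyed trip counts per Town Centre, since the text does not define it precisely; survey responses in the ‘Others’ category are assumed to be excluded, distances are assumed to be in metres (consistent with the β range), and the function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def predicted_counts(trips_by_ward: Sequence[float], floorspace: Sequence[float],
                     dist: Sequence[Sequence[float]], alpha: float, beta: float) -> List[float]:
    """Sum the spatial interaction estimates T_ijk over all origin wards i."""
    counts = [0.0] * len(floorspace)
    for i, trips in enumerate(trips_by_ward):
        attr = [f ** alpha * math.exp(-beta * d) for f, d in zip(floorspace, dist[i])]
        total = sum(attr)
        for j, a in enumerate(attr):
            counts[j] += trips * a / total
    return counts


def sweep(trips_by_ward: Sequence[float], floorspace: Sequence[float],
          dist: Sequence[Sequence[float]], observed: Sequence[float],
          top: int = 3) -> List[Tuple[float, float, float]]:
    """Score every (alpha, beta) pair; return the best as (difference, alpha, beta)."""
    alphas = [round(0.1 * k, 1) for k in range(1, 21)]    # 0.1 ... 2.0
    betas = [round(0.0001 * k, 4) for k in range(1, 21)]  # 0.0001 ... 0.002
    scored = []
    for alpha, beta in itertools.product(alphas, betas):
        pred = predicted_counts(trips_by_ward, floorspace, dist, alpha, beta)
        diff = sum(abs(p - o) for p, o in zip(pred, observed))  # assumed metric
        scored.append((diff, alpha, beta))
    scored.sort()  # smallest cumulative difference first
    return scored[:top]
```

Running the same function against the Camden survey counts, restricted to the three retained pairs, would mirror the validation step described next.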

Table 1 Top three best performing models for activities following initial calibration, showing parameters and total difference between estimated and survey data for Hackney

Validation

During the second stage of model development, the best performing models identified through the calibration of the Hackney data set were applied to the Camden data set. Again, in line with the calibration stage, trips were disaggregated across the wards within the borough, with the 1,001 individuals involved in this study distributed evenly across the 25 wards within the Camden region. The spatial interaction models were executed using each of the best performing parameter pairs derived during the calibration phase, and trips distributed accordingly. Trip distributions are derived from this process, as executed during the calibration phase, and once again compared through cumulative difference with the survey results, in this case relating to Camden.

The estimates for Camden generated by each of the best performing models are shown in Table 2. The selection of the single set of models to be implemented was straightforward: for each activity, the best performing model identified during the calibration process was once again the most favourable in the Camden assessment. These best performing models are highlighted in Table 2.

Table 2 Selected parameter combinations following validation phase, showing parameter pairs and total difference between estimated and survey data for Camden. Selected models are italicized

In summary, the weights derived for each activity appear to be in line with the hypotheses outlined earlier. Convenience Retail behaviour received a considerably lower α value and a much higher β value than the other activities, suggesting that individuals prefer nearby retail zones without major consideration of floorspace. For the remaining activities, the parameter values appear broadly similar: floorspace is of greater concern in the selection process, indicative of a greater available selection, while individuals are shown to be more willing to travel longer distances for the right destination than is the case for Convenience Goods shopping. It is apparent that in many cases particular Town Centres serve a triple purpose, providing for Comparison Goods shopping alongside Commercial Leisure and Eating and Drinking establishments.

Although higher cumulative differences are observed during the validation stage, in most cases the best performing model aligns closely with that generated for Hackney. Nevertheless, these differences indicate the limitations of the parameterisation process, which restrict the study to the London region alone. Extensions to other cities would require reparameterisation using equivalent retail activity studies.

Trip Generation

The generation of trips to each Town Centre, in line with the models developed above, must account for the proportion of trips undertaken for each type of activity. This proportion represents the $T_{ik}$ term of the spatial interaction model. These estimates are derived from existing data sets describing trip frequencies and purposes.

In this regard, the most reliable London-specific data set identified was the London Travel Demand Survey (LTDS) for 2011, provided by Transport for London [6], which details trip purposes for all journeys taken in London. These data indicate that the majority of trips are made for ‘Shopping and personal business’ or ‘Leisure’, contributing 29.1 % and 27.6 % respectively; other categories include ‘Commuting’ (16.7 %), ‘Education’ (8.3 %), ‘Other Work’ (6.3 %) and ‘Other (including Escort)’ (11.9 %). Together, shopping and leisure trips constitute 56.7 % of all journeys, indicating their prominence in shaping individuals’ conceptions of their environment.

However, little direct information was identified on the splits in trip purpose within these categories, owing to the limited crossover between retail and transport activity classifications. Only broad estimates can therefore be implemented. Of the shopping trips taken by an average individual, 80 % are assumed to be for Convenience Goods Shopping and the remaining 20 % for Comparison Goods. For leisure trips, a 60–40 split is introduced, with Eating and Drinking trips expected to be taken slightly more frequently than Commercial Leisure trips. Applying these splits within the broader categories yields the final trip proportions for all journeys described in Table 3. The rough specification of these trip proportions, in the absence of a reliable data source, is clearly a disadvantage. However, this work seeks only a reasonable expectation of the local spatial spread of journeys around an origin zone. Further refinements, incorporating geodemographic variation for example, may offer more accurate representations of the spatial spread of trips, though little added value is expected in this case.
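The arithmetic behind the purpose proportions can be sketched as below, using only the LTDS shares and the assumed splits quoted in the text. Table 3 itself is not reproduced here. The sketch also renormalises the shares to the modelled shopping and leisure journeys. That renormalisation is an assumption about how the proportions might be applied, not a documented step.

```python
# LTDS 2011 shares of all London journeys, as quoted in the text.
SHOPPING = 0.291
LEISURE = 0.276

# Assumed within-category splits, as quoted in the text.
purpose_splits = {
    "Convenience Goods": (SHOPPING, 0.8),
    "Comparison Goods": (SHOPPING, 0.2),
    "Eating and Drinking": (LEISURE, 0.6),
    "Commercial Leisure": (LEISURE, 0.4),
}

# Share of *all* journeys made for each modelled purpose.
share_of_all = {p: broad * split for p, (broad, split) in purpose_splits.items()}

# Share of the *modelled* (shopping + leisure) journeys, if only those are simulated.
modelled_total = sum(share_of_all.values())
share_of_modelled = {p: s / modelled_total for p, s in share_of_all.items()}

for purpose in purpose_splits:
    print(f"{purpose:20s} {share_of_all[purpose]:.4f} {share_of_modelled[purpose]:.3f}")
```

Convenience Goods, for instance, comes out at 23.28 % of all journeys, or roughly 41 % of the modelled retail and leisure journeys.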

Table 3 Purpose proportions utilised in trip generation

Using these proportions, a given set of trips may be split by function, and each trip type distributed separately according to the parameters described in 4.
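Splitting a set of trips by function and then distributing each trip type can be sketched as a two-stage random draw. The trip purpose is drawn first, then the destination from that purpose’s destination probabilities. The function `generate_trips`, the seeded `random.Random` generator and the destination probabilities are hypothetical. The example purpose shares reuse the 80–20 shopping split from the text.

```python
import random


def generate_trips(n_trips, purpose_shares, destination_probs, seed=0):
    """Draw n_trips (purpose, destination) pairs for one origin ward.

    purpose_shares: purpose -> relative share of trips.
    destination_probs: purpose -> {destination: probability}, e.g. the output
        of a spatial interaction model with that purpose's (alpha, beta).
    """
    rng = random.Random(seed)
    purposes = list(purpose_shares)
    purpose_weights = [purpose_shares[p] for p in purposes]
    trips = []
    for _ in range(n_trips):
        # First choose why the trip is made, then where it goes.
        purpose = rng.choices(purposes, weights=purpose_weights)[0]
        dests = destination_probs[purpose]
        destination = rng.choices(list(dests), weights=list(dests.values()))[0]
        trips.append((purpose, destination))
    return trips


# Hypothetical inputs for a single ward (shopping trips only).
shares = {"Convenience Goods": 0.8, "Comparison Goods": 0.2}
probs = {
    "Convenience Goods": {"TC1": 0.9, "TC2": 0.1, "TC3": 0.0},
    "Comparison Goods": {"TC1": 0.2, "TC2": 0.3, "TC3": 0.5},
}
trips = generate_trips(1_000, shares, probs)
```

Keeping the purpose draw separate from the destination draw lets each purpose use its own calibrated parameters while sharing a single trip budget per ward.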

Spatial Learning and Experience

The third and final stage of the modelling process involves the generation of spatial knowledge, integrating the models of space and activity outlined in the previous sections. The process by which spatial knowledge is generated is based on the way an individual might learn their local environment. From a given home location, an individual carries out daily trips to local Town Centres to complete a range of tasks. As they complete more trips to a particular location, their inclination to explore and improve upon their route increases, and they identify preferable routes that they might not have noticed during their first few journeys. After a given number of journeys to the same place, the individual will have worked out the ‘good’ and ‘bad’ ways of travelling there, with their locations broadly recorded within their personal cognitive map.

This learning process is integrated within the model of spatial knowledge generation. Space is learnt progressively as an increasing number of trips are conducted and new, more efficient routes to a range of destinations are constructed. For each defined area of the city, the spatial knowledge of an individual inhabiting that zone is built up from highly basic knowledge, structured around their increasing activity in the area. This gradual growth in spatial knowledge is analogous to an individual’s increasing experience of a region.

Spatial knowledge is constructed according to the hierarchical node-based representation of the city, as defined earlier. As individuals gain experience of an area, the depth of the hierarchy to which their knowledge extends increases. The generation of spatial knowledge is differentiated spatially by ward. The British ward definitions yield a spatially granular variation in knowledge, ensuring neither too much nor too little generalisation across population knowledge, and, given their historical nature, they more accurately encompass structurally or culturally homogeneous regions than alternative regional representations. In adopting this model, one therefore assumes homogeneity in spatial knowledge across each ward, an assumption discussed in more detail in the conclusions.

In generating trips, a route is modelled across nodes from the origin ward to the destination Town Centre. Route construction follows a principle of angle minimisation towards the destination, minimising any deviation from the straight line to the destination. This principle is selected given increasing recognition within the literature that angle minimisation towards a destination plays an important role in navigation [3, 25]. In constructing the route between origin ward and destination Town Centre, only those nodes known to the individual are used.
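The angle-minimisation principle can be sketched with a greedy step-by-step router. At each node, the walker moves to the usable, not-yet-visited neighbour whose heading deviates least from the straight line to the destination. The toy network and the names `least_angle_route` and `angle_between` are hypothetical. The paper does not specify whether deviation is judged locally at each junction, as here, or over whole candidate routes.

```python
import math


def angle_between(u, v):
    """Unsigned angle in radians between 2D vectors u and v."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        return 0.0
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / norm
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error


def least_angle_route(graph, coords, start, goal, usable):
    """Greedy least-angle route from start to goal over usable nodes.

    graph: node -> list of neighbouring nodes (junctions)
    coords: node -> (x, y) position
    usable: nodes the individual may route through (the goal is always allowed)
    Returns the node sequence, or None if the walk reaches a dead end.
    """
    route, visited, current = [start], {start}, start
    gx, gy = coords[goal]
    while current != goal:
        cx, cy = coords[current]
        to_goal = (gx - cx, gy - cy)
        candidates = [n for n in graph[current]
                      if (n in usable or n == goal) and n not in visited]
        if not candidates:
            return None
        current = min(candidates, key=lambda n: angle_between(
            (coords[n][0] - cx, coords[n][1] - cy), to_goal))
        route.append(current)
        visited.add(current)
    return route


# Toy junction network (hypothetical).
coords = {"O": (0, 0), "A": (1, 0), "B": (0, 1), "C": (1, 1), "D": (2, 1), "G": (2, 3)}
graph = {"O": ["A", "B"], "A": ["O", "C", "D"], "B": ["O", "C"],
         "C": ["A", "B", "G"], "D": ["A", "G"], "G": ["C", "D"]}

print(least_angle_route(graph, coords, "O", "G", usable={"O", "A", "B", "C", "D"}))
# ['O', 'B', 'C', 'G']
print(least_angle_route(graph, coords, "O", "G", usable={"O", "A", "C", "D"}))
# ['O', 'A', 'C', 'G']
```

Removing node B from the usable set forces a different route. Restricting the usable nodes in this way is how limited knowledge shapes the path taken.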

The procedure for building spatial knowledge is as follows; it is carried out once for each ward in the London region. The parameterisation of the model is discussed in more detail in the results section.

  1. Using the Spatial Interaction models defined earlier, generate 20,000 trips to Town Centre destinations. This is the maximum number of trips taken from each ward during the generation process.

  2. Assign all Level 1 hierarchy nodes within the road network as known. The incorporation of these nodes is indicative of a broad structural knowledge of the individual’s environment, a knowledge that enables orientation in space.

  3. Assign all nodes within the ward region as known, whichever level of the hierarchy they belong to. This is indicative of high levels of knowledge at the very local level.

  4. Repeat for all trips generated in Step 1:

     a) Select a single trip at random from the trip set.

     b) Construct a route to the trip destination following the principle of minimal angular deviation towards the destination, basing the route on the current Experience Level attributed to the destination region. For example, if a region is currently assigned an Experience Level of 2, only nodes ranked at Level 1 or 2 may be incorporated into the route plan.

     c) If a node in the route plan was previously unknown, add it to the known nodes.

     d) Add all nodes within the Town Centre region to the known nodes, if not already included.

     e) Increment the trip count to the destination region by 1.

     f) Reassess the Experience Level of the trip destination based on the number of trips made to it. Should the trip count reach the threshold of the next Experience Level, increment the level (allowing a spatially more specific route to the destination to be constructed in Step 4b). The experience thresholds are:

        i. If Trip Count < 5 then Experience Level = 1

        ii. If 5 ≤ Trip Count < 15 then Experience Level = 2

        iii. If 15 ≤ Trip Count < 30 then Experience Level = 3

        iv. If Trip Count ≥ 30 then Experience Level = 4

  5. Complete spatial knowledge construction once all trips have been processed.
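The procedure above can be sketched in Python as follows. This is a minimal, hypothetical implementation. Route construction is abstracted as a caller-supplied `route_fn`, and Step 4a is realised by shuffling the trip list once so that each trip is drawn exactly once. The usable nodes for a route are assumed to be the already-known nodes plus all nodes at or below the destination’s current Experience Level. The toy network in the example is invented.

```python
import random
from collections import defaultdict


def experience_level(trip_count):
    """Experience Level for a destination, using the thresholds of Step 4f."""
    if trip_count < 5:
        return 1
    if trip_count < 15:
        return 2
    if trip_count < 30:
        return 3
    return 4


def build_knowledge(trips, node_level, ward_nodes, centre_nodes, route_fn,
                    snapshot_every=250, seed=0):
    """Grow the set of known nodes for one ward.

    trips: destination ids generated in Step 1.
    node_level: node -> hierarchy level (1 = most prominent).
    ward_nodes: nodes inside the origin ward.
    centre_nodes: destination -> nodes inside that Town Centre.
    route_fn(destination, usable_nodes) -> list of nodes on the chosen route
        (hypothetical hook, e.g. a least-angle router).
    Returns the final known set and a snapshot every `snapshot_every` trips.
    """
    known = {n for n, lvl in node_level.items() if lvl == 1}  # Step 2
    known |= set(ward_nodes)                                  # Step 3
    trip_count = defaultdict(int)
    snapshots = []
    order = list(trips)
    random.Random(seed).shuffle(order)  # Step 4a: random order, each trip used once
    for i, dest in enumerate(order, start=1):
        level = experience_level(trip_count[dest])            # Step 4f thresholds
        usable = known | {n for n, lvl in node_level.items() if lvl <= level}
        known.update(route_fn(dest, usable))                  # Steps 4b-4c
        known.update(centre_nodes[dest])                      # Step 4d
        trip_count[dest] += 1                                 # Step 4e
        if i % snapshot_every == 0:
            snapshots.append(frozenset(known))
    return known, snapshots


# Toy example: one destination reached along a fixed path of nodes.
node_level = {"h1": 1, "p2": 2, "p3": 3, "p4": 4, "w": 3}
path = {"X": ["h1", "p2", "p3", "p4", "x"]}


def toy_route(dest, usable):
    return [n for n in path[dest] if n in usable]


known, snaps = build_knowledge(["X"] * 20, node_level, ["w"], {"X": ["x"]},
                               toy_route, snapshot_every=5)
print(sorted(known))  # ['h1', 'p2', 'p3', 'w', 'x']
```

In the toy run, the Level 2 node is learnt on the sixth trip and the Level 3 node on the sixteenth. The Level 4 node is never reached within 20 trips. The periodic snapshots correspond to the representations extracted at fixed trip intervals.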

As the number of trips increases during this process, and trip destinations are discovered and visited, the individual’s knowledge of space extends. This process therefore naturally enables the extraction of knowledge at different levels of experience: at predefined points during the construction of spatial knowledge, known-node representations are extracted, representing spatial knowledge at progressive levels of experience.

For the London region, a new spatial representation is extracted after every 250 trips. This value is derived from Transport for London activity data stating that, on average, individuals take 0.7 shopping journeys and 0.66 leisure trips per day, equating to 1.36 trips per day, or 248.2 trips per half year; the figure is rounded to 250 for simplicity. Very broadly, then, 1,000 simulated trips would be indicative of an individual having lived in a location for 2 years, and 7,500 trips indicative of knowledge generated over 15 years. Over the full 20,000 journeys generated through this process, one therefore extracts representations of spatial knowledge indicative of 40 years of experience. Clearly, any such prediction over a 40-year period is highly speculative and incorporates a vast swathe of assumptions; their nature and implications are detailed more thoroughly in the discussion later.
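The conversion between simulated trips and years of residence is simple arithmetic. The sketch below reproduces it from the TfL daily rates quoted above. The helper name `years_of_experience` and the use of 365/2 days per half year are assumptions.

```python
SHOPPING_TRIPS_PER_DAY = 0.7   # TfL figure quoted in the text
LEISURE_TRIPS_PER_DAY = 0.66   # TfL figure quoted in the text
DAYS_PER_HALF_YEAR = 365 / 2

trips_per_half_year = (SHOPPING_TRIPS_PER_DAY + LEISURE_TRIPS_PER_DAY) * DAYS_PER_HALF_YEAR
SNAPSHOT_INTERVAL = 250  # 248.2 rounded for simplicity


def years_of_experience(n_trips, trips_per_snapshot=SNAPSHOT_INTERVAL):
    """Convert a simulated trip count into approximate years of residence."""
    return n_trips / trips_per_snapshot * 0.5


print(round(trips_per_half_year, 1))  # 248.2
for n in (1_000, 7_500, 20_000):
    print(n, years_of_experience(n))
```

This gives 2, 15 and 40 years for 1,000, 7,500 and 20,000 trips respectively.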

The experience thresholds that guide the growth of spatial experience are specified to align with these temporal definitions. Little quantitative real-world evidence was identified linking exposure frequency with knowledge growth, so the levels were set to ensure that knowledge grows gradually. The final classifications ensure that local, highly attractive destinations become well known within a year (after 250 trips), while others, visited irregularly, will not achieve the same level of familiarity for a number of years. While these thresholds are established for London, this stage of parameterisation will require particular consideration were the approach to be applied elsewhere.
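The thresholds can be related to time by asking how long a destination that receives a fixed share of a ward’s trips takes to reach Level 4 (30 visits). The sketch below is an expected-value approximation that ignores the randomness of trip order. The destination shares are hypothetical, and the rate of 250 trips per half year is taken from the text.

```python
def years_to_reach(threshold, destination_share, trips_per_half_year=250):
    """Expected years before a destination receiving `destination_share` of a
    ward's trips has been visited `threshold` times (ignores randomness)."""
    total_trips_needed = threshold / destination_share
    return total_trips_needed / trips_per_half_year * 0.5


LEVEL_4_THRESHOLD = 30  # Step 4f: Trip Count >= 30

# Hypothetical destination shares.
for share in (0.12, 0.03, 0.01):
    print(f"share {share:.2f}: ~{years_to_reach(LEVEL_4_THRESHOLD, share):.1f} years")
```

A destination receiving 12 % of trips reaches Level 4 after about 250 trips, or half a year. One receiving 1 % needs about 3,000 trips, or roughly six years.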

Model Application and Results

Representations of spatial knowledge were generated for 822 wards in London. For each ward, 20,000 trips were generated and modelled, giving a total of 16.44 million modelled journeys, with node knowledge data extracted every 250 trips. This process yields 65,760 effective profiles by which spatial knowledge may be modelled.
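The totals quoted above follow directly from the run configuration, as this short check shows. The variable names are illustrative.

```python
N_WARDS = 822
TRIPS_PER_WARD = 20_000
SNAPSHOT_INTERVAL = 250

total_journeys = N_WARDS * TRIPS_PER_WARD                  # 16,440,000
snapshots_per_ward = TRIPS_PER_WARD // SNAPSHOT_INTERVAL   # 80
total_profiles = N_WARDS * snapshots_per_ward              # 65,760
print(f"{total_journeys:,} journeys, {total_profiles:,} profiles")
# 16,440,000 journeys, 65,760 profiles
```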

Five case study wards are examined in this section, chosen to provide a wide examination of trends in spatial knowledge generation across the city. The regions under examination are South Bermondsey (south London), Clissold (north-east London), Headstone North (outer north-west London), Wimbledon Park (south-west London) and Fulham Reach (west-central London).

Temporal Growth

The first aspect to explore is the rate at which spatial knowledge is extended as experience increases, and any variation across the case studies. The growth in known nodes is shown in Fig. 5 for each of the five case study regions (Level 1 nodes that have not yet been visited during the simulation are excluded).

Fig. 5

Growth in numbers of known nodes over time for five case study regions

As Fig. 5 shows, the number of known nodes increases rapidly during the early stages of the modelling process, and the learning rate then decreases substantially beyond 10,000 trips. Given that memory loss is not considered within this model, this is broadly in line with expectations: an individual encountering a region for the first time is considerably more likely to discover new, previously unexplored areas of space than an individual who has travelled the area for a number of years. Fig. 5 also shows how increasing experience leads to a considerably more detailed knowledge of space. For Clissold, for example, an individual who has explored the surrounding region for 6 months may be expected to know 88 nearby nodes; after ten years this may have reached 216 nodes. Thus, while growth in spatial knowledge is initially very rapid, prolonged exposure continues to extend it significantly.

Spatial Growth

While temporal growth provides some insight into the spatial learning mechanism, it is vital to examine in detail how spatial knowledge manifests across space. This can be explored both at the local level and in terms of variation across the entire city.

Granular Patterns

Examining local patterns shows how spatial knowledge spreads over space, and how representations of knowledge for individuals from different areas of the city may ultimately differ considerably. Two case studies illustrate these elements: Clissold, in north-east London, and Wimbledon Park, in south-west London.

Clissold

The spatial knowledge representations generated for Clissold after 250, 1,000, 2,500 and 10,000 trips are shown in Fig. 6. These intervals are selected to demonstrate how knowledge of space grows rapidly during the initial stages before reaching a stable state beyond 10,000 trips. As before, these representations show only those nodes encountered during the modelling process.

Fig. 6

Growth in spatial knowledge over time in Clissold

As the results for Clissold show, the generation of spatial knowledge over time follows the attraction towards nearby destinations. Knowledge extends as destinations are visited and the topology is constructed, rather than arbitrarily taking on an even or smoothed structure. Already after 250 trips, knowledge construction is skewed towards central London (marked A in Fig. 6a), with increased clustering of nodes within these zones and on the routes towards them. By 2,500 trips, many of the destinations that the simulated individual will visit have already been visited, forming the broad boundaries of knowledge for an individual inhabiting that region. At 10,000 trips, although the spatial extent of knowledge has widened slightly, the greater growth appears to lie in the increased specification of knowledge around and towards already visited areas, as more refined routes towards regularly visited destinations are generated. It is this increasing specification of knowledge that replicates how greater experience enables a more considered treatment of space and the construction of more efficient routes.

Wimbledon Park

As with Clissold, one can examine how spatial knowledge grows around Wimbledon Park, in south-west London. The known-node patterns generated for this region are shown in Fig. 7, again after 250, 1,000, 2,500 and 10,000 trips.

Fig. 7

Growth in spatial knowledge over time in Wimbledon Park

The Wimbledon Park representation demonstrates many of the same patterns of development as Clissold, albeit on a slightly wider spatial scale. Once again, a number of main attractive destinations form a framework early on, which is then ‘filled in’ as increasing numbers of trips are carried out. The lower density of the urban area around Wimbledon (influenced by the location of Richmond Park, marked A in Fig. 7a) also helps demonstrate the increasing specification of knowledge over time. After 250 trips, the individual has already visited Kingston Town Centre (marked B in Fig. 7a), directly west of Wimbledon Park; however, a detailed knowledge of the route between the two locations is not established until after 2,500 trips.

The local trends in both case studies show how the formation of spatial knowledge is shaped by destinations, the routes towards them, and the decisions that must be made en route. During the early stages of the simulation, route choices are made naively, based only on a broad understanding of the road network, and trips are made to only a few highly prominent destinations. As increasing numbers of trips occur, this knowledge is widened and deepened, allowing the individual to select more refined routes towards their destinations. It was shown that after 10,000 trips, or around 20 years of trips originating from one location, a detailed regional knowledge is established. The distinctions in spatial knowledge driven by experience, as demonstrated in these two examples, may be used to define more sophisticated representations of driver behaviour across different areas of the road network.

Regional Trends

Although local structures of spatial knowledge are central, it is also informative to observe broader trends in the relative differences between representations of spatial knowledge. As shown in Fig. 5, the extent and rate of growth of knowledge for an individual in South Bermondsey is expected to vastly exceed that of an equivalent individual in Headstone North. These deviations might be attributed to two factors: the spatial structure of the surrounding areas, and the proximity and attraction of nearby Town Centres. On the first point, central London has a considerably denser urban structure than outer London, and it may reasonably be expected that such an area requires a more spatially granular interpretation to enable efficient navigation. On the whole, more central regions, such as Clissold and South Bermondsey, demonstrate a higher specificity of spatial knowledge. However, as the example of Fulham Reach makes clear, this effect is not continuous.

It would appear that, closer to central London, the increasing attraction of the large leisure and retail centres in and around this region reduces the necessity or desire to explore other regions. As such, the spatial knowledge of an individual in central London may be considerably lower than that of an equivalent individual living slightly outside this area. This effect is demonstrated in Fig. 8, which shows the number of known nodes by ward after 10,000 trips from each region. A reduction in the degree of spatial knowledge is visible in both central London and outer London. Similar effects are also noticeable near prominent Town Centres in outer London, such as Kingston, Hammersmith and Wood Green, marked A, B and C respectively in Fig. 8.

Fig. 8

Number of known nodes by ward after 10,000 simulated trips

Discussion and Conclusions

The approaches described in this paper detail methods developed to enable the broad estimation of variation in spatial knowledge across a population of vehicle drivers. The modelling methodology, incorporating approaches to the modelling of cognitive space, activity and spatial learning, was intended to replicate the process by which spatial knowledge and experience are extended during road travel. The model of space developed in this paper aims to capture the prominent locations in urban space that have been widely identified as central to spatial memory and decision-making. The model of activity, defined and parameterised using survey datasets, aims to reflect associations between residential and commercial locations for a range of retail and leisure activities. Combining these models of space and activity with a framework for spatial learning yields a model of how spatial knowledge develops, driven by activity. This combination of approaches, better aligned with conventional research into spatial cognition and neuroscience, aims to provide a more accurate reflection of the nature of spatial knowledge than conventional models of activity space.

In considering the practical extent and utility of this modelling framework, it must first be recognised that the modelling of spatial knowledge will always be limited to a certain degree. When considering any particular individual, many factors are too difficult to capture for incorporation within this model. These might include, among others, an individual’s ability to remember locations, any personal history with particular locations (such as where they or their partner used to live), or other activities the individual previously or currently undertakes. What the framework does provide, however, is a disaggregated indication of how spatial knowledge may broadly vary across the city, reducing in accuracy with distance from a home location. Importantly, this dissipation in knowledge is not linear; it is instead influenced by experience of particular features and attractive locations across urban space. By linking directly with standard GIS datasets, the framework may be relatively easily integrated within conventional models of urban phenomena and extended to other cities. This potentially enables the incorporation of the role of bounded spatial knowledge in influencing individual behaviour (potentially within a wider framework of bounded rationality), and thus of its subsequent influence on wider patterns of behaviour.

In addition to the natural limitations associated with predicting spatial knowledge, there are a number of areas where this methodology may be extended. Firstly, despite the spatial disaggregation of the knowledge representation, a significant degree of unrealistic homogeneity exists within the model. Aside from variations in experience, differentiation between individuals is drawn at the ward level, with all trips and trip costs generated from the centroid of each ward polygon, an approximation that may reasonably be deemed unrealistic in some of the larger wards. Improvements in this respect may be achieved simply by increasing the granularity of the spatial representation or, better, by focusing on inter-regional variation, particularly in terms of geodemographics. A significant degree of variation in an individual’s movement can clearly be described more effectively by considering demographic as well as spatial variation: what one demographic group within a particular area deems attractive may not seem compelling to another group within the same area.

When considering variation among local populations, one must furthermore question how effectively the trip generation model describes attraction to particular regions. One improvement would be the incorporation of trip chaining, whereby trips are planned and conducted together. This would more realistically capture the volume of interactions between an individual and a location, rather than treating each activity as a separate journey.

At present the model incorporates a limited view of mobility in cities. Only motorists are considered, and spatial knowledge is constructed around road network junctions based on retail and leisure activities alone. The next major extension of this approach should include pedestrians and other non-drivers, as well as different types of activity. There are two clear advances that can be made on this front. First, in the current model, topological spatial knowledge is assigned to junction locations. Not only does this design assume that navigation is conducted at junction level only, but it may also be problematic to extend this approach to modelling the spatial knowledge of non-drivers. The significant locations for other users may include salient landmarks (such as public transport stations or public squares) that lie some way from the road network. Research should be carried out to better clarify the nature of salient features for navigation in all contexts, especially the relationship between vehicle drivers and junctions (a relationship that is at present only indicated). With respect to the modelling of these locations, a potential extension may be the incorporation of different network centrality measures, a possible indicator of frequency of exposure, to capture these locations. The use of centrality measures in place of the existing network hierarchy would also enable the approach to be simply reapplied to other city contexts.

A second area of extension should address activity simulation. Activities also extend well beyond retail and leisure, with work trips constituting 0.40 trips per day and education-related trips another 0.20 trips per day on average, according to the Transport for London Travel Demand Survey. While these are of lesser importance than shopping and leisure trips (0.77 and 0.6 trips per day respectively), their implementation would clearly improve the breadth of the model. Similarly, the influence of public transport routes, in particular their role in shaping the attractiveness of different destinations, may be investigated more thoroughly in subsequent iterations of this model. The approach detailed here assumes road-based travel, incorporating a road network-based distance weighting. In London, Underground, rail and bus routes play an important role in shaping which destinations individuals deem attractive.

Finally, at this point the models remain unvalidated. Validation remains a future objective, but there are a number of reasons why it has not yet been explored: the inherent and complex issues associated with externalising and recording individuals’ spatial knowledge [8], the large number of participants that would be required for a wide-scale assessment, and the simple fact that too many of the facets that might accurately describe an individual’s knowledge of space remain missing. On this final point, while extending the model in the directions discussed above may move it closer to a realistic representation, any model will fundamentally remain only a broad estimate of an individual’s knowledge of space. While capturing the nature of knowledge is near impossible for the vast majority of the population, the model outlined here introduces an approach for its broad estimation.