The aim of this work, as outlined in the introduction, has been to explore the efficacy of agent-based modelling as a means of simulating the daily spatio-temporal behaviour of different population groups to better understand how town- and city-centres are used during the daytime. This section will review the preliminary findings and discuss the extent to which this aim has been met. Later sections then outline the main caveats and present ideas for immediate future work.
Preliminary findings
The activities of some groups of people are easier to estimate than those of others. For example, the 2011 UK census provides an abundance of data about where people live and, for employed people, where they go to work. By coupling these data with estimates of the timings of their activities (in this case from a large time use survey), it is possible to estimate the spatio-temporal activities of a reasonably large group of people, whom we call commuters. The surf model, in its initial iteration, is a model of the typical routines and displacements of those 9–5 commuters on typical workdays. Having used all available evidence to estimate the activities of commuters, it becomes possible to create simulated estimates of the overall footfall that this group contributes. There were, of course, clear discrepancies between the simulation of a single demographic group and the real data, which include a much broader range of individuals. Therefore, following discussions with local stakeholders, another substantial group present in the case study area was added to the model: retired people. The inclusion of this new agent type substantially reduced the simulation error and paved the way to continuing this iterative process, gradually increasing the number and diversity of groups that are modelled.
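The iterative process described above, in which agent groups are added one at a time and the simulated footfall is compared with the sensor counts, can be sketched as follows. This is a minimal illustration rather than the model's actual code: the function names, the use of root-mean-square error as the error measure, and all of the numbers are assumptions made for the example and are not results from this work.

```python
from math import sqrt


def rmse(simulated, observed):
    """Root-mean-square error between two equal-length hourly series."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    squared = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return sqrt(squared / len(observed))


def total_footfall(groups):
    """Sum hourly footfall across agent groups (dict of name -> hourly list)."""
    return [sum(hour) for hour in zip(*groups.values())]


# Illustrative hourly counts only (NOT data or results from the paper).
observed = [120, 90, 110, 130, 125]
commuters = [100, 40, 20, 30, 35]
retired = [10, 40, 80, 90, 80]

# Error with a single agent group, then with a second group added.
error_commuters_only = rmse(total_footfall({"commuters": commuters}), observed)
error_with_retired = rmse(
    total_footfall({"commuters": commuters, "retired": retired}), observed
)
```

Keeping each group as a separate series, rather than storing only the total, is what would allow the aggregate footfall to be broken down by group afterwards.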
These results provide a potential avenue to better understand how urban spaces are being used in the absence of information about the behaviour of individuals in the study area. Although footfall data are available, these do not reveal information about the individual people who contribute to the aggregate footfall. Indeed, we would argue that it would be unethical to attempt to identify, and potentially begin to track, those individuals in the first place. In effect, therefore, the model provides a means to disaggregate the available footfall data by the demographics of the individuals who contribute to the aggregate counts. Agent-based modelling is a methodology that is ideally suited to this task as it offers a means of combining high resolution spatial data (in this case the census) with high resolution behavioural data (the time use survey). The model is used to marry these otherwise disparate datasets to create a more robust picture of daily urban dynamics.
There are, of course, some differences between the simulated and real data that remain. Fig. 7 illustrated these. The most notable difference is the shortfall in simulated footfall, relative to the real data, at approximately 15:00. This raises the question: who are the people who make up this extra footfall? This time corresponds closely with the time that children, both young (who will be collected by carers) and older (who are mostly unsupervised), leave their schools. It is therefore extremely likely that this group (children and their carers) is the cause of this particular discrepancy. Interestingly, the inclusion of these groups in future iterations will require not only the addition of a new group (schoolchildren), but also the diversification of the existing groups, as many commuters and retired people will be responsible for taking children to and from school. There is evidence for this both in the time use survey and, anecdotally, from the discussions with stakeholders.
Regarding the location of the activities, it is also worth noting that some sensors, notably 1 and 14, suggest morning and afternoon peaks that are more indicative of commuting behaviour than others. This points to the possibility of identifying the most likely locations, as well as times, at which the different groups might be present. Fig. 8 focussed specifically on the locations of each of the sensors. This, again, has the potential to provide useful information about these non-commuting groups. If, for example, policy makers are concerned about the impacts of pollution on the elderly or young children, a method such as this could be used to provide evidence about the times and locations at which the group will be most active. This information is otherwise extremely hard to gather using traditional sources such as surveys and censuses, and even more so using big data sources that are often biased towards certain age groups, potentially excluding the very young or very old. We see this as the main contribution of the paper: a means of simulating the daily spatio-temporal behaviour of different population groups, particularly those whose activities are otherwise very difficult to interrogate.
Caveats
There are a number of caveats that are important to note. Firstly, there are questions regarding the real-world footfall data that have yet to be resolved. For example, it is reasonable to assume that most smartphones will be counted by the sensors, so to estimate age or gender bias we can look for data on smartphone saturation within the population at large, at least if there is no specific bias in Wi-Fi usage between different groups. It is harder to estimate factors such as whether small numbers of people will, as an artefact of their activities, trigger a sensor multiple times in an hour. There are mechanisms that could guard against this, e.g. temporarily recording the unique identifier (MAC address) of the phone and ignoring it if it is counted more than once per hour, but on modern smartphones the identifier changes regularly, so this is not possible. Nevertheless, the footfall data are likely to represent a sufficiently accurate proxy for daytime activity levels in the town. A few recent studies have shown that a bias in phone usage between demographic groups does not have a major effect on general travel patterns [38, 39].
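The duplicate-counting safeguard mentioned above (ignoring an identifier seen more than once per hour) can be illustrated with a short sketch. It assumes a hypothetical stream of `(sensor_id, identifier, timestamp)` detections and clock-hour windows; it does not describe how the sensors used in this study actually work, and it only helps when a device's identifier stays stable within the hour, which is not the case on modern smartphones.

```python
from collections import defaultdict
from datetime import datetime


def hourly_unique_counts(detections):
    """Count each identifier at most once per sensor per clock hour.

    detections: iterable of (sensor_id, identifier, timestamp) tuples.
    Returns a dict mapping (sensor_id, hour_start) -> unique identifier count.
    """
    seen = defaultdict(set)
    for sensor_id, identifier, timestamp in detections:
        # Truncate the timestamp to the start of its clock hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        seen[(sensor_id, hour_start)].add(identifier)
    return {key: len(ids) for key, ids in seen.items()}


# Hypothetical detections; the identifiers are placeholders, not real MAC addresses.
detections = [
    (1, "aa:01", datetime(2017, 3, 6, 9, 5)),
    (1, "aa:01", datetime(2017, 3, 6, 9, 40)),  # same device, same hour: not recounted
    (1, "bb:02", datetime(2017, 3, 6, 9, 50)),
    (1, "aa:01", datetime(2017, 3, 6, 10, 2)),  # new hour: counted again
]
counts = hourly_unique_counts(detections)
```

If the identifier is randomised between detections, each detection looks like a new device and the count is inflated, which is the limitation described above.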
Another potential problem is that although the footfall data are relatively recent, covering the period October 2015 to June 2017, the census data are from 2011. It is likely that some people will have moved or changed jobs in this time, but estimating the current commuting patterns in the area is beyond the scope of this work.
The intensity-based decision framework seems to work better when agents have at least one not-at-home regular activity with a fixed location that anchors their behaviour. If not, which is the case for retired people, too many agents do all their activities immediately in the morning. This does not reflect the patterns observed in the time use survey. The model seems to be more suited to dealing with commuters who have less flexibility and time (because they spend a large part of their day at work) to start shopping and doing leisure activities.
This model is somewhat unusual in comparison to other agent-based models in that it has not been empirically validated by comparing its outputs to data that it has not been calibrated on. Section 4 discussed this problem at length. Empirical (i.e. “fit to data” [34]) validation has not been undertaken for two reasons. Firstly, there are no data to validate the model against. The usual process of dividing up the real-world data into training and test sets would not work here because the average daily weekday patterns would be identical in both the test and training sets (as evidenced by the extremely narrow confidence intervals produced in Figs. 6 and 8). Larger cities might have other big data sources that can be used to estimate the daytime population, such as public transport smart cards, but this is not the case in Otley. Secondly, empirical validation is of limited value here anyway. The aim of this paper is to explore whether the model might be able to provide new information about groups for whom activity data do not currently exist, not (yet) to produce a robust empirical analysis. That said, attempts at face validation were made by presenting early results to the people who are the most familiar with the case study area. Although their observations are clearly anecdotal and the group itself was self-selected, their insights are still useful.
Future work
Immediate future developments will improve the behaviour and diversity of the agents, the activity framework, and the routing algorithm. To begin with, the agents should be members of households. This is particularly important for modelling schoolchildren and their carers (this group was noted to be the most likely contributor to the difference between the model and real data). Data for households could be estimated from the census through spatial microsimulation [40, 41]. Doing activities with other household members would have an impact on the activity intensities of the agents. An extended model would also include a greater variety of leisure activities. The agents could also remember preferred locations for their flexible activities, which would then lead to a higher probability of that location hosting the activity in the future (i.e. the agents could build up preferences for certain locations). They could also learn to improve the logical order of their activities, so that their travel distance is reduced. Furthermore, by accounting for different transportation modes, congestion could begin to emerge.
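One way that agents might build up preferences for certain locations is simple reinforcement: each visit increases a location's weight, and locations are then chosen with probability proportional to their weights. The sketch below is only an illustration of that idea under assumed settings. The class name, the location names, and the parameters `initial_weight` and `reinforcement` are hypothetical and are not part of the current model.

```python
import random


class LocationPreferences:
    """Per-agent weights over candidate locations for one flexible activity."""

    def __init__(self, locations, initial_weight=1.0, reinforcement=0.5):
        self.weights = {loc: initial_weight for loc in locations}
        self.reinforcement = reinforcement

    def probabilities(self):
        """Probability of choosing each location, proportional to its weight."""
        total = sum(self.weights.values())
        return {loc: w / total for loc, w in self.weights.items()}

    def reinforce(self, location):
        """Increase the weight of a location the agent has just visited."""
        self.weights[location] += self.reinforcement

    def choose(self, rng):
        """Pick a location at random by weight, then reinforce the choice."""
        locations = list(self.weights)
        weights = [self.weights[loc] for loc in locations]
        chosen = rng.choices(locations, weights=weights, k=1)[0]
        self.reinforce(chosen)
        return chosen


# Hypothetical usage: an agent makes 20 shopping trips.
prefs = LocationPreferences(["market", "supermarket", "high_street"])
rng = random.Random(42)  # seeded so that a run can be repeated
visits = [prefs.choose(rng) for _ in range(20)]
```

With this rule, early choices are close to uniform and later choices increasingly favour the locations already visited; a decay term could be added if preferences should fade over time.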
Although the model was calibrated, the process was performed manually by comparing the observed agent behaviour to the time use survey and footfall data and adjusting parameters accordingly (as discussed in Section 3.1). In the future, the calibration of the model parameters should be more automated. With the current mix of big data and traditional data, optimisation algorithms such as genetic algorithms, neural networks, or Bayesian approaches (e.g. [42]) could be considered. However, the most interesting approach to future calibration will include dynamic data assimilation techniques that use real-time data streams to forecast the ambient population [43] and update a running model accordingly.
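As a rough illustration of what automated calibration with a genetic algorithm could look like, the sketch below evolves two parameters of a toy footfall curve to minimise the squared error against a target series. Everything here is an assumption made for the example: `toy_model` is a hypothetical stand-in for a full simulation run, the target is synthetic rather than real sensor data, and the population size, mutation scales, and selection scheme are arbitrary choices.

```python
import random


def toy_model(params, hours):
    """Hypothetical stand-in for a simulation run: one triangular footfall peak."""
    peak_hour, peak_height = params
    return [peak_height * max(0.0, 1.0 - abs(h - peak_hour) / 4.0) for h in hours]


def error(params, hours, observed):
    """Sum of squared differences between toy-model output and the target."""
    simulated = toy_model(params, hours)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def evolve(hours, observed, rng, population_size=20, generations=40):
    """Minimal genetic algorithm: keep the fittest half, breed and mutate the rest."""
    population = [
        (rng.uniform(6, 20), rng.uniform(0, 200)) for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda p: error(p, hours, observed))
        parents = population[: population_size // 2]  # elitism: parents survive
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # Crossover picks each parameter from one parent, then mutates it.
            child = (
                rng.choice((a[0], b[0])) + rng.gauss(0, 0.5),
                rng.choice((a[1], b[1])) + rng.gauss(0, 5.0),
            )
            children.append(child)
        population = parents + children
    return min(population, key=lambda p: error(p, hours, observed))


hours = list(range(6, 21))
observed = toy_model((12.0, 100.0), hours)  # synthetic target, not real data
best = evolve(hours, observed, random.Random(0))
```

Because the fittest half always survives, the best error never gets worse from one generation to the next, although convergence to the true parameters is not guaranteed. With a real agent-based model, each call to `error` would require one or more full simulation runs, so the cost of evaluation would dominate the design.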