Probabilistic Modelling for Incorporating Uncertainty in Least Cost Path Results: a Postdictive Roman Road Case Study

The movement of past peoples in the landscape has been studied extensively through the use of least cost path (LCP) analysis. Although methodological issues of applying LCP analysis in archaeology have frequently been discussed, the effect of DEM error on LCP results has not been fully assessed. Due to this, the reliability of the LCP result is undermined, jeopardising how well the method can confidently be used to model past movement. To strengthen the reliability of LCP results, this research proposes the use of Monte Carlo simulation as a method for incorporating and propagating the effects of error on LCP results. Focusing on vertical error, random error fields are calculated and incorporated into the documented and reproducible LCP modelling process using the R package leastcostpath. By graphically communicating the impact of vertical error using probabilistic LCPs, uncertainty in the results can be taken into account when interpreting LCPs. The method is applied to a Roman road case study, finding that the incorporation of vertical error results in the identification of multiple ‘least cost’ routes within the landscape. Furthermore, the deviation between the roman road and the probabilistic LCP suggests that the location of the roman road was influenced by additional factors other than minimising energy expenditure. This research finds that the probabilistic LCP derived using Monte Carlo simulation is a viable method for the graphical communication of the uncertainty caused by error within the input data used within the LCP modelling process. Therefore, it is recommended that probabilistic LCPs become the default approach when modelling movement using input data that contains errors.


Introduction
The movement of past peoples in the landscape has been studied extensively through the use of least cost path (LCP) analysis (Verhagen et al., 2019;White & Surface-Evans, 2012). LCP analysis is based on the premise that humans will economise their behaviour (Zipf, 1949). Thus, when travelling in a landscape, humans will attempt to optimise the cost of travel by choosing the path of least cost (Surface-Evans & White, 2012). Therefore, the likelihood of interaction between humans and the landscape is assumed to be related to the ease of access, with easy-to-access areas expected to show traces of human activity (Surface-Evans & White, 2012). This ease of access is represented as a cost surface, which quantifies the ease or difficulty of crossing between individual cells in a raster grid (Collischonn & Pilar, 2000). The cost surface is then used to create an accumulated cost surface, which represents the accumulated cost of moving away from a specific starting cell (Yu et al., 2003). Finally, the least costly path from another cell to the starting cell is calculated (Herzog, 2014a;Howey, 2007).
The input data, and the resultant cost surface, is therefore foundational in the calculation of the LCP. In particular, the digital elevation model (DEM), which forms the basis of the majority of archaeological LCP studies (Herzog, 2010;Verhagen, 2018), is often used as an error-free model of the topography (Wechsler & Kroll, 2006). Whilst the impact of DEM resolution on the cost surface and the LCP has been explored (e.g. Branting, 2012;Harris, 2000;Herzog, 2014a), the effects of horizontal and vertical error are rarely discussed (for exceptions see Herzog & Posluschny, 2011;Herzog & Yépez, 2015a, 2015bVerhagen et al., 2019). Instead, cost surfaces and LCPs are often viewed as 'infallible' (Brouwer Burg, 2017), ignoring that they were derived from a DEM containing random errors: that is, errors accrued from mistakes during the creation of the DEM (Fisher & Tate, 2006). This random error in the DEM can have significant consequences on the computed LCP, given the extreme sensitivity of LCPs to small changes in variables (Kantner, 2012, p. 234). Due to this, the reliability of the LCP as a heuristic device for understanding past human movement is undermined, jeopardising how well the method can confidently be used to identify long-distance routes (e.g. Batten, 2007;Palmisano, 2017), to understand trade networks (e.g. Bell et al., 2002;Fábrega Álvarez & Parcero Oubiña, 2007;Murrieta-Flores, 2012) and to assess which factors influenced the location of routes (Bell & Lock, 2000;Fonte et al., 2017;Güimil-Fariña & Parcero-Oubiña, 2015;Herzog, 2017;Kantner & Hobgood, 2003;Verhagen et al., 2019;Verhagen & Jeneson, 2012).
Whilst errors in the DEM can be reduced through the use of high-resolution, highaccuracy DEMs derived from sources such as lidar (light detection and ranging), error cannot be fully eliminated (Aguilar et al., 2010;Hodgson & Bresnahan, 2004). Thus, the error in the DEM, although reduced, still propagates into uncertainty in the LCP result. Therefore, this research proposes that instead of ignoring the error in the input data, and the subsequent LCP, the error in the DEM should be incorporated through the use of uncertainty propagation. By doing this, the resultant LCP is viewed probabilistically. That is, each LCP based on input data incorporating different realisations of the error is aggregated to form a 'density of LCPs' that graphically communicates how the error impacts the LCP, as well as identifying the most probable location of the LCP after error has been accounted for. From this, the reliability of the LCP is strengthened by yielding outcomes that better approximate reality, or in other words, the LCP and their interpretation is less susceptible to change if the LCP modelling process is run with the same input data incorporating a different, but equally likely, realisation of error.
Although the use of uncertainty propagation can be applied to all input data containing errors within the LCP modelling process, this research focuses on the impact of vertical error on the calculated LCP. This is demonstrated through a case study that aims to postdictively model a Roman road in Cumbria, England.

Methodological Proposal
In order to propagate the effects of vertical error on LCP results, this research proposes the use of Monte Carlo simulation (Brouwer Burg et al., 2016;Hunter & Goodchild, 1997;Zhang & Goodchild, 2003). Monte Carlo simulation has been used extensively for the modelling of DEM uncertainty in viewshed analysis (e.g. Fisher, 1991Fisher, , 1992Nackaerts et al., 1999), flooding (e.g. Cooper et al., 2015;Gesch, 2018;Gesch et al., 2020;Hawker et al., 2018;Leon et al., 2014) and slope failure prediction (e.g. Davis & Keller, 1997;Holmes et al., 2000;Zhou et al., 2003). By incorporating different realisations of the vertical error, which is measured as the average positive or negative deviation between ground-based observations and DEM values at a set of control points and reported as the root mean square error (RMSE), the impact of vertical error on the model output can be evaluated (Heuvelink, 1998;Zhang & Goodchild, 2003, p. 94). That is, for n simulation runs, n random error field realisations drawn from a normal distribution of mean = 0 and standard deviation = DEM's vertical error are generated ( Fig. 1(A)) and individually added to the DEM (Hunter & Goodchild, 1997) ( Fig.  1(B)). The n DEMs are then used to compute n cost surfaces ( Fig. 1(C)), with each cost surface incorporating different realisations of the vertical error. Each cost surface is then used to calculate a LCP ( Fig. 1(D)), resulting in n LCPs. Therefore, each LCP can be viewed as a deterministic realisation from an equally likely realisation of the true elevation surface. From this, the n LCPs can be viewed probabilistically-that is, the likelihood of the LCP crossing a raster cell can be communicated by calculating the number of times an LCP crosses a cell divided by the total number of simulations ( Fig.  1(E)).

Package Overview
In order to make the incorporation of vertical error reproducible and accessible to other researchers (Marwick, 2017), the method has been implemented into the R package leastcostpath (Lewis, 2020). The package is available at https://github.com/ josephlewis/leastcostpath and https://cran.r-project.org/web/packages/leastcostpath and includes the function add_dem_error() for the calculation and addition of a random error field to the DEM (i.e. it automates steps A and B in Fig. 1).
The package leastcostpath is built on the classes and functions in the R package gdistance ( van Etten, 2017). According to van Etten (2017), the results from gdistance are comparable to other software such as ArcGIS and GRASS. The package gdistance converts the raster map to a sparse matrix representing conductance between adjacent cells (van Etten, 2017). The use of conductance, which is the ease in which movement can occur between adjacent cells, is in contrast to other GIS software, where calculations are done using cost, friction or resistance values ( van Etten, 2017). Sparse matrices are efficient data structures that are especially suitable for LCP analysis given that the connectivity of cells in a raster map is often limited to adjacent cells only, with the vast majority of cells not connected, and thus having a conductance value of zero (van Etten, 2017). The R package leastcostpath utilises these sparse, or conductivity, matrices to represent conductivity through a landscape. By calculating and storing conductivity from each raster cell to all other adjacent raster cells based on a usersupplied function, both isotropic (conductivity neither benefited nor hindered by direction) and anisotropic costs (conductivity is direction-dependent) can be incorporated into the LCP analysis (Conolly & Lake, 2006, pp. 217-221;Wheatley & Gillings, 2002, pp. 137-140). For example, slope is anisotropic, with movement up-slope normally involving higher costs than movement down-slope (Herzog, 2014a, b). Therefore, by calculating the slope in all directions and preserving the anisotropic property of slope, the LCP can differ depending on whether it is calculated from A to B or B to A (Herzog, 2014a).

High Street Roman Road
In order to assess the similarity in location between the computed LCPs and the Roman road, the known route of the High Street Roman road is required. Despite the location of the Roman road being available from other sources (e.g. Bishop, 2014;Talbert, 2000), the extant route of the Roman road was recorded by Whitehead and Elsworth (2008) during an archaeological evaluation of the area. This represents the most accurate polygonal representation of the Roman road available, whilst also only including that which can be identified in the field (see Collingwood, 1930, p. 118;Nicholson, 1861, p. 7 for speculation on northern and southern sections). The location of the Roman road was retrieved from the Historic England Scheduled Monuments via https://historicengland.org.uk/listing/the-list/data-downloads/ (List Entry -1003275).

Digital Elevation Model
The OS Terrain 50 DEM (50m resolution and 4m RMSE) (Ordnance Survey, 2020) was used throughout this case study. The OS Terrain 50 DEM is based on the modern topography, and does not, therefore, represent the topography during the Roman  Breeze and Dobson (1985, p. 6) period. Although this means that the DEM incorporates modern features such as the Haweswater and Kentmere Reservoir, this study is focused on the use of LCP analysis in the modelling of the Roman road along the 'High Street' ridgeway. Due to the high elevation of the ridgeway, the produced LCP is therefore assumed to be unaffected by these modern constructions. Furthermore, whilst the resolution of the DEM has been noted to have an impact on LCP results (Conolly & Lake, 2006, pp. 101-102;Herzog & Yépez, 2015b;Verhagen et al., 2019), with Kantner (2012) recommending the use of DEMs with a resolution of under 30m, computational limitations necessitated the use of the 50m DEM over more accurate data sources such as the OS Terrain 5 DEM (5m resolution and 2.5m RMSE) (Ordnance Survey, 2017). However, it should be noted that although a lower resolution DEM was used within this case study, the consequences of vertical error will be present in a DEM regardless of its resolution, and therefore, the results of this study are applicable to all DEM-based least cost analyses. For interested readers, the limitations were imposed due to the size of the raster when calculating the cost surface. That is, the OS Terrain 50 DEM raster has 51,744 cells. Contrast this with the OS Terrain 5 DEM, which has 5,177,480 cells, and required more than 24GB of RAM to calculate the cost surface. Nonetheless, as this case study is explicitly focused on the impact of vertical error on the LCP results, and not DEM resolution, the use of the lower resolution DEM is therefore deemed justifiable.

Computing the Cost Surface
In order to represent cost of movement between adjacent cells within the DEM, Llobera and Sluckin's (2007) symmetric quadratic cost function was applied to the calculated slope values. The use of Llobera and Sluckin's (2007) energy expenditure cost function over the more familiar time-based Tobler's Hiking Function (Tobler, 1993) reflects previous success in its ability to model Roman roads (e.g. Fonte et al., 2017;Güimil-Fariña & Parcero-Oubiña, 2015;Parcero-Oubiña et al., 2019). Furthermore, in line with the recommendation by Harris (2000), a 48-neighbourhood kernel was used-that is, each cell in the cost surface is deemed to be adjacent, and therefore connected, to 48 neighbouring cells. Due to this, the elongation error, which is the difference in length between the optimal straight line and the LCP, is reduced to below 1.4% (Huber & Church, 1985), resulting in more accurate cost accumulation (Harris, 2000;Herzog, 2014a;Wheatley & Gillings, 2002, pp. 142-144).

Incorporating Vertical Error and Computing the Probabilistic Least Cost Path
In order to propagate the effects of vertical error on LCP results, 1000 realisations of random error fields were calculated. Although Heuvelink (1998, p. 45) noted that 100 Monte Carlo realisations should be enough, 1000 realisations ensure that the results are stable and unlikely to change. It should be noted, however, that the number of realisations could be decreased by assessing when the results converge. That is, assessing when increasing the number of LCPs does not change the overall spatial pattern of the probabilistic LCP. The error fields representing vertical error within the DEM were generated from random values drawn from a normal distribution of mean = 0 and standard deviation = 4 (i.e. the RMSE of the DEM) (Fisher, 1998;Fisher & Tate, 2006;Wechsler & Kroll, 2006). To ensure that the random error fields were spatially autocorrelated with themselves, a mean 77×77 filter was applied (Wechsler & Kroll, 2006). That is, the values in each random error field were replaced with a mean of the values within a moving 77×77 window. This results in the values within each random error field being less different when they are nearer to each other and more different when they are further away. The window size of 77×77 was derived through the use of a semivariogram based on the DEM (Wechsler & Kroll, 2006). The range, which indicates the maximum distance at which spatial autocorrelation is present (Oliver & Webster, 2015, pp. 30-32), was identified as being 3820.6m, thus the 77×77 window (3820.6/50 m resolution of DEM). The spatially correlated random error fields were then added to the DEM, resulting in 1000 DEMs each incorporating different realisations of the random error field (steps A and B in Fig. 1). The DEMs were then corrected for unrealistic geomorphology that might have been introduced when adding the random error field (Temme et al., 2009, p. 131). This was achieved through the use of a sink fill algorithm, which removes surface depressions in DEMs (Choi, 2018;Wang & Liu, 2006). From this, 1000 cost surfaces were calculated (step D in Fig. 1). Lastly, 1000 least cost paths were computed, with the probabilistic LCP generated by calculating the number of times an LCP crosses a raster cell divided by the total number of simulations (step E in Fig. 1).

Least Cost Paths Without and With Vertical Error
When visually comparing the LCPs (Fig. 4), it is apparent that the single LCP produced without incorporating vertical error (Fig. 4(A)) does not fully capture nor communicate how the vertical error in the DEM, and subsequently the cost surface, propagates to the LCP result. In contrast, the probabilistic LCP (Fig. 4(B)), through incorporating vertical error, and its uncertainty, has identified multiple 'least cost' routes within the southern section of the Roman road. Similarly, the probabilistic LCP has identified two routes of either side of the Roman road in the centre of the study area, with one deviating southwards towards the lower elevation. Finally, the probabilistic LCP has identified that the large deviation from the Roman road in the northern section remains after vertical error is accounted for.

Discussion and Conclusions
The case study demonstrates how the effect of vertical error in the DEM, and the subsequent cost surface, propagates through the LCP modelling process. Whilst the LCPs calculated without and with vertical error shared similarities in their trajectories, the use of probabilistic LCPs has identified that the vertical error within the DEM most influences the location of the calculated LCP in the southern section of the study area. More specifically, the probabilistic LCP has identified three different routes that the LCP could have taken, dependent on a different, but equally likely, realisation of vertical error within the DEM. However, by quantifying the likelihood of the LCP crossing each raster cell, it is apparent that the route following the Roman road is more probable and is therefore a more credible location for the LCP after vertical error is accounted for. Similarly, the two identified routes in the centre of the study area show that the vertical error within the input data, and the subsequent cost surface, impacts which route is more probable, and therefore which better reflects reality. In contrast, the probabilistic LCP has identified one main route in the northern section of the study area. However, unlike the southern section, the Roman road in the northern section does not follow the LCP-that is, the least costly path used when trying to minimise energy expenditure when moving through the landscape. The lack of similarity, therefore, suggests that the location of the Roman road in the northern section may have been influenced by other factors, for example visibility (e.g. Verhagen & Jeneson, 2012).
Nonetheless, by incorporating and propagating the effect of vertical error in the LCP modelling process, the vertical error within the DEM is accounted for, therefore increasing the reliability that the LCP yields outcomes that better approximates reality. Furthermore, the uncertainty caused by vertical error on the LCP result in the southern and centre section of the study area suggests that higher accuracy DEMs should be used when conducting least cost path analyses. Through this, the uncertainty in the LCP result caused by the vertical error within the DEM will be reduced. It should be noted, however, that there are still errors in the DEM not accounted for. For example, the impact of horizontal error on the LCP results has not been assessed, whilst the 50-m resolution of the DEM averages the 'true' elevation surface, and therefore introduces error. Therefore, further research is needed on the assessment of how different errors impact DEMs, their subsequent cost surfaces and when these errors are deemed to have minimal effect on the LCP results.
The most significant contribution of this research lies in the explicit propagation of error on LCP results. By formally incorporating error into the LCP modelling process, uncertainty is quantified, resulting in LCP results of increased reliability. Furthermore, the use of probabilistic LCPs has shown to successfully communicate the uncertainty caused by error, as well as allowing for the identification of multiple potential routes that are the result of different, but equally likely, realisations of error. The quantification of uncertainty on LCP results is therefore of fundamental importance. It is through the explicit acknowledgement that the digital elevation model is not an error-free model of topography that we can aim to confidently use our LCP models as heuristic devices for understanding past human movement, irrespective of the geographical or cultural context. Until then, interpretations of human actions based on LCP models are biased. This research therefore suggests that the use of uncertainty propagation via Monte Carlo simulation is appropriate not just for vertical error but any error that needs to be quantified and visualised when conducting least cost path analysis, or working with cost surfaces more generally. three reviewers for their comments, though of course, all mistakes are my own. The analysis have been conducted in R (R Core Team, 2018) using the libraries DMMF (Choi, 2018), gdistance (van Etten, 2017), gstat (Pebesma, 2004), leastcostpath (Lewis, 2020), raster (Hijmans, 2020), rgdal (Bivand et al., 2019), rgeos (Bivand & Rundel, 2020) , sp (Pebesma & Bivand, 2005) and tmap (Tennekes, 2018).

Declarations
Competing Interests The author declares no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.