1 Introduction

Location modelling is crucial to service system planning for both public and private sectors. In the public sector, the location choices of service facilities involve not only fiscal responsibility but also social good and accessibility (White 1979). Local governments, as an example, provide a range of services, such as libraries and schools, where the intent is to maximise accessibility to people in the community. In the private sector, retail outlets are concerned with locating stores and distribution centres to best serve customers and optimise revenue.

Location modelling, a component of spatial analytics, has been supported in geographic information systems (GIS). Commercial GIS packages such as ArcGIS and TransCAD explicitly formulate and solve location models using heuristic methods (Murray et al. 2019). For example, location models within ArcGIS are solved using the Teitz and Bart heuristic (ESRI 2020) or a GRASP heuristic. A major advantage of using a heuristic for solving a location problem is that it requires less time and fewer computing resources to identify a solution (Church and Murray 2009; Tong and Murray 2012). Nevertheless, most heuristics provide no guarantee of deriving optimal or even good solutions. Moreover, good performance on one problem instance does not generalise to other instances of the same type.

The inclusion of location analytics within GIS software has promoted the widespread use of location models in various contexts, including solid waste facilities (Khan et al. 2018), fire stations (Adesina et al. 2017) and health care centres (Jankowski and Brown 2014). A summary of these applications can be found in Murray et al. (2019). Yet this software has technical limitations: reported location model results may be of uncertain solution quality, and the solution process can be computationally inefficient (Murray et al. 2019).

More concerning is the reproducibility of location models through commercial GIS. By definition, the reproducibility of a study “refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator” (Bollen et al. 2015). There are three major components to a reproducible study (Leek and Peng 2015): first, the data are available; second, the code and its documentation are available; third, a correct analysis has been performed. A detailed discussion of reproducibility in geographical analysis can be found in Kedron et al. (2019).

Location models in commercial GIS are, however, rarely reproducible. The code of these models is not open. Further, the heuristics used within ArcGIS and TransCAD involve many parameters that affect speed and solution quality, yet the technical details are hidden from users and there is no mechanism for users to inspect or adjust these parameters.

Because of the technical limitations of location models in commercial GIS, researchers have written custom software, often relying on an optimisation solver that can derive optimal solutions (Tong et al. 2009; Tong and Wei 2017). The benefits are that optimal solutions can be identified, if such solutions exist, and that models can be modified. Custom software, however, does not necessarily guarantee reproducibility, because the code is often not made available to other users, making it challenging to reproduce reported approaches and results.

Open-source software is a more promising option for using location models, with several clear advantages. First, open source guarantees the transparency and reproducibility of a method, eliminating the “black box” that hides implementation details (Rey 2009, 2018; Singleton et al. 2016). Second, open-source software assures wide access to and full control of the source code for audit and modification, and enables redistribution of code. Third, it allows advanced users to extend the software to fit their requirements as necessary. Moreover, although open source does not necessarily mean free of charge, most open-source software can be used at little or no cost.

Spatial analysts have been embracing open-source access to algorithms and software. Open-source examples that support spatial analysis include GRASS GIS (GRASS Development Team 2017) and QGIS (QGIS Development Team 2009), as well as PySAL (Rey and Anselin 2007) and GeoDa (Anselin et al. 2006). Buliung and Remmel (2008) and Rey (2009, 2018), among others, argue that open source has the potential to revolutionise and enhance geospatial research and education and will play a critical role in facilitating discussions on methodological workflows for spatial analytics. Steiniger and Hunter (2013) propose a set of evaluation criteria for selecting open-source GIS software, highlighting considerations such as functionality, usability and customisation options. This discussion reinforces that it is crucial to understand the function and performance of open-source tools before deploying them.

Open source alone is insufficient to guarantee reproducibility in location modelling. Available packages differ considerably with respect to model assumptions and capabilities, and proper model specification and implementation are critical, as is the solution approach. While usage and application of these methods across a range of disciplines have increased dramatically in recent years (Murray et al. 2019; Xu et al. 2020) and open-source tools for location modelling continue to mature, what is lacking at this stage is a critical review of the capabilities of these packages. This study investigates open-source tools for applying coverage models, with particular interest in assumptions, representation and computational effort. Assessment and comparison against the methods available in commercial GIS software are also of interest: since most reported applications of location models rely on commercial GIS (Murray et al. 2019; Xu et al. 2020) and open-source options are likely to become increasingly popular, both options should be compared in terms of function and performance.

This paper is built upon previous work (Chen and Murray 2021) and distinguishes itself by making several noteworthy contributions to open source location modelling. First, we systematically compare relevant open source options for location modelling using two case studies. Second, we substantially extend the function of two important open source location models. Third, we discuss the conditions that guarantee optimal solutions for a location cover model and point out future directions in this field.

This paper is organised as follows. The background of the research is described in the next section. This is followed by an overview of coverage models to be investigated. A detailed comparison of location cover tools is then provided. After this, planning applications are presented to highlight the differences in modelling capacities. Finally, this paper ends with recommendations for future research in open-source developments to support location cover modelling.

2 Background

A range of location models have been proposed to support different management, planning and decision-making contexts. The focus of this paper is location cover models, which are aimed at siting facilities in order to best service demand. Location cover models have seen broad use in a range of applications, including bike-share stations (García-Palomares et al. 2012), disease treatment centres (Oviasu 2014), veterinarian clinics (Polo et al. 2015) and solid waste facilities (Khan et al. 2018). In particular, there has been increasing interest in two location cover models: the location set covering problem (LSCP) and the maximum covering location problem (MCLP). The LSCP (Toregas et al. 1971) addresses planning applications in which the fewest facilities are to be sited so as to serve all demand within a designated service response standard of time or distance. The MCLP (Church and ReVelle 1974) was introduced to determine the locations of a fixed number of facilities that serve the most potential demand within the service standard. Since their appearance, these models have been applied and extended in various fields (see Church and Murray 2018).

Because of their prominence, both the LSCP and MCLP have been implemented in a range of software tools. While software can be grouped into more than ten categories (GNU Project 2019), tools that provide location cover models fall into three categories, as shown in Table 1.

Table 1 Three types of software tools for location cover models

Location models are available in proprietary GIS software, including ArcGIS and TransCAD. While the use of location models through proprietary GIS is common, limited attention has been given to solution quality and reproducibility issues. These commercial tools rely on heuristics for solving location models, generally without providing technical details or solution quality assurance. This point was noted by Woodhouse et al. (2000), who reported that ArcGIS did not derive optimal solutions for the LSCP instances they examined. Murray et al. (2019) also found that ArcGIS and TransCAD failed to optimally solve most of the 1059 MCLP and LSCP problems examined. Moreover, the heuristics used by ArcGIS and TransCAD involve parameters that influence speed and solution quality, yet users are prevented from knowing or interacting with these parameters (Church 2002; Murray 2010). Users are likewise unable to inspect or control the parameters of pre-processing steps in TransCAD, such as assigning points to a network (Gaboardi et al. 2020). In practice, this lack of transparency and solution quality assurance raises significant issues for reported findings.

Some researchers have written custom software that directly structures and solves location models, relying on general optimisation solvers (Tong et al. 2009; Tong and Wei 2017). While custom software allows developers to specify their models, it comes with obvious shortcomings. Developing new software is challenging, requiring specific technical skills in software design, data input handling, model implementation and integration with solvers. Moreover, in academia, researchers generally neither disclose the technical details nor release the code of the developed software. Consequently, it is hard to reproduce, validate and reuse location models found in custom software.

Open-source software is well suited for the use of location cover models. Generally, open-source software is based on code that anyone can inspect, apply, modify and enhance. Several open-source tools providing location cover models have been released. Among them, PySpatialOpt (Pulver 2019), a Python package, provides access to a number of cover models, including the MCLP and LSCP, and relies on an optimisation solver to solve the problem. Another is Maxcovr (Tierney 2019), an R package that formulates and solves the MCLP and LSCP; as with PySpatialOpt, it uses an optimisation solver to derive optimal solutions. A further example is FLP Spreadsheet Solver (Erdoğan 2019; Erdoğan et al. 2019), which is based on Microsoft Excel and Visual Basic and is therefore not free of charge, requiring a Microsoft Office licence. Unlike PySpatialOpt and Maxcovr, this tool relies on a tabu search heuristic to solve the MCLP and LSCP. Some tools related to location models are excluded from this study for various reasons. One is SITATION (Daskin 2002), which solves five classes of location problems, including the MCLP and LSCP; this software is free to use but not open source and is therefore beyond the scope of this study. There are also open-source packages that solve the p-median problem, including orloca (Munoz-Marquez 2019) and tbart (Brunsdon 2015). Although both the MCLP and LSCP can be transformed into an equivalent p-median problem (Church and ReVelle 1976) and then solved by these tools, they are not directly applicable to location cover problems.

3 Methods

As mentioned above, of interest in this paper are the LSCP and MCLP as representative location models that have been broadly applied. This section gives their mathematical formulations along with specifics for application context and extension. The following notation is used:

  • \(i\): index of demand units (total n).

  • \(j\): index of potential facility sites (total m).

  • \(d_{ij}\): distance or travel time between demand \(i\) and potential facility site \(j\).

  • S: maximum service standard of distance or time.

  • \(N_{i}\): set of potential facility sites that can cover demand \(i\) within the service standard, i.e. \(N_{i} = \left\{ j \mid d_{ij} \le S \right\}\).

$$x_{j} = \left\{ {\begin{array}{ll} 1 & {\text{if a facility is sited at location }}j \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$

Using this notation, the LSCP is as follows based on Toregas et al. (1971):

$${\text{Minimise}}\quad \mathop \sum \limits_{j} x_{j}$$
(1)
$${\text{Subject to}}\quad \mathop \sum \limits_{{j \in N_{i} }} x_{j} \ge 1\quad \forall i$$
(2)
$$x_{j} \in \left\{ {0,1} \right\} \quad \forall j$$
(3)

The objective (1) seeks a minimal number of facilities. Constraints (2) require that each demand be covered by one or more sited facilities. Constraints (3) require that the decision variables be binary.
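As an illustration of how formulation (1)–(3) can be structured programmatically, the sketch below builds the LSCP in PuLP, the Python modelling environment later used by PySpatialOpt. The coverage sets, site names and solver call are invented for this example and do not represent any particular package's API.

```python
# A minimal LSCP sketch in PuLP with invented toy data (not PySpatialOpt's API).
import pulp

# Hypothetical coverage sets N_i: facility sites within the service standard of each demand i.
N = {"i1": ["j1", "j2"], "i2": ["j2", "j3"], "i3": ["j3"]}
sites = ["j1", "j2", "j3"]

prob = pulp.LpProblem("LSCP", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", sites, cat="Binary")        # x_j = 1 if a facility is sited at j

prob += pulp.lpSum(x[j] for j in sites)                    # objective (1): minimise the number of facilities
for i, covering_sites in N.items():                        # constraints (2): each demand covered at least once
    prob += pulp.lpSum(x[j] for j in covering_sites) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))                   # CBC is the solver bundled with PuLP
print("sited:", [j for j in sites if x[j].value() == 1])
```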

The LSCP assumes that resources are sufficient to cover all demand, with the number of facilities the only factor in the objective (Church and ReVelle 1974). However, when resources are lacking to serve all demand, decision makers may attempt to cover the greatest amount of demand possible within their budget. This is the consideration of MCLP, which relaxes the hard constraint in LSCP that all demand be served. Additional notation is as follows:

  • \(a_{i}\): estimated demand at location \(i\)

  • \(p\): the number of facilities to be sited.

  • $$y_{i} = \left\{ {\begin{array}{ll} 1 & {\text{if demand }}i{\text{ is covered}} \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$

The formulation of the MCLP based on Church and ReVelle (1974) is:

$${\text{Maximise}}\quad \mathop \sum \limits_{i} a_{i} y_{i}$$
(4)
$${\text{Subject to}}\quad \mathop \sum \limits_{{j \in N_{i} }} x_{j} \ge y_{i} \quad \forall i$$
(5)
$$\mathop \sum \limits_{j} x_{j} = p$$
(6)
$$y_{i} \in \left\{ {0,1} \right\}\quad \forall i$$
(7)
$$x_{j} \in \left\{ {0,1} \right\}\quad \forall j$$
(8)

The MCLP objective (4) seeks the maximal total demand that can be covered by the facilities located. Constraints (5) ensure that a demand can only be counted as covered if at least one facility able to serve it is sited. Constraint (6) requires that exactly p facilities be sited. Constraints (7) and (8) impose binary restrictions on the decision variables.
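A corresponding PuLP sketch for the MCLP (4)–(8) is given below, again with invented toy data; relative to the LSCP sketch, it adds the coverage variables \(y_i\), changes the objective and imposes the facility-count constraint.

```python
# A minimal MCLP sketch in PuLP with invented toy data (not PySpatialOpt's API).
import pulp

a = {"i1": 120, "i2": 80, "i3": 50}                          # demand weights a_i
N = {"i1": ["j1", "j2"], "i2": ["j2", "j3"], "i3": ["j3"]}   # coverage sets N_i
sites = ["j1", "j2", "j3"]
p = 1                                                         # number of facilities to site

prob = pulp.LpProblem("MCLP", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", sites, cat="Binary")           # facility siting decisions
y = pulp.LpVariable.dicts("y", list(a), cat="Binary")         # demand coverage indicators

prob += pulp.lpSum(a[i] * y[i] for i in a)                    # objective (4): maximise covered demand
for i in a:
    prob += pulp.lpSum(x[j] for j in N[i]) >= y[i]            # constraints (5)
prob += pulp.lpSum(x[j] for j in sites) == p                  # constraint (6)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("covered demand:", pulp.value(prob.objective))
```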

There are numerous variants and extensions of the LSCP and MCLP (Church and Murray 2018). The spatial nature of the LSCP and MCLP lies not only in the variables \(x_{j}\) that correspond to the choice of facility locations, but also in the set \(N_{i}\), which defines the subset of facilities that can serve demand \(i\) within the service standard. One common form of this standard is the maximal service distance, which depends on the travel space (Cartesian or network) as well as the movement potential (geodesic, Euclidean, rectilinear, network, etc.) of service providers. In addition, model variants have been introduced to address the representation of demand. Demand, as a spatial feature, can be in the form of points, lines or polygons (Church and ReVelle 1976; Murray and Wei 2013), and the coverage of demand can be either binary or partial. Under binary coverage, a demand unit is either covered or not covered by a facility. Under partial coverage, a demand unit can be partially covered by a facility, possibly as a function of the area served (Murray 2005; Tong and Murray 2009). Moreover, demands can have a uniform or variable weight. Another important extension has been the introduction of facility capacities. The capacity constraint, which requires that the demand served by a facility not exceed its capacity, is essential in applications where facilities have limited potential to provide service. The formulations of the capacitated LSCP and capacitated MCLP are summarised in Church and Murray (2018). It is worth noting that the inclusion of capacity constraints on facilities makes the problem much more complicated and challenging to solve (Xu et al. 2020); one common way such constraints are expressed is sketched below.
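For illustration only (the exact capacitated formulations appear in Church and Murray 2018), capacities are commonly imposed by introducing allocation variables \(z_{ij}\), the fraction of demand \(i\) assigned to facility \(j\), and capacities \(c_{j}\); both symbols are introduced here and are not part of formulations (1)–(8):

$$\mathop \sum \limits_{{i:\,j \in N_{i} }} a_{i} z_{ij} \le c_{j} x_{j} \quad \forall j, \qquad \mathop \sum \limits_{{j \in N_{i} }} z_{ij} = y_{i} \quad \forall i, \qquad 0 \le z_{ij} \le 1$$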

4 Access and solutions in location cover tools

There are multiple steps in an analysis that uses a location cover model, as presented in the workflow in Fig. 1. Using this basic workflow, a detailed comparison of different location cover modelling tools is possible. This comparison now follows and is summarised in Table 2.

Fig. 1
figure 1

The workflow of using a location cover model

Table 2 A comparison of different tools that support location cover modelling

4.1 Model type

All four tools discussed in Table 2 (PySpatialOpt, Maxcovr, FLP Solver, ArcGIS) provide access to the MCLP and LSCP, and some also provide access to other location models. For example, PySpatialOpt offers access not only to the LSCP and MCLP but also to the backup coverage location problem, the threshold model, the complementary coverage threshold model and the trauma resource allocation model (Pulver 2019). FLP Solver also addresses the p-centre and p-median problems (Erdoğan 2019). In addition, these tools differ in how they deal with location problems that have no feasible solution. For example, when tackling an LSCP in which not all demand can be covered by the given facility sites, ArcGIS returns a solution that ignores any demand not covered by any potential site. Such a problem is technically infeasible, as the LSCP requires that all demand be covered, but ArcGIS handles it in an ad hoc fashion that could mislead users. In contrast, the other tools inform the user that there is no feasible solution.

4.2 Capacity

In location modelling, facilities may or may not have capacities. A common assumption in PySpatialOpt and Maxcovr is that each facility has unlimited capacity, meaning there is no mechanism for addressing facility capacity. In contrast, ArcGIS and FLP Solver allow a capacity to be specified for each facility. As noted previously, the addition of capacity constraints generally makes a problem more difficult to solve.

4.3 Demand unit shape and weight

Demand units are spatial features in the form of points, lines or polygons, which relates to how space is abstracted. According to Tong and Murray (2009), the MCLP is sensitive to the abstraction of space (as points or polygons), and uncertainty is introduced in the approximation of geographic space. While point-based demand units are accepted by all the tools discussed, polygon-based demand is only allowed in PySpatialOpt. Two coverage types for polygon-based demand are available in PySpatialOpt: binary and partial. With binary coverage, the coverage is either zero or one, and a facility covers a demand unit only when the entire demand polygon is contained within the service area of the facility. With partial coverage, the coverage is proportional to the intersection between the service area of a facility and the demand polygon, as illustrated below. For the MCLP, demands can have different weights \(a_{i}\), which is supported by ArcGIS, FLP Solver and PySpatialOpt.
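The sketch below illustrates the distinction between binary and partial coverage for a polygon demand unit using shapely geometry operations; the geometries and the service radius are invented for illustration, and PySpatialOpt's own coverage routines may be implemented differently.

```python
# A sketch of binary vs. partial coverage for a polygon demand unit
# (geometries and service radius invented; PySpatialOpt's routines may differ).
from shapely.geometry import Point, Polygon

demand = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])   # a square demand polygon
service_area = Point(1, 1).buffer(1.2)                # facility service area of radius S = 1.2

binary_covered = service_area.contains(demand)                            # binary: all-or-nothing
partial_fraction = demand.intersection(service_area).area / demand.area   # partial: area proportion

print(binary_covered, round(partial_fraction, 2))     # here the polygon is only partially covered
```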

4.4 Distance metric

The distance or travel time \(d_{ij}\) from demand unit \(i\) to potential facility site \(j\) is of central importance for location cover models. ArcGIS requires a transport network in order to structure and solve location models; for problems that are not network-based, a (complete) network has to be created in advance to reflect the distance metric used. Although such a complete network can, in theory, reflect any distance metric, it increases the computer memory and computation time required. Maxcovr accepts a user-defined distance matrix as input; if no distance matrix is provided, it supports only the Haversine distance detailed in Sinnott (1984). The Haversine distance is the great-circle distance between two points, under the assumption that the earth is a sphere:

$$d = 2r\sin^{ - 1} \sqrt {\sin^{2} \left( {\frac{{\Phi_{2} - \Phi_{1} }}{2}} \right) + \cos \left( {\Phi_{1} } \right)\cos \left( {\Phi_{2} } \right)\sin^{2} \left( {\frac{{\lambda_{2} - \lambda_{1} }}{2}} \right)} $$
(9)

where r denotes the earth radius; (\(\Phi_{1}\), \(\lambda_{1}\)) and (\(\Phi_{2}\), \(\lambda_{2}\)) are the latitude–longitude coordinates of the two points. PySpatialOpt does not contain a distance computing module, so distances need to be computed externally; this gives users considerable flexibility in defining the distance metric between demand and facilities. Finally, FLP Solver provides the greatest flexibility in distance specification: it not only allows user-defined distances, but also offers a range of distance options, including rounded Euclidean, Euclidean, Manhattan, great-circle and driving distance via Bing Maps.
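The Haversine calculation in Eq. (9) is straightforward to implement; a minimal sketch follows (the function name and the choice of earth radius are illustrative).

```python
# A minimal implementation of the Haversine distance in Eq. (9); names are illustrative.
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in kilometres between two latitude-longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

print(round(haversine_km(37.7749, -122.4194, 37.8044, -122.2712), 1))  # distance between two nearby points, km
```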

4.5 Solution approach

In ArcGIS, location cover models are solved by a three-step heuristic method (ESRI 2017). A major issue with the heuristic in ArcGIS is that it is unknown whether it will yield optimal solutions for a given problem. A recent study indicated that the heuristic in ArcGIS failed to identify optimal solutions for 50% of the location planning problems examined (Murray et al. 2019), though the deviation from optimality was generally low. As with ArcGIS, the location models accessible in FLP Solver are solved using tabu search, a heuristic method. Tabu search was applied in an early study to determine locations for ambulance bases (Adenso-Díaz and Rodríguez 1997). There is likewise no guarantee that tabu search will derive optimal solutions for location cover models. Noteworthy is that the parameters of tabu search in FLP Solver are revealed and well explained, potentially aiding reproducibility. In contrast, both PySpatialOpt and Maxcovr rely on optimisation solvers that derive optimal solutions for location models, if such solutions exist. Specifically, Maxcovr supports three solvers, i.e. lp_solve, Gurobi and GLPK, by using each solver's API directly. PySpatialOpt, on the other hand, uses PuLP, a Python-based linear programming modelling environment that can connect to several solvers, including lp_solve, CPLEX, XPRESS, Gurobi and GLPK. Using a modelling environment is preferable to using a solver directly, as such environments allow problems to be written in a concise and easy-to-read manner (Gearhart et al. 2013). In addition, they can connect to multiple solvers, thereby providing flexibility in choosing between competing solvers.
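The flexibility of a modelling environment can be seen in how a PuLP problem is handed to different back-end solvers; the toy problem below is purely illustrative, and each solver other than the bundled CBC must be installed (and, where commercial, licensed) separately.

```python
# Swapping back-end solvers through PuLP (illustrative toy problem only).
import pulp

prob = pulp.LpProblem("toy", pulp.LpMinimize)
z = pulp.LpVariable("z", lowBound=0)
prob += z            # objective
prob += z >= 3       # a single constraint

prob.solve(pulp.PULP_CBC_CMD(msg=False))   # default open-source CBC solver shipped with PuLP
# prob.solve(pulp.GLPK_CMD(msg=False))     # GLPK, as used in the case studies below
# prob.solve(pulp.CPLEX_CMD(msg=False))    # a commercial solver, if available

print(pulp.LpStatus[prob.status], z.value())
```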

5 Case studies

Two case studies are presented to compare the function and performance of the four tools discussed. PySpatialOpt and Maxcovr are backed by GLPK (GNU Project 2017), an open-source optimisation solver. Although GLPK is not necessarily the most efficient solver (Meindl and Templ 2012; Gearhart et al. 2013), it is appropriate for comparing the performance of PySpatialOpt and Maxcovr against ArcGIS and FLP Solver.

All processing and computation are conducted on a remote desktop computer (Intel Xeon E5 CPU, 2.7 GHz with 256 GB of memory). The Python (version 2.7) and R (version 3.4.0) programming languages are used, as both PySpatialOpt and ArcPy (version 10.6) support only Python 2.7 rather than Python 3. The parameter settings for the tabu search heuristic used by FLP Solver are listed in Table 3; these are the default values in the documentation (Erdoğan et al. 2019) except for the time limit of 60 s, which was chosen because it led to optimal solutions for the MCLP and LSCP instances in the San Francisco case. For comparative purposes, results are also provided for ArcGIS methods called via ArcPy (version 10.6). The code for the case studies is fully open source and hosted on GitHub (Chen 2021).

Table 3 Parameter settings of FLP Solver in this study

5.1 Store location planning in San Francisco

A retail chain would like to choose store locations in San Francisco (see Fig. 2) that maximise business. The primary objective is to locate stores close to population centres, based on the assumption that people tend to shop more at nearby stores than at far-away ones. The centroids of the 205 census tracts in the city are used as demand points, weighted by their year-2000 population, which totals 955,113. A set of 16 potential store sites is considered, and the maximum service distance to access a store is assumed to be five kilometres on the road network. A facility–demand distance matrix was derived using the ArcGIS Network Analyst extension, in which both census tract centroids and potential store sites were snapped to the road network; this matrix serves as input to PySpatialOpt and Maxcovr (see the sketch below). ArcGIS was chosen to generate this matrix, rather than open-source software such as QGIS, because it guarantees that the same pre-processing step of snapping points to the network is used both by the location models in ArcGIS and in generating the distance matrix.
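For concreteness, the sketch below shows one way such a distance matrix can be turned into the coverage sets \(N_{i}\) required by the models; the file name and column names are hypothetical, and the actual schema exported from ArcGIS may differ.

```python
# A sketch of deriving coverage sets N_i from an origin-destination distance table
# (file and column names are hypothetical; the schema exported from ArcGIS may differ).
import pandas as pd

S_KM = 5.0
od = pd.read_csv("sf_network_distances.csv")   # assumed columns: demand_id, site_id, distance_km

within = od[od["distance_km"] <= S_KM]
N = within.groupby("demand_id")["site_id"].apply(list).to_dict()   # N_i for each census tract

print(len(N), "demand units have at least one site within", S_KM, "km")
```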

Fig. 2
figure 2

MCLP derived store configuration (p = 4) and coverage for serving the San Francisco area. The top right inset shows the location of the study area in California

Table 4 summarises the findings of applying the LSCP in both case studies to derive the minimal number of facilities needed to cover all demand, with the optimal facility numbers in bold. In the San Francisco case, eight stores are suggested by all four packages as the minimum needed to cover all census tracts. Computationally, PySpatialOpt and Maxcovr are considerably more efficient than the ArcGIS method and FLP Solver.

Table 4 LSCP results by case study

The MCLP was applied to assess the impacts on coverage of siting fewer facilities. The possibility of siting between one and twelve stores was considered, and the findings are summarised in Table 5, with optimal coverage values and the shortest computation times indicated in bold. For all 12 problem instances, the solutions from all four tools achieved the same, optimal coverage.

Table 5 MCLP results for San Francisco store case study

Figure 2 shows the optimal spatial configuration of stores for p = 4 from PySpatialOpt. Solution time for the MCLP varied, with ArcGIS requiring 12.66 s on average, and PySpatialOpt and Maxcovr taking only 0.24 s and 0.01 s, respectively. The solution time of FLP Solver is close to the time limit of 60 s.

5.2 Police surveillance towers in York

The second case study involves location planning to support police surveillance tower placement in order to guarantee rapid detection of and response to crimes in the city of York, UK. Examined here were 591 crime incidents in a subarea of York (see top right inset in Fig. 3) during September 2016 (Odell 2019). The analysis considers each building with GPS coordinates as a potential tower site, totalling 1821 sites. The response distance standard was assumed to be 200 m, and the goal is to site surveillance towers in the most efficient way. Given the nature of service response and the need for efficiency, the LSCP and MCLP represent meaningful yet different approaches to the tower siting problem. Because Haversine distance is used, a complete network reflecting Euclidean distance had to be constructed for use in ArcGIS. FLP Solver was unable to solve this case: with 2412 locations involved (demand points and potential facility sites), there are over five million rows of pairwise distances, whereas Excel allows at most 1,048,576 rows (Microsoft Corporation 2018). This demonstrates one limitation of FLP Solver.

Fig. 3
figure 3

MCLP derived surveillance tower configuration (p = 20) and coverage for monitoring crimes in a subarea in the City of York, England. The top right inset shows the location of the study area in York

Table 4 summarises the findings of using the LSCP to derive the minimal number of surveillance towers needed to cover all crime points in York. In this case, PySpatialOpt and Maxcovr suggested 22 towers as the minimum for coverage of all demand, whereas ArcGIS proposed 23 towers. Computationally, the solution times required by ArcGIS were significantly longer than those of PySpatialOpt and Maxcovr. The MCLP was then applied with the three remaining tools to assess the impacts of locating fewer towers; the results are summarised in Table 6, with optimal coverage values and the shortest computation times indicated in bold. Of all 22 problem instances, both PySpatialOpt and Maxcovr identified optimal configurations for every instance, whereas ArcGIS identified only eight optimal solutions. The spatial configuration for 20 facilities identified by PySpatialOpt is shown in Fig. 3 and covers 585 incidents, whereas the configuration identified by ArcGIS covers only 581 incidents. The solution time required by ArcGIS (on average 21,483 s) was far longer than that of PySpatialOpt (2.78 s) and Maxcovr (0.01 s), highlighting the computational efficiency of PySpatialOpt and Maxcovr in comparison with ArcGIS.

Table 6 MCLP results for the York Tower case

6 Discussion

The case studies highlight several issues regarding open-source location cover tools. In particular, given a location cover problem and a tool, two conditions guarantee that an optimal solution will be found: first, the tool accommodates the distance metric specified in the problem; second, the tool uses an exact (optimal) solution method. In the above cases, both conditions are satisfied by PySpatialOpt and Maxcovr, so optimal solutions are identified for all instances. Location modelling tools should therefore allow users to specify a distance metric that is well suited to a given planning problem. PySpatialOpt, Maxcovr and FLP Solver provide such flexibility by accepting a service area configuration or a facility-to-demand distance matrix as input. In contrast, ArcGIS requires a transportation network, which is particularly limiting in some cases: in the York case study, a network had to be created to reflect Euclidean space before the ArcGIS solver could be applied.

It is important to note that we have extended the functionality of PySpatialOpt and Maxcovr, which allowed the above conditions to be satisfied and improved computing efficiency. Specifically, we introduced distance matrix input into PySpatialOpt, which proved more efficient than using service areas, and we incorporated variable demand weights into Maxcovr, which originally required equal demand weights. This reflects the importance of the freedom to inspect and extend software functionality, which is only made possible by open-source access.

The case studies also highlight a comparison of exact and heuristic methods. Both PySpatialOpt and Maxcovr derive optimal solutions using optimisation solvers. In comparison, FLP Solver relies on a heuristic, which cannot guarantee optimal results. Interestingly, in the tower siting case study, the ArcGIS heuristic failed to find optimal solutions to the LSCP and to 14 of the 22 MCLP problem instances. Such suboptimal solutions may be an issue, as most users tend to treat results obtained from GIS software as ‘optimal’; when siting facilities that are costly to staff and maintain, any facilities above the minimum lead to unnecessary expenditure. Moreover, PySpatialOpt and Maxcovr solve the LSCP and MCLP significantly more efficiently than FLP Solver, mainly due to differences in solution approach (the same applies to ArcGIS).

The results also show that PySpatialOpt and Maxcovr are more scalable than FLP Solver and ArcGIS. While PySpatialOpt and Maxcovr scaled well to the York case study, FLP Solver failed to solve this case because of the roughly one-million-row limit on the distance records in Excel. ArcGIS also did not scale well to the York case, requiring significantly longer computing time than PySpatialOpt or Maxcovr, which can be explained by the ArcGIS workflow requiring a special network reflecting Euclidean distance.

Another discussion point concerns documentation. Documentation is critical to ensure that code can be used by others and that analyses can be replicated or reproduced (Rey 2018); it is also one of the key criteria in selecting open-source GIS software (Steiniger and Hunter 2013). While the documentation for PySpatialOpt and FLP Solver is extensive and clear, the documentation for the MCLP in Maxcovr is limited; in particular, it does not explain that the implementation requires a non-empty set of existing facilities. The same holds for ArcGIS, whose documentation does not explain that, when an LSCP is infeasible, the tool returns a facility configuration covering only those demand units that can be covered.

7 Conclusions

This paper has investigated location cover models (the LSCP and MCLP in particular) available through open-source and proprietary tools. The tools reviewed are PySpatialOpt, Maxcovr and FLP Solver, and results are compared with those obtained using ArcGIS. Such comparison is critical because the future will no doubt see increased use and application of these methods through open-source approaches, much as current applications rely heavily on commercial GIS platforms like ArcGIS and TransCAD (see Murray et al. 2019; Xu et al. 2020). Overall, PySpatialOpt and Maxcovr outperform FLP Solver and ArcGIS in terms of solution optimality, scalability and supported distance types. The differences between these tools are noticeable in many respects. Regarding problem assumptions, the open-source options are more flexible than ArcGIS, as they allow for a wide range of space and distance metrics. Moreover, PySpatialOpt and Maxcovr use exact methods and rely on mathematical solvers (e.g. GLPK) to solve location cover problems, in contrast to FLP Solver, which relies on a heuristic (as does ArcGIS). These differences were demonstrated in two application studies involving two LSCP and 34 MCLP instances. PySpatialOpt and Maxcovr solved the problems relatively quickly (significantly faster than ArcGIS), and all the solutions they derived were optimal, which is important in facility planning practice. FLP Solver, on the other hand, identified optimal solutions for the San Francisco instances, yet could not solve the York case because of the large problem size (591 demand points and 1821 potential sites).

The findings of this study offer some guidance for reproducible location modelling from a software perspective. First, a package should have open, scrutable code. Second, where possible, a package should rely on exact methods to solve location models; where heuristics are used, a detailed technical description should be provided. Third, a package should accommodate a wide range of space and distance types; ideally, it should accept a distance matrix between demand and potential sites as input.

While open-source software is promising for location cover modelling, further work is needed on quality control, maintainability and collaboration with other projects, as suggested by Steiniger and Hunter (2013). First, software quality control measures such as documentation and testing should be implemented in open-source location models to help guarantee code usability. Second, the maintainability of these tools (e.g. ease of modification, clear structure) should be enhanced, which is important for attracting contributions from more developers. Third, following the trend of collaboration among open-source GIS projects, PySpatialOpt and Maxcovr could be coupled with other open-source GIS libraries such as geopandas (Jordahl et al. 2020) and PySAL (Rey and Anselin 2007, 2010), which would add analytical capacity and attract a wider user base.