1 Introduction

Several papers discussed the importance of managing key locations and understanding tourist spatial behaviour and its main determinants (Cooper 1981; Russo 2002; Liu et al. 2017). At the micro-level, a better knowledge of tourist intra-destination behaviour is relevant for destination marketing and management (Zoltan and McKercher 2015). Increased understanding of most visited places and how tourists experience the destination may orient policy actions aimed at mitigating overcrowding in certain places, promoting less visited places, and enhancing tourist safety and satisfaction (Li et al. 2019). Despite the importance of understanding tourist movements within a destination, collecting data on visitors’ mobility are a challenging task (Stopher 2012). Amongst such complexity sources, we could identify the vast diversity of routes and attractions available for each visitor (Lew and McKercher 2002), which are also influenced by the visitor and visit features and the spatial locations of infrastructures, restaurants, accommodations (Smallwood et al. 2012). Traditional methods are generally based on post-visit questionnaires or trip diaries (East et al. 2017). However, collecting information on mobility through traditional travel diaries can be a complex and labour-intensive process that demands considerable time and attention from the respondent (Stopher 2012). Moreover, relying on diaries or self-reported routes for the analysis of tourist mobility may result in several pitfalls (McKercher and Zoltan 2014). These may include instances in which tourists find the effort required to complete the diary to be arduous and therefore leave it incomplete or omit certain parts. Furthermore, tourists may encounter difficulties in locating the site to be marked on the diary or map (Puczkó et al. 2010). To overcome these problems, in the last decades, there has been an increase in the number of studies that used tracking technologies for collating data on intra-destination mobility behaviour, especially in tourism research (Shoval et al. 2015). Nowadays, Global Positioning System (GPS) technology allows for collecting information on human mobility at great temporal and spatial details, without any effort required to recall the visited places. Given this vast data, many challenges arise in modelling highly detailed spatio-temporal data.

Although various techniques may be implemented to analyse tracking data (Atluri et al. 2018), more research is needed to study stop location points concerning destination-related characteristics and individual-related variables (Grinberger and Shoval 2019). Approaches considering the surrounding elements, like road networks or locations of attractions, are even more scarce. For instance, Smallwood et al. (2012) analysed the movement patterns of visitors travelling on the road network in northwestern Australia.

Due to geometrical complexities, analysing individual data observed along a network of lines are highly challenging. Moreover, the intrinsic lack of homogeneity in a network militates against the traditional methods of spatial statistics (Baddeley et al. 2021).

Point processes theory on linear networks has been recently proposed for analysing events occurring on a network of lines. They were firstly introduced in the spatial context and then extended to the spatio-temporal case, focusing on the analysis of first- and second-order summary statistics (Ang et al. 2012; McSwiggan et al. 2017; Rakshit et al. 2017, 2019; D’Angelo et al. 2021b, 2022b). Whilst most of the literature about point processes on spatio-temporal first-order intensity on networks is concerned with nonparametric estimation (Moradi et al. 2019; Moradi and Mateu 2020; Mateu et al. 2020), only a recent paper by D’Angelo et al. (2022) has dealt with parametric intensity specification of inhomogeneous first-order intensities on networks. The authors fitted a Gibbs point process model with mixed effects for the purely spatial component and a spatio-temporal log-Gaussian Cox process, adapting both models to the underlying road network. Furthermore, D’Angelo et al. (2022c) have considered the self-exciting behaviour of crime point process data, proposing a spatio-temporal Hawkes model adapted to linear networks, including a parametric estimation of the background based on covariates.

From a methodological perspective, various approaches have been used to model intra-destination tourist movements. Identifying stop locations are essential for summarising the information in tracking data since they may indicate touristic areas of interest for the individuals. Gong et al. (2015) review the main research results on stop location identification. In tourism, stop identification may reveal popular places, such as tourist attractions, restaurants, or shopping centres. After identifying attractions, several studies used graph-based methods to analyse tourist movement patterns (Kurashima et al. 2010; Yang et al. 2017; Hu et al. 2019), intending to reconstruct the network of the relationships amongst the various attractions. Other authors instead focused their attention on trajectory similarity. Shoval et al. (2015), for example, used the sequence alignment method to segment tourists according to their trajectory similarity. Zheng et al. (2019) propose a combination of the dynamic time warping with earth mover distance to compare the trajectories of 56 tourists in the Xiamen campus and Petry et al. (2019) provide a comparative approach amongst various similarity measures for comparing trajectories. Due to the complexity of modelling stop location intensity, most studies undertake descriptive approaches based on visual examination of stop location maps. At the same time, a modelling-based approach may help to reveal essential tourist behaviours.

From an empirical perspective, similar to recent studies (De Cantis et al. 2016; Ferrante et al. 2018; Domènech et al. 2020a, 2020b; Navarro-Ruiz et al. 2020; Shoval et al. 2020; Casado-Díaz et al. 2021), we focus on cruise tourism segments for several reasons. First, the single entry-exit point that characterises cruise visits at the destination makes implementing a GPS-based survey feasible. Second, the limited duration of the visit allows for a complete picture of cruise passengers’ experience at the destination. Third, the cruise tourism segment has mainly been debated due to their impact on destinations, determining to overcrowding of places in the face of a lower expenditure than traditional tourists (Brida and Zapata 2010; Larsen et al. 2013). However, the previous studies have not explicitly focused on cruise passengers’ stop activities to the best of our knowledge. Moreover, this paper provides a comprehensive picture of individuals’ characteristics, integrating a questionnaire-based survey and GPS tracking data in Palermo (Italy) and Dubrovnik (Croatia), based on the assumption that stop locations reveal critical points of interest of the destination. A Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al. 1996) is used to identify the stop locations from the GPS tracking data. The data are analysed using a spatial point process applied to a linear network. Specifically, in this paper, we extend the work presented in D’Angelo et al. (2022) in two directions: (i) by using the DSCAN algorithm, to identify the stops, and (ii) by including external spatial covariates in an application of the Gibbs point process model. Note that the Gibbs point process model is fitted on a road network represented by the configuration of the streets.

All the analyses are carried out through the statistical software R Core Team (2023), and the source codes to reproduce the analysis in the city of Palermo are contained in the GitHub repository https://github.com/nicolettadangelo/gps_data.

The paper is organised as follows. The data are described in Sect. 2. Section 3 presents the DBSCAN method employed to identify stop activities at the destination. In Sects. 4 and 5, we introduce both the theory of spatial point processes on linear networks and the Gibbs point process model for the intensity specification. Sections 6 and 7 report and discuss the results obtained for the cruise tourism segment in Palermo and Dubrovnik, respectively. Section 8 concludes this paper and indicates future research directions.

2 Data

Data were collected from two surveys: the first was conducted in the Spring of 2014 in Palermo (Italy), and the second in the Summer of 2015 in Dubrovnik (Croatia). Both surveys were implemented by integrating a questionnaire-based survey and GPS tracking devices on cruise passengers disembarking at their destination. Specifically, 303 cruisers were interviewed in Palermo and 51 in Dubrovnik; in both cases, 12 variables were selected for each interviewed cruise passenger, according to the literature on visitors’ intra-destination mobility. Data collected in the port of Palermo are initially presented in De Cantis et al. (2016). A description of the Dubrovnik survey can be found in Ferrante et al. (2018), together with the complete description of the collected variables. An opening and a closing questionnaire were administered to collect socio-demographic characteristics and pre-, and post-visit information for each cruise passenger interviewed.

In addition, a GPS tracking device was released to each cruise passenger sampled, recording their position during the visit to the destination. Therefore, the resulting GPS tracking data consist of coordinate points recorded every ten seconds. As an example, Fig. 1 shows the GPS track of a sampled cruise passenger (black points) and the streets (red lines), with the detail of the stops preference and of the time (in minutes) in each stop (identified by the red stars), using Palermo data. Moreover, the top numbers in the circles indicate which stop has been first visited, whilst the bottom numbers indicate the time spent at each stop. The three stops represent the disembarking area, Piazza Pretoria and the Cathedral, the last two being in the list of the main attraction of the municipality of Palermo.

Fig. 1
figure 1

Example of a GPS track (black points) and stops derived from the DBSCAN algorithm (red stars) for a cruiser in the municipality of Palermo. The top numbers in the circular indicate which stop has been first visited. The circular’s bottom numbers indicate the times spent in minutes at each stop (colour figure online)

Even though the data quality is very high, we can identify some noise points. Indeed, when analysing the complete trajectory, a set of pre-processing operations on the raw GPS data are necessary to remove outliers and regularise the trajectories (see Abbruzzo et al. (2021) for further details). However, in this paper, our main aim is to model the stops according to some covariates and the spatial characteristics, then consider the outliers as noise points.

The analysis of cruise passengers’ stop activities, based on the integration of the questionnaire-based survey and GPS tracking data, prevents some typical data collection problems on tourist mobility. Indeed, traditional methods are generally based on post-visit questionnaires or trip diaries, which rely on the accurate recall of the places visited and activities made. However, nowadays, GPS technology allows for collecting information on human mobility at a very great temporal and spatial detail, without any effort required to recall the visited places. In this way, we can evaluate the relationships amongst the number of stops obtained from the DBSCAN algorithm (in Sect. 3), and both individual-related variables, collected through the questionnaire-based survey, and some itinerary-related characteristics, identified from the GPS tracking data on the whole itinerary.

2.1 Covariates

According to the literature on visitors’ intra-destination mobility, a set of covariates related to the individual characteristics, information on the itinerary undertaken, and the selected tourist attractions are included in the analysis. First, a set of variables is available from a questionnaire issued to the cruise passengers: Day of the interview, Age group, Country of origin, Education level (dichotomized in low (High school diploma or Bachelor degree) and high (Master or PhD)), First visit (indicating whether the cruise passenger was visiting the city for the first time or not), Yearly income (dichotomized in < 40,000 and \(\ge\) 40,000 euro). The questionnaire is issued to only one person per group if the person travels with the family or a group of friends.

Regarding the information on the undertaken itinerary, a set of synthetic information for the two contexts under analysis was derived from the GPS tracking data and considered as potential determinants of stop activity, namely: Total length of tour (dichotomized in < 11 and \(\ge\) 11 km), Total time of tour (dichotomized in < 3.5 and \(\ge\) 3.5 hours), Maximum distance from the port (dichotomized in < 3.5 and \(\ge\) 3.5 km), Average duration of the stops (in minutes). Specific threshold values for some covariates were considered to distinguish between different types of tourists. For instance, the maximum distance from the port discriminates between those who stayed within the historical centre of the city of Palermo and those who ventured outside. As for income, low sample size for some income categories suggested that reducing the number of categories to two would be more appropriate. Additionally, dichotomising certain covariates can facilitate easier interpretation of the groups considered (for example between those undertaking a short tour and those making a longer one) and can mitigate some potential problems stemming from measurement errors.

Considering the restricted area of the historical centre of Dubrovnik, information on the maximum distance from the port was not included in the model selection procedure, being relatively constant for all the units under analysis.

Finally, a set of dummy variables are also derived by the questionnaire issued to cruise passengers, indicating whether the cruise passenger has visited a specific touristic attraction. The tourist attractions of the two destinations have been selected after comparing the top attractions on popular websites, such as Tripadvisor, Google rank, Lonely planet and Skyscanner. These tourist attractions include: the Norman Palace, the Cathedral, Massimo Theatre, La Martorana, the Chiaramonte Palace, Quattro Canti, and Ballarò Market in the case of Palermo city, and Minceta Tower, Bokar Tower, Stradun of Dubrovnik, the Cathedral, Loggia Square, Fountain of Onofrio, and The Rector’s Palace and Cultural Historical Museum, in the case of Dubrovnik.

3 The DBSCAN algorithm

The DBSCAN is a density-based algorithm (Ester et al. 1996) designed to identify arbitrary-shaped clusters, where the clusters are sets of spatial points that fall within a certain distance. Concurrently, the algorithm can identify the noise points, which are spatial points not belonging to any cluster. Given \(D^{(i)}\) a generic trajectory for a unit i, a cluster is defined as a minimum set of spatial points which are sufficiently close to each other. The following definitions are given to clarify the concept of points sufficiently close to each other.

Let \(p = (x, y)\) be a point in a trajectory D for a generic unit i. The \(\epsilon\)-neighbourhood of a point p is defined by \(ne_\epsilon (p) = \{q \in D: d(p,q) \le \epsilon \in \mathbb {R}^+\}\), where \(d(\cdot , \cdot )\) is a distance function. If the cardinality of an \(\epsilon\)-neighbourhood of a point p, i.e. \(\text{ card }(ne_\epsilon (p))\), is at least greater than a minimum number (\(minpts \in \mathbb {N}\)) then p is a core point. Moreover, a point p is directly density-reachable from q, with respect to \(\epsilon\) and minpts, if \(p\in ne_\epsilon (q)\) and \(\text{ card }(ne_\epsilon (q)) \ge minpts\). Finally, let’s define density-reachable and density-connected points. A point p is density-reachable from the point q with respect to \(\epsilon\) and minpts if there is a chain \(p_1, \ldots , p_l\), \(p_1 = q\), \(p_l = p\) such that \(p_{i+1}\) is directly density-reachable from \(p_i\). A point p is density-connected to point q, with respect to \(\epsilon\) and minpts, if there is a point o such that both p and q are density-reachable from o with respect to \(\epsilon\) and minpts.

A cluster C is a non-empty subset of D satisfying the following requirements:

  • \(\forall \ \ p, q \in D\): if \(p \in C\) and q is density-reachable from p with respect to \(\epsilon\) and minpts, then, \(q \in C\);

  • \(\forall \ \ p, q \in C\): p is density-connected to q with respect to \(\epsilon\) and minpts.

Let \(C_1, \ldots , C_k\) be the clusters of D with respect to \(\epsilon\) and minpts, then, \(p\in D\) is a noise point if it does not belong to any cluster \(C_i\). The algorithm starts with the first point p in the database D, retrieving all the neighbours of a point p concerning \(\epsilon\) and minpts. This procedure will yield a cluster if p is a core point. Otherwise, no points will be density-reachable from p, and the DBSCAN algorithm will proceed to consider the next point of the database. Centroids summarise the spatial coordinates of each point belonging to a cluster, and the rest are considered as noise. The DBSCAN requires choosing two parameters, namely \(\epsilon\) and minpts, which indicate the search radius and minimum number of points in the search radius for identifying the location as a cluster. Finally, this algorithm does not consider temporal information. For an algorithm extension, which would also incorporate the temporal information, see Birant and Kut (2007).

This study uses a value of 20 (\(\sim\) 3 min) for the minpts tuning parameter and 40 (meters) for the distance value \(\epsilon\). The distance between consecutive points is calculated as Euclidean since it is approximately the same as the one on the linear network when one considers spatial points measured every few seconds. Note that we collect data every ten seconds in the case of Palermo and Dubrovnik.

Figure 1 shows an example of an application for the Palermo cruise passengers’ GPS tracks. The DBSCAN reveals three stops in which the cruise passenger has spent 8.38 (Stop 1), 18.23 (Stop 2) and 16.23 (Stop 3) minutes. The shortest-path distances, which consider the road network, between these stops are around 2300 m, (1 \(\rightarrow\) 2) and (2 \(\rightarrow\) 3), and 764 m. On the contrary, the euclidean distances between the stops are around 1500 (1 \(\rightarrow\) 2) and 600 (2 \(\rightarrow\) 3) meters, respectively. This simple example motivates spatial point processes on a linear network.

4 Spatial point processes on a linear network

A point process model assumes that a point pattern \({\textbf {x}}\) is a realisation of a finite point process X in the bounded region \(D \subset \mathbb {R}^2\) without multiple points. A spatial point pattern is an unordered set \({\textbf {x}}=\{{\textbf {x}}_1,\dots ,{\textbf {x}}_n\}\) of points \({\textbf {x}}_i\) where \(n({\textbf {x}})=n\) denotes the number of points, not fixed in advance (Cressie 2015). In this context, N(A) denotes the number of points of a set \(A \cap X\), where \(A \subseteq D\).

We denote a point location in the plane as \({\textbf {u}}\), specified by Cartesian coordinates \({\textbf {u}} = (u_1, u_2)\) without explicitly mentioning the coordinates. The intensity function describes the first-order property of the point process and is defined as follows:

$$\begin{aligned} \lambda ({\textbf {u}}) = \lim _{\vert d{\textbf {u}}\vert \rightarrow 0} \frac{\mathbb {E}[N(d{\textbf {u}})]}{\vert d{\textbf {u}}\vert }, \end{aligned}$$

where \(d{\textbf {u}}\) represents an infinitesimal region containing the point \({\textbf {u}}\in D\), \(\vert d{\textbf {u}}\vert\) is the area of \(d{\textbf {u}}\), and \(\mathbb {E}[N(d{\textbf {u}})]\) is the expected number of events in \(d{\textbf {u}}\). If the intensity is constant, the process is called homogeneous. In the case of inhomogeneity, the intensity varies within the study area and may depend on the coordinates of points. A point process model, assuming independence, is fully described by its conditional intensity function (Daley and Vere-Jones 2007). For a comprehensive treatment of spatial point processes in Euclidean space, please refer to Baddeley et al. (2015).

In a formal sense, a linear network \(L=\cup _{i=1}^{n} {\textbf {l}}_{i} \subset \mathbb {R}^{2}\) is typically represented by a finite union of line segments in two-dimensional space (Ang et al. 2012). These line segments, denoted as \({\textbf {l}}_i\), have positive lengths and together form the linear network L (\(L=\cup _{i=1}^{n} {\textbf {l}}_{i} \subset \mathbb {R}^{2}\)). The endpoints of these line segments are referred to as nodes, and the degree of a node corresponds to the number of line segments that share that particular node (Okabe and Sugihara 2012). A line segment, \({\textbf {l}}_i\), is defined as \([{\textbf {u}}_i,{\textbf {v}}_i]=\{k{\textbf {u}}_i+(1-k){\textbf {v}}_i: 0 \le k \le 1\}\), where \({\textbf {u}}_i\) and \({\textbf {v}}_i\) represent the endpoints of \({\textbf {l}}_i\) in two-dimensional space. It is worth noting that the intersection of any two line segments, \({\textbf {l}}_i\) and \({\textbf {l}}_j\) (where \(i \ne j\)), can either be empty or consist of an endpoint shared by both segments. The total length of all line segments in the linear network L is denoted by \(\vert L\vert\). When determining the distance between two locations, \({\textbf {u}}\) and \({\textbf {v}}\), within the network L, the most commonly used method is the shortest-path distance, denoted as \(d_L({\textbf {u}},{\textbf {v}})\). This distance corresponds to the minimum length amongst all possible paths connecting \({\textbf {u}}\) and \({\textbf {v}}\) in the network. However, other alternative distance measures have also been discussed in Rakshit et al. (2017).

Similar to planar point processes, a point process on a linear network L, denoted as X, involves a random countable subset of \(\mathbb {R}^2\) without overlapping points. In practical scenarios, we observe \(n > 0\) events, \({\textbf {x}}=\{{\textbf {x}}_1,\dots ,{\textbf {x}}_n\}\), of the point process X occurring within a linear network \(L \subset \mathbb {R}^2\). We use N(L) to represent the number of events that take place on L. It is important to note that these events are not limited to occurring solely at the nodes of the network. Just like in the planar case, we define the intensity function \(\lambda (\cdot )\) as the point process equivalent of the mean function for a real-valued stochastic process. For further information about the statistical properties of the intensity function, refer to Moradi et al. (2018).

A point process N on a linear network L is considered homogeneous if it exhibits a constant intensity, that is, \(\lambda ({\textbf {u}}) = \lambda\) for all \({\textbf {u}} \in L\). In general, \(\lambda ({\textbf {u}})\) is interpreted as the expected number of events per unit length of the linear network L in the vicinity of \({\textbf {u}}\). However, a more realistic and general scenario are the inhomogeneous case, where the constant intensity \(\lambda\) of the Poisson process is replaced by a spatially varying intensity function over the linear network L. Recent research on statistical methods for analysing spatial patterns of points on a network of lines, such as road accident locations along a road network, has been reviewed by Baddeley et al. (2021).

5 Modelling the spatial stop intensity

In this section, we introduce a modelling approach useful for describing the spatial behaviour of the visitors. The presented methodology is developed through a stochastic point process modelling approach on a linear network originally proposed in D’Angelo et al. (2022). Specifically, we incorporate a random subject-specific effect and employ a parametric model to capture the visitors’ stops, taking into account both the underlying network structure and the individual tourists’ choices. To accomplish this, we adopt the Gibbs point process models with mixed effects, as outlined by (Illian and Hendrichsen 2010), adapting the procedure to the specific context of linear networks. This flexible procedure enables the incorporation of individual information through appropriate random and fixed factors, as well as external covariates.

Consider a scenario where there are M visitors on a linear network L, and each visitor generates a point pattern of stops represented by \({\textbf {x}}_1,\dots ,{\textbf {x}}_M\). In this context, we make the assumption that each \({\textbf {x}}_m\) (where \(m=1,\dots ,M\)) follows a pairwise interaction process (Van Lieshout 2000), characterised by a conditional intensity (Kallenberg 1984), which is given by:

$$\begin{aligned} \lambda _{\theta ,\phi _m}({\textbf {u}};{\textbf {x}}_m)=b_{\theta ,\phi _m}({\textbf {u}})\prod _{i=1,{\textbf {x}}_{im}\ne {\textbf {u}}}^{n({\textbf {x}}_m)}h_{\theta ,\phi _m}({\textbf {u}},{\textbf {x}}_{im}), \end{aligned}$$

where the parameter \(\theta\) is a vector of fixed effects and the random effects \(\phi _m\) are assumed to come from \(\Phi \sim N(0,\sigma _{\phi }^2 I)\). Then \(n({\textbf {x}}_m)\) is the number of points in \({\textbf {x}}_m\), that is, the number of stops per visitor, \(b_{\theta ,\phi _m}({\textbf {u}})\) and \(h_{\theta ,\phi _m}({\textbf {u}},{\textbf {v}})\) are two functions that model the intensity and the interaction, respectively.

5.1 Model estimation through pseudolikelihood

Inference is carried out through the maximisation of the pseudolikelihood over the subset \(A \subseteq L\)

$$\begin{aligned} \text {PL}_A\left( \theta ,\phi ;{\textbf {x}}_1,\dots ,{\textbf {x}}_M\right) =\prod _{m=1}^M\Bigg (\prod _{{\textbf {x}}_{im}\in A} \lambda _{\theta ,\phi _m}\left( {\textbf {x}}_{im}; {\textbf {x}}_m\right) \exp \bigg (-\int _A \lambda _{\theta ,\phi _m}({\textbf {u}};{\textbf {x}}_m)d{\textbf {u}}\bigg )\Bigg ) \end{aligned}$$

that is,

$$\begin{aligned} \begin{aligned} \text {PL}_A(\theta ,\phi ;{\textbf {x}}_1,\dots ,{\textbf {x}}_M)&=\prod _{m=1}^M\Bigg (\prod _{{i}=1}^{n({\textbf {x}}_m)}b_{\theta ,\phi _m}({\textbf {x}}_{im}) \prod _{{i \ne j}}^{n({\textbf {x}}_m)}h_{\theta ,\phi _m}\left( {\textbf {x}}_{im},{\textbf {x}}_{mj}\right) \Bigg ) \\ {}&\times \exp \Bigg \{ -\int _A b_{\theta ,\phi _m}({\textbf {u}}) \prod _{i=1}^{n({\textbf {x}}_m)}h_{\theta ,\phi _m}\left( {\textbf {u}},{\textbf {x}}_{im}\right) d{\textbf {u}} \Bigg \}, \end{aligned} \end{aligned}$$
(1)

where \(\phi =(\phi _1,\dots ,\phi _M)\).

In order to avoid the computation of integrals in the pseudolikelihood, we employ the Berman–Turner device (Baddeley and Turner 2000), which utilises a quadrature rule. This approach involves approximating the integral in (1) for each m by a finite sum over a set of points \({\textbf {u}}_{mj}\), where \(j=1,\dots ,l_m\). These points include all the data points and allow for a more manageable computation. This approach is advantageous as it can be shown that the log-pseudolikelihood is formally equivalent to the log-likelihood of independent weighted Poisson variables, i.e.

$$\begin{aligned} \log \text {PL}_A\left( \theta ,\phi ;{\textbf {x}}_1,\dots ,{\textbf {x}}_M\right) \approx \sum _{m=1}^M \sum _{j=1}^{l_m} \left( y_{mj}\log \lambda _{\theta ,\phi _m}\left( {\textbf {u}}_{mj}\right) -\lambda _{\theta ,\phi _m}\left( {\textbf {u}}_{mj}\right) \right) w_{mj} \end{aligned}$$

where \(w_{mj}\) is the quadrature weights, \(y_{mj}=\frac{z_{mj}}{w_{mj}}\) and

$$\begin{aligned} z_{mj}= {\left\{ \begin{array}{ll} 1 \quad \text {if}\quad {\textbf {u}}_{mj} \in \{{\textbf {x}}_{m1}, \dots , {\textbf {x}}_{m n({\textbf {x}}_m)}\}\\ 0 \quad \text {if} \quad {\textbf {u}}_{mj} \notin \{{\textbf {x}}_{m1}, \dots , {\textbf {x}}_{m n({\textbf {x}}_m)}\} \end{array}\right. }. \end{aligned}$$
(2)

The initial step in fitting the proposed spatial model involves creating a quadrature scheme on the linear network. Subsequently, we replicate dummy points for each potential mark, which in this case, correspond to the ID of the visitor. These replicated dummy points are generated at the same locations as the data points, but with distinct marks. If the original point pattern includes additional external information, such as individual covariates, their values remain associated with the generated dummy marked points. This is important as it allows for the inclusion of individual-specific covariates in the model specification, denoted as \({\textbf {Z}}({\textbf {u}}_{mj})\), which is replicated for all the quadrature points. Note that the suggested approach basically requires the generation of new dummy points for each level of the marks. Indeed, the procedure follows that of simulating a multitype point pattern. For this reason, our proposal does not directly extend to the case of individual covariates defined on a continuous scale.

Finally, the two sets of dummy points and the data points are superimposed, and the quadrature weights and the indicators (2) are computed.

In this paper, we apply the proposed models to these newly generated quadrature points, allowing for the incorporation of random effects and subject-specific covariates. We denote the location of these new sets of points as \({\textbf {u}}_{im}\). The model can be then fitted through the R Core Team (2023) package mgcv (Wood 2017).

For the intensity function \(b_{\theta ,\phi _m}({\textbf {u}}_{im})\), we assume an additive structure on the log-scale, that is:

$$\begin{aligned} b_{\theta ,\phi _m}({\textbf {u}}_{im})=\exp \left( \theta _1 B_1\left( {\textbf {u}}_{im}\right) +\phi _{1m}B_2\left( {\textbf {u}}_{im}\right) +\theta _3B_3\left( {\textbf {u}}_{im}\right) +B_4\left( {\textbf {u}}_{im}\right) \right) . \end{aligned}$$

In detail, we assume that \(B_1({\textbf {u}}_{im})=\varvec{1}\) and \(B_3({\textbf {u}}_{im})\) is the distance from the nearest attraction, computed as the shortest-path distance on the road network of the city. (see Fig. 5). This spatial covariate replaces the dummy variables of the model in D’Angelo et al. (2021), which indicated the visited specific touristic attraction by the cruise passenger, resulting from the questionnaire issued to the cruise passengers when disembarking. For this reason, the dummy variables considered in D’Angelo et al. (2021) could suffer from the participants’ recall bias. This bias is overcome by the spatial covariate \(B_3({\textbf {u}}_{im})\), which accounts for the spatial displacement of the considered touristic attractions by the GPS tracking data. In addition, \(B_2({\textbf {u}}_{im})\) denotes the ID of the visitor, included as a random effect. \(B_4({\textbf {u}}_{im})\) is a nonparametric function for the spatial location \({\textbf {u}}_{im} \in L\), which might be estimated through kernel, splines or other nonparametric techniques. Further external covariates, whose values are obtained for each point of the quadrature scheme when generating their new location as \({\textbf {Z}}({\textbf {u}}_{mj})\), can be included in the specification of the model.

To describe the interaction function \(h_{\theta ,\phi _m}({\textbf {u}}_{im},{\textbf {v}}_{im})\), we consider:

$$\begin{aligned} h_{\theta ,\phi _m}({\textbf {u}}_{im},{\textbf {v}}_{im})=\exp (\theta _2H({\textbf {u}}_{im}, {\textbf {v}}_{im})+\phi _{2m}H({\textbf {u}}_{im},{\textbf {v}}_{im})). \end{aligned}$$

We propose a smooth interaction function \(H(\cdot ,\cdot )\), which is assumed to be dependent only on the shortest-path distance between any pair of points, i.e. the length of the shortest path between the location of the two points on the network.

For two points located on the network, denoted as \({\textbf {u}}\) and \({\textbf {v}}\), we define the interaction function as follows:

$$\begin{aligned} H({\textbf {u}},{\textbf {v}})= {\left\{ \begin{array}{ll} \left( 1-\left( \frac{d({\textbf {u}},{\textbf {v}})}{R}\right) ^2\right) ^2 \quad \quad \text {if} \quad 0 < d({\textbf {u}},{\textbf {v}}) \le R\\ 0 \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \text {else} \end{array}\right. }, \end{aligned}$$
(3)

where \(d({\textbf {u}},{\textbf {v}})\) represents the shortest-path distance, and \(R \ge 0\) defines the radius of interaction.

In this particular application, we set the interaction radius to \(R=100\ m\), which is considered a reasonable distance threshold that captures potential interactions amongst visitors’ choices of stop locations.

6 Discussion of results in Palermo city

6.1 Stop activity identification

The DBSCAN algorithm summarises the information on the GPS tourist tracks by counting the number of stops and the time spent at each stop. The application of the DBSCAN to the 303 cruise passengers sampled during six days of surveys identified 1809 stops, with an average number of stops per tourist equal to 6, with a minimum of 1 and a maximum of 22. The duration of stops ranges from 5 to 56 min, with a mean of 17 and a median of 15 min. Figure 2 show the relative frequency distribution of the cruise passengers’ stops (on the left) and the average time spent at each stop (on the right), respectively.

Fig. 2
figure 2

On the left: Relative frequency distribution of the number of cruisers stops in Palermo. On the right: Distribution of the average time spent in each stop, derived from the DBSCAN algorithm

Both distributions are positively skewed; around 11% of the cruise passengers make eight stops, and 50% of the sample stops between 7 and 13 times.

6.2 Determinants of stop number

In Fig. 3, bivariate plots of the number of stops according to the set of considered categorical covariates are reported.

Fig. 3
figure 3

Boxplots of the number of visitors’ stops in Palermo for the considered categorical covariates

As for the exploratory analysis, by looking at the plots in Fig. 3, the number of stops generally appears higher as the mobility behaviour, measured by some synthetic characteristics, increases. Namely, the higher the distance from the port, the higher the number of stops made. Similar considerations hold for the total duration of the tour and the total length of the tour. That is, those who tend to explore more the destination also tend to stop more. As for socio-demographic characteristics, the degree of association is less clear. Nonetheless, the median value of the stops is slightly higher for those with higher education and income levels. On the other hand, this value is slightly lower for repeated visitors, compared to those who visit the destination for the first time, which then seem to prefer the destination’s attractions (Wang 2004).

These results are confirmed in D’Angelo et al. (2021), which describes the joint effect of the considered variables on the number of stops by fitting a Poisson regression model, to evaluate the influence of the individual-related variables and itinerary-related characteristics on cruise passengers’ stop behaviour, and identify the potential determinants of the stop number.

The results obtained from the previous study are rather reasonable. Regarding socio-demographic characteristics, both education and income are slightly significantly associated with the total stop number. More in detail, the association with the income may be explained with the potential expenditure associated with the stop activities (Thrane 2012), as well as education may be associated with the visit to museums or other types of attractions (Parola et al. 2014). In terms of the visited places, only the Cathedral visit is slightly associated with the total stop number. In contrast, the other considered places of interest are not significantly associated with the total stop number.

Note that in this paper, we aim at accounting for the spatial displacement of points and, therefore, the underlying road network, and we expect that people stop more often near well-known touristic attractions. For these reasons, we compute the spatial variable distance from the nearest touristic attraction, where the metric used was the shortest-path distance on the network from the spatial locations \({\textbf {u}}\) of the known attractions.

The resulting variable comes in the left panel of Fig. 4. Its marginal effect on the intensity of the process is assessed by plotting the marginal smoothed intensity function. Let \(\lambda ({\textbf {u}})\) be the intensity of the process under stud and \(Z({\textbf {u}})\) the considered covariate. Then, assuming \(\lambda ({\textbf {u}}) = f(Z({\textbf {u}}))\) where f is a nonparametric estimate of the intensity of the analysed point process, we wonder if the observed intensity depends on the spatial covariate. Smooth estimators of \(f(Z({\textbf {u}}))\) were proposed by Baddeley et al. (2012), and the smoothing procedure we consider is based on fixed-bandwidth kernel density estimation. The Poisson confidence bands are also computed.

Fig. 4
figure 4

Left panel: Values of the variables distance from the nearest touristic attraction in Palermo. Right panel: Nonparametric estimate of the intensity of the analysed point process, as a function of the distance from the nearest touristic attraction

Looking at the smoothed functions in the right panel of Fig. 4, it is evident that the effect of the spatial covariate varies as a function of the scale. In detail, the intensity exponentially decreases moving away from the nearest tourist attraction, whilst it becomes pretty constant after 200 ms.

6.3 Results of the spatial modelling of the stops on the road network

For the present study, for computational reasons, only two days of the survey are taken into account, referring to the cruise passengers visiting the city after disembarking from the cruise ship. Note that computational times for fitting point process models increase with the number of points. In our particular setting, the computational burden mostly depends on the number of IDs considered as random effects, as this, in turn, affects the quadrature scheme and, consequently, the dummy points generated. The spatial point pattern consists of 278 stops made by 72 visitors, stopping four times on average during their visit to downtown Palermo city on the 25th and 28th April 2014. To properly account for the constrained structure of the space support, the road network of the selected area is considered, providing a linear network L with 4473 vertices and 5399 lines. The quadrature scheme used for model fitting consists of the analysed 278 data points, representing the visitors’ stops, and 10,798 dummy points obtained generating the quadrature scheme on the analysed network. This leads to a dataset of 797,472 quadrature points, which is equal to the number of data points plus the number of dummy points, all replicated for the number of marks M.

In Fig. 5, the locations of the observed stops (in red) and the main considered attractions (in green) are displayed.

Fig. 5
figure 5

In red: the spatial point pattern of visitors’ stops in the downtown of Palermo city on the 25th and 28th April 2014. In green: the location of the tourist attractions (colour figure online)

The motivation for including the socio-economic characteristics and synthetic information on the itinerary undertaken as covariates is threefold: to explain the spatial inhomogeneity, to consider the characteristics of the visit, and to follow the literature on intra-destination behaviour.

In particular, starting from the fitted model in D’Angelo et al. (2021), based on the AIC values, none of the individual-related variables resulted significant.

Moreover, amongst other destination-related considered characteristics, both the geographical configuration of the destination determined by the road network and the shortest-path distance of each stop location from the nearest tourist attraction is taken into account.

Thus, we model the spatial intensity as follows:

$$\begin{aligned} \begin{aligned} \log \lambda _{\theta ,\phi _m}({\textbf {u}}_{im}; {\textbf {x}}_{m})&= \theta _1+\phi _{1m}B_2({\textbf {u}}_{im})+\theta _2v_{im}+\theta _3Z({\textbf {u}}_{im})+B_4({\textbf {u}}_{im}) \end{aligned} \end{aligned}$$
(4)

where: \(v_{im}=\sum _{j=1}^{n(\varvec{x}_m)}H({\textbf {u}}_{im},{\textbf {x}}_{jm})\); \(\theta _2\) is the fixed effect of the smooth function in Eq. (3); \(\theta _3\) is the fixed effect of the distance from the nearest attraction, computed as the shortest-path distance on the network from the location of these known attractions; \(\phi _{1m}\) is the random effect of the ID; the nonparametric function \(B_4({\textbf {u}}_{im})\) is estimated through thin plate regression splines with a chosen number of 29 knots for our analysis.

In Table 1, the estimates of the fixed effects and the summary of the random effects of the selected model (4) are reported.

Table 1 Parameters’ estimates and approximate significance of smooth terms of the Gibbs model (4) for the downtown of Palermo city

When \(\exp (\hat{\theta }_1)\) is multiplied by the length of the network, the estimated stops for each individual are 14.57, higher than the original average stops. This finding is likely due to the sparsity of the original points in some areas of the network.

The positive interaction parameter \(\exp (\hat{\theta }_2)=1.26\) indicates that overall the visitors’ stops attract each other. Therefore, visitors tend to stop in the same spots. Furthermore, \(\exp (\hat{\theta }_3)=0.996\), being less than 1, indicates that moving away from any tourist attraction slightly decreases the probability of visitors stopping. From the significant random effects, we notice that the intensity varies only amongst visitors (\(\hat{\phi }_{1m}\)), contrary to the interaction (\(\hat{\phi }_{2m}\)) which is not significant and therefore excluded from the model. The variance for the visitors’ random effect is estimated as 0.0003. Overall, this conclusion opens new research perspectives on modelling human behaviour and applying ecological theories (Meekan et al. 2017). Finally, the inclusion of the smooth term \(B_4({\textbf {u}}_{im})\) accounting for the spatial coordinates improves the fitting of the model significantly.

To make the estimator unbiased, that is, given the expected number of points \(\mathbb {E}[\int _L \lambda ({\textbf {u}})d({\textbf {u}})]=n\), the intensity obtained by the Equation (4) has been normalised \(\hat{\lambda }({\textbf {u}})= \frac{n\hat{\lambda }({\textbf {u}})}{\int _L \hat{\lambda }({\textbf {u}})d({\textbf {u}})}\). This is a peculiar problem of models fitted to linear networks, as the resulting intensity is interpreted as the number of points per unit length of the network (and not per unit area as in typical planar point pattern analysis). Therefore, it is often the case that the intensity resulting from a fitted model on a linear network does not integrate directly with the number of original points in the analysed point pattern. Therefore, Fig. 6 shows the estimated intensity, displaying the expected number of stops for each location of the quadrature on the network. To make easier the reading and to highlight the regions where visitors are most likely to stop, only the estimated intensities that are higher than the \(95^{th}\) percentile are reported.

Fig. 6
figure 6

Estimated pointwise intensities for Palermo, above the 95th percentile: The bigger the circle, the higher the intensity. The intensity has been normalised to obtain the expected number of stops for each location. Locations of the tourist attractions are displayed in red (colour figure online)

7 Discussion of results in Dubrovnik city

7.1 Stop activity identification

The application of the DBSCAN to the 51 cruise passengers sampled during six days of surveys identified 374 stops, with an average number of stops per tourist equal to 7.63, with a minimum of 2 and a maximum of 17. The stop duration ranges from 5.73 to 69.38 min, with a mean of 20.36 and a median of 16.46 min.

Figures show the relative frequency distribution of the cruiser stops (on the left) and the average time spent at each stop (on the right). Both distributions are positively skewed (Fig. 7).

Fig. 7
figure 7

On the left: Relative frequency distribution of the number of cruisers stops in Dubrovnik. On the right: Distribution of the average time spent in each stop, derived from the DBSCAN algorithm

7.2 Results of the spatial modelling of stop locations in Dubrovnik

In the case of Dubrovnik, the spatial point pattern consists of 374 stops made by 49 visitors, stopping seven times on average during their visit in the city downtown, on 3rd and 5th September 2015. The linear network L comes with 1178 vertices and 1245 lines. The quadrature scheme used for model fitting consists 140,434 quadrature points (374 data points and 2492 dummy points).

Fig. 8
figure 8

Left panel: the spatial point pattern of visitors’ stops in the downtown of Dubrovnik city (in red) on the 3rd and 5th September 2015, and the location of the tourist attractions (in green). Right panel: Values of the variables distance from the nearest touristic attraction in Dubrovnik (colour figure online)

In the left panel of Fig. 8, the locations of the observed stops (in red) and the main considered attractions (in green) are displayed. As for the Italian case, we model the spatial intensity as in Eq. (4). Indeed, starting from the full model considering all the individual-related variables, the best model for the Croatian data is the same. The spatial covariate obtained as the shortest-path distance from the nearest point of attraction comes in the right panel of Fig. 8.

In Table 2, the estimates of the fixed effects and the summary of the random effects of the final selected model are reported.

Table 2 Parameters’ estimates and approximate significance of smooth terms of the Gibbs model (4) in Dubrovnik

The model estimates the stops for each individual, which are 6.18. Some comments on the main differences with respect to the Palermo spatial intensity come as follows. The interaction parameter is estimated as positive also for the Dubrovnik data, indicating that also here, stops tend to attract each other. The variance for the visitors’ random effect is estimated as 0.275.

As in the case of Palermo, the exponential of the parameter representing the distance from the nearest point of attraction is negative, even though very close to 1. This suggests that the probability of stopping decreases moving away from the points of interest, and this is indeed confirmed by the clusters found in Fig. 8, quite close to the considered points of interest. Furthermore, only the random effect of the ID of the tourist is significant, whilst the interaction amongst their stops is not.

Fig. 9
figure 9

Estimated pointwise intensities for Dubrovnik. The bigger the circle, the higher the intensity. The intensity has been normalised to obtain the expected number of stops for each location. Locations of the tourist attractions are displayed in red (colour figure online)

Finally, Fig. 9 displays the obtained estimated intensity, showing the expected number of stops for each location of the quadrature on the network.

8 Summary and conclusions

This paper proposes an approach to derive meaningful information on a tourist visit at the destination, starting from the Global Positioning System (GPS) tracking data and the questionnaire-based information. The complex structure of the GPS data requires methods able to synthesise the vast amount of data collected to extract relevant information on the visitors’ behaviour. Amongst the various types of information which can be derived from the GPS tracking data, in this contribution, we focused our attention on stop activities as an essential element for destination management, not only because stop locations may highlight business opportunity (Adongo et al. 2017), but also because they may indicate crowded places, sustainability issues at attractions and traffic congestion (Chiou and Hsieh 2020).

To identify cruise passengers’ stop activities, a spatial clustering algorithm, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al. 1996), has been applied to the GPS tracking data collected at an individual level. Moreover, thanks to the integration of stop activities, socio-demographic characteristics and other itinerary-related information, it was possible to identify some of the potential determinants of stop activity at the destination. Determining the number of stops and analysing their main determinants is fundamental for service management since the stop locations may identify places where most of the expenditure is concentrated (Thrane and Farstad 2012).

Moreover, a novel model has been proposed to analyse the main determinants of spatial intensity of cruise passengers’ stop locations during their visit. The proposed model considers the linear network determined by the street configuration of the destination under analysis. The results do show that socio-demographic characteristics do not have a significant influence on stop location patterns, and the trip-related characteristics are the most influencing factors, including the distance from the main attractions and the potential interaction amongst cruise passengers in stop location decisions. Thus, attraction locations and interaction amongst cruise passengers seem to be the main determinants in stop location patterns. Nonetheless, these results could be influenced by the particular segment under study, the cruise passengers, characterised by a limited time for visiting the destination and a tendency towards group behaviour (Kriwoken and Hardy 2018).

From a methodological perspective, this paper contributes to the analysis and synthesis of the GPS tracking data, particularly for the modelling of spatial point processes on a linear network. Moreover, the Gibbs point process approach allows for analysing interactions amongst points to check whether attraction or repulsive relationships exist amongst tourists’ stop location choices. Whilst most of the recent literature on this topic is concerned with nonparametric intensity estimation, both in space and space-time, our approach contributes to the framework of point processes on networks by proposing a parametric model. A drawback of the present study is that it does not account for the temporal component. Indeed, an exciting line of future research is analysing the spatial point pattern of the individual trajectories (Moradi et al. 2018a) marked by time spent in each stop location.

From an applied perspective, improved knowledge of tourists’ spatial behaviour has relevant implications for destination management, given that a better knowledge of the determinants of spatial intensity of visitors’ stop locations in urban contexts may orient destination management policy. Nonetheless, another limitation of our study concerns the analysis of stops made by cruise passengers, which may be influenced by various factors that require further investigation. For instance, the decision to take organised tours or the behaviour of passengers in groups may be influenced by prior information provided before visiting the destination or by the accompanying tour guide. A more detailed investigation of the determinants of stops made by cruise passengers is necessary to gain a better understanding of these factors and their potential impact on our results. Also, the analysis is here focused on a restricted destination area. A wider study area would better account for other potential determinants of the stop activity related to individual trajectories and socio-demographic characteristics.