1 Introduction

The tourism sector faces a variety of issues and requires reliable information to support decision making. One established tool for this task is agent-based modeling (ABM) [1], a computational approach for modeling complex systems that consist of interacting autonomous agents and for simulating their activities [2]. In contrast to top-down approaches, which estimate the total number of tourists and provide a high-level overview [3], the bottom-up approach of ABM defines and tracks the behavior of each individual tourist and can answer “what if” questions. Simulations generated by ABM can therefore provide valuable insights to decision makers dealing with current and future situations (e.g., what happens if some points of interest (POIs) are closed or the number of tourists doubles?).

For ABM to produce realistic simulations, it first needs to be initialized with suitable parameters for the agents’ behavior. These parameters can be obtained by analyzing historical data, in this study POI check-in data, for tourist flow patterns and behavior. Here we propose an approach for building a spatio-temporal ABM that simulates the behavior of tourists in the city of Salzburg based on historical POI check-ins made with a tourist card. The main research question of this paper is: can ABM adequately simulate tourists’ visiting patterns in terms of visits to POIs in one day? The practical contribution of this study is the extraction of ABM simulation parameters from check-in data in a format that can be reproduced for other similar input datasets. The theoretical novelty, which we have not seen elsewhere, lies in the use of frequent itemset mining to define agents’ decision making for the next destination.

2 Related Work

Baktash et al. [1] recently reviewed the existing literature on ABM in tourism. According to their classification, our study falls between tourist flow management and tourist decision making. Recent studies in these fields have looked at ABM for tourist flow management across 41 attractions in Sichuan [4], spatial spillover effects across 314 Chinese cities [5], and the analysis of user-generated content to deduce desired destinations [6]. What stands out about [1] is that it neither mentions nor discusses the role of machine learning (ML) and artificial intelligence (AI) for ABM in tourism. This suggests that, even if such a trend exists in other fields, there is a gap in tourism ABM, with potential to improve tourist behavior modeling by applying ML and AI to historical data. Additionally, the tourist check-in data used here is not commonly found in the literature and presents a new challenge.

3 Methodology

The proposed approach starts by analyzing the proprietary POI check-in data obtained from Salzburg Card (footnote 1) users (Fig. 1), who are usually short-term visitors to Salzburg. The data hold anonymous check-ins at 29 different POIs from 2017 to 2019, where each row contains a unique user identifier, the name of the POI, and the date of the check-in.

Fig. 1. The workflow of the proposed method. POI check-in data are analyzed up until the day of the simulation to extract ABM parameters, and the results of the 1-day simulation are compared against the ground truth data.

Three parameters of the ABM environment need to be determined from the input data via statistical analysis and data mining: the number of tourists, the number of POIs visited per day, and the tourists’ POI selection preference. The number of tourists for the day of the simulation is set equal to the number of active tourists on the previous day in the data. For the number of POIs that tourists visit per day, we consider the distribution of these values over all days in the data prior to the simulation day. We then use the mean and standard deviation of this distribution to randomly select the number of POIs that each agent in the simulation has to visit, using the GAMA function gauss(mean, standard_deviation). The POI selection preference is defined by mining frequent itemsets of length 1, i.e., individual POIs. The support of each POI is calculated from the check-ins up until the day of the simulation and expresses the ratio of tourists who have visited the POI. The supports are then used as weights in the weighted random choice function with which tourists in the simulation select their next destination.
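The parameter extraction described above can be illustrated with a minimal Python sketch. It is not the GAMA implementation used in the study: the check-in rows, the POI names, and the helper names `extract_parameters` and `plan_day` are hypothetical, and two details are assumptions made only for this illustration (the drawn number of visits is clamped to between 1 and the number of POIs, and an agent does not revisit a POI on the same day).

```python
import random
import statistics
from collections import defaultdict

# Hypothetical check-in rows: (user_id, poi_name, date). Values are illustrative only.
checkins = [
    ("u1", "POI_A", "2019-02-27"),
    ("u1", "POI_B", "2019-02-27"),
    ("u2", "POI_A", "2019-02-27"),
    ("u2", "POI_C", "2019-02-28"),
    ("u3", "POI_A", "2019-02-28"),
    ("u3", "POI_B", "2019-02-28"),
    ("u3", "POI_C", "2019-02-28"),
]
simulation_day = "2019-03-01"


def extract_parameters(rows, sim_day):
    """Return (n_tourists, mean, std, supports) from rows dated before sim_day."""
    history = [r for r in rows if r[2] < sim_day]  # ISO dates compare correctly as strings
    # Parameter 1: tourists active on the most recent day before the simulation day
    # (assumed here to stand in for "the previous day in the data").
    previous_day = max(d for _, _, d in history)
    n_tourists = len({u for u, _, d in history if d == previous_day})
    # Parameter 2: distribution of the number of POIs visited per tourist per day.
    per_user_day = defaultdict(set)
    for u, poi, d in history:
        per_user_day[(u, d)].add(poi)
    counts = [len(pois) for pois in per_user_day.values()]
    mean, std = statistics.mean(counts), statistics.pstdev(counts)
    # Parameter 3: support of each length-1 itemset, i.e. the share of tourists
    # who visited the POI at least once.
    users = {u for u, _, _ in history}
    visitors = defaultdict(set)
    for u, poi, _ in history:
        visitors[poi].add(u)
    supports = {poi: len(v) / len(users) for poi, v in visitors.items()}
    return n_tourists, mean, std, supports


def plan_day(mean, std, supports, rng):
    """Draw how many POIs an agent visits, then pick them by support-weighted choice."""
    # Assumption: clamp the Gaussian draw to a valid number of visits.
    n_visits = max(1, min(len(supports), round(rng.gauss(mean, std))))
    remaining = dict(supports)
    itinerary = []
    for _ in range(n_visits):
        pois, weights = zip(*remaining.items())
        choice = rng.choices(pois, weights=weights, k=1)[0]
        itinerary.append(choice)
        del remaining[choice]  # assumption: no repeat visits on the same day
    return itinerary


n_tourists, mean, std, supports = extract_parameters(checkins, simulation_day)
itinerary = plan_day(mean, std, supports, random.Random(0))
```

With the toy rows above, POI_A has support 1.0 because all three users visited it, so it is the most likely next destination whenever it is still in the agent's remaining set.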

The next step is to run multiple simulations of tourist flow in the city of Salzburg for the specified simulation day using the GAMA ABM platform (footnote 2), which supports the use of spatial datasets for defining the model environment [7]. We use OpenStreetMap (OSM), the most prominent Volunteered Geographic Information (VGI) dataset [8], as a source of 2D vector information about the road network on which agents can move and the building footprints that serve as their origins and destinations.

The results of the simulation runs can be aggregated and compared to the ground truth POI check-in data for evaluation.

4 Experiment and Results

The experiment was performed on a standard PC with an i7 processor and 16 GB RAM running a Windows operating system. POIs were represented as point geometries sourced directly from Salzburg Card. The agents were initialized as sleeping at their accommodations, started visiting POIs at around 8 AM, and finished once they had visited the required number of POIs for the day. We ran 50 simulations for the simulation day, March 1, 2019.

Figure 2 shows histograms of the number of POIs visited by a single user per day. The leftmost histogram shows the data from January 1, 2017 until March 1, 2019, which were used to train the tourist agents in the ABM simulation. The middle and right histograms show the simulation result for March 1, 2019 and the respective ground truth data. The counts of simulated POI visits per agent (middle) are distributed similarly to the actual POI visits per tourist on the simulation day (right), which is also supported by the chi-squared test value of \(\chi^2 = 0.1455\) between these two histograms.
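The text does not state which chi-squared formulation was used to compare the two histograms. As an illustration only, the sketch below computes one common variant, the symmetric chi-squared distance between normalized histograms, where 0 means identical distributions and 1 means no overlap. The function name and the example counts are hypothetical and are not taken from the study's data.

```python
def chi_squared_distance(hist_a, hist_b):
    """Symmetric chi-squared distance between two histograms over the same bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        p, q = a / total_a, b / total_b  # normalize counts to relative frequencies
        if p + q > 0:  # skip bins that are empty in both histograms
            distance += (p - q) ** 2 / (p + q)
    return 0.5 * distance


# Hypothetical counts of agents/tourists who visited 1, 2, 3, or 4 POIs in a day.
simulated_hist = [30, 45, 20, 5]
observed_hist = [28, 50, 17, 5]
d = chi_squared_distance(simulated_hist, observed_hist)
```

Because both histograms are normalized first, the distance compares the shapes of the distributions and is unaffected by a difference in the total number of agents and tourists.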

Fig. 2. Histograms showing the number of POIs visited by a user per day for the data until the simulation day (left), simulation results (middle), and the real data on the simulation day (right).

Fig. 3. Mean visitor check-in numbers per POI across all 50 simulations (red line) with the 95% confidence interval (pink area) shown against the real POI visitor numbers for the simulation day (blue bars). The dashed line shows supports of POIs in the period until the simulation day that were used to parametrize agents’ POI selection. (Color figure online)

In Fig. 3 we list the 29 POIs within the city of Salzburg and their average visiting probability per tourist (dashed line). We performed 50 simulation runs and analyzed the visitor numbers for each POI. The blue bars represent the true visitor numbers for the simulation day (March 1, 2019), and the red line depicts the mean visitor numbers across all 50 simulations, with the 95% confidence interval indicated by the pink area.
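The aggregation across runs can be sketched as follows. The text does not specify how the 95% confidence interval was computed; this sketch assumes a normal approximation around the mean (mean ± z · s/√n), and the function name, POI names, and visitor counts are hypothetical.

```python
import math
import statistics


def mean_with_ci(run_counts, confidence=0.95):
    """Return (mean, lower, upper) for one POI's visitor counts across simulation runs."""
    n = len(run_counts)
    mean = statistics.mean(run_counts)
    sem = statistics.stdev(run_counts) / math.sqrt(n)  # standard error of the mean
    # Assumption: normal approximation; z is about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical visitor counts per POI, one value per simulation run.
simulated = {
    "POI_A": [118, 124, 121, 130, 117],
    "POI_B": [40, 44, 39, 47, 45],
}
summary = {poi: mean_with_ci(counts) for poi, counts in simulated.items()}
```

A POI whose observed visitor number falls far outside its interval in `summary` would correspond to the kind of discrepancy visible between the red line and the blue bars.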

5 Conclusion and Future Work

We proposed a tourist flow simulation approach based on ABM that is parameterized by historical POI check-in data. The three parameters of the ABM simulation, namely the number of tourist agents, the number of POIs each agent visits per day, and the selection of POIs that agents visit, were determined from an analysis of the historical data. The simulation was performed for a single day on the model of the city of Salzburg and repeated 50 times. The simulation results were then compared to the ground truth POI check-in data for the same day.

The results show that the simulations are able to create overall realistic patterns of POI check-ins. However, Fig. 3 shows discrepancies for some POIs where the simulated numbers are much larger than the true values. The simulated values for these POIs are similar to the long-term trend observed over the training period, shown with the dashed line (e.g., POI Entritt Schloss & Wasserspeile Hellbrunn). Our method is limited in reflecting seasonal or daily changes, such as a POI being closed on the simulation day. Thus, we need to develop a more sophisticated POI selection approach for the simulation that is based on a more detailed trend analysis. We should also increase the behavioral complexity of the agents, relying on theories from the social sciences.