Modelling cross-dependencies between Spain’s regional tourism markets with an extension of the Gaussian process regression model

This study presents an extension of the Gaussian process regression model for multiple-input multiple-output forecasting. This approach allows modelling the cross-dependencies between a given set of input variables and generating a vectorial prediction. Making use of the existing correlations in international tourism demand to all seventeen regions of Spain, the performance of the proposed model is assessed in a multiple-step-ahead forecasting comparison. The results of the experiment in a multivariate setting show that the Gaussian process regression model significantly improves the forecasting accuracy of a multi-layer perceptron neural network used as a benchmark. The results reveal that incorporating the connections between different markets in the modelling process may prove very useful to refine predictions at a regional level.


Introduction
In recent years there has been a growing interest in machine learning (ML) techniques for economic forecasting (Weron 2014;Gharleghi et al. 2014;Kock and Teräsvirta 2014;Ben Taieb et al. 2012;Crone et al. 2011;Andrawis et al. 2011;Carbonneau et al. 2008).ML is based on the construction of algorithms that learn through experience.The main ML forecasting methods are support vector regression (SVR) and artificial neural network (ANN) models.Plakandaras et al. (2015) propose a hybrid forecasting methodology that combines an ensemble empirical mode decomposition algorithm with a SVR model to forecast the US real house price index.Lin et al. (2012) also combine an algorithm for time series decomposition with a SVR model for foreign exchange rate forecasting.Kao et al. (2013) and Kim (2003) use different SVR models for stock index forecasting.Tay andKao (2001, 2002) apply support vector machines in financial time series forecasting.Stasinakis et al. (2015) use a radial basis function ANN to forecast US unemployment.Feng and Zhang (2014) and Aminian et al. (2006) use ANN models in forecasting of economic growth.Sermpinis et al. (2012) and Lisi and Schiavo (1999) make exchange rates predictions by means of several ANNs.Sarlin and Marghescu (2011) generate visual predictions of currency crisis by means of a self-organizing map ANN model.Adya and Collopy (1998) evaluate the effectiveness of ANN models at forecasting and prediction.A complete summary on the use of ANNs with forecasting purposes can be found in Zhang et al. (1998).
Whilst SVR and ANN models have been widely used in economic modelling and forecasting, other ML techniques such as Gaussian process regression (GPR) have been barely applied for forecasting purposes (Andrawis et al. 2011;Ahmed et al. 2010;Banerjee et al. 2008;Chapados and Bengio 2007;Brahim-Belhouari and Bermak 2004;Girard et al. 2003).GPR was originally devised for interpolation.The works of Smola and Bartlett (2001), MacKay (2003), and Williams and Rasmussen (2006) have been key in the development of GPR models.By expressing the model in a Bayesian framework, the authors extend GPR applications beyond spatial interpolation to regression problems.GPR models are supervised learning methods based on a generalized linear regression that locally estimates forecasts by the combination of values in a kernel (Rasmussen 1996).Thus, GPR models can be regarded as a nonparametric tool for regression in high dimensional spaces.One of the limitations of the current methods for GPR is that the framework is inherently one dimensional, i.e. the framework is designed for multiple inputs and a single output.GPR models present one fundamental advantage over other ML techniques: they provide full probabilistic predictive distributions, including estimations of the uncertainty of the predictions.These features make GPR an ideal tool for forecasting purposes.This paper presents an extension of the GPR model for MIMO forecasting.This approach allows to preserve the stochastic properties of the training series in multiplestep ahead prediction (Ben Taieb et al. 2010).By extending conventional local modelling approaches we are able to model the cross-dependencies between a given set of time series, returning a vectorial forecast.The structure of the proposed model, consists of a batch of univariate forecasting modules based on Gaussian regression, followed by a linear regression that takes into account the cross-influences between the different forecast.
ML methods are particularly suitable to model phenomena that presents nonlinear interactions between the input and the output.The complex nature behind the data generating process of economic variables such as tourism demand, explains the increasing use of ML methods in this area.There is wide evidence in favour of ML methods when compared to time series models for tourism demand forecasting (Akin 2015;Claveria and Torra 2014;Wu et al. 2012, Hong et al. 2011;Chen and Wang 2007;Giordano et al. 2007;Cho 2003;Law 2000 andLaw andAu 1999).Tsaur and Kuo (2011) and Yu and Schwartz (2006) use fuzzy time series models to predict tourism demand.Celotto et al. (2012) and Goh et al. (2008) apply rough sets algorithms.Other authors combine different ML techniques in order to refine forecasts of tourism demand (Cang 2014a;Cang and Yu 2014b;Pai et al. 2014;Shahrabi et al. 2013).Peng et al. (2014) use a meta-analysis to examine the relationships between the accuracy of different forecasting models and the data characteristics in tourism forecasting studies.Athanasopoulos et al. (2011) carry a thorough evaluation of various methods for forecasting tourism data.
In spite of the desirable properties of GPR models, there is only one previous study that uses GPR for tourism demand forecasting (Wu et al. 2012).The authors use a sparse GPR model to predict tourism demand to Hong Kong and find that its forecasting capability outperforms those of the autoregressive moving average (ARMA) and SVR models.We attempt to cover this deficit, and to break new ground by proposing an extension of the GPR model for MIMO modelling, and assessing its forecasting performance.We make use of international tourist arrivals to all seventeen regions of Spain.By incorporating the connections in tourism demand to all regions, we generate forecasts to all markets simultaneously.We finally compare the forecasting performance of the GPR model to that of a multi-layer perceptron (MLP) ANN in a MIMO setting.This strategy is cost-effective in computational terms, and seems particularly indicated for regional forecasting.
Several regional studies have been published in recent years (Lehmann and Wohlrabe 2013), but only a few regarding tourism demand forecasting.Gil-Alana et al. (2008) use different time-series models to models international monthly arrivals in the Canary Islands.Bermúdez et al. (2009) generate prediction intervals for hotel occupancy in three provinces of Spain by means of a multivariate exponential smoothing model.The first attempt to use ML methods for tourism demand forecasting in Spain is that of Palmer et al. (2006), who design a MLP ANN to forecast tourism expenditure in the Balearic Islands.Medeiros et al. (2008) develop an ANN-GARCH model to estimate demand for international tourism also in the Balearic Islands.Claveria et al. (2015) compare the forecasting performance of three ANN architectures to forecast tourist arrivals to Catalonia.SERIEs (2016) 7:341-357 The main aim of this study is to provide researchers with a novel approach for MIMO forecasting, and a method for modelling cross-dependencies.The proposed extension of the GPR model to the MIMO framework allows incorporating the relationships between the different response variables in order to generate a vector of predictions.
The study is organized as follows.The next section presents the proposed extension of the GPR model to the MIMO case.In Sect. 3 we briefly describe the data.Section 4 reports the results of the multiple-step ahead forecasting comparison carried out to test the effectiveness of the model.The last section provides a summary of the theoretical and practical implications, and potential lines for future research.

Gaussian process regression (GPR)
GPR was conceived as a method for multivalued interpolation, and was first developed by Matheron (1973) based on the geostatistic works of Krige (1951).The works of MacKay (2003), Williams and Rasmussen (2006) and Smola and Bartlett (2001) have been crucial in the development of GPR.By expressing the model in a Bayesian framework, different statistical methods can be implemented in GPR models.Therefore GPR applications can be extended beyond spatial interpolation to regression problems, estimating the weights of observed values form temporal lags and spatial points using the known covariance structures.Detailed information about GPR can be found in Williams and Rasmussen (2006).
The GPR model assumes that the inputs x i have a joint multivariate Gaussian distribution characterized by an analytical model of the structure of the covariance matrix (Rasmussen 1996).The key point of the GPR is the possibility of specifying the functional form of the covariance functions, which allows to introduce prior knowledge about the problem into the model.Note that the functional dependency between variables in the covariance function does not need to be a cross product, but can be any function that takes into account the similarity between the input data points and also complies with the properties of a covariance.
An important point in which GPR differs from linear regression, is that the method assumes a probability distribution over the set of functions to be estimated, which allows for determining families of regression functions with specific functional forms.Formally, the training set D = {(x 1 , y 1 ), (x 2 , y 2 ), . . ., (x n , y n )} consists of a set of tuples, and it is assumed to be drawn from the following process: being x i an input vector in an Euclidean space of dimension d, i.e.R d ; and y i the target, which is a scalar output in R 1 .This framework allows to estimate a function from R d → R 1 .For notational convenience, we aggregate the inputs and the outputs into matrix X = [x 1 , x 2 , . . ., x n ] and vector y = [y 1 , y 2 , . . ., y n ] respectively.
The joint distribution of the variables is the conditional Gaussian distribution, which has the following form: where I is the identity matrix, and K (X, X ) the covariance matrix, also referred to as the kernel matrix, with elements K i j (x i , x j ).The kernel function k(x, x ) is a measure of the distance between input vectors.The kernel does not need to be strictly a matrix of cross-products between the input vectors.Kernels may incorporate a distance, or an exponential of a distance.
We try several kernel functions: the rational quadratic covariance function, the gamma exponential covariance function, and a radial basis kernel.We obtain the best performance with a radial basis function, as the other kernels allow for long term interactions between distant points in the input space.See MacKay ( 2003) and Williams and Rasmussen (2006) for a detailed analysis on kernel selection.Kernel functions should reflect the a priori knowledge about the problem at hand.To obtain local forecasts in the space of inputs, we select a covariance matrix that has a component with the shape of a Gaussian so as to model the interactions between nearby points.Specifically, we use an isotropic Gaussian, which is a Gaussian with a covariance proportional to the identity.We additionally introduce a term to account for nonstationarity in the data that corresponds to a dot product between sample points in the covariance matrix.
Therefore, in this study we make use of a radial basis kernel with a linear trend, which assumes a local continuity of the response variable: where υ 2 controls the prior variance, and λ is a parameter that controls the rate of decay of the covariance by determining how far away x i must be from x j for f i to be unrelated to f j .Note that the underlying operation is framed in the field of interpolation.The hyperparameters {υ, λ, γ , κ} are estimated by maximum likelihood in: (4) Given the subscripts of the variables that determine the covariance matrix, f and the predictive outputs f * , by making use of the Bayesian inference, the joint posterior distribution is: where K f, f is the covariance matrix of the training data, K f * , f a matrix that gives the mapping of the kernel on the combinations of test and train inputs, and K f * , f * the kernel matrix of the test inputs.
The output given by the GPR consists of a Gaussian predictive distribution p( f * /y) that is characterized by mean μ and variance .Therefore, the GPR model specification is given by equations: In this research, we use the mean value of the distribution as the predicted value of the GPR.For a given set of inputs X * = x * 1 , x * 2 , . . ., x * n , which optionally could consist of a single observation, we compute the output f * as μ.
In this study we propose an extension of the GRP model for MIMO modelling, basing this extension on an analogy to radial basis functions.In this analogy, each single GPR gives a prediction of the value of each individual predictor, and a multivariate linear regression combines these outputs into a new output vector.That is, we use a set of univariate predictors followed by a matrix product that takes into account the cross-dependencies of the outputs in order to improve the performance of each single GPR.In this case we have a R d → R M mapping, where M is the dimension of the output.This extension is applied by following a two-step training: (i) First, we train and generate supervised forecasts for each time series.That is for each multivariate input, we compute a vector of outputs f * of the trained GPR.(ii) In the second step, we estimate a regularized linear regression (Haykin 2008) from a training set that consists of tuples, D f = {( f 1 , y 1 ), ( f 2 , y 2 ), . . ., ( f n , y n )}.
The coefficients of the matrix corresponding to this regularized linear regression will be denote as W reg .Therefore, the predictions, which we denote as y * , are generated by means of the following expression: This procedure will be referred to as MIMO GPR.

Multi-layer perceptron (MLP) artificial neural network (ANN)
Many different NN models have been developed since the 1980s.The most widely used feed-forward topology in tourism demand forecasting is the MLP network (Liang 2014;Teixeira and Fernandes 2012;Lin et al. 2011;Zhang and Qi 2005).In feedforward networks the information runs only in one direction.MLP networks are supervised neural networks that use a simple perceptron model as a building block.
The topology is based on layers of parallel perceptrons, with a nonlinear function at each perceptron.The specification of a MLP network with an input layer, a hidden layer, and an output layer is defined by: where y t is the output vector of the network at time t; x t−i is the input value at time t − i, where i stands for the number of lags that are used to introduce the context of the actual observation; β j are the weights connecting the output of the neuron j at the hidden layer with the output neuron; w i j stand for the weights of neuron j connecting the input with the hidden layer, and g is the nonlinear function of the neurons in the hidden layer.We denote q as the number of neurons in the hidden layer, which determines the MLP network's capacity to approximate a given function.We use values from 5 to 30 with an increase proportional to the length of the forecasting horizon.
As with the GPR model, we also apply a MIMO approach by estimating a regularized linear regression (Haykin 2008), and generate the vectorial forecasts using the set of regularized coefficients.
The estimation of the parameters is done by cross-validation (Bishop 2006).We divide the database into three sets: training, validation and test.The validation set is used to determine the stopping time for the training and the number of neurons in the hidden layer.The test set is used to estimate the generalization performance of the network, that is the performance on unseen data (Bishop 1995;Ripley 1996).
Once the topology of the model is specified, the estimation of the weights of the networks can be done by means of different algorithms.In this study we use the Levenberg-Marquardt (LM) algorithm.To avoid the possibility that the search for the optimum value of the parameters finishes in a local minimum, we use a multi-starting technique that initializes the NN several times for different initial random values, trains the network and chooses the one with the best result on the validation set.
Based on these considerations, the first 96 monthly observations (from January 1999 to December 2006) are selected as the initial training set, the next sixty (from January 2007 to December 2011) as the validation set, and the last 15 % as the test set.For an iterated multi-step-ahead forecasting comparison the partition between train and test sets is done sequentially: as the prediction advances, past forecasts are successively incorporated to the training database in a recursive way.
3 Dataset Tang and Abosedra (2015), Pérez-Rodríguez et al. (2015) and Chou (2013) have shown the important role of tourism in economic growth.In this study we use data on international tourism demand to all regions of Spain provided by the Spanish Statistical Office (National Statistics Institute-INE-http://www.ine.es).Data include the monthly number of tourist arrivals at a regional level over the time period 1999:01 to 2014:03.Tourism is a key economic sector in Spain.It represents about 11 % of GDP and 12.7 % of total employment (WTTC 2016).Low oil prices and the dollareuro exchange rate have had a positive impact on the inbound tourism.The country registered a record of 64.9 million international tourist arrivals in 2014, which represented a growth of 7.1 % over the previous year.The main source markets are the United Kingdom, France, Germany and Italy.Tourism is highly concentrated in the summer season in most regions.Tourist arrivals are also concentrated in the main regional destinations: Catalonia, the Balearic Islands, the Canary Islands, and Andalusia.
A MIMO approach to regional economic modelling is particularly appropriate when the desired outputs are connected (Claveria et al. 2015).In Fig. 1 we present the frequency distribution of tourist arrivals by region during the sample period.We can see that most tourist arrivals are concentrated in the Mediterranean coast and the islands, being Catalonia, the Balearic Islands and Andalusia the regions that received the higher number of tourist arrivals, which almost amounted to 60 % of total tourist demand.
Table 1 shows a descriptive analysis of the data for the out-of-sample period (2012:01 to 2014:03).The mean of tourist arrivals shows that the main destinations are Catalonia, the Balearic Islands and Andalusia.The Balearic Islands and Catalonia present the highest peaks in demand.Arrivals to the Balearic Islands show the highest dispersion.

Results of the experiment
In a recent and comprehensive comparison on the M3 dataset for the major ML models for time series forecasting, Ahmed et al. (2010) find that MLP ANN and GPR models present the best results.Therefore, to assess the forecasting performance of the proposed extension of the GPR model, we compare it to that of a MLP ANN in a MIMO setting.First, we estimate the models and generate forecasts for different forecast horizons (h = 1, 2, 3 and 6 months).Multiple-step ahead forecasts are generated by means of a rolling scheme.Second, by means of several forecast accuracy measures, we summarize the results for the out-of-sample period.First, we compute the relative mean absolute percentage error (rMAPE) statistic (Table 2), that ponders the MAPE of the model under evaluation against the MAPE of the benchmark model.Next, we run the Diebold and Mariano (DM) test (Diebold and Mariano 1995) using a Newey-West type estimator, and a modified DM (M-DM) test (Harvey et al. 1997) to analyse whether the reductions in MAPE are statistically significant (Table 2).The null hypothesis of the test is that the difference between the two competing series is non-significant.A negative sign of the statistic implies that the MLP ANN model has bigger forecasting errors.
Table 2 shows the overall performance of the compared forecasting models on all regions.There are only two regions in which the ANN model presents lower MAPE SERIEs (2016) 7:341-357 values: the Canary Islands and Madrid, but the differences between both methods are not significant in three out of the four forecasting horizons.
In most regions, the best forecasting performance is obtained with the MIMO GPR model.Nevertheless, we find differences across regions.While in the Balearic Islands, Castilla-Leon, Catalonia, Extremadura, the Basque Country and La Rioja, GPRs outperform the ANNs, this improvement is not significant for all forecast horizons, especially for h = 1 and h = 3.As a result, we find that the improvement in forecast accuracy of the MIMO GPR with respect to the MIMO ANN forecasts becomes more prominent for intermediate forecasting horizons (2 and 3 months).
In general, we obtain the most accurate predictions for longer forecast horizons (h = 6).This can be attributed to the fact that ML methods use nonlinear functions that can account for saturation effects.The structure of these models can be empirically estimated, so the interactions between input variables can be learned from the data.
In Fig. 2 we compare the rMAPE results for one-and three-month ahead forecasts (h = 1 and h = 3).The graph indicates that there are only four regions in which the rMAPE is higher than one for h = 1, that is the ANN outperforms the GPR model for one-month ahead forecasts: the Balearic and the Canary Islands, Madrid and Murcia.Of these four regions, just two (the Canary Islands and Madrid) still obtain a rMAPE higher than one for h = 3.The fact that these two regions do not present seasonal patterns, suggests that GPR are more suitable for seasonal forecasting than ANN models.
Finally, in Table 3 we present the results of the percentage of periods with lower absolute error (PLAE) statistic proposed by Claveria et al. (2015).The PLAE can be regarded as a variation of the 'percent better' measure used to compare the forecast   (Cowles and Jones 1937).In this study we use the MLP ANN model as a benchmark.
The PLAE allows us to compare the performance between two competing models.This accuracy measure consists of a ratio that gives the proportion of periods in which the model under evaluation obtains a lower absolute forecasting error than the benchmark model.Let us denote y t as actual value and ŷt as forecast at period t = 1, . . ., n. Forecast errors can then be defined as e t = y t − ŷt .Given two competing models A and B, where A refers to the forecasting model under evaluation and B stands for benchmark model, we can then obtain the proposed statistic as follows: , where λ t = 1 if e t,A < e t,B , 0 otherwise.
Table 3 shows that the MIMO GPR outperforms the MIMO MLP ANN model in most cases, especially for 2-and 3-month-ahead forecasts.Special mention should be made of the Canary Islands and the Community of Madrid, where the ANN forecasts provide lower PLAE values.This result can be explained by the fact that they are the only regions that do not show seasonal patterns.
To summarize, we find that the overall forecasting performance improves for longer forecast horizons.This evidence confirms previous research by Teräsvirta et al. (2005), who obtain more accurate forecasts with ANN models at long forecast horizons.This result is indicative that ML techniques are particularly suitable for medium and long term forecasting.
Regarding the different methods, we obtain better predictions with the GPR model than with the ANN.This improvement is more generalized for intermediate forecast horizons.Despite being the first study to apply a MIMO approach for GPR forecasting, our results are in line with those obtained by Wu et al. (2012), who find evidence that a sparse GPR model yields better forecasting results than ARMA and SVR models.

Concluding remarks and future work
As more accurate predictions become crucial for effective management and policy planning, new forecasting methods provide room for improvement.Machine learning techniques are playing a pivotal role in the refinement of economic predictions.With this objective, we propose an extension of the Gaussian process regression model for multiple-input multiple-output forecasting.This approach allows modelling the cross-dependencies between a given set of input variables and generating a vector prediction.
The main theoretical contribution of this study to the economic literature is the development of a new approach to improve the forecasting accuracy of computational intelligence techniques based on machine learning.The increasing economic importance of the tourism industry worldwide has led to a growing interest in new approaches to tourism modelling and forecasting.Making use of the interdependencies in international tourism demand to all seventeen regions of Spain, we design a multiple-input multiple-output framework that incorporates the existing cross-correlations in tourist arrivals to all markets, and allows to estimate tourism demand to all destinations simultaneously.
We evaluate the performance of new method by comparing to a standard neural network in an iterative multiple-step-ahead forecasting comparison.We find that the proposed extension of the Gaussian process regression outperforms the benchmark model in most regions, especially for intermediate forecast horizons.
We obtain the best forecasting results for the longest forecast horizons, suggesting the suitability of machine learning techniques for mid and long term forecasting.As a result, our research reveals the suitability of a multiple-output Gaussian process regression model for regional economic forecasting, and highlights the importance of taking into account the connections between different markets when modelling regional variables with machine learning techniques.The assessment of alternative kernel functions on the forecasting accuracy is a question to be addressed in further research.

Table 2
Forecast accuracy: MIMO GPR vs. MIMO ANN modelThe rMAPE ponders the MAPE of the model under evaluation against the MAPE of the benchmark model.We use a MIMO MLP ANN model as a benchmark.The 5 % level critical value for the Diebold-Mariano (DM) loss-differential test statistic for predictive accuracy is 2.028.M-DM refers to the modified DM test statistic

Table 3
dakis et al. (1998)andWitt and Witt (1992)for an appraisal of different forecasting accuracy measures.The PLAE is a dimensionless measure based on the CJ statistic for testing market efficiency