1 Introduction

Tourism is playing an increasingly important role at many levels, and the sector is evolving extraordinarily fast. Thus, the study of tourism, crucial for numerous disciplines, needs to be updated quickly. In recent years, tourism research has begun to renew itself to keep pace with the ongoing transformations. Nowadays, new data sources and innovative quantitative and qualitative methods offer new possibilities for better analysing and planning tourism (Xu et al., 2020), overcoming many limitations of more conventional approaches.

Although recent tourism research is exploring and taking advantage of new data sources and methods, there is still a long way to go in terms of innovation. This chapter aims to provide a general review and some guidelines on the potential use of new data and computational methods to enhance tourism’s knowledge base and promote their institutional adoption and, ultimately, more sustainable tourism.

The chapter is structured around three topics proposed by Barranco et al. and included in a publication of the Joint Research Centre that aimed to collect upcoming research needs in terms of policy questions (Bertoni et al., 2022). The first one is measuring the environmental impacts of tourism. Tourism is, directly and indirectly, consuming an increasing amount of global resources, including fossil fuels (with the associated CO2 emissions), freshwater, land and food (Gössling & Peeters, 2015). Therefore, assessing the impact of global tourism activity is one of the most relevant potential applications of new data sources and computational methods.

The second topic is assessing the socio-economic resilience of the tourism sector. Tourism’s economic weight and social impact have become even more evident in the context of the COVID-19 pandemic: the crisis has put between 100 and 120 million direct tourism jobs at risk, many of them in small- and medium-sized enterprises, according to the UNWTO (2021). Hence, it is relevant and urgent to explore how new data sources, analyses and models can contribute to planning a more resilient and balanced tourism sector in socio-economic terms.

Finally, the third topic is uncovering new tourists’ preferences, facilitating the digital transition and fostering innovation in the tourism sector. How can we better analyse new tourist patterns? COVID-19 may have accelerated some existing changes in tourism trends, so there is an urgent need for quick analyses and predictions for the very near future, as the emergence of nowcasting techniques shows.

2 Existing Literature

Over the past years, new data sources and innovative computational methods have emerged that significantly improve our understanding of tourism. A summary is provided next.

2.1 New Data Sources of Potential Interest for Tourism

Tourism is being transformed at an accelerated pace, and conventional data sources often do not reflect ongoing changes with enough velocity or spatiotemporal resolution to support the urgent studies to be carried out. In this scenario, new data sources emerge as the raw material to open further explorations in tourism.

New datasets can be grouped into different categories according to data sources. Next, a listing of relevant new datasets is provided, classified according to the nature of the data source and its potential interest for tourism studies, offering some specific examples.

First, we must point out big data from specific sources of the tourism sector, such as smart tourism cards or information systems in destinations. These sources provide data directly recorded in tourism points of interest, useful for monitoring existing activity and analysing current or past trends. In this category, we can also include other data sources such as booking data from transportation companies (especially flight booking data from airline operators), which can help predict tourism activity quickly, feeding nowcasting models. Additionally, we can highlight online accommodation companies and apps, such as Tripadvisor or Booking, or new peer-to-peer accommodation online services such as Airbnb (Calle Lamelas, 2017). These sources can be helpful not only for anticipating tourism demand but also because of the additional information collected from users, such as opinions, ratings, comments, etc. In addition, data with high spatiotemporal resolution allows us to analyse emergent spatial patterns, for instance, in the location of Airbnb accommodation in heritage cities (Gutiérrez et al., 2017).

Second, the potential use of GPS datasets is remarkable. According to the classification provided by Li et al. (2018), GPS data ranked first among big data types in tourism research (accounting for 21%) and first among device data (58%). GPS tracks are essential for studying tourists’ routes with an unprecedented level of detail, thanks to the high spatiotemporal resolution of GPS records. In this group, we include the GPS routes recorded by vehicle navigation apps, such as TomTom, Waze or Google Maps, and by tracking apps such as Wikiloc or Strava, which are very useful when analysing tourism in natural areas, for instance (Barros et al., 2019), as well as GPS data collected through the emerging tourist mapping apps (Brilhante et al., 2013; Gupta & Dogra, 2017).

Third, user-generated content (UGC) is also of outstanding interest, especially datasets obtained from social networks such as Twitter; photo-sharing social networks such as Instagram, Flickr or Panoramio; or apps focused on the location of points of interest, such as Foursquare. UGC allows us to explore different tourism dynamics. Semantic analysis of online textual data, such as tweets or travel blog content, can uncover tourism preferences and trends (Ramanathan & Meyyappan, 2019). Spatial or temporal analyses can also be carried out because most users share data through mobile apps that register GPS coordinates. For instance, Flickr data can be the basis for different temporal analyses, such as estimating tourism demand over a day according to time slots or measuring tourism seasonality in national parks (Barros et al., 2019). Twitter and Foursquare data can also support spatial analyses, such as the identification of multifunctional or specialised tourist spaces in cities (Salas-Olmedo et al., 2018) (Fig. 19.1).

Fig. 19.1 Location of hotel and Airbnb offers (a) and density of photographs taken by tourists and residents (b) in Barcelona. Source: Gutiérrez et al. (2017)

Fourth, search engine data, such as Google Trends records, constitute a valuable data source. Considering that search engines are a leading tool in planning vacations (Dergiades et al., 2018), these datasets provide advance information on tourists’ interests and plans and can feed models oriented to forecasting tourist arrivals (Havranek & Zeynalov, 2021).

Fifth, we must highlight the interest of datasets obtained from diverse information and communication technologies/devices. The rapid development of the Internet of Things (IoT) provides an increasing amount of Bluetooth data, RFID data and Wi-Fi data (Shoval & Ahas, 2016), which can be helpful to measure, for instance, tourist presence and consumer behaviour over time. Also, in this group, we must emphasise mobile phone data due to its potential use at different scales and for various purposes. The COVID-19 pandemic has accelerated the adoption of mobile phone data to monitor changes in tourism or general mobility trends with a high level of spatiotemporal resolution (Romanillos et al., 2021). This analysis may be extended beyond national borders. Nowadays, roaming services have become crucial for tourists, and roaming data allows us to track tourists globally. Lastly, credit card datasets should also be included here, given their potential for tourist consumption and behaviour analyses.

Finally, more conventional data sources can also provide “new” datasets and opportunities, due to improvements in the quality of data or in the way data is shared, in real time, through mobile apps and online services. This is the case, for example, of meteorological data. Given that weather is an essential factor in tourism demand, incorporating meteorological variables in tourism forecasting models can increase the predictability of tourist arrivals (Álvarez-Díaz & Rosselló-Nadal, 2010).

2.2 New Computational Methods with Application to Tourism Studies

In recent years, increased computational capacity, part of the big data revolution, has allowed for faster and cheaper analysis of massive databases by using new analytic tools, such as artificial intelligence (AI) or machine learning (ML). Nowadays, tourism analysts can also access an enormous collection of methods for their studies (some are easy to interpret; others behave like “black boxes”). This section gives a brief and non-exhaustive list of computational methods used in tourism studies, with applications and examples.

Unsupervised techniques can identify groups and relationships by analysing the explanatory variables themselves: no previously known responses exist. Outcomes must be validated (are they logical?), tagged and hypothesised. In tourism, clustering techniques have been used for detecting the spatial patterns of new tourist accommodation (Carpio-Pinedo & Gutiérrez, 2020) or exploring topics in online tourists’ reviews (Guo et al., 2017), factor analysis for uncovering latent motivational and satisfaction variables in tourists (Kau & Lim, 2005) and association rule mining/learning for discovering the most frequent and strongest sets of visited places with Bluetooth data (Versichele et al., 2014).
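As a minimal sketch of the clustering idea described above, the following example groups accommodation locations with a plain k-means procedure. The coordinates, the number of clusters and the function name are hypothetical, chosen only for illustration; the studies cited above may rely on other clustering algorithms and on real datasets.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical listing coordinates (x, y in km on a local grid).
listings = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
centroids, labels = kmeans(listings, k=2)
print(labels)
```

The labels carry no meaning by themselves: as noted above, the analyst still has to validate and tag the resulting groups (for instance, "historic centre" versus "beach front").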

Supervised techniques provide models to explain/predict responses. They need complete observations: explained (response) and explanatory variables. Outcomes must be compared to observed datasets. Some models investigate causalities and hypothetical “what-if” scenarios (the key results are the model’s parameters): linear regressions for inferring the causes of tourism industry employment and retention (Chen et al., 2021) or structural equation models (SEM) for modelling the quality of life on a tourist island (Ridderstaat et al., 2016). Other models, especially AI-based techniques, anticipate responses or classify observations (the key results are the responses): autoregressive moving average models with exogenous variables (ARMAX) for forecasting weekly hotel occupancy with online search engine queries and weather data (Pan & Yang, 2017) or artificial neural networks (ANN) for predicting tourist expenditures (Palmer et al., 2006).
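The two uses of a supervised model mentioned above can be illustrated with a simple linear regression fitted by ordinary least squares. This is only a sketch: the search index and occupancy figures are invented, and a real study would use more observations, more explanatory variables and proper validation against observed data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly data: search-query index and hotel occupancy (%).
search_index = [10, 20, 30, 40, 50]
occupancy = [35, 45, 55, 65, 75]

slope, intercept = fit_line(search_index, occupancy)

# Explanatory use: the parameter itself is the result of interest
# (occupancy points gained per additional search-index point).
print("slope:", slope, "intercept:", intercept)

# Predictive use: the response for a new observation is the result of interest.
predicted = intercept + slope * 60
print("predicted occupancy for index 60:", predicted)
```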

Some datasets need to be processed before applying the above methods, especially when reusing datasets from other studies or online sources. Observations must often be regrouped into another spatial or temporal unit. While aggregating is a straightforward procedure, disaggregating data requires other techniques; see, for example, the estimation of visitor data from the regional to the municipal level (Batista e Silva et al., 2018).
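The asymmetry between aggregation and disaggregation can be sketched as follows. Aggregation is a plain sum, whereas disaggregation needs an auxiliary variable to decide how a total is shared out. Here a regional total is split in proportion to a hypothetical weight (bed places per municipality); this proportional rule is an assumption made for illustration and is not the method of the study cited above.

```python
def aggregate(values):
    """Aggregate municipal values into a regional total."""
    return sum(values.values())


def disaggregate(regional_total, weights):
    """Split a regional total across units in proportion to an auxiliary weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return {
        name: regional_total * weight / total_weight
        for name, weight in weights.items()
    }


# Hypothetical data: nights spent in a region and bed places per municipality.
regional_nights = 10_000
bed_places = {"A": 500, "B": 300, "C": 200}

estimates = disaggregate(regional_nights, bed_places)
print(estimates)

# Sanity check: re-aggregating the estimates recovers the regional total.
print(aggregate(estimates))
```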

Finally, data and model outcomes need to be presented in a way that stands out to the target audience. They can be shown using innovative designs (word clouds, cartograms, etc.), as in the United Nations World Tourism Organization (UNWTO) tourism data dashboard (UNWTO, n.d.). Some of them should also be shared on digital social networks or reused in other analysis processes.

3 Guidelines

This section proposes some guidelines and potential applications of the described new data sources and computational methods to the three main topics mentioned in the introduction.

3.1 Assessing the Environmental Impacts of Tourism

To facilitate the green transition in the tourism sector, we need a concrete EU roadmap with a solid framework and measurable objectives. Working with key performance indicators (KPIs) can help guide and commit the tourism industry and destinations. This section aims to propose a set of KPIs related to central topics regarding the environmental impact of tourism, focusing on new data sources and computational methods.

The first topic concerns tourism mobility. Sustainable tourism should be linked to a concept of sustainable mobility, so we propose a set of KPIs that can reveal to what extent we are advancing in the transition to a more sustainable model (Table 19.1).

Table 19.1 KPIs for tourism mobility

The second topic is tourism land consumption. As a consequence of the growth of tourism activity, land in tourist destinations is progressively occupied and degraded. Essential variables in this degradation process are land occupation, land fragmentation and changes in land-use patterns. We propose a set of KPIs that can improve the monitoring of these variables, with the help of new data sources and methods (Table 19.2).

Table 19.2 KPIs for tourism land consumption

Finally, the third topic is tourism resources consumption and management. The increasing number of tourists leads to dramatic growth in the consumption of local resources, often leading to unsustainable scenarios. Next, a set of KPIs is proposed to help evaluate tourism resources consumption with the support of new data sources and methods (Table 19.3).

Table 19.3 KPIs for tourism resources consumption and management

3.2 Socio-Economic Resilience in the Tourism Sector

Tourism is an important sector in the EU economy. EU tourists spent about $400 billion on trips across Europe before COVID-19 (Eurostat, 2021b). In 2016, tourism accounted for 10% of the EU’s GDP and employed 10% of workers in 3.2 million tourism-related enterprises (Eurostat, 2018). However, compared to other sectors, the tourism sector has high levels of temporary contracts and low retention rates (25%), together with high shares of women (~60%), younger workers aged 15–24 (~20%), lower-educated workers (~20%) and foreign workers (~1/6).

The following KPIs can help key stakeholders assess their tourist offers and benchmark them against competitors. These indicators could identify socio-economic relationships, vulnerabilities and weaknesses, undeveloped attractions and upcoming opportunities to make the sector more resilient. The KPIs’ spatiotemporal dimensions are essential, especially for regions characterised by seasonality. These KPIs should be calculated for several periods, for the whole tourist population in a location (descriptive) or for the whole/specific tourist population in competing locations (comparison).

This first group of KPIs captures the socio-economic impacts of tourism in a region, so that they can be compared with those of other industries and competitors (Table 19.4). Some of these KPIs measure tourism impacts directly, but others estimate effects through related activities.

Table 19.4 KPIs for socio-economic impact of tourism in a region

The second set of KPIs concerns assessing the diversity of tourism models in order to detect excessive dependence on a few attractions and tourist profiles, as well as their seasonality (Table 19.5). Less diverse territories might be very vulnerable to changes in tourism demand, highly unexpected events or adverse weather, among other cases.

Table 19.5 KPIs for assessing tourism diversity
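As an illustration of how dependence on a few attractions could be quantified, the sketch below computes two common concentration measures from visitor counts per attraction: the Herfindahl-Hirschman index and a normalised Shannon entropy (evenness). Both the indices and the visitor counts are assumptions made for this example and are not necessarily among the KPIs listed in Table 19.5.

```python
import math


def shares(counts):
    """Convert raw counts into shares that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def herfindahl(counts):
    """Herfindahl-Hirschman index: 1/n when evenly spread, 1 when fully concentrated."""
    return sum(s ** 2 for s in shares(counts))


def shannon_evenness(counts):
    """Shannon entropy divided by its maximum: 1 = evenly spread, 0 = one category."""
    if len(counts) < 2:
        return 0.0
    entropy = -sum(s * math.log(s) for s in shares(counts) if s > 0)
    return entropy / math.log(len(counts))


# Hypothetical annual visitors per attraction in two destinations.
balanced = [100, 100, 100, 100]
dependent = [970, 10, 10, 10]

for name, counts in (("balanced", balanced), ("dependent", dependent)):
    print(name, round(herfindahl(counts), 4), round(shannon_evenness(counts), 4))
```

The same functions can be applied to monthly arrivals instead of attractions, in which case a high concentration would point to strong dependence on a few months of the year.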

3.3 Uncovering New Tourists’ Preferences, Digital Transition and Innovation in the Tourism Sector

New information technologies have revolutionised the tourism sector too. This section introduces how new technologies can be used to detect tourists’ preferences and better manage touristic businesses and locations.

3.4 Analysis of Preference Changes in the Tourism Sector

Businesses may use tourist demand data (accommodation booking, car renting) and users’ responses (comments or reviews on products or services) to understand the needs of (new) customers, to develop and/or update their products and services and to improve their customer care. While the former may reveal tourist preferences based on their choices, the latter may also highlight stated preferences that remain unmet. Analyses of preference changes need benchmarking approaches; competitors’ performance provides insight into the strengths and weaknesses of the study location/business. How, then, can new data and methods aid in the detection of preferences and their changes? Some guidelines are provided next.

3.4.1 Searching for Holidays and Activities

Many trips or tourist activities begin with an online search. Potential tourists use either general online search engines or specific tourism planning services. Consequently, data on preferences may be extracted from autocompletion features, which suggest currently trending search queries, or from services such as Google Trends, which also show variations over time. These tools can restrict queries to specific countries, which helps segment tourist preferences by origin at the holiday-planning stage. Search query data has been used in many academic studies; Dinis et al. (2019) gathered and summarised some of them into the following topic categories: forecasting, nowcasting, identifying interests and preferences, understanding relationships with official data and others.

3.4.2 Text Is a Mine

People use words to communicate, and they can publicly share their opinions, recommendations, suggestions and complaints about tourist attractions on interactive platforms. An analyst can use text mining techniques, such as natural language processing (NLP), to extract the meaning of messages (including emojis) and undertake sentiment analyses (converting text into Likert scale values). However, this data may contain brief messages, with abbreviations, because of character restrictions. These must be expanded into full statements. Also, fake/compulsive users should be dropped to avoid biases. Finally, text mining techniques have difficulties detecting ironic tones.
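A very simple version of the pipeline described above (expanding abbreviations, scoring sentiment and mapping it to a Likert value) can be sketched with a word lexicon. The abbreviation table, the lexicon and the reviews are all hypothetical; production systems typically rely on trained NLP models, and, as noted above, a lexicon approach like this one cannot detect irony.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"gr8": "great", "thx": "thanks", "rly": "really"}
LEXICON = {
    "great": 1, "lovely": 1, "clean": 1, "friendly": 1,
    "dirty": -1, "rude": -1, "noisy": -1, "terrible": -1,
}


def tokenize(text):
    """Lower-case the text, split it into words and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def likert_score(text):
    """Map the average word polarity (-1..1) to a 1-5 Likert value; 3 is neutral."""
    hits = [LEXICON[token] for token in tokenize(text) if token in LEXICON]
    if not hits:
        return 3  # no opinion words found
    polarity = sum(hits) / len(hits)
    return round(3 + 2 * polarity)


reviews = [
    "gr8 hotel, friendly staff",
    "Dirty room and rude staff",
    "The hotel is near the station",
]
scores = [likert_score(review) for review in reviews]
print(scores)
```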

3.4.3 What a Beautiful Picture!

Some tourists also upload their pictures and videos on digital social networks. Unlike texts, images need to be described before automating processes to extract comprehensible data. Simple methods can summarise colours in pictures (they can explain weather conditions or infer day periods). More advanced ones, available in cloud computing services, can also identify locations, buildings and objects. Thus, pictures transformed into texts and previously mentioned text mining techniques can help determine preferences. In addition, images can include description text and comments that can be used to uncover revealed preferences. Finally, images’ metadata include when and, sometimes, where they were taken. This data can be used for determining spatial preferences of what to take a picture of and from where (viewpoints).

3.4.4 Life Is Change

Tourists’ preferences can evolve for many reasons (getting older, having children, new job positions or contextual reasons, among others). To detect these changes, previous preferences are needed as a baseline against which new ones can be compared to identify significant changes. The above-mentioned methods can process data continuously, providing further insights or updating ongoing datasets.
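Comparing a baseline with a new period can be sketched as follows: preferences are expressed as shares of bookings per category, and categories whose share moved by more than a chosen threshold are flagged. The categories, counts and the 5-point threshold are hypothetical, and the threshold is a simple rule of thumb rather than a statistical significance test.

```python
def to_shares(counts):
    """Convert booking counts per category into shares that sum to 1."""
    total = sum(counts.values())
    return {category: value / total for category, value in counts.items()}


def preference_shift(before, after, threshold=0.05):
    """Return the total variation distance and the categories that moved most."""
    share_before, share_after = to_shares(before), to_shares(after)
    categories = sorted(set(share_before) | set(share_after))
    deltas = {
        c: share_after.get(c, 0.0) - share_before.get(c, 0.0) for c in categories
    }
    flagged = {c: d for c, d in deltas.items() if abs(d) >= threshold}
    distance = 0.5 * sum(abs(d) for d in deltas.values())
    return distance, flagged


# Hypothetical bookings by destination type in two periods.
baseline = {"beach": 600, "city": 300, "rural": 100}
current = {"beach": 450, "city": 300, "rural": 250}

distance, flagged = preference_shift(baseline, current)
print(round(distance, 3), {c: round(d, 3) for c, d in flagged.items()})
```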

Notice that many tips and ideas use similar strategies for calculating the KPIs introduced in the previous section. Thus, they may also be reinterpreted to help analyse preference changes or warn regarding new successful tourist strategies in competitive locations by KPIs’ variations.

3.5 Digital Transition and Innovation

We have seen that using new data sources and computational methods can improve our understanding of tourism dynamics and help plan and develop better tourism policies. However, institutions and companies still have a long way to go before they can use all these new resources. To accelerate what has been called the digital transition and foster innovation, we address several relevant questions in this section.

3.5.1 What Are the Main Challenges for Increasing Digitalisation and Innovation in the Tourism Sector? How Can Existing Difficulties Be Overcome?

Small and medium enterprises (SMEs) constitute the majority (around 90%) of Europe’s tourism enterprises (UNWTO, 2020). These kinds of enterprises often do not keep pace with technological advances and lag behind large companies in the digital transition. Furthermore, it has been estimated that up to 25% of jobs in tourism need upskilling.

To maintain the competitiveness of the European travel destinations and satisfy the emerging interests of the travellers towards sustainable travelling options, we need to support the digital transition. Therefore, it is urgent to digitalise services and close the existing skills gap.

At present, this support is essentially provided by the private sector, with most SMEs relying on a few private tech companies. Public institutions should provide similar platforms or foster new public-private partnerships (PPPs) to increase access to new technologies and facilitate the upskilling process.

3.5.2 What Are the Main Difficulties in Collecting New Data? What Strategies Towards Effective Data Collection Should Be Put in Place?

New datasets essentially come from digital data sources. Fostering digitalisation is, therefore, the first step towards increasing data collection. However, as previously mentioned, digitalisation is mainly led by a few private big tech companies. Consequently, most of the new datasets come from these companies. Two actions could then be necessary: first, to foster new or better deals and partnerships with them as data providers and, second, to avoid an excessive dependence on big tech companies by developing public digital/online platforms and services for SMEs, where the whole ecosystem (companies, institutions and users/tourists) shares data.

3.5.3 How to Measure Innovation, Digital Transition and Digital Skills Needs in the Tourism Ecosystem?

Some indicators can reflect progress in the digital transition of tourism. For example, quantifying the (1) number of public-private partnerships and the (2) budget allocated to these PPPs could be necessary, given the importance of big tech companies in the digital sphere. In addition, when granting licences for new digital services, some authorities are pushing for data-sharing agreements, so that companies (in the fields of mobility, waste management, energy, etc.) have to make datasets public, which could be helpful for the analyses and models mentioned above. The (3) number of agreements on data sharing would then be another essential indicator.

3.5.4 How to Motivate and Monitor High-Quality Data Collection by the EU Member States?

The Member States must be aware of the usefulness of new data sources and computational methods. All campaigns and initiatives launched to incentivise/facilitate data collection should be supported by services provided in exchange. We need to strengthen the link between sharing data and obtaining benefits in the form of better analyses and services. This could be a good strategy for incentivising bottom-up data collection initiatives, from users to companies, institutions and, eventually, Member States.

Monitoring the advances in data collection by the EU Member States is crucial and should be coordinated. Initiatives such as the Tourism Satellite Account (TSA) are essential. However, the reports provided by the Member States show an almost complete absence of indicators calculated from new data sources. In addition, annual reports should be replaced by open, continuously updated online platforms that report not only results but also Member States’ progress, strategies, initiatives or agreements regarding the digital transition.

Although recent tourism research is exploring and taking advantage of new data sources and methods, there is still a long way to go in terms of innovation in institutions, both at the level of the European Union and at national, regional or municipal levels. This is evidenced in Annex II of the Tourism Satellite Account (TSA) 2019 (Eurostat, 2019), in which all countries indicate the most relevant data sources used to calculate the related indicators for each TSA table. Annex II shows the near absence of nontraditional or new data sources, such as “mobile positioning data” or “other Big Data sources”.

4 The Way Forward

This chapter has briefly discussed the potential of new data and computational methods to help stakeholders better understand and plan tourism.

The above KPIs might be measured almost everywhere in Europe and other regions of the world, over a wide range of periods and spatial scales, since they can be fed with similar data. If data sources are different, data must be reformatted to a common structure for comparative studies. Therefore, thanks to the total/partial interoperability of data, KPIs can be measured for several locations or industries, including competitors, and used in comparative studies.

Data, methods and KPIs proposed in this chapter have some limitations. They do not cover all the analyses needed regarding the complex tourism sector. Therefore, other traditional measurement techniques and data sources (surveys) are still required and should be used as a complement. Moreover, new techniques can create new problems. Some potential issues are:

  • Dependency on the digital footprint. A significant number of tourists or tourist attractions may leave no or only a few digital footprints. The digital divide and digital infrastructure supply in the analysed locations must be considered while measuring and understanding KPIs’ results. Traditional studies must help determine which tourism segments might be well represented by digital data and help fix biases.

  • Users’ privacy. It must be guaranteed while estimating tourism KPIs and developing new products and strategies (Hall & Ram, 2020). KPIs should reveal the necessary information on a given topic without interfering with people’s personal lives and without explicitly fostering changes in their preferences. People can feel insecure if they feel constantly watched without their knowledge or are worried that their data may be used against them. However, people may also agree to donate data to feed purely anonymous databases for the common good, under an explicit agreement and through fair and transparent methods.

  • Data availability/ownership. Private companies own valuable data for analysing tourism. They usually restrict access to data since it might be a big part of their business. However, some have already released datasets worldwide or in events like hackathons and datathons. These actions have created new businesses/products and brought new insights into tourism processes. It is necessary to explore win-win strategies between businesses and the public sector to access relevant data while permanently protecting privacy and industrial know-how (Robin et al., 2016).

Finally, the above KPIs are just values. Although some of those values seem to be easily interpretable (higher values are better than lower ones in some KPIs), they usually need some comparative or normative framework. The reference ranges of such a framework must also be defined.
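One possible way to place raw KPI values in a normative framework is to rescale them against an agreed reference range, as in the sketch below. The function name, the KPIs and the "worst" and "best" reference values are hypothetical; defining those ranges remains the substantive task and cannot be automated.

```python
def normalise(value, worst, best):
    """Rescale a KPI to 0..1, where 0 is the worst reference value and 1 the best.

    Works for both directions: pass worst > best when lower raw values are better.
    Values outside the reference range are clipped to 0 or 1.
    """
    if worst == best:
        raise ValueError("the reference range must have two distinct ends")
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))


# Hypothetical KPIs with hypothetical reference ranges.
# Higher is better: share of trips made by public transport (%).
transit_score = normalise(75, worst=0, best=100)
# Lower is better: CO2 emissions per visitor (kg).
emissions_score = normalise(60, worst=100, best=20)

print(transit_score, emissions_score)
```

Once rescaled in this way, KPIs with different units and directions can be read on a common scale, which makes comparisons between locations or periods easier to communicate.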