Abstract
The tourism sector is one of the sectors that has undergone the most change in recent years due to digital transformation. One of the pillars of this transformation is the management of organizations based on data-driven decision making. The raw material for these data-driven strategies is, of course, the sources of information used, which have changed and grown significantly in recent years. This article proposes a conceptual architecture for a modern data platform that effectively manages and analyses these information sources and facilitates data-driven decision making in tourism organizations.
Keywords
- Tourism destinations
- Smart destinations
- Data-driven organizations
- Tourism digitalization
- Tourism data-platform
1 Introduction
Data-driven decision making is an area of crucial importance in the digital transformation in which many organizations are immersed. These decision-making methodologies allow organizations to be truly market-oriented, enabling them to focus on customers in order to build customer loyalty in a more cost-effective way (Moreno et al., 2019). Of course, the tourism sector is no exception to this situation and is undergoing a major transformation in which this type of management will be predominant (Camilleri, 2020).
The raw material for these data-driven strategies is data, and its correct management, storage and use within an organization is crucial. The aim of this study is to propose a conceptual architecture for a data platform for an organization in the tourism sector that helps companies to better manage and use data in order to implement data-driven strategies. There are studies in the literature that propose a conceptual architecture for market-oriented organizations (Moreno et al., 2019); in our case, the architecture focuses on the types of data and the analysis requirements of organizations in the tourism sector. In the field of tourism itself, we find studies that propose data architectures of different types and scope in terms of the variety of data used (Navarro & Rubio, 2000; Abdulaziz et al., 2015; Bustamante et al., 2020). In our study we extend this scope to the entire spectrum of possible types of data in the tourism sector. Due to the digitization of products and services, the types of data handled in the tourism sector have grown significantly in variety and quantity. A modern architecture such as the one we propose, in line with current standards in data platforms, is required for their management.
2 Conceptual Architecture for a Tourism Organization Data Platform
As mentioned above, the objective of this study is to propose a conceptual architecture for the data platform of an organization in the tourism sector. A good starting point is the architecture proposed by Moreno et al. (2019) for a market-oriented organization, which consists of several layers:
- Data Sources
- Data Management for analytics
- Analytical techniques and Business Intelligence
- Business insights
In our approach we will start with an initial analysis of the types of data sources currently existing in the tourism sector and, based on these types, the following layers of information management and analysis will be proposed, reaching a final layer in which applications are proposed for the most common analysis needs of the tourism sector. In the following sections, the characteristics of each layer are detailed and, finally, the complete conceptual architecture is proposed.
2.1 Data Sources
Data sources are the raw material on which all analysis that will enable data-driven decisions to be made is based. There is a wide variety of types of data sources. The different types of data sources should be well identified as their typology will determine how they are stored, managed and analysed. In the tourism sector, a very interesting classification is made by Li et al. (2018), in which three large blocks are defined as can be seen in Fig. 1:
- Operations: data from transactions or operations such as hotel bookings, payments, flights, transport (flights, cruises, rail transport), website visits, etc.
- Devices: data coming from devices: mainly mobile data, but also IoT (Internet of Things) data from sensors or other devices.
- Users: data generated by tourists themselves: comments on social networks, online booking platforms, search engines, virtual communities, co-creation of tourist experiences, etc. This type of data is often referred to as User Generated Content (UGC).
There are interesting application cases in the literature for each of these types of data sources. For operations, we see applications of payment data (Ramos & Murta, 2022) and air flight data (Gallego & Font, 2020). For devices, we find applications of mobile data (Zaragozi et al., 2021), generally with georeferenced information, as well as IoT data (Cha et al., 2017). In the third block, concerning UGC-type data, we find applications of social network data (Gunter et al., 2019) and data from online booking platforms (Liu et al., 2021; Van der Zee & Bertocchi, 2018). This last block also includes a type of information worth highlighting, namely data from emerging co-creation models (Mohammadi et al., 2020): data collected on travel booking platforms that allow consumers to co-design their own travel experiences. It should be noted that while in the first block (operations) the data are structured (standard format and well-defined structure), in the other two blocks (devices and users) we can find semi-structured or unstructured data, which should be taken into account in their management and storage. The last two blocks are those that have experienced the greatest growth in the last decade and those with the greatest need for real-time management (Ranganathan et al., 2020).
In addition to the three blocks of data mentioned above, we can add data generated by public institutions. Here we find statistical data generated by specialized institutions or open data provided by local or state administrations or international organizations. For example, information on the level of occupancy of a destination, origin of tourists, expenditure and others. This type of open data has grown significantly in the last decade and is used in multiple applications (Bratucu & Cismaru, 2015). It is shared information that democratizes access to data for all public and private agents in the sector, as pointed out by Celdran-Bernabeu et al. (2018). In the tourism sector, different tourism intelligence systems have been developed in the last decade (Gajdosik, 2019), some at state administration level and others at local level, which collect and generate information of great interest.
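The classification described in this section can be sketched as a small source catalogue. The following Python fragment is only an illustration under stated assumptions: the class names, the catalogue entries and the helper functions are hypothetical and do not come from the cited works, and the structure assigned to the public statistics entry is our own assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Block(Enum):
    """Blocks of data sources: the three of Li et al. (2018) plus public data."""
    OPERATIONS = "operations"
    DEVICES = "devices"
    USERS = "users"      # User Generated Content (UGC)
    PUBLIC = "public"    # statistics and open data from public institutions


class Structure(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class DataSource:
    name: str
    block: Block
    structure: Structure


# Hypothetical catalogue entries, chosen only to illustrate each block.
CATALOGUE = [
    DataSource("hotel_bookings", Block.OPERATIONS, Structure.STRUCTURED),
    DataSource("mobile_positions", Block.DEVICES, Structure.SEMI_STRUCTURED),
    DataSource("online_reviews", Block.USERS, Structure.UNSTRUCTURED),
    # Assumption: published occupancy statistics arrive as structured tables.
    DataSource("occupancy_statistics", Block.PUBLIC, Structure.STRUCTURED),
]


def sources_by_block(catalogue):
    """Group source names by the block they belong to."""
    grouped = {block: [] for block in Block}
    for source in catalogue:
        grouped[source.block].append(source.name)
    return grouped


def needs_flexible_storage(source):
    """Semi-structured and unstructured sources need schema-flexible storage."""
    return source.structure is not Structure.STRUCTURED


if __name__ == "__main__":
    for block, names in sources_by_block(CATALOGUE).items():
        print(block.value, names)
```

Recording the block and the structure of each source explicitly makes it possible to decide, source by source, how it should be stored and how often it should be ingested.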
2.2 Data Management for Analytics
These different types of data sources must be stored and managed in order to apply analytical techniques to obtain insights. For this data management and storage layer we propose to use a Data Lakehouse architecture. This architecture was introduced by Armbrust et al. (2021) and combines the transactions and data governance of enterprise data warehouses with the flexibility and cost-efficiency of data lakes to enable business intelligence and machine learning. It is currently being adopted by many companies, and we believe that the flexibility it provides is appropriate for the diversity of data sources identified in the previous section. A Data Lakehouse is the natural evolution of the Data Warehouse and the Data Lake (Harby & Zulkernine, 2022). Data Warehouses have been widely used in the tourism sector (Navarro & Rubio, 2000; Abdulaziz et al., 2015) and, more recently, so have Data Lakes (Sankaranarayanan & Lalchandani, 2017; Raju et al., 2018). One of the characteristics of a Data Lakehouse architecture is scalability, which is very useful given the significant growth rate of UGC and IoT sources in the tourism sector. In addition, the flexibility of this type of architecture makes it suitable for storing structured, semi-structured and unstructured data sources, the three types of source structure identified in the previous section. The data sources managed in the Data Lakehouse pass through different storage areas. These areas are differentiated by the degree of elaboration of the data: there are areas with raw information and areas with highly elaborated information, which facilitates different types of data analysis. The processes that ingest the information into the Data Lakehouse from the original sources and that carry out the treatment of the different data areas are the ETL (Extract, Transform and Load) processes.
These processes must support batch ingestion with a defined periodicity (daily, weekly, monthly) as well as ingestion closer to real time, depending on the type of source. For example, UGC data is generated at very high speed and requires ingestion close to real time, whereas open data published by an organization has a specific publication frequency, for example monthly, and is ingested in a monthly batch process. In terms of infrastructure, this type of architecture can be implemented on the company's own servers or in a cloud infrastructure. We consider a cloud infrastructure to be appropriate in our case, as it allows companies to adapt better to market changes, and therefore to changes in the data to be managed, as well as to improve cost efficiency.
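The two ideas in this section, ingestion modes that depend on the source and storage areas with different degrees of elaboration, can be sketched as follows. This is a minimal illustration and not a Data Lakehouse implementation: the source names, the `IngestionPlan` class and the two helper functions are hypothetical, and a monthly period is approximated as 30 days for simplicity.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "monthly" is approximated as 30 days in this sketch.
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


@dataclass(frozen=True)
class IngestionPlan:
    source: str
    mode: str                           # "batch" or "near_real_time"
    periodicity: Optional[str] = None   # only meaningful for batch sources


# Hypothetical plans: UGC close to real time, published open data monthly.
PLANS = [
    IngestionPlan("online_reviews", "near_real_time"),
    IngestionPlan("hotel_bookings", "batch", "daily"),
    IngestionPlan("open_data_occupancy", "batch", "monthly"),
]


def next_batch_run(plan, last_run):
    """Return the date of the next batch run, or None for streaming sources."""
    if plan.mode != "batch" or plan.periodicity is None:
        return None
    return last_run + timedelta(days=PERIOD_DAYS[plan.periodicity])


def refine_review(raw):
    """Move one review from the raw area to a more elaborated area.

    The raw record is kept as received; the refined copy has trimmed text,
    a normalised destination name and a numeric rating.
    """
    return {
        "destination": raw["destination"].strip().title(),
        "rating": int(raw["rating"]),
        "text": raw["text"].strip(),
    }


if __name__ == "__main__":
    for plan in PLANS:
        print(plan.source, plan.mode, next_batch_run(plan, date(2023, 9, 1)))
    print(refine_review({"destination": " valencia ", "rating": "4",
                         "text": " Great beach \n"}))
```

In a real platform the near-real-time sources would be handled by a streaming component rather than by a scheduler; here they are only marked so that the batch logic skips them.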
Finally, it should be noted that the storage and management of data in an organization must follow the rules, policies and processes defined at the Data Governance level. A Data Lakehouse architecture facilitates Data Governance tasks. Moreover, these governance processes facilitate the cataloguing of data and its sharing with third parties where necessary, using a semantic model that is as close as possible to the standards used in the tourism sector. This feature may be relevant if a company wants to integrate into the Gaia-X digital ecosystem (Gaia-X; Braud et al., 2021). This European initiative proposes an open and secure data infrastructure that complies with the highest standards of digital sovereignty while promoting innovation, which can be of enormous interest to a company in the tourism sector. Thus, we consider that a management and storage architecture such as the one proposed allows a company to be prepared to integrate into the Gaia-X ecosystem in the future. In terms of good practices, data standards and interoperability, consideration should also be given to the Tourism Data Space project (Tourism Data Space), which proposes a data marketplace for sharing and accessing data at European level. Similarly, the European Data Spaces for Tourism project (DATES) focuses on the development of governance and business models, while providing a shared roadmap to ensure the coordination of the tourism ecosystem stakeholders. Finally, another interesting reference is the EU guide on data for tourist destinations (Smart Tourism Destination).
2.3 Analytical Techniques and Business Intelligence
The Data Lakehouse architecture outlined in the previous section allows the use of different types of analytical techniques, from Business Intelligence (BI) to Machine Learning. Each technique uses the most appropriate data areas of the Data Lakehouse, depending on whether it requires raw or more elaborated information. Within the wide range of data analysis techniques that can be applied in the tourism sector, the following are the most commonly used. There are multiple BI use cases in the tourism sector, such as the BI architecture proposed by Bustamante et al. (2020), which integrates information from four collaborative sources (Twitter, OpenStreetMap, Tripadvisor and Airbnb). It is an example of an architecture that focuses only on BI and on certain sources, but that is similar to the one proposed in this article. Complementary to classic BI are Data Discovery and self-service BI techniques, which give greater freedom when exploring the data. When tourist behaviour is analysed, another widely used analytical technique is clustering (Rodríguez et al., 2018). As mentioned in Sect. 2.1, an important block of data comes from mobile devices, generally with georeferenced information that allows the application of geospatial analytics techniques (Yang et al., 2012). UGC-type data are also becoming increasingly important, and text analysis techniques such as Natural Language Processing are applied to them (Guerrero-Rodríguez et al., 2023). More advanced analytical techniques such as Machine Learning (Peng et al., 2020) or Deep Learning (Essien & Chukwukelu, 2022) are increasingly used in the tourism sector. Among Machine Learning techniques, recommender systems are one of the most widely used in the tourism sector (Esmaeili et al., 2020).
Finally, it is worth mentioning the recent applications of Generative AI techniques to the tourism sector, in particular ChatGPT (Carvalho & Ivanov, 2023).
2.4 Business Insights
The final objective of the entire data cycle described in the previous sections is to obtain relevant insights for the different use cases of the tourism sector. The knowledge obtained has multiple uses, of which we highlight the most frequent. Lv et al. (2021) differentiate two main levels of business insights: the individual level (consumer behaviour and attitude) and the organizational level (marketing management and performance analysis of tourism organizations). Using this division, we first find many studies that put tourists at the center and analyse their behaviour (Miah et al., 2017), their perception (Nave et al., 2018) and their satisfaction (Li et al., 2020). Similarly, with regard to tourism supply, other areas of research are the personalization or recommendation of products and services (Esmaeili et al., 2020) and the co-creation of experiences (Mohammadi et al., 2020). All these use cases try to cover the different phases of the travel lifecycle (before, during and after) and mostly use UGC-type data. At the organizational level, we find use cases in tourism destination management such as demand forecasting (Li & Jiao, 2020), planning and development, value proposition, resource management, sustainability management (De Marchi et al., 2022) or reputation analysis (Cillo et al., 2019), as well as multiple use cases for tourism companies such as marketing management or performance analysis (Bi et al., 2018) and pricing of products and services (Sánchez-Lozano et al., 2021).
2.5 Conceptual Architecture of the Data Platform
Once the different layers of the data platform have been defined, Fig. 2 shows the complete conceptual architecture:
As a practical example with a similar scope to the proposed data platform, we have the case of the Destination Data Platform within the smart tourism ecosystem of the city of Gothenburg (Jansson et al., 2022).
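To make the flow between the four layers concrete, the following toy sketch chains one function per layer. It is an illustration only and not the proposed platform: the review records, the function names and the rating threshold are hypothetical, and the "analytics" step is reduced to a simple average.

```python
from collections import defaultdict
from statistics import mean


def data_sources():
    """Layer 1: hypothetical UGC records (destination and rating)."""
    return [
        {"destination": "Coast", "rating": 5},
        {"destination": "Coast", "rating": 4},
        {"destination": "Old Town", "rating": 2},
        {"destination": "Old Town", "rating": 3},
    ]


def data_management(records):
    """Layer 2: organise raw records into an elaborated, analysis-ready form."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["destination"]].append(record["rating"])
    return dict(grouped)


def analytics(grouped):
    """Layer 3: a deliberately simple BI-style metric, the average rating."""
    return {destination: mean(ratings) for destination, ratings in grouped.items()}


def business_insights(averages, threshold=3.0):
    """Layer 4: flag destinations whose average rating is below a threshold."""
    return sorted(d for d, value in averages.items() if value < threshold)


def run_platform():
    """Chain the four layers from data sources to business insights."""
    return business_insights(analytics(data_management(data_sources())))


if __name__ == "__main__":
    print(run_platform())
```

Each function only depends on the output of the previous layer, which mirrors the way the layers are separated in the conceptual architecture.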
3 Conclusions
In this study we have proposed a conceptual architecture for the data platform of a data-driven tourism organization. We believe that this architecture can help tourism organizations in their digital transformation and in making data-driven decisions. The proposed architecture is based on modern and flexible architectures and facilitates the management, storage and governance of data, taking into account the variety and growth of data types that currently exist in the tourism sector. In addition, this architecture prepares organizations to integrate into data ecosystems such as those proposed in the Gaia-X initiative. This will enable both the integration of data from the ecosystem into the organization and the sharing of the organization's own data in the ecosystem in a simple and governed way. Finally, it should be made clear that the proposed architecture is an ambitious one and probably not within the reach of all players in the tourism value chain. Large companies in the hotel or transport sector, or public institutions, may be able to tackle this type of architecture, but it may be beyond the reach of smaller companies, for example in the restaurant and leisure sector. The latter must approach their work with data in a different way. In terms of analytical techniques, these companies should consider Business Intelligence and those advanced analytical techniques that apply to them. In terms of data management, in order to avoid having to build and maintain a costly architecture, these smaller companies can connect to data platforms provided by public institutions, which offer a large amount of information that is already managed, organized and available as open data. Many tourist destinations have tourism intelligence systems or smart destination platforms that generate a lot of useful information for all agents in the tourism value chain, regardless of the size of the company.
References
Abdulaziz, T. A., Moawad, I. F., & Abu-Alam, W. M. (2015). Building data warehouse system for the tourism sector. In 2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS) (pp. 410–417). https://doi.org/10.1109/IntelCIS.2015.7397253
Armbrust, M., Ghodsi, A., Xin, R., & Zaharia, M. (2021). Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. In Proceedings of CIDR (Vol. 8).
Bi, J. W., Liu, Y., Fan, Z. P., & Zhang, J. (2018). Wisdom of crowds: Conducting importance-performance analysis (IPA) through online reviews. Tourism Management, 70, 460–478. https://doi.org/10.1016/j.tourman.2018.09.010
Bratucu, G., & Cismaru, L. (2015). Developing a business intelligence planning tool for managing ecotourism destinations based on indicators existing at EU level. In International Multidisciplinary Scientific Geo Conference-SGEM, (pp. 181–188). https://doi.org/10.5593/SGEM2015/B53/S21.023
Braud, A., Fromentoux, G., Radier, B., & Le Grand, O. (2021). The road to European digital sovereignty with GAIA-X and IDSA. IEEE Network, 35(2), 4–5. https://doi.org/10.1109/MNET.2021.9387709
Bustamante, A., Sebastia, L., & Onaindia, E. (2020). BITOUR: A business intelligence platform for tourism analysis. ISPRS International Journal of Geo-information, 9(11). https://doi.org/10.3390/ijgi9110671
Camilleri, M.A. (2020). The use of data-driven technologies in tourism marketing. In Entrepreneurship, innovation and inequality: exploring territorial dynamics and development (pp. 182–194). https://doi.org/10.4324/9780429292583-11
Carvalho, I., & Ivanov, S. (2023). ChatGPT for tourism: Applications, benefits and risks. Tourism Review. https://doi.org/10.1108/TR-02-2023-0088
Celdran-Bernabeu, M. A., Mazon, J. N., & Sanchez, D. G. (2018). Open Data and tourism. Implications for tourism management in Smart Cities and Smart Tourism Destinations. Investigaciones Turísticas, 15, 49–78. https://doi.org/10.14198/INTURI2018.15.03
Cha, S., Ruiz, M. P., Wachowicz, M., Tran, L. H., Cao, H., & Maduako, I. (2017). The role of an IoT platform in the design of real-time recommender systems. In IEEE 3RD World Forum on Internet of Things (WF-IOT), (pp. 448–453). https://doi.org/10.1109/WF-IoT.2016.7845469
Cillo, V., Rialti, R., Del Giudice, M., & Usai, A. (2019). Niche tourism destinations’ online reputation management and competitiveness in big data era: Evidence from three Italian cases. Current Issues in Tourism, 24(2), 177–191. https://doi.org/10.1080/13683500.2019.1608918
DATES (last consulted 2023, Sept). https://www.tourismdataspace-csa.eu/
De Marchi, D., Becarelli, R., & Di Sarli, L. (2022). Tourism sustainability index: Measuring tourism sustainability based on the ETIS toolkit, by exploring tourist satisfaction via sentiment analysis. Sustainability, 14(13). https://doi.org/10.3390/su14138049
Esmaeili, L., Mardani, S., Golpayegani, S. A. H., & Madar, Z. Z. (2020). A novel tourism recommender system in the context of social commerce. Expert Systems with Applications, 149. https://doi.org/10.1016/j.eswa.2020.113301
Essien, A., & Chukwukelu, G. (2022). Deep learning in hospitality and tourism: A research framework agenda for future research. International Journal of Contemporary Hospitality Management, 34(12), 4480–4515. https://doi.org/10.1108/IJCHM-09-2021-1176
Gaia-X: Gaia-X Hub (last consulted 2023, May). https://www.gaiax.es/
Gajdosik, T. (2019). Towards a conceptual model of intelligent information system for smart tourism destinations. Software Engineering and Algorithms in Intelligent Systems, 763, 66–74. https://doi.org/10.1007/978-3-319-91186-1_8
Gallego, I., & Font, X. (2020). Changes in air passenger demand as a result of the COVID-19 crisis: Using Big Data to inform tourism policy. Journal of Sustainable Tourism, 29(9), 1470–1489. https://doi.org/10.1080/09669582.2020.1773476
Guerrero-Rodríguez, R., Álvarez-Carmona, M. A., Aranda, R., & López-Monroy, A. P. (2023). Studying Online Travel Reviews related to tourist attractions using NLP methods: The case of Guanajuato, Mexico. Current Issues in Tourism, 26(2), 289–304. https://doi.org/10.1080/13683500.2021.2007227
Gunter, U., Onder, I., & Gindl, S. (2019). Exploring the predictive ability of LIKES of posts on the Facebook pages of four major city DMOs in Austria. Tourism Economics, 25(3), 375–401. https://doi.org/10.1177/1354816618793765
Harby, A., & Zulkernine, F. (2022). From data warehouse to Lakehouse: A comparative review. In Proceedings-2022 IEEE International Conference on Big Data, Big Data 2022 (pp. 389–395). https://doi.org/10.1109/BigData55660.2022.10020719
Jansson, J., Johansson, O., & Roshan, M. (2022). Initiating a smart tourism ecosystem: A public actor perspective. In Proceedings of the 55th Hawaii International Conference on System Sciences. https://doi.org/10.24251/HICSS.2022.335
Miah, S. J., Vu, H. Q., Gammack, J., & McGrath, M. (2017). A big data analytics method for tourist behaviour analysis. Information & Management, 54(6), 771–785. https://doi.org/10.1016/j.im.2016.11.011
Li, G., & Jiao, X. Y. (2020). Tourism forecasting research: A perspective article. Tourism Review, 75(1), 263–266. https://doi.org/10.1108/TR-09-2019-0382
Li, H. X., Liu, Y., Tan, C. W., & Hu, F. (2020). Comprehending customer satisfaction with hotels: Data analysis of consumer-generated reviews. International Journal of Contemporary Hospitality Management, 32(5), 1713–1735. https://doi.org/10.1108/IJCHM-06-2019-0581
Li, J. J., Xu, L. Z., Tang, L., Wang, S. Y., & Li, L. (2018). Big data in tourism research: A literature review. Tourism Management, 68, 301–323. https://doi.org/10.1016/j.tourman.2018.03.009
Liu, T., Zhang, Y., Zhang, H., & Yang, X. P. (2021). A methodological workflow for deriving the association of tourist destinations based on online travel reviews: A case study of Yunnan Province, China. Sustainability, 13(9). https://doi.org/10.3390/su13094720
Lv, H., Shi, S., & Gursoy, D. (2021). A look back and a leap forward: A review and synthesis of big data and artificial intelligence literature in hospitality and tourism. Journal of Hospitality Marketing & Management, 31(2), 145–175. https://doi.org/10.1080/19368623.2021.1937434
Mohammadi, F., Yazdani, H. R., Pour, M. J., & Soltanee, M. (2020). Co-creation in tourism: a systematic mapping study. Tourism Review, 76(2), 305–343. https://doi.org/10.1108/TR-10-2019-0425
Moreno, C., Carrasco, R. A., & Herrera-Viedma, E. (2019). Data and artificial intelligence strategy: A conceptual enterprise big data cloud architecture to enable market-oriented organizations. International Journal of Interactive Multimedia and Artificial Intelligence, 5(6), 7–14. https://doi.org/10.9781/ijimai.2019.06.003
Navarro, J. R., & Rubio, J. Q. (2000). DATATUR: Tourism statistics information system-the experience of Spain. Information and Communication Technologies in Tourism, 2000, 126–146. https://doi.org/10.1007/978-3-7091-6291-0_12
Nave, M., Rita, P., & Guerreiro, J. (2018). A decision support system framework to track consumer sentiments in social media. Journal of Hospitality Marketing & Management, 27(6), 693–700. https://doi.org/10.1080/19368623.2018.1435327
Peng, R. Q., Lou, Y. X., Kadoch, M., & Cheriet, M. (2020). A Human-guided machine learning approach for 5G smart tourism IoT. Electronics, 9(6). https://doi.org/10.3390/electronics9060947
Raju, R., Mital, R., & Finkelsztein, D. (2018). Data lake architecture for air traffic management. In 2018 IEEE/AIAA 37TH Digital Avionics Systems Conference (DASC) (pp. 604–609). https://doi.org/10.1109/DASC.2018.8569361
Ramos, L. M., & Murta, F. S. (2022). Tourism seasonality management strategies-what can we learn from payment data. Journal of Hospitality and Tourism Insights. https://doi.org/10.1108/JHTI-12-2021-0337
Ranganathan, I., Thangamuthu, P., Palanimuthu, S., & Balusamy, B. (2020). The growing role of integrated and insightful big and real-time data analytics platforms. Advances in Computers, 117, 165–186. https://doi.org/10.1016/bs.adcom.2019.09.009
Rodríguez, J., Semanjski, I., Gautama, S., Van de Weghe, N., & Ochoa, D. (2018). Unsupervised hierarchical clustering approach for tourism market segmentation based on crowdsourced mobile phone data. Sensors, 18(9). https://doi.org/10.3390/s18092972
Sanchez-Lozano, G., Pereira, L. N., & Chavez-Miranda, E. (2021). Big data hedonic pricing: Econometric insights into room rates’ determinants by hotel category. Tourism Management, 85. https://doi.org/10.1016/j.tourman.2021.104308
Sankaranarayanan, H. B., & Lalchandani, J. (2017). Passenger reviews reference architecture using big data lakes. In Proceedings of the 7th International Conference Confluence 2017 on Cloud Computing, Data Science and Engineering (pp. 204–209). https://doi.org/10.1109/CONFLUENCE.2017.7943150
Smart Tourism Destination (last consulted 2023, Sept). https://smarttourismdestinations.eu/
Tourism Data Space (last consulted 2023, Sept). https://dsft.modul.ac.at/tourism-data-inventory/
Van der Zee, E., & Bertocchi, D. (2018). Finding patterns in urban tourist behaviour: A social network analysis approach based on TripAdvisor reviews. Information Technology & Tourism, 20(1–4), 153–180. https://doi.org/10.1007/s40558-018-0128-5
Yang, B., Madden, M., Kim, J., & Jordan, T. R. (2012). Geospatial analysis of barrier island beach availability to tourists. Tourism Management, 33(4), 840–854. https://doi.org/10.1016/j.tourman.2011.08.013
Zaragozi, B., Trilles, S., & Gutierrez, A. (2021). Passive mobile data for studying seasonal tourism mobilities: An application in a mediterranean coastal destination. ISPRS International Journal of Geo-Information, 10(2). https://doi.org/10.3390/ijgi10020098
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this paper
Cite this paper
Vidal-Gil, J., Carrasco-González, R.A., Blasco-López, M.F. (2024). Data Platform for a Data-Driven Tourism Organization. A Conceptual Architecture. In: Guevara Plaza, A.J., Cerezo Medina, A., Navarro Jurado, E. (eds) Tourism and ICTs: Advances in Data Science, Artificial Intelligence and Sustainability. TURITEC 2023. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-031-52607-7_10
DOI: https://doi.org/10.1007/978-3-031-52607-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-52606-0
Online ISBN: 978-3-031-52607-7
eBook Packages: Business and Management (R0)