A Method for Analysing Large-Scale UGC Data for Tourism: Application to the Case of Catalonia
In recent years, many articles have been published on user-generated content (UGC) data in the domains of tourism and hospitality, in particular on the quantitative and qualitative content analysis of travel blogs and online travel reviews (OTRs). Researchers have generally worked with more or less population-representative samples of tens or hundreds of travel diaries, a scale that permits manual processing. However, given the dramatic growth of such content, especially hospitality OTRs, this article proposes a method for the semi-automatic downloading, arranging, cleaning, debugging, and analysis of large-scale travel blog and OTR data. The main goal is to classify the collected webpages by date and destination and to enable offline content analysis of the written text as provided by the author. The methodology is applied to about 85,000 diaries of tourists who visited Catalonia between 2004 and 2013, and significant results are obtained in terms of content analysis.
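The cleaning and classification steps named above can be illustrated with a minimal sketch. It is not the article's implementation: the function names (`clean_page`, `classify`, `top_words`), the `(destination, iso_date, html)` record layout, and the word-frequency count used as a stand-in for content analysis are all assumptions made for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_page(html):
    """Return the text a reader would see, with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def classify(pages):
    """Group cleaned text by (destination, year).

    `pages` is an iterable of (destination, iso_date, html) tuples.
    This record layout is hypothetical; a real harvest would first
    have to extract the date and destination from each webpage.
    """
    groups = defaultdict(list)
    for destination, iso_date, html in pages:
        year = int(iso_date[:4])
        groups[(destination, year)].append(clean_page(html))
    return dict(groups)


def top_words(texts, n=10):
    """Count word frequencies across texts (a crude content-analysis proxy)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(n)
```

Keying the groups by a `(destination, year)` tuple allows word frequencies to be compared across destinations or over time with the same function; any other date granularity could be substituted under the same assumptions.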
Keywords: Travel blog; Online travel review; Web harvesting; Web data mining; Massive content analysis; Catalonia
This work was supported by the Spanish Ministry of Economy and Competitiveness [Grant id.: GLOBALTUR CSO2011-23004 / GEOG].