Information Technology & Tourism, Volume 18, Issue 1–4, pp 43–59

Assessing reliability of social media data: lessons from mining TripAdvisor hotel reviews

  • Zheng Xiang
  • Qianzhou Du
  • Yufeng Ma
  • Weiguo Fan
Original Research


As an emerging research paradigm, big data analytics has been gaining currency in various fields. However, the existing hospitality and tourism literature offers little discussion of data quality, which may affect the validity and generalizability of research findings. This study examines the reliability of online hotel reviews on TripAdvisor by developing a text classifier that predicts travel purpose (i.e., business vs. leisure) from the textual content of reviews. The classifier is tested over a range of cities and data sizes to examine its sensitivity to data samples. The findings show that, while the classifier’s performance is consistent across cities, it varies with data size and sampling method. More importantly, a considerable amount of noise is found in the data, which leads to misclassification. A novel approach is therefore developed to address the misclassification problem resulting from data noise. This study reveals important data quality issues and contributes to the theoretical development of social media analytics in hospitality and tourism.
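The abstract does not name the classification algorithm, so the following is only a minimal sketch of what a business-versus-leisure text classifier could look like. It assumes a multinomial naive Bayes model with add-one smoothing and a deliberately crude tokenizer. The class name `NaiveBayesTextClassifier`, the helper `tokenize`, and the four toy reviews are all hypothetical and are not taken from the study or its data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumption: this model is chosen for illustration; the abstract
    does not state which classifier the authors used.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews, invented for illustration only.
train_texts = [
    "Stayed for a conference, the wifi and meeting rooms were reliable",
    "Close to the office, quick checkout before my client meeting",
    "Great pool for the kids and a short walk to the beach",
    "Relaxing family vacation, we loved the spa and the beach bar",
]
train_labels = ["business", "business", "leisure", "leisure"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction_business = clf.predict("wifi was fine for my meeting")
prediction_leisure = clf.predict("the kids enjoyed the beach")
```

In a sensitivity test of the kind the abstract describes, the same `fit` and `predict` steps would be repeated on samples drawn from different cities and at different sample sizes, and the resulting accuracy figures compared.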


Big data; Data quality; Online hotel reviews; Social media analytics; Text classification; Methodology



This study was sponsored by the National Natural Science Foundation of China (71373023) and Beijing Municipal Commission of Education (SM201611417001).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Zheng Xiang (1, 2)
  • Qianzhou Du (3)
  • Yufeng Ma (4)
  • Weiguo Fan (5)
  1. Department of Hospitality and Tourism Management, Pamplin College of Business, Virginia Tech, Blacksburg, USA
  2. Collaborative Innovation Center of eTourism, Beijing Union University, Beijing, China
  3. Department of Business Information Technology, Pamplin College of Business, Virginia Tech, Blacksburg, USA
  4. Department of Computer Science, Virginia Tech, Blacksburg, USA
  5. Department of Accounting and Information Systems, Pamplin College of Business, Virginia Tech, Blacksburg, USA
