
Machine learning for assessing quality of service in the hospitality sector based on customer reviews

Abstract

The increasing use of online hospitality platforms provides firsthand information about clients’ preferences that is essential for improving hotel services and increasing the perceived quality of service. Customer reviews can be used to automatically extract the aspects of the quality of service that are most relevant to hospitality clientele. This paper proposes a framework for assessing the quality of service in the hospitality sector by exploiting customer reviews through natural language processing and machine learning methods. The proposed framework automatically discovers the quality of service aspects relevant to hotel customers. Hotel reviews from Bogotá and Madrid are automatically scraped from Booking.com. Semantic information is inferred through Latent Dirichlet Allocation and FastText, which allow text reviews to be represented as vectors. A dimensionality reduction technique is applied to visualise and interpret large amounts of customer reviews. Visualisations of the most important quality of service aspects are generated, allowing the quality of service to be assessed both qualitatively and quantitatively. Results show that it is possible to automatically extract the main quality of service aspects perceived by customers from large customer review datasets. These findings could be used by hospitality managers to understand clients better and to improve the quality of service.


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1.

    In fact, over the past two years, the Colombian government has been applying a public policy under which all kinds of businesses must offer electronic payment options for electronic billing (see Ruling 42 of 2020 from Colombia’s DIAN).

  2.

    Here, for simplicity, the prediction is made with the same weight matrix W. In practice, however, the prediction (output) matrix has its own weights, so each word ends up with two different vector representations.

References

  1. Abubakar AM, Ilkan M, Al-Tal RM, Eluwole KK (2017) eWOM, revisit intention, destination trust and gender. J Hosp Tour Manag 31:220–227

  2. Agrawal V, Bhakar S, Rana PS, Tiwari D (2018) Prediction of online perceived service quality using spider monkey optimisation. World Rev Sci Technol Sustain Dev 14(4):376–393

  3. Ahani A, Nilashi M, Ibrahim O, Sanzogni L, Weaven S (2019) Market segmentation and travel choice prediction in spa hotels through TripAdvisor’s online reviews. Int J Hosp Manag 80:52–77. https://doi.org/10.1016/j.ijhm.2019.01.003

  4. Ahmad SZ, Ahmad N, Papastathopoulos A (2018) Measuring service quality and customer satisfaction of the small- and medium-sized hotels (SMSHs) industry: lessons from United Arab Emirates (UAE). Tour Rev 74(3):349–370

  5. Akbaba A (2006) Measuring service quality in the hotel industry: a study in a business hotel in Turkey. Int J Hosp Manag 25(2):170–192

  6. Alén González ME (2004) Evaluación de la calidad percibida por los clientes de establecimientos termales a través del análisis de sus expectativas y percepciones. Rev Galega Econ 13(1–2):5–22

  7. Anderson EW, Sullivan MW (1993) The antecedents and consequences of customer satisfaction for firms. Market Sci 12(2):125–143. https://doi.org/10.1287/mksc.12.2.125

  8. Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146

  9. Brady MK, Cronin JJ Jr (2001) Some new thoughts on conceptualizing perceived service quality: a hierarchical approach. J Marketing 65(3):34–49

  10. Buhalis D (2019) Technology in tourism – from information communication technologies to eTourism and smart tourism towards ambient intelligence tourism: a perspective article. Tour Rev 75(1):267–272

  11. Buhalis D, Harwood T, Bogicevic V, Viglia G, Beldona S, Hofacker C (2019) Technological disruptions in services: lessons from tourism and hospitality. J Serv Manag 30(4):484–506

  12. Chen Y, Wang J, Lai G (2018) Research on improving the government service quality by public comments monitoring: take suburb park an example. In: 2018 15th international conference on service systems and service management (ICSSSM), IEEE, pp 1–5

  13. Cheng M, Jin X (2019) What do Airbnb users care about? An analysis of online review comments. Int J Hosp Manag 76:58–70. https://doi.org/10.1016/j.ijhm.2018.04.004

  14. Chi OH, Denton G, Gursoy D (2020) Artificially intelligent device use in service delivery: a systematic review, synthesis, and research agenda. J Hosp Market Manag 2020:1–30. https://doi.org/10.1080/19368623.2020.1721394

  15. Cronin JJ Jr, Taylor SA (1992) Measuring service quality: a reexamination and extension. J Mark 56(3):55–68

  16. Dabholkar PA, Thorpe DI, Rentz JO (1996) A measure of service quality for retail stores: scale development and validation. J Acad Mark Sci 24(1):3

  17. Dhar RL (2015) Service quality and the training of employees: the mediating role of organizational commitment. Tour Manag 46:419–430. https://doi.org/10.1016/j.tourman.2014.08.001

  18. Douven I, Meijs W (2007) Measuring coherence. Synthese 156(3):405–425

  19. Ghotbabadi AR, Baharun R, Feiz S (2012) A review of service quality models. In: 2nd international conference on management, pp 1–8

  20. Gronroos C (1984) A service quality model and its marketing implications. Eur J Mark 18(4):36–44

  21. Harris ZS (1954) Distributional structure. Word 10(2–3):146–162

  22. Hernández Maestro RM, Muñoz Gallego PA, Santos Requejo L (2006) Calidad objetiva y su relación con la formación y la satisfacción del empresario: El caso de los alojamientos rurales españoles. In: Universidad de Salamanca (España) Facultad de Economía y Empresa

  23. Hoffman MD, Blei DM, Bach F (2010) Online learning for latent Dirichlet allocation. In: Proceedings of the 23rd international conference on neural information processing systems, Volume 1, Curran Associates Inc., USA, NIPS’10, pp 856–864

  24. Instituto Nacional de Estadística (2020) Un retrato de nuestros turistas. https://www.ine.es/ss/Satellite?L=es_ES&c=INECifrasINE_C&cid=1259952806229&p=1254735116567&pagename=ProductosYServicios%2FINECifrasINE_C%2FPYSDetalleCifrasINE

  25. John B, Cristian M (2018) Beware hospitality industry: the robots are coming. Worldwide Hosp Tour Themes 10(6):726–733. https://doi.org/10.1108/WHATT-07-2018-0045

  26. Joshi P, Santy S, Budhiraja A, Bali K, Choudhury M (2020) The state and fate of linguistic diversity and inclusion in the NLP world. arXiv:2004.09095

  27. Joulin A, Grave E, Bojanowski P, Mikolov T (2016) Bag of tricks for efficient text classification. arXiv:1607.01759

  28. Keshavarz Y, Jamshidi D (2018) Service quality evaluation and the mediating role of perceived value and customer satisfaction in customer loyalty. Int J Tour Cities 4(2):220–244

  29. Kim S, Kandampully J, Bilgihan A (2018) The influence of eWOM communications: an application of online social network framework. Comput Hum Behav 80:243–254

  30. Knutson B, Stevens P, Wullaert C, Patton M, Yokoyama F (1990) LODGSERV: a service quality index for the lodging industry. Hosp Res J 14(2):277–284

  31. Lai IK, Hitchcock M, Yang T, Lu TW (2018) Literature review on service quality in hospitality and tourism (1984–2014). Int J Contemp Hosp Manag 30(1):114–159

  32. Lamest M, Brady M (2019) Data-focused managerial challenges within the hotel sector. Tour Rev 74(1):104–115

  33. Lee PJ, Hu YH, Lu KT (2018) Assessing the helpfulness of online hotel reviews: a classification-based approach. Telematics Inform 35(2):436–445

  34. Lee WH, Cheng CC (2018) Less is more: a new insight for measuring service quality of green hotels. Int J Hosp Manag 68:32–40

  35. Lestari YD, Laode M (2018) Service innovation of 3/2 star hotel in Bandung. J Asian Financ Econ Business (JAFEB) 5(3):73–80

  36. Lestari YD, Saputra D (2018) Market study on hospitality sector: evidence from 4/5-star hotel in Bandung City, Indonesia. Int J Business Soc 19:1

  37. Lin H, Chi OH, Gursoy D (2019) Antecedents of customers’ acceptance of artificially intelligent robotic device use in hospitality services. J Hosp Mark Manag 2019:1–20. https://doi.org/10.1080/19368623.2020.1685053

  38. Lin HC, Han X, Lyu T, Ho WH, Xu Y, Hsieh TC, Zhu L, Zhang L (2020) Task-technology fit analysis of social media use for marketing in the tourism and hospitality industry: a systematic literature review. Int J Contemp Hosp Manag 32(8):2677–2715

  39. Luo Q, Chen Y, Chen L, Luo X, Xia H, Zhang Y, Chen L (2019) Research on situation awareness of airport operation based on Petri nets. IEEE Access 7:25438–25451

  40. Luo Y, Tang RL (2019) Understanding hidden dimensions in textual reviews on Airbnb: an application of modified latent aspect rating analysis (LARA). Int J Hosp Manag 80:144–154. https://doi.org/10.1016/j.ijhm.2019.02.008

  41. Ma E, Cheng M, Hsiao A (2018a) Sentiment analysis – a review and agenda for future research in hospitality contexts. Int J Contemp Hosp Manag 30(11):3287–3308

  42. Ma Y, Xiang Z, Du Q, Fan W (2018b) Effects of user-provided photos on hotel review helpfulness: an analytical approach with deep leaning. Int J Hosp Manag 71:120–131. https://doi.org/10.1016/j.ijhm.2017.12.008

  43. Manning CD, Surdeanu M, Bauer J, Finkel J, Bethard SJ, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: Association for computational linguistics (ACL) system demonstrations, pp 55–60

  44. Mariani M (2019) Big data and analytics in tourism and hospitality: a perspective article. Tour Rev 75(1):299–303

  45. Mariani M, Baggio R, Fuchs M, Höepken W (2018) Business intelligence and big data in hospitality and tourism: a systematic literature review. Int J Contemp Hosp Manag 30(12):3514–3554

  46. Martin-Fuentes E, Fernandez C, Mateu C, Marine-Roig E (2018) Modelling a grading scheme for peer-to-peer accommodation: stars for Airbnb. Int J Hosp Manag 69:75–83

  47. Martinez-Torres M, Toral S (2019) A machine learning approach for the identification of the deceptive reviews in the hospitality sector using unique attributes and sentiment orientation. Tour Manag 75:393–403. https://doi.org/10.1016/j.tourman.2019.06.003

  48. McInnes L, Healy J, Melville J (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426

  49. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  50. Ministerio de Comercio, Industria y Turismo (2020) Centro de información turística de Colombia. http://www.citur.gov.co/

  51. Mitchell TM (1997) Machine learning. McGraw-Hill, New York

  52. Moro S, Esmerado J, Ramos P, Alturas B (2019) Evaluating a guest satisfaction model through data mining. Int J Contemp Hosp Manag 32(4):1523–1538

  53. Moros Ochoa M, Vázquez JCR, Nieto GYC, Viloria A, Ariza-Salazar J (2016) Adaptation of the “caltic” service quality model in the tourism sector. International Journal of Control Theory and Applications (ISSN 0974–5572)

  54. Önder I, Gunter U, Scharl A (2019) Forecasting tourist arrivals with the help of web sentiment: a mixed-frequency modeling approach for big data. Tour Anal 24(4):437–452

  55. Padma P, Ahn J (2020) Guest satisfaction & dissatisfaction in luxury hotels: an application of big data. Int J Hosp Manag 84:102318. https://doi.org/10.1016/j.ijhm.2019.102318

  56. Parasuraman A, Zeithaml VA, Berry LL (1988) SERVQUAL: a multiple-item scale for measuring consumer perceptions of service quality. J Retail 64(1):12

  57. Parasuraman A, Zeithaml VA, Berry LL (1994) Reassessment of expectations as a comparison standard in measuring service quality: implications for further research. J Mark 58(1):111–124

  58. Pourfakhimi S, Duncan T, Coetzee WJ (2020) Electronic word of mouth in tourism and hospitality consumer behaviour: state of the art. Tour Rev 75(4):637–661

  59. Rahmani K, Gnoth J, Mather D (2018) Tourists’ participation on Web 2.0: a corpus linguistic analysis of experiences. J Travel Res 57(8):1108–1120

  60. Rahmani K, Gnoth J, Mather D (2019) A psycholinguistic view of tourists’ emotional experiences. J Travel Res 58(2):192–206

  61. Röder M, Both A, Hinneburg A (2015) Exploring the space of topic coherence measures. In: Proceedings of the Eighth ACM international conference on web search and data mining, ACM, New York, NY, USA, WSDM ’15, pp 399–408. https://doi.org/10.1145/2684822.2685324

  62. Salunke SS (2014) Selenium WebDriver in Python: learn with examples, 1st edn. In: CreateSpace Independent Publishing Platform, North Charleston

  63. Septianto F, Chiew TM (2018) The effects of different, discrete positive emotions on electronic word-of-mouth. J Retail Consumer Serv 44:1–10

  64. Smith AE, Humphreys MS (2006) Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behav Res Methods 38(2):262–279. https://doi.org/10.3758/BF03192778

  65. Sun TVW, Norman A (2018) Exploring customer experiences with robotics in hospitality. Int J Contemp Hosp Manag 30(7):2680–2697. https://doi.org/10.1108/IJCHM-06-2017-0322

  66. Syed S, Spruit M (2017) Full-text or abstract? Examining topic coherence scores using latent Dirichlet allocation. In: 2017 IEEE international conference on data science and advanced analytics (DSAA), pp 165–174. https://doi.org/10.1109/DSAA.2017.61

  67. Taecharungroj V, Mathayomchan B (2019) Analysing TripAdvisor reviews of tourist attractions in Phuket, Thailand. Tour Manag 75:550–568. https://doi.org/10.1016/j.tourman.2019.06.020

  68. Tukey JW (1949) Comparing individual means in the analysis of variance. Biometrics 5(2):99–114

  69. Vallejo JM, Redondo YP, Acerete AU (2015) Las características del boca-oído electrónico y su influencia en la intención de recompra online. Rev Eur Dirección y Econ Empresa 24(2):61–75. https://doi.org/10.1016/j.redee.2015.03.002

  70. Vargas-Calderón V, Dominguez MS, Parra-A N, Vinck-Posada H, Camargo JE (2020) Using machine learning techniques for discovering latent topics in Twitter Colombian news. In: Narváez FR, Vallejo DF, Morillo PA, Proaño JR (eds) Smart technologies, systems and applications. Springer International Publishing, Cham, pp 132–141

  71. Vargas-Calderón V, Parra-A N, Camargo JE, Vinck-Posada H (2019) Event detection in Colombian security Twitter news using fine-grained latent topic analysis. arXiv:1911.08370

  72. Williams NL, Ferdinand N, Bustard J (2019) From WOM to aWOM – the evolution of unpaid influence: a perspective article. Tour Rev 75(1):314–318

  73. Wong Ooi Mei A, Dean AM, White CJ (1999) Analysing service quality in the hospitality industry. Manag Serv Qual Int J 9(2):136–143

  74. Xiang Z, Du Q, Ma Y, Fan W (2017) A comparative analysis of major online review platforms: implications for social media analytics in hospitality and tourism. Tour Manag 58:51–65. https://doi.org/10.1016/j.tourman.2016.10.001

  75. Xiang Z, Shin S, Li N (2019) Online tourism-related text: a perspective article. Tour Rev 75(1):324–328

  76. Zeithaml VA, Bitner MJ, Gremler DD (2018) Services marketing: integrating customer focus across the firm. McGraw-Hill Education, England

  77. Zhou S, Yan Q, Yan M, Shen C (2020) Tourists’ emotional changes and eWOM behavior on social media and integrated tourism websites. Int J Tour Res 22(3):336–350


Author information

Corresponding author

Correspondence to Vladimir Vargas-Calderón.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Essentials of FastText

FastText is a library that creates text embeddings: it maps a string s to a vector in the vector space \({\mathbb {R}}^N\). The FastText method shares its embedding ideas with other models such as Word2Vec (Mikolov et al. 2013). In what follows, we outline how the map (embedding) is built; the interested reader is referred to Bojanowski et al. (2017), Joulin et al. (2016), Mikolov et al. (2013), and Vargas-Calderón et al. (2019) for a more formal exposition of the method with mathematical details.

Consider a dataset of texts, or documents, \({\mathcal {D}}\). We can create the vocabulary set \({\mathcal {V}}\) as the set of words contained in the documents. This set can be ordered arbitrarily; for simplicity, let us assume that the vocabulary is ordered alphabetically. Let \(V=|{\mathcal {V}}|\) be the size of the vocabulary. Let \(\phi : {\mathcal {V}}\rightarrow {\mathbb {R}}^V\) be the one-hot encoding map, which takes the i-th element of the vocabulary (in alphabetical order) and maps it to a vector \(\varvec{\phi }_i\) whose components are all equal to 0 except the i-th component, which is equal to 1. The embedding is an \(N\times V\) matrix W that maps a one-hot encoded vector in \({\mathbb {R}}^V\) to the embedded vector space \({\mathbb {R}}^N\), where \(N \ll V\). This means that the i-th word of the vocabulary has the embedded vector representation \({\varvec{w}}_i := W\varvec{\phi }_i\) (note that \({\varvec{w}}_i\) is just the i-th column of W). The main feature of the embedding is that semantically similar words also have similar vector representations in the embedded space, i.e. their cosine similarity satisfies \({\varvec{w}}_i\cdot {\varvec{w}}_j /(||{\varvec{w}}_i|| \, ||{\varvec{w}}_j||) \approx 1\) for similar words \(w_i,w_j\in {\mathcal {V}}\).
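To make the one-hot map and the embedding lookup concrete, the following minimal NumPy sketch builds \(\varvec{\phi }_i\), multiplies it by W, and computes the cosine similarity. The toy vocabulary, the dimension N = 3 and the random matrix W are assumptions for illustration only; since W is not trained here, the printed similarity carries no semantic meaning.

```python
import numpy as np

# Hypothetical toy vocabulary, ordered alphabetically as assumed in the text.
vocabulary = sorted(["breakfast", "courtesy", "kindness", "noise", "room"])
V = len(vocabulary)  # vocabulary size |V|
N = 3                # embedding dimension (N << V for real corpora)


def one_hot(i: int, size: int) -> np.ndarray:
    """Return phi_i: all components 0 except the i-th, which is 1."""
    phi = np.zeros(size)
    phi[i] = 1.0
    return phi


# Embedding matrix W (N x V). Random here; in practice it is learnt.
rng = np.random.default_rng(seed=0)
W = rng.normal(size=(N, V))


def embed(word: str) -> np.ndarray:
    """w_i := W phi_i, which simply selects the i-th column of W."""
    i = vocabulary.index(word)
    return W @ one_hot(i, V)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity u.v / (||u|| ||v||)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine(embed("kindness"), embed("courtesy")))
```

Multiplying by a one-hot vector is only a column lookup, which is why practical implementations index W directly instead of materialising \(\varvec{\phi }_i\).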

The question that immediately arises is: how can semantic similarity be measured? Mikolov et al. (2013) define semantic similarity through a prediction problem rooted in the distributional hypothesis of linguistics (Harris 1954), which states that semantically similar words are used in similar contexts. For instance, the words “kindness” and “courtesy” are expected to have similar vector representations because they appear in similar contexts, e.g. in positive comments about hotel staff. The context is formally defined as the set of words surrounding the word of interest, and the number of words taken into the context is usually referred to as the context size. This definition of context allows us to state the prediction problem that defines semantic similarity: given the context around a word of interest \(w_i\), can we predict that the word of interest is \(w_i\)? Or, given a word of interest \(w_i\), can we predict its context? These two questions are answered by the continuous bag of words (CBOW) and the skip-gram configurations of Word2Vec-like architectures, respectively.
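The two prediction problems differ only in which side of each training pair is the input. The sketch below, assuming a symmetric context (half of the context size on each side, truncated at sentence edges) and whitespace tokenisation, generates the pairs used by each configuration from a single hypothetical review sentence.

```python
from typing import List, Tuple


def context_of(tokens: List[str], position: int, context_size: int) -> List[str]:
    """Words surrounding tokens[position]: context_size // 2 on each side, truncated at sentence edges."""
    half = context_size // 2
    left = tokens[max(0, position - half):position]
    right = tokens[position + 1:position + 1 + half]
    return left + right


def cbow_pairs(tokens: List[str], context_size: int) -> List[Tuple[List[str], str]]:
    """CBOW: the context words are the input, the word of interest is the target."""
    return [(context_of(tokens, i, context_size), w) for i, w in enumerate(tokens)]


def skipgram_pairs(tokens: List[str], context_size: int) -> List[Tuple[str, str]]:
    """Skip-gram: the word of interest is the input, each context word is a target."""
    return [(w, c) for i, w in enumerate(tokens) for c in context_of(tokens, i, context_size)]


sentence = "the staff showed great kindness and courtesy".split()
print(cbow_pairs(sentence, 4)[4])
# (['showed', 'great', 'and', 'courtesy'], 'kindness')
print(skipgram_pairs(sentence, 4)[:2])
# [('the', 'staff'), ('the', 'showed')]
```

Because “kindness” and “courtesy” would keep appearing next to words such as “staff” or “great” across many reviews, both configurations push their vectors towards similar regions of the embedded space.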

As an example, let us consider the CBOW configuration. Consider part of a sentence consisting of a word of interest w (we drop the subscript) and a context of size 4: \(w_1\,w_2\,w\,w_3\,w_4\). In the CBOW configuration, the context words are used to predict the word of interest. This is done by averaging the vector representations of the context words, i.e. \({\varvec{w}}_c = \frac{1}{4}\sum _{i=1}^4 {\varvec{w}}_i\). The word of interest is then predicted (see Note 2) by computing \(W^T{\varvec{w}}_c\), which, after a softmax normalisation, should approximate the one-hot encoding of the word w. The matrix elements of W can be learnt by minimising a loss function such as the categorical cross-entropy with any suitable optimisation algorithm, using pairs (word of interest, context words) sampled from the corpus and predicting each word of interest from its context words.
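The CBOW update can be sketched with plain gradient descent on the softmax cross-entropy. This is a minimal illustration under stated assumptions, not the FastText or Word2Vec implementation: it ties the input and prediction matrices (the simplification of Note 2), uses a full softmax over a hypothetical six-word vocabulary, and trains on a single made-up pair. Real implementations use a separate output matrix and approximations such as negative sampling or hierarchical softmax.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()


def cbow_step(W: np.ndarray, context_ids: list, target_id: int, lr: float = 0.1) -> float:
    """One gradient-descent step of CBOW with a tied N x V matrix W (updated in place).

    Returns the categorical cross-entropy loss computed before the update.
    """
    w_c = W[:, context_ids].mean(axis=1)   # averaged context vector w_c
    probs = softmax(W.T @ w_c)             # softmax(W^T w_c): predicted distribution over V
    loss = -np.log(probs[target_id])       # cross-entropy against the one-hot target

    d_scores = probs.copy()                # dL/d(W^T w_c) = probs - one_hot(target)
    d_scores[target_id] -= 1.0
    grad = np.outer(w_c, d_scores)         # W acting as the prediction (output) matrix
    d_wc = W @ d_scores                    # dL/dw_c
    for j in context_ids:                  # W acting as the input matrix via the average
        grad[:, j] += d_wc / len(context_ids)
    W -= lr * grad
    return float(loss)


rng = np.random.default_rng(seed=0)
V, N = 6, 4
W = rng.normal(scale=0.5, size=(N, V))
# Hypothetical training pair: context word ids [0, 1, 3, 4] around target word id 2.
first_loss = cbow_step(W, [0, 1, 3, 4], 2)
for _ in range(200):
    last_loss = cbow_step(W, [0, 1, 3, 4], 2)
print(first_loss, last_loss)
```

Because W appears twice in \(W^T{\varvec{w}}_c\), the gradient has two parts: one from W acting as the prediction matrix and one flowing back through the averaged context vector.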

FastText (Bojanowski et al. 2017) leverages this idea to learn embeddings with sub-word information. Instead of a vocabulary of words, FastText considers a vocabulary of n-char chains. To understand this, consider a sentence containing the word “kindness”. Two special characters, \(\langle\) and \(\rangle\), mark where a word starts and ends, so that “kindness” is transformed into “\(\langle\)kindness\(\rangle\)”. If we consider 5-char chains, we obtain the following decomposition of “kindness”: \(\{\)“\(\langle\)kind”, “kindn”, “indne”, “ndnes”, “dness”, “ness\(\rangle\)”\(\}\). A vector representation is learnt for each 5-char chain found in the vocabulary in the same fashion as for words, and the representation of a word is now the average of the representations of the chains in its decomposition. This can be extended to sentence representations by averaging the representations of the words in the sentence.
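The decomposition and the two averaging steps can be sketched as follows. The chain vectors are hypothetical random vectors created on demand (in FastText they are learnt), the markers are written as the ASCII characters `<` and `>`, and words too short to yield a full chain are kept whole; all three are assumptions of this sketch.

```python
from typing import Dict, List

import numpy as np


def char_ngrams(word: str, n: int = 5) -> List[str]:
    """n-char chains of a word after adding the boundary markers '<' and '>'."""
    marked = f"<{word}>"
    if len(marked) < n:
        return [marked]  # short word: keep the whole marked word as a single chain
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


# Hypothetical chain vectors, created lazily at random (learnt in practice).
rng = np.random.default_rng(seed=0)
DIM = 8
chain_vectors: Dict[str, np.ndarray] = {}


def chain_vector(chain: str) -> np.ndarray:
    if chain not in chain_vectors:
        chain_vectors[chain] = rng.normal(size=DIM)
    return chain_vectors[chain]


def word_vector(word: str, n: int = 5) -> np.ndarray:
    """Average of the vectors of the word's n-char chains."""
    return np.mean([chain_vector(c) for c in char_ngrams(word, n)], axis=0)


def sentence_vector(sentence: str, n: int = 5) -> np.ndarray:
    """Average of the word vectors of a whitespace-tokenised sentence."""
    return np.mean([word_vector(w, n) for w in sentence.split()], axis=0)


print(char_ngrams("kindness"))
# ['<kind', 'kindn', 'indne', 'ndnes', 'dness', 'ness>']
print(sentence_vector("great kindness from the staff").shape)
# (8,)
```

Because “kind” and “kindness” share the chain “<kind”, their vectors share a component even before training, which is what lets sub-word models produce sensible vectors for rare or misspelt words. The reference fastText implementation goes further than this sketch: it also uses a range of chain lengths (3 to 6 by default) and adds the whole word as an extra feature.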

B \(C_V\) topic coherence

The \(C_V\) topic coherence (Röder et al. 2015) is a metric that correlates well with human topic rankings, which provide a gold standard of interpretability. The \(C_V\) coherence is calculated as follows. For each topic, consider the set \(W=\{w_1,\ldots , w_N\}\) of the N most frequent words within the documents assigned to that topic. Compute \(p(w_i)\) as the relative frequency that estimates the probability of finding word \(w_i\) in the documents of that topic. Also, compute the probability \(p(w_i, w_j)\) of finding both \(w_i\) and \(w_j\) within a document, with the constraint that \(w_j\) must be at most s tokens away from \(w_i\), where s is a fixed window size.
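A minimal sketch of these estimates, reading \(p(w_i)\) as the fraction of the topic's documents that contain \(w_i\), and \(p(w_i, w_j)\) as the fraction of documents in which the two words occur at most s tokens apart. The tokenised reviews and top words are hypothetical. Röder et al. (2015) estimate such probabilities with a boolean sliding window instead, so this sketch follows the description above rather than their implementation.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def word_probabilities(
    docs: Sequence[List[str]], top_words: Sequence[str], s: int
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Estimate p(w_i) and p(w_i, w_j) from the tokenised documents of one topic.

    p(w_i): fraction of documents that contain w_i.
    p(w_i, w_j): fraction of documents in which w_i and w_j occur at most s tokens apart.
    """
    n_docs = len(docs)
    p = {w: sum(w in doc for doc in docs) / n_docs for w in top_words}
    p_joint: Dict[Tuple[str, str], float] = {}
    for wi, wj in combinations(top_words, 2):
        hits = 0
        for doc in docs:
            pos_i = [k for k, t in enumerate(doc) if t == wi]
            pos_j = [k for k, t in enumerate(doc) if t == wj]
            if any(abs(a - b) <= s for a in pos_i for b in pos_j):
                hits += 1
        p_joint[(wi, wj)] = p_joint[(wj, wi)] = hits / n_docs
    return p, p_joint


# Hypothetical tokenised reviews assigned to one topic, and its top words.
docs = [
    ["clean", "room", "friendly", "staff"],
    ["staff", "very", "friendly"],
    ["room", "was", "noisy"],
]
top = ["room", "staff", "friendly"]
p, p_joint = word_probabilities(docs, top, s=1)
print(p["room"], p_joint[("room", "friendly")])
# 0.6666666666666666 0.3333333333333333
```

Widening the window changes the joint estimates: with s = 2, “staff” and “friendly” co-occur in two of the three reviews instead of one.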

Now, we consider a segmentation of W, in the sense of Douven and Meijs (2007). Such a segmentation is a set of pairs of subsets of W. In particular, the \(C_V\) coherence uses a segmentation of the form

$$\begin{aligned} S = \{(W_\beta ', W) \,\vert \, W_\beta ' \in {\mathscr {P}}(W)\setminus \{\varnothing \}\}, \end{aligned}$$
(1)

where \({\mathscr {P}}(W)\) is the power set of W. We denote each pair in S by \(S_\beta = (W_\beta ', W)\).

We can represent each set of words \({\bar{W}}\in {\mathscr {P}}(W)\setminus \{\varnothing \}\) with a context vector \({\varvec{v}}({\bar{W}})\) of size |W|, whose j-th component is

$$\begin{aligned} v_j({\bar{W}}) = \sum _{w_i\in {\bar{W}}} \text {NPMI}(w_i, w_j)^\gamma \end{aligned}$$
(2)

where NPMI stands for normalised pointwise mutual information and the exponent \(\gamma\) places more weight on larger NPMI values. The NPMI is defined via

$$\begin{aligned} \text {NPMI}(w_i,w_j)^\gamma = \left( -\frac{\log \frac{p(w_i,w_j) + \epsilon }{p(w_i)p(w_j)}}{\log (p(w_i,w_j) + \epsilon )}\right) ^\gamma , \end{aligned}$$
(3)

where \(\epsilon\) is a small constant added for numerical stability. Notice that the argument of the logarithm in the numerator of Eq. (3) is just (ignoring \(\epsilon\)) \(p(w_i|w_j)/p(w_i)\), so the logarithm is greater than 0 if the conditional probability of \(w_i\) given \(w_j\) is greater than the probability of \(w_i\) alone; since the denominator \(\log (p(w_i,w_j)+\epsilon )\) is negative, the NPMI is then positive. Therefore, context vectors represent the level of co-occurrence of a set of words \({\bar{W}}\) with each of the words in W, the N most frequent words within the documents of a topic.
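Eqs. (2) and (3) translate almost directly into code. In this sketch the probabilities are hypothetical numbers, the joint probability of a word with itself is taken to be \(p(w_i)\) (so that its NPMI equals 1), and \(\gamma = 1\) by default: with a non-integer \(\gamma\), Python's `**` on a negative NPMI value returns a complex number.

```python
import math
from typing import Dict, List, Sequence, Tuple


def npmi(p_i: float, p_j: float, p_ij: float, eps: float = 1e-12, gamma: float = 1.0) -> float:
    """NPMI(w_i, w_j)^gamma as in Eq. (3)."""
    value = -math.log((p_ij + eps) / (p_i * p_j)) / math.log(p_ij + eps)
    return value ** gamma


def context_vector(
    subset: Sequence[str],
    top_words: Sequence[str],
    p: Dict[str, float],
    p_joint: Dict[Tuple[str, str], float],
    gamma: float = 1.0,
) -> List[float]:
    """Eq. (2): component j is the sum over w_i in subset of NPMI(w_i, w_j)^gamma."""
    def joint(wi: str, wj: str) -> float:
        # A word co-occurs with itself whenever it occurs, so p(w, w) = p(w).
        return p[wi] if wi == wj else p_joint[(wi, wj)]

    return [
        sum(npmi(p[wi], p[wj], joint(wi, wj), gamma=gamma) for wi in subset)
        for wj in top_words
    ]


# Hypothetical probabilities for the top words of one topic.
top = ["room", "staff", "friendly"]
p = {"room": 0.5, "staff": 0.4, "friendly": 0.3}
p_joint = {("room", "staff"): 0.1, ("room", "friendly"): 0.05, ("staff", "friendly"): 0.25}
p_joint.update({(b, a): v for (a, b), v in list(p_joint.items())})
print(context_vector(["staff", "friendly"], top, p, p_joint))
```

The NPMI is 0 when two words are independent (\(p(w_i,w_j) = p(w_i)p(w_j)\)), positive when they co-occur more often than chance, and negative otherwise, as for “room” and “friendly” above.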

Now, for each pair \(S_\beta\), we compute a confirmation measure \(\phi (S_\beta )\) (Syed and Spruit 2017) as

$$\begin{aligned} \phi (S_\beta ) = \frac{{\varvec{v}}(W'_\beta )\cdot {\varvec{v}}(W)}{||{\varvec{v}}(W'_\beta )||\,||{\varvec{v}}(W)||}. \end{aligned}$$
(4)

The confirmation measure tells how strongly W supports \(W'_\beta\), i.e. how semantically related the words of W are to those of \(W'_\beta\), irrespective of how often two words (or sets of words) appear together directly in the corpus (see Röder et al. (2015) for more details). The average over all pairs \(S_\beta\) is taken as the coherence of the topic under study. Further averaging over all topics gives the \(C_V\) coherence.
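The remaining steps (segmentation, context vectors, cosine confirmation and the two averages) can be sketched from a precomputed NPMI matrix per topic. The matrix input and the example values are assumptions of this sketch; entry \([i, j]\) holds \(\text {NPMI}(w_i, w_j)\), so summing rows over a subset gives the context vector of Eq. (2).

```python
from itertools import combinations
from typing import Sequence

import numpy as np


def topic_coherence(npmi_matrix: np.ndarray, gamma: float = 1.0) -> float:
    """Mean confirmation phi(S_beta) over all pairs (W'_beta, W), W'_beta a non-empty subset of W.

    npmi_matrix[i, j] holds NPMI(w_i, w_j) for the N top words of one topic.
    """
    m = np.power(npmi_matrix, gamma)
    n = m.shape[0]
    v_all = m.sum(axis=0)                        # v(W), Eq. (2) with W_bar = W
    confirmations = []
    for size in range(1, n + 1):                 # every non-empty subset, Eq. (1)
        for subset in combinations(range(n), size):
            v_sub = m[list(subset)].sum(axis=0)  # v(W'_beta)
            phi = v_sub @ v_all / (np.linalg.norm(v_sub) * np.linalg.norm(v_all))  # Eq. (4)
            confirmations.append(phi)
    return float(np.mean(confirmations))


def c_v(npmi_matrices: Sequence[np.ndarray], gamma: float = 1.0) -> float:
    """Average of the per-topic coherences."""
    return float(np.mean([topic_coherence(m, gamma) for m in npmi_matrices]))


# Hypothetical NPMI matrix for a topic with three top words (symmetric, ones on the diagonal).
example = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.4], [0.2, 0.4, 1.0]])
print(topic_coherence(example))
```

Enumerating every non-empty subset costs \(2^N - 1\) cosine evaluations per topic (1023 for N = 10); restricting the loop to `size = 1` gives the cheaper one-word segmentation described by Röder et al. (2015). A subset whose context vector is all zeros would make the cosine undefined, so real data may need a guard.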

About this article

Cite this article

Vargas-Calderón, V., Moros Ochoa, A., Castro Nieto, G.Y. et al. Machine learning for assessing quality of service in the hospitality sector based on customer reviews. Inf Technol Tourism 23, 351–379 (2021). https://doi.org/10.1007/s40558-021-00207-4

Keywords

  • Quality of service
  • Natural language processing
  • Word embedding
  • Latent topic analysis
  • Dimensionality reduction