1 Introduction

Hotel selection is an important decision in making a travel plan and is directly related to satisfaction with the travel experience. Since most tourists know little about their destination, they may find it very difficult to select a satisfactory hotel. With the development of online hotel booking websites and social media such as booking.com, Airbnb.com, and facebook.com, the number of online hotel reviews, which reflect the real experiences and opinions of consumers [2], has exploded [1]. Benefiting from online reviews, tourists can learn about the alternative hotels at their travel destinations without being there [3]. About 88% of customers tend to check online reviews before purchasing a product, and the percentage is growing [4]. Meanwhile, customers tend to share their purchase experience by leaving online reviews on websites and social media. Because of the large number of online reviews, potential customers need to spend considerable time and effort identifying helpful information about the various products and services in the hospitality industry [5], such as travel products and restaurant services [6]. Methods for evaluating review helpfulness have therefore attracted a lot of attention from researchers [7, 8]. Fake online reviews are another kind of useless information that should be considered for removal from data sets [9].

Multi-criteria decision-making is a widely used method in hotel selection with online reviews [10]. To develop reasonable and effective hotel selection methods, existing studies have focused on three main problems. Firstly, some researchers proposed sentiment analysis methods of online reviews for hotel selection [11]. Secondly, criteria extraction and prioritization methods for hotel selection have been studied [21]. Zhang et al. [12] extracted 8 criteria and 149 related keywords from online reviews. Bi et al. [13] used 6 criteria and found asymmetric effects of attribute performance. Thirdly, some researchers are interested in the group tourism problem [3]. However, the existing studies leave the following issues to be addressed:

The first issue is to obtain the helpful reviews and criteria for hotel selection [15]. The helpfulness of online reviews is ignored by the existing hotel selection methods. The online reviews of opinion leaders provide a solution to this problem. RFM is an effective method for identifying key customers in marketing [25, 26]. For identifying opinion leaders, the social network is a vital factor that cannot be ignored. Furthermore, hotel selection criteria obtained from opinion leaders will be more helpful.

The second issue is to obtain the weight vectors of criteria. In real life, as hotel selection is a non-expert decision-making problem, even if tourists know the criteria for selecting a hotel, the weights of these criteria will vary depending on the travel destination. For example, when public transportation at a travel destination is underdeveloped, "location" should be given a higher weight. Conventional research considered either objective weight vectors of criteria (extracted from online reviews) or subjective weight vectors (given by travelers), but not both, which is not comprehensive. Besides, with subjective weighting methods it is difficult to ensure the consistency of the weight vector. The BWM has an advantage in ensuring the consistency of criteria weights because it reduces the number of comparisons [14].

To address the above research gaps, this paper proposes a novel hotel selection decision support model based on the online reviews from opinion leaders. The main purposes of this study are as follows:

  1. To obtain helpful online reviews, an RFMP model is proposed to identify the opinion leaders among the customers who leave reviews on hotel booking websites. Hotel selection criteria can then be extracted by the Word2Vec method from the online reviews of the opinion leaders.

  2. To obtain the weight vectors of criteria, a combined subjective and objective weighting method is proposed. The objective weight vectors are obtained from the opinion leaders. The subjective weight vectors are obtained by the BWM.

This study is organized as follows: Section 2 gives an overview of the related work. Section 3 describes the proposed methodology for hotel selection with online reviews. Section 4 presents a case study of Mafengwo.com to illustrate the application of the proposed method; comparative analyses are also presented in this section. Section 5 presents the results and sensitivity analysis. Finally, the conclusion and future work are presented in Sect. 6.

2 Literature Review

Previous research on hotel choice has focused on two categories [27, 28]. The first emphasizes the criteria that tourists may consider when booking and selecting a hotel. The second builds hotel-choice models that help tourists decide, an important research direction with a growing number of cases supported by the fuzzy MCDM approach. Xu and Li [29] state that the criteria most likely to influence visitors’ decisions are room quality, staff attitudes and behavior, location, transportation, value, and food. Yen and Tang [30] specifically analyze the impact of hotel criteria on eWOM behavior and provide evidence for the relevance of these criteria to hotel performance. Kwok and Lau [31] explored an improved TOPSIS-based decision support algorithm for tourists choosing hotels. Adal [32] introduced a new integrated MCDM approach in this context, in which a stepwise weight assessment ratio analysis method was used to determine the criteria weights and an operational competitive rating analysis was used to find the optimal hotel. Yu et al. [33] developed an MCDM model based on VIseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR, Serbian) using online rating-based LDAs to solve the hotel selection problem. Peng et al. [27] used probabilistic linguistic term sets (PLTSs) converted from customer ratings to represent evaluation information and investigated a cloud decision support model to search for the best choice for tourists. They confirmed the advantages of applying PLTSs and LDAs in a hotel setting. However, users who have not visited the destination are not confident in their own criteria weights and need to refer to the evaluation criteria of opinion leaders; moreover, determining the criteria weights is complex, and current traditional methods have many shortcomings [38].

As hotel selection is a non-expert decision-making problem, online reviews provide decision-making opinions and weight information for tourists who have never booked the alternative hotels. The main issue is how to obtain helpful online reviews so as to better leverage the impact of online evaluations on decision-makers. In general, the validity of online reviews is analyzed predominantly with two approaches. Firstly, handcrafted features, such as structural statistics and sentiment features of online reviews, are used to predict or identify the helpfulness of online reviews [15,16,17,18]. In some studies, comments are classified according to the information quality, readability, and subjectivity of the text content [19, 20]. Secondly, features are extracted automatically based on deep learning [21]. Chen [22] conducted a sentiment-analysis-based study and concluded that negative emotions attract more helpful votes, while male readers prefer comments with positive emotions. Although both statistical and NLP-based approaches were incorporated by Zhou [24], the semantic understanding of the comment content remained unaddressed.

Existing methods of obtaining weights for hotel selection fall mainly into two types: subjective weights given by decision-makers and objective weights calculated using machine learning algorithms. Research on online reviews has two main problems regarding weight acquisition: the acquisition methods are complicated, and they do not take both subjective and objective weights into account [36]. Rezaei [14] developed a Multi-criteria Decision Making (MCDM) method named the Best Worst Method (BWM). Compared to the Analytic Hierarchy Process (AHP), one of the MCDM methods based on pairwise comparisons, BWM requires less comparative data while producing more consistent comparisons, allowing it to produce more reliable results based on previous analyses [14]. Due to its simplicity and reliability, BWM has been widely applied to solve a range of different problems [34, 35]. Moreover, few past studies of online hotel evaluations have considered both subjective and objective weights in the design of weights. Nowadays, decision-makers want to seek the opinions of opinion leaders on the one hand and to ensure that their own preferences are reflected on the other [37].

3 Methodology

3.1 Problem Description

To help travelers select the most appropriate hotel at a travel destination they have never visited, a framework for a hotel selection decision support model based on the online reviews from opinion leaders is constructed. The proposed framework consists of the following four parts: (1) obtain the opinion leaders and their online reviews by the proposed RFMP method; (2) extract the hotel selection criteria from the online reviews of the opinion leaders by the Word2Vec method; (3) calculate the weight vector of the criteria as a linear weighting of the objective and subjective weights controlled by a parameter; (4) select the ideal hotel by the TOPSIS method. The proposed framework is shown in Fig. 1.

Fig. 1 The structure of a novel hotel selection decision support model

The following notations are used to denote the sets and variables in the problem, which will be used throughout this paper:

  • \(A = \left\{ A_1, A_2, ..., A_m \right\}\): the set of m hotels, where \(A_i\) denotes the i-th alternative, \(i = 1,...,m\).

  • \(C = \left\{ C_1, C_2, ..., C_n \right\}\): the set of n hotel criteria, where \(C_j\) denotes the j-th criterion, \(j = 1,...,n\).

  • \(W = \left\{ W_1, W_2, ..., W_n \right\}\): the vector of criteria weights, where \(W_j\) denotes the weight of criterion \(C_j\), \(W_j \geqslant 0\) and \(\sum \nolimits _{j = 1}^n W_j = 1\), \(j = 1,2,...,n\).

  • \(U\_ID = \left\{ User_h^{A_i} \right\}_{k \times m}\): the user set, where \(User_h^{A_i}\) denotes the h-th user who gave online reviews of hotel \(A_i\), \(h = 1,...,k\), \(i = 1,...,m\).

3.2 Data Preparation

Since identifying opinion leaders improves the helpfulness of online reviews on online travel platforms (OTPs), this study crawled online hotel reviews (OHRs) from these platforms together with the features and social network centrality of users. The crawled fields include the online evaluation time, evaluation score, evaluation text, number of evaluations for a moment, impact value of online reviews for a moment, user level, user fans, and number of user followings. These data are pre-processed so that opinion leaders can be filtered for subsequent analysis with the RFMP model. The abbreviations of the proper nouns are listed in Table 1.

Table 1 List of Abbreviations

In this study, Mafengwo.com, one of the leading OTPs in China, is selected. A Python crawler collects the online reviews and the related user data for multiple hotels in a specified destination and stores the crawled data according to fixed rules. The crawled data are pre-processed in the following steps:

Step 1: Crawling for OHRs

The crawled dataset includes hotel name, user ID, ET, ES, OET, time of the last review, number of reviews in three years, IVORs, ULs, UFs, NUFs.

Step 2: Deleting incomplete online reviews

Each valid record consists of the 11 dimensions listed in Step 1. All data crawled in Step 1 are cleaned, and records with missing dimensions are eliminated.

Step 3: Standardizing data

Definition 1

The date of last online reviews (DLORs) is defined as follows, where NT denotes the current time (now time).

$$\begin{aligned} T^{user_h} = \{ OET_1, OET_2, OET_3, OET_4, ..., OET_n \}, \quad h = 1,2,3,...,k \end{aligned}$$
(1)
$$\begin{aligned} T_{DLORs}^{user_h} = \min \limits _{j} \{ \mathrm{NT} - OET_j \}, \quad h = 1,2,3,...,k \end{aligned}$$
(2)

where \(OET_j\) denotes the time of the j-th comment of the h-th user, \(T^{user_h}\) denotes the set of online review times of the h-th user, and \(T_{DLORs}^{user_h}\) denotes the time interval between the last evaluation of the h-th user and the current time.
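As a minimal sketch of Eq. 2, the interval can be computed as the smallest gap between the current time and any of a user's review dates. The function name and the sample dates below are hypothetical and serve only as an illustration; time is counted in days.

```python
from datetime import date


def days_since_last_review(review_dates, now):
    """Eq. 2: smallest gap, in days, between `now` (NT) and any review date (OET_j)."""
    return min((now - d).days for d in review_dates)


# Hypothetical review dates of one user (illustrative data only).
reviews = [date(2021, 3, 1), date(2021, 11, 20), date(2020, 7, 4)]
dlors = days_since_last_review(reviews, now=date(2021, 12, 1))
print(dlors)  # 11
```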

After obtaining the DLORs for each user, this paper standardizes each dimension of every data item to eliminate the effect of different measurement units on the metrics.

$$\begin{aligned} S\_Data = \left\{ \begin{array}{ll} \dfrac{Data_h^{p_j} - Data_{\min }^{p_j}}{Data_{\max }^{p_j} - Data_{\min }^{p_j}}, & p_j \in \{ NEMs, IVORs, ULs, UFs, NUFs \} \\ - \dfrac{Data_h^{p_j} - Data_{\min }^{p_j}}{Data_{\max }^{p_j} - Data_{\min }^{p_j}}, & p_j = T_{DLORs}^{user_h} \end{array} \right. \quad h = 1,2,...,k \end{aligned}$$
(3)

where \(Data_h^{p_j}\) denotes the value of indicator \(p_j\) for the h-th user, and \(Data_{\max }^{p_j}\) and \(Data_{\min }^{p_j}\) denote the maximum and minimum values of indicator \(p_j\) over all users, respectively. The negative sign in the second case reflects that a smaller \(T_{DLORs}^{user_h}\), that is, a more recent review, is better.
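The standardization of Eq. 3 can be sketched as a plain min-max scaling with an optional sign flip for the DLORs indicator. The helper name `min_max`, the handling of a constant column, and the sample values are assumptions made for this illustration.

```python
def min_max(values, negate=False):
    """Min-max scale `values` to [0, 1]; with negate=True the result is
    multiplied by -1, as Eq. 3 does for the DLORs indicator."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant indicator carries no information, so map it to 0.
        return [0.0 for _ in values]
    sign = -1.0 if negate else 1.0
    return [sign * (v - lo) / (hi - lo) for v in values]


# Hypothetical values for three users (illustrative data only).
ufs = [10, 250, 40]      # user fans
dlors = [11, 400, 90]    # days since last review
s_ufs = min_max(ufs)
s_dlors = min_max(dlors, negate=True)
```

With these inputs the user with the most fans scores 1.0 on UFs, while the user with the oldest last review scores -1.0 on DLORs.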

3.3 Best-Worst Method (Algorithm 1)

This paper adopts the Best Worst Method (BWM) [27], which has been shown to be superior to the AHP method with respect to computational complexity. The computational procedure consists of the following six steps:

Step 1. Defining a set of evaluation criteria: \(C = \{ {C_{_1}},{C_{_2}},...,{C_{_n}}\}\).

Step 2. Selecting the best and the worst criteria, respectively.

Step 3. Determine the priority of the best criterion over each of the other criteria.

A number between 1 and 9 is used to express the preference of the best criterion over each of the other criteria. The resulting Best-to-Others vector (BOV) is \(A_B = \{ a_{B1}, a_{B2}, ..., a_{Bn} \}\), where \(a_{Bj}\) indicates the preference of the best criterion over criterion \(C_j\). Clearly, \(a_{BB} = 1\).

Step 4. Determine the priority of each criterion over the worst criterion.

A number between 1 and 9 is used to express the preference of each criterion over the worst criterion. The resulting Others-to-Worst vector (OWV) is \(A_W = \{ a_{1W}, a_{2W}, ..., a_{nW} \}^T\), where \(a_{jW}\) indicates the preference of criterion j over the worst criterion W. Clearly, \(a_{WW} = 1\).

The values of BOV and OWV are shown in Table 2.

Table 2 Pairwise comparison vector for BOV and OWV

Step 5. Solve for the optimal weights \(W^* = \{ W_1^*, W_2^*, ..., W_n^* \}\). The optimal weights are those for which, for each pair \(W_B/W_j\) and \(W_j/W_W\), we have \(W_B/W_j = a_{Bj}\) and \(W_j/W_W = a_{jW}\). To satisfy these conditions for all j, we seek a solution that minimizes the maximum of the absolute differences \(\left| \frac{W_B}{W_j} - a_{Bj} \right|\) and \(\left| \frac{W_j}{W_W} - a_{jW} \right|\) over all j. Considering the non-negativity and sum conditions on the weights, the following problem results:

$$\begin{aligned} & \min \max \limits _{j} \left\{ \left| \frac{W_B}{W_j} - a_{Bj} \right| , \left| \frac{W_j}{W_W} - a_{jW} \right| \right\} \\ & \text{s.t.} \left\{ \begin{array}{l} \sum \limits _{j} W_j = 1 \\ W_j \ge 0, \text{ for all } j \end{array} \right. \end{aligned}$$
(4)

where \(W_j\) denotes the weight of criterion j, and \(W_B\) and \(W_W\) denote the weights of the best and the worst criterion, respectively. \(a_{Bj}\) denotes the preference score of the best criterion over the j-th criterion; similarly, \(a_{jW}\) denotes the preference score of the j-th criterion over the worst criterion. The problem can be transformed into the following problem:

$$\begin{aligned} & \min \delta \\ & \text{s.t.} \left\{ \begin{array}{l} \left| \frac{W_B}{W_j} - a_{Bj} \right| \le \delta \\ \left| \frac{W_j}{W_W} - a_{jW} \right| \le \delta \\ \sum \limits _{j} W_j = 1 \\ W_j \ge 0, \text{ for all } j \end{array} \right. \end{aligned}$$
(5)

where the symbols are as in Eq. 4 and \(\delta\) is an auxiliary variable that bounds the maximum absolute difference and is minimized.

Solving the problem yields the optimal weights \(W^* = \{ W_1^*, W_2^*, ..., W_n^* \}\) and \(\delta ^*\).
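To make the min-max problem of Eq. 4 concrete, the sketch below searches a grid of strictly positive weight vectors that sum to one and keeps the one with the smallest maximum deviation. This brute-force search is an illustrative stand-in chosen here, not the solver used in this paper; it is only practical for a handful of criteria, and the function name, the `steps` resolution, and the sample comparison vectors are assumptions.

```python
from itertools import product


def bwm_grid_search(best_to_others, others_to_worst, best, worst, steps=100):
    """Approximate the BWM weights of Eq. 4 by exhaustive search on a grid.

    best_to_others[j]  : a_Bj, preference of the best criterion over criterion j
    others_to_worst[j] : a_jW, preference of criterion j over the worst criterion
    best, worst        : indices of the best and worst criteria
    Returns (weights, delta), where delta is the minimized maximum deviation.
    """
    n = len(best_to_others)
    best_w, best_delta = None, float("inf")
    # Enumerate weights k/steps with k >= 1 so that no ratio divides by zero.
    for head in product(range(1, steps), repeat=n - 1):
        last = steps - sum(head)
        if last < 1:
            continue
        w = [k / steps for k in head] + [last / steps]
        delta = max(
            max(abs(w[best] / w[j] - best_to_others[j]),
                abs(w[j] / w[worst] - others_to_worst[j]))
            for j in range(n)
        )
        if delta < best_delta:
            best_w, best_delta = w, delta
    return best_w, best_delta


# Hypothetical, fully consistent comparisons for three criteria.
weights, delta = bwm_grid_search([1, 2, 6], [6, 3, 1], best=0, worst=2)
```

For this consistent example the search returns weights close to 0.6, 0.3, and 0.1 with a deviation close to zero; for inconsistent comparisons the returned deviation plays the role of \(\delta ^*\) up to the grid resolution.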

Step 6. Calculate the consistency ratio and, if necessary, improve the consistency of the criteria weights obtained. To verify the consistency of the results, the consistency index (CI) is first found through the established relationship:

$$\begin{aligned} CI = \frac{\lambda _{\max } - n}{n - 1} \end{aligned}$$
(6)

where \(\lambda _{\max }\) is the maximum eigenvalue of the matrix.

Then the required consistency ratio (CR) is the ratio of \(\delta ^*\) to the consistency index:

$$\begin{aligned} CR = \frac{\delta ^*}{CI^{a_{BW}}} \end{aligned}$$
(7)

When \(CR \leqslant 0.1\), the expected level of consistency is achieved. Otherwise, the consistency of the preference relations (PRs) can be improved by modifying some values in the PRs. The smaller the CR value obtained, the better the consistency and the more scientific the solution [27].
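The ratio in Eq. 7 can be sketched as follows. The sketch assumes that the index in the denominator is the one used in the original BWM formulation, where it depends only on \(a_{BW}\) and equals the smaller root of \(\xi ^2 - (1 + 2a_{BW})\xi + (a_{BW}^2 - a_{BW}) = 0\); this reading is an assumption made here and differs from the eigenvalue expression of Eq. 6. The function names are hypothetical.

```python
import math


def consistency_index(a_bw):
    """Assumed BWM consistency index for a given a_BW: the smaller root of
    xi^2 - (1 + 2*a_BW)*xi + (a_BW^2 - a_BW) = 0."""
    return ((1 + 2 * a_bw) - math.sqrt(1 + 8 * a_bw)) / 2


def consistency_ratio(delta_star, a_bw):
    """Eq. 7: ratio of the optimal deviation delta* to the consistency index."""
    ci = consistency_index(a_bw)
    if ci == 0:
        # a_BW = 1 means best and worst are equally preferred; treated as consistent.
        return 0.0
    return delta_star / ci


# Hypothetical values: delta* = 0.05 with a_BW = 3.
cr = consistency_ratio(0.05, 3)
is_consistent = cr <= 0.1
```

Under this assumption the index takes the values 0.44, 1.00, and 5.23 for \(a_{BW}\) equal to 2, 3, and 9.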

Thus, the pseudo code of the Best Worst Method is shown in Algorithm 1 for obtaining the criteria weight of hotel selection.

Algorithm 1 Pseudo-code of the Best Worst Method

3.4 Extract the Online Text Review of Opinion Leaders

3.4.1 Basic Marketing Model (RFM)

According to Arthur Hughes of the Database Marketing Institute, three elements in the customer database make up the best metrics for data analysis. The RFM model, based on these three important indicators of customer behavior, is used to analyze customer value: R represents recency, F represents frequency, and M represents monetary.

The RFM analysis is based on the following assumptions: (1) customers who have made purchases recently are more likely to purchase again than customers who have not; (2) customers who purchase more frequently are more likely to buy the company’s products (services) again than those who purchase less frequently; (3) customers with a higher total purchase amount are more likely to buy again and are customers of higher value.

Definition 2

(Recency) For a \(Reviewe{r_h}\) , the recency of his consumption on OTPs can be expressed as:

$$\begin{aligned} R^{use{r_h}} = T_{DLORs}^{use{r_h}},h = 1,2,3,...,k \end{aligned}$$
(8)

where \(R^{user_h}\) denotes the recency of the h-th user in posting online reviews on OTPs.

Recency is the length of time, measured in days, from the user's last consumption to the present. Note that R is a time interval, so a smaller R means more recent consumption and a greater customer value; this is why the DLORs term is negated during standardization in Eq. 3.

Definition 3

(Frequency) For a \(Reviewe{r_h}\) , the frequency of his consumption on OTPs can be expressed as:

$$\begin{aligned} {F^{use{r_h}}} = \sum \limits _{l = 1}^e {Reviewer_l^{use{r_h}}} ,\,l = 1,2,...,e;\,h = 1,2,...,k \end{aligned}$$
(9)

where \(F^{user_h}\) denotes the frequency with which the h-th user posts online reviews on OTPs, and \(Reviewer_l^{user_h}\) denotes the l-th online rating posted by the h-th user on OTPs. Frequency indicates how many times the user has consumed within a certain period, which reflects user activity.

Definition 4

(Monetary) For a \(reviewe{r_h}\), the monetary of his consumption on OTPs can be expressed as:

$$\begin{aligned} {M^{use{r_h}}} = \sum \limits _{l = 1}^e {Mone_l^{use{r_h}}} ,\,l = 1,2,...,e;\,h = 1,2,...,k \end{aligned}$$
(10)

where \(M^{user_h}\) denotes the impact power of the h-th user's online reviews on OTPs, and \(Mone_l^{user_h}\) denotes the impact of the l-th online rating posted by the h-th user on OTPs. In the original RFM model, monetary is the amount of money a user spends over a period; the score rises with the transaction amount and reflects customer value.
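Definitions 2 to 4 can be sketched together as one pass over the review records: recency is the number of days since a user's latest review, frequency is the number of reviews, and monetary is the sum of the review impact values. The record layout `(user_id, review_date, impact_value)` and the sample records are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def rfm(reviews, now):
    """Return {user_id: (R, F, M)} following Eqs. 8-10.

    reviews: iterable of (user_id, review_date, impact_value) records.
    R is in days since the latest review, F counts reviews, M sums impact values.
    """
    last = {}
    freq = defaultdict(int)
    monetary = defaultdict(float)
    for user, day, impact in reviews:
        last[user] = max(last.get(user, day), day)
        freq[user] += 1
        monetary[user] += impact
    return {u: ((now - last[u]).days, freq[u], monetary[u]) for u in last}


# Hypothetical review records (illustrative data only).
records = [
    ("u1", date(2021, 11, 20), 3.0),
    ("u1", date(2021, 3, 1), 1.5),
    ("u2", date(2020, 7, 4), 0.5),
]
scores = rfm(records, now=date(2021, 12, 1))
```

The resulting raw values would then be standardized per indicator before being aggregated.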

3.4.2 RFMP Model

Based on the RFM marketing model, this paper adds popularity (P) as a new indicator and proposes the RFMP model to aggregate the criteria for each online review. The BWM is applied to this multi-criteria problem to obtain the weights and the influence value of online reviews. To measure the credibility of a review publisher, the overall helpfulness score of the publisher under the RFMP model is expressed as Eq. 11:

$$\begin{aligned} S_{user_h} = w_R \times \overline{S_{R_h}} + w_F \times \overline{S_{F_h}} + w_M \times \overline{S_{M_h}} + w_P \times S_{P^{user_h}}, \quad h = 1,2,...,k \end{aligned}$$
(11)

where \(\overline{S_{R_h}}, \overline{S_{F_h}}, \overline{S_{M_h}}\) are the standardized values of \(S_{R_h}, S_{F_h}, S_{M_h}\), respectively, and \(S_{P^{user_h}}\) is the popularity score given by Eq. 12.

The parameter P is a function of three indicators: ULs, UFs, and NUFs. Since these three indicators do not contribute equally, how to allocate weights when aggregating them is one of the first issues the model must address. According to Algorithm 1, the vectors \(BOV = A_B = \{ a_{NFs,1}, a_{NFs,2}, a_{NFs,3} \}\) and \(OWV = A_W = \{ a_{1,ULs}, a_{2,ULs}, a_{3,ULs} \}^T\) are given by experts to assess the degree of preference between the criteria. The centrality score of the review publisher can be expressed as:

$$\begin{aligned} S_{P^{user_h}} = w_{ULs} \times \overline{S_{ULs}} + w_{UFs} \times \overline{S_{UFs}} + w_{NUFs} \times \overline{S_{NUFs}}, \quad h = 1,2,...,k \end{aligned}$$
(12)

where \(\overline{S_{ULs}}, \overline{S_{UFs}}, \overline{S_{NUFs}}\) are the standardized values of \(S_{ULs}, S_{UFs}, S_{NUFs}\), respectively.

Similar to previous studies that applied RFM analysis to calculate customer value, the value of the review publisher is measured here.

The best and worst criteria are selected from the set \(C = \{ R, F, M, P\}\) according to the expert’s preference, and the expert gives the BOV and OWV, from which the respective criteria weights are computed by Algorithm 1. Data standardization is carried out during pre-processing to eliminate the influence of different measurement scales.

We aggregate the four criteria: the weight of each indicator is calculated and multiplied by the corresponding indicator score. Review publishers whose overall scores exceed a set threshold are regarded as opinion leaders.
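The aggregation in Eqs. 11 and 12 and the threshold selection can be sketched as below. The weight values, the threshold, and the function names are hypothetical; in the paper the weights come from the BWM and the inputs are the standardized indicator values.

```python
def popularity(uls, ufs, nufs, weights):
    """Eq. 12: weighted sum of the standardized ULs, UFs and NUFs values."""
    return weights["ULs"] * uls + weights["UFs"] * ufs + weights["NUFs"] * nufs


def rfmp_score(r, f, m, p, weights):
    """Eq. 11: weighted sum of the standardized R, F, M values and the P score."""
    return (weights["R"] * r + weights["F"] * f
            + weights["M"] * m + weights["P"] * p)


def opinion_leaders(scores, threshold):
    """Users whose overall RFMP score reaches the threshold, in sorted order."""
    return sorted(u for u, s in scores.items() if s >= threshold)


# Hypothetical weights (assumed here; the paper derives them with the BWM).
w_p = {"ULs": 0.5, "UFs": 0.25, "NUFs": 0.25}
w = {"R": 0.25, "F": 0.25, "M": 0.25, "P": 0.25}

p_u1 = popularity(1.0, 0.5, 0.0, w_p)
s_u1 = rfmp_score(-0.5, 1.0, 0.5, p_u1, w)
leaders = opinion_leaders({"u1": s_u1, "u2": 0.1}, threshold=0.3)
```

Note that the standardized recency value is negative by construction, so a long gap since the last review lowers the overall score.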

3.5 Obtaining the Weight Vector of the Hotel Selection Criteria

This study uses the following six first-level criteria of hotel selection, crawled from mafengwo.com: Position, Service, Cleanliness, Comfort, Facility, and Food. Based on the crawled criteria, the word2vec algorithm is used to aggregate the preferences of the opinion leaders over them and calculate the objective evaluation criteria, which yield the objective weight vectors proposed in this paper. In addition, the subjective weight vectors are obtained by calculating the preferences of each decision-maker based on the BWM.

Hotel selection is a multi-criteria decision-making problem based on online reviews. Tourists who have stayed at a hotel often leave an online review covering several criteria on the website, based on their experience. However, it is unrealistic to expect all users to comment on a particular criterion in the same words. For example, both "It’s so convenient to go to the subway station" and "The hotel is in the center of the city, very convenient transportation" are about the position but use different words. Therefore, words describing the same criterion need to be organized under a first-level criterion word, which covers a number of feature words.

In this subsection, the Word2Vec algorithm is used to construct a first-level criteria set for hotel selection and its related feature words. Since the Word2Vec algorithm can be used to determine the similarity between two pieces of text, in this study it is used to calculate the similarity in semantic space between a feature word extracted from an online review and the first-level criteria words, as well as the similarity between the first-level criteria words themselves. Words with higher similarity are collected into the corresponding criteria dictionaries. The specific process is as follows. First, each review in the review set \(S = \{ s_1, s_2, ..., s_e \}\) is segmented into words using word segmentation technology according to dependency parsing. Next, the words are converted into the corresponding space vectors, and the top N words most similar to each first-level criterion word are obtained using the Word2Vec algorithm. This step is implemented with the word2vec.Word2Vec class of gensim.models in Python 3.7. After manually removing some unqualified words, the criteria dictionary set \(D = \{ D_1, D_2, ..., D_n \}\) is obtained. The pseudo-code of the algorithm for extracting the criteria set is shown in Algorithm 2.
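The top-N lookup described above amounts to ranking words by cosine similarity to a criterion word in the embedding space. The sketch below shows that ranking step only, on tiny hand-written vectors that are purely hypothetical; in practice the vectors would come from a trained Word2Vec model, and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_similar(seed, vectors, n):
    """Return the n words whose vectors are closest to the seed word's vector."""
    scored = [(word, cosine(vectors[seed], vec))
              for word, vec in vectors.items() if word != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:n]]


# Hypothetical 3-dimensional embeddings (illustrative values, not trained).
vectors = {
    "position": [0.9, 0.1, 0.0],
    "subway": [0.8, 0.2, 0.1],
    "downtown": [0.7, 0.1, 0.2],
    "breakfast": [0.0, 0.9, 0.3],
}
position_words = top_n_similar("position", vectors, n=2)
print(position_words)  # ['subway', 'downtown']
```

The words returned for each first-level criterion would then be screened manually, as described above, before entering the criteria dictionary.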

The following Sects. 3.5.1 and 3.5.2 focus on the acquisition and use of the objective and subjective weights, respectively.

Algorithm 2 Pseudo-code for extracting the criteria set

3.5.1 Obtaining Objective Weights of the Criteria

Since hotel selection is a typical non-expert decision-making problem, identifying opinion leaders helps decision makers filter out the helpful online reviews and provides objective evidence for determining the weight vector of the criteria. Based on the RFMP model, opinion leaders (OLs) on the OTPs can be identified, and the objective weight vectors in the evaluation model rely mainly on analyzing the online review texts of the OLs. The online comments of the OLs are analyzed in the following four steps.

Step 1. Extract OLs’ online reviews.

Each online review is processed separately, using the third-party Python data-analysis library pandas.

Step 2. Text pre-processing.

Three basic procedures (tokenization, removal of capitalization, and removal of stop words and non-characters) were used to process the crawled text comments. The aim of tokenization is to break up text into words, phrases, or other tokens; it is performed with the third-party Python library jieba. In this study, each textual comment was broken down into words. Stop words (e.g., a, an, the, of, for, he, and she), which do not contribute significantly to text mining, should be filtered out before text processing. The applied stop words include an existing list of stop words (see https://github.com/goto456/stopwords). Non-Chinese characters, such as \([,.?!():;']\) and emoji, were also removed.
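The three procedures can be sketched as a small pipeline. This is a simplified illustration: it strips punctuation and emoji with a regular expression instead of removing all non-Chinese characters, and it defaults to whitespace tokenization so that it runs without third-party packages; for Chinese reviews a segmenter such as `jieba.lcut` would be passed as `tokenize`. The function name and sample sentence are hypothetical.

```python
import re


def preprocess(text, stop_words, tokenize=str.split):
    """Lowercase, drop punctuation and emoji, tokenize, and remove stop words."""
    # Replace every character that is neither a word character nor whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in tokenize(cleaned) if tok not in stop_words]


# Hypothetical English example so the sketch runs with the default tokenizer.
tokens = preprocess("The hotel is GREAT!! :) Near the subway.", {"the", "is"})
print(tokens)  # ['hotel', 'great', 'near', 'subway']
```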

Step 3. Extract and classify the textual online reviews into \(D_j = \{ word_1^{s_l}, word_2^{s_l}, ..., word_j^{s_l}, ..., word_n^{s_l} \}\) based on Algorithm 2.

Step 4. Obtain the objective weight vectors based on Eq. 13.

$$\begin{aligned} P(X = D_j) = \frac{\sum \limits _{l = 1}^e word_j^{s_l}}{\sum \limits _{j' = 1}^6 \sum \limits _{l = 1}^e word_{j'}^{s_l}}, \quad j = 1,2,...,6 \end{aligned}$$
(13)

where j indexes the six criteria. The objective weight of each criterion is thus approximated by its percentage share of the total over all six criteria.
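A minimal sketch of the percentage calculation in Eq. 13 follows. It assumes that the quantity summed for criterion j is the number of occurrences of that criterion's dictionary words in the opinion leaders' reviews; this reading, the function name, and the sample dictionaries are assumptions made for illustration.

```python
from collections import Counter


def objective_weights(tokenized_reviews, dictionaries):
    """Share of feature-word occurrences per criterion (one reading of Eq. 13).

    tokenized_reviews: list of token lists, one per opinion-leader review.
    dictionaries: {criterion: set of feature words}.
    """
    counts = Counter()
    for tokens in tokenized_reviews:
        for criterion, words in dictionaries.items():
            counts[criterion] += sum(1 for tok in tokens if tok in words)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in dictionaries}


# Hypothetical dictionaries and reviews (illustrative data only).
dictionaries = {"Position": {"subway", "downtown"}, "Food": {"breakfast"}}
reviews = [["subway", "breakfast", "clean"], ["downtown", "subway"]]
obj_w = objective_weights(reviews, dictionaries)
print(obj_w)  # {'Position': 0.75, 'Food': 0.25}
```

By construction the resulting weights are non-negative and sum to one whenever at least one feature word occurs.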

3.5.2 Obtaining Subjective Weights of the Criteria

Every traveler has his/her own preferences in hotel selection, and it is incomplete to aggregate the decision preference using only the objective weight vectors obtained from the online reviews of opinion leaders. How decision makers can reflect their own preferences is another issue considered in this paper. In the proposed decision support model, decision makers obtain expert preference opinions through the online reviews of OLs, and this information is used as the objective weights of the criteria. Meanwhile, the decision maker gives his/her own preferences, from which the subjective weights of the criteria are calculated with the BWM. The steps to solve for the subjective weights of the criteria are as follows:

Step 1: Selecting the most desirable and the least important criteria, respectively.

Step 2: Determining \(BOV = A_B = \{ {a_\mathrm{{1}}},{a_\mathrm{{2}}},...,{a_n}\} \;and\; OWV = A_W = {\{ {b_1},{b_2},...,{b_n}\}}\) to assess the degree of PRs between criteria.

Step 3: Calling Algorithm 1 to solve for the subjective weights of the criteria.

Step 4: Calculate the consistency of the subjective weight vectors by computing the consistency ratio (CR) with Eq. 7.

The weights derived from the steps above are considered as an important component of making the final decision weights, which is a collection of OLs’ recommendations and personal preferences through the weight aggregation method.

3.5.3 Obtaining Combined Weight Vector

We calculated the objective weight vector and the subjective weight vector separately. The combined weight (CWAs) of each criterion is then obtained as shown in Eq. 14.

$$\begin{aligned} W_j = W_j^{OWAs} \times \beta + W_j^{SWAs} \times (1 - \beta ) \end{aligned}$$
(14)

where OWAs denotes the objective weight of a criterion, SWAs denotes the subjective weight of a criterion, and \(\beta \in \left[ 0,1 \right]\) is a parameter that controls the balance between the objective and subjective weights.
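Eq. 14 is a convex combination of the two weight vectors, which can be sketched as follows. The function name and the sample weights are hypothetical.

```python
def combine_weights(objective, subjective, beta):
    """Eq. 14: W_j = beta * objective_j + (1 - beta) * subjective_j."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return {c: beta * objective[c] + (1.0 - beta) * subjective[c]
            for c in objective}


# Hypothetical objective and subjective weights for two criteria.
objective = {"Position": 0.75, "Food": 0.25}
subjective = {"Position": 0.25, "Food": 0.75}
combined = combine_weights(objective, subjective, beta=0.5)
print(combined)  # {'Position': 0.5, 'Food': 0.5}
```

Because both input vectors sum to one, the combined weights also sum to one for any admissible \(\beta\); setting \(\beta = 1\) uses only the opinion leaders' weights and \(\beta = 0\) only the traveler's own.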

3.6 The Hotel Selection Process

Considering the stability of the decision solution in the decision-making process, the TOPSIS method is used to help travelers select hotels and analyze their decision results. The TOPSIS method was first proposed by C.L. Hwang and K. Yoon in 1981 and ranks alternatives by their closeness to the ideal solution [45]. It consists of a 6-step solution process. The hotel selection process based on TOPSIS is described step by step below:

Step 1: Constructing a preference matrix

The decision-maker creates the matrix A shown in Eq. 15:

$$\begin{aligned} {A_{\mathrm{{ij}}}} = \left[ {\begin{array}{*{20}{c}} {{a_{11}}}&{}{{a_{12}}}&{}{...}&{}{{a_{1n}}}\\ {{a_{21}}}&{}{{a_{22}}}&{}{...}&{}{{a_{2n}}}\\ {...}&{}{...}&{}{...}&{}{...}\\ {{a_{m1}}}&{}{{a_{m2}}}&{}{...}&{}{{a_{mn}}} \end{array}} \right] \end{aligned}$$
(15)

In the matrix, m represents the number of decision points and n represents the number of evaluation criteria.

Step 2: Constructing a Standard Decision Matrix (R)

The elements of the standard decision matrix are determined from the elements of matrix A using Eq. 16:

$$\begin{aligned} r_{ij} = \frac{a_{ij}}{\sqrt{\sum \limits _{i = 1}^m a_{ij}^2}}, \quad i = 1,2,\ldots ,m; \quad j = 1,2,\ldots ,n \end{aligned}$$
(16)

The matrix R is defined by the matrix shown below:

$$\begin{aligned} {R_{{{ij}}}} = \left[ {\begin{array}{*{20}{c}} {{r_{11}}}&{}{{r_{12}}}&{}{...}&{}{{r_{1n}}}\\ {{r_{21}}}&{}{{r_{22}}}&{}{...}&{}{{r_{2n}}}\\ {...}&{}{...}&{}{...}&{}{...}\\ {{r_{m1}}}&{}{{r_{m2}}}&{}{...}&{}{{r_{mn}}} \end{array}} \right] \end{aligned}$$
(17)

Step 3: Creating ideal (\(R^+\)) and negative ideal (\(R^-\)) solutions

The ideal solution and the negative ideal solution are found as shown in Eq. 18:

$$\begin{aligned} \begin{array}{l} R^+ = \{ (\max \limits _i r_{ij} \mid j \in J_1),(\min \limits _i r_{ij} \mid j \in J_2)\} = \{ r_1^+ ,r_2^+ ,\ldots ,r_n^+ \} \\ R^- = \{ (\min \limits _i r_{ij} \mid j \in J_1),(\max \limits _i r_{ij} \mid j \in J_2)\} = \{ r_1^- ,r_2^- ,\ldots ,r_n^- \} \end{array} \end{aligned}$$
(18)

where \(J_1\) is the set of benefit indicators and \(J_2\) is the set of cost indicators.
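Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy matrix (three alternatives, one benefit criterion and one cost criterion) are hypothetical and chosen only to show Eqs. 16 and 18 at work.

```python
import math

def normalize(matrix):
    """Vector-normalize each column of the decision matrix (Eq. 16)."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [[row[j] / norms[j] for j in range(n_cols)] for row in matrix]

def ideal_solutions(r, is_benefit):
    """Return (R+, R-) as in Eq. 18.

    is_benefit[j] is True when criterion j belongs to J1 (benefit)
    and False when it belongs to J2 (cost).
    """
    cols = list(zip(*r))
    r_plus = [max(c) if b else min(c) for c, b in zip(cols, is_benefit)]
    r_minus = [min(c) if b else max(c) for c, b in zip(cols, is_benefit)]
    return r_plus, r_minus

# Hypothetical toy data: column 0 is a rating (benefit), column 1 a price (cost).
A = [[4.0, 300.0],
     [3.0, 200.0],
     [5.0, 400.0]]
R = normalize(A)
R_plus, R_minus = ideal_solutions(R, is_benefit=[True, False])
```

For the benefit column the ideal value is the column maximum, while for the cost column it is the column minimum, so the third alternative supplies the ideal rating and the second supplies the ideal price.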

Step 4: The Euclidean distance is used to measure the deviation of the evaluation factor values at each decision point from the ideal solution and from the negative ideal solution. The ideal discrimination measure (\(S^+\)) and the negative ideal discrimination measure (\(S^-\)) are calculated as shown in Eq. 19:

$$\begin{aligned} S_i^+&= \sqrt{\sum \limits _{j = 1}^n {[r_{ij} w_j - r_j^+ w_j]}^2} = \sqrt{\sum \limits _{j = 1}^n w_j^2 {[r_{ij} - r_j^+ ]}^2}, \quad i = 1,2,\ldots ,m \\ S_i^-&= \sqrt{\sum \limits _{j = 1}^n {[r_{ij} w_j - r_j^- w_j]}^2} = \sqrt{\sum \limits _{j = 1}^n w_j^2 {[r_{ij} - r_j^- ]}^2}, \quad i = 1,2,\ldots ,m \end{aligned}$$
(19)

Step 5: The relative proximity \(C_i^*\) of each decision point to the ideal solution is calculated as shown in Eq. 20:

$$\begin{aligned} C_i^* = \frac{S_i^-}{S_i^- + S_i^+} \end{aligned}$$
(20)

The value \(C_i^ *\) is in the range \(0 \le C_i^ * \le 1\) and \(C_i^ * = 1\) indicates the absolute proximity of the corresponding decision point to the ideal solution.

In the TOPSIS procedure above, this paper adopts a combined subjective-objective weighting approach to create the weighted standard decision matrix (the weights \(w_j\) in Eq. 19). The recommendation system first obtains the preferences of OLs from a large number of online reviews and calculates the objective weights of the criteria from these preferences using Algorithm 2. It then takes the preferences of the decision-maker into account and finds the subjective weights of the criteria using Algorithm 1. Finally, the parameter \(\beta\) is used to calculate the combined weights of the criteria, the alternatives are ranked by TOPSIS, and the stability of the ranking is analyzed by adjusting \(\beta\).

3.7 The Proposed Hotel Selection Decision Support Model Based on the Online Reviews by Opinion Leaders

In this part, considering the current demands of decision-makers, the proposed model is introduced in detail. To meet the needs of decision support, it consists of four stages, as illustrated in Fig. 2: data preparation, opinion leader identification, weight solving, and decision recommendation. Details of the method are described as follows.

Fig. 2 The structure of the proposed decision support model for hotels

Step 1: Data preparation

Determine the criteria for hotel decisions and crawl user information on OTPs, including ULs, UFs, NUFs, Review Recency, Review Frequency, Review Monetary, and Review Content.

Step 2: Opinion leaders identification

The three criteria ULs, UFs and NUFs are aggregated using the BWM multi-criteria decision method, and the aggregated result is expressed as popular user centrality (P). The RFM model is extended with the parameter P to become the RFMP model, which is the opinion leader identification model proposed in this paper. From the score and weight of each indicator, a comprehensive score is obtained for each publisher of online reviews, and the opinion leaders are selected.

Step 3: Obtain the weight vector of criteria

Based on the identified opinion leaders, their review texts are clustered and learned, and the frequency of each indicator is counted. These frequencies are used as the objective weights of the criteria to provide a reference for decision experts. Meanwhile, the individual decision preferences for each criterion are elicited through the BWM, which is used to solve for the weights of the criteria, considered here as subjective weights. Finally, the subjective and objective weights are combined and used as the basis for selecting among the alternatives.

Step 4: Selection processes

Considering the rationality and feasibility of the proposed method, a TOPSIS-based ranking of the alternatives is adopted in this paper. The stability of the whole ranking is observed as the confidence coefficient changes.

4 A Case Study and Calculation

In this section, to illustrate the proposed hotel selection decision support model, a case study based on online reviews by opinion leaders on Mafengwo.com is conducted. The data of the site is shown in Fig. 3. The corpus data used in this article come from Mafengwo.com, a typical OTP that offers a variety of hotel ranking services and whose members include those in the top 100 local rankings in terms of total postings on various topics. A growing number of scholars have conducted research based on the online reviews of Mafengwo.com, which supports the stability and reliability of the website data.

Fig. 3 A screenshot from Mafengwo.com

Firstly, we retrieved all available information on the Shanghai City hotels listed on Mafengwo.com on April 28, 2020 using Python. To ensure the credibility of our research sample, we crawled online reviews for hotels with at least 6137 reviews. Partial evaluation information from a hotel review on Mafengwo.com is shown in Table 3. Several categories of information (name, online reviews, OET, ES, ET, NEMs, IVORs, UFs and NUFs) were crawled, and multi-criteria ratings (value, rooms, location, cleanliness, food quality, comfort and service) can be downloaded for eight hotels \(\left\{ {{A_1},{A_2},{A_3},{A_4},{A_5},{A_6},{A_7},{A_8}} \right\}\). The set of evaluation criteria for each hotel is \(C = \{ {C_1},{C_2},{C_3},{C_4},{C_5},{C_6}\}\), where \({C_1},{C_2},{C_3},{C_4},{C_5},{C_6}\) represent position, service, cleanliness, comfort, facility and food, respectively. The set of users is \(U\_ID = {\left\{ {User_h^{{A_j}}} \right\} _{k \times n}}\), where \(User_h^{{A_j}}\) denotes the data of the h-th user of hotel \(A_j\), \(j = 1,2,...,8\).

Table 3 Partial Evaluation Information in a hotel review on Mafengwo.com

The data of Hotel \({A_1}\) is shown in Table 4:

Table 4 Original data sheet

The data of the other hotels \(A_2, A_3, \ldots , A_8\) have the same form as those of \(A_1\) and are not shown here.

For example, we first found a user named ZYM in \(A_1\), who wrote an online review after booking a hotel on OTPs: "Great location, deep culture, comfortable and elegant facilities, attentive service. The service charge is confusing: it is known that no waiter receives this charge. Then it can be said that the money is unreasonably taken by the hotel. Including the service charge of the hotel, it can be said that it is an unreasonable charge under the name of the hotel. Shanghai is probably the only place in the world that does this. I think that the hotel can increase the fee, but do not increase the price in disguise". Secondly, we visited ZYM’s user profile and downloaded data such as the user’s ULs, UFs, NUFs, and all rating information for that user over time on the platform.

4.1 The Dataset and Data Pre-Process

In this paper, the RFMP, an extension of the RFM, is used to measure the influence scores of publishers of online reviews and to select OLs. The P scores of publishers of online reviews are calculated from three criteria (ULs, UFs and NUFs), which are weighted using Algorithm 1.

The steps for using Algorithm 1 are as follows:

Step 1: Define the decision criteria set {ULs, UFs, NUFs} and identify the best (UFs) and worst (ULs) criteria.

Step 2: Determine the BOV and OWV vectors. Based on the preference relations of multiple experts for the indicators, BOV = {8,1,2} and OWV = {1,8,5}.

Step 3: Calculate the optimal weights of the criteria by solving Eq. 21.

$$\begin{aligned} \begin{array}{l} \min \zeta \\ s.t.\left\{ \begin{array}{l} \left| {\frac{W_{UFs}}{W_{ULs}} - 8} \right| \le \zeta \\ \left| {\frac{W_{UFs}}{W_{NUFs}} - 2} \right| \le \zeta \\ \left| {\frac{W_{NUFs}}{W_{ULs}} - 5} \right| \le \zeta \\ \sum \limits _j W_j = 1\\ W_j \ge 0,\;\mathrm{for\; all}\; j \end{array} \right. \end{array} \end{aligned}$$
(21)

Solving the problem, the optimal weights \(W^* = \{ W_{ULs}^*, W_{UFs}^*, W_{NUFs}^* \} = \{ 0.072, 0.589, 0.339 \}\) are obtained.

Step 4: Calculate the consistency ratio

For the consistency ratio, as the consistency index for this problem is 4.47 (see Table 5), the consistency ratio can be calculated by Eq. 7. Thus, \(CR = 0.058\), which implies a very good consistency.
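For a three-criteria problem such as Eq. 21, the min-max model can be solved by hand, which gives a quick check on the reported numbers. The sketch below assumes that all three deviations are equal to the same value at the optimum, which holds here because the product of the two intermediate comparisons (2 × 5) exceeds the best-to-worst comparison (8). It also assumes that Eq. 7 is the ratio of the optimal deviation to the consistency index, as the later calculation 2.26/4.47 suggests. Variable names are illustrative.

```python
import math

# Comparison values from Step 2 (best = UFs, worst = ULs).
a_best_worst = 8.0  # UFs compared with ULs
a_best_mid = 2.0    # UFs compared with NUFs
a_mid_worst = 5.0   # NUFs compared with ULs

# With equal deviations xi, consistency of the ratios requires
#   (a_best_mid - xi) * (a_mid_worst - xi) = a_best_worst + xi,
# i.e. xi^2 - (a_best_mid + a_mid_worst + 1) * xi
#          + (a_best_mid * a_mid_worst - a_best_worst) = 0.
b = a_best_mid + a_mid_worst + 1.0
c = a_best_mid * a_mid_worst - a_best_worst
xi = (b - math.sqrt(b * b - 4.0 * c)) / 2.0  # smaller root

# Weight ratios relative to the worst criterion (w_ULs = 1 before scaling).
ratio_nufs = a_mid_worst - xi   # w_NUFs / w_ULs
ratio_ufs = a_best_worst + xi   # w_UFs / w_ULs
total = 1.0 + ratio_nufs + ratio_ufs
w_uls = 1.0 / total
w_ufs = ratio_ufs / total
w_nufs = ratio_nufs / total

consistency_index = 4.47        # value quoted in the text for this problem
cr = xi / consistency_index

print(round(w_uls, 3), round(w_ufs, 3), round(w_nufs, 3), round(cr, 3))
```

Under these assumptions the weights agree with the reported vector to within about 0.001 per entry, and the consistency ratio rounds to the reported 0.058.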

Table 5 Consistency index (CI) table

Remark 1

The consistency index table for the BWM (Table 5) only includes \(a_{BW}\) values ranging from 1 to 9.
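Assuming Table 5 follows the standard BWM definition of the consistency index, each entry is the smaller root of \(\xi ^2 - (1 + 2a_{BW})\xi + (a_{BW}^2 - a_{BW}) = 0\). The following sketch regenerates the index for \(a_{BW} = 1, \ldots , 9\); the function name is illustrative, and the only value that can be checked against the text is 4.47 for \(a_{BW} = 8\).

```python
import math

def consistency_index(a_bw):
    """Smaller root of xi^2 - (1 + 2*a_bw)*xi + (a_bw^2 - a_bw) = 0."""
    b = 1.0 + 2.0 * a_bw
    c = a_bw * a_bw - a_bw
    return (b - math.sqrt(b * b - 4.0 * c)) / 2.0

# Consistency index for every admissible best-to-worst comparison value.
ci_table = {a: round(consistency_index(a), 2) for a in range(1, 10)}

for a_bw, ci in ci_table.items():
    print(a_bw, ci)
```

A consistency ratio is then obtained by dividing the optimal deviation of a BWM model by the entry that matches its \(a_{BW}\).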

Next, using the calculated optimal weights of the three indicators, the P score of every publisher of online reviews is calculated. The results are shown in Table 6.

Table 6 User popularity score P

Similarly, the P scores for the remaining alternative hotels \(\left\{ {{A_2},{A_3},{A_4},{A_5},{A_6},{A_7},{A_8}} \right\}\) can be calculated.

The influence score of publishers of online reviews is then calculated using the four criteria R, F, M and P. M and R are determined as the best and worst criteria, respectively, and BOV = {9,8,1,3} and OWV = {1,3,9,8} were determined from the experts’ PRs.

Algorithm 1 is called to solve for the optimal weights as follows:

$$\begin{aligned} \begin{array}{l} \quad \quad \quad \mathrm{{min}}\mu \\ s.t.\left\{ \begin{array}{l} \left| {\frac{{{\mathrm{{w}}_\mathrm{{M}}}}}{{{\mathrm{{w}}_\mathrm{{R}}}}} - 8} \right| \le \mu \\ \left| {\frac{{{\mathrm{{w}}_\mathrm{{M}}}}}{{{\mathrm{{w}}_\mathrm{{F}}}}} - 8} \right| \le \mu \\ \left| {\frac{{{\mathrm{{w}}_\mathrm{{M}}}}}{{{\mathrm{{w}}_\mathrm{{P}}}}} - 6} \right| \le \mu \\ \left| {\frac{{{\mathrm{{w}}_\mathrm{{F}}}}}{{{\mathrm{{w}}_\mathrm{{R}}}}} - 3} \right| \le \mu \\ \left| {\frac{{{\mathrm{{w}}_\mathrm{{P}}}}}{{{\mathrm{{w}}_\mathrm{{R}}}}} - 5} \right| \le \mu \\ \sum \limits _\mathrm{{j}} {{\mathrm{{w}}_\mathrm{{j}}}} = 1\\ {\mathrm{{w}}_\mathrm{{j}}} \ge {{0, \mathrm{for} \;\mathrm{all} \;j}} \end{array} \right. \end{array} \end{aligned}$$
(22)

The optimal weights \(W^* = \{ w_R^*, w_F^*, w_M^*, w_P^* \} = \{ 0.064, 0.113, 0.649, 0.174 \}\) and \(\mu ^* = 2.26\) are obtained. As \(a_{BW} = a_{MR} = 9\), the consistency index for this problem is 4.47 (see Table 5), and the consistency ratio is \(2.26/4.47 = 0.505\) using Eq. 7, which is also treated as acceptable consistency.

Before calculating the user influence score, the users’ parameter values are first standardized. The standardization formula is:

$$\begin{aligned} Std_{A_i} = \frac{A_i - A_{\min }}{A_{\max } - A_{\min }} \end{aligned}$$
(23)

After the RFMP model is trained, the influence of each publisher is predicted by inputting test samples, and the publishers are then ranked by influence. The standardized R, F, M and P data of the publishers of online reviews are shown in Table 7.

Table 7 Standardized data sheet

Similarly, the RFMP values for the remaining alternative hotels \(\left\{ {{A_2},{A_3},{A_4},{A_5},{A_6},{A_7},{A_8}} \right\}\) can be calculated.

The threshold is set to 0.074 and the OLs are selected. The selected OLs are shown in Table 8:

Table 8 Opinion leader data sheet

The user score is calculated as:

$$\begin{aligned} U_{Score_h} = W_R \times S_{R_h} + W_F \times S_{F_h} + W_M \times S_{M_h} + W_P \times S_{P_h} \end{aligned}$$
(24)

As a result, 333 OLs were selected based on the RFMP model. The next subsection presents a detailed analysis of the opinion leaders’ comments and preferences.
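The scoring and screening described above (Eqs. 23 and 24 plus the 0.074 threshold) can be sketched as follows. The weights and threshold are the values reported in this section. The four users and their raw indicator values are hypothetical, the indicators are assumed to be oriented so that larger values mean more influence, and the use of "greater than or equal to" at the threshold is also an assumption.

```python
def min_max(values):
    """Min-max standardization of one indicator across users (Eq. 23)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Weights for R, F, M and P as reported after Eq. 22.
WEIGHTS = {"R": 0.064, "F": 0.113, "M": 0.649, "P": 0.174}
THRESHOLD = 0.074

# Hypothetical raw indicator values for four users (not data from the paper).
raw = {
    "R": [10.0, 40.0, 25.0, 5.0],
    "F": [3.0, 12.0, 7.0, 1.0],
    "M": [3.0, 30.0, 9.0, 1.0],
    "P": [0.2, 0.9, 0.5, 0.1],
}

std = {name: min_max(values) for name, values in raw.items()}
n_users = len(raw["R"])

# Weighted sum of standardized indicators for each user h (Eq. 24).
scores = [sum(WEIGHTS[name] * std[name][h] for name in WEIGHTS)
          for h in range(n_users)]

# Users whose score reaches the threshold are kept as opinion leaders.
opinion_leaders = [h for h, score in enumerate(scores) if score >= THRESHOLD]
print(opinion_leaders)
```

Because M carries a weight of 0.649, a user's standardized M value dominates the score, so the threshold mostly separates users by that indicator.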

4.2 Obtaining Weight Vector of Hotel Selection

4.2.1 Obtaining Objective Weight Vector

In the previous subsection, 333 OLs were screened out, and this section focuses on processing and analyzing the OLs’ online reviews. Algorithm 2 is called to process the text data of the OLs, and the vocabulary is classified as shown in Table 9:

Table 9 Hotel selection criteria and feature words

Fig. 4 Classification results for sub-criteria

By analyzing the comment texts of the 333 opinion leaders, a total of 7935 words were counted. After 968 invalid words were removed, 6967 words remained, which fall into six categories: position, service, cleanliness, comfort, facility and food. Position (location) appeared 1545 times, service 1340 times, cleanliness 950 times, comfort 955 times, facility 1299 times and food 878 times. The feature words are grouped as shown in Table 9 and plotted as a word cloud in Fig. 4. In this paper, the relative frequencies are used to estimate the OWAs, which gives {0.222, 0.192, 0.136, 0.137, 0.186, 0.126} for \(C_1\) to \(C_6\).
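The objective weights follow directly from the category counts quoted above: each count is divided by the total number of categorized words. A minimal sketch, with illustrative variable names:

```python
# Category counts reported in the text, in the order C1..C6.
counts = {
    "position": 1545,
    "service": 1340,
    "cleanliness": 950,
    "comfort": 955,
    "facility": 1299,
    "food": 878,
}

total = sum(counts.values())

# Relative frequency of each category, rounded to three decimals.
owas = {name: round(count / total, 3) for name, count in counts.items()}

print(total)
print(list(owas.values()))
```

The total equals 6967, and the rounded frequencies reproduce the objective weight vector reported above.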

Table 10 Pairwise comparison vector based on PRs

4.2.2 Obtaining Subjective Weight Vector

Using the data provided in Table 10, the BWM model for this problem is as follows:

$$\begin{aligned} \begin{array}{l} \min \lambda \\ s.t.\left\{ {\begin{array}{*{20}{c}} {\left| {\frac{{{\mathrm{{W}}_\mathrm{{3}}}}}{{{\mathrm{{W}}_\mathrm{{1}}}}} - 4} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{3}}}}}{{{\mathrm{{W}}_2}}} - 3} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{3}}}}}{{{\mathrm{{W}}_\mathrm{{4}}}}} - 2} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{3}}}}}{{{\mathrm{{W}}_\mathrm{{5}}}}} - 4} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{3}}}}}{{{\mathrm{{W}}_6}}} - 8} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{1}}}}}{{{\mathrm{{W}}_6}}} - 3} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{2}}}}}{{{\mathrm{{W}}_6}}} - 2} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{4}}}}}{{{\mathrm{{W}}_6}}} - 4} \right| \le \lambda }\\ {\left| {\frac{{{\mathrm{{W}}_\mathrm{{5}}}}}{{{\mathrm{{W}}_6}}} - 1} \right| \le \lambda }\\ {\sum \limits _{{j}} {{\mathrm{{W}}_{{j}}}} = 1}\\ {{\mathrm{{W}}_\mathrm{{j}}} \ge \mathrm{{0,for\; all \; j}}} \end{array}} \right. \end{array} \end{aligned}$$
(25)

Solving the problem, the optimal weights \(W^{SWAs} = \{ 0.116, 0.133, 0.387, 0.232, 0.083, 0.050 \}\) are obtained. For the consistency ratio, as \(a_{BW} = a_{36} = 8\), the consistency index for this problem is 4.47 (see Table 5, a standard comparison table calculated by experts), so the consistency ratio is 0.149, which implies a very good consistency.

4.2.3 Obtaining Combined Weight Vectors (CWAs) of Hotel Selection

Suppose that the parameter \(\beta = 0.5\), which means that the objective and subjective weights are equally important to the traveler. The combined weight vector can then be calculated by Eq. 14:

\(W = \{ 0.169, 0.163, 0.261, 0.185, 0.135, 0.088 \}\)
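The combination in Eq. 14 is a convex blend of the two weight vectors. The sketch below applies it to the objective and subjective weights reported in Sects. 4.2.1 and 4.2.2; the function name is illustrative.

```python
def combine_weights(objective, subjective, beta):
    """Blend objective and subjective weights as in Eq. 14."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [beta * o + (1.0 - beta) * s for o, s in zip(objective, subjective)]

OWAS = [0.222, 0.192, 0.136, 0.137, 0.186, 0.126]  # objective weights, C1..C6
SWAS = [0.116, 0.133, 0.387, 0.232, 0.083, 0.050]  # subjective weights, C1..C6

combined = combine_weights(OWAS, SWAS, beta=0.5)
print([round(w, 4) for w in combined])
```

Two entries fall exactly halfway between three-decimal values (0.1625 and 0.2615), so the published vector can differ from a recomputation by 0.0005 depending on the rounding convention.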

4.3 Ranking of Hotel

The initial matrix is shown in Table 11; however, the original data must be normalized before the analysis.

Table 11 Comprehensive Hotel Score

To evaluate indicators with different units together, the data were normalized using Eq. 16 before the TOPSIS analysis. The standardized decision matrix is shown in Eq. 26.

$$\begin{aligned} \left[ {\begin{array}{*{20}{c}} {0.151}&{}{0.143}&{}{0.144}&{}{0.140}&{}{0.148}&{}{0.140}\\ {0.152}&{}{0.143}&{}{0.148}&{}{0.143}&{}{0.144}&{}{0.143}\\ {0.149}&{}{0.144}&{}{0.141}&{}{0.146}&{}{0.143}&{}{0.140}\\ {0.154}&{}{0.143}&{}{0.140}&{}{0.140}&{}{0.141}&{}{0.149}\\ {0.148}&{}{0.140}&{}{0.144}&{}{0.140}&{}{0.148}&{}{0.146}\\ {0.149}&{}{0.144}&{}{0.141}&{}{0.146}&{}{0.143}&{}{0.140}\\ {0.149}&{}{0.146}&{}{0.149}&{}{0.148}&{}{0.148}&{}{0.144}\\ {0.149}&{}{0.146}&{}{0.140}&{}{0.133}&{}{0.141}&{}{0.140} \end{array}} \right] \end{aligned}$$
(26)

According to Eq. 18, the ideal solution \({R^ + }\) and the negative ideal solution \({R^ - }\) are determined, and the results are shown below:

$$\begin{aligned} R^+&= \{ 0.154, 0.146, 0.149, 0.148, 0.148, 0.149 \} \\ R^-&= \{ 0.148, 0.140, 0.140, 0.133, 0.141, 0.140 \} \end{aligned}$$
(27)

According to Eqs. 19 and 20, the ideal discrimination \(S_1^ +\), the negative ideal discrimination \(S_1^ -\) and the proximity \(C_1^*\) are determined as functions of \(\beta\), and the results are shown below:

$$\begin{aligned} S_1^+&= \sqrt{\sum \limits _{j = 1}^n {[r_{1j} w_j - r_j^+ w_j]}^2} = \sqrt{\sum \limits _{j = 1}^n w_j^2 {[r_{1j} - r_j^+ ]}^2} \\&= \frac{\sqrt{2756786\beta ^2 - 6697564\beta + 7669687}}{1000000} \\ S_1^-&= \sqrt{\sum \limits _{j = 1}^n {[r_{1j} w_j - r_j^- w_j]}^2} = \sqrt{\sum \limits _{j = 1}^n w_j^2 {[r_{1j} - r_j^- ]}^2} \\&= \frac{\sqrt{2106387\beta ^2 - 4065696\beta + 5649467}}{1000000} \\ C_1^*&= \frac{\sqrt{2106387\beta ^2 - 4065696\beta + 5649467}}{\sqrt{2106387\beta ^2 - 4065696\beta + 5649467} + \sqrt{2756786\beta ^2 - 6697564\beta + 7669687}} \end{aligned}$$
(28)

Similarly, \(S_2^ +\) \(\cdots\) \(S_8^ +\), \(S_2^ -\) \(\cdots\) \(S_8^ -\) and \(C_2^*\) \(\cdots\) \(C_8^*\) can be solved in turn. With the objective and subjective weights each weighted 0.5, the final ranking scores are:

\({A_1} = 0.477\); \({A_2} = 0.685\); \({A_3} = 0.499\); \({A_4} = 0.388\); \({A_5} = 0.449\); \({A_6} = 0.499\); \({A_7} = 0.804\); \({A_8} = 0.144\)

The final ranking of the hotels is therefore: \({A_7} \succ {A_2} \succ {A_6} \succ {A_3} \succ {A_1} \succ {A_5} \succ {A_4} \succ {A_8}\).
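The ranking step can be retraced from the published numbers. The sketch below applies Eqs. 18 to 20 to the standardized matrix of Eq. 26 with the combined weights for \(\beta = 0.5\), treating all six criteria as benefit criteria (which matches the column maxima and minima in Eq. 27). Because it starts from entries rounded to three decimals, individual scores may differ from the reported ones; the function and variable names are illustrative.

```python
import math

# Standardized decision matrix from Eq. 26 (rows A1..A8, columns C1..C6).
R = [
    [0.151, 0.143, 0.144, 0.140, 0.148, 0.140],
    [0.152, 0.143, 0.148, 0.143, 0.144, 0.143],
    [0.149, 0.144, 0.141, 0.146, 0.143, 0.140],
    [0.154, 0.143, 0.140, 0.140, 0.141, 0.149],
    [0.148, 0.140, 0.144, 0.140, 0.148, 0.146],
    [0.149, 0.144, 0.141, 0.146, 0.143, 0.140],
    [0.149, 0.146, 0.149, 0.148, 0.148, 0.144],
    [0.149, 0.146, 0.140, 0.133, 0.141, 0.140],
]
W = [0.169, 0.163, 0.261, 0.185, 0.135, 0.088]  # combined weights, beta = 0.5

def closeness(r, w):
    """Relative proximity C_i* of every row to the ideal solution."""
    cols = list(zip(*r))
    r_plus = [max(col) for col in cols]    # ideal solution (benefit criteria)
    r_minus = [min(col) for col in cols]   # negative ideal solution
    result = []
    for row in r:
        s_plus = math.sqrt(sum((wj * (x - p)) ** 2
                               for wj, x, p in zip(w, row, r_plus)))
        s_minus = math.sqrt(sum((wj * (x - m)) ** 2
                                for wj, x, m in zip(w, row, r_minus)))
        result.append(s_minus / (s_minus + s_plus))
    return result

scores = closeness(R, W)
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
labels = [f"A{i + 1}" for i in order]

for i, score in enumerate(scores):
    print(f"A{i + 1}", round(score, 3))
print(" > ".join(labels))
```

Rows \(A_3\) and \(A_6\) are identical in Eq. 26, so they receive the same score and their relative order in the sorted list is arbitrary.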

4.4 Comparison Experiments

MCDM based on the BWM and fuzzy TOPSIS methods has been proposed by Gupta and is widely used in traditional research. However, hotel selection using online reviews is a non-expert decision problem, and filtering useful reviews out of the large number of online reviews is a problem that needs to be solved. To illustrate the effect of the objective weight vector on the final hotel selection results, we conducted a comparative experiment between the BWM and fuzzy TOPSIS based MCDM proposed by Gupta and the method proposed in this study. When the proportion of objective weights reached 0.7 or above, the ranking of the alternatives changed significantly, and the larger the proportion of objective weights, the larger the change in the ranking, as shown in Fig. 5.

Thus, when the proportion of objective weights is less than 0.9, the final ranking of the hotels by Gupta’s method is: \({A_7} \succ {A_2} \succ {A_6} \succ {A_3} \succ {A_1} \succ {A_5} \succ {A_4} \succ {A_8}\). Meanwhile, the final ranking of the hotels by the proposed method is \({A_7} \succ {A_2} \succ {A_1} \succ {A_3} \succ {A_6} \succ {A_4} \succ {A_5} \succ {A_8}\).

4.5 Computational Complexity

Since the BWM requires fewer comparisons than the AHP and only linear calculation is needed, the simplicity of obtaining the weights is an advantage of this method. In addition, the RFMP proposed in this study filters out the online reviews of opinion leaders and thus reduces the number of online reviews to be handled during data processing.

5 Result and Discussion

At the end of the previous section, we gave the subjective and objective weights equal influence and obtained the following order of the alternative hotels: \({A_7} \succ {A_2} \succ {A_6} \succ {A_3} \succ {A_1} \succ {A_5} \succ {A_4} \succ {A_8}\). Since tourism evaluation is a non-expert system and the customer has not been to the destination, OLs are needed to provide objective weights that help the decision-maker make a decision. The existing literature does not consider the influence of OLs on the decision, so our method is more reasonable. Sensitivity analysis is now widely used by researchers in studies presenting hybrid models to confirm the validity of the results and to eliminate bias caused by subjectivity. To perform the sensitivity analysis, the weight of the criterion with the highest subjective and objective weight is varied from 0.1 to 0.9, and the combined weights of all the criteria vary accordingly.

This section focuses on the impact of the objective weights represented by opinion leaders on the overall ranking of hotel alternatives.

Based on the above solution, the trend of the alternatives’ scores as the parameter changes is plotted in Fig. 5. Table 12 displays the expected results for each hotel under different contexts, whereas Fig. 6 visualizes the robustness analysis results of these experiments.

Table 12 Sensitivity analysis with parameters

Fig. 5 Robust analysis results of hotel ranking with parameters

Fig. 6 Robust analysis results of hotel ranking

As Fig. 5 shows, the objective weight vector affects the ranking of the alternatives to some extent, but on the whole the trend of the ranking is stable. This shows that the online decision support model proposed in this paper, which considers both subjective and objective weights, is stable and reasonable.
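A sweep over \(\beta\) of the kind summarized in Table 12 can be sketched as follows. This is an illustrative recomputation from the rounded matrix in Eq. 26 and the rounded weight vectors of Sect. 4.2, with all criteria treated as benefit criteria, so it is not guaranteed to reproduce the published table or figures. The function names are hypothetical.

```python
import math

# Standardized decision matrix from Eq. 26 (rows A1..A8, columns C1..C6).
R = [
    [0.151, 0.143, 0.144, 0.140, 0.148, 0.140],
    [0.152, 0.143, 0.148, 0.143, 0.144, 0.143],
    [0.149, 0.144, 0.141, 0.146, 0.143, 0.140],
    [0.154, 0.143, 0.140, 0.140, 0.141, 0.149],
    [0.148, 0.140, 0.144, 0.140, 0.148, 0.146],
    [0.149, 0.144, 0.141, 0.146, 0.143, 0.140],
    [0.149, 0.146, 0.149, 0.148, 0.148, 0.144],
    [0.149, 0.146, 0.140, 0.133, 0.141, 0.140],
]
OWAS = [0.222, 0.192, 0.136, 0.137, 0.186, 0.126]  # objective weights
SWAS = [0.116, 0.133, 0.387, 0.232, 0.083, 0.050]  # subjective weights

def closeness(r, w):
    """Relative proximity of every row to the ideal solution (Eqs. 18-20)."""
    cols = list(zip(*r))
    r_plus = [max(col) for col in cols]
    r_minus = [min(col) for col in cols]
    result = []
    for row in r:
        s_plus = math.sqrt(sum((wj * (x - p)) ** 2
                               for wj, x, p in zip(w, row, r_plus)))
        s_minus = math.sqrt(sum((wj * (x - m)) ** 2
                                for wj, x, m in zip(w, row, r_minus)))
        result.append(s_minus / (s_minus + s_plus))
    return result

def rank_for_beta(beta):
    """Hotel labels ordered from best to worst for a given beta (Eq. 14)."""
    w = [beta * o + (1.0 - beta) * s for o, s in zip(OWAS, SWAS)]
    scores = closeness(R, w)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [f"A{i + 1}" for i in order]

betas = [round(0.1 * k, 1) for k in range(1, 10)]
rankings = {beta: rank_for_beta(beta) for beta in betas}

for beta, order in rankings.items():
    print(beta, " > ".join(order))
```

Comparing the printed orders across \(\beta\) shows at which values, if any, neighbouring hotels swap places, which is the stability question this section discusses.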

6 Conclusion and Future Work

This article aims to provide a solution for non-expert decision-making problems such as hotel selection. To achieve this goal, a novel hotel selection decision support model based on the online reviews by opinion leaders is proposed. It has the following main advantages with respect to conventional models proposed in the literature.

  1. To utilize the helpful information rather than the massive information mixed with fake and useless content on the OTPs, an RFMP model is proposed to extract the online reviews of opinion leaders, and the Word2vec method is used to extract the criteria from the online reviews of opinion leaders. Because the BWM performs better than the AHP [27] in terms of consistency and number of comparisons, it is employed to aggregate the RFMP values.

  2. Since hotel selection is a typical non-expert decision, the objective weights derived from the online reviews help the decision maker know which factors are important in places they have not been, while the subjective preferences of decision makers are not ignored. The method of combining subjective and objective weights proposed in this study therefore considers more comprehensive factors.

This study makes the above contributions but also has some limitations, which may serve as avenues for future research. First, due to the different types of data displayed on OTPs, the data set in this study was crawled from only one website. Provided that data are available, cross-validation with data from multiple websites is an effective way to verify this method. Second, as group travel is increasingly popular in people’s leisure life, the opinion leaders in group tourism will undoubtedly have an impact on the decision-making results. The group analysis of opinion leaders and the group consensus reaching mechanism could be a future research field [39,40,41,42,43,44]. Besides, sentiment analysis of online reviews to obtain the objective preferences for decision-makers could be considered in future work.