Introduction

Tourism is considered one of the leading industries with a significant impact on socio-economic development globally [1]. If taken from a service perspective, it can be identified as a service that initiates the mobilisation of people from one place to another for leisure or business purposes [2]. Tourism has also paved the way for a fast-growing sector examining connections with developing countries and cultivating sustainable development goals [3]. Tourism reality has shifted the mentality, let alone the culture, of the people in exploring vast locations and cultures [4] beyond their domestic areas [5]. This fact is more apparent because of the tourism shift from the typical business model to the customer-centred approach [6]. According to tourism statistics presented in [7], nearly 1.8 million international tourists would be recorded in 2020. This number is expected to hit 7.8 billion in 2025. Tourism requires planning and developing to search for compromises amongst environmental, social and economic aspects [5]; owing to this intensive demand, technology growth comes in handy. The concept of e-tourism stemmed from a simple idea of digitalising the travel concept for tourism, making it accessible to individuals through their mobile devices. It is an e-commerce solution for companies within the travel and tourism industry to refer to when using information communication technology for tourists. E-tourism has evolved by making substantial use of information technology development aligned with core problems in tourism [8]. Although e-tourism emerged as significant, it may not serve its intended purpose in the best way possible, which resulted in a more centric concept of smart e-tourism. Previous research [9] has identified an approach that can provide e-tourism experience with better aspects, including real data, context awareness (CA) and personalisation. It, therefore, maximises the potential to enhance the quality of overall tourism [9] sought by the interactive environment for system users.

The smart tourism sector has relied on a social method to provide a positive social experience for travellers who share various pleasant tour experiences through e-tourism [10]. However, studies have also argued the critical differences between typical e-tourism and smart e-tourism. The key differences can be pinpointed upon concluding that typical e-tourism provides a digital connection between the clients and business. By contrast, smart e-tourism takes the latter concepts and connects them with physical worlds via the available resources (technologies), such as cloud computing, the Internet of Things (IoT) and social media (SM) [11]. Smart e-tourism, following all the previous technologies, has also found its way with more interesting technological advances. It presents itself in various concepts, including SM, recommender systems (RS), big data (BD), privacy preserving (PP), user modelling (UM), IoT, real time (RT), augmented reality (AR), CA, user experience (UE), theoretical contributions (TC) and cultural heritage (CH) [12]. All these concepts can fall into one of two main smart e-tourism categories, tourism marketing [13,14,15,16,17,18,19,20] and smart-based tourism recommendation system (TRS). Smart-based TRS includes four sub-categories, namely, (1) collaborative filtering [21,22,23,24,25,26,27,28,29], (2) content [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47], (3) context [48,49,50,51,52,53,54,55,56,57] and hybrid models [58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75]. In smart e-tourism, undeniable smart key concept attributes variations are observed for smart-based TRS and tourism marketing strategies. This notable discrepancy raises an issue in understanding the differences and greatly challenges the assessing and benchmarking of the applications per each category. This discrepancy can be attributed to the fact that different applications perform differently across these concepts [12]. These differences constitute significant inspiration for proposing a satisfactory assessment and benchmarking to evaluate these applications [76].

The benchmarking of smart e-tourism applications can be used as a reference point for industry practitioners and researchers to determine the most critical attributes associated with smart e-tourism applications. Additionally, they can maximise benefits for tourism-related authorities by assisting them in developing the most effective development strategies and directly identifying the ideal alternative without manually checking extensive reviews and screening potentially valuable information [77]. Selecting the smartest application from a set of applications whilst considering the decision attributes and decision makers’ disparate priorities is a complex task. Although such a consideration may be hard for everyone, decision support systems can enforce any decision once they are provided with the right parameters and factors [78]. Multi-criteria decision-making (MCDM) is one of the most important techniques in the decision science field, specifically concerned with making optimal decisions based on available information [79]. This technique aids in making the right decision, especially when multiple factors/concepts influencing the final decision are taken into account. MCDM is classified into two main categories [80, 81]: (i) multi-objective decision making (MODM), which naturally entails several competing objectives that must be optimised concurrently, and (ii) multi-attribute decision making (MADM), which focuses on discrete decision spaces. These categories consider different data types and purposes. Practically, the alternatives are not preidentified in the problems associated with MODM [82, 83]. In this category, the primary concern is to design or plan the most appropriate alternative given the available resources [84, 85]. By contrast, the alternatives must be preidentified in the problems associated with MADM. The primary concern of this category is to select, prioritise or rank a finite number of alternatives [86, 87]. Unquestionably, both groups’ capabilities shine most when required. Many studies have applied MODM and MADM approaches in different smart e-tourism perspectives.

A number of interesting works have been on smart e-tourism applications in the MODM literature. Pop et al. [88] proposed a complex architecture for smart tourism that combines Martins and NSGA II algorithms. They developed a scheduling technique with multiple objectives for multimodal and time-dependent transportation networks. Ahmad et al. [89] developed an innovative application that recommends the best travel route depending on user requirements (e.g., maximum time, distance and popularity of a particular place). A Markov chain model was used to forecast the popularity of various locations on a short-term and long-term basis, where popularity index in conjunction with user requirements was used to find optimal routes. To optimise the network structure whilst adjusting future resource levels, Wen et al. [90] came up with a double-layered coupling system focused on tourism traffic. Wang et al. [91] designed a smart tourism management model for wisdom tourist service. Their model, which utilises artificial colony theory and big data to optimise tourism resource allocation, increases tourism management efficiency and quality. From another angle, various studies have shown that MADM methods markedly aid in assessing smart e-tourism applications. Nilashi et al. [92] built a hotel RS for online tourist platforms using multi-attribute ratings from social media networking sites. Lin [77] used principal component analysis and analytic network process methods for assessment and Vlsekriterijumska Optimizcija I Kaompromisno Resenje (VIKOR) for benchmarking urban and rural tourism systems. Nie et al. [93] investigated an MADM model for hotel selection to aid travellers in selecting an appropriate hotel and guide hoteliers in getting competitive advantages in the e-tourism era. Nilashi et al. [94] incorporated multi-attribute ratings into the development of a new system based on MADM for hotel suggestions on e-tourism platforms. Some works have evaluated and benchmarked the study alternatives for certain cases. Despite the availability of all these studies, their aims, evaluations and benchmarking methods have only been directed at certain cases and aspects related to different smart e-tourism categories. None of these studies have considered assessing all smart e-tourism applications across all categories and aspects except for the work presented by [76]. This study has assessed and benchmarked all smart e-tourism applications across their associated categories using the MADM methods. Therefore, the present study focuses on applying MADM methods to various concepts of these applications. The rationale for this emphasis is the important research contributions in analysing various applications based on diverse concepts.

Despite all the previous efforts, the benchmarking procedures of smart e-tourism applications do not come without issues, especially when considering the nature of applications to be benchmarked. These issues include the presence of multiple evaluation attributes, the variance of data between categories and their concepts and determining the importance of the relative attributes. Krishnan et al. [76]. considered all these issues whilst assessing and ranking the applications of smart e-tourism. They proposed MADM method (i.e., interval type 2 trapezoidal fuzzy weighted with zero inconsistency (IT2TR-FWZIC) integrated with the VIKOR) to assess all the applications across each category. Regardless of their commendable efforts in addressing the aforesaid issues, further issues appeared. They then used an existing MADM ranking approach called VIKOR. VIKOR is one of the most ranking methods used to determine the optimal solution amongst a set of solutions. However, the VIKOR method has weaknesses; it requires external weighting and normalisation, that is, converting data measured on multiple scales into the same scale in the decision matrix (DM) methods [95]. There are various normalisation methods. However, normalisation is considered an issue with VIKOR because each method provides a different scale, which alters the data behaviour and the final decision [95]. In addition, VIKOR cannot weight attributes [96], and, hence, an alternative weighing method is required to address this issue. Moreover, VIKOR is ineffective in dealing with ambiguous and vague data.

Given these issues and to pave the way for a more robust method, Salih et al. [95] proved the robustness of an approach called fuzzy decision by opinion score method (FDOSM) in addressing all of the preceding issues. FDOSM relies on the opinion of experts (decision makers) to deliver logical decisions because it utilises opinion matrix and the ideal/optimal solution concepts. The experts determine the ideal/optimal solution and compare it to other solutions for the same criterion per solutions. Following the generation of the opinion matrix, the final ranking of solutions is established using arithmetic mean (AM) with direct aggregation. FDOSM is built on two essential parts: individual and group decision-making contexts. Comparing FDOSM to other MADM (ranking and weighting) methods, the former used an ideal/optimal solution concept, avoided inconsistency and two preferences, reduced comparisons numbers, specified reasonable and implicit comparisons and required fewer mathematical operations. In addition, it overcame normalisation and weight issues in MADM methods. The earliest version of the FDOSM focused exclusively on the AM operator in the direct aggregation whilst ignoring all other possible operators. Simultaneously, FDOSM ignored distance measurements, which could have resulted in different ranking outcomes. Since then, FDOSM has been improved to include the geometric mean, harmonic mean and root mean square, as well as distance measurement, to determine the ideal/optimal solution [97]. Both versions of FDOSM try to handle ambiguous and fuzzy information using triangular fuzzy numbers (TFNs). However, TFN cannot address several real-world problems [98]. The ambiguous and fuzzy information that results from the subjectivity of experts’ feedback can be handled through the use of fuzzy set methods [95]. Hence, FDOSM must be improved and extended under another fuzzy environment to efficiently tackle ambiguity problems, gather additional useful information under imprecise settings and benchmark smart e-tourism applications.

Recently, neutrosophic fuzzy sets (NFSs) have been proposed [99], where the term ‘neutrosophy’ refers to the neutral thought’s knowledge. NFS allowed decision makers to deal with the knowledge of neural thinking [132, 133]. The neutrality of this type makes it possible to add extra functionality to model ambiguity information [134, 135]. Neutrosophy is a modern branch of philosophy concerned with the scope of neutralities, nature and origin as well as their interactions with various ideational spectra [100]. NFSs are applauded for their ability to manage ambiguous, imprecise and incomplete information [101]. Owing to these features, many researchers utilised NFSs. Pamucar et al. [102] integrated NFS with Measurement Alternatives and Ranking according to the Compromise Solution to benchmark fuel vehicles alternatives for sustainable transport. Gokasar et al. [103] extended Technique For Order Preference By Similarity To An Ideal Solution (TOPSIS) and Weighted Aggregated Sum Product Assessment (WASPAS) under type-2 neutrosophic environment to prioritise the projects of bridge maintenance. A triangular linguistic neutrosophic cubic fuzzy with TOPSIS was presented by [104] to solve real-world MCDM problems. Adaikalaraj et al. [105] extended VIKOR method with interval valued Neutrosophic fuzzy set to rank the nanotoxicity evaluation methods. Preference Ranking Organisation Method for Enrichment Evaluation was extended by [106] under Pythagorean neutrosophic fuzzy environment to select smart materials. Zavadskas et al. [107] selected of single-family house elements and materials by integrating Step-wise Weight Assessment Ratio Analysis and Multi-objective Optimisation by Ratio Analysis Plus Full Multiplicative Form under NFS environment. Pamucar et al. [101] prioritised the alternatives of energy storage technology by extending Dombi weighted geometric averaging operator and Multi-Attributive Ideal-Real Comparative Analysis with trapezoidal NFS. The multi-attribute border approximation area comparison–elimination and choice translating reality was extended by [108] under single-valued neutrosophic linguistic fuzzy to select the best outsourcing provider. Liang et al. [109] integrated DEcision-MAking Trial and Evaluation Laboratory (DEMATEL) with single-valued trapezoidal neutrosophic fuzzy to evaluate and select e-commerce websites. Başhan et al. [110] selected the risk in ship navigation by extending TOPSIS with NFSs. Abdel-Basset et al. [111] benchmarked security services by developing DEMATEL under NFSs.

All robust MADM methods and their extensions based on NFSs show that this fuzzy type is superior to other types of fuzzy sets in terms of addressing ambiguity, uncertainty and fuzziness issues [112]. The limitations of FDOSM and its extensions can be handled by extending an FDOSM under NFSs environment to be used in benchmarking smart e-tourism applications based on multiple attributes. However, FDOSM can implicitly weight each attribute, but it cannot assign explicit weights to the attributes. FWZIC can be utilised to weight the attributes with zero inconstancies to address this theoretical issue. It is based on expert preference differences for each attribute to determine its significance level in the decision process, which includes multiple steps for establishing the weights of the attributes. The initial version of FWZIC was developed under TFNs, which has many shortcomings [113]. Krishnan et al. [76] thus extended FWZIC under interval type-2 trapezoidal fuzzy environment, which efficiently weights the attributes of the applications. Despite the capability of both versions of FWZIC (the initial version and IT2TR-FWZIC), NFSs show more capabilities and advantages in addressing uncertain phenomena of information. Therefore, FWZIC should also be extended under NFS environment resulting in a homogeneous fuzzy environment between FDOSM and FWZIC to address ambiguity, uncertainty and fuzziness issues in benchmarking the applications of smart e-tourism efficiently.

The novelty and main contribution of this work include (1) developing a new homogeneous MADM framework based on NFS environment, (2) formulating a new extension of FWZIC, called NS-FWZIC, for weighting the smart key concept attributes involved in assessing smart e-tourism and (3) formulating a new extension of FDOSM, called NS-FDOSM, for benchmarking the applications of smart e-tourism per each e-tourism category.

Methodology

This section presents the proposed methodology comprising two phases as described in Fig. 1.

Fig. 1
figure 1

Phases of the proposed method

Phase 1: DMs

In this section, the DMs used for benchmarking the smart e-tourism application are adopted from previous work [76]. A total of 12 smart key concept attributes used in this regard along with the applications (n = 65) are utilised based on their categories, including (1) tourism marketing (n = 8) illustrated in Table 1, (2) collaborative filtering (n = 11) illustrated in Table, (3) content-based (n = 18) illustrated in Table, (4) context-based (n = 10) illustrated in Table and hybrid models-based (n = 18) illustrated in Table.

Table 1 Tourism marketing decision matrix

Tables 1, 2, 3, 4, 5 show that some attributes are denoted by ✓ signs; others, × . A binary assumption-based assessment was conducted by [76] to evaluate the applications, with ✓ sign indicating the presence of the attribute in a given application and × sign indicating the absence of the attribute in a particular application. This study proposes a new method for benchmarking the previously evaluated applications. Benchmarking these applications per category is mostly accomplished using the 12 attributes, namely, CA, RS, SM, IoT, UE, RT, UM, AR, BD, TC, CH and PP. As discussed in [76], the evaluation attributes are considered benefit attributes, indicating that the higher the scalar value, the better. As for the upcoming phase, the development of the MADM solution is extensively described based on the extension of NS-FWZIC integrated with NS-FDOSM.

Table 2 Collaborative filtering decision matrix
Table 3 Content decision matrix
Table 4 Context decision matrix
Table 5 Hybrid model decision matrix

Phase 2: development phase

This section presents the extended MADM approach for weighting the 12 attributes based on NS-FWZIC along with NS-FDOSM, which is used for applications benchmarking. The full details of this integration can be elaborated as follows.

Formulation of NS-FWZIC

The new extension of FWZIC (NS-FWZIC) method involves 5 sequential steps to weight the evaluation attributes (i.e., 12 smart key concept attributes). These steps are presented in detail as follows:

  • Step 1: Defining evaluation attributes

The evaluation attributes are adopted from [76] as discussed in Sect. "Phase 1: DMs". The process of obtaining the subjective preferences for each attribute is justified in the following step.

  • Step 2: Structured expert judgment (SEJ)

This step discusses how experts are selected according to their degree of expertise to evaluate the relative importance of the attributes. Five sub-steps are included, starting with selecting domain experts recognised by others. Experts in this context are highly educated individuals with high experience in the e-tourism industry and professional academic knowledge (i.e., academic degrees). A pool of different experts is then created to choose the most suitable ones from a minimum of 11 experts. The next sub-step includes developing an evaluation form to gather expert data. The following step is where the attribute importance is defined using a 5-point Likert scale. Despite having other Likert scales, this one is selected because it is highly reliable and minimally biased [116]. Finally, a conversion of linguistic scale to numerical scale is applied for the opinion matrices of all decision matrices (DMs) for further analysis (Table 6).

  • Step 3: Building expert DM(EDM)

Table 6 Linguistic terms with their equivalent numerical and neutrosophic fuzzy numbers (NFNs)[117]

This step includes the creation of EDM, containing the alternatives and attributes as presented in Table 7.

Table 7 EDM

As shown in Table 7, each expert is crossed with each attribute, wherein each expert assigns an importance level to each attribute.

  • Step 4: Applying NFS membership function

This step entails applying the NFS membership function on the EDM data and then defuzzing it. The data (i.e., numerical scoring scale) are converted into NS-EDM to improve their precision and ease of use for subsequent analysis using Table. The NFNs used in this process (shown in Table 6) are based on the definition provided by [117], where \(\rho \) represent the truth-membership function, \(\sigma \) represent the indeterminacy-membership function and \(\tau \) represent the falsity-membership function. Ambiguous and vague information are frequently problematic in MADM as it is difficult to assign a precise preference to any attribute. The NFSs’ advantage is that they confront uncertain, vague, and inconsistent information to estimate the relative importance of the attributes [99]. The NFS is presented by [99, 118] and defined in Eq. (1).

$$ N = \left\{ {x,\rho_{N} \left( x \right),\sigma_{N} \left( x \right),\tau_{N} \left( x \right)|x \in X} \right\} $$
(1)

where X is a universe of discourse, and N is a simplified neutrosophic set (SNS). N in X is characterised by a truth-membership function\({\rho }_{N}(x)\), an indeterminacy-membership function \({\sigma }_{N}(x)\) and a falsity-membership function\({\tau }_{N}(x)\), where functions \({\rho }_{N}\left(x\right),{\sigma }_{N}\left(x\right) and {\tau }_{N}(x)\) are singleton subintervals/subsets in the real standard interval [0, 1], such that\({\rho }_{N}\left(x\right):X\to \left[\mathrm{0,1}\right] ,{\sigma }_{N}\left(x\right):X\to \left[\mathrm{0,1}\right] and {\tau }_{N}(x):X\to \left[\mathrm{0,1}\right]\).

Summation and aggregation operation [119]:

$$\text{Let }{\tilde{a }}_{j}=\langle {\rho }_{j},{\sigma }_{j},{\tau }_{j}\rangle (j=\mathrm{1,2},\dots ,n)\text{ be a collection of SNSs, and SNG: }{Q}_{n}\to Q\text{ if}$$
$$ {\text{SNG}}\left( {\tilde{a}_{1} ,\tilde{a}_{2} , \ldots ,\tilde{a}_{n} } \right) = \mathop \prod \limits_{j = 1}^{n} \tilde{a}_{j} = \left( {\mathop \prod \limits_{j = 1}^{n} \rho_{j} ,1 - \mathop \prod \limits_{j = 1}^{n} \left( {1 - \sigma_{j} } \right),1 - \mathop \prod \limits_{j = 1}^{n} \left( {1 - \tau_{j} } \right)} \right). $$
(2)

A simplified neutrosophic geometric average operator identifies SNG.

Division operation [118]:

For any two given SNSs A and B, the division operation of SNSs A and B is defined as follows:

$$\frac{A}{B}=\left\{\langle x,\frac{{\rho }_{A}\left(x\right)}{{\rho }_{B}\left(x\right)},\frac{{\sigma }_{A}\left(x\right)-{\sigma }_{B}\left(x\right)}{1-{\sigma }_{B}\left(x\right)},\frac{{\tau }_{A}\left(x\right)-{\tau }_{B}\left(x\right)}{1-{\tau }_{B}\left(x\right)}\rangle \mid x\in X\right\},$$
(3)

\(\text{which is valid under the conditions }B\ge A,{\rho }_{B}(x)\ne 0,{\sigma }_{B}(x)\ne 1,\text{ and }{\tau }_{B}(x)\ne 1.\)

In this stage, one of the fundamental structures of FWZIC is to divide the summed values by a constant. Therefore, the original equation of multiplications by scalar value [118] is modified to meet the fundamental process of FWZIC using the inverse of the constant, as indicated in Eq. (4).

Equation (4) shows the equation of SNS division on crisp value.

$${\text{N}}\text{ / }\lambda =\left\{\langle x,1-{\left(1-{\rho }_{N}(x)\right)}^{1/\lambda },{\sigma }_{N}^{1/\lambda }(x),{\tau }_{N}^{1/\lambda }(x)\rangle \mid x\in X\right\},\lambda >0$$
(4)

The results of Eq. (4) are aggregated using Eq. (5), where SNWG (neutrosophic weighted geometric) represents an average operator [119].

$$ {\text{Let }}\tilde{a}_{j} = u_{j} ,p_{j} ,v_{j} \left( {j = 1,2, \ldots ,n} \right){\text{ be a collection of SNNs, and SNG: }}Q_{n} \to Q{\text{ , if}} $$
$$ {\text{SNWG}}\left( {\tilde{a}_{1} ,\tilde{a}_{2} , \ldots ,\tilde{a}_{n} } \right) = \mathop \prod \limits_{j = 1}^{n} \tilde{a}_{j}^{{\omega_{j} }} = \left( {\mathop \prod \limits_{j = 1}^{n} \rho_{j}^{{\omega_{j} }} ,1 - \mathop \prod \limits_{j = 1}^{n} \left( {1 - \sigma_{j} } \right)^{{\omega_{j} }} ,1 - \mathop \prod \limits_{j = 1}^{n} \left( {1 - \tau_{j} } \right)^{{\omega_{j} }} } \right), $$
(5)

where \({\omega }_{j}\) is the weight of \({\tilde{a }}_{j}(j=\mathrm{1,2},\dots ,n),{\omega }_{j}\in [\mathrm{0,1}]\text{ and }\sum_{j=1}^{n} {\omega }_{j}=1\).

Scoring or the defuzzied (crisp) value of SNSs is defined as follows [120]:

$$ s(A) = \left( {\rho_{A} + 1 - \sigma_{A} + 1 - \tau_{A} } \right)/3. $$
(6)
  • Step 5: Computing the final weight value

  • Based on the fuzzification data of the attributes in the previous step, the final values of the weight coefficients of the attributes \({(w1,w2, ...,wn)}^{T}\) are calculated in this step as follows. The ratio of fuzzification data is computed using Eqs. (2) and (4). The preceding equations are used with SNS and SNWG [118, 119], where Eq. (7) symbolises the process, as shown in Table 8.

    $$ \tilde{E}_{ij} :\tilde{C}_{ij} = { }\frac{{Imp\left( {\tilde{E}_{ij} /\tilde{C}_{ij} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{n} Imp\left( {\tilde{E}_{ij} /\tilde{C}_{ij} } \right)}}\;,for\;i = 1,2,3,..m\;and\;j = 1,2,3,..n. $$
    (7)

where \(Imp(\tilde{E}_{ij} /\tilde{C}_{ij} )\) represents the fuzzy number of \(Imp(E_{ij} /C_{ij} )\) [121].

Table 8 NS-EDM
  1. (b)

    The mean values are calculated to obtain the final fuzzy values of the weight coefficients of the 12 attributes\(({\tilde{w }}_{1},{\tilde{w }}_{2},\hspace{0.33em}...,{\tilde{w }}_{n}{)}^{T}\). Using Eq. (5), each value of the NS-EDM is then computed. Equation 8 depicts the actual procedure of this phase symbolically [121].

    $${\tilde{w }}_{j}= \sum\limits_{i=1}^{m}{\tilde{E }}_{ij}:{\tilde{C }}_{ij}/m\hspace{0.33em},for\hspace{0.33em}i=\mathrm{1,2},3,..m\hspace{0.33em}and\hspace{0.33em}j=\mathrm{1,2},3,..n.$$
    (8)
  2. (c)

    The final weight is obtained after the defuzzification using Eq. (6). Each attribute’s weight importance should be assigned whilst considering all the weight sum to rescale and apply it.

Formulation of NS-FDOSM

NS-FDOSM is the extended version of the original FDOSM [95] after integrating with the NFS. The stages are presented to discuss the data transformation, followed by data processing.

  • Stage One: Data transformation unit

According to [95], the following steps are required to convert a DM into an opinion matrix:

  • Step 1:

The selection of the ideal solution is initiated for each attribute used in the applications’ DMs. The experts in charge of selecting the ideal solutions are three out of the eleven, as mentioned at the beginning of attribute weighting (Step 2 of NS-FWZIC). Accordingly, the following equation is used in determining the ideal solution (9):

$$ \begin{aligned} A^{*} & = \left\{ {\left[ {\left( {\mathop {\max }\limits_{i} v_{{ij}} |j \in J} \right),\left( {\mathop {\min }\limits_{i} v_{{ij}} |j \in J} \right),} \right.} \right. \\ & \quad \left. {\left. {({\text{Op}}_{{ij}} \in I.J)\left| {i = 1,2,3, \ldots ,m} \right.} \right]} \right\}, \\ \end{aligned} $$
(9)

where max represents the optimal/ideal value for benefit attributes (i.e., CA, RS, SM, …. PP), min represents the optimal/ideal value for cost attributes and \({Op}_{ij}\) represents the optimal/ideal value for critical attributes (the value lies between max and min). The DMs determine the critical value in each case. No cost attributes and critical attributes are identified amongst the identified attributes.

  • Step 2:

This step starts by selecting the optimal/ideal solution values of all the attributes utilised for benchmarking the application. The ideal value of a certain attribute is then compared with the remaining values of each application under that attribute, as shown in Eq. (10). The 5-point Likert scale is employed for the comparison.

$$O{p}_{Lang}=\left\{\left(\left(\underset{\mathit{ij}}{\tilde{v }}{\otimes v}_{ij}|j\in J\right).|i=1.2.3.\dots .m\right)\right\}, $$
(10)

where \(\otimes \) refers to the reference difference between the ideal/optimal solution and the value of each alternative under the same attribute.

The final output of this step is an opinion matrix (linguistic terms matrix) obtained from each expert that is ready to be turned into an NS-fuzzy opinion matrix (Stage Two), as specified in Eq. (11).

$$O{p}_{Lang}= \begin{array}{c}{A}_{1}\\ \vdots \\ {A}_{m}\end{array}\left[\begin{array}{ccc}{op}_{11}& \cdots & {op}_{1n}\\ \vdots & \ddots & \vdots \\ {op}_{m1}& \cdots & {op}_{mn}\end{array}\right]. $$
(11)
  • Stage Two: Data-processing unit

This stage begins by transforming the opinion matrix to a fuzzy opinion DM using NFNs. The NFNs shown in Table 9 are used to replace the linguistic terms obtained from the decision makers based on the description given by [117] (i.e., \(\rho \) represent the truth-membership function, \(\sigma \) represent the indeterminacy-membership function and \(\tau \) represent the falsity-membership function).The result of this stage is the NS-fuzzy opinion matrix.

Table 9 Linguistic terms and their equivalent NFNs [117]

In NS-FDOSM, two different contexts can be used to benchmark the applications: individual and group decision making (GDM).

Individual decision-making context using NS-FDOSM

The attributes weights of each application obtained using NS-FWZIC are fed into NS-FDOSM, and then they are used to comprehensively benchmark each application. The results of the fuzzy opinion matrices are aggregated using Eq. (5). Upon aggregating the values into fuzzy, their transformation into the crisp value is done using Eq. (6) for scoring the applications.

Group decision-making context using NS-FDOSM

Given variations in the applications’ benchmarking amongst the opinion matrices obtained from the three experts, aggregated decisions are necessary to unify the benchmarking results. Thus, group decision making is utilised in the setting of NS-FDOSM to unify all the variations in benchmarking results of the applications. Furthermore, the AM operator is used to get the final group decision-making score, as expressed in Eq. (12). The highest score value is the best application.

$$GS\left[i\right]\leftarrow \frac{1}{n}{\sum }_{1}^{n}{S}^{x}\left[i\right]$$
(12)

Results and discussion

The weighting findings of the 12 attributes obtained by the NS-FWZIC method outlined in Sect. "Formulation of NS-FWZIC" are first presented. The NS-FWZIC method is implemented in five steps. (1) Adopting the attributes (Sect. "Formulation of NS-FWZIC") is the first step. (2) Data from the 11 domain experts who determined the relative importance of each attribute using the provided evaluation form are collected. Based on the experts’ preferences, each attribute’s importance level is converted from a linguistic term to a numerical scale using Table 6. (3) EDM is constructed considering the extracted preferences of experts. (4) The NFS membership function is utilised for transforming crisp values to equivalent fuzzy numbers. The fuzzification result of the EDM is calculated. (5) The ratio values of those attributes (Step 5a) are computed on the basis of Eqs. (2) and (4). The mean of experts’ preferences (Step 5b) is calculated for each attribute to determine the fuzzy weights using Eq. (5). As mentioned in Step 5c, defuzzification is used to establish the final weight of the attributes using Eq. (6). Table 10 presents the final weights for all attributes in order from the highest weight to the lowest one.

Table 10 Attribute weight results

The weighting results of the 12 attributes based on the extended NS-FWZIC are shown in Table 11. The greatest importance weight (0.402) is that of RT, followed by BD with an importance weight of (0.198) and IoT with an importance weight of (0.097). AR received the lowest importance weight (0.005), followed by UM with an importance weight of (0.007) and CH with an importance weight of (0.008). The weights of the remaining attributes are distributed in between. The weight results of the proposed NS-FWZIC demonstrate that RT has a significant effect on the applications with respect to the other attributes. These results are fed to the NS-FDOSM to benchmark the applications.

Table 11 Individual decision-making context

The NS-FDOSM is operated on all the DMs (Tables 1, 2, 3, 4, 5), taking into account the extracted weights of the 12 attributes to discuss the benchmarking results amongst different applications. In this context, three expert opinions are converted into opinion matrices using 5-point Likert scale. In this stage, the DMs must select their ideal solution, and the decision is based on the attributes. Each ideal solution value is compared with all the remaining values per attribute. This process takes place between the ideal and all the other values in that designated attribute. This process is carried out for all the remaining attributes. The linguistic terms within opinion matrices obtained from the DMs are then transformed into a fuzzy opinion matrix for each smart e-tourism category using NFSs. Based on the resulted fuzzy opinion matrices, the individual decision-making context is achieved, as presented in Table 11.

According to Table 11, the benchmarking results of the applications are presented, which explain the importance of the DMs’ opinion in each attribute from their respected point of view. The best application has the highest score, whereas the lowest score value refers to the worst applications. For the first category of ‘Tourism Marketing’, three benchmarking results from 3 DMs are presented for eight applications. For the best application, the results indicate that A#2 is assigned the best across all the three DMs, followed by the second-best application assigned to A#7 across all the DMs.

A consistent ranking is also observed across all the three DMs for other applications (A#3 and A#1) with the third and eighth rankings, respectively. The remaining applications (A#4, A#5, A#6 and A#8) presented slightly different variations across all the DMs. A total of (n = 4/8) applications maintain their ranking across all the DMs with 50% ranking similarities. The remaining 50% is assigned for the (n = 4/8) for the remaining applications, where ranking results present slight variations across the DMs. For the first TRS category, ‘collaborative filtering’, the same three DMs provided their ranking results for (n = 11) applications. The best application for the first and second DM is A#8, but the third DM best is A#4. The second-best application is completely different across all the DMs, where it is A#9 for the first DM, followed by A#4 for the second DM and A#10 for the third DM. In this category, consistent ranking of applications and more different ranking results across all the DMs are observed. For the consistent ranking application results, A#2, A#3, A#5, A#6 and A#11 have similar ranking across the DMs, that is, eleventh, tenth, ninth, seventh and eighth, respectively. The remaining applications, including A#1, A#4, A#7, A#8, A#9 and A#10 present ranking differences across all the DMs. For the statistics of this application category, applications (n = 5/11) present similar ranking with 45.4545%, and the different ranking applications (n = 6/11) are presented in this category with 54.5455%. In conclusion, more differences are observed than the consistent applications.

For the second TRS category, ‘content’, applications (n = 18) are differently ranked across three DMs. The first four ranked applications are consistent across all the DMs (A#1, A#2 and A#3). All the remaining applications present different rankings across all the DMs. More various rankings are observed across all the DMs. The statistic of this category indicates that a total of (n = 3/18) is similarly ranked across the DMs with 16.66667%. The remaining applications (n = 15/18) present different ranking results across all the DMs with 83.3333%.

For the third TRS category, ‘context’, applications (n = 10) are ranked across three DMs. Only one application (i.e., A#34) has similar first ranking amongst the DMs, representing 10% of the total similarity percentages. Ninety percent of applications in this category have different ranking results across all the DMs.

In the fourth and last TRS category, ‘hybrid models’, applications (n = 18) are presented. Across all the DMs, A#43 is first ranked. Other applications obtain similar and consistent ranking results across all DMs, including A#47, A#49, A#52, A#55 and A#57. These applications (n = 6/18) represent 33.3333% of the other applications in this category. Other applications present different raking results across the three DMs. The results of the benchmarked applications for all the DMs are presented in Fig. 2.

Fig. 2
figure 2

Variation of individual benchmarking results

In conclusion, some DMs present consistent rankings across the applications and the same concept applied for different rankings, which are observed across all the DMs for all the categories. According to the ranking results, no unique ranking result is noted based on the opinions provided by the three experts. Given this variance, GDM, considering all the experts’ opinions is essential to solve variations found, provides a final and unique ranking for the applications per each category. The result of NS-FDOSM in a GDM context is reported in Table 12.

Table 12 Benchmarking results based on GDM context

From the point of view followed in the individual context of NS-FDOSM, the highest score concept of the benchmarked applications is applied in the GDM context. For the first category, ‘tourism marketing’, A#2 is ranked first with a score of 0.853520721. The last ranked application is attributed to A#1 with 0.508075283. For the second TRS category, ‘collaborative filtering’, the best ranked application is A#4 with a score of 0.499845376. The result of the following TRS category, ‘content’, shows that the best application is A#17 with 0.939126585. A#21 is the worst and ranked the last with 0.203630742. The result of the fourth TRS category, ‘context’, illustrates that the first rank is attributed to A#34 with a score of 0.813772248. A#31–A#33 rank last with a consistent score of 0.214314924. In the last TRS category, ‘hybrid models’, A#43 is the best application, whereas A#50 is the worst and last one. After uniforming the variations found in the individual benchmarking context using the GDM context of NS-FDOSM, checking the robustness of the result is warranted, leading to the following section of result evaluation.

Evaluation

This section describes the proposed benchmarking framework’s evaluation (NS-FWZIC integrated with NS-FDOSM) using three assessment processes: systematic ranking, sensitivity analysis and comparison analysis.

Systematic ranking

This subsection discusses the systematic ranking evaluation for applications benchmarked per each category for applications benchmarked per each category that have been done by many researchers [122,123,124,125]. Several procedures are performed as follows: (1) The opinion matrices for each smart e-tourism category are aggregated to create a unified opinion matrix. (2) Within each unified opinion matrix, the applications sorted/ordered according to the GDM ranking results of NS-FDOSM. (3) The sorted applications are separated into three groups. (4) The mean values (\(\overline{x }\)) for all the groups are calculated using Eq. (13).

$$\mathrm{Mean}=\frac{1}{n}\sum_{i=1}^{n}{x}_{i}.$$
(13)

The following tips should be noted [126, 127]:

  • The groups’ number and the applications’ number within each group have no impact on the final results.

  • Each group contains a different number of applications depending on the total number of included applications.

  • The last group should either contain the same or a larger number of applications than its previous group. The same concept must follow reaching the first group.

The systematic ranking results of the applications are shown in Table 13.

Table 13 Examining results

The evaluation results are achieved by calculating the mean for each group in each category. The comparison between the mean values across all the groups indicated that a group with minimal mean value is the best compared with the others. The reason is that the decision makers/experts assign the least linguistic terms score to the ideal solution of each attribute owing to the nature of NS-FDOSM method. From this point of view, the results indicate that the first group is the best across all the categories, including tourism marketing and the four smart based TRS sub-categories. The results of the systematic ranking evaluation show that the benchmarked applications within each category are corrected and subjected to a systematic benchmarking.

Sensitivity analysis

Many studies in the literature such as [76, 128, 129] have used this assessment approach to measure the sensitivity of the attributes’ weights and analyse its change. Sensitivity analysis estimates the impact of the most significant attribute in terms of its weight on the benchmarking results of the applications per each category. It requires different scenarios for the weight changes. The new weight of the attributes generated using Eq. 14 [130] and nine scenarios are deemed suitable for the analysis. Based on the elasticity coefficient (\({\alpha }_{c}\)), the relative change of each attribute over the most essential one is computed, and the upper and lower limits for adjusting weight coefficients of the most important attribute are established. Then, the ranks generated by varying the weights of the attributes in the produced scenarios are compared with the NS-FWZIC rank.

\(w_{c} = \left( {1 - w_{s} } \right) \times \left( {w_{c}^{o} /W_{c}^{0} } \right) = w_{c}^{o} - {\Delta }x\alpha_{c} ,\) (14)where \({w}_{s}\) refers to the most significant concept, \({w}_{c}^{o}\) refers to weight values computed using the NS-FWZIC method and \({W}_{c}^{0}\) refers to the aggregate value of the weights generated by NS-FWZIC. \(\Delta x\) refers to the range of the adjustments implemented on the weight values of the 12 smart key concepts, which are the upper and lower limits of the most significant attributes. RT (w = 0.402) is the most significant attribute (i.e., Table 10) amongst 12 attributes. The results of calculating the \({\alpha }_{c}\) for all the attributes are shown in Table 14. The limit values of the RT attribute are − \(0.402\le\Delta x\le 0.598\).

Table 14 \({\alpha }_{c}\) used for the changing of attributes’ weights

The given limit values are divided into nine scenarios, which result in the generation of new weight values, as shown in Table 15.

Table 15 New weight values for each attribute of nine scenarios

These extracted weights are utilised to measure their sensitivity and their effect on the benchmarking of smart e-tourism applications across all categories. In few scenarios, some of the benchmarked applications are changed owing to the attributes change. However, given that weight sensitivity changes in each category, further logic and discussion are required. For the sensitivity analysis results, five figures are provided, one for each smart e-tourism category. For the tourism marketing category (Fig. 3), A#2 maintains its first rank in (6/9) scenarios with 66.6667% and presents slight changes in the remaining scenarios. The second rank is attributed to A#7, which maintains its ranking order in (5/9) scenarios with 55.5556%. The third rank is for A#3, which only maintains its rank in (2/9) scenarios with 22.2222% similarities across scenarios. A#5 and A#6 maintain the fourth and fifth ranks, respectively, across all the scenarios with (4/9) scenarios and 44.4444 similarities with the original rank result. A#8 is ranked sixth, which maintains, similar to the previous two applications, (4/9) scenarios with 44.4444% similarity. A#4 is ranked seventh, which maintains (8/9) scenarios with 88.8889% similarity. The last application, A#1, is the only application with 100% similarity because it maintains its eighth rank across all the scenarios.

Fig. 3
figure 3

Sensitivity analysis of marketing category

The most notable implications in this category rendered A#1 as the one which maintains its ranking across all the scenarios. A potential reason for such consistency is that A#1 achieves one attribute (UM) across smart key concept attributes, and this particular attribute originally weights (0.00721), which does not significantly change across all the nine scenarios. Another important implication is attributed to A#3, which fluctuates between ranks to being the only application with the least similarities compared with its original rank. A potential reason for this occurrence is that it originally achieves 6/12 attributes (i.e., CA, RS, SM, UE, UM and BD) across all the smart key concept attributes. Furthermore, given their weight changes across the different scenarios, they can influence these rank changes.

For the TRS/collaborative category (Fig. 4), the most notable implication can be attributed to A#3 and A#2. These two applications maintain the 10th and 11th order, respectively, across all the nine scenarios with 100% consistency with their original ranking despite the nine scenarios’ weight changes. The reason is that A#2 only obtains one attribute (RS) across others with an original weight of 0.08791. Therefore, weight changes across the scenarios cannot significantly change the application rank. The same explanation goes for A#3, but it obtains two attributes (RS and UE), respectively, weighted as 0.08791 and 0.07053. The last implication in this category is attributed to A#1 and A#9. Both maintain the least number of scenarios where their rank has 44.4444% consistency. A#1 and A#9 are ranked fifth and sixth, respectively, maintaining their order in four scenarios (S4, S5, S6 and S7).

Fig. 4
figure 4

Sensitivity analysis of TRS/collaborative category

For the TRS/content category (Fig. 5), notable stability is found in ranking the application across all scenarios. This group’s implication can be concluded from the fact that the first ranked A#17 maintains its ranking with 100% consistency across all the nine scenarios despite all the weight changes across all attributes. This application has obtained (9/12) attributes (except UM, TC and PP); thus, fluctuating its order is difficult even though weight changes across the scenarios.

Fig. 5
figure 5

Sensitivity analysis of TRS/content category

For the TRS/context category (Fig. 6), A#34 is originally ranked first, which maintains 100% consistency across all the nine scenarios. It maintains full rank consistently across all scenarios, and other applications perform the same, including A#30, A#31, A#32, A#33, A#37, A#38 and A#39, across all the nine scenarios. A#35 and A#36 also maintain their ranking but in a slightly less number of scenarios (8/9), with 88.8889% consistency. The implication of the scenarios and ranking in this category rendered it to be the category where most application rankings do not change across the scenarios except for only two cases. Another important implication in this category is linked to the applications that maintain their ranking across all the scenarios, where the weight changes do not affect their ranking. These applications either score squally in all the 12 attributes similar to the case of A#31, A#32 and A#33, or they do not significantly vary from one another like the rest of the applications which maintain their ranking. The most notable aspect is that none of the applications which maintain their rank score the most significant attribute (RT), and none of them score in the worst attribute (AR). The only exception is A#34, which only scores in the most significant attribute (RT), which does not affect its ranking across all the scenarios.

Fig. 6
figure 6

Sensitivity analysis of TRS/context category

For the last category, TRS/Hybrid (Fig. 7), only A#50 maintains its last rank (18th) across all the nine scenarios. The other applications have slight changes because of the changing weights across all scenarios.

Fig. 7
figure 7

Sensitivity analysis of TRS/hybrid category

Therefore, the weight changes of the 12 attributes across the 9 scenarios can notably affect the benchmarked applications per category. Such a case can shed light on the fact that weight importance posts a big issue for benchmarking smart e-tourism applications. The discussed scenarios of attributes changing prove that importance weights should be assigned for the evaluation attributes used in the benchmarking task. Furthermore, these attributes’ weight changes can impact the ranking of the application and managerial decisions which are taken on the basis of these findings. Tourism managers can use these findings to understand preferences associated with smart e-tourism applications [81] and determine influential factors and their significant effects. Another managerial implication can be assigned to attributes’ weights changes, suggesting that smart e-tourism applications can allocate their resources more efficiently and maximise tourist satisfaction [131]. Lastly, correlational analysis is used, in particular Spearman correlation coefficient (SCC), which evaluates the relationship between different scenarios results [130], as presented in tourism marketing (Fig. 8), TRS/collaborative (Fig. 9), TRS/content (Fig. 10), TRS/context (Fig. 11) and TRS/hybrid (Fig. 12).

Fig. 8
figure 8

Rank correlation amongst the nine scenarios of tourism marketing

Fig. 9
figure 9

Rank correlation amongst the nine scenarios of TRS/Collaborative

Fig. 10
figure 10

Rank correlation amongst the nine scenarios of TRS/Content

Fig. 11
figure 11

Rank correlation amongst the nine scenarios of TRS/Context

Fig. 12
figure 12

Rank correlation amongst the nine scenarios of TRS/Hybrid

The rank correlation results over all categories of smart e-tourism show a positive high correlation across all the nine scenarios. For instance, in the tourism marketing category, (2/9) scenarios (S4 and S5) present the highest level SCC with values of 1, which are followed by the second two highest correlation scenarios S2 and S3, with values of 0.928358 and 0.976119, respectively. All remaining scenarios, except S1, present a high correlation with values more than (0.8) as opposed to S1, which is the only one with (0.790302) correlation. The mean correlation value for all the nine scenarios of each category is 0.8982, 0.9090, 0.9544, 0.9927 and 0.9645. Each value represents a high correlation of ranks per category, and these results favour the systematic ranking results presented previously in Table.

Comparison analysis

This section presents and discusses comparisons with a prior relevant study to emphasise the present study’s main contributions. The comparisons are focused on two main aspects: application and theoretical aspects. In the former, the present study is compared with the benchmark study [76] in terms of evaluation and benchmarking issues. Both studies have taken into account the multiple attributes for evaluation and the variance in data between categories and their concepts and determined the relative importance of the attribute. The proposed method is compared with the same benchmark study from two theoretical perspectives: weighting and ranking. From the weighting perspective, the importance of the 12 attributes are critical in final decision-making. Krishnan et al. [76] is compared between IT2TR-FWZIC, ANP, AHP and BWM, revealing the superiority of the IT2TR-FWZIC in weighting the 12 attributes. However, there are many other fuzzy set methods in the literature (e.g., NFSs, Pythagorean fuzzy set, t-spherical fuzzy set and q-rung orthopair fuzzy sets) to handle uncertainty issue as a consequence of experts’ feedback subjectivity. Although the uncertainty issue has been improved by these methods, it remains an open issue. NFSs are applauded for their capability to handle uncertain, incomplete and inconsistent information [101]. Therefore, FWZIC method is extended under neutrosophic fuzzy environment to consider decision maker’s uncertainty. Certainly, NS-FWZIC and IT2TR-FWZIC can successfully cope with information inconsistency in addition to uncertainty. These points are summarised in Table 16.

Table 16 Comparison analysis between the proposed method and benchmark study [76]

Moreover, the NS-FDOSM is compared with a VIKOR [76] in terms of ranking perspective. This comparison is conducted to highlight the NS-FDOSM method’s robustness and superiority to the VIKOR method. To begin, VIKOR is a very common MADM ranking method utilised in a wide variety of applications. Although it overcomes the issue of distance measurement, it relies on external methods to compute the weights of the attribute. Given VIKOR’s inability to cope with ambiguous and vague information, a new method (NS-FDOSM) is developed to address these issues. NS-FDOSM not only deals with distance measurement issue but also outperforms VIKOR in terms of weighting attributes’ values implicitly, ambiguous and vagueness information, incomplete information, unquantifiable attributes and normalisation. As illustrated in Table 16, NS-FDOSM is more robust than VIKOR for applications benchmarking owing to all of these comparison points. NS-FDOSM handles all of the aforementioned issues (n = 6/6), whereas VIKOR solves only the distance measurement issue (n = 1/6).

Conclusion

The novelty and contribution of this study can be elaborated as the development of a new homogeneous MADM framework based on NFSs to benchmark the applications of smart e-tourism and tackle the uncertainty problem thoroughly. In the proposed framework, NS-FWZIC is used for attributes’ weighting, and NS-FDOSM is used for complete benchmarking. Two major phases are presented (Fig. 1). Phase one adopts the smart e-tourism DMs. Phase two for development illustrates the sequential steps of the proposed MADM approach. In the framework evaluation, three processes are used and presented for the assessment of the results, namely, systematic ranking sensitivity analysis and comparison analysis. This study has certain limitations, which can be attributed to the fact that the adopted applications in each DM are only assessed by [76] based on binary assumption-based assessment (i.e., ✓ sign and × sign). When the latter arises, variation is reduced in benchmarking. However, this drawback can be avoided by revaluating the attributes to multi-assumption-based assessment reconsidering 5-, 7- or 10-point Likert scale to identify the variation in the applications’ data. For instance, A#3 (Table 2) has RS which represents the availability of the recommender system in this application. The present/availability can be revaluated based on how good recommender system in this application based on the Likert scale (e.g., very bad, bad, not that good, good, very good). In addition, the importance measurement, which reflects each DM’s preferences, is not considered in the present study. Assigning a certain level of importance to the DMs based on their expertise can circumvent this constraint. Another shortcoming is that only one aggregation operator is utilised in NS-FWZIC, and only one defuzzification method is adopted in NS-FWZIC and NS-FDOSM. Different defuzzification techniques, including bisector of area, centre of gravity and extended centre of area, may be used for weighting the attributes in FWZIC and/or the ranking alternatives in FDOSM. In future studies, other aggregation operators can be explored with FDOSM to benchmark alternatives. The uncertainty issue is still considered an open issue. Various fuzzy set methods, including spherical, T-spherical, interval type-2 hesitant, Pythagorean, q-Rung Orthopair and M-Polar, can be implemented with FWZIC and/or FDOSM to investigate the suitability of these types in solving the uncertainty issue.