1 Introduction

The tourism industry offers distinct benefits to a country, such as economic growth, revenue generation, job creation, and benefits to local communities [1, 2]. Social tourism [3] is a rapidly growing field in which building a positive relationship between the local host and the tourist is the most important factor. Social tourism is a kind of working holiday in which a tourist visits a place and, based on his or her interests, does work for the local host. In return, the local host provides a good environment for the visit, along with food and lodging. The major aim of social tourism is to encourage tourists to visit a place where they can use their skill set and obtain the maximum benefit from the visit. The advantages of social tourism include support for local projects, relationship building, help for children, donations, and support for senior citizens. It also includes support for accessible travel for all communities, including those who cannot otherwise afford the visit [4]. The tourism sector has established itself as one of the most lucrative and economically stimulating industries for any nation. Numerous tourism-related sub-fields call for greater accuracy and precision; if these are attained, the industry could experience rapid growth. Machine learning [5] in the tourism sector reduces confusion and offers a more practical way of delivering the sector's more advanced capabilities. Machine learning is the process by which computers learn from many forms of data to produce accurate predictions for a variety of tasks, such as demand forecasting. A tourist recommendation system is considered a useful tool for fostering relationships and communication between travellers [6, 7].
The recommender system compares the information provided by tourist destinations or the information and reviews provided by visitors, applies specific algorithms, and generates a list of suggested attractions for the visitor.

1.1 Novelty and contributions of this study

  • An intelligent tourism recommendation system is proposed in this paper with the aim of fostering a sustainable tourism industry.

  • The dataset is cleaned using a complementary filter, a computationally inexpensive sensor fusion technique that consists of a low-pass and a high-pass filter. These filters support reliable tourism recommendations by reducing redundant data.

  • The cleaned data is then subjected to fuzzy c-means clustering (FCM) for feature extraction. FCM is chosen because it can identify tourists who have mixed preferences or interests based on their degree of similarity to each cluster.

  • Ensemble machine learning classifiers, namely decision trees (DT) and extreme gradient boosting (XGB), are used for classification. The former is chosen because it lays out a wide range of options for tourists and examines their possible outcomes, whereas the latter is highly scalable and produces the minimum loss on the dataset.

1.2 Organization

The remainder of the paper is organized as follows. Sect. 2 reviews the literature on sophisticated recommender systems for social tourism. Sect. 3 outlines the proposed technique and the system's workflow. Sect. 4 covers the results and their explanation in terms of the evaluation criteria. Sect. 5 concludes the paper and discusses future scope, followed by the references.

2 Literature review

This section reviews the work conducted by several researchers in the area of tourism recommendation systems, as summarized in Table 1.

Table 1 A review of existing studies

3 Proposed system

The proposed tourism recommendation system comprises the following components, as shown in Figure 1.

Fig. 1 Workflow of the proposed system

3.1 Pre-processing of data

Pre-processing involves the removal of errors, inconsistencies, and outliers that can adversely affect the system’s performance. It also involves normalizing data values for further queries and analysis. To deal with class imbalance, we apply the synthetic minority oversampling technique (SMOTE) [16] to balance the dataset.
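The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy points are assumptions made for illustration; the source does not state which implementation or settings were used.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolation (illustrative sketch).

    minority    -- list of equal-length numeric tuples (minority-class samples)
    n_synthetic -- number of synthetic samples to generate
    k           -- number of nearest minority neighbours to choose from
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != base_idx),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points with two numeric features.
minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote(minority_points, n_synthetic=5)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.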

3.2 Data cleaning using complementary filter

The complementary filter is a technique commonly used in sensor fusion and signal processing to combine the outputs of multiple sensors with different characteristics [17]. First, the sensor types or data sources that carry complementary information are identified. Next, a weight factor is assigned to each data source or sensor based on its accuracy. Finally, the complementary filter is applied to the dataset using Equation 1:

$$\text{Filter method} = \sum_{i=0}^{n} \text{Sensor}\_\text{data}_{i} \times \text{Weighted}\_\text{Sensor}_{i}$$
(1)
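Equation 1 is a weighted sum over data sources, which can be sketched as follows. The function name `complementary_filter` and the example readings and weights are hypothetical; the source does not specify how the weights are chosen beyond basing them on accuracy.

```python
def complementary_filter(sensor_data, weights):
    """Weighted sum of source readings, following Equation 1 (sketch).

    sensor_data -- one reading per sensor or data source
    weights     -- one weight per source, e.g. reflecting its accuracy
    """
    if len(sensor_data) != len(weights):
        raise ValueError("sensor_data and weights must have the same length")
    return sum(x * w for x, w in zip(sensor_data, weights))


# Hypothetical example: two sources reporting the same quantity,
# with the second source trusted three times as much as the first.
fused = complementary_filter([10.0, 20.0], [0.25, 0.75])
```

When the weights sum to 1, the fused value stays on the same scale as the individual readings.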

3.3 Extraction of features using fuzzy c-means clustering (FCM)

FCM assigns data points to multiple clusters with varying degrees of membership based on their similarity to each cluster. In the context of tourism, it can be used to identify tourists who have mixed preferences or interests by computing the cluster memberships of each data point. Figure 2 shows the flowchart of FCM.

Fig. 2 Working of FCM

We consider a dataset D = {D1, D2, …, Dn}, a set of clusters C = {C1, C2, …, Cm}, and a set of membership values M = {M_PQ : 1 ≤ P ≤ m, 1 ≤ Q ≤ n}, formulated so that the trained values can combine a neural network with FCM [16]. Equation 2 gives the quantity that is minimized over the training set to obtain efficient auto-encoder values, together with the constraints on the membership values:

$$\sum_{P=1}^{m}\sum_{Q=1}^{n} M_{PQ}{\Vert D_{Q}-C_{P}\Vert}^{2} \quad \text{subject to} \quad \sum_{Q=1}^{n} M_{PQ}=1,\; M_{PQ}>0$$
(2)

The objective function used to enhance the performance of the system is given in Equation 3:

$$F_{0}\left(M, C\right)= \sum_{Q=1}^{n}{\partial }_{i}\sum_{P=1}^{m}\left(1-{\alpha }_{PQ}^{0}\right)+ \sum_{Q=1}^{n}\sum_{P=1}^{m}M_{PQ}{\Vert D_{Q}-C_{P}\Vert }^{2}$$
(3)

The cluster centers are then evaluated, and the membership matrix is updated accordingly, using Equation 4:

$$C_{P}=\frac{\sum_{Q=1}^{n} M_{PQ} D_{Q}}{\sum_{Q=1}^{n} M_{PQ}}$$
(4)

The membership matrix is defined by Equation 5:

$$M_{PQ} = \left( 1 + \left( \frac{m_{PQ}}{\partial_{m}} \right)^{-1} \right)^{-1}$$
(5)
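The alternation between updating cluster centers and updating memberships can be sketched with the textbook fuzzy c-means procedure. This is a minimal illustration under stated assumptions, not the exact variant defined by Equations 2 to 5: it assumes a fuzzifier of 2, raises memberships to that power in the center update, and normalizes each data point's memberships to sum to 1 across clusters. The function name `fuzzy_c_means` and the toy points are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, fuzzifier=2.0, n_iter=100,
                  tol=1e-6, seed=0):
    """Textbook fuzzy c-means (sketch). Returns (centers, memberships).

    memberships[q][p] is the degree to which point q belongs to cluster p.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised per point.
    memberships = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([v / total for v in row])
    exponent = 2.0 / (fuzzifier - 1.0)
    centers = []
    for _ in range(n_iter):
        # Center update: membership-weighted mean of the data points.
        centers = []
        for p in range(n_clusters):
            w = [memberships[q][p] ** fuzzifier for q in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[q] * points[q][d] for q in range(n)) / total
                for d in range(dim)
            ))
        # Membership update from relative distances to the centers.
        shift = 0.0
        for q in range(n):
            dists = [max(math.dist(points[q], c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dists[p] / dists[j]) ** exponent
                          for j in range(n_clusters))
                for p in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b)
                                   for a, b in zip(new_row, memberships[q])))
            memberships[q] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, memberships


# Hypothetical 2-D data: two well-separated groups of tourists.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
toy_centers, toy_memberships = fuzzy_c_means(toy_points, n_clusters=2)
```

A tourist with mixed preferences would appear as a row of `toy_memberships` with no single dominant value, which is the property the feature-extraction step relies on.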

3.4 Classification using DT and XGB

The proposed system is trained and tested using ensemble machine learning classifiers, namely decision trees (DT) and extreme gradient boosting (XGB). The former is chosen because it lays out a wide range of options for tourists and examines their possible outcomes, whereas the latter is highly scalable and produces the minimum loss on the dataset. The confusion matrices for both classifiers are shown in the next section.
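To illustrate how a decision tree derives its rules, the sketch below searches for the single binary split that minimizes the weighted Gini impurity, which is the basic step a tree repeats at every node. It is a simplified illustration rather than the classifier used in this work; the function names and the two toy features (nights stayed and expenditure) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split.

    Candidate thresholds are midpoints between consecutive distinct values
    of each feature; samples with value <= threshold go to the left child.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, thr, score)
    return best


# Hypothetical rows: (nights stayed, expenditure); label 1 = plan purchased.
toy_rows = [(2, 100), (3, 900), (7, 150), (8, 850)]
toy_labels = [0, 0, 1, 1]
feature, threshold, impurity = best_split(toy_rows, toy_labels)
```

In this toy data the first feature separates the two classes perfectly, so the chosen rule reads as "nights stayed <= threshold", which is the kind of human-readable rule a trained tree exposes.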

4 Results and discussions

The dataset is collected from the UNWTO [18], which provides a questionnaire series based on specific UN guidelines. The dataset contains information about tourists' travel plans, including their accommodation and expenditure details during travel.

The model is implemented in Python using libraries such as sklearn, keras, numpy, and pandas. We applied several fundamental pre-processing steps to the dataset: extraction of useful features, filling of missing values, removal of null records, and exploratory analysis of the data based on distinct features. The correlation among these parameters is depicted by a correlation heatmap [19].
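The values behind a correlation heatmap can be sketched as a matrix of pairwise Pearson correlation coefficients. The sketch assumes Pearson correlation, which the source does not state explicitly, and the column names and values are hypothetical rather than taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical columns standing in for dataset parameters.
toy_columns = {
    "nights": [1, 2, 3, 4],
    "expenditure": [10, 20, 30, 40],
    "discount": [4, 3, 2, 1],
}
toy_matrix = correlation_matrix(toy_columns)
```

Each cell of the heatmap corresponds to one entry of this matrix, with values near 1 or -1 indicating strongly related parameters.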

Figure 3 shows the heatmap of the correlation among the parameters. Figures 4 and 5 show the confusion matrices for the testing and training datasets, giving the number of actual and predicted travel plans purchased, using DT and XGB respectively.

Fig. 3 Correlation heatmap of distinct parameters

Fig. 4 Confusion matrix with decision tree (DT) [20]

Fig. 5 Confusion matrix with extreme gradient boosting (XGB) [20]

Figure 6 shows a representation of the dataset attributes when trained and tested using the DT classifier.

Fig. 6 Rules generation using DT

4.1 Comparative analysis

This section presents a comparative analysis showing how the proposed system outperforms recent studies [8,9,10,11,12,13,14,15] in terms of accuracy, precision, recall, and f1 score. Table 2 compares the proposed system with previously published studies.

Table 2 Comparative analysis of the proposed work with existing recent studies

The comparison indicates that the proposed system (FCM + DT + XGB) outperforms the other state-of-the-art techniques in terms of accuracy, precision, recall, and f1 score [8,9,10,11,12,13,14,15]. The graphical comparison is shown in Fig. 7. Of all the techniques, the proposed system provides the highest accuracy (87.45%), recall (86.55%), precision (85.37%), and f1 score (85.12%).

Fig. 7 Graphical comparison
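For reference, the four evaluation metrics used in the comparison can be computed from the cells of a binary confusion matrix, as in the sketch below. The function name and the counts in the example are illustrative only and are not the values reported in this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and f1 score from confusion-matrix counts.

    tp, fp, fn, tn -- true positives, false positives,
                      false negatives, true negatives
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (not results from the paper).
example_metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

The f1 score is the harmonic mean of precision and recall, so it is high only when both are high.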

5 Conclusion and future scope

The development of an intelligent information recommender system (IIRS) for the next generation of sustainable social tourism holds considerable potential for advancing the industry's goals of sustainability, responsible travel, and community engagement. It ensures that tourists have access to meaningful travel experiences while contributing positively to local communities and the environment. The proposed IIRS combines several components to handle a number of tourism-related issues, such as impact assessment, community involvement, travel planning, and destination selection, with the purpose of promoting sustainable social tourism. Fuzzy c-means clustering (FCM) is used in the proposed system for feature extraction. After feature extraction, the model is trained and tested using ensemble machine learning classifiers, namely decision tree (DT) and extreme gradient boosting (XGB). The performance of the system is then validated using evaluation metrics such as accuracy, precision, recall, and f1 score. The results show that the proposed work achieves the best performance compared with existing recent works.

As future scope, the distinct dataset parameters can be integrated into the design of an IoT smart tourism system to provide travellers with personalized recommendations for the latest tour packages at affordable prices. The IoT system would use sensors to collect relevant tourism information and process it by applying sentiment analysis to produce personalized recommendations.