1 Introduction

Dining out is one of the biggest expenditures for travelers around the world and is essential for tourism-dependent destinations [1]. Therefore, restaurants must determine which elements drive higher spending and positive behavioral intentions in order to improve sales and word of mouth [2]. In the tourism and hospitality industry, travelers rely mainly on online sources to assess products and services. In particular, electronic Word-Of-Mouth (eWOM) has been considered one of the essential sources of information for customers [3]. eWOM is especially important in the pre-trip phase, when travelers choose their destination and most of the products they want to buy during their trip [3].

Researchers have also found that, through user-generated content, including the ability to share photographs, videos and comments with other customers, social media can help the hospitality industry attract potential visitors, increase income and thus achieve greater success [4]. Today, researchers and professionals in many industries consider user-generated content to be the most common and efficient marketing tool [5]. Online reviews published on tourism websites such as TripAdvisor, Yelp and Expedia have attracted significant attention from both theoretical and practical perspectives [6].

Accordingly, data analytics methods are widely used to collect, summarize and interpret user-generated content in order to reveal meaningful patterns and insights into management issues [7, 8]. Online review research has several advantages, such as data availability and a clear, simple data collection process. These data are mostly text-based and often form very large datasets, or big data, that exceed the processing capacity of conventional data analytics approaches [9].

Big data are fundamentally changing the management of the hospitality sector and the relationship between customers and businesses by supporting decision-making based on large amounts of data. The hospitality sector has become an information-intensive industry in which large volumes of data are stored, although their practical applications are not yet widespread. With the advent of big data, it is feasible to process these data to achieve specific goals and to convert data into knowledge [3, 10, 11].

Market segmentation allows businesses to group similar customers and identify their preferred target markets, ensuring that marketing expenses are managed effectively. From the hospitality and tourism perspective, it is essential to identify customer behaviors correctly and select distinct customer groups in order to design marketing strategies and shape customer interactions with the industry. Hotel managers should determine customers' satisfaction and demands to enhance their marketing strategies and decision-making processes [12, 13]. In other words, segmentation using user-generated content is necessary if managers want a better understanding of customers' unique preferences and their level of satisfaction [14]. The interaction between big data and social media can reshape the way hospitality and tourism businesses interact with their customers [15]. According to the literature, social media data are among the most effective means of shaping consumer perspectives toward a product or service. Therefore, in this study, a hybrid machine learning approach combining clustering, neural networks and an optimization method was developed for customer segmentation in Saudi Arabian restaurants. Although clustering and neural networks are widely used in many prediction problems, this paper is the first to present their combination for online review analysis. In particular, combining these approaches with an optimization-based learning method is novel for customer segmentation in Saudi Arabian restaurants.

This paper focuses on the significance of online customer reviews in revealing customer satisfaction in restaurants. The current work mainly demonstrates the use of online review data for customer segmentation in the restaurant industry and compares its benefits with those of traditional data analysis approaches. We combined clustering and neural networks to develop the hybrid method, and the performance of the neural network was improved using an optimization-based learning approach. The proposed method highlights the role of customer segmentation in hospitality competitiveness. We investigate the preferences of restaurant customers based on food quality and service quality, and these results lead to the identification of customer clusters. Furthermore, the findings of this study will help managers classify their customers into different clusters to deliver personalized services, improve customer acquisition and encourage efficient resource management in the hospitality industry. The current research develops novel market segmentation techniques using big social media data and online review platforms such as TripAdvisor. This paper also shows how managers can efficiently segment their customers through online customer reviews. The proposed hybrid machine learning technique can improve the quality of segmentation analysis for restaurants based on customer preferences and user-generated content.

This paper is organized as follows. Section 2 presents the literature review. Section 3 describes the hybrid method. Section 4 provides the experimental results. Finally, Sects. 5 and 6 provide the discussion and conclusion, respectively. Table 1 lists the acronyms used in this paper.

Table 1 List of abbreviations used in this manuscript

2 Research Background

2.1 Market Segmentation in Restaurants

User-generated data are one of the main sources of information about customers, their needs and their preferences. These data can explain the characteristics of users and their level of satisfaction with the products and services provided [16]. Customer segmentation is mainly defined as classifying customers into different groups, such that users within a group share similar characteristics while differing from those in other groups. This method helps businesses identify different customer groups [17]. Various customer segmentation methods have been used in past research. Through market segmentation, customers with the same needs, such as similar shopping patterns and purchasing behavior, can be grouped into the same clusters; that is, customers are grouped according to their preferences and buying behavior. More accurate customer segmentation contributes to organizational success. Market segmentation primarily aims to predict user needs accurately, which increases profitability by investing in or producing a suitable amount of goods within a given period for end users at an appropriate cost [10]. Accordingly, organizations can communicate more efficiently with their customers [18]. Customer segmentation is a common practice in many fields, such as marketing [19], education [20] and tourism [21].

In the context of tourism and hospitality, segmentation focuses on identifying the requirements and benefits associated with specific product groups, user purchases and behavior, demographics and nationalities [22]. Although much of the research on customer segmentation has relied on Google Analytics datasets, there is a growing effort to apply customer segmentation to social media data from the main online platforms.

Some past studies (see Table 2) have considered restaurant customer segmentation along various dimensions, such as demographic characteristics, preferences, behavior and satisfaction level. For example, [23] developed an approach for spa hotel classification and travel choice determination using data mining methods and identified 9 clusters for spa hotel segmentation.

Table 2 A literature review of customer segmentation in restaurants

A meal at a restaurant or the purchase of a hotel room can both be considered hospitality products [24]. Recently, there has been growing interest among consumers in vegetarian products [25]. The term vegetarian is widely applied to people who do not eat meat, seafood or foods made from these ingredients [26]. Recent evidence suggests that vegetarian foods are an important source of a wide variety of essential micronutrients, and there is strong evidence that a vegetarian diet can be effective in preventing many chronic diseases, such as cardiovascular disorders, diabetes and some cancers [27]. Accordingly, [17] classified the customers of vegetarian restaurants into groups and predicted customer preferences in each group with the Classification and Regression Tree (CART) algorithm, building a separate decision tree model for each group. Furthermore, Trappey et al. [28] used preference variables and K-means clustering to determine customer loyalty to Japanese-style chain restaurants; based on their findings, the restaurant created customized price discounts for each customer according to past preferences and behavior. In addition, [13] profiled profitable hotel customers using Recency, Frequency and Monetary (RFM) indicators, which were used to determine the type of each customer.
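The RFM indicators can be computed directly from a transaction log. The following Python sketch is only an illustration and not the procedure used in [13]: it assumes each transaction is a hypothetical `(customer_id, date, amount)` tuple, and it takes Recency as the number of days since the customer's last visit, Frequency as the number of visits and Monetary as the total amount spent.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of hypothetical (customer_id, date, amount)
    tuples; `as_of` is the reference date used to measure recency.
    """
    last_visit = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Keep the most recent visit date seen for each customer
        if customer not in last_visit or day > last_visit[customer]:
            last_visit[customer] = day
        frequency[customer] += 1       # number of visits
        monetary[customer] += amount   # total spend
    return {
        c: ((as_of - last_visit[c]).days, frequency[c], monetary[c])
        for c in last_visit
    }


# Hypothetical transaction log
log = [
    ("A", date(2021, 1, 5), 40.0),
    ("A", date(2021, 3, 1), 60.0),
    ("B", date(2020, 12, 20), 25.0),
]
table = rfm_table(log, as_of=date(2021, 3, 31))
```

Scoring or clustering these three values (for example, by quantiles) would then assign each customer a type; that step is an assumption and is not shown.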

Another study used a hybrid technique combining multi-criteria decision-making and text mining to analyze user-generated data and determine the importance of factors relevant to customers' decision-making in green hotel selection [29]. Similarly, [30] classified customer satisfaction in Las Vegas hotels and found that a naive Bayes classifier can classify hotel customers with high precision and recall. Moreover, Nilashi et al. [10] provided an analytical approach for customer segmentation in eco-friendly hotels, in which the Expectation–Maximization (EM) method identified 4 clusters and travelers' satisfaction levels were determined. Furthermore, [31] investigated travelers' choice behavior in green hotels; the techniques developed there can examine travelers' reviews and classify eco-friendly hotels to determine subsequent choice behavior and assist customers in their decision-making.

Segmentation techniques can be divided into two main categories: a priori and post hoc methods. In the a priori approach, the type and number of clusters are determined before data collection. In the post hoc approach, the number of groups and their attributes are derived from the analysis of the collected data [32]. Despite the popularity of post hoc methods, few studies to date have applied them in the hospitality sector for spa hotel segmentation and customer preferences. Furthermore, despite the substantial growth of the tourism industry, only a limited number of papers have focused on customers' preferences and satisfaction in vegetarian restaurants. Today, big social data and user-generated content form a large repository of data that reflects users' satisfaction with restaurant service quality. User-generated content can also explain differences among customers with different preferences. Accordingly, online reviews provide a useful basis for data-driven, post hoc market segmentation, offering insight into how managers can efficiently segment different customers.

2.2 Relationship Between Restaurant Experience and Online Reviews

User-generated data in social media are becoming the main source of information for hospitality research and practice [9]. Given the enormous volume of social media data and online reviews, it is difficult to manipulate customers' opinions [33]. These data sources can provide genuine information about the restaurant industry. User-generated data are a form of eWOM, which is defined as the transfer of information about a service from one person to another through electronic means [34].

Many studies have examined the impact of online reviews in the hospitality and tourism industry. Many online review tools collect user-generated data and publish the content to potential users. Each social media platform has exclusive characteristics; these tools allow consumers to post their opinions and ask them to rate their experiences on a scale. There are several platforms for online reviews, such as Expedia, Yelp, Airbnb and TripAdvisor [8, 35]. Studies of online reviews often use a set of reviews and related data to derive characteristics that allow researchers to identify, explain or predict patterns that are useful from theoretical and practical standpoints. Such research complements traditional methods, which rely primarily on surveys and interviews, and demonstrates a promising research direction based on big social media data [18, 36].

TripAdvisor, one of the most popular online review platforms, provides travelers with free online travel advice. The service enables consumers from various market segments to search for restaurants using its website and mobile applications. Previous literature (see Table 3) has focused on the opportunities that online reviews offer the hotel industry for gaining practical knowledge of customers' experiences, satisfaction levels and preferences [23, 37,38,39,40,41]. According to these studies, the three main factors for evaluating the hospitality experience are food quality, service quality and atmosphere [37, 42,43,44,45,46].

Table 3 Online review studies in restaurants

Researchers rely on nutritional characteristics to judge the quality of food. For instance, [44] found that food quality, price, perceived value and customer satisfaction were essential for customers' revisit intentions and positive WOM in organic food restaurants. In [46], food quality, restaurant environment and service, as well as the number of consumer-generated reviews, were shown to have a positive impact on restaurant popularity, and another study [39] reached the same conclusion. Furthermore, [43] examined additional features for improving restaurant performance through online reviews; their results indicate that food, service and environment are the main factors contributing to star ratings, followed by cost and atmosphere. Other factors, such as ingredients, taste, well-being and hygiene, also had a strong impact on user satisfaction [37]. Moreover, customers assess not only food quality but also service quality during their dining experience [1]. The influence of service quality on user satisfaction and behavioral intentions, such as WOM, has been demonstrated by several studies. For example, Nilashi et al. [45] demonstrated that service quality has an important impact on hotel performance and customer satisfaction. According to another study, the physical environment had the greatest impact on consumers' sharing of positive content [42].

2.3 Machine Learning in Market Segmentation

Big social data analytics is considered a new research agenda that uses a variety of data repositories and analytical methods to draw conclusions and predict real situations [35]. In particular, with the advancement of machine learning techniques, textual content in social media provides a vast shared cognitive context and has therefore been analyzed in most application areas [35]. Customer segmentation classifies users into different groups whose members share similar characteristics while differing from other groups. User segmentation is a practical approach that helps businesses recognize different clusters of users [47], and various customer segmentation approaches have been applied in the literature. For instance, for post hoc market segmentation, which is based on analyzing customer features, several methods have been introduced, such as Chi-Squared Automatic Interaction Detection (CHAID), Self-Organizing Map (SOM) and Artificial Neural Network (ANN). In some of the reviewed literature, the K-means clustering method was used for customer segmentation [16, 28]. Other scholars applied hybrid machine learning techniques for clustering; for instance, Ahani et al. [23] developed a method for spa hotel segmentation using SOM, HOSVD and CART. Moreover, SOM-based clustering and predictive methods were applied to segment customers and predict their preferences in vegetarian restaurants [17]. In addition, [29] developed a combined approach for user-generated data analysis, using multi-criteria decision-making and text analysis methods to determine the importance of factors influencing travelers' decision-making in spa hotels.

In the past decade, social media has become a viable and important environment for the tourism and hospitality industry. New social media enable better communication between organizations and consumers, for example through liking, commenting on or sharing user-generated content [48]. Current studies focus mainly on user-generated content, characterized as "Big Data", which can be used to gain deeper insights, offer high value to customers and gain a competitive advantage in hospitality and tourism [11]. Data from these sources enable customer segmentation by cultural traits, needs and demographic features. Most tourism settings use a variety of basic characteristics for customer segmentation, the most prevalent being demographic, socioeconomic and psychological attributes. Segmentation also narrows competition to specific groups, since companies concentrate only on their preferred target clusters rather than on the whole market. Correct customer segmentation also benefits marketing strategies, enabling marketing managers to concentrate on creating novel messages and distributing them effectively through the most suitable channels. In other words, segmentation allows positive word of mouth to be generated and disseminated to members of each specific group [49].

Concerning methodology, a priori and post hoc are the two main approaches in market segmentation. The former selects the segmentation features first and then segments the target group, whereas the latter starts by collecting data and then groups customers, according to the correlated features, into clusters with the greatest similarity [3, 19]. The first and most prevalent basis for customer clustering is demographic traits, such as age, gender and language, which are generally selected by managers to divide users into groups [50]. Moving beyond these basic characteristics, the hospitality industry has started to use behavioral and psychological features for customer segmentation [18, 21]. The large volume of data involved necessitates advanced statistical methods, and in the reviewed literature on customer segmentation, clustering appears to be the most prevalent approach. Today, data-driven methods are growing steadily, mainly through tools such as web crawling, machine learning and other analytical methods for aggregating and interpreting big social data [3, 51]. In big social data analysis, combining several machine learning methods appears to be a practical way of obtaining appropriate segmentation outcomes in both computational and marketing terms.

3 Methodology

In this research, we aimed to develop a new data-driven approach to reveal customers' satisfaction in restaurants. The proposed method is shown in Fig. 1. As seen from this figure, we use clustering and a neural network to segment customers and analyze their satisfaction using data from social networking sites. Specifically, k-means is used for data clustering, and a neural network tuned by particle swarm optimization (PSO) is used for prediction. After the data are clustered, the neural network is applied to each cluster to predict customers' satisfaction. We evaluate the prediction models with two metrics, Mean Squared Error (MSE) [52] and the coefficient of determination (\(R^2\)) [53]. Finally, the proposed method is compared with previous methods, including k-Means-Multiple Linear Regression (k-Means-MLR), Support Vector Regression (SVR) and a neural network, to show its robustness.

Fig. 1 Proposed hybrid method

3.1 Clustering Using K-Means

K-means is an effective clustering technique that has been widely used in knowledge discovery [54,55,56,57]. The algorithm divides the data points into \(k\) clusters \(S_i\ (i = 1, 2, \dots, k)\), each represented by its own cluster center \(C_i\). The set of data points is denoted by \(S = \{\mathbf{X}\}\), and for two vectors \(\mathbf{X}\) and \(\mathbf{Y}\), \(d(\mathbf{X}, \mathbf{Y})\) denotes their Euclidean distance. The main process of k-means clustering is shown in Algorithm 1.

Algorithm 1 Main process of k-means clustering
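Because Algorithm 1 is not reproduced here, the following NumPy sketch shows the standard Lloyd iteration for k-means using the notation above: centers \(C_i\), clusters \(S_i\) and Euclidean distance \(d(\mathbf{X}, \mathbf{Y})\). The random initialization from \(k\) data points, the convergence test and the function names `kmeans` and `assign` are assumptions, not details taken from the paper.

```python
import numpy as np


def assign(S, C):
    """Index of the nearest center C_i (Euclidean distance) for every point in S."""
    dist = np.linalg.norm(S[:, None, :] - C[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(S, k, n_iter=100, seed=0):
    """Lloyd's k-means on an (n, d) array S; returns (centers, labels)."""
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the k centers with k distinct data points (an assumption)
    C = S[rng.choice(len(S), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(S, C)  # build the clusters S_i
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        new_C = np.array([
            S[labels == i].mean(axis=0) if np.any(labels == i) else C[i]
            for i in range(k)
        ])
        if np.allclose(new_C, C):  # centers stopped moving: converged
            break
        C = new_C
    return C, assign(S, C)
```

For the TripAdvisor data, `S` would be an \(n \times 4\) array of Food, Service, Value and Atmosphere ratings and `k = 6`.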

3.2 ANN

As a computational approach, an Artificial Neural Network (ANN) uses a directed network of linked neurons to map inputs to outputs [58, 59]. Each neuron in the network is a basic computation unit that computes \(y = \max\left(0, \sum_i w_i x_i + b\right)\), where \(\{x_i\}\) denotes the neuron inputs, \(\{w_i\}\) the neuron weights, \(b\) the neuron bias and \(y\) the neuron output. The neurons are arranged in a layered architecture, and inputs are mapped to outputs as follows:

$$h_{i} = max \left( {0,W_{i} \cdot h_{i - 1} + b_{i} } \right)\;{\text{for}}\;1 \le i \le L\;{\text{and}}\;h_{0} = x$$
(1)
$$y = max \left( {0,Vh_{L} } \right)$$
(2)

where \(V\), the vectors \(b_1, \dots, b_L\) and the matrices \(W_1, \dots, W_L\) are the model parameters, and \(L\) indicates the number of layers. The parameters are learned from the dataset.
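Eqs. (1) and (2) can be evaluated directly with NumPy. The sketch below is a minimal illustration with hypothetical names (`forward`, `Ws`, `bs`); it follows the \(\max(0, \cdot)\) form written in the equations and does not include any training.

```python
import numpy as np


def forward(x, Ws, bs, V):
    """Evaluate Eqs. (1)-(2): h_0 = x, h_i = max(0, W_i h_{i-1} + b_i), y = max(0, V h_L)."""
    h = np.asarray(x, dtype=float)      # h_0 = x
    for W, b in zip(Ws, bs):            # hidden layers i = 1, ..., L
        h = np.maximum(0.0, W @ h + b)  # Eq. (1)
    return np.maximum(0.0, V @ h)       # Eq. (2)


# Hypothetical 4-8-2-1 network with random parameters
# (inputs: Food, Service, Value, Atmosphere ratings)
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]
bs = [np.zeros(8), np.zeros(2)]
V = rng.normal(size=2)
y = forward([4, 5, 3, 4], Ws, bs, V)
```

With two hidden layers of eight and two neurons, `Ws` holds matrices of shapes (8, 4) and (2, 8), `bs` holds vectors of lengths 8 and 2, and `V` has length 2, so the output is a single non-negative value.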

The collected data were used to train the ANN. The inputs of the ANN were the features of the dataset, and its output was the level of customer satisfaction. The entire dataset was divided into a training set (80% of the data) and a test set (20% of the data), and tenfold cross-validation was used for validation [60]. Before the ANN was applied, PSO tuned its architecture based on the MSE, defined as:

$${\text{MSE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i}^{*} - y_{i} } \right)^{2}$$
(3)

where \(N\) is the number of instances, and \(y_i^*\) and \(y_i\) denote, respectively, the predicted and actual customer satisfaction levels of the \(i\)th instance in the dataset.
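Eq. (3) and the tenfold cross-validation mentioned above can be sketched as follows. The `mse` function implements Eq. (3) directly; the `kfold_indices` helper is a generic, hypothetical way of forming ten shuffled folds and is not taken from the paper.

```python
import numpy as np


def mse(y_pred, y_true):
    """Eq. (3): mean squared difference between predicted and actual satisfaction."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n instances."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)  # k nearly equal folds
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

In a cross-validation loop, a model would be fitted on each `train` index set and scored with `mse` on the corresponding validation fold, and the ten scores averaged.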

An ANN computes a nonlinear function of its inputs, and its outputs are controlled by the weights obtained during the network's learning process. Learning is supervised and uses the Back Propagation (BP) algorithm. BP training requires a bounded, differentiable activation function to work properly; the sigmoid function, the best known of such functions, was employed here. Its output is bounded between 0 (minimum) and 1 (maximum). Before a neuron's summed output is passed on to the next layer of neurons, it is scaled by the sigmoid function. Learning in an error-propagation network depends on the network's ability to adjust its weights in response to errors. Fig. 2 presents a schematic diagram of the learning algorithm used in this paper.
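The backpropagation procedure described above can be illustrated with a small NumPy sketch. It assumes one sigmoid hidden layer, a linear output unit, full-batch gradient descent on the MSE of Eq. (3) and hand-picked hyperparameters (`n_hidden`, `lr`, `epochs`); the function name `train_bp` is hypothetical, and the sketch is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    """Bounded in (0, 1) and differentiable, as BP training requires."""
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, n_hidden=8, lr=0.1, epochs=5000, seed=0):
    """Full-batch backpropagation for a one-hidden-layer sigmoid network with MSE loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass: summed inputs are scaled by the sigmoid, output is linear
        H = sigmoid(X @ W1 + b1)   # (n, n_hidden)
        y_hat = H @ w2 + b2        # (n,)
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error gradient layer by layer
        g_out = 2.0 * err / n                      # dMSE / dy_hat
        g_w2, g_b2 = H.T @ g_out, g_out.sum()
        g_Z = np.outer(g_out, w2) * H * (1.0 - H)  # sigmoid'(z) = s(z)(1 - s(z))
        g_W1, g_b1 = X.T @ g_Z, g_Z.sum(axis=0)
        # Gradient-descent weight update
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (W1, b1, w2, b2), losses


# Hypothetical data: four ratings rescaled to [0, 1], target is their mean
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(50, 4))
y = X.mean(axis=1)
params, losses = train_bp(X, y)
```

The derivative \(\sigma'(z) = \sigma(z)(1 - \sigma(z))\) used in the backward pass is what makes the sigmoid convenient for BP: it is computed from the forward-pass activations without extra work.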

Fig. 2 Constructing prediction models in ANN using PBP

3.3 PSO

Particle Swarm Optimization (PSO) is an effective optimization approach for searching for the global optimum in a multidimensional search space [61,62,63]. The PSO process begins with a randomly generated swarm of particles, each of which represents a distinct ANN architecture. The fitness of each particle's position is evaluated on the training set, mainly with the MSE metric: an ANN architecture with a lower MSE, that is, a more accurate network, is represented by a particle with higher fitness. The next swarm is generated by updating the particle positions, taking into account each particle's best position so far and the swarm's best position so far. The swarm moves incrementally toward the optimal position until the maximum number of iterations is reached. In this paper, particle positions are updated using the following formulas:

$$V_{i}^{t + 1} = wV_{i}^{t} + c_{1} r_{1} \left( p_{\text{best},i}^{t} - X_{i}^{t} \right) + c_{2} r_{2} \left( g_{\text{best}}^{t} - X_{i}^{t} \right)$$
(4)
$$X_{i}^{t + 1} = X_{i}^{t} + V_{i}^{t + 1}$$
(5)

where \(w\) denotes the inertia weight; \(c_1\) and \(c_2\) indicate, respectively, the cognitive and social influences; \(V_i^{t}\) and \(V_i^{t+1}\) are, respectively, the velocities of particle \(i\) at iterations \(t\) and \(t+1\); \(X_i^{t}\) and \(X_i^{t+1}\) are the corresponding positions of particle \(i\); \(r_1\) and \(r_2\) are random values in the range 0 to 1; and \(g_{\text{best}}^{t}\) and \(p_{\text{best},i}^{t}\) are, respectively, the swarm's best position and the best position of particle \(i\). The PSO procedure is presented in Fig. 3.
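Eqs. (4) and (5) can be sketched in NumPy as a generic minimizer. In the paper, the objective would be the MSE of an ANN whose architecture is encoded by the particle position; here the objective `f`, the parameter values (`w`, `c1`, `c2`), the swarm size and the clipping of positions to a search box are assumptions chosen for illustration.

```python
import numpy as np


def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize f using the velocity and position updates of Eqs. (4)-(5)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_particles, dim))  # positions X_i
    V = np.zeros_like(X)                               # velocities V_i
    p_best = X.copy()                                  # each particle's best position
    p_val = np.array([f(x) for x in X])
    g_best = p_best[p_val.argmin()].copy()             # swarm's best position
    history = []
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (4): inertia + cognitive pull + social pull
        V = w * V + c1 * r1 * (p_best - X) + c2 * r2 * (g_best - X)
        # Eq. (5), with positions kept inside the search box
        X = np.clip(X + V, lo, hi)
        vals = np.array([f(x) for x in X])
        improved = vals < p_val
        p_best[improved] = X[improved]
        p_val[improved] = vals[improved]
        g_best = p_best[p_val.argmin()].copy()
        history.append(float(p_val.min()))  # swarm minimum objective per iteration
    return g_best, history
```

Because fitness is higher when the MSE is lower, maximizing fitness is equivalent to minimizing `f`; the recorded `history` is analogous to the swarm minimum MSE curves plotted against iteration in Fig. 7.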

4 Data Collection and Method Evaluation

To evaluate the method, we collected data from TripAdvisor, selecting restaurants in Saudi Arabia that are registered on TripAdvisor (see Fig. 4). The numerical ratings were collected and analyzed with the proposed method.

Fig. 3 PSO procedure

Fig. 4 Restaurants registered on TripAdvisor

On TripAdvisor, customers rate restaurants on a scale of 1 to 5. Four main criteria (Food, Service, Value and Atmosphere) are used to assess the quality of a restaurant's services, and customers' satisfaction is expressed through an overall rating, also on a scale of 1 to 5. We collected 7562 ratings from TripAdvisor. Records with null values in the four main criteria were removed, leaving 3822 records in the dataset for further analysis.
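The cleaning step described above can be sketched in plain Python. The field names are hypothetical, and the sketch assumes that a record is dropped when any one of the four criteria is missing.

```python
# Hypothetical field names for the four TripAdvisor criteria
CRITERIA = ("food", "service", "value", "atmosphere")


def drop_incomplete(records):
    """Keep only records that have a rating for every one of the four criteria."""
    return [r for r in records if all(r.get(c) is not None for c in CRITERIA)]


# Hypothetical raw records
raw = [
    {"food": 5, "service": 4, "value": 4, "atmosphere": 5, "overall": 5},
    {"food": 3, "service": None, "value": 3, "atmosphere": 2, "overall": 3},
    {"food": 4, "service": 4, "value": 5, "atmosphere": 4, "overall": 4},
]
clean = drop_incomplete(raw)
```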

In the first step, k-means clustering was applied to the data to obtain clusters from the customers' ratings of the restaurants. In total, 6 clusters were generated. Each cluster included the users' ratings on Food, Service, Value and Atmosphere, along with their overall ratings. Fig. 5 presents the importance of the factors in each cluster, Table 4 presents the restaurant criteria in each cluster based on customers' ratings, and Table 5 presents customer satisfaction in each cluster. The results of k-means clustering are shown in Fig. 6.
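Given the cluster label assigned to each record by k-means, per-cluster summaries like those in Fig. 5 and Tables 4 and 5 can be computed as below. This is a hypothetical sketch: it assumes the summaries are cluster means (the centroid of the four criteria and the mean overall rating), which may differ from how the reported tables were produced.

```python
import numpy as np


def cluster_profiles(ratings, overall, labels):
    """Per-cluster size, centroid of the criteria ratings and mean overall rating."""
    ratings = np.asarray(ratings, dtype=float)
    overall = np.asarray(overall, dtype=float)
    labels = np.asarray(labels)
    profiles = {}
    for c in np.unique(labels):  # only clusters that actually occur
        mask = labels == c
        profiles[int(c)] = {
            "size": int(mask.sum()),
            "centroid": ratings[mask].mean(axis=0),  # Food, Service, Value, Atmosphere
            "mean_overall": float(overall[mask].mean()),
        }
    return profiles
```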

Fig. 5 Cluster centroids

Table 4 Restaurant criteria in each cluster

Table 5 Customer satisfaction in each cluster

Fig. 6 Clustering of data by k-means

The clusters generated by k-means were used in PSO-ANN to predict customers' satisfaction from the input ratings. The data in each cluster were divided into training and test sets: the training sets (60% of the data) were used to construct the prediction models, and the test sets (40% of the data) were used to evaluate their performance. To find the best network for this prediction, we experimented with different numbers of hidden layers in the ANN and recorded the corresponding MSE values. For each number of hidden layers, we report the swarm's minimum MSE, which corresponds to the swarm's best position, versus iteration. The results are presented in Fig. 7. This figure shows a large decrease in the swarm's minimum MSE after the first iteration, which indicates that PSO was efficient in tuning the ANN architecture for customers' satisfaction prediction. In addition, as shown in Fig. 7, the swarm's minimum MSE was gradually reduced over the first 20 iterations. Overall, the lowest minimum error (MSE = 0.09190) was obtained by the ANN model in the first cluster with two hidden layers, where eight neurons in the first layer and two neurons in the second layer were the best option. This experiment was performed for all 6 clusters to find an optimum ANN architecture for each, as shown in Fig. 7.
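Two steps of this experiment can be sketched with hypothetical helpers. `split_by_cluster` performs the 60/40 random split within each cluster, and `decode_architecture` shows one possible way, assumed here and not described in the paper, of turning a continuous particle position into integer neuron counts for the hidden layers.

```python
import numpy as np


def split_by_cluster(labels, train_frac=0.6, seed=0):
    """Return {cluster: (train_idx, test_idx)} with a random split inside each cluster."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    splits = {}
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_frac * len(idx)))
        splits[int(c)] = (idx[:n_train], idx[n_train:])
    return splits


def decode_architecture(position, max_neurons=16):
    """Hypothetical encoding: round each coordinate to a hidden-layer size in [1, max_neurons]."""
    sizes = np.clip(np.rint(position), 1, max_neurons).astype(int)
    return tuple(int(s) for s in sizes)
```

Under this encoding, a two-dimensional particle at position (7.6, 2.2) would decode to a network with eight and two neurons in its two hidden layers.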

Fig. 7 MSE versus iteration in ANN for 6 clusters

Fig. 8 presents the coefficient of determination for the optimum ANN models in the six clusters. The results show that PSO-ANN accurately predicted customers' satisfaction in all six clusters, achieving a high coefficient of determination in each segment. This indicates that PSO-ANN successfully captured the relationship between customers' satisfaction and the quality dimensions. Fig. 9 presents actual versus predicted satisfaction in the six clusters.
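One common definition of the coefficient of determination is \(R^2 = 1 - \sum_i (y_i - y_i^*)^2 / \sum_i (y_i - \bar{y})^2\), where \(\bar{y}\) is the mean actual satisfaction and, as in Eq. (3), \(y_i\) and \(y_i^*\) are the actual and predicted values. The following sketch assumes this definition; `r_squared` is a hypothetical name.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```

A perfect predictor gives \(R^2 = 1\), and always predicting the mean satisfaction gives \(R^2 = 0\).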

Fig. 8 Coefficient of determination in six clusters

Fig. 9 Actual satisfaction versus predicted satisfaction in six clusters

We also compared k-means-PSO-ANN with several other methods. The results of the comparisons are provided in Table 6, which reports the average MSE and coefficient of determination of the six models. The results demonstrate that k-means-PSO-ANN (MSE = 0.09847; \(R^2\) = 0.98764) outperformed the other methods: k-Means-ANN (MSE = 0.16134; \(R^2\) = 0.86535), ANFIS (MSE = 0.18829; \(R^2\) = 0.87194), ANN Ensembles (MSE = 0.15632; \(R^2\) = 0.94352), SVR (MSE = 0.18936; \(R^2\) = 0.86535), k-Means-MLR (MSE = 0.18534; \(R^2\) = 0.88342) and ANN (MSE = 0.17563; \(R^2\) = 0.92678). This indicates that PSO improved the performance of the ANN for analyzing customers' satisfaction from TripAdvisor data. In addition, the results show that clustering improved the prediction accuracy of the predictors.
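The relative MSE improvement implied by the averages reported above can be computed with a few lines of Python. The dictionary simply restates the reported values, and `relative_mse_reduction` is a hypothetical helper; the percentages are derived only from these reported averages, not from new experiments.

```python
# Average values reported above: method -> (MSE, R^2)
RESULTS = {
    "k-means-PSO-ANN": (0.09847, 0.98764),
    "k-Means-ANN": (0.16134, 0.86535),
    "ANFIS": (0.18829, 0.87194),
    "ANN Ensembles": (0.15632, 0.94352),
    "SVR": (0.18936, 0.86535),
    "k-Means-MLR": (0.18534, 0.88342),
    "ANN": (0.17563, 0.92678),
}


def relative_mse_reduction(results, reference):
    """Percentage by which `reference` lowers the MSE relative to each other method."""
    ref_mse = results[reference][0]
    return {
        name: 100.0 * (mse - ref_mse) / mse
        for name, (mse, _) in results.items()
        if name != reference
    }


# Methods ordered from lowest to highest average MSE
ranking = sorted(RESULTS, key=lambda m: RESULTS[m][0])
reductions = relative_mse_reduction(RESULTS, "k-means-PSO-ANN")
```

For example, k-means-PSO-ANN lowers the average MSE by roughly 39% relative to k-Means-ANN and by roughly 37% relative to ANN Ensembles, the strongest of the other methods by MSE.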

Table 6 Method comparisons

5 Discussion

Competitiveness has become increasingly important across industries, especially for countries that rely mainly on tourism and hospitality. A hotel or restaurant can be considered competitive if it can attract and satisfy relevant consumers. The competitiveness of a destination not only has a direct effect on tourism revenue in terms of the number of visitors and their expenditure, but also indirectly influences tourism-related industries such as hotels and restaurants [64]. User-generated data are considered the main source of data for tourism and hospitality research and practice, as they reveal customers' delight or dissatisfaction with their experiences [9]. In other words, user-generated data are simple, meaningful reactions created by customers that are openly available online [65]. User-generated data can benefit hospitality and tourism businesses by helping them improve their competitiveness [66]. Online reviews have been considered the primary channel for sharing feedback about different services. They support potential customers in the decision-making process and indirectly encourage hotel managers to upgrade the quality of their products or services [67]. The uniqueness of hospitality and tourism products and services, coupled with the growing importance of user-generated data, has also prompted studies of hospitality customers' perspectives that help managers formulate strategies [39]. Customer heterogeneity requires market segmentation to achieve suitable targeting, positioning, marketing and revenue management [68].

When customers are heterogeneous, allocating them to mutually exclusive and homogeneous clusters has proved to be a practical and widely used approach among marketing managers [3]. In the restaurant context, customers often find it difficult to select a restaurant based on factors such as taste, physical setting, price and service quality [46]. Online users' efforts to provide prospective customers with reliable information about unfamiliar restaurants can reduce the difficulty and uncertainty of choosing one. Past research has shown the benefit of user-generated content for various services and products, including hospitality services [44]. The literature has demonstrated that service quality has a greater effect on user satisfaction and loyalty than product or service features. Service quality also leads to lower costs, higher profitability, better industry performance and, ultimately, positive word-of-mouth. Managers and policymakers have recently begun to develop strategies to measure service quality and turn it to their advantage [69]. Furthermore, user satisfaction, defined as the overall evaluation of the various features of products or services, is one of the critical factors for business performance, competitiveness and profitability [70].

Past research has shown that in the hospitality industry, the true measure of a company's success lies in its ability to consistently satisfy customers' preferences. In other words, service quality has been established as an antecedent of customer satisfaction [71]. Accordingly, many researchers have sought to determine customers' perceptions of the products and services provided by hotels and restaurants and the impact of these perceptions on customer satisfaction [30, 44, 70,71,72]. At the same time, social media has dramatically changed the way industries communicate with customers. A key characteristic of today's online business is that customers use social media to develop and maintain communication with a vast group of individuals [15]. Studies show that eWOM communication on social media is one of the most influential channels for shaping consumer attitudes toward a product or service, and that this content has a critical impact on customers' purchase decisions [4, 19, 35, 39, 48, 49, 73].

This study contributes to the analysis of social data provided by restaurant customers using a hybrid method. Customer segmentation for Saudi Arabian restaurants was performed using clustering, neural network and optimization techniques. We collected 3822 ratings from TripAdvisor to find the relationship between restaurants' quality dimensions and customers' satisfaction. In the first step, k-means clustering was applied to the data to obtain clusters from the customers' restaurant ratings. In total, six clusters were generated from the TripAdvisor data. Each cluster included the users' ratings on Food, Service, Value and Atmosphere along with their overall ratings. The importance of the factors affecting customers' satisfaction was revealed for each cluster. In Cluster 1, Food, Atmosphere and Value were the most important factors. In Cluster 2, Service had a strong impact on customers' satisfaction. In Cluster 3, Atmosphere was important to the customers, while Food and Service were their main concerns. In Cluster 4, Food strongly influenced customers' satisfaction. Service, Value and Atmosphere were the most important factors in Cluster 5. Food and Value had a strong impact on customers' satisfaction in Cluster 6. These clusters provide valuable information about customers' preferences in Saudi Arabian restaurants. In addition, the method was evaluated on real data collected for this study. Analyzing social data with sophisticated learning approaches can help business managers identify customer segments more effectively, which in turn can strengthen their competitive advantage in the market.
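The clustering step described above can be illustrated with a short sketch. It assumes each review is a five-value vector (Food, Service, Value, Atmosphere, Overall) on TripAdvisor's 1-5 scale and uses plain Lloyd's k-means with farthest-point (maximin) seeding. The seeding scheme, the function names and the sample rows are assumptions for illustration, not the study's implementation; the study used six clusters on 3822 ratings, whereas the toy call below uses k = 2 on six hypothetical rows. The per-cluster factor importance reported above is not reproduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with farthest-point seeding; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Seeding: one random row, then repeatedly the row farthest from all chosen centroids.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: every review joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical reviews: (Food, Service, Value, Atmosphere, Overall), 1-5 stars.
reviews = [
    [5, 5, 4, 5, 5], [5, 4, 5, 5, 5], [4, 5, 5, 5, 5],
    [1, 2, 1, 1, 1], [2, 1, 1, 1, 1], [1, 1, 2, 1, 1],
]
centroids, labels = kmeans(reviews, k=2)
for j, c in enumerate(centroids):
    print(f"cluster {j}: centroid {[round(v, 2) for v in c]}")
```

Farthest-point seeding is used here only to keep the toy example deterministic and to avoid the poor local optimum that random seeding can reach when both initial centroids fall in the same group. On real data, k-means++ seeding (the default in scikit-learn's `KMeans`) or several random restarts are the more common choices.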

6 Conclusions

This study developed a new method for analyzing customers' satisfaction in Saudi Arabian restaurants using social data. We combined clustering and an ANN in a hybrid method and used PSO to improve the performance of the ANN. We collected data from TripAdvisor to investigate customers' satisfaction with the restaurants. The effectiveness of the method was measured using MSE and the coefficient of determination, and the results were compared with those of other methods. The results demonstrated that k-means-PSO-ANN (MSE = 0.09847; R² = 0.98764) outperformed the other methods, including k-Means-ANN (MSE = 0.16134; R² = 0.86535), SVR (MSE = 0.18936; R² = 0.86535), k-Means-MLR (MSE = 0.18534; R² = 0.88342) and ANN (MSE = 0.17563; R² = 0.92678), in analyzing customers' satisfaction. In future research, we recommend using different data sources for method evaluation.
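The PSO-ANN idea, using particle swarm optimization to search the weights of a neural network, can be sketched as follows. This is a minimal global-best PSO (inertia weight w, acceleration coefficients c1 and c2, with the swarm's best updated as soon as a particle improves on it) that trains a one-hidden-layer network with tanh hidden units and a linear output, mapping the four sub-ratings to the overall rating with MSE as the fitness. The architecture, the parameter values (w = 0.7, c1 = c2 = 1.5, 20 particles, 200 iterations), the function names and the data are hypothetical; the section does not specify the study's PSO variant or network configuration.

```python
import math
import random


def n_weights(n_in, n_hidden):
    """Size of the flat weight vector: hidden weights+biases, output weights+bias."""
    return n_hidden * (n_in + 1) + n_hidden + 1


def forward(weights, x, n_hidden):
    """One-hidden-layer MLP: tanh hidden units, linear output, flat weight vector."""
    n_in = len(x)
    hidden, idx = [], 0
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # hidden-unit bias
        s += sum(weights[idx + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(s))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # output bias
    return out + sum(weights[idx + h] * hidden[h] for h in range(n_hidden))


def mse_loss(weights, X, y, n_hidden):
    """Fitness of a particle: mean squared error of the network on (X, y)."""
    return sum((forward(weights, x, n_hidden) - t) ** 2 for x, t in zip(X, y)) / len(y)


def pso_train(X, y, n_hidden=3, n_particles=20, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over the network weights; returns (best_weights, best_mse)."""
    rng = random.Random(seed)
    dim = n_weights(len(X[0]), n_hidden)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_f = [mse_loss(p, X, y, n_hidden) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]  # swarm-wide best position
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = mse_loss(pos[i], X, y, n_hidden)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


# Hypothetical (Food, Service, Value, Atmosphere) sub-ratings and overall ratings.
ratings = [[5, 5, 4, 5], [4, 4, 4, 3], [2, 3, 2, 3], [1, 2, 1, 2],
           [5, 4, 5, 4], [3, 3, 3, 3], [4, 5, 4, 4], [2, 1, 2, 2]]
overall = [5, 4, 2, 1, 5, 3, 4, 2]
X = [[r / 5 for r in row] for row in ratings]  # scale 1-5 stars into (0, 1]
y = [o / 5 for o in overall]
weights, loss = pso_train(X, y)
print(f"training MSE on scaled ratings: {loss:.4f}")
```

Because a personal or global best is replaced only by a lower-loss position, the returned MSE can never exceed that of the best randomly initialized network. Scaling the ratings to (0, 1] keeps the tanh units out of saturation; a real evaluation would also measure error on held-out reviews rather than on the training set.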

A large-scale study is a complementary approach that can yield outcomes that significantly advance existing knowledge of customer preference learning and segmentation, and it can be used in conjunction with other approaches. Compared with a small study with a limited number of participants, a study with a very large sample is more likely to be successful in this context, because large samples produce more precise effect estimates and improve the ability to detect segments. In addition, as SVR also provided accurate results, its effectiveness could be investigated in combination with optimization techniques for analyzing customers' satisfaction. Furthermore, this research did not use textual reviews to assess customers' satisfaction; text mining approaches could therefore be incorporated into the proposed method for customers' satisfaction analysis and segmentation. Moreover, the proposed method could be extended with newer machine learning algorithms [58, 74,75,76,77,78], and the resulting variants compared with it in terms of prediction accuracy. Robust imputation methods for missing data could also improve the accuracy of the prediction techniques [79, 80]. Finally, we used an NN to evaluate customers' satisfaction from social data; explainable NNs (XNNs), however, could be used to explain the features extracted by the different layers of the network [81,82,83].