Product selection based on sentiment analysis of online reviews: an intuitionistic fuzzy TODIM method

Online reviews contain a great deal of information about consumers' purchasing preferences and strongly affect potential consumers' purchasing decisions. Using online review data to help customers make purchasing decisions has therefore become a concern with both theoretical and practical value. To this end, a product selection model is presented that combines sentiment analysis with an intuitionistic fuzzy TODIM method. Firstly, the product features are extracted from online reviews by the Apriori algorithm. The sentiment orientation and intensity of the sentiment words for the product features are identified by a lexicon-based sentiment analysis approach. Next, the sentiment orientation of the product features is represented by an intuitionistic fuzzy value. Then the intuitionistic fuzzy TODIM method is used to determine the ranking of the alternative products. Finally, a case study of mobile phone selection is given to illustrate the proposed approach. The results show that the proposed method considers both the sentiment orientation and intensity of online reviews and the consumers' gains and losses in the purchasing process, and is more reasonable than the previous research.


Introduction
The rapid development of the Internet has brought great convenience to people's lives, and online consumer groups are increasing. Online reviews contain a great deal of information about consumers' purchasing preferences and strongly affect potential consumers' purchasing decisions. They have become an essential information source for consumers making purchasing decisions and significantly impact consumers' decision-making behavior [1][2][3]. However, due to the complexity of online reviews, consumers cannot effectively use online review data. Therefore, fully and effectively using online review data as the basis of purchasing decisions has become a concern of many scholars, merchants, and consumers.
Lixin Zhou, zhoulixin1861@hotmail.com, School of Automation, Nanjing University of Science and Technology, Nanjing 210014, China
In the existing research [18], methods for ranking products through online reviews consist of two parts: sentiment analysis and multi-attribute decision-making. The first part identifies the sentiment orientation of online reviews extracted from online platforms. The second part selects the best alternative product with respect to the selected criteria based on the sentiment analysis. However, some product ranking methods based on online reviews only consider the positive and negative sentiment tendencies of online reviews [4][5][6][7][8][9]. The sentiment tendency of each sentence is classified as either positive or negative, and sentences with a neutral sentiment orientation are ignored, resulting in a loss of information in the product purchase decision process.
The sentiment orientations of online reviews are classified into positive, negative, and neutral to avoid the loss of online review information. Intuitionistic fuzzy set (IFS) includes membership, non-membership, and hesitation simultaneously, providing a useful tool to represent the positive, negative, and neutral sentiments in online review data. The IFS is widely used to describe sentiment orientation and sentiment intensity [18]. However, the existing research ignores the customers' psychological behavior and gain and loss during purchasing. TODIM method is suitable for describing the psychological behavior of the customers in the product ranking process [32][33][34][35]. The main idea of the TODIM method is to compare the product feature value of each alternative product and obtain the gain and loss value, then calculate the dominance degree between every two alternative products and the overall prospect values of each product [36][37][38][39]. According to the overall prospect values, the alternative products are ranked.
Therefore, an online review-based product selection model combined with an intuitionistic fuzzy TODIM (IF-TODIM) method is developed. Firstly, the Apriori algorithm is used to extract the product features that customers focus on based on online reviews. Then the sentiment orientation and intensity of the sentiment words for the product features are identified by the lexicon-based sentiment analysis approach. The proportions of the sentiment orientations of the product features are represented by an intuitionistic fuzzy value (IFV). Finally, the IF-TODIM method is used to determine the final ranking results of the alternative products.
The rest of our work is organized as follows. "Related works" introduces some related works on product ranking and selection. Considering the advantages of IFVs and intuitionistic fuzzy sets (IFSs) in representing the sentiment orientations of product features, "Preliminaries" provides some concepts of IFVs and IFSs. "The IF-TODIM method for product ranking based on online review" develops a new IF-TODIM method for product selection based on online reviews. A case study is given to illustrate the effectiveness of the developed IF-TODIM method in "Case study". "Conclusion" draws some conclusions.

Related works
Recently, some scholars have concentrated on ranking products through online reviews [4][5][6][7][8][9][18]. Zhang et al. [4] identified multiple important product features, extracted sentences about each feature from online reviews, and divided the online reviews into subjective and comparative reviews using a dynamic programming algorithm. The sentiment orientation of the online reviews was then determined to construct a weighted product graph, and the products were ranked using an improved PageRank algorithm. Later, Zhang et al. [5] improved the algorithm by considering the importance of different reviews. The weight of each review was determined by the review's usefulness and time. Kang et al. [6] proposed a customer satisfaction analysis framework based on customer review mining for product improvement decision making. Najmi et al. [7] calculated each product's score from both reviews and brand. The review score was derived from sentiment analysis and usefulness analysis, the brand score was calculated by an improved PageRank algorithm, and the products were ranked based on their combined scores. Li et al. [8] used the value function of prospect theory to determine the perceived value of alternative products based on consumers' expectations of product attributes and the sentiment orientation of product attributes in online reviews. Fan et al. [9] used the stochastic PROMETHEE-II method to determine product ranking based on online ratings.
Fuzzy set theory has been applied in product ranking and recommendation to represent the uncertainty in online review data [10][11][12]. Different forms of fuzzy sets have been used to represent product feature values, such as the fuzzy set, hesitant fuzzy set (HFS), Pythagorean fuzzy set (PFS), interval type-2 fuzzy set (IT2 FS), and IFS. Peng et al. [13] calculated the similarity measures of words to cluster the synonyms of each product feature, then determined the important product features based on the total frequency of each product feature in the reviews. The subjective evaluations of experts were used to obtain a fuzzy decision matrix of important product features, and finally, the products were ranked by the fuzzy PROMETHEE method. Zhang et al. [14] regarded different sentiment scores of product features as different membership values and integrated them by HFS. A product ranking method based on 2-additive fuzzy measures and the Choquet integral was developed. Considering that IT2 FS is more accurate than the traditional fuzzy set in representing uncertainty, Bi et al. [15] represented the uncertainty of the product features' sentiment orientations using IT2 FS. Fu et al. [16] used deep learning models and K-means clustering algorithms to identify sentiment tendencies and considered the credibility of the number of online reviews for different products. Interval-valued PFSs were used to represent product attribute values, and finally, the Heronian mean operator was used to integrate product attribute information to derive the product ranking. To retain both the online review sentiment propensity and its probability, Liu and Teng [17] used probabilistic linguistic term sets (PLTSs). The PL-TODIM method was proposed for ranking alternative products based on new entropy measures and possibility degrees. Probability multivalued neutrosophic linguistic numbers (PMVNLNs) were developed by Ji et al. [18] to characterize online reviews and reflect the differences in positive (negative) information. Regret theory was combined with outranking methods to construct a review-based decision support model. Liang et al. [19] considered the randomness and ambiguity of online reviews and the interrelationship between product features in the decision support model and developed a linguistic intuitionistic normal cloud (LINC) model. Liang et al. [20] represented tourists' sentiment preferences by distributed linguistic (DL) representations according to the online reviews, developed a method for determining the ideal and minimum value solutions, and proposed a DL-VIKOR method to rank the alternative hotels for tourists.
IFS has been widely used to describe sentiment orientation and sentiment intensity [21]. In the transforming process, the proportions of the positive, negative, and neutral sentiment orientations are transformed into the membership, non-membership, and hesitance values in IFVs, respectively. Therefore, the IFS has strong flexibility and practicality in the product ranking problem. Liu et al. [22] constructed a purchase decision model based on the IF-TOPSIS method, which ranks products by their similarity to the ideal solution. Liu et al. [23] ranked the alternative products by combining the intuitionistic fuzzy weighted average (IFWA) operator with the PROMETHEE II method. Çalı and Balaman [24] represented the online ratings of hotel customers by IFSs, and IF-ELECTRE integrated with VIKOR was used to rank alternative hotels. Zhang et al. [25] calculated the feature weights considering the customers' attention and developed a product ranking model combining 2-additive fuzzy measures, non-linear programming, and the Choquet integral.
Therefore, the main contributions of the developed IF-TODIM method for ranking products are as follows. Firstly, a new product selection method based on online reviews is proposed that considers both the consumers' online reviews and their psychological behavior. Secondly, the product features are extracted by the Apriori algorithm, which is different from the previous research. Thirdly, in the IF-TODIM method, new ranking methods of intuitionistic fuzzy values (IFVs) are developed to compare the gain and loss of each product feature. The objective weight values of product features are calculated by considering entropy measures. Fourthly, product ranking with the IF-TODIM method is shown to have advantages over the intuitionistic fuzzy TOPSIS (IF-TOPSIS) method.

Preliminaries
The IFVs have the advantage of representing the feature values of products. In the transforming process, the proportions of the positive, negative, and neutral sentiment orientations are transformed into the membership, non-membership, and hesitance values in IFVs, respectively. Therefore, some basic concepts of IFVs and IFSs are introduced.
Definition 1 [26]. Let $A = \{\langle x_i, \mu_A(x_i), \nu_A(x_i)\rangle \mid x_i \in X\}$ and $B = \{\langle x_i, \mu_B(x_i), \nu_B(x_i)\rangle \mid x_i \in X\}$ be two IFSs representing the feature values of products, where $\mu_A(x_i)$, $\nu_A(x_i)$, and $\pi_A(x_i)$ are the membership value, non-membership value, and hesitance value of the IFV, and $\mu_A(x_i) + \nu_A(x_i) + \pi_A(x_i) = 1$. The Hamming, Euclidean, and generalized distances between the two product features $A$ and $B$ are defined as follows.
Definition 2. Let $A = \{\langle x_i, \mu_A(x_i), \nu_A(x_i)\rangle \mid x_i \in X\}$ be an IFS, and the entropy measures can be defined as follows.
The score measures of IFVs play an important role in comparing the magnitude of alternative product feature values. Some new score measures considering the signed distance of IFVs are introduced as follows.
Definition 3. Let $a = \langle \mu_a, \nu_a \rangle$ and $b = \langle \mu_b, \nu_b \rangle$ be two IFVs representing the feature values of products, and let $\tilde{0} = \langle 0, 1 \rangle$ and $\tilde{1} = \langle 1, 0 \rangle$ be the worst and best evaluation values of the product features. Then, the new score measures $R_h$, $R_e$, and $R_g$ of IFVs are defined as follows.
$R_h(a) = d_h(a, \tilde{0})$, $R_e(a) = d_e(a, \tilde{0})$, $R_g(a) = d_g(a, \tilde{0})$
The corresponding ranking method of IFVs is defined as follows.
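As a minimal sketch of how the score measures can be used to compare two IFVs, the following Python code implements the Hamming and Euclidean distances between two IFVs and the score obtained as the distance to the worst value <0, 1>. The normalization factor 1/2 is an assumption chosen to agree with the worked example in the case study, where the Hamming score of a_11 is written as 1/2 (1 + μ + π − ν). The class name `IFV`, the function names, and the tie label "=" are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class IFV:
    """Intuitionistic fuzzy value <mu, nu>; the hesitance is pi = 1 - mu - nu."""
    mu: float  # membership (share of positive sentiment)
    nu: float  # non-membership (share of negative sentiment)

    @property
    def pi(self) -> float:
        return 1.0 - self.mu - self.nu


WORST = IFV(0.0, 1.0)  # worst evaluation value <0, 1>
BEST = IFV(1.0, 0.0)   # best evaluation value <1, 0>


def d_hamming(a: IFV, b: IFV) -> float:
    # Assumed normalization: 1/2 of the summed absolute differences.
    return 0.5 * (abs(a.mu - b.mu) + abs(a.nu - b.nu) + abs(a.pi - b.pi))


def d_euclidean(a: IFV, b: IFV) -> float:
    # Assumed Euclidean analogue with the same 1/2 normalization.
    return math.sqrt(0.5 * ((a.mu - b.mu) ** 2 + (a.nu - b.nu) ** 2 + (a.pi - b.pi) ** 2))


def score_hamming(a: IFV) -> float:
    """Score of a as its Hamming distance to the worst value."""
    return d_hamming(a, WORST)


def compare(a: IFV, b: IFV, score=score_hamming) -> str:
    """Return "A" if a scores higher than b, "D" if lower, "=" on a tie (assumed label)."""
    ra, rb = score(a), score(b)
    if ra > rb:
        return "A"
    if ra < rb:
        return "D"
    return "="
```

For instance, `compare(IFV(0.6, 0.3), IFV(0.7, 0.2))` returns "D" because the first value lies closer to the worst evaluation value than the second.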

Problem description
The following symbols are used to represent collections and variables in the product selection problem.
$A = \{A_1, A_2, \ldots, A_n\}$: a collection of $n$ alternative products, where $A_i$ represents the $i$-th product, $i = 1, 2, \ldots, n$, and the consumers select from the alternative product set $A$.
$F = \{f_1, f_2, \ldots, f_m\}$: a collection of $m$ features, namely the product features from the online reviews that the consumer focuses on, where $f_j$ represents the $j$-th feature, $j = 1, 2, \ldots, m$.
$W = \{\omega_1, \omega_2, \ldots, \omega_m\}$: the weight vector of the features, where $\omega_j$ represents the weight of the feature $f_j$, $\omega_j > 0$ and $\sum_{j=1}^{m} \omega_j = 1$.
$Q = \{q_1, q_2, \ldots, q_n\}$: the collection of the numbers of online reviews for the alternative products, where $q_i$ is the number of online reviews about the alternative product $A_i$.
The problem is how to select among the alternative products $A_1, A_2, \ldots, A_n$ based on the online reviews $D_{ik}$ and the feature weights $\omega_j$.
The flowchart of the product selection is shown in Fig. 1. The input information is the crawled online reviews of alternative products. The process includes two parts: sentiment orientation identification and product ranking based on the IF-TODIM method. In the first part, the Apriori algorithm is first used to identify the product features that customers focus on based on online reviews. The sentiment orientation and intensity of the sentiment words for the product features are identified by the lexicon-based sentiment analysis approach. The second part is to convert the sentiment orientation of the product features into an IFV and then use the IF-TODIM method to determine the final ranking results of the alternative products.

Sentiment orientation identification of the online reviews
(1) Product feature extraction
A product feature extraction method based on online review data mining is introduced to extract the features of the alternative products that the consumers focus on from the online reviews. The process is described as follows.
First, the online review data is segmented, and the segmented online review data is tagged. For the sake of accuracy and rationality, the ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System, http://ictclas.nlpir.org/) tool is used for word segmentation of online review data. The lexical marking is for nouns, verbs, adjectives, or verbs with noun functions and proper nouns to improve the accuracy of the search.
Secondly, the association rule transaction file is created using the part-of-speech tagging, and the frequent itemsets are searched with the Apriori association rule algorithm. Here, the minimum support value is 1%, and frequent itemsets with more than three items are not considered. The frequent itemsets are pruned and corrected according to the neighboring rules and independent support to form a product feature set $F_{TF}$. Then, the set $F_{FF}$ of common Chinese frequent nouns that are not product features (such as some common product brands, colloquialized nouns, and personal names) and the product feature set $F_{SF}$ (containing single nouns) are constructed, and $F_{TF}$ is filtered to form the final product feature set $F$.
(2) Construct the positive and negative sentiment dictionaries of product features
Normally, different features have different positive or negative sentiment dictionaries. A word can exhibit different sentiment orientations in the sentiment dictionaries of different features. For example, "high" is a negative sentiment word in the dictionary of the feature "price" and a positive sentiment word in the dictionary of the feature "pixel". Therefore, the positive and negative sentiment dictionaries for each product feature should be constructed separately.
Firstly, according to the online review set after the part-of-speech tagging, the association rule object file for the feature $f_j$ in the reviews is created. The frequent itemset $F$ is searched based on the Apriori association rule algorithm to form the feature annotation set.
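The frequent-itemset search can be illustrated with a small, self-contained Apriori implementation. This is only a sketch under stated assumptions: each review is taken to be a set of candidate words that have already been segmented and tagged, the function name `apriori` and its parameters are hypothetical, and the neighboring-rule pruning and noun-set filtering described in the text are not included. The defaults mirror the settings in the text (minimum support of 1% and itemsets of at most three items).

```python
from itertools import combinations


def apriori(transactions, min_support=0.01, max_len=3):
    """Return {frozenset(itemset): support} for all frequent itemsets up to max_len items."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current and k <= max_len:
        prev = list(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: keep candidates that reach the minimum support.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent
```

With four toy reviews `[{"screen", "price"}, {"screen", "battery"}, {"screen", "price", "battery"}, {"price"}]` and `min_support=0.5`, the itemset {"price", "battery"} is discarded because it occurs in only one of the four reviews.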
The positive sentiment dictionary $W_j^+$ and the negative sentiment dictionary $W_j^-$ of the feature $f_j$ are then constructed by Eqs. (11)-(14).
(3) Identify the sentiment orientations of product features
For each review, the positive, neutral, or negative sentiment orientation toward each feature is determined. The principle of identifying the sentence's sentiment orientation is as follows [18]. If the number of positive sentiment words in the sentence is greater than that of negative sentiment words, the sentence's sentiment orientation is considered positive. If the number of negative sentiment words in the sentence is greater than that of positive sentiment words, the sentence's sentiment orientation is considered negative. If there are equal numbers of positive and negative sentiment words or no sentiment words in the sentence, the sentence is considered neutral in its sentiment orientation. If there is a negation word in the sentence, the sentiment orientation of the sentence is reversed. The rules are shown as follows.
For each sentiment word set, the sentiment orientation is expressed as a vector, e.g., (0, 1, 0). The sentiment orientations of the product features are calculated by the above rules.
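The counting rules above can be sketched in a few lines of Python. The function name `sentence_orientation`, the string labels, and the example dictionaries are hypothetical. The sketch assumes that the sentence is already segmented into words and that a single negation word reverses the orientation of the whole sentence, as stated in the rules.

```python
def sentence_orientation(words, positive, negative, negations=frozenset()):
    """Classify one segmented sentence about a feature as positive, neutral, or negative.

    words     : list of tokens in the sentence
    positive  : positive sentiment dictionary of the feature
    negative  : negative sentiment dictionary of the feature
    negations : negation words that reverse the orientation
    """
    n_pos = sum(1 for w in words if w in positive)
    n_neg = sum(1 for w in words if w in negative)

    if n_pos > n_neg:
        orientation = 1
    elif n_neg > n_pos:
        orientation = -1
    else:
        orientation = 0  # equal counts or no sentiment words

    # A negation word reverses the orientation; neutral stays neutral.
    if any(w in negations for w in words):
        orientation = -orientation

    return {1: "positive", 0: "neutral", -1: "negative"}[orientation]


# Feature-specific dictionaries: "high" is negative for price but positive for pixel.
PRICE_POS, PRICE_NEG = {"affordable", "low"}, {"high", "expensive"}
PIXEL_POS, PIXEL_NEG = {"high", "clear"}, {"low", "blurry"}
```

Because the dictionaries are passed in per feature, the same sentence tokens can be scored differently for "price" and "pixel", which is the reason the text builds a separate dictionary for each feature.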

Product ranking based on IF-TODIM method
(1) Transform the sentiment orientations of product features into IFVs
IFVs are a useful tool for representing the ambiguity and hesitation in the evaluation of product features. IFVs can simultaneously reflect the like, neutral, and dislike attitudes in online reviews [29]. Based on the theory of IFSs, the sentiment orientations of the online reviews of alternative products can be expressed simply and completely by IFVs [30].
In addition, most online review platforms now have a like button that makes it easy to judge the usefulness of each review. Therefore, reviews with more likes, which are regarded as more useful, are assigned larger weights. Let $X_{ik}^{j}$ be the importance of each review; $X_{ik}^{j}$ is determined by the number of likes and calculated as follows.
(2) IF-TODIM method for ranking products
Step 1: calculate the feature values $\mu_{ij}$, $\nu_{ij}$ of each alternative product and construct the decision matrix $A = [a_{ij}]_{n \times m}$ of product selection.
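As an illustration of Step 1, the following sketch turns the sentiment orientations of the reviews of one product on one feature into an IFV. The weighting rule is an assumption: each review is weighted by its number of likes plus one, so that reviews without likes still count. The function name `reviews_to_ifv` and the input layout are hypothetical as well.

```python
def reviews_to_ifv(reviews):
    """Aggregate (orientation, likes) pairs into (mu, nu, pi).

    reviews : list of (orientation, likes), where orientation is
              "positive", "neutral", or "negative" and likes >= 0.
    """
    if not reviews:
        raise ValueError("at least one review is required")

    # Assumed importance of each review: likes + 1.
    weights = [likes + 1 for _, likes in reviews]
    total = sum(weights)

    # Weighted shares of positive and negative reviews.
    mu = sum(w for (o, _), w in zip(reviews, weights) if o == "positive") / total
    nu = sum(w for (o, _), w in zip(reviews, weights) if o == "negative") / total
    pi = 1.0 - mu - nu  # weighted share of neutral reviews
    return mu, nu, pi
```

The weighted shares of positive, negative, and neutral reviews play the roles of the membership, non-membership, and hesitance values, so the three components always sum to one.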
Step 2: compare the feature values of each two alternative products by Eqs. (8)-(10) and construct the advantage-disadvantage matrix, where "A" or "D" means that the feature value of $A_i$ is larger or smaller than that of $A_k$.
Step 3: calculate the weight w j of the feature f j as follows.
Firstly, calculate the entropy $E_{ij}$ of each product feature by Eqs. (4)-(7), and normalize the entropy by the following equation, where $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$. Then, the entropy weights of each product feature are calculated as follows.
where $a_j = \sum_{i=1}^{n} h_{ij}$.
Step 4: the feature with the largest weight value is regarded as the reference feature f R . The relative weight value w j R of each feature f j over the reference feature f R is calculated by Eq. (25).
Step 5: the dominance degree $\vartheta(A_i, A_e)$ of the alternative product $A_i$ over $A_e$ is calculated by Eq. (26), where $d(a_{ij}, a_{ej})$ is calculated by Eqs. (1)-(3).
Step 6: the global prospect value δ(A i ) of each alternative product A i is calculated by Eq. (27).
Step 7: rank the alternative products according to the global prospect values δ( A i ). The larger the value of δ( A i ) is, the better the alternative product A i is.
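Steps 4 to 7 can be sketched with the classical TODIM formulas. This is an illustration under assumptions rather than a reproduction of Eqs. (25)-(27): the relative weight is taken as w_j / w_R, a gain contributes sqrt(w_jR d / Σ w_jR), a loss contributes −(1/θ) sqrt(Σ w_jR d / w_jR), the distance d is the Hamming distance with factor 1/2, and gains and losses are decided by the Hamming score (distance to <0, 1>). All function names are hypothetical, and all weights are assumed to be positive.

```python
import math


def hamming(a, b):
    """Hamming distance between two IFVs given as (mu, nu) tuples (assumed 1/2 factor)."""
    pi_a, pi_b = 1.0 - a[0] - a[1], 1.0 - b[0] - b[1]
    return 0.5 * (abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(pi_a - pi_b))


def score(a):
    """Score of an IFV as its distance to the worst value <0, 1>."""
    return hamming(a, (0.0, 1.0))


def todim(matrix, weights, theta=1.0):
    """matrix[i][j] = (mu, nu) of product i on feature j; returns global prospect values."""
    n_products, n_features = len(matrix), len(weights)
    w_ref = max(weights)                      # weight of the reference feature
    rel = [w / w_ref for w in weights]        # relative weights
    rel_sum = sum(rel)

    totals = []
    for i in range(n_products):
        total = 0.0
        for e in range(n_products):
            if e == i:
                continue
            for j in range(n_features):
                a, b = matrix[i][j], matrix[e][j]
                d = hamming(a, b)
                if score(a) > score(b):       # gain
                    total += math.sqrt(rel[j] * d / rel_sum)
                elif score(a) < score(b):     # loss, attenuated by theta
                    total -= math.sqrt(rel_sum * d / rel[j]) / theta
        totals.append(total)

    lo, hi = min(totals), max(totals)
    if hi == lo:
        return [0.0] * n_products             # all products tie (assumed convention)
    return [(t - lo) / (hi - lo) for t in totals]


def ranking(values):
    """Indices of the products sorted from best to worst."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)
```

Calling `ranking(todim(matrix, weights, theta))` for several values of `theta` gives a quick way to check how sensitive the ranking is to the attenuation coefficient.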

Decision-making process
Online reviews of five mobile phones from Jingdong Mall (https://www.jd.com/) are crawled. The five alternative mobile phones are iPhone X, Huawei P10, OPPO R11S, Mito T8, and VIVO X9. The crawler software Bazhuayu (http://www.bazhuayu.com/) is used to crawl 5000 reviews (1000 reviews per phone). After processing, 2000 reviews (400 reviews for each phone) are extracted from the review data set obtained. The mobile phone features that customers focus on, extracted by the Apriori algorithm, are $F$ = {Appearance, Screen, Photo, Battery, Price, System}. The positive sentiment dictionary $W_j^+$ and negative sentiment dictionary $W_j^-$ of each mobile phone feature are constructed by Eqs. (11)-(14) and shown in Table 1. The tagged review "Price/n is/v affordable/a" is taken as an example to illustrate the identification of the sentiment orientation (Table 2).
The steps of ranking mobile phones by the IF-TODIM method are shown as follows.
Step 1: calculate the feature values $\mu_{ij}$, $\nu_{ij}$ of each alternative mobile phone. The intuitionistic fuzzy decision matrix of mobile phone selection is shown in Table 3.
Step 2: compare the feature values of each two alternative products by Eqs. (8)-(10) and construct the advantage-disadvantage matrix. For example, the score measures of iPhone X ($A_1$) and Huawei P10 ($A_2$) under the attribute appearance ($f_1$) are calculated as $R_h(a_{11}) = d_h(a_{11}, \tilde{0}) = \frac{1}{2}\left(1 + \mu_{a_{11}} + \pi_{a_{11}} - \nu_{a_{11}}\right)$, where $\tilde{0} = \langle 0, 1 \rangle$ is the worst evaluation value, and likewise for $R_h(a_{21})$. Therefore, $R_h(a_{11}) \prec R_h(a_{21})$. The score measure of $A_1$ under the attribute appearance ($f_1$) is smaller than that of $A_2$, which is represented by "D". The advantage-disadvantage matrix is shown in Table 4.
Table 4 Advantage-disadvantage matrix under each feature $f_j$ between two alternative mobile phones
Step 3: calculate the weight $w_j$ of the feature $f_j$. Calculate the entropy $e_{ij}$ of each mobile phone feature by Eq. (4) in Definition 2. For example, the entropy of $a_{11}$ is $e_{11}$. The entropy matrix $E$ is as follows. Then, the normalized entropy matrix is obtained by Eq. (23) as follows.
Step 4: the feature with the largest weight value is regarded as the reference feature $f_R$. The relative weight value $w_{jR}$ of each feature $f_j$ over the reference feature $f_R$ is calculated by Eq. (25). The relative weight values are shown in Table 5.
Step 5: the dominance degree $\vartheta(A_i, A_e)$ of the alternative mobile phone $A_i$ over $A_e$ is calculated by Eq. (26).
Here, assume that $\theta = 1$ [36]; then the gains and losses $\phi_j(A_i, A_e)$ are calculated and shown in Table 6. Then, the dominance degree of mobile phone $A_i$ over mobile phone $A_e$ is calculated by Eq. (26). For example, the dominance degree of mobile phone A 1 over A 2 is 457. The dominance degree matrix is shown in Table 7.
Step 6: the global prospect value $\delta(A_i)$ of each alternative mobile phone $A_i$ is calculated by Eq. (27).
Step 7: rank the alternative mobile phones according to the global prospect values $\delta(A_i)$. The ranking result is $\delta(A_5) > \delta(A_3) > \delta(A_2) > \delta(A_4) > \delta(A_1)$, so the alternative mobile phones are sorted as VIVO X9 > OPPO R11S > Huawei P10 > Mito T8 > iPhone X. When price, system performance, and appearance are prioritized, the optimal choice is VIVO X9. Most of the online reviews of VIVO X9 indicate that the system is fluent and the mobile phone is cost-effective. Most of the online reviews of iPhone X state that it is too expensive, resulting in a lower ranking.

Analysis of the effect of the parameter
The product selection based on online reviews involves the attenuation coefficient $\theta$. The effect of the attenuation coefficient $\theta$ on the ranking result is analyzed by taking different values. The product ranking results calculated by the IF-TODIM method for $\theta = 1, 2, 3, 4$ are shown in Table 8. From the results, $A_5$ is always the best choice under the different attenuation coefficients. Therefore, the value of the attenuation coefficient does not change the recommended product in this case study.
(1) Comparison with IF-TOPSIS method
The main idea of the IF-TOPSIS method for ranking products is to normalize the original data matrix and determine the distance between the alternative products and the optimal or worst solution based on each attribute index's weight [20]. The relative closeness of each alternative product to the optimal solution is used as the evaluating basis. The steps of the IF-TOPSIS method are as follows.
Step 1: the mobile phone's positive ideal solution (PIS) A + and negative ideal solution (NIS) A − are defined as follows.
Then, the PIS and NIS of each mobile phone feature are shown in Table 9.
Step 2: Calculate the weighted distance from each alternative mobile phone A i to the PIS A + and the NIS A − .
Step 3: Calculate the relative closeness (CI i ) of each alternative mobile phone A i as follows.
Here, the weighted Hamming distance between the alternative mobile phone $A_i$ and the PIS $A^+$ or the NIS $A^-$ represented by the IFSs is calculated. The ranking result calculated by the IF-TOPSIS method is shown in Table 10. Namely, the best choice among the alternative mobile phones based on online reviews is OPPO R11S ($A_3$).
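The three IF-TOPSIS steps can be sketched as follows. The definitions used here are assumptions based on the common form of the method: the PIS takes the largest membership and smallest non-membership value of each feature, the NIS takes the opposite, the distance is the weighted Hamming distance with factor 1/2, and the relative closeness is the distance to the NIS divided by the sum of both distances. The function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two IFVs given as (mu, nu) tuples (assumed 1/2 factor)."""
    pi_a, pi_b = 1.0 - a[0] - a[1], 1.0 - b[0] - b[1]
    return 0.5 * (abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(pi_a - pi_b))


def if_topsis(matrix, weights):
    """matrix[i][j] = (mu, nu) of product i on feature j; returns relative closeness values."""
    n_features = len(weights)

    # Step 1: positive and negative ideal solutions, feature by feature.
    pis = [(max(row[j][0] for row in matrix), min(row[j][1] for row in matrix))
           for j in range(n_features)]
    nis = [(min(row[j][0] for row in matrix), max(row[j][1] for row in matrix))
           for j in range(n_features)]

    closeness = []
    for row in matrix:
        # Step 2: weighted Hamming distances to the PIS and the NIS.
        d_pos = sum(w * hamming(a, p) for w, a, p in zip(weights, row, pis))
        d_neg = sum(w * hamming(a, q) for w, a, q in zip(weights, row, nis))
        # Step 3: relative closeness; 0.5 if all products are identical (assumed convention).
        total = d_pos + d_neg
        closeness.append(d_neg / total if total > 0 else 0.5)
    return closeness
```

Unlike TODIM, this computation treats a shortfall against the ideal solution and a surplus over the worst solution symmetrically, with no separate attenuation of losses.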
(2) Comparison with IF-VIKOR method
The IF-VIKOR method was developed by Yang et al. [40] to select the best compromise hotels.
Step 1: the PISs and NISs of mobile phones are shown in Table 9.
Table 6 Gain and loss matrix under each feature $f_j$ between two alternative mobile phones
Step 2: calculate the $S_i$ and $R_i$ of alternative mobile phone $A_i$.
Step 3: assume that $v = 0.5$, and calculate the $Q_i$ of alternative mobile phone $A_i$: $Q_1 = 0.062$, $Q_2 = 0.770$, $Q_3 = 0.710$, $Q_4 = 0$, $Q_5 = 1$.
Step 4: obtain the ranking result of alternative mobile phones.
The ranking result of the alternative mobile phones is obtained from the $Q_i$ values.

(3) Comparison with IF-PROMETHEE method
The IF-PROMETHEE method [23] is developed to support the consumers' purchase decisions.
Step 1: the priority index of $A_i$ over $A_j$ is shown in Table 7.
Step 4: the ranking result of the alternative mobile phones is $A_5 \succ A_3 \succ A_2 \succ A_4 \succ A_1$.

(4) Discussion
To illustrate the effectiveness of ranking products based on the IF-TODIM method and online reviews, the product ranking results of the IF-TODIM method and the three other methods are shown in Fig. 2. The results show that the ranking obtained by the IF-TODIM method is the same as that of the IF-PROMETHEE method and different from those of the two other methods. The best choice obtained by the IF-TODIM, IF-VIKOR, and IF-PROMETHEE methods is $A_5$ (VIVO X9), while that of the IF-TOPSIS method is $A_3$ (OPPO R11S). $A_1$ (iPhone X) and $A_4$ (Mito T8) are always the worst two choices. The main reason for the different results is that the IF-TODIM method considers the gain and loss of each mobile phone feature and the prospect value in the product ranking process. VIVO X9 has some advantages in the attribute of price and is appraised well across the other features. The ranking result of the IF-TODIM method is therefore closer to the actual situation. The IF-TOPSIS and IF-VIKOR methods assume that customers are fully rational when purchasing mobile phones, whereas customers are not fully rational in the actual purchase decision process, so these two methods are less suitable for ranking products. Therefore, the IF-TODIM method based on online reviews is more reasonable than the IF-TOPSIS and IF-VIKOR methods.

Conclusion
In this paper, a new analytical method for ranking products is presented. The main idea of the product ranking method based on online reviews and IF-TODIM is as follows. Firstly, the Apriori algorithm is used to identify the product features based on online reviews. Then the sentiment orientation and intensity of the sentiment words for the product features are identified by the lexicon-based sentiment analysis approach. Next, the sentiment orientation of the product features is converted into an IFV, and then the IF-TODIM method is used to determine the ranking results of the alternative products. The proposed method fully considers consumers' subjective needs and different sentiment orientations (positive, neutral, and negative) for each product feature. The IFVs are used to fully reflect the different sentiment orientations of online reviews, which is more elaborate than previous studies and makes up for the lack of consideration of the neutral sentiment orientation. In addition, the gain and loss of each mobile phone feature in the product ranking process are also considered. The obtained result is closer to the actual purchase needs of consumers. In general, the degrees of membership, non-membership, and hesitation in an IFV provide an effective way to solve the problem of product ranking. The proposed method has operability and practical application value and provides a new decision-making technology to solve the problem of product purchase decision-making using online review data in the current era of big data.
The developed method provides a convenient tool to give recommendations for purchasing products, but the decision support system still needs to be improved. In addition, the emojis and photos in the online review data are neglected during data pre-processing. In future work, it is necessary to study product ranking methods that also incorporate emojis and photos.

Data availability
The data used to support the findings of this study are included within the article.

Declarations
Conflict of interest We declare that we have no commercial or associative interests that represent a conflict of interest in connection with this manuscript. There are no professional or other personal interests that can inappropriately influence our submitted work.

Research involving human participants and/or animals This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.