New similarity measures for single-valued neutrosophic sets with applications in pattern recognition and medical diagnosis problems

The single-valued neutrosophic set (SVNS) is a well-known model for handling uncertain and indeterminate information. Information measures such as distance measures, similarity measures and entropy measures are very useful tools to be used in many applications such as multi-criteria decision making (MCDM), medical diagnosis, pattern recognition and clustering problems. A lot of such information measures have been proposed for the SVNS model. However, many of these measures have inherent problems that prevent them from producing reasonable or consistent results to the decision makers. In this paper, we propose several new distance and similarity measures for the SVNS model. The proposed measures have been verified and proven to comply with the axiomatic definition of the distance and similarity measure for the SVNS model. A detailed and comprehensive comparative analysis between the proposed similarity measures and other well-known existing similarity measures has been done. Based on the comparison results, it is clearly proven that the proposed similarity measures are able to overcome the shortcomings that are inherent in existing similarity measures. Finally, an extensive set of numerical examples, related to pattern recognition and medical diagnosis, is given to demonstrate the practical applicability of the proposed similarity measures. In all numerical examples, it is proven that the proposed similarity measures are able to produce accurate and reasonable results. To further verify the superiority of the suggested similarity measures, the Spearman’s rank correlation coefficient test is performed on the ranking results that were obtained from the numerical examples, and it was again proven that the proposed similarity measures produced the most consistent ranking results compared to other existing similarity measures.


Introduction
The connection between precision and uncertainty has perplexed humanity for centuries. Lukasiewicz [1], a Polish logician and philosopher, gave the first formulation of multi-valued logic, which led to the study of possibility theory. The first simple fuzzy set and the fundamental ideas of fuzzy set operations were proposed by Black [2]. To overcome the problem of handling uncertain and imprecise information [...] a fuzzy set is a single value in the interval [0, 1]. Fuzzy set theory has been widely applied in a plethora of application fields, including medical diagnosis, engineering, economics, image processing and object recognition (Phuong et al. [4]; Shahzadi et al. [5]; Tobias and Seara [6]).

The general fuzzy set was extended to the intuitionistic fuzzy set (IFS) by Atanassov [7]. The IFS model has a degree of membership $\mu_A(x_i) \in [0, 1]$ and a degree of non-membership $\nu_A(x_i) \in [0, 1]$, such that $0 \le \mu_A(x_i) + \nu_A(x_i) \le 1$ for each $x_i \in X$. The IFS model definitely extends the classical fuzzy set model; however, it is often difficult to apply in real-life decision making situations, as it can deal only with incomplete and vague information but not with indeterminate or inconsistent information. Hence, Smarandache [8] initially proposed the idea of the neutrosophic set (NS) which, from a philosophical point of view, more effectively deals with [...]

[...] [30] used a neutrosophic approach to minimize the cost of project scheduling under uncertain environmental conditions by assuming linear time-cost trade-offs. Abdel-Basset and Mohamed [31] proposed a combination of the plithogenic multi-criteria decision making approach based on TOPSIS and the criteria importance through inter-criteria correlation (CRITIC) method to evaluate the sustainability of a supply chain risk management system. Abdel-Basset et al.
[32] considered the resource leveling problem in construction projects using neutrosophic sets with the aim of overcoming the ambiguity surrounding the project scheduling decision making process. Besides these, many other scientific studies related to various extensions of the neutrosophic set model have also been published over the years. Akram et al. [33] developed an approach based on the maximizing deviation method and TOPSIS for solving MCDM problems under the assumptions of a simplified neutrosophic hesitant fuzzy environment. Zhan et al. [34] proposed an efficient algorithm to solve MCDM problems based on bipolar neutrosophic information. Aslam [35] introduced a novel neutrosophic analysis of variance, whereas Sumathi and Sweety [36] suggested a new form of fuzzy differential equation using trapezoid neutrosophic numbers.

Moreover, many information measures for the SVNS model have been proposed over the years, such as similarity measures, distance measures, entropy measures, inclusion measures and correlation coefficients. Some of the most important research works pertaining to similarity and distance measures for SVNSs are due to Broumi and Smarandache [37], Ye [38-44], Ye and Zhang [45], Majumdar and Samanta [46], Mondal and Pramanik [47], Ye and Fu [48], Liu and Luo [49], Huang [50], Mandal [...]

[...] [75] years ago, a lot of scholars and researchers have been continuously proposing new similarity measures for fuzzy based models, including the SVNS model, and applying these measures in solving various practical problems related to MCDM (Ye [41]; Ye and Zhang [45]; Pramanik et al. [53]; Mondal and Pramanik [47]; Aydogdu [65]; Mandal and Basu [76]), pattern recognition (Sahin et al. [52]), medical diagnosis (Shahzadi, Akram and Saeid [5]; Ye and Fu [48]; Abdel-Basset et al.
[77]), clustering analysis (Ye [41,43] [...]

[...] a falsity-membership function $F_A(x)$. These three functions $T_A(x)$, $I_A(x)$, $F_A(x)$ in $X$ are real standard or non-standard subsets of $]{}^{-}0, 1^{+}[$, such that $T_A(x): X \to\, ]{}^{-}0, 1^{+}[$, $I_A(x): X \to\, ]{}^{-}0, 1^{+}[$, and $F_A(x): X \to\, ]{}^{-}0, 1^{+}[$. Thus, there is no restriction on the sum of $T_A(x)$, $I_A(x)$ and $F_A(x)$, so that ${}^{-}0 \le \sup T_A(x) + \sup I_A(x) + \sup F_A(x) \le 3^{+}$.

Smarandache [8] introduced the neutrosophic set from a philosophical point of view as an extension of the fuzzy set, the IFS, and the interval-valued IFS. Although the concept was a novel one, it was found to be difficult to apply neutrosophic sets in practical problems, mainly due to the range of values of the membership functions, which lie in the non-standard interval $]{}^{-}0, 1^{+}[$. Datasets in many real-life situations are often imprecise, uncertain and/or incomplete. Any discrepancies or deficiencies in the datasets used will have an adverse effect on the decision making process and, by extension, on the results that are generated. Hence, it is often pertinent to have a robust framework to effectively represent all types of imprecise, uncertain and incomplete information. Fuzzy set theory was introduced as a good alternative to deal with imprecise, inconsistent and incomplete information, as classical methods, such as set theory and probability theory, were unable to deal with such deficiencies in information. However, fuzzy set theory was found to be less than ideal in dealing with imprecise, inconsistent and incomplete information, as it only takes into consideration the truth component of any information and is not able to handle the falsity and indeterminacy components of the information.
As fuzzy set theory evolved into other fuzzy based models, neutrosophic sets were introduced by Smarandache [8] as an efficient mathematical model to deal with imprecise, inconsistent and incomplete information. The SVNS model, which was conceptualized by Wang et al. [10] as an extension of the neutrosophic set model, has proven to be an effective model for handling imprecise, inconsistent and incomplete information in a systematic manner due to its ability to consider the degree of truth, falsity and indeterminacy for each piece of information. In addition, the structure of the SVNS model, in which the membership functions assume values in the standard interval [0, 1], makes it compatible with the other fuzzy based models, thereby making it more convenient to apply to real-life decision making problems with actual datasets. All these served as reasons to choose the SVNS model as the object of study in this paper. The formal definition of the SVNS is presented below.

Definition 2.2 [10]. Let $X$ be a universal set. An SVNS $A$ in $X$ is characterized by a truth-membership function $T_A(x)$, an indeterminacy-membership function $I_A(x)$ and a falsity-membership function $F_A(x)$. An SVNS $A$ can be signified [...]
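As a rough illustration of Definition 2.2, an SVNS over a finite universe can be stored as a mapping from each element to a triple (T, I, F) whose components lie in the standard interval [0, 1]. The sketch below is one possible Python rendering under that assumption; the names `SVNN` and `make_svns` and the sample values are hypothetical and are not taken from [10].

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SVNN:
    """One single-valued neutrosophic number (T, I, F), each in [0, 1]."""
    t: float  # truth-membership degree
    i: float  # indeterminacy-membership degree
    f: float  # falsity-membership degree

    def __post_init__(self) -> None:
        # Reject any degree outside the standard unit interval.
        for name, value in (("t", self.t), ("i", self.i), ("f", self.f)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} lies outside [0, 1]")


def make_svns(triples: Dict[str, Tuple[float, float, float]]) -> Dict[str, SVNN]:
    """Build an SVNS over a finite universe from element -> (T, I, F)."""
    return {x: SVNN(*tif) for x, tif in triples.items()}


# Hypothetical data: the three degrees are independent and need not sum to 1.
A = make_svns({"x1": (0.7, 0.2, 0.3), "x2": (0.4, 0.5, 0.6)})
total = A["x1"].t + A["x1"].i + A["x1"].f  # may be anywhere in [0, 3]
```

Keeping the three degrees as separate fields reflects the point made above that, unlike a fuzzy set, an SVNS records truth, indeterminacy and falsity independently.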

Definition 2.5 [83]. For any two given SVNSs $A$ and $B$, the addition and multiplication operations of $A$ and $B$ are defined as shown below: [...]

[...], n, be two SVNSs over the universe $X$.
Therefore, we have: [...] the following hold: [...]

(iv) The proof is similar to that of (iii) and is therefore omitted.
[...], we have the following: [...] Therefore, we have: [...]

Proof. For the sake of brevity, we only prove property (i) to (iv) above. The proof for property (iv) is similar to that of property (iii) and is therefore omitted.
(iv) The proof is similar to that of (iii) and is therefore omitted.

Comparative studies
In this section, we conduct a comparative analysis between the proposed similarity measures and other existing similarity measures presented in the literature to show the drawbacks of the existing similarity measures and the advantages of the suggested similarity measures.

Existing similarity measures for SVNSs
In this subsection, we present a detailed and comprehensive comparative study of the previously defined similarity measures and some existing similarity measures in the literature. The existing similarity measures that will be considered in this comparative study are listed in Table 1.

Comparison between the proposed and existing similarity measures for SVNSs using artificial sets
In this subsection, we use 10 artificial sets of SVNSs that consist of a combination of special SVNNs to do a thorough comparison between the proposed similarity measures and the existing similarity measures which are listed in Table 1. The results from this comparative study are presented in Table 2, where all values in bold indicate unreasonable results. From Table 2, it can be clearly seen that the proposed similarity measures $S_{10}$ and $S_{11}$ are able to overcome the shortcomings that are inherent in the existing similarity measures by producing reasonable results in all 10 cases that are studied. The drawbacks and problems that are inherent in existing similarity measures are discussed in detail in "Discussion and analysis of results".

[Table 1: existing similarity measures, including those of Ye and Zhang [45], Majumdar and Samanta [46], Cui and Ye [57] and Broumi and Smarandache [37]; table body not recovered]

[Table 2: table body not recovered] ($p = 1$ in $S_{Y6}$, $S_{Y7}$, $S_{GN}$, $S_{PS}$, $\lambda = 1$, $\beta_1 = 1$, $\beta_2 = \beta_3 = \beta_4 = 0$, $\lambda = 0.5$ in $S_p$, $t = 1$ in $S_{GN}$, $t_1 = 2$, $t_2 = 3$ in $S_{PS}$ and $\alpha = \beta = \gamma = 1/3$ in $S_6$, $S_7$). Values in bold indicate unreasonable results. "N/A" indicates that the corresponding formula failed to calculate the similarity value because of the "division by zero" problem.
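The "division by zero" failure and the kind of unreasonable value flagged in bold can both be reproduced with a cosine-type similarity, a form that is widely used for SVNSs. The sketch below is an assumed representative only: the function `cosine_similarity` and the sample triples are hypothetical, and the formula is not claimed to match any specific entry of Table 1.

```python
import math
from typing import List, Optional, Tuple

Triple = Tuple[float, float, float]  # (T, I, F) for one element of X


def cosine_similarity(a: List[Triple], b: List[Triple]) -> Optional[float]:
    """Mean per-element cosine of the (T, I, F) vectors of two non-empty SVNSs.

    Returns None when a denominator vanishes, i.e. the "N/A" case.
    """
    total = 0.0
    for (ta, ia, fa), (tb, ib, fb) in zip(a, b):
        norm_a = math.sqrt(ta * ta + ia * ia + fa * fa)
        norm_b = math.sqrt(tb * tb + ib * ib + fb * fb)
        if norm_a == 0.0 or norm_b == 0.0:
            return None  # the quotient is undefined for the triple (0, 0, 0)
        total += (ta * tb + ia * ib + fa * fb) / (norm_a * norm_b)
    return total / len(a)


# Hypothetical cases in the spirit of the artificial sets discussed above.
undefined = cosine_similarity([(0.0, 0.0, 0.0)], [(0.5, 0.5, 0.5)])
proportional = cosine_similarity([(0.2, 0.2, 0.2)], [(0.4, 0.4, 0.4)])
```

Here `undefined` is `None` because one argument is the zero triple, while `proportional` is numerically equal to 1 even though the two sets differ, since the cosine depends only on the direction of the (T, I, F) vector.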

(vii) Many of the similarity measures have been found to produce unreasonable results in some of the cases which are shown in Table 2 [...]

Table 4 The values of the similarity measures for our proposed formulae

Example 1 A numerical example adapted from Garg and Nancy [54] is used here to illustrate the effectiveness of the proposed similarity measures. Suppose that there are 3 known patterns $A_1$, $A_2$, and $A_3$ which are represented by specific SVNSs, in a given universe of discourse $X = \{x_1, x_2, x_3, x_4\}$, and an unknown pattern $B \in SVNS(X)$, all of which are presented in Table 3.

The values of the similarity measures between $B$ and $A_k$, $k = 1, 2, 3$, have been computed for all of the proposed similarity measures, $S_i$, $i = 1, 2, \ldots, 11$, and the results are presented in Table 4. Note that values in bold indicate the largest value of the corresponding similarity measure.
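The recognition principle used in this example, assigning the unknown pattern to the known pattern with the largest similarity value, can be sketched as follows. This is an illustration under stated assumptions: `hamming_similarity` is the generic similarity obtained as one minus the normalized Hamming distance, not one of the proposed measures $S_1$ to $S_{11}$, and the pattern values are hypothetical rather than those of Table 3.

```python
from typing import Dict, List, Tuple

Triple = Tuple[float, float, float]  # (T, I, F) for one element of X


def hamming_similarity(a: List[Triple], b: List[Triple]) -> float:
    """One minus the normalized Hamming distance between two SVNSs."""
    n = len(a)
    # Sum |T_A - T_B| + |I_A - I_B| + |F_A - F_B| over all elements of X.
    dist = sum(abs(p - q) for ta, tb in zip(a, b) for p, q in zip(ta, tb))
    return 1.0 - dist / (3 * n)


def classify(patterns: Dict[str, List[Triple]], unknown: List[Triple]) -> str:
    """Return the label of the known pattern most similar to `unknown`."""
    return max(patterns, key=lambda k: hamming_similarity(patterns[k], unknown))


# Hypothetical patterns over X = {x1, x2}; these are not the values of Table 3.
patterns = {
    "A1": [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1)],
    "A2": [(0.5, 0.4, 0.3), (0.4, 0.5, 0.4)],
    "A3": [(0.1, 0.8, 0.9), (0.2, 0.7, 0.8)],
}
B = [(0.5, 0.3, 0.3), (0.4, 0.4, 0.5)]
best = classify(patterns, B)
```

Any similarity measure with the same signature could be passed in place of `hamming_similarity`; the decision rule itself only requires that larger values mean closer patterns.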

From Table 4, it can be seen that all of the proposed similarity measures produced the same ranking (i.e., $A_2 > A_3 > A_1$), except for measure $S_6$, which produced a slightly different ranking (i.e., $A_2 > A_1 > A_3$). However, based on the ranking orders produced by all of the proposed similarity measures, it can be clearly concluded that sample $B$ belongs to pattern $A_2$.

[...] Table 1, are applied to the pattern recognition problem presented in Example 1. The results obtained are summarized in Table 5.

Note that the row in bold indicates a different ranking order.

From Table 5, it can be seen that all of the existing similarity [...] Table 6.

In the medical diagnosis, assume that we take a sample from a patient $P_1$ with all the symptoms, which is represented by the following SVNS information: [...]

The row in bold indicates a different ranking order. From Table 7, it can be seen that only formulas $S_2$ and $S_7$ produced results that are not consistent with the results produced by the other proposed formulas. Since the largest value of similarity indicates the proper diagnosis, we can conclude that the diagnosis of patient $P_1$ is $Q_2$ (malaria) in all cases except for the cases of $S_2$ and $S_7$, in which the patient was diagnosed as having viral fever ($S_2$) and typhoid ($S_7$), respectively. These results are consistent with the results presented [...]

Table 7 The similarity measures between $P_1$ and $Q_i$ for the proposed formulas

[...] Table 1 are studied by applying these measures to Example 2. The results obtained are given in Table 8. Note that values in bold indicate the largest value of the corresponding similarity measure.

As we can see from Table 8 [...]

($p = 1$ in $S_{Y6}$, $S_{Y7}$, $S_{GN}$, $S_{PS}$, $\lambda = 1$, $\beta_1 = 1$, $\beta_2 = \beta_3 = \beta_4 = 0$, $\lambda = 0.5$ in $S_p$, $t = 1$ in $S_{GN}$ and $t_1 = 2$, $t_2 = 3$ in $S_{PS}$). Values in bold indicate the largest value of the corresponding similarity measure.

[...] Table 9. From the results in Table 9, it can be clearly seen that [...] Table 1.

Summary of the discussion and overall evaluation of the results
Through the comparative analyses that have been done, a few major weaknesses and inherent problems were identified in many of the existing similarity measures. Some of the existing measures did not fulfill the axiomatic requirement, failed to distinguish the positive difference and negative difference, failed to produce any results due to the division by zero problem, produced counter-intuitive results or produced unreasonable results in some cases. From the results of the comparative study presented in "Comparison between the proposed and existing similarity measures for SVNSs using artificial sets" and shown in Table 2, it was found that only 2 of the existing similarity measures ($S_{RXZ}$ and $S_{YZ}$) and 2 of the proposed similarity measures ($S_{10}$ and $S_{11}$) did not produce any unreasonable or counter-intuitive results. However, through the Spearman's rank correlation coefficient test done in "Ranking analysis with Spearman's rank correlation coefficient", it was evident that the proposed similarity measures $S_{10}$ and $S_{11}$ also had the highest correlation with the actual ranking, which indicates that these similarity measures are superior to the existing measures $S_{RXZ}$ and $S_{YZ}$.

We also compared the performance of these two proposed similarity measures ($S_{10}$ and $S_{11}$) in terms of the discriminative power of the results obtained via these two formulas. From the illustrative examples given in "Application of the similarity measures in a pattern recognition problem" and "Application of the similarity measures in a medical diagnosis problem", it can be observed that both of these proposed similarity measures ($S_{10}$ and $S_{11}$) produced exactly the same rankings as the actual rankings, which indicates that both of these measures are effective and feasible.
However, $S_{11}$ has a higher level of discriminative power compared to $S_{10}$, and this can be observed from the results obtained from the application of these measures to the pattern recognition and medical diagnosis problems in Tables 4 and 7, respectively, in which the decision values are extremely close to one another. It can be seen that $S_{11}$ could better discriminate the decision values and produce results that show a clear distinction between them. By using this specific measure, we managed to distinguish between the decision values, a result that enabled us to rank the alternatives clearly and, consequently, enabled clear and firm decisions to be made. Furthermore, $S_{11}$ has a lower level of computational complexity. Hence, it can be concluded that $S_{11}$ is superior to $S_{10}$.

[...] The results of the Spearman's rank test verified the superiority of our proposed similarity measures $S_{10}$ and $S_{11}$, as both produced rankings that are perfectly correlated with the actual rankings, thereby supporting the superiority of our proposed similarity measures compared to the existing similarity measures.
6. To further determine the superior measure between these two proposed similarity measures ($S_{10}$ and $S_{11}$), we analyzed the discriminative power of these measures. It was concluded that $S_{11}$ is superior to $S_{10}$ as it had a higher discriminative power and a lower computational complexity compared to $S_{10}$.
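The Spearman's rank test referred to above compares the ranking produced by a measure with the actual ranking. A minimal sketch of the standard tie-free formula, $\rho = 1 - 6\sum d_i^2 / (n(n^2 - 1))$, is given below; the function name `spearman_rho` and the sample rankings are hypothetical and are not the rankings reported in the paper's tables.

```python
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's coefficient for two tie-free rankings of the same n items."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    # d_i is the difference between the two ranks given to item i.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical rankings of three alternatives (1 = best).
actual = [2, 1, 3]
same = spearman_rho(actual, [2, 1, 3])        # identical order
swapped = spearman_rho(actual, [1, 2, 3])     # top two exchanged
reverse = spearman_rho([1, 2, 3], [3, 2, 1])  # fully reversed order
```

A value of 1 corresponds to the "perfectly correlated" case described above, and a value of -1 to a fully reversed ranking. With only three alternatives the coefficient can take very few distinct values, so a single swap already lowers it to 0.5.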

Suggestions for future research
The future direction of this work involves the development of other improved information measures such as entropy measures, cross-entropy measures and inclusion measures for SVNSs that are free from problems inherent in corresponding existing measures. We are also looking at applying the proposed measures to actual datasets of real-world problems instead of hypothetical datasets [85-91]. However, to accomplish these goals, an effective method of converting crisp data in real-life datasets has to be developed so that available crisp data can be converted effectively without any significant loss of data that would possibly affect the accuracy of the obtained results.