Intuitionistic Fuzzy Laplacian Twin Support Vector Machine for Semi-supervised Classification

In general, data contain noise arising from faulty instruments, flawed measurements, or faulty communication. Learning from such data, whether for classification or regression, is inevitably affected by this noise. In order to remove or greatly reduce its impact, we combine the ideas of fuzzy membership functions and the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented, and we extend the linear IFLap-TSVM to the nonlinear case via kernel functions. The proposed IFLap-TSVM mitigates the negative impact of noise and outliers by using fuzzy membership functions, and it is a more accurate and reasonable classifier because it exploits the geometric distribution information of labeled and unlabeled data through manifold regularization. Experiments with constructed artificial datasets, several UCI benchmark datasets and the MNIST dataset show that IFLap-TSVM achieves better classification accuracy than the state-of-the-art twin support vector machine (TSVM), intuitionistic fuzzy twin support vector machine (IFTSVM) and Lap-TSVM.


Introduction
Support vector machine (SVM) was proposed in detail by Vapnik et al. [1]. The goal of SVM is to find an optimal hyperplane that separates the labeled data points into two classes. Because of its excellent performance in text classification tasks [2], it soon became a mainstream technology of machine learning. At present, SVM and its variants have been successfully applied in many fields such as face recognition [3], financial distress prediction [4], regression [5], traffic flow prediction [6], medical diagnosis [7] and more. Proximal support vector machine (PSVM) [8,9] was derived from SVM; it aims to find two parallel hyperplanes such that each plane is closer to one of the two classes and as far as possible from the other. Furthermore, in order to simplify the constraints, the generalized eigenvalue proximal support vector machine (GEPSVM) [10] was proposed; its main idea is to replace the two parallel hyperplanes with two nonparallel ones. Following this concept, Jayadeva et al. [11] proposed the well-known twin support vector machine (TSVM). Unlike the single large quadratic programming problem (QPP) solved by traditional SVM, TSVM solves a pair of smaller QPPs, where the constraints of each QPP involve only the data points of one of the two classes. Therefore, TSVM not only keeps the advantages of SVM, but also trains about four times faster than SVM. Based on TSVM, Shao et al. [12] proposed an imbalanced weighted Lagrangian twin support vector machine (WLTSVM) for imbalanced data classification. Other extensions and applications of TSVM can be found in [13,14].
Recently, research on semi-supervised learning (SSL) [15][16][17] has become a new hotspot in machine learning. The main reason is that in many practical problems labeled data are scarce while unlabeled data are abundant. SSL uses these unlabeled data to assist the small number of labeled data during learning, so as to improve the performance of the classifier. Manifold regularization (MR) [18,19] is one of the frameworks of SSL. The MR framework has two regularization terms: one controls the complexity of the classifier in the Reproducing Kernel Hilbert Space (RKHS), and the other controls the complexity as measured by the geometry of the distribution. Following the MR framework, Qi et al. [20] proposed the Laplacian twin support vector machine (Lap-TSVM), the first twin support vector machine applied to the SSL problem. Extensive experimental results show that Lap-TSVM performs very well in semi-supervised classification. Other extensions and applications of semi-supervised twin support vector machines can be found in [21,22].
In general, data contain noise arising from faulty instruments, flawed measurements or faulty communication, and learning for classification or regression is inevitably affected by it. If the training samples are contaminated by noise, SVM and its variants are often unable to find an optimal hyperplane and consequently have difficulty obtaining satisfactory results. To address this problem, the fuzzy support vector machine (FSVM) [23] was proposed. The idea of FSVM is to assign a membership value to each training sample; this membership can effectively reduce the effect of noise and outlier points and thus produce a robust classifier. Moreover, combining TSVM with membership functions can both improve computational efficiency and pursue robust performance. In recent years, the intuitionistic fuzzy twin support vector machine (IFTSVM) [24] has been proposed, which assigns a pair of membership and nonmembership values to every training sample. These two functions help IFTSVM reduce the influence of noise and distinguish support vectors from noise.
The same difficulty is also encountered by current semi-supervised twin support vector machines and their variants: when the data contain much noise, the classification results are poor and unsatisfactory. Ideally, we would like to determine which points are noisy and then either remove them or greatly lower their weight. Therefore, inspired by the ideas of IFTSVM, we assign a pair of membership and nonmembership values to each labeled point, which reduces the influence of noise on the classifier, and we combine this with the Lap-TSVM. In this paper, we propose a novel intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) for the semi-supervised classification problem, and we use constructed tests and several real datasets to evaluate its effectiveness. The main advantages of our IFLap-TSVM are: (1) Membership and nonmembership functions are assigned to each training sample to indicate the contribution of different training samples to the learning of the decision functions, which significantly reduces the negative impact of noise and outliers on classification accuracy. (2) Intuitionistic fuzzy numbers reduce the influence of noise and outliers in the labeled samples, while the semi-supervised framework of manifold regularization handles labeled and unlabeled samples in the primal space and the feature space; the combination of the two further improves classification accuracy. (3) IFLap-TSVM achieves better classification accuracy than the state-of-the-art TSVM, IFTSVM and Lap-TSVM on constructed tests and real-world datasets.
The remaining parts of this paper are organized as follows. In Sect. 2, we briefly introduce the background of SSL and Lap-TSVM. In Sect. 3, we describe the details of IFLap-TSVM. In Sect. 4, the numerical experiments on the constructed test datasets, UCI datasets and the MNIST dataset are reported. Section 5 concludes the paper.

Background
In this section, we give a brief description of the semi-supervised learning (SSL) framework and Lap-TSVM. The training data of the classification problem can be described as

T = {(x_1, y_1), ..., (x_l, y_l)} ∪ {x_{l+1}, ..., x_{l+u}},   (1)

where x_i ∈ R^n, y_i ∈ {+1, −1}, i = 1, 2, ..., l, are the labeled data, and x_i, i = l+1, ..., l+u, are the unlabeled data. Denote by A ∈ R^{l_1×n} the matrix of labeled data belonging to class +1, where every row of A represents a data point; similarly, denote by B ∈ R^{l_2×n} the matrix of labeled data belonging to class −1. Clearly, we have l_1 + l_2 = l.

Semi-supervised Learning Framework
SSL uses both labeled and unlabeled data to improve supervised learning. The goal is to build a more efficient classifier with large amounts of unlabeled data and relatively few labeled data. Regularization is a technique to prevent overfitting of the training data and is widely used in machine learning [25]. The MR framework takes advantage of the geometry of the probability distribution generating the data and merges it as an additional regularization term. The decision function of the MR framework can be expressed as [18]

f* = arg min_{f ∈ H_K} (1/l) Σ_{i=1}^{l} V(x_i, y_i, f) + γ_A ||f||_K² + γ_I ||f||_I²,   (2)

where f is an unknown decision function. The first part of the above expression is some loss function on the labeled data. The second and third parts are regularization terms: γ_A is the weight of ||f||_K² and controls the complexity of f in the Reproducing Kernel Hilbert Space, while γ_I is the weight of ||f||_I² and controls the complexity of the function in the intrinsic geometry of the marginal distribution; ||f||_I² is an appropriate penalty term that should reflect the intrinsic structure of the marginal distribution.
The MR framework [18] incorporates additional information about the geometric structure of the marginal distribution. The key assumption of this approach is that the probability distribution of the data has the geometric structure of a Riemannian manifold M: if two points are very close in the intrinsic geometry, they should have the same or similar labels. The RKHS regularization term ||f||_K² and the intrinsic regularizer ||f||_I² are as follows:

||f||_K² = ||f||²_{H_K} (the squared norm of f in the RKHS),   (3)

||f||_I² = (1/(l+u)²) Σ_{i,j=1}^{l+u} W_ij (f(x_i) − f(x_j))² = f^T L f,   (4)

where f = [f(x_1), ..., f(x_{l+u})]^T represents the decision function values over the labeled and unlabeled points, W_ij are edge weights in the data adjacency graph, L is the graph Laplacian given by L = D − W, W ∈ R^{(l+u)×(l+u)} is the weight matrix with entries W_ij, and D is the diagonal matrix with i-th diagonal entry D_ii = Σ_{j=1}^{l+u} W_ij. A more detailed discussion of manifold regularization can be found in [18].
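As a concrete illustration, the graph Laplacian used by the intrinsic regularizer can be computed as follows. This is a minimal Python sketch: the heat-kernel weights and the k-nearest-neighbor sparsification are one common choice of adjacency graph, and `sigma` and `k` are assumed hyperparameters.

```python
import numpy as np

def graph_laplacian(X, sigma=0.5, k=5):
    """Build the graph Laplacian L = D - W over labeled and unlabeled points.

    W uses heat-kernel weights on a k-nearest-neighbor graph (symmetrized),
    one common construction of the data adjacency graph in manifold
    regularization. Illustrative sketch only.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # keep only the k largest weights per row, then symmetrize
    for i in range(n):
        drop = np.argsort(-W[i])[k:]
        W[i, drop] = 0.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    return D - W
```

By construction every row of L sums to zero and L is symmetric positive semidefinite, which is what makes f^T L f a valid smoothness penalty.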

Laplacian Twin Support Vector Machine
Based on TSVM, the Lap-TSVM [20] model is derived by introducing the semi-supervised learning framework. For the linear case, the primal problems of linear Lap-TSVM can be written as

min_{w_1, b_1, ξ} (1/2)||A w_1 + e_1 b_1||² + c_1 e_2^T ξ + (c_2/2)(||w_1||² + b_1²) + (c_3/2)(M w_1 + e b_1)^T L (M w_1 + e b_1)
s.t. −(B w_1 + e_2 b_1) + ξ ≥ e_2, ξ ≥ 0,   (5)

and

min_{w_2, b_2, η} (1/2)||B w_2 + e_2 b_2||² + c_1 e_1^T η + (c_2/2)(||w_2||² + b_2²) + (c_3/2)(M w_2 + e b_2)^T L (M w_2 + e b_2)
s.t. (A w_2 + e_1 b_2) + η ≥ e_1, η ≥ 0.   (6)

The edge weights W_ij in the data adjacency graph may be defined by k-nearest neighbors with a graph kernel as follows:

W_ij = exp(−||x_i − x_j||²/(2σ²)) if x_i and x_j are neighbors; W_ij = 0 otherwise.   (7)

Here M ∈ R^{(l+u)×n} includes all the training data, and e is an appropriate ones vector. By introducing Lagrangian multipliers, the Wolfe duals of problems (5) and (6) can be formulated as

max_α e_2^T α − (1/2) α^T G (H^T H + c_2 I + c_3 J^T L J)^{-1} G^T α, s.t. 0 ≤ α ≤ c_1 e_2,   (8)

and

max_β e_1^T β − (1/2) β^T H (G^T G + c_2 I + c_3 J^T L J)^{-1} H^T β, s.t. 0 ≤ β ≤ c_1 e_1.   (9)

Here H = [A e_1], G = [B e_2] and J = [M e], and the matrices being inverted are positive definite [26]. The augmented vectors v_1 = [w_1^T b_1]^T and v_2 = [w_2^T b_2]^T are given by

v_1 = −(H^T H + c_2 I + c_3 J^T L J)^{-1} G^T α,  v_2 = (G^T G + c_2 I + c_3 J^T L J)^{-1} H^T β.

The same as in TSVM, the decision function of Lap-TSVM is

class(x) = arg min_{i=1,2} |w_i^T x + b_i| / ||w_i||,

where |·| gives the perpendicular distance of point x from the plane w_i^T x + b_i = 0. For the nonlinear case, we refer to [20].
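The decision rule just described, assigning a point to the class whose hyperplane is nearer in perpendicular distance, can be sketched as follows. This is illustrative only; `w1`, `b1`, `w2`, `b2` are assumed to come from solving the dual problems.

```python
import numpy as np

def lap_tsvm_predict(x, w1, b1, w2, b2):
    """Assign x to the class whose hyperplane is nearer: the perpendicular
    distance |w_i^T x + b_i| / ||w_i|| decides the label (sketch)."""
    d1 = abs(w1 @ x + b1) / np.linalg.norm(w1)
    d2 = abs(w2 @ x + b2) / np.linalg.norm(w2)
    return +1 if d1 <= d2 else -1
```

For instance, with the plane x_1 = 0 for class +1 and x_1 = 2 for class −1, a point at x_1 = 0.5 is labeled +1 while a point at x_1 = 1.8 is labeled −1.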

Intuitionistic Fuzzy Laplacian Twin Support Vector Machine
In this section, we first describe the concept of the intuitionistic fuzzy set and then propose the IFLap-TSVM model. Both the linear and the nonlinear (kernel) formulations are discussed in detail.

Intuitionistic Fuzzy Set
The traditional fuzzy set was introduced by Zadeh [27]. Let X be a nonempty set; a fuzzy set A in the universe X can be defined as

A = {(x, μ_A(x)) | x ∈ X},

where μ_A : X → [0, 1] and μ_A(x) is the degree of membership of x in A. As an extension of the fuzzy set, an intuitionistic fuzzy set [28] is defined as

Ã = {(x, μ_Ã(x), ν_Ã(x)) | x ∈ X},

where μ_Ã(x) and ν_Ã(x) are the degrees of membership and nonmembership of x, respectively. Here μ_Ã : X → [0, 1], ν_Ã : X → [0, 1], and 0 ≤ μ_Ã(x) + ν_Ã(x) ≤ 1. It is important to select an appropriate membership function to reduce the effect of noise and outlier points. For example, as shown in Fig. 1, the training points A and B are located on the boundary of the positive class and have the same degree of membership in the positive class, but there are obviously many negative points around point B. Therefore, the classification contributions of points A and B are different, and relying only on the membership degree may lead to wrong predictions. In this case, we assign an intuitionistic fuzzy number (μ, ν) to each training point as proposed in [29], where μ is the degree of membership related to one class and ν is the degree of nonmembership related to the other class. It is then obvious that the points A and B in the positive class have different degrees of nonmembership.

The Degree of Membership Function
In the high-dimensional feature space, the distance between a training point and its class center is used to build the membership function. The distance between two training points is expressed as

d(x_i, x_j) = ||φ(x_i) − φ(x_j)|| = sqrt(K(x_i, x_i) − 2K(x_i, x_j) + K(x_j, x_j)),

where φ represents the mapping from the sample space to the high-dimensional feature space and K is the associated kernel function.
The class center of each class is given by

C_+ = (1/l_+) Σ_{y_i = +1} φ(x_i),  C_− = (1/l_−) Σ_{y_i = −1} φ(x_i),

where l_+ and l_− denote the total numbers of positive and negative points, respectively. The radius of each class can be measured by

r_+ = max_{y_i = +1} ||φ(x_i) − C_+||,  r_− = max_{y_i = −1} ||φ(x_i) − C_−||.

For each training point, the degree of membership can be defined as

μ(x_i) = 1 − ||φ(x_i) − C_+|| / (r_+ + δ) for y_i = +1,
μ(x_i) = 1 − ||φ(x_i) − C_−|| / (r_− + δ) for y_i = −1,

where δ > 0 is an adjustable parameter.
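In the linear (input-space) case, the membership degrees of one class can be computed directly from these definitions. The sketch below uses Euclidean distances; the kernelized version would replace them with the kernel-induced distances above.

```python
import numpy as np

def membership_degrees(X_class, delta=1e-3):
    """Degree of membership for every point of one class:
    1 - (distance to class center) / (class radius + delta).
    Linear-case sketch; `delta` is the small adjustable parameter."""
    center = X_class.mean(axis=0)
    dists = np.linalg.norm(X_class - center, axis=1)
    radius = dists.max()
    return 1.0 - dists / (radius + delta)
```

Points close to the class center receive membership near 1, while points on the class boundary receive small but strictly positive membership (thanks to δ > 0).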

The Degree of Nonmembership Function
The degree of nonmembership is determined by the ratio of the number of heterogeneous (other-class) points to the total number of training points in the neighborhood of a point. It is defined as

ν(x_i) = (1 − μ(x_i)) ρ(x_i),

where

ρ(x_i) = |{x_j : ||φ(x_j) − φ(x_i)|| ≤ α, y_j ≠ y_i}| / |{x_j : ||φ(x_j) − φ(x_i)|| ≤ α}|

is the proportion of heterogeneous points among all points in the neighborhood, |·| denotes the cardinality of a set, and α > 0 is an adjustable parameter. The degrees of membership and nonmembership of a training point are designed based on the inner-product distance in the feature space; therefore, kernel functions are used to construct the intuitionistic fuzzy numbers.
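A direct linear-case sketch of the nonmembership computation follows. The neighborhood radius `alpha` and the use of Euclidean rather than kernel-induced distance are simplifying assumptions.

```python
import numpy as np

def nonmembership_degrees(X, y, mu, alpha=1.0):
    """nu_i = (1 - mu_i) * rho_i, where rho_i is the fraction of
    heterogeneous (other-class) points among all points within distance
    alpha of x_i. Linear-case sketch of the definition above."""
    n = X.shape[0]
    nu = np.zeros(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nbr = (d <= alpha) & (d > 0)  # neighborhood, excluding x_i itself
        if nbr.sum() > 0:
            rho = (y[nbr] != y[i]).sum() / nbr.sum()
            nu[i] = (1.0 - mu[i]) * rho
    return nu
```

The factor (1 − μ_i) guarantees that μ_i + ν_i ≤ 1, as required for an intuitionistic fuzzy number.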

The Score Function
Based on the above definitions, the training points can be converted into intuitionistic fuzzy numbers

(x_i, y_i, μ_i, ν_i), i = 1, 2, ..., l,

where μ_i and ν_i denote the degrees of membership and nonmembership of x_i, respectively. For each intuitionistic fuzzy number, a score function is used to measure the classification contribution of the training point. It can be defined as

s_i = μ_i, if ν_i = 0;  s_i = 0, if μ_i ≤ ν_i;  s_i = (1 − ν_i)/(2 − μ_i − ν_i), otherwise.

The score value s_i can easily distinguish support vectors from noise and outlier points. For example, when ν_i = 0 (positive point A shown in Fig. 2), there are no negative points in the neighborhood of A and a correct degree of membership can be defined directly; since A is far from the class center, its classification contribution is small. When μ_i ≤ ν_i (negative point B shown in Fig. 2), point B has no negative points in its neighborhood, so its degree of nonmembership exceeds its degree of membership; thus B is a noise point with zero classification contribution. For the positive point C, we have μ_i > ν_i and ν_i ≠ 0: C is far from the class center, but there are some positive points in its neighborhood, so it may be a support vector rather than an outlier point. Hence, the classification contribution of point C is greater than that of the outlier A.
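The three-case score function can be vectorized as follows. This is the score definition commonly used with intuitionistic fuzzy numbers; treat it as our reading of the cases described above.

```python
import numpy as np

def score_values(mu, nu):
    """Score s_i weighting each labeled point:
    s_i = mu_i            when nu_i = 0 (no heterogeneous neighbors),
    s_i = 0               when mu_i <= nu_i (treated as noise),
    s_i = (1 - nu_i) / (2 - mu_i - nu_i) otherwise.
    Sketch of the commonly used intuitionistic-fuzzy score."""
    return np.where(nu == 0.0, mu,
                    np.where(mu <= nu, 0.0,
                             (1.0 - nu) / (2.0 - mu - nu)))
```

A point like A (ν = 0, small μ) keeps its small membership as its score; a noise point like B (μ ≤ ν) is zeroed out; a borderline support-vector candidate like C gets an intermediate score.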

Linear IFLap-TSVM
According to the semi-supervised learning framework, the square loss function and the score-weighted hinge loss V(x_i, y_i, f) can be expressed as

Σ_{i=1}^{l_1} (A_{i,·} w_1 + b_1)²  and  Σ_{i=1}^{l_2} S_{2,i} max(0, 1 + (B_{i,·} w_1 + b_1)),

where A_{i,·} and B_{i,·} represent the i-th rows of A and B, respectively, and S_{1,i} and S_{2,i} denote the i-th elements of the vectors S_1 and S_2. Here S_1 ∈ R^{l_+} and S_2 ∈ R^{l_−} collect the score values of the positive and negative points, respectively.
The regularization terms ||f_1||_K² and ||f_2||_K² can be written as

||f_1||_K² = ||w_1||² + b_1²,  ||f_2||_K² = ||w_2||² + b_2²,

and the manifold regularization terms ||f_1||_I² and ||f_2||_I² are defined by

||f_1||_I² = (M w_1 + e b_1)^T L (M w_1 + e b_1),  ||f_2||_I² = (M w_2 + e b_2)^T L (M w_2 + e b_2).

In accordance with (2), the linear IFLap-TSVM can be written as

min_{w_1, b_1, ξ} (1/2)||A w_1 + e_1 b_1||² + c_1 S_2^T ξ + (c_2/2)(||w_1||² + b_1²) + (c_3/2)(M w_1 + e b_1)^T L (M w_1 + e b_1)
s.t. −(B w_1 + e_2 b_1) + ξ ≥ e_2, ξ ≥ 0,   (28)

and

min_{w_2, b_2, η} (1/2)||B w_2 + e_2 b_2||² + c_4 S_1^T η + (c_5/2)(||w_2||² + b_2²) + (c_6/2)(M w_2 + e b_2)^T L (M w_2 + e b_2)
s.t. (A w_2 + e_1 b_2) + η ≥ e_1, η ≥ 0,   (29)

where c_1, c_2, ..., c_6 are pre-specified penalty factors, ξ, η are slack variables, e_1, e_2, e are column vectors of ones of appropriate dimensions, and L is the graph Laplacian.
With the KKT conditions, combining (31) and (32) leads to

v_1 = [w_1^T b_1]^T = −(H^T H + c_2 I + c_3 J^T L J)^{-1} G^T α,   (33)

where I is an identity matrix of appropriate dimensions and, as before, H = [A e_1], G = [B e_2], J = [M e]. It can be proved that H^T H + c_2 I + c_3 J^T L J is a positive definite matrix according to matrix theory [26].

Since the multipliers are nonnegative, from (33) the Wolfe dual of problem (28) can be written as

max_α e_2^T α − (1/2) α^T G (H^T H + c_2 I + c_3 J^T L J)^{-1} G^T α, s.t. 0 ≤ α ≤ c_1 S_2.   (35)

Likewise, the dual of (29) is

max_β e_1^T β − (1/2) β^T H (G^T G + c_5 I + c_6 J^T L J)^{-1} H^T β, s.t. 0 ≤ β ≤ c_4 S_1,   (39)

with v_2 = [w_2^T b_2]^T = (G^T G + c_5 I + c_6 J^T L J)^{-1} H^T β. Once the optimal v_1*, v_2* are obtained, the two hyperplanes are known. A new input data point x can be classified as positive or negative based on the decision function

class(x) = arg min_{i=1,2} |w_i^T x + b_i| / ||w_i||,   (40)

where |·| is the absolute value. The whole procedure of linear IFLap-TSVM is described in Algorithm 1.

Algorithm 1 The linear IFLap-TSVM algorithm
Input: l_1 data points of class +1, l_2 data points of class −1, u unlabeled data points, and a test data point x.

Output:
Class label of the test data point.
Step 1 Set the values of the penalty factors c_1, c_2, c_3 and the graph kernel parameter σ.
Step 2 Compute the score functions S 1 and S 2 of data points in class +1 and -1.
Step 3 Compute the graph Laplacian L = D − W, where W is the weight matrix given by W_ij = exp(−||x_i − x_j||²/(2σ²)) and D is the diagonal matrix given by D_ii = Σ_{j=1}^{l+u} W_ij.
Step 4 Determine parameters of the two non-parallel hyperplanes using (35) and (39).
Step 5 Determine the class label of the test data point based on (40).
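Steps 3–4 hinge on assembling the matrix of the dual QPP. The following is a sketch of that assembly under our reading of the derivation; the helper name `dual_hessian` and the exact placement of the constants are ours.

```python
import numpy as np

def dual_hessian(A, B, M, L, c2, c3):
    """Assemble the matrices of the first dual QPP: with H = [A e1],
    G = [B e2], J = [M e], the dual maximizes
        e2^T a - 0.5 a^T Q a   over the score-weighted box 0 <= a <= c1*S2,
    where Q = G (H^T H + c2 I + c3 J^T L J)^{-1} G^T. Sketch only."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])
    G = np.hstack([B, np.ones((B.shape[0], 1))])
    J = np.hstack([M, np.ones((M.shape[0], 1))])
    K = H.T @ H + c2 * np.eye(H.shape[1]) + c3 * J.T @ L @ J
    # solve instead of explicitly inverting K (K is positive definite)
    Q = G @ np.linalg.solve(K, G.T)
    return Q, K
```

Since c_2 I makes K positive definite, Q is symmetric positive semidefinite and the resulting box-constrained QP is convex; any standard QP solver can then recover α and hence v_1.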

Nonlinear IFLap-TSVM
So far the discussion has been restricted to the linear case; here, we extend the linear IFLap-TSVM to the nonlinear case. We consider the following two kernel-generated hyperplanes:

K(x^T, M^T) λ_1 + b_1 = 0  and  K(x^T, M^T) λ_2 + b_2 = 0,   (41)

where M includes all the training data and K(·, ·) is a chosen kernel function. The nonlinear optimization problems (42) and (43) are obtained from (28) and (29) by replacing A, B and M with their kernelized counterparts K(A, M^T), K(B, M^T) and K(M, M^T). The Lagrangian corresponding to problem (42) yields the KKT conditions; combining (45) and (46) gives an expression analogous to the linear case. Defining H_φ = [K(A, M^T) e_1], G_φ = [K(B, M^T) e_2], J_φ = [K(M, M^T) e] and the augmented vector v_non1 = [λ_1^T b_1]^T, (48) can be rewritten as

v_non1 = −(H_φ^T H_φ + c_2 I + c_3 J_φ^T L J_φ)^{-1} G_φ^T α.   (49)

Therefore, the Wolfe dual of problem (42) can be written as

max_α e_2^T α − (1/2) α^T G_φ (H_φ^T H_φ + c_2 I + c_3 J_φ^T L J_φ)^{-1} G_φ^T α, s.t. 0 ≤ α ≤ c_1 S_2.

Likewise, the dual of (43) is

max_β e_1^T β − (1/2) β^T H_φ (G_φ^T G_φ + c_5 I + c_6 J_φ^T L J_φ)^{-1} H_φ^T β, s.t. 0 ≤ β ≤ c_4 S_1,

with

v_non2 = [λ_2^T b_2]^T = (G_φ^T G_φ + c_5 I + c_6 J_φ^T L J_φ)^{-1} H_φ^T β.   (52)

Once the optimal v_non1*, v_non2* are obtained, the two hyperplanes are known. A new input data point x can be classified as positive or negative based on the decision function

class(x) = arg min_{i=1,2} |K(x^T, M^T) λ_i + b_i| / sqrt(λ_i^T K(M, M^T) λ_i).   (53)

The whole procedure of nonlinear IFLap-TSVM is described in Algorithm 2.

Algorithm 2
The nonlinear IFLap-TSVM algorithm
Input: l_1 data points of class +1, l_2 data points of class −1, u unlabeled data points, and a test data point x.

Output:
Class label of the test data point.
Step 1 Choose a kernel function K(x, y), and set the values of the penalty factors c_1, c_2, c_3 and the kernel parameter σ.
Step 2 Compute the score functions S 1 and S 2 of data points in class +1 and -1.
Step 3 Compute the graph Laplacian L = D − W, where W is the weight matrix given by W_ij = exp(−||x_i − x_j||²/(2σ²)) and D is the diagonal matrix with D_ii = Σ_{j=1}^{l+u} W_ij.
Step 4 Determine parameters of the two non-parallel hyperplanes using (49) and (52).
Step 5 Determine the class label of the test data point based on (53).

Experiment
In this section, we investigate the effectiveness and generalization capability of the proposed method on artificial and UCI datasets, and we compare IFLap-TSVM with Lap-TSVM [20], IFTSVM [24] and TSVM [11].

Artificial Datasets
In order to verify the validity of the model, two artificial datasets are constructed to evaluate IFLap-TSVM: two lines and half-moons, each containing 200 points. For the two lines dataset we select a linear kernel; for the half-moons dataset we select an RBF kernel. We choose 10 labeled points of each class as the training set and inject different proportions of noise, i.e., 10% and 20%, into the training points. For example, at the 10% level, 10% of the training points are randomly selected and their class labels are changed to the other class.
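The label-noise injection described above can be reproduced as follows (a sketch; the random-seed handling is ours):

```python
import numpy as np

def inject_label_noise(y, ratio=0.1, seed=0):
    """Flip the class of a randomly chosen `ratio` fraction of labeled
    training points (labels in {+1, -1}), the noise model used in
    these experiments."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    n_flip = int(round(ratio * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = -y_noisy[idx]
    return y_noisy
```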

The Impact of the Parameters
In this subsection, the effects of different settings of the parameters c_1 and c_3 are analyzed using the half-moons dataset. In the first experiment, we compare the performance of IFLap-TSVM and Lap-TSVM for different values of c_1. For both classifiers, we consider the half-moons dataset with 10% noise, fix the regularization parameters c_2 = c_3 = 1 and the RBF kernel parameter σ = 0.5, and then let c_1 vary from 2^{-5} to 2^{5}. Figure 3(a) shows the accuracy rates of IFLap-TSVM and Lap-TSVM. IFLap-TSVM achieves the optimal accuracy at c_1 = 2^{2}, while Lap-TSVM obtains its best result at c_1 = 2^{0}. The value of c_1 corresponding to the best performance of IFLap-TSVM is larger than that of Lap-TSVM because the score s_i of each training sample in IFLap-TSVM is less than or equal to 1. In IFLap-TSVM, training samples are given different score values to achieve different levels of penalty: the smaller the score value, the smaller the effect of the training sample.
In the second experiment, in order to show the effectiveness of the manifold regularization, we fix c_1 = c_2 = 1 and the RBF kernel parameter σ = 0.5, and let c_3 vary from 2^{-5} to 2^{10}. It is easy to see from the results in Fig. 3(b) that the accuracy of IFLap-TSVM improves as c_3 increases. However, the value of c_3 should not be too large: when c_3 exceeds 2^{3}, the accuracy begins to decline drastically and finally drops to 50%. The reason is that when c_3 exceeds a certain limit, the manifold regularization term is penalized too heavily, loses its original function and makes the model degenerate into a supervised one.

Comparison with Other Methods
In this subsection, we compare the effectiveness of our IFLap-TSVM with Lap-TSVM, IFTSVM and TSVM on the two lines and half-moons datasets. The figures show the one-run results of each classifier at noise levels of 0%, 10% and 20%, respectively. It can be seen that as the noise increases, our IFLap-TSVM produces more accurate hyperplanes than the other models. The one-run results of each classifier on the half-moons dataset with different levels of noise are shown in Figs. 5, 6 and 7: compared with the other methods, our IFLap-TSVM is more robust to noise and its decision boundary is more accurate. In addition, for the half-moons dataset, we ran 10 experiments to further evaluate the classification results and the training time of each classifier, as shown in Table 1. The results show that as the noise level increases, the accuracy of every method decreases, but the effect of noise on the accuracy of IFLap-TSVM is the smallest. The training time of IFLap-TSVM is the longest because, compared with the supervised models, the objective functions of the dual QPPs of the semi-supervised model require two matrix inversions of size (n + 1) × (n + 1), and IFLap-TSVM has the additional step of calculating the score value of each training sample compared with Lap-TSVM. In general, compared with Lap-TSVM, IFTSVM and TSVM, IFLap-TSVM provides higher accuracy on both noiseless and noisy datasets. This is because IFLap-TSVM can use the information of unlabeled data to improve accuracy and use intuitionistic fuzzy numbers to reduce the effect of noise and outliers.

UCI Datasets
In this section, we investigate the performance of the IFLap-TSVM model on UCI datasets [31], and the results are compared with TSVM, Lap-TSVM and IFTSVM. Before training, all data are scaled so that every feature lies in [0, 1]. First, each dataset is divided into two subsets: 65% for training and 35% for testing. Then, for each dataset, we randomly select a fraction m (m = 10%, 20%, 30%) of the training data as labeled and treat the remainder as unlabeled. Table 2 shows the detailed information of the UCI datasets.
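The preprocessing protocol just described — scale to [0, 1], hold out 35% for testing, then label a fraction m of the training part — might be implemented as follows (the index bookkeeping is our assumption):

```python
import numpy as np

def prepare_uci(X, test_frac=0.35, labeled_frac=0.10, seed=0):
    """Scale each feature into [0, 1], hold out `test_frac` of the data
    for testing, then mark `labeled_frac` of the remaining training
    points as labeled and the rest as unlabeled. Sketch of the
    experimental protocol."""
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    n_lab = int(round(labeled_frac * len(train_idx)))
    labeled_idx, unlabeled_idx = train_idx[:n_lab], train_idx[n_lab:]
    return X, labeled_idx, unlabeled_idx, test_idx
```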
The classification accuracies and standard deviations of IFLap-TSVM and the other models are shown in Tables 3, 4 and 5. As the proportion of labeled data increases, the classification performance of all classifiers also increases. From the average ("mean") accuracies given in Tables 3, 4 and 5, the performance of IFLap-TSVM is better than the other methods under the same amount of labeled data. Furthermore, IFLap-TSVM and Lap-TSVM have higher classification accuracy than IFTSVM and TSVM, which shows that manifold regularization helps the classification model by using the geometric distribution information of labeled and unlabeled data. More importantly, the accuracy of IFLap-TSVM is higher than that of Lap-TSVM, indicating that the intuitionistic fuzzy functions are effective in reducing the effect of noise and outlier points. Figure 9 shows the results of experiments with an increasing amount of unlabeled data: the test accuracies of the semi-supervised classifiers gradually improve, because manifold regularization can use the geometric distribution information of labeled and unlabeled data to find a more accurate classifier. In most cases, the classification results of IFLap-TSVM are better than the other models, and the standard deviations of IFLap-TSVM and IFTSVM are smaller than those of Lap-TSVM and TSVM. Therefore, intuitionistic fuzzy numbers can effectively reduce the impact of noise and outliers on classification accuracy.

Conclusion
In this paper, we have proposed an intuitionistic fuzzy Laplacian twin support vector machine for the semi-supervised classification problem, inspired by intuitionistic fuzzy numbers and Lap-TSVM. It not only reduces the effect of noise and outliers through the membership and nonmembership functions, but also uses the geometric distribution information of labeled and unlabeled data to construct a more accurate classifier. Experimental results indicate that our IFLap-TSVM performs well on both constructed test data and several real-world datasets; compared with Lap-TSVM, IFTSVM and TSVM, IFLap-TSVM has the best performance. In the future, we will focus on improving the intuitionistic fuzzy numbers to further reduce the effect of noise and outliers on the model. Moreover, there may be noise in the unlabeled samples, and how to deal with it should also be considered. Another possible direction is to extend IFLap-TSVM to multi-class classification.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.