A fuzzy universum least squares twin support vector machine (FULSTSVM)

Abstract

Universum based twin support vector machines give prior information about the distribution of data to the classifier, which leads to better generalization performance of the model. However, in many applications the data points are not equally useful for the classification task, which motivates the use of fuzzy membership functions for the datasets. Similarly, in universum based algorithms, not all the universum data points are equally important for the classifier. To solve these problems, a novel fuzzy universum least squares twin support vector machine (FULSTSVM) is proposed in this work. In FULSTSVM, the membership values are used to provide weights for the data samples of the classes, as well as for the universum data. Further, the solution of the optimization problem of the proposed FULSTSVM is obtained by solving a system of linear equations. This leads to an efficient fuzzy based algorithm. Numerical experiments are performed on various benchmark datasets, with discussions on generalization performance and computational cost of the algorithms. The proposed FULSTSVM outperformed the existing algorithms on most datasets. The performance of the proposed and baseline algorithms is compared using statistical significance tests. To show the applicability of FULSTSVM, applications are also presented, such as detection of Alzheimer’s disease and breast cancer.

Introduction

Support vector machines (SVMs) are among the most widely used supervised learning algorithms for classification problems [5]. SVM has been applied to problems ranging from face recognition [36] and text categorization [30] to the diagnosis of diseases such as epilepsy [21, 35] or Alzheimer’s disease [28]. The objective function of SVM is convex, leading to a quadratic programming problem (QPP). Due to this convexity, the optimization problem of SVM gives a globally optimal solution. This is a benefit of SVM in comparison to techniques such as artificial neural networks (ANN), which give rise to locally optimal solutions. However, the computational complexity of solving a QPP is very high, i.e. \(O(m^3)\), where m is the number of samples. In order to reduce the overhead of solving one large QPP, Jayadeva et al. [13] proposed a twin support vector machine (TWSVM). TWSVM is based on the idea of twin hyperplanes, rather than one as in SVM. Further, Kumar and Gopal [15] proposed a least squares twin SVM (LSTSVM) by minimizing the squared errors of the data points. The computation time of LSTSVM is much lower than that of TWSVM. However, the LSTSVM classifier is more sensitive to outliers, due to the quadratic loss function. In an evaluation of 187 classifiers, robust energy-based least squares twin support vector machine (RELSTSVM) [27] turned out to be one of the better performing classifiers. This can be attributed to the energy values used for removing the effect of noise.

Weston et al. [32] proposed the idea of universum data to improve the generalization performance of the SVM classifier. Universum is introduced in SVM based on the principle of maximal contradiction. In the formulation of SVM, there is no information about the data distribution. Therefore, by introducing the universum data, prior information is incorporated in the optimization problem of SVM. This prior information takes the form of a Bayesian prior for SVM [32]. Unlike the data points belonging to the classes, the universum data are unlabelled, and lie in between the binary classes. The universum based SVM algorithm (USVM) includes the universum points in an \(\epsilon\)-insensitive tube between the binary classes. This algorithm is further improved in [18] by proposing a universum based twin support vector machine (UTSVM). The UTSVM algorithm computes the hyperplanes faster than USVM. To reduce the computation cost of UTSVM, Xu et al. [33] proposed a universum least squares twin support vector machine (ULSTSVM) by using quadratic loss for universum points. However, in many applications, the generated universum is not well suited to the classification problem [21], and universum generation remains an active area of research. Cherkassky et al. [6] discussed the conditions for the effectiveness of universum data. Like other forms of data, the universum data also suffer from the problem of noise, and therefore proper weights must be given to the data points of the classes and of the universum.

In many applications, the data are generated with noise in the features or labels. One way to solve this is to use fuzzy memberships for the different data points. Lin and Wang [16] proposed a fuzzy SVM using distance based fuzzy functions. Fuzzy SVM has also been used for multi-class problems [30], using a one-against-all (OAA) approach. This approach of fuzzy memberships is extended to class imbalance problems by Batuwita and Palade [4] using different fuzzy membership functions. Recently, an efficient fuzzy based approach is proposed as robust least squares twin support vector machine for class imbalance learning (RFLSTSVM-CIL) [22]. In the RFLSTSVM-CIL algorithm, a fuzzy membership function is also proposed to remove the effect of noisy data in class imbalance scenarios. A bilateral-weighted fuzzy SVM classifier is proposed [3] by assigning fuzzy membership values to the data points for both the classes. For detection of breast cancer, hepatitis, and diabetes, a weighted least squares twin SVM (WLSTSVM) [29] is proposed using hybrid feature selection. Moreover, a weighted least squares SVM is used with manifold regularization for nonlinear systems [19].

For universum based algorithms, an entropy based fuzzy membership is used in [23] to propose a fuzzy universum SVM (FUSVM) and fuzzy universum twin SVM (FUTSVM). In FUSVM and FUTSVM, the universum data is given weightage on the basis of fuzzy membership values. Fuzzy based twin SVMs have also been used in financial risk predictions. A kernel based fuzzy twin SVM algorithm [12] is proposed for estimating financial risks. A novel twin SVM algorithm [11] is proposed using fuzzy hyperplane for stock price prediction using financial news articles. Moreover, in the case of incremental and decremental learning, a fuzzy twin bounded support vector machine is proposed [17]. Even for regression problems, fuzzy based SVM models have been proposed, such as for financial time series forecasting [14].

In this work, to show the applicability of the proposed algorithm, we have included two applications: one is a neurological disease, i.e. Alzheimer’s disease, and the other is breast cancer. Alzheimer’s disease is a neuro-degenerative disease, which is detected through magnetic resonance imaging (MRI). Usually, the patients affected are of age 60 years and above [9]. Breast cancer accounts for the majority of cancers in women [34]. Machine learning techniques can improve the identification of the cancer. Therefore, we used histopathological images to classify breast cancer using the proposed approach.

Inspired by the approach of RFLSTSVM-CIL and universum learning, in this work we present a novel fuzzy based universum least squares twin support vector machine (FULSTSVM). The main contributions of this work are as follows:

  • A novel fuzzy based least squares twin SVM algorithm is presented using universum data.

  • To remove the impact of outliers, fuzzy memberships are calculated for the data points of the classes.

  • Universum data are utilized to give prior information about data distribution.

  • To give proper information about data distribution, fuzzy membership is assigned to the universum data points.

  • The proposed fuzzy based approach is incorporated with a least squares model, leading to a system of linear equations for generating the classifier.

  • Experiments are shown for benchmark datasets, with comparative analysis of the proposed and baseline algorithms.

  • Applications are shown on Alzheimer’s disease and breast cancer datasets.

We use the following mathematical notations in this work: A vector represented as w is a column vector. The matrices \(D_1\) and \(D_2\) contain the data points of the binary classes having size \(m_1\times n\) and \(m_2\times n\), respectively, where n is the number of features in each sample. The total number of data points is \(m=m_1+m_2\). The universum data matrix is denoted by U having dimension \(m_3\times n\). The transpose of a vector is denoted as \(w^t\), the 2-norm is represented by \(\Vert w\Vert\), and e is a vector of ones of appropriate dimension.

Related work

This section presents the formulations of two related algorithms for this work in brief. The algorithms are twin support vector machine (TWSVM), and universum least squares twin support vector machine (ULSTSVM).

TWSVM

The twin hyperplanes of nonlinear TWSVM algorithm are generated by solving the following optimization problems,

$$\begin{aligned}&\min _{w_1,\,b_1,\,\xi _1} \,\frac{1}{2}||K(D_1,G^t)w_1+e_1b_1||^2+{c_1}e_2^t\xi _1 \nonumber \\&\quad s.t. -(K(D_2,G^t)w_1+e_2b_1)+\xi _1\ge e_2, \xi _1\ge 0, \end{aligned}$$
(1)
$$\begin{aligned}&\quad \min _{w_2,\,b_2,\,\xi _2} \,\frac{1}{2}||K(D_2,G^t)w_2+e_2b_2||^2+{c_2}e_1^t\xi _2 \nonumber \\&\quad s.t. \quad K(D_1,G^t)w_2+e_1b_2+\xi _2\ge e_1, \xi _2\ge 0, \end{aligned}$$
(2)

where \(K(D_1,G^t)\) is the kernel matrix, \(G=[D_1;D_2]\), \(\xi _i\) is slack variable, \(c_i\) is penalty parameter, and \(e_i\) is vector of ones of suitable dimension, \(i=1,2\).

The classifiers are generated by calculating the parameters w and b by solving Wolfe duals [13] of Eqs. (1) and (2), written as

$$\begin{aligned} & \max _{\alpha _1}\,e_2^t\alpha _1-\frac{1}{2}\alpha _1^tN(M^tM)^{-1}N^t\alpha _1\nonumber \\& s.t.\quad0\le \alpha _1 \le c_1 \end{aligned}$$
(3)

and

$$\begin{aligned} \max _{\alpha _2}&\,e_1^t\alpha _2-\frac{1}{2}\alpha _2^tM(N^tN)^{-1}M^t\alpha _2\nonumber \\&\quad s.t. 0\le \alpha _2 \le c_2, \end{aligned}$$
(4)

where \(M=[K(D_1,\,G^t)\,\,e_1]\,\,\text{ and }\,\, N=[K(D_2,\,G^t)\,\,e_2]\); \(\alpha _1\) and \(\alpha _2\) are the vectors of Lagrange multipliers. The classifying hyperplanes \(K(x^t,G^t)w_1+b_1=0\) and \(K(x^t,G^t)w_2+b_2=0\), where x is a data point, are generated using the parameters \(w_i(i=1,2)\) and \(b_i(i=1,2)\) from the following equations,

$$\begin{aligned} \begin{bmatrix} w_1\\ b_1 \end{bmatrix} =-(M^tM+\delta I)^{-1}N^t\alpha _1, \end{aligned}$$
(5)
$$\begin{aligned} \begin{bmatrix} w_2\\ b_2 \end{bmatrix} =(N^tN+\delta I)^{-1}M^t\alpha _2, \end{aligned}$$
(6)

where \(\delta >0\) is a small positive value for avoiding ill-conditioning of the matrices \(M^tM\) and \(N^tN\), and I is an identity matrix of suitable size.

Using the following decision function, the class is assigned to a new data point x.

$$\begin{aligned} class\;(x)=\,{{\rm{arg\,min}\,}}_{i=1,2}|K(x,G^t)w_i+e_ib_i|. \end{aligned}$$
(7)

ULSTSVM

The optimization problems of nonlinear ULSTSVM [33] are described as

$$\begin{aligned}&\min _{w_1,b_1,\xi _1,\psi _1} \frac{1}{2}\Vert K(D_1,G^t) w_1+e_1b_1\Vert ^2+\frac{c_1}{2}\Vert \xi _1\Vert ^2\nonumber \\&\quad +\frac{c_3}{2}(\Vert w_1\Vert ^2+b_1^2)+\frac{c_u}{2}\Vert \psi _1\Vert ^2 \nonumber \\&\quad s.t.-(K(D_2,G^t)w_1+e_2b_1)+\xi _1=e_2,\nonumber \\&\quad K(U,G^t)w_1+e_ub_1+\psi _1=(-1+\epsilon )e_u, \end{aligned}$$
(8)
$$\begin{aligned}&\quad \min _{w_2,b_2,\xi _2,\psi _2} \frac{1}{2}\Vert K(D_2,G^t) w_2+e_2b_2\Vert ^2\nonumber \\&\quad +\frac{c_2}{2}\Vert \xi _2\Vert ^2+\frac{c_4}{2}(\Vert w_2\Vert ^2+b_2^2)+\frac{c_u}{2}\Vert \psi _2\Vert ^2 \nonumber \\&\quad s.t. K(D_1,G^t)w_2+e_1b_2+\xi _2=e_1,\nonumber \\&\quad -(K(U,G^t)w_2+e_ub_2)+\psi _2=(-1+\epsilon )e_u, \end{aligned}$$
(9)

where \(\xi _i, \psi _i,\) are the slack variables, \(c_i, c_u\), \(i=1,2\) are positive penalty parameters, and \(c_i\), \(i=3,4\) are positive parameters for the regularization.

Rewriting the optimization problem as an unconstrained problem using values of error variables,

$$\begin{aligned}&\quad \min _{w_1,b_1} \frac{1}{2}\Vert K(D_1,G^t) w_1+e_1b_1\Vert ^2\nonumber \\&\quad +\frac{c_1}{2}\Vert K(D_2,G^t)w_1+e_2b_1+e_2\Vert ^2\nonumber \\&\quad +\frac{c_3}{2}(\Vert w_1\Vert ^2+b_1^2)\nonumber \\&\quad +\frac{c_u}{2}\Vert -(K(U,G^t)w_1+e_ub_1)+(-1+\epsilon )e_u\Vert ^2, \end{aligned}$$
(10)
$$\begin{aligned}&\quad \min _{w_2,b_2} \frac{1}{2}\Vert K(D_2,G^t) w_2+e_2b_2\Vert ^2\nonumber \\&\quad +\frac{c_2}{2}\Vert -(K(D_1,G^t)w_2+e_1b_2)+e_1\Vert ^2\nonumber \\&\quad +\frac{c_4}{2}(\Vert w_2\Vert ^2+b_2^2)\nonumber \\&\quad +\frac{c_u}{2}\Vert (K(U,G^t)w_2+e_ub_2)+(-1+\epsilon )e_u\Vert ^2. \end{aligned}$$
(11)

Setting the gradient of Eq. (10) w.r.t. \(w_1\) and \(b_1\) equal to 0, we get

$$\begin{aligned}&K\big (D_1,G^t\big )^t\big (K(D_1,G^t)w_1+e_1b_1\big )\nonumber \\&\quad +c_1K\big (D_2,G^t\big )^t\big (K(D_2,G^t)w_1+e_2b_1+e_2\big )\nonumber \\&\quad +c_3w_1+c_uK(U,G^t)^t\big (K(U,G^t)w_1\nonumber \\&\quad +e_ub_1-(-1+\epsilon )e_u\big )=0, \end{aligned}$$
(12)
$$\begin{aligned}&\quad e_1^t\big (K(D_1,G^t)w_1+e_1b_1\big )\nonumber \\&\quad +c_1e_2^t\big (K(D_2,G^t)w_1+e_2b_1+e_2\big ) \nonumber \\&\quad +c_3b_1+c_ue_u^t\big (K(U,G^t)w_1\nonumber \\&\quad +e_ub_1-(-1+\epsilon )e_u\big )=0. \end{aligned}$$
(13)

Rewriting Eqs. (12) and (13) and solving, we get

$$\begin{aligned}{}[w_1\;\;b_1]^t=&-\big (M^tM+c_1N^tN+c_3I+c_uO^tO\big )^{-1}\nonumber \\&\quad \big (c_1N^te_2+c_u(1-\epsilon )O^te_u\big ), \end{aligned}$$
(14)

where \(M=[K(D_1,G^t)\;\;e_1]\), \(N=[K(D_2,G^t)\;\;e_2]\), and \(O=[K(U,G^t)\;\;e_u]\). In a similar manner, by performing the same procedure on Eq. (11), we get

$$\begin{aligned}{}[w_2\;\;b_2]^t=&\big (N^tN+c_2M^tM+c_4I+c_uO^tO\big )^{-1}\nonumber \\&\quad \big (c_2M^te_1+c_u(1-\epsilon )O^te_u\big ). \end{aligned}$$
(15)

The class of a new data point is assigned based on the proximal hyperplane [33].
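As a rough illustration of Eqs. (14) and (15), the following NumPy sketch builds the augmented kernel matrices M, N and O and solves the two linear systems. The function names (`rbf_kernel`, `ulstsvm_fit`) and all default parameter values are hypothetical choices made for this example; they are not taken from the original ULSTSVM implementation.

```python
import numpy as np

def rbf_kernel(P, Q, mu=1.0):
    """RBF kernel matrix between the rows of P and the rows of Q."""
    sq = (P ** 2).sum(axis=1)[:, None] + (Q ** 2).sum(axis=1)[None, :] - 2.0 * P @ Q.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * mu ** 2))

def ulstsvm_fit(D1, D2, U, c1=1.0, c2=1.0, c3=1e-3, c4=1e-3, cu=1.0,
                eps=0.5, mu=1.0):
    """Solve Eqs. (14) and (15) for u1 = [w1; b1] and u2 = [w2; b2].

    Default values are illustrative assumptions, not tuned settings.
    """
    G = np.vstack([D1, D2])
    # Augment each kernel block with a column of ones: [K(X, G^t)  e].
    aug = lambda X: np.hstack([rbf_kernel(X, G, mu), np.ones((len(X), 1))])
    M, N, O = aug(D1), aug(D2), aug(U)
    I = np.eye(M.shape[1])
    e1, e2, eu = np.ones(len(D1)), np.ones(len(D2)), np.ones(len(U))
    # Eq. (14): note the leading minus sign.
    u1 = -np.linalg.solve(M.T @ M + c1 * N.T @ N + c3 * I + cu * O.T @ O,
                          c1 * N.T @ e2 + cu * (1 - eps) * O.T @ eu)
    # Eq. (15)
    u2 = np.linalg.solve(N.T @ N + c2 * M.T @ M + c4 * I + cu * O.T @ O,
                         c2 * M.T @ e1 + cu * (1 - eps) * O.T @ eu)
    return G, u1, u2
```

The sketch calls `np.linalg.solve` instead of forming the inverse explicitly; this is a numerical convenience and gives the same solution as Eqs. (14) and (15).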

Proposed fuzzy universum least squares twin support vector machine (FULSTSVM)

This section presents the formulation of the proposed FULSTSVM algorithm in the linear and nonlinear form, with the fuzzy membership function. The proposed algorithm is motivated by the approach used in RFLSTSVM-CIL for removing the effect of outliers. In proposed FULSTSVM, the fuzzy memberships are calculated for the data samples belonging to the classes, as well as to the universum using fuzzy membership matrices as described below.

Linear FULSTSVM

The formulation of the proposed FULSTSVM for the linear case is described using optimization problems (16) and (17). In the objective function of the primal problem (16), we use three diagonal matrices, \(S_1\), \(S_2\) and \(S_u\), where \(S_i\) contains the fuzzy memberships of the data points of the \(i^{th}\) class and \(S_u\) contains those of the universum data points. The memberships of the data points are calculated on the basis of distance from their respective class centres.

We also add regularization in the objective function to include the structural risk minimization (SRM) principle. The constraints are similar to the ULSTSVM formulation described in the previous subsection. Figure 1 shows a pictorial representation of the proposed approach.

Fig. 1 Universum data with noise

$$\begin{aligned}&\min _{w_1,b_1,\xi _1,\psi _1}\frac{1}{2}\Vert S_1(D_1 w_1+e_1b_1)\Vert ^2\nonumber \\&\quad +\frac{c_1}{2}\Vert S_2\xi _1\Vert ^2+\frac{c_3}{2}(\Vert w_1\Vert ^2+b_1^2)+\frac{c_u}{2}\Vert S_u\psi _1\Vert ^2 \nonumber \\&\quad s.t.\, -(D_2w_1+e_2b_1)+\xi _1=e_2,\nonumber \\&\quad Uw_1+e_ub_1+\psi _1=(-1+\epsilon )e_u, \end{aligned}$$
(16)
$$\begin{aligned}&\quad \min _{w_2,b_2,\xi _2,\psi _2} \frac{1}{2}\Vert S_2(D_2w_2+e_2b_2)\Vert ^2+\frac{c_2}{2}\Vert S_1\xi _2\Vert ^2\nonumber \\&\quad +\frac{c_4}{2}(\Vert w_2\Vert ^2+b_2^2)+\frac{c_u}{2}\Vert S_u\psi _2\Vert ^2 \nonumber \\&\quad s.t.\, D_1w_2+e_1b_2+\xi _2=e_1,\nonumber \\&-(Uw_2+e_ub_2)+\psi _2=(-1+\epsilon )e_u, \end{aligned}$$
(17)

where \(S_i, S_u\) are diagonal matrices containing fuzzy membership values of data samples belonging to the classes and universum, respectively. \(\xi _i, \psi _i,\) are the slack variables, and \(c_i, c_u\) are positive penalty parameters, \(i=1,2\). The parameter for the insensitive zone is \(\epsilon\), while \(c_i\), \(i=3,4\) are the parameters for regularization.

Rewriting the objective functions using the values of the error variables,

$$\begin{aligned}&\min _{w_1,b_1}\frac{1}{2}\Vert S_1(D_1w_1+e_1b_1)\Vert ^2\nonumber \\&\quad +\frac{c_1}{2}\Vert S_2(D_2w_1+e_2b_1+e_2)\Vert ^2\nonumber \\&\quad +\frac{c_3}{2}(\Vert w_1\Vert ^2+b_1^2)+\frac{c_u}{2}\Vert S_u(-(Uw_1+e_ub_1)+(-1+\epsilon )e_u)\Vert ^2, \end{aligned}$$
(18)
$$\begin{aligned}&\quad \min _{w_2,b_2}\frac{1}{2}\Vert S_2(D_2w_2+e_2b_2)\Vert ^2\nonumber \\&\quad +\frac{c_2}{2}\Vert S_1(-(D_1w_2+e_1b_2)+e_1)\Vert ^2 \nonumber \\&\quad +\frac{c_4}{2}(\Vert w_2\Vert ^2+b_2^2)+\frac{c_u}{2}\Vert S_u(Uw_2+e_ub_2+(-1+\epsilon )e_u)\Vert ^2. \end{aligned}$$
(19)

By setting the gradient of problem (18) w.r.t. \(w_1\) and \(b_1\) equal to 0, we get

$$\begin{aligned}&c_3w_1+(S_1D_1)^t(S_1(D_1w_1+e_1b_1))\nonumber \\&\quad +c_1(S_2D_2)^t(S_2(D_2w_1+e_2b_1+e_2))\nonumber \\&\quad -c_u(S_uU)^t(S_u(-(Uw_1+e_ub_1)+(-1+\epsilon )e_u))=0, \end{aligned}$$
(20)
$$\begin{aligned}&\quad c_3b_1+(S_1e_1)^t(S_1(D_1w_1+e_1b_1))\nonumber \\&\quad +c_1(S_2e_2)^t(S_2(D_2w_1+e_2b_1+e_2))\nonumber \\&\quad -c_u(S_ue_u)^t(S_u(-(Uw_1+e_ub_1)+(-1+\epsilon )e_u))=0. \end{aligned}$$
(21)

Rewriting Eqs. (20) and (21) with \(u_1=[w_1\;\;b_1]^t\) and combining, we get

$$\begin{aligned}&c_3u_1+V^tVu_1+c_1W^tWu_1+c_1W^tS_2e_2\nonumber \\&\quad +c_uZ^tZu_1+c_uZ^tS_u(1-\epsilon )e_u=0 \end{aligned}$$
(22)

Rearranging the terms and solving, we get

$$\begin{aligned}&[w_1\;\;b_1]^t=-(V^tV+c_1W^tW+c_3I+c_uZ^tZ)^{-1}\nonumber \\&\quad (c_1W^tS_2e_2+c_uZ^tS_u(1-\epsilon )e_u), \end{aligned}$$
(23)

where \(V=[S_1D_1\;\;S_1e_1]\), \(W=[S_2D_2\;\;S_2e_2]\), and \(Z=[S_uU\;\;S_ue_u]\).

Similarly, using the procedure for Eq. (19) and solving, we get

$$\begin{aligned}&[w_2\;\;b_2]^t=(W^tW+c_2V^tV+c_4I+c_uZ^tZ)^{-1}\nonumber \\&\quad (c_2V^tS_1e_1+c_uZ^tS_u(1-\epsilon )e_u). \end{aligned}$$
(24)

A new data point x is classified using the following function,

$$\begin{aligned} class\;(x)=\,{{\rm{arg\,min}\,}}_{i=1,2}\frac{|x^tw_i+e_ib_i|}{\Vert w_i\Vert }. \end{aligned}$$
(25)
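The closed-form solutions in Eqs. (23) and (24) and the decision rule in Eq. (25) can be sketched in a few lines of NumPy. The function names and default parameter values below are hypothetical, chosen only for this illustration; the membership vectors `s1`, `s2` and `su` are assumed to hold the diagonals of \(S_1\), \(S_2\) and \(S_u\).

```python
import numpy as np

def fulstsvm_linear_fit(D1, D2, U, s1, s2, su,
                        c1=1.0, c2=1.0, c3=1e-3, c4=1e-3, cu=1.0, eps=0.5):
    """Closed-form solution of Eqs. (23) and (24).

    s1, s2, su are 1-D arrays with the diagonals of S_1, S_2, S_u.
    Default values are illustrative assumptions, not tuned settings.
    """
    # Scaling row i by s[i] is the same as left-multiplying by diag(s).
    aug = lambda X, s: s[:, None] * np.hstack([X, np.ones((len(X), 1))])
    V, W, Z = aug(D1, s1), aug(D2, s2), aug(U, su)
    I = np.eye(V.shape[1])
    # Eq. (23): S_2 e_2 is simply the vector s2, and S_u e_u is su.
    u1 = -np.linalg.solve(V.T @ V + c1 * W.T @ W + c3 * I + cu * Z.T @ Z,
                          c1 * W.T @ s2 + cu * (1 - eps) * Z.T @ su)
    # Eq. (24)
    u2 = np.linalg.solve(W.T @ W + c2 * V.T @ V + c4 * I + cu * Z.T @ Z,
                         c2 * V.T @ s1 + cu * (1 - eps) * Z.T @ su)
    return u1, u2

def fulstsvm_linear_predict(X, u1, u2):
    """Assign each row of X to the nearest hyperplane, Eq. (25)."""
    d = [np.abs(X @ u[:-1] + u[-1]) / np.linalg.norm(u[:-1]) for u in (u1, u2)]
    return np.argmin(np.column_stack(d), axis=1) + 1  # labels 1 or 2
```

With all memberships set to 1 the matrices V, W and Z reduce to the unweighted linear ULSTSVM matrices, which gives a simple sanity check for the sketch.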

Nonlinear FULSTSVM

The formulation of nonlinear FULSTSVM is written as

$$\begin{aligned}&\min _{w_1,b_1,\xi _1,\psi _1} \frac{1}{2}\Vert S_1(K(D_1,G^t)w_1+e_1b_1)\Vert ^2\nonumber \\&\quad +\frac{c_1}{2}\Vert S_2\xi _1\Vert ^2+\frac{c_3}{2}(\Vert w_1\Vert ^2+b_1^2)\nonumber \\&\quad +\frac{c_u}{2}\Vert S_u\psi _1\Vert ^2 \nonumber \\&\quad s.t. -(K(D_2,G^t)w_1+e_2b_1)+\xi _1=e_2,\nonumber \\&\quad K(U,G^t)w_1+e_ub_1+\psi _1=(-1+\epsilon )e_u, \end{aligned}$$
(26)
$$\begin{aligned}&\quad \min _{w_2,b_2,\xi _2,\psi _2} \frac{1}{2}\Vert S_2(K(D_2,G^t)w_2+e_2b_2)\Vert ^2\nonumber \\&\quad +\frac{c_2}{2}\Vert S_1\xi _2\Vert ^2+\frac{c_4}{2}(\Vert w_2\Vert ^2+b_2^2) \nonumber \\&\quad +\frac{c_u}{2}\Vert S_u\psi _2\Vert ^2 \end{aligned}$$
(27)
$$\begin{aligned}&\quad s.t.\, K(D_1,G^t)w_2+e_1b_2+\xi _2=e_1,\nonumber \\&\quad -(K(U,G^t)w_2+e_ub_2)+\psi _2=(-1+\epsilon )e_u, \end{aligned}$$
(28)

where \(K(D_1,G^t)\) is the kernel matrix, \(G=[D_1;D_2]\), \(S_i, S_u\) are diagonal matrices containing fuzzy membership values of data points in the classes and universum, respectively, \(i=1,2\).

Rewriting the objective functions using the constraints, we get

$$\begin{aligned}&\min _{w_1,b_1}\,\frac{1}{2}\Vert S_1(K(D_1,G^t) w_1+e_1b_1)\Vert ^2\nonumber \\&\quad +\frac{c_1}{2}\Vert S_2(K(D_2,G^t)w_1+e_2b_1+e_2)\Vert ^2\nonumber \\&\quad +\frac{c_3}{2}(\Vert w_1\Vert ^2+b_1^2)\nonumber \\&\quad +\frac{c_u}{2}\Vert S_u(-(K(U,G^t)w_1+e_ub_1)+(-1+\epsilon )e_u)\Vert ^2, \end{aligned}$$
(29)
$$\begin{aligned}&\quad \min _{w_2,b_2}\, \frac{1}{2}\Vert S_2(K(D_2,G^t) w_2+e_2b_2)\Vert ^2\nonumber \\&\quad +\frac{c_2}{2}\Vert S_1(-(K(D_1,G^t)w_2+e_1b_2)+e_1)\Vert ^2 \nonumber \\&\quad +\frac{c_4}{2}(\Vert w_2\Vert ^2+b_2^2)+\frac{c_u}{2}\Vert S_u(K(U,G^t)w_2\nonumber \\&\quad +e_ub_2+(-1+\epsilon )e_u)\Vert ^2. \end{aligned}$$
(30)

Setting the gradient of problem (29) w.r.t. \(w_1\) and \(b_1\) equal to 0 and solving, we obtain the parameters \(w_1\) and \(b_1\) as

$$\begin{aligned}{}[w_1\;\;b_1]^t = &-\big (M^tM+c_1N^tN+c_3I+c_uO^tO\big )^{-1}\nonumber \\&\quad \big (c_1N^tS_2e_2+c_uO^tS_u(1-\epsilon )e_u\big ), \end{aligned}$$
(31)

where \(M=[S_1K(D_1,G^t)\;\;S_1e_1]\), \(N=[S_2K(D_2,G^t)\;\;S_2e_2]\), and \(O=[S_uK(U,G^t)\;\;S_ue_u]\). Similarly, using Eq. (30), we get

$$\begin{aligned}{}[w_2\;\;b_2]^t=\,&\big (N^tN+c_2M^tM+c_4I+c_uO^tO\big )^{-1}\nonumber \\&\quad \big (c_2M^tS_1e_1+c_uO^tS_u(1-\epsilon )e_u\big ). \end{aligned}$$
(32)
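A minimal sketch of Eqs. (31) and (32) is given below, assuming an RBF kernel and membership vectors that hold the diagonals of \(S_1\), \(S_2\) and \(S_u\). The names `rbf_kernel` and `fulstsvm_kernel_fit` and the default parameter values are hypothetical and serve only as an illustration.

```python
import numpy as np

def rbf_kernel(P, Q, mu=1.0):
    """RBF kernel matrix between the rows of P and the rows of Q."""
    sq = (P ** 2).sum(axis=1)[:, None] + (Q ** 2).sum(axis=1)[None, :] - 2.0 * P @ Q.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * mu ** 2))

def fulstsvm_kernel_fit(D1, D2, U, s1, s2, su, c1=1.0, c2=1.0,
                        c3=1e-3, c4=1e-3, cu=1.0, eps=0.5, mu=1.0):
    """Solve Eqs. (31) and (32) for u1 = [w1; b1] and u2 = [w2; b2].

    s1, s2, su are 1-D arrays with the diagonals of S_1, S_2, S_u.
    Default values are illustrative assumptions, not tuned settings.
    """
    G = np.vstack([D1, D2])
    # Membership-weighted augmented kernel blocks: S [K(X, G^t)  e].
    aug = lambda X, s: s[:, None] * np.hstack(
        [rbf_kernel(X, G, mu), np.ones((len(X), 1))])
    M, N, O = aug(D1, s1), aug(D2, s2), aug(U, su)
    I = np.eye(M.shape[1])
    # Eq. (31)
    u1 = -np.linalg.solve(M.T @ M + c1 * N.T @ N + c3 * I + cu * O.T @ O,
                          c1 * N.T @ s2 + cu * (1 - eps) * O.T @ su)
    # Eq. (32)
    u2 = np.linalg.solve(N.T @ N + c2 * M.T @ M + c4 * I + cu * O.T @ O,
                         c2 * M.T @ s1 + cu * (1 - eps) * O.T @ su)
    return G, u1, u2
```

Each system has size \((m+1)\times (m+1)\), so the cost of the solve grows cubically with the number of training points, whatever the feature dimension n.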

For a new data point, similar to linear case, the class is assigned based on the class of the nearest hyperplane. In the following section, we present the fuzzy membership function used in the proposed FULSTSVM.

Fuzzy membership function

The proposed FULSTSVM utilizes a fuzzy function inspired by [16]. The following fuzzy function keeps the fuzzy memberships in the range (0.5, 1]. The membership function is described as

$$\begin{aligned} f(x_i)= 1-0.5\bigg (\frac{\left| x_i-c_j \right| }{r_j+\delta }\bigg ), \end{aligned}$$
(33)

where \(x_i\) is a data point belonging to class j with centre \(c_j\), \(i=1,\dots ,m_j, j=1,2\). The variable \(r_j\) is the largest distance from the class centre of data points of class j, and \(\delta\) is a very small positive value to avoid division by zero.

The range of fuzzy memberships in the above-mentioned fuzzy function is chosen as (0.5, 1]. This is to keep significant contribution of majority of the data points in the formation of the classifier. Moreover, the proposed FULSTSVM also gives fuzzy memberships to the universum data points. The contribution of most universum data points is required for providing prior information about the data, which is achieved by this function. Moreover, the contribution of outliers is reduced accordingly. This approach is in contrast to the approach proposed in FSVM [16], where the fuzzy memberships are chosen in the range (0, 1].
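The membership function of Eq. (33) can be sketched as follows. Two points are assumptions of this example rather than statements from the text: the class centre \(c_j\) is taken to be the mean of the class, and \(\left| x_i-c_j\right|\) is read as the Euclidean distance. The function name `fuzzy_membership` is hypothetical, and applying the same function to the universum matrix (with the universum mean as centre) is likewise an assumption.

```python
import numpy as np

def fuzzy_membership(X, delta=1e-6):
    """Eq. (33): memberships in (0.5, 1] from the distance to the centre.

    Assumes the centre is the mean of the rows of X and that the distance
    is Euclidean; delta is a small positive constant.
    """
    centre = X.mean(axis=0)
    dist = np.linalg.norm(X - centre, axis=1)
    radius = dist.max()  # r_j: largest distance from the centre
    return 1.0 - 0.5 * dist / (radius + delta)
```

The point farthest from the centre receives a membership just above 0.5, while a point that coincides with the centre receives exactly 1.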

Time complexity

The time complexity of TWSVM is \(2\times O((m/2)^3)\), i.e. \(O(m^3/4)\), where m is the total number of data points [13]. This is the time complexity of solving the QPPs, which is a computationally intensive task. Moreover, TWSVM involves two matrix inverses, each having a complexity of \(O(n^3)\), where n is the dimension of the matrix [2]. Similarly, UTSVM has a time complexity of \(O((m+2u)^3/4)\), where u denotes the number of universum data points.
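The constants in these expressions can be checked with a short calculation, assuming that the two classes are of equal size m/2 and that the cost of each QPP is cubic in its number of constraints. For the universum based twin SVM, each QPP is assumed here to carry m/2 class constraints and u universum constraints, which is consistent with the stated total:

$$\begin{aligned} 2\Big (\frac{m}{2}\Big )^3=\frac{m^3}{4},\qquad 2\Big (\frac{m}{2}+u\Big )^3=2\,\frac{(m+2u)^3}{8}=\frac{(m+2u)^3}{4}. \end{aligned}$$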

On the other hand, the formulation of LSTSVM involves the solution of linear equations using two matrix inverses. Therefore, the computation time of LSTSVM is lower than that of TWSVM and UTSVM in Table 1. Similarly, ULSTSVM involves two inverses with additional universum data. The time complexity of the proposed FULSTSVM is similar to that of ULSTSVM, with additional complexity for the fuzzy membership function. The complexity of the fuzzy membership calculation is O(m). Hence, the computation time of FULSTSVM is higher than that of ULSTSVM, but the additional time is O(m), which is insignificant w.r.t. the cubic complexity of the inverse calculation in FULSTSVM and ULSTSVM.

Experimental results

In this section, we perform numerical experiments, and show the comparative analysis of the results obtained on benchmark datasets. We also present two biomedical applications, viz. Alzheimer’s disease and breast cancer to show the utility of the proposed FULSTSVM.

Data

All the real-world benchmark datasets are downloaded from the UCI [8], and KEEL repositories [1]. The MRI images used in this work are taken from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI was started in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The main objective of ADNI is to find out the effectiveness of neuroimaging techniques like MRI, positron emission tomography (PET), other biological markers, and clinical neuropsychological tests to estimate the onset of Alzheimer’s disease from the state of mild cognitive impairment. For more information, visit www.adni-info.org. For breast cancer, the BreakHis histopathological dataset is utilized [26] in this work.

Setup and methodology

The experiments for all the algorithms are performed on a PC running on 64 bit Windows 10 operating system, with 2.30 GHz Intel® Xeon processor, and 128 GB of RAM with MATLAB R2017a environment. For cross validation, a 5-fold cross validation strategy is used for selecting the optimal parameters for all the algorithms. An additional optimization toolbox i.e. MOSEK optimization toolbox (http://www.mosek.com) is used for solving the QPPs of TWSVM and UTSVM.

For experiments on real-world datasets, the parameters are selected as follows: \(c_1=c_2=c_u\) and \(c_3=c_4\) are selected from \(\{10^{-5}, 10^{-4},...,10^5\}\), while \(\mu\) is selected from the set \(\{2^{-5}, 2^{-4},...,2^5\}\). The parameter \(\epsilon\) is selected from \(\{0.2, 0.4, 0.6, 0.8\}\). The universum is generated by averaging randomly selected samples from the data [18, 21]. The training and testing sets each contain 50% of the total samples. For large scale datasets, we use fixed values of the hyper-parameters [24, 25]. Therefore, the value of \(c_1=c_2=c_u\) is fixed as 10, \(c_3=c_4\) is set as \(10^{-5}\), \(\epsilon\) is selected as 0.7, and \(\mu\) is chosen as 2 for all the algorithms. In all the algorithms, a radial basis function (RBF) kernel is used, which is defined as

$$\begin{aligned} K(p,q)=\exp \bigg (-\frac{{\Vert p-q\Vert }^2}{2\mu ^2}\bigg ), \end{aligned}$$
(34)

where \(\mu\) is a scalar parameter, and p and q are vectors.
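For reference, the kernel of Eq. (34) and the search grids listed above can be written as the following sketch. The names `rbf_kernel`, `C_GRID`, `MU_GRID` and `EPS_GRID` are hypothetical; the grid values are those stated in the text.

```python
import numpy as np

def rbf_kernel(P, Q, mu=1.0):
    """Eq. (34) evaluated for every pair of rows (p, q) from P and Q."""
    # Squared distances via ||p||^2 + ||q||^2 - 2 p.q, clipped at zero
    # to guard against small negative values from rounding.
    sq = (P ** 2).sum(axis=1)[:, None] + (Q ** 2).sum(axis=1)[None, :] - 2.0 * P @ Q.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * mu ** 2))

# Hyper-parameter grids as described in the text.
C_GRID = [10.0 ** k for k in range(-5, 6)]    # c1 = c2 = cu, and c3 = c4
MU_GRID = [2.0 ** k for k in range(-5, 6)]    # kernel parameter mu
EPS_GRID = [0.2, 0.4, 0.6, 0.8]               # insensitive-zone parameter
```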

For the biomedical datasets, we used 150 structural MRI (T1) images from the ADNI database. The MRI images are preprocessed using the Freesurfer pipeline [20, 31] to obtain the volumetric analysis of the brain. This resulted in volumetric data of 149 images, because 1 image failed to process. Therefore, the dataset includes 50 images each of control normal (CN) and mild cognitive impairment (MCI), and 49 images of Alzheimer’s disease (AD). A total of 91 features are extracted, including 23 subcortical volumes, 34 white matter volumes, and 34 cortical thickness values.

For breast cancer, the images are converted to gray level, and features are extracted using wavelet transform (Daubechies-4) up to 3 levels of decomposition. The approximation and detail coefficients are concatenated to form the feature vector [10]. A total of 314 breast cancer images include ADN- adenosis (benign), and DC- ductal carcinoma (cancer).
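The universum construction mentioned in the parameter settings of this section (averaging randomly selected samples) could be sketched as below. The text does not specify how the samples are paired, so pairing one random sample from each class is an assumption of this example, and `make_universum` is a hypothetical name.

```python
import numpy as np

def make_universum(D1, D2, n_universum, seed=0):
    """Average randomly paired samples, one from each class.

    The cross-class pairing is an assumption; the source only states that
    the universum is obtained by averaging randomly chosen samples.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(D1), size=n_universum)  # indices into class 1
    j = rng.integers(0, len(D2), size=n_universum)  # indices into class 2
    return 0.5 * (D1[i] + D2[j])
```

Under this pairing the generated points tend to fall between the two classes, in line with the role of universum data described in the introduction.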

Real-world data

The results on 18 real-world benchmark datasets are presented in Table 1. For comparison, we have used TWSVM [13], UTSVM [18], LSTSVM [15], and ULSTSVM [33] algorithms. One can observe in Table 1 that the proposed FULSTSVM obtained the highest accuracy on 11 datasets. FULSTSVM outperformed the existing algorithms by obtaining an average rank of 1.8056 on accuracy values. This can be attributed to the use of fuzzy memberships for all the data points in the proposed FULSTSVM. Notably, the proposed FULSTSVM achieved the highest accuracy of \(98.54\%\) on the Breast Cancer Wisconsin dataset, together with LSTSVM. However, ULSTSVM achieved a lower accuracy of \(98.25\%\), due to the equal weighting of all the universum data points in ULSTSVM.

One can observe in Table 1 that the training time of the proposed FULSTSVM is lower than that of the TWSVM and UTSVM algorithms. This is because TWSVM and UTSVM solve a pair of QPPs, which is computationally expensive. On the other hand, the proposed FULSTSVM solves a system of linear equations. However, the training time of FULSTSVM is higher than that of LSTSVM and ULSTSVM because of the fuzzy membership calculations.

Table 1 Comparative performance of proposed algorithm with existing approaches for classification on real-world benchmark datasets

Statistical significance

In order to assess the statistical significance of the proposed FULSTSVM for generalization performance, we perform the Friedman test and the Nemenyi post hoc test [7].

The Friedman test is performed using the average ranks of the algorithms from Table 1. Here, we first assume that all the algorithms are not significantly different, as the null hypothesis. Then, we calculate the \(\chi ^2_F\) value as

$$\begin{aligned} \chi ^2_F=\frac{12N}{p(p+1)}\Bigg [\sum ^{p}_{i=1}R^2_i-\frac{p(p+1)^2}{4}\Bigg ], \end{aligned}$$
(35)

where N is number of datasets, p is the number of algorithms, and \(R_i\) is average rank for the methods.

$$\begin{aligned} \chi ^2_F&=\frac{12\times 18}{5(5+1)}\nonumber \\&\quad \Bigg [(3.5556^2+3.2222^2+3.3056^2+3.1111^2+1.8056^2)-\frac{5(5+1)^2}{4}\Bigg ]\nonumber \\&\quad \approx 13.6151. \end{aligned}$$
(36)

The \(F_F\) value is obtained as

$$\begin{aligned} F_F=\frac{(18-1)(13.6151)}{18\times (5-1)-13.6151}\approx 3.9643. \end{aligned}$$
(37)

In this case, the F-distribution has \(\big (5-1, (5-1)(18-1)\big )=(4,68)\) degrees of freedom. Therefore, the critical value for F(4, 68) at \(\alpha =0.05\) level of significance is 2.5066. Since \(F_F=3.9643>2.5066\), we reject the null hypothesis. Thus, there is a significant difference between these methods.
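The arithmetic of Eqs. (35) to (37) can be reproduced with a few lines of plain Python. The function name `friedman_statistics` is hypothetical; the average ranks and the critical value 2.5066 are copied from the text, not recomputed from Table 1 or from an F-distribution routine.

```python
def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi2_F, F_F) following Eqs. (35) and (37)."""
    p = len(avg_ranks)
    chi2 = (12.0 * n_datasets / (p * (p + 1))
            * (sum(r * r for r in avg_ranks) - p * (p + 1) ** 2 / 4.0))
    f_f = (n_datasets - 1) * chi2 / (n_datasets * (p - 1) - chi2)
    return chi2, f_f

# Average ranks used in Eq. (36); 18 datasets, 5 algorithms.
ranks = [3.5556, 3.2222, 3.3056, 3.1111, 1.8056]
chi2, f_f = friedman_statistics(ranks, n_datasets=18)

F_CRITICAL = 2.5066  # critical value quoted in the text for alpha = 0.05
reject_null = f_f > F_CRITICAL
```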

Next, we use the Nemenyi post hoc test [7] to check the pairwise difference between the proposed FULSTSVM and the existing algorithms. The critical difference (CD) for our case at the \(\alpha =0.10\) level of significance is \(2.459\sqrt{\frac{5(5+1)}{6\times 18}}\approx 1.296\). The pairwise difference of the average ranks should be greater than the CD for significance. Table 2 shows the pairwise significant difference between the methods based on average ranks. One can observe that the proposed FULSTSVM is significantly different from the TWSVM, UTSVM, LSTSVM, and ULSTSVM algorithms.
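The critical difference and the pairwise comparisons can be checked with the short sketch below. The function name `nemenyi_cd` is hypothetical, and the value 2.459 is the constant quoted in the text. The baseline ranks are listed in the order in which they appear in Eq. (36), without assigning them to specific algorithms.

```python
import math

def nemenyi_cd(q_alpha, n_algorithms, n_datasets):
    """Critical difference of the Nemenyi post hoc test."""
    return q_alpha * math.sqrt(
        n_algorithms * (n_algorithms + 1) / (6.0 * n_datasets))

cd = nemenyi_cd(2.459, n_algorithms=5, n_datasets=18)

proposed_rank = 1.8056                            # FULSTSVM
baseline_ranks = [3.5556, 3.2222, 3.3056, 3.1111]  # remaining terms of Eq. (36)

# A baseline differs significantly if its rank gap exceeds the CD.
significant = [(r - proposed_rank) > cd for r in baseline_ranks]
```

The smallest gap, \(3.1111-1.8056=1.3055\), is only slightly above the critical difference.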

Table 2 Significant difference between the proposed FULSTSVM and existing algorithms in pairwise comparison

Insensitivity analysis

To check the effect of hyper-parameter values on the accuracy of the proposed FULSTSVM, we present an insensitivity analysis. The insensitivity performance is checked for the penalty parameter \(c_1\) with the kernel parameter \(\mu\), and for \(c_1\) with the parameter \(\epsilon\) of the insensitive zone. The variation of accuracy of FULSTSVM for these parameters is shown for four datasets in Fig. 2.

Figure 2a, b shows the change in accuracy for different values of \(c_1\) and \(\mu\). One can observe that the accuracy of the proposed FULSTSVM is higher for lower values of \(c_1\) and higher values of \(\mu\). The variation in accuracy for \(c_1\) with \(\epsilon\) is shown in Fig. 2c, d. The parameter \(\epsilon\) does not affect the accuracy in a significant manner. However, here also the accuracy of FULSTSVM is higher for lower values of the hyper-parameter \(c_1\). This also justifies the parameter selection in the experiments.

Fig. 2 Insensitivity analysis of proposed FULSTSVM for \(c_1\) and \(\mu\) in (a) and (b), and for \(c_1\) and \(\epsilon\) in (c) and (d) on real-world benchmark datasets

Biomedical data

In this section, we present the results on classification of Alzheimer’s disease and breast cancer datasets. The results for these applications are shown in Table 3. One can observe that the proposed FULSTSVM performs better than the baseline algorithms in most of the cases. This is reflected in the average rank based on accuracy: the proposed FULSTSVM obtained the lowest average rank of 2.5. The accuracy of FULSTSVM is higher than that of the other algorithms for AD_MCI, which is a difficult classification problem [28].

Moreover, for breast cancer data, i.e. adenosis vs ductal carcinoma, the proposed FULSTSVM obtains the highest accuracy of \(84.18\%\). The better average rank of the proposed FULSTSVM for accuracy can be attributed to the use of fuzzy membership with universum data. It provides prior information to the model, with less sensitivity to outlier data points in the classes, as well as in the universum. This indicates the applicability of the proposed FULSTSVM to biomedical applications.

Table 3 Comparative performance of the proposed and baseline algorithms on Alzheimer’s and breast cancer datasets

Large-scale data

To check the performance of the proposed FULSTSVM on large datasets, we used the Skin segmentation dataset from the UCI repository [8]. For comparison, we used two other efficient algorithms, viz. LSTSVM and ULSTSVM. The results are shown in Table 4. The proposed FULSTSVM shows higher accuracy on most of the datasets. This can be attributed to the fact that FULSTSVM reduces the effect of outliers in the generated universum data, whereas ULSTSVM gives equal importance to all the universum data points. Moreover, the proposed FULSTSVM also weights the data points of the binary classes according to their membership, whereas LSTSVM gives equal weights to all the data points. However, the time is lowest for LSTSVM, because it uses no universum data. The time is slightly higher for FULSTSVM than for ULSTSVM due to the calculation of fuzzy membership values. This additional time is not significant, as discussed in Sect. 3.4 in terms of time complexity.
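The weighting idea can be illustrated with a small example. The membership function used by FULSTSVM is defined earlier in the paper; the sketch below instead uses the classical centroid-distance membership of fuzzy SVM [16] as a stand-in, so that points far from their class centroid receive weights close to zero. The function name, the `delta` value, and the sample points are assumptions made for illustration.

```python
import math


def centroid_membership(points, delta=1e-4):
    """Centroid-distance membership in (0, 1] for the points of one class.

    s_i = 1 - d_i / (r + delta), where d_i is the distance of point i to the
    class centroid, r is the largest such distance, and delta > 0 keeps every
    membership strictly positive.
    """
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(dim)]
    dists = [math.dist(p, centroid) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


# Illustrative class: four clustered points and one outlier at (10, 10).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
weights = centroid_membership(samples)
for point, weight in zip(samples, weights):
    print(point, round(weight, 4))
```

Equal weighting, as in LSTSVM and ULSTSVM, corresponds to setting every membership to 1, in which case the outlier influences the fitted hyperplanes as much as any other point.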

Table 4 Classification performance of the proposed and baseline algorithms on large-scale datasets

Conclusions and future work

In this work, to deal with noisy datasets, we proposed a novel and efficient fuzzy based learning algorithm, termed fuzzy universum least squares twin support vector machine (FULSTSVM). The proposed algorithm gives prior information about the data distribution to the classifier, and also assigns fuzzy membership values to the data points and the universum. Moreover, the optimization problem of FULSTSVM is solved via a system of linear equations, which makes FULSTSVM efficient in terms of training time. The proposed FULSTSVM is a robust universum based algorithm for classification of data with outliers. Statistical tests on the experimental results confirm the significance of the proposed algorithm. The proposed FULSTSVM also performed better in terms of accuracy on large-scale datasets, showing its scalability.

Results on the applications, i.e. Alzheimer’s disease and breast cancer, clearly show the applicability of the proposed FULSTSVM to healthcare data. In future, the proposed FULSTSVM can be improved by implementing new techniques for selecting the universum. The universum data can be selected from a dataset related to a particular application. Moreover, novel fuzzy membership functions can be used with the proposed FULSTSVM in various applications. The proposed FULSTSVM can also be extended to class imbalanced and multiclass problems. The code of the proposed FULSTSVM will be available on the author’s homepage: https://github.com/mtanveer1/.

References

  1. Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 17:255–287
  2. Bai L, Shao YH, Wang Z, Li CN (2019) Clustering by twin support vector machine and least square twin support vector classifier with uniform output coding. Knowl-Based Syst 163:227–240
  3. Balasundaram S, Tanveer M (2012) On proximal bilateral-weighted fuzzy support vector machine classifiers. Int J Adv Intell Paradigms 4(3–4):199–210
  4. Batuwita R, Palade V (2010) Fsvm-cil: fuzzy support vector machines for class imbalance learning. IEEE Trans Fuzzy Syst 18(3):558–571
  5. Cervantes J, Garcia-Lamont F, Rodriguez-Mazahua L, Lopez A (2020) A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 408:189–215
  6. Cherkassky V, Dhar S, Dai W (2011) Practical conditions for effectiveness of the universum learning. IEEE Trans Neural Netw 22(8):1241–1255
  7. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
  8. Dua D, Graff C (2014) UCI machine learning repository. http://archive.ics.uci.edu/ml
  9. Frozza RL, Lourenco MV, De Felice FG (2018) Challenges for Alzheimer’s disease therapy: insights from novel mechanisms beyond memory defects. Front Neurosci 12:37
  10. Gautam C, Mishra PK, Tiwari A, Richhariya B, Pandey HM, Wang S, Tanveer M (2020) ADNI: minimum variance-embedded deep kernel regularized least squares method for one-class classification and its applications to biomedical data. Neural Netw 123:191–216
  11. Hao PY, Kung CF, Chang CY, Ou JB (2020) Predicting stock price trends based on financial news articles and using a novel twin support vector machine with fuzzy hyperplane. Appl Soft Comput 98:106806
  12. Huang X, Guo F (2020) A kernel fuzzy twin SVM model for early warning systems of extreme financial risks. Int J Financ Econ. https://doi.org/10.1002/ijfe.1858
  13. Jayadeva Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
  14. Khemchandani R, Jayadeva CS (2009) Regularized least squares fuzzy support vector regression for financial time series forecasting. Expert Syst Appl 36(1):132–138
  15. Kumar MA, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
  16. Lin CF, Wang SD (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
  17. Mello AR, Stemmer MR, Koerich AL (2020) Incremental and decremental fuzzy bounded twin support vector machine. Inf Sci 526:20–38
  18. Qi Z, Tian Y, Shi Y (2012) Twin support vector machine with universum data. Neural Netw 36:112–119
  19. Qin G, Lu X (2018) Integration of weighted LS-SVM and manifold learning for fuzzy modeling. Neurocomputing 282:184–191
  20. Reuter M, Schmansky NJ, Rosas HD, Fischl B (2012) Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage 61(4):1402–1418
  21. Richhariya B, Tanveer M (2018) EEG signal classification using universum support vector machine. Expert Syst Appl 106:169–182
  22. Richhariya B, Tanveer M (2018) A robust fuzzy least squares twin support vector machine for class imbalance learning. Appl Soft Comput 71:418–432
  23. Richhariya B, Tanveer M (2019) A fuzzy universum support vector machine based on information entropy. In: Tanveer M, Pachori RB (eds) Machine Intelligence and Signal Analysis. Advances in Intelligent Systems and Computing. Springer, Singapore, pp 569–582. https://doi.org/10.1007/978-981-13-0923-6_49
  24. Richhariya B, Tanveer M (2020) Alzheimer’s disease neuroimaging initiative: an efficient angle based universum least squares twin support vector machine for pattern classification. ACM Trans Internet Technol. https://doi.org/10.1145/3387131
  25. Shao YH, Deng NY, Yang ZM (2012) Least squares recursive projection twin support vector machine for classification. Pattern Recogn 45(6):2299–2307
  26. Spanhol F, Oliveira L, Petitjean C, Heutte L (2015) A dataset for breast cancer histopathological image classification. IEEE Trans Biomed Eng 63(7):1455–1462
  27. Tanveer M, Khan MA, Ho SS (2016) Robust energy-based least squares twin support vector machines. Appl Intell 45(1):174–186
  28. Tanveer M, Richhariya B, Khan RU, Rashid AH, Khanna P, Prasad M, Lin CT (2020) Machine learning techniques for the diagnosis of Alzheimer’s disease: a review. ACM Trans Multimed Comput Commun Appl 16(1):1–35
  29. Tomar D, Agarwal S (2015) Hybrid feature selection based weighted least squares twin support vector machine approach for diagnosing breast cancer, hepatitis, and diabetes. Adv Artif Neural Syst 2015:265637. https://doi.org/10.1155/2015/265637
  30. Wang TY, Chiang HM (2007) Fuzzy support vector machine for multi-class text categorization. Inf Process Manag 43(4):914–929
  31. Westman E, Muehlboeck JS, Simmons A (2012) Combining MRI and CSF measures for classification of Alzheimer’s disease and prediction of mild cognitive impairment conversion. NeuroImage 62(1):229–238
  32. Weston J, Collobert R, Sinz F, Bottou L, Vapnik V (2006) Inference with the universum. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 1009–1016
  33. Xu Y, Chen M, Li G (2016) Least squares twin support vector machine with universum data for classification. Int J Syst Sci 47(15):3637–3645
  34. Yue W, Wang Z, Chen H, Payne A, Liu X (2018) Machine learning with applications in breast cancer diagnosis and prognosis. Designs 2(2):13
  35. Zhang T, Chen W, Li M (2019) Classification of inter-ictal and ictal EEGs using multi-basis MODWPT, dimensionality reduction algorithms and LS-SVM: a comparative study. Biomed Signal Process Control 47:240–251
  36. Zhou X, Jiang W, Tian Y, Shi Y (2010) Kernel subclass convex hull sample selection method for SVM on face recognition. Neurocomputing 73(10–12):2234–2246

Acknowledgements

The funding for this work is obtained from Science and Engineering Research Board (SERB), INDIA under Ramanujan fellowship grant no. SB/S2/RJN-001/2016, and also under Early Career Research Award grant no. ECR/2017/000053. We also acknowledge Council of Scientific & Industrial Research (CSIR), New Delhi, INDIA for funding under Extra Mural Research (EMR) Scheme grant no. 22(0751)/17/EMR-II. We want to acknowledge our institute, the Indian Institute of Technology Indore for providing various facilities and resources for this work. We also thank the Indian Institute of Technology Indore for providing Institute fellowship to Mr. Bharat Richhariya. The collection of data and sharing of this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904), and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). The funding for ADNI is provided by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. 
Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. The dissemination of ADNI data is carried out by the Laboratory for Neuro Imaging at the University of Southern California.

Author information

Corresponding author

Correspondence to M. Tanveer.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

About this article

Cite this article

Richhariya, B., Tanveer, M. & for the Alzheimer’s Disease Neuroimaging Initiative. A fuzzy universum least squares twin support vector machine (FULSTSVM). Neural Comput & Applic (2021). https://doi.org/10.1007/s00521-021-05721-4

Keywords

  • Universum
  • Fuzzy membership
  • Least squares twin support vector machine
  • Outliers
  • Alzheimer’s disease