Robust Optimization and Data Classification for Characterization of Huntington Disease Onset via Duality Methods

The features that characterize the onset of Huntington disease (HD) are poorly understood yet have significant implications for research and clinical practice. Motivated by the need to address this issue, and the fact that there may be inaccuracies in clinical HD data, we apply robust optimization and duality techniques to study support vector machine (SVM) classifiers in the face of uncertainty in feature data. We present readily numerically solvable semi-definite program reformulations via conic duality for a broad class of robust SVM classification problems under a general spectrahedron uncertainty set that covers the most commonly used uncertainty sets of robust optimization models, such as boxes, balls, and ellipsoids. In the case of the box-uncertainty model, we also provide a new simple quadratic program reformulation, via Lagrangian duality, leading to a very efficient iterative scheme for robust classifiers. Computational results on a range of datasets indicate that these robust classification methods allow for greater classification accuracies than conventional support vector machines in addition to selecting groups of highly correlated features. The conic duality-based robust SVMs were also successfully applied to a new, large HD dataset, achieving classification accuracies of over 95% and providing important information about the features that characterize HD onset.


Introduction
Support vector machines (SVMs) are optimization-based numerical methods for data classification problems [9,24] that are generally formulated as linear or convex optimization problems. SVMs have become one of the most widely used methods for binary classification, which separates data into two desired groups, and have found applications in numerous fields of science [9], engineering [15] and medicine [12,14,27,46]. These methods are inherently performed in the face of data uncertainty due to the presence of noise in the training data.
Robust optimization, pioneered in the 1970s for treating uncertain linear programming problems, has emerged as a powerful approach for decision-making in the face of data uncertainty. It treats uncertainty as deterministic but does not limit data values to point estimates. Two decades after the advent of robust optimization, in the late 1990s, Ben-Tal et al. [4,5] provided a highly successful, computationally tractable treatment of the robust optimization approach for linear as well as nonlinear optimization problems under data uncertainty [21,28,45].
In this framework, one associates with the uncertain SVM classification problem with feature uncertainty its robust counterpart, in which the uncertain constraints are enforced for every possible value of the data within their prescribed uncertainty sets [6,7,29,41]. In this paper, we consider a broad class of robust SVM classification problems under general spectrahedron uncertainty sets [10,42]. The spectrahedron uncertainty set covers the most commonly used uncertainty sets of numerically solvable robust optimization models, such as boxes, balls and ellipsoids [4,6,7]. The robust counterpart is, in general, a hard nonlinear optimization problem with infinitely many constraints, and we reformulate it as a numerically tractable equivalent conic linear program using conic duality [3,4] and a support function technique [3,18]. We show that the robust counterpart reduces to a second-order cone program in the cases where the uncertainty sets are ellipsoids, balls or boxes; these second-order cone programs can be solved more efficiently.
In the case of the box-uncertainty model, employing Lagrangian duality [17,19], we also provide a new robust SVM classifier by transforming the robust counterpart into a convex quadratic program with non-negative variables, leading to a very efficient computational scheme via a simple iterative algorithm. This approach, which was inspired by the Lagrangian support vector machine developed by Mangasarian et al. [16,34], extends the pq-SVM developed by Dunbar et al. [14] to robust SVMs. Computational results on a range of datasets indicate that our methods allow for greater classification accuracy than conventional SVMs in addition to selecting smaller groups of highly correlated features.
The conic duality-based robust SVM methods were also applied to a new dataset, Enroll-HD, which contains 36,953 sets of observations on 32 physical features from subjects with, or at risk of, Huntington disease (HD). HD is a neurodegenerative movement disorder with motor (relating to movement), cognitive and psychiatric manifestations caused by an inherited mutation in the Huntingtin (HTT) gene [1,11,33]. Characterizing the onset of the disease in subjects harbouring the causative mutation in terms of its associated features is of significant clinical and research importance. Our robust SVM methods also performed well on the Enroll-HD dataset, achieving accuracies of over 95% and selecting meaningful features for classifying subjects as having manifest (post-onset) or non-manifest HD.
The outline of the paper is as follows. Section 2 develops robust SVM data classification models. Section 3 presents equivalent conic program reformulations of these robust classification models for various classes of uncertainty sets. Section 4 provides a robust classification scheme in the case of box uncertainty and gives a simple iterative algorithm to find robust classifiers. Section 5 provides results of the computational experiments on three publicly available datasets. Section 6 describes the Enroll-HD dataset, the performance of the conic duality-based robust methods on this dataset and the implications of these results for the characterization of HD onset. Section 7 concludes with a brief discussion of further work. The appendix provides additional technical details on spectrahedra, lists the features contained in the Enroll-HD dataset and gives the proof of (linear) convergence of our iterative algorithm.

Robust Optimization-Based Data Classification
In this section, we introduce the SVM formulation and describe the so-called robust SVM formulations. We begin by fixing the notation that will be used later in the paper. Given a vector x ∈ R^n, |x| denotes the vector consisting of the absolute value of each component x_i for i = 1, 2, ..., n. The zero vector in R^n is denoted by 0_n. For a vector x ∈ R^n, x ≥ 0_n means that every component satisfies x_i ≥ 0 for i = 1, 2, ..., n. The n × n identity matrix is denoted by I_n (or I_{n×n}). The n × n matrix of zeros is denoted by 0_{n×n} (or simply 0 if the dimension is clear). The vector of all ones in R^n is denoted by e_n. We denote by S^n the space of all real-valued n × n symmetric matrices. For a vector x ∈ R^n, ‖x‖ denotes its Euclidean norm; for convenience, we also write ‖x‖_2 ≡ ‖x‖. For a matrix A ∈ R^{m×n}, its norm (or 2-norm) is denoted by ‖A‖ and is given by ‖A‖ = δ_max(A), where δ_max(A) is the largest singular value of A. This corresponds to the magnitude of the largest eigenvalue of A, |λ_max(A)|, if A ∈ R^{n×n} is symmetric. For a vector x ∈ R^n, diag(x) denotes the diagonal matrix in R^{n×n} whose diagonal entries are the elements of x. The gradient of a scalar function f : R^n → R with respect to the vector x is denoted by ∇f(x).

Consider two sets of data A and B whose elements are vectors in R^s. The SVM classifier distinguishes between these two datasets by attempting to separate the m data points into one of two open halfspaces with minimal error, each halfspace containing only those datapoints that correspond to the set A or B, respectively, where m is the cardinality of A ∪ B. Each datapoint u_i, i = 1, ..., m, has a corresponding class label α_i ∈ {−1, 1} according to the set A or B in which it is contained. The classifier used in the standard (linear) SVM formulation is a hyperplane of the form
$$x^T w = \gamma,$$
where w is the normal to the surface of the hyperplane and γ determines the location of the hyperplane relative to the origin. To construct the SVM classifier, the margin (denoted by M = 2/‖w‖) between the bounding planes x^T w = γ + 1 and x^T w = γ − 1 is maximized, subject to the condition that each plane bounds one of the sets (the so-called "hard-margin" case). The optimal classifier lies midway between these two bounding planes. Often the data are not linearly separable, and so the data cannot be correctly classified by linear bounding hyperplanes. This situation results in the following "soft-margin" SVM formulation with the tuning parameter λ (see [4,9]):
$$(SVM)\qquad \min_{w,\gamma,\xi}\ \frac{1}{2}\|w\|_2^2 + \lambda\, e_m^T \xi \quad \text{s.t.}\quad \alpha_i(u_i^T w - \gamma) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m,$$
where u_i ∈ R^s with labels α_i ∈ {−1, 1} are the given training data, and the number of nonzero entries in the slack vector ξ is the number of errors the classifier makes on the training data. The soft-margin classification via a doubly regularized support vector machine (DrSVM) examined in [4, Section 12.1.1] and [14] can be formulated as
$$(DrSVM)\qquad \min_{w,\gamma,\xi}\ \frac{\lambda_2}{2}\|w\|_2^2 + \lambda_1\|w\|_1 + e_m^T \xi \quad \text{s.t.}\quad \alpha_i(u_i^T w - \gamma) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m,$$
where λ_1 ≥ 0 and λ_2 ≥ 0 are tuning parameters. The (DrSVM) formulation, which incorporates the 1-norm, is known to generate sparse solutions. When a linear classifier is used, solution sparsity implies that the separating hyperplane depends on few input features. This makes the doubly regularized approach a very effective tool for feature selection in classification problems [8,14,16,24]. The soft-margin SVM model (SVM) is a convex quadratic optimization problem with finitely many linear inequality constraints. By introducing auxiliary variables, the doubly regularized support vector machine (DrSVM) can also be equivalently reformulated as a convex quadratic optimization problem with finitely many linear inequality constraints. Noting that the feasible regions of these optimization models are nonempty, the celebrated Frank-Wolfe theorem [2] ensures that optimal solutions always exist for these two optimization models.
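As a small numerical illustration of the soft-margin model just described, the following plain-Python sketch evaluates the objective at a candidate hyperplane. It assumes the standard soft-margin form (objective (1/2)‖w‖² + λ e^T ξ with constraints α_i(u_i^T w − γ) ≥ 1 − ξ_i); the helper name is ours, not the paper's.

```python
def soft_margin_objective(points, labels, w, gamma, lam):
    """Evaluate the soft-margin SVM objective 0.5*||w||^2 + lam * sum(xi)
    at a candidate hyperplane (w, gamma), using the smallest feasible slacks."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # smallest xi_i with alpha_i*(u_i.w - gamma) >= 1 - xi_i and xi_i >= 0
    slacks = [max(0.0, 1.0 - a * (dot(u, w) - gamma))
              for u, a in zip(points, labels)]
    return 0.5 * dot(w, w) + lam * sum(slacks), slacks

# two well-separated points: both satisfy the margin with zero slack
obj, slacks = soft_margin_objective(
    [(2.0, 0.0), (-2.0, 0.0)], [1, -1], w=(1.0, 0.0), gamma=0.0, lam=1.0)
```

For these separable points the slacks vanish and the objective reduces to the margin term (1/2)‖w‖² = 0.5.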
In practice, the given data u_i, i = 1, ..., m, are often uncertain. We assume that these data are subject to the following spectrahedral data uncertainty, parameterized with the radius parameter r_i ≥ 0:
$$U_i(r_i) = \{\, u_i + r_i v_i : v_i \in V_i \,\},$$
where each V_i, i = 1, ..., m, is a bounded spectrahedron given by
$$V_i = \Big\{ v = (v_1, \ldots, v_s) \in \mathbb{R}^s : A^{(i)}_0 + \sum_{l=1}^{s} v_l A^{(i)}_l \succeq 0 \Big\},$$
with A^{(i)}_l, l = 0, 1, ..., s, being symmetric (p × p) matrices. The spectrahedral uncertainty encompasses many important commonly used uncertainty sets, such as polyhedral uncertainty sets (where all A^{(i)}_l are diagonal matrices), ball uncertainty sets, ellipsoidal uncertainty sets and their intersections. We assume that the label α_i is free of uncertainty.
Let r = (r_1, ..., r_m). Then, the robust support vector machine can be stated as
$$(RSVM_r)\qquad \min_{w,\gamma,\xi}\ \frac{\lambda_2}{2}\|w\|_2^2 + \lambda_1\|w\|_1 + e_m^T \xi \quad \text{s.t.}\quad \alpha_i(u^T w - \gamma) \ge 1 - \xi_i\ \ \forall\, u \in U_i(r_i),\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m.$$
Note that a robust support vector machine model problem is, in general, a semi-infinite convex optimization problem. Note also that an optimal solution exists for (RSVM_r) whenever the robust feasible set F is nonempty, where F = {(w, γ, ξ) : α_i(u^T w − γ) ≥ 1 − ξ_i for all u ∈ U_i(r_i), ξ_i ≥ 0, i = 1, ..., m}, and the label sets I_A and I_B are both nonempty, where I_A = {1 ≤ i ≤ m : α_i = 1} and I_B = {1 ≤ i ≤ m : α_i = −1}. To see this, denote the objective function of (RSVM_r) by f and let the optimal value of (RSVM_r) be inf(RSVM_r). As the robust feasible set is nonempty and the objective function f is bounded below by 0, inf(RSVM_r) is a non-negative real number. Let (w^k, γ^k, ξ^k) be a minimizing sequence, that is, (w^k, γ^k, ξ^k) ∈ F and f(w^k, γ^k, ξ^k) → inf(RSVM_r). From the definition of f and the fact that ξ^k ∈ R^m_+, we see that {w^k} and {ξ^k} are bounded sequences. Moreover, since I_A and I_B are both nonempty, the constraints with i ∈ I_A bound γ^k from above in terms of w^k and ξ^k, while those with i ∈ I_B bound it from below, so {γ^k} is also bounded. As F is a closed set and f is a continuous function, it follows that an optimal solution exists for (RSVM_r).
Figure 1 presents an illustration of both robust and non-robust SVM classifiers. On the left, we see the separating hyperplane and two bounding hyperplanes found by solving a standard (non-robust) SVM; on the right, we see the corresponding hyperplanes found by solving a robust SVM with box uncertainty:
$$U_i(r_i) = \{\, u \in \mathbb{R}^s : \|u - u_i\|_\infty \le r_i \,\}.$$
In the next section, we turn our attention to reformulating the robust SVM (RSVM_r) into a numerically tractable optimization problem. Without loss of generality, throughout this paper we assume that an optimal solution for the robust SVM model (RSVM_r) exists.
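To make the robust constraint concrete in the box-uncertainty case pictured above, the following plain-Python sketch (the helper names are ours, not the paper's) checks the constraint α_i(u^T w − γ) ≥ 1 − ξ_i over the whole box in two equivalent ways: via the closed-form worst case, which subtracts r·‖w‖_1 from the nominal margin, and by brute force over the corners of the box, where the worst case of a linear function is attained.

```python
import itertools

def robust_ok_closed_form(u, alpha, w, gamma, xi, r):
    # worst case over the box: alpha*(u.w - gamma) - r*||w||_1 >= 1 - xi
    margin = alpha * (sum(ui * wi for ui, wi in zip(u, w)) - gamma)
    return margin - r * sum(abs(wi) for wi in w) >= 1.0 - xi

def robust_ok_brute_force(u, alpha, w, gamma, xi, r):
    # enforce the margin constraint at every corner u + r*c, c in {-1, 1}^s
    for c in itertools.product((-1.0, 1.0), repeat=len(u)):
        pert = [ui + r * ci for ui, ci in zip(u, c)]
        margin = alpha * (sum(pi * wi for pi, wi in zip(pert, w)) - gamma)
        if margin < 1.0 - xi:
            return False
    return True
```

For example, with u = (2, 1), α = 1, w = (1, 0.5), γ = ξ = 0, both checks accept r = 0.5 and both reject r = 2, as expected.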

SDP Formulations for Robust SVM via Conic Duality
In this section, we first show that the robust support vector machine problem under general spectrahedron uncertainty can be equivalently reformulated as a semi-definite programming problem via a support function technique and conic duality [3,4]. We then derive simple numerically tractable formulations for the cases where the uncertainty sets are ellipsoids, balls and boxes.
We begin by establishing a simple lemma which shows that the robust support vector machine problem is equivalent to a nonsmooth convex optimization problem with finitely many inequality constraints. As we see later in the section, this lemma allows us to easily obtain an equivalent semi-definite programming reformulation of (RSVM_r). To do this, we define the support function of a closed convex and bounded set C ⊂ R^s by
$$\sigma_C(a) = \max_{x \in C}\ a^T x, \qquad a \in \mathbb{R}^s.$$
The support function σ_C(·) is then a convex function, and closed-form formulae for it are known for various cases of C, such as balls and boxes. For instance, if C = {x ∈ R^s : ‖x‖_2 ≤ 1}, then σ_C(a) = ‖a‖_2, and if C = {x ∈ R^s : ‖x‖_∞ ≤ 1}, then σ_C(a) = ‖a‖_1. For details, see [3,4].
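The closed-form support functions of the unit ball (σ_C(a) = ‖a‖_2) and the unit box (σ_C(a) = ‖a‖_1) can be checked numerically. The plain-Python sketch below (illustrative names, not from the paper) computes each value and verifies that it is attained by an explicit maximizer in C.

```python
import math

def sigma_ball(a):
    # support function of C = {x : ||x||_2 <= 1}: sigma_C(a) = ||a||_2
    return math.sqrt(sum(x * x for x in a))

def sigma_box(a):
    # support function of C = {x : ||x||_inf <= 1}: sigma_C(a) = ||a||_1
    return sum(abs(x) for x in a)

a = (3.0, -4.0)
# maximizers: a/||a||_2 for the ball, sign(a) componentwise for the box
x_ball = tuple(x / sigma_ball(a) for x in a)
x_box = tuple(math.copysign(1.0, x) for x in a)
val_ball = sum(x * y for x, y in zip(a, x_ball))   # attains ||a||_2
val_box = sum(x * y for x, y in zip(a, x_box))     # attains ||a||_1
```

Here ‖a‖_2 = 5 and ‖a‖_1 = 7, and each value is attained by the corresponding feasible maximizer.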
Consider the following nonsmooth convex optimization problem, which we associate with (RSVM_r):
$$(AP_r)\qquad \min_{w,\gamma,\xi}\ \frac{\lambda_2}{2}\|w\|_2^2 + \lambda_1\|w\|_1 + e_m^T \xi \quad \text{s.t.}\quad \sigma_{U_i(r_i)}(-\alpha_i w) + \alpha_i\gamma + 1 - \xi_i \le 0,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, m,$$
where for each i = 1, 2, ..., m, σ_{U_i(r_i)}(·) denotes the support function of the uncertainty set U_i(r_i).

Lemma 3.1 min(RSVM_r) = min(AP_r).

Proof The robust SVM problem can be equivalently rewritten, using epigraph variables t ∈ R^s with |w_l| ≤ t_l, l = 1, ..., s, and μ ∈ R with ‖w‖²_2 ≤ μ, as a problem with a linear objective. Note that |w_l| ≤ t_l is equivalent to −w_l ≤ t_l and w_l ≤ t_l for all l = 1, ..., s. Moreover, ‖w‖²_2 ≤ μ can be equivalently rewritten in terms of conic constraints.
To finish the proof, we only need to show that, for all i = 1, ..., m,
$$\alpha_i(u^T w - \gamma) \ge 1 - \xi_i \ \text{ for all } u \in U_i(r_i) \iff \sigma_{U_i(r_i)}(-\alpha_i w) + \alpha_i\gamma + 1 - \xi_i \le 0. \qquad (3.1)$$
To see this, observe that α_i(u^T w − γ) ≥ 1 − ξ_i for all u ∈ U_i(r_i) is equivalent to 1 − ξ_i + α_iγ ≤ min{α_i u^T w : u ∈ U_i(r_i)}, which is in turn equivalent to 0 ≥ max{(−α_i w)^T u : u ∈ U_i(r_i)} + α_iγ + 1 − ξ_i = σ_{U_i(r_i)}(−α_i w) + α_iγ + 1 − ξ_i. This means that (3.1) holds, and so the conclusion follows.

We now consider the following semi-definite program, which can easily be shown to be equivalent to (RSVM_r) by conic duality [3,4].
Theorem 3.1 (Spectrahedral uncertainty: semi-definite program) Consider the robust support vector machine problem (RSVM_r) with spectrahedral uncertainty and its associated semi-definite program (SDP_r). Suppose that, for each i = 1, ..., m, the interior of the spectrahedron V_i is nonempty, i.e., there exists a point at which the defining linear matrix inequality of V_i holds strictly.

Proof By Lemma 3.1, the robust SVM problem is equivalent to (AP_r) with each V_i a spectrahedron defined by symmetric (p × p) matrices A^{(i)}_l, l = 0, 1, ..., s. To see the conclusion, it suffices to show that, for each i = 1, ..., m, the support-function constraint (3.2) is equivalent to the existence of W_i ⪰ 0 such that (3.3) holds. Suppose that (3.2) holds. Then the implication (3.4) holds. As the interior point condition holds, it follows from the conic duality theorem [3,4] that there exists W_i ⪰ 0 such that (3.5) holds. Note that the validity of the affine inequality a^T x + b ≤ 0 for all x ∈ R^s, with a ∈ R^s and b ∈ R, means that a = 0_s and b ≤ 0. Thus, we see that (3.5) is equivalent to (3.3). Conversely, suppose that for each i = 1, ..., m, there exists W_i ⪰ 0 such that (3.3) holds. Then (3.5) holds. Consequently, (3.2) holds, where the last inequality follows from W_i ⪰ 0 and v_i ∈ V_i. Hence, (3.2) is equivalent to the existence of W_i ⪰ 0 such that (3.3) holds, and the conclusion follows.
We now derive numerically tractable formulations of (RSVM_r) in terms of second-order cone programs, under uncertainty sets that take the form of an ellipsoid, ball or box. Although these equivalent formulations and the associated duality results may be derived from (SDP_r) and Theorem 3.1, respectively, by appropriately choosing the matrices A^{(i)}_l, l = 1, ..., s, i = 1, ..., m, of the spectrahedron V_i, in the interest of simplicity we present the results from the model (AP_r) and Lemma 3.1. Related special cases of the standard robust SVM models, where λ_1 = 0 or λ_2 = 0, can be found in [4,6].

Ellipsoidal Uncertainty

Consider the case where the uncertainty sets V_i are ellipsoids in the sense that
$$V_i = \{\, v \in \mathbb{R}^s : v^T M_i^{-1} v \le 1 \,\} \qquad (3.6)$$
for some M_i ≻ 0. Let M_i = L_i L_i^T with L_i an invertible matrix. We associate with this case the following second-order cone program, (SOCP_{r,E}).

Proposition 3.1 (Ellipsoidal uncertainty: second-order cone program) For the robust support vector machine problem (RSVM_r) under ellipsoidal uncertainty, as defined in (3.6), and its associated second-order cone problem (SOCP_{r,E}), it holds that min(RSVM_r) = min(SOCP_{r,E}). Moreover, (w, γ, ξ) is a solution of (RSVM_r) under ellipsoidal uncertainty if and only if there exist t ∈ R^s and μ ∈ R such that (w, γ, ξ, t, μ) is a solution of (SOCP_{r,E}).

Proof In the case of ellipsoidal uncertainty as defined in (3.6), the support function σ_{U_i(r_i)}(−α_i w) can be expressed as
$$\sigma_{U_i(r_i)}(-\alpha_i w) = -\alpha_i u_i^T w + r_i\, \sigma_{V_i}(-\alpha_i w) = -\alpha_i u_i^T w + r_i \|L_i^T(-\alpha_i w)\|_2 = -\alpha_i u_i^T w + r_i \|L_i^T w\|_2,$$
where the last two equalities follow from the support function formula and α_i ∈ {−1, 1}, respectively. Thus, the conclusion follows from Lemma 3.1.
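The ellipsoidal support-function formula used in the proof can be checked numerically. Assuming the ellipsoid has the common form V = {v : v^T M^{-1} v ≤ 1} with M = L L^T (our reading, consistent with the formula σ_V(a) = ‖L^T a‖ above), the maximum of a^T v over V is attained at v* = M a / ‖L^T a‖. The plain-Python sketch below verifies this on a small hand-picked example.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# M = L L^T with an invertible L (assumed 2x2 data for illustration)
L = [[2.0, 0.0], [1.0, 1.0]]
a = (1.0, 1.0)

# L^T a and the claimed support-function value sigma_V(a) = ||L^T a||
LTa = [L[0][0] * a[0] + L[1][0] * a[1], L[0][1] * a[0] + L[1][1] * a[1]]
sigma = norm(LTa)

# maximizer v* = M a / ||L^T a||, with M = L L^T
M = [[L[i][0] * L[j][0] + L[i][1] * L[j][1] for j in range(2)] for i in range(2)]
v_star = [(M[i][0] * a[0] + M[i][1] * a[1]) / sigma for i in range(2)]
attained = a[0] * v_star[0] + a[1] * v_star[1]   # a^T v* should equal sigma

# feasibility: v*^T M^{-1} v* = 1 (Minv is the inverse of M = [[4,2],[2,2]])
Minv = [[0.5, -0.5], [-0.5, 1.0]]
feas = sum(v_star[i] * (Minv[i][0] * v_star[0] + Minv[i][1] * v_star[1])
           for i in range(2))
```

For this choice of L and a, σ = √10, the value a^T v* coincides with σ, and v* lies on the boundary of the ellipsoid.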

Ball Uncertainty
We now consider the case where the perturbation sets V_i are unit balls:
$$V_i = \{\, v \in \mathbb{R}^s : \|v\|_2 \le 1 \,\}.$$
In this case, we consider the associated second-order cone program (SOCP_{r,B}), for which min(RSVM_r) = min(SOCP_{r,B}). Moreover, (w, γ, ξ) is a solution of (RSVM_r) under ball uncertainty if and only if there exist t ∈ R^s and μ ∈ R such that (w, γ, ξ, t, μ) is a solution of (SOCP_{r,B}).

Proof
The result follows immediately from Proposition 3.1, since the unit ball corresponds to choosing M_i = I_s (and hence L_i = I_s) in (3.6).

Box Uncertainty

Finally, we consider the case where the perturbation sets V_i are unit boxes:
$$V_i = \{\, v \in \mathbb{R}^s : \|v\|_\infty \le 1 \,\}.$$
We associate with this case the following second-order cone program, (SOCP_{r,∞}) (see [6]). In this case, the support function σ_{U_i(r_i)}(−α_i w) can be expressed as
$$\sigma_{U_i(r_i)}(-\alpha_i w) = -\alpha_i u_i^T w + r_i \|w\|_1.$$
Thus, the conclusion follows from Lemma 3.1.

4 A New Robust pq-SVM for Efficient Classification
In this section, we derive an efficient scheme for finding a robust classifier under box uncertainty by extending the approach in [14,16,27] and using a variable transformation and Lagrangian duality [18] to reformulate the robust SVM model (SOC P r ,∞ ) into a simple non-negative quadratic program.

QP Reformulation via Lagrangian Duality
Recall that the problem (SOCP_{r,∞}) can be equivalently rewritten with the box support function appearing in its constraints. Define vectors p, q ∈ R^s_+ by
$$p_i = \max(w_i, 0), \qquad q_i = \max(-w_i, 0),$$
for i = 1, ..., s. Then it is easy to see that w = p − q, p, q ≥ 0_s and p^T q = 0. Consequently, we can rewrite ‖w‖²_2 and ‖w‖_1 as ‖w‖²_2 = ‖p‖²_2 + ‖q‖²_2 and ‖w‖_1 = e_s^T(p + q), and so the problem (SOCP_{r,∞}) can be restated in the variables (p, q, γ, ξ).
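The variable transformation w = p − q is easy to verify directly. The plain-Python sketch below (hypothetical helper name) splits w into its positive and negative parts and checks the identities w = p − q, p^T q = 0, ‖w‖_1 = e^T(p + q) and ‖w‖²_2 = ‖p‖²_2 + ‖q‖²_2.

```python
def pq_split(w):
    # positive and negative parts: p_i = max(w_i, 0), q_i = max(-w_i, 0)
    p = [max(x, 0.0) for x in w]
    q = [max(-x, 0.0) for x in w]
    return p, q

w = [1.5, -2.0, 0.0, 3.0]
p, q = pq_split(w)
recon = [pi - qi for pi, qi in zip(p, q)]                 # w = p - q
complementarity = sum(pi * qi for pi, qi in zip(p, q))    # p^T q = 0
one_norm = sum(pi + qi for pi, qi in zip(p, q))           # ||w||_1
sq_norm = sum(pi * pi + qi * qi for pi, qi in zip(p, q))  # ||w||_2^2
```

For this w, ‖w‖_1 = 6.5 and ‖w‖²_2 = 15.25, matching the p, q identities componentwise.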

Now, we define:
Define further the required problem data. Then, problem (SOCP_{r,∞}) can be rewritten as a convex quadratic programming problem, where b ≥ 0. By removing the linear term of the objective function via regularization, as in [14,16,27], and by regularizing γ, we arrive at the regularized problem, where ν > 0. Note that the regularization makes the non-negativity condition y ≥ 0_{m+2s} redundant. The Lagrangian dual of the regularized problem is a maximization problem with equality constraints (4.5) and (4.6). Eliminating the equality constraints, the dual can be written in terms of a matrix Q ∈ R^{(m+2s)×(m+2s)} and a vector η ∈ R^{m+2s}. A direct verification shows that (D Û + I_{m+2s})^T d = 0_{m+2s} if and only if d = 0_{m+2s}, and so Q is positive definite. So, we arrive at a simple strictly convex quadratic programming problem over the non-negative orthant:
$$(QP)\qquad \min_{u \ge 0_{m+2s}}\ \frac{1}{2} u^T Q u - \eta^T u.$$
Notice that (QP) no longer has the hyperparameter λ_1 in its formulation. Having found a solution u of (4.8), we can then retrieve a solution to our original problem via the dual equality constraints (4.5) and (4.6).

Efficient Iterative Scheme
To solve (Q P), we propose a variation of the LSVM algorithm put forth in [34].
Recall from [34] that the point u is an optimal solution of (QP) if and only if the complementarity conditions u ≥ 0_{m+2s}, Qu − η ≥ 0_{m+2s} and u^T(Qu − η) = 0 hold; for α > 0 and non-negative vectors a, b with a^T b = 0, such conditions can be expressed as a = max(a − αb, 0_{m+2s}), where the k-th component of max(z, 0_{m+2s}) is max(z_k, 0), k = 1, ..., m + 2s. Unlike the LSVM Algorithm, where it is taken that a = Qu − η and b = u, we propose instead to take a = u and b = Qu − η. Therefore, we arrive at the optimality condition
$$u = \max\big(u - \alpha(Qu - \eta),\ 0_{m+2s}\big), \qquad \alpha > 0,$$
from which we derive the simple iterative scheme
$$u^{k+1} = \max\big(u^k - \alpha(Qu^k - \eta),\ 0_{m+2s}\big),$$
with a starting point u^0 defined via a single inversion of Q. The difference between our proposed iterative scheme and the LSVM Algorithm is that we require only one inversion of the matrix Q, to define the starting point for the iteration, whereas the LSVM Algorithm requires solving a linear system in Q at each step of the iteration. We also require only a single matrix multiplication per iteration.
The complete iterative algorithm required to return the optimal solution u * of (Q P) is given below, and its proof of (linear) convergence is given in Appendix C.
    while it < maxiter and ‖u_prev − u‖ > tol do
        u_prev ← u;
        u ← max(u − α(Qu − η), 0_{m+2s});
        it ← it + 1;
    end
    return u

Here, maxiter is a threshold for the maximum number of iterations, tol is the convergence tolerance of the algorithm, and α > 0 is a pre-selected constant.
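A minimal runnable sketch of this projected iteration in plain Python (toy data; the function name is ours, we start from the zero vector rather than the Q-inversion-based starting point described above, and we fix α below 2/λ_max(Q) so that the iteration converges):

```python
def solve_qp_nonneg(Q, eta, alpha=0.3, tol=1e-12, maxiter=100000):
    """Iterate u <- max(u - alpha*(Q u - eta), 0) to solve
    min 0.5*u^T Q u - eta^T u subject to u >= 0, for Q positive definite."""
    n = len(eta)
    u = [0.0] * n
    for _ in range(maxiter):
        grad = [sum(Q[i][j] * u[j] for j in range(n)) - eta[i] for i in range(n)]
        new = [max(u[i] - alpha * grad[i], 0.0) for i in range(n)]
        if max(abs(new[i] - u[i]) for i in range(n)) < tol:
            return new
        u = new
    return u

# interior solution: Q^{-1} eta = (1/3, 1/3) is already non-negative
u1 = solve_qp_nonneg([[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0])
# active constraint: the second component is clipped to zero
u2 = solve_qp_nonneg([[2.0, 0.0], [0.0, 2.0]], [2.0, -2.0])
```

In the first example the iteration reaches the unconstrained minimizer; in the second, the projection keeps the infeasible component at zero, exactly as the fixed-point condition u = max(u − α(Qu − η), 0) demands.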

Experiments with Real-World Data Sets
In this section, we evaluate several different SVM models, derived from our above results, against some real-world datasets, which are all available from the UCI Machine Learning repository [13].The aim is to compare the models in terms of accuracy, computational expense and feature selection.

Experimental Setup
Three datasets are used for the comparison of our models:
• The Wisconsin Breast Cancer (Diagnostic) (WBCD) [13]: this dataset describes the (binary) classification problem of labelling tumours as either malignant or benign. The dataset contains 569 instances, 30 features and a 63/37 split of the two classes.
• Cylinder Bands (CYLBN) [13]: this dataset describes the problem of mitigating delays known as "cylinder bands" in rotogravure printing. It contains 541 instances, 33 features and a 58/42 class split.
• Pima Indians Diabetes (PIMA) [39]: this dataset describes the problem of classifying diabetes in patients. It contains 768 instances, 8 features and a 35/65 class split.
Each of these datasets had its features standardized and was then split into 80%/20% training and test sets. For each of our models, we then perform the following. We tune the model hyperparameters over their cross-validation ranges via fivefold cross-validation on the training set. The model is then fitted to the full training set to obtain the optimal solution (w*, γ*), and the training accuracy is recorded. We next determine the features selected by the model. This is done by considering the significance (weighting) of each feature in w*. More precisely, feature k is considered significant by a model if condition (5.1) holds. Having done this, we set the value of w*_k to zero for each insignificant feature. Note that this is a stricter version of the feature selection methods employed in [14,27]. Finally, the model predicts the classification of all datapoints u_i in the test set, α_i := sign(w^T u_i + γ), and we record the test accuracy.

Classification Methods
We will apply the following classification methods to each of the datasets. For robust models, we will assume that the radius of robustness is constant for each datapoint:
• The standard SVM model (SVM). This method does not consider uncertainty in the datapoints. We refer to this method as SVM.
• The ball uncertainty robust SVM model (SOCP_{r,B}). We refer to this method as Ball-SVM.
• The box uncertainty robust SVM model (SOCP_{r,∞}). We refer to this method as Box-SVM.
• The robust pq-SVM model (QP) with box uncertainty, solved by the iterative scheme of Section 4. We refer to this method as Boxpq-SVM.
For our experiments, the cross-validation range for each hyperparameter was determined as follows. For the first three methods, λ_1 and λ_2 were tuned over the values 2^k, k ∈ {−10, ..., 4}. (Note that SVM does not tune the parameter λ_1, which is set to 0.) For the pq-SVM, ν and λ_2 were tuned over the range 2^j, j ∈ {−10, −9, ..., 10}, due to the sensitivity of the pq-SVM to its hyperparameters. For all of our robust methods (Ball-SVM, Box-SVM, and Boxpq-SVM), the radius of robustness was tuned as follows. We set a lower bound of 2^{−20}, and an upper bound given by a simple heuristic: for each datapoint in one class, we calculate the maximum distance from it to any point in the other class. We then take the minimum of these distances and finally use as our upper bound the maximum over the two classes. Formally, we can define our upper bound U as
$$U = \max\Big\{\ \min_{i \in I_A} \max_{j \in I_B} \|u_i - u_j\|_2,\ \ \min_{j \in I_B} \max_{i \in I_A} \|u_i - u_j\|_2\ \Big\}.$$
Having obtained our upper bound on the radius, we then tuned the radius of robustness over an exponential range between this lower and upper bound.
Our choice of heuristic for the upper bound is justified as follows: for any radius of robustness larger than this value, consider a (non-trivial) true classifier.It is certain that every datapoint would simultaneously have a point in its uncertainty set on one side of the classifier, and another point in its uncertainty set on the other side.In this case, it is clear that the robust SVM will select a trivial solution, i.e., a majority class prediction.
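The heuristic just described can be sketched in a few lines of plain Python (the function name is ours): per class, take each point's maximum distance to the other class, then the minimum of these; the bound is the maximum over the two classes.

```python
import math

def radius_upper_bound(class_a, class_b):
    """Heuristic upper bound on the radius of robustness, per the text:
    max over classes of (min over points of (max distance to other class))."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    def per_class(points, others):
        return min(max(dist(p, o) for o in others) for p in points)
    return max(per_class(class_a, class_b), per_class(class_b, class_a))

U = radius_upper_bound([(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0), (4.0, 0.0)])
```

For these two collinear classes the bound works out to 3, the smallest "worst-case" cross-class distance on either side.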

Results
All computations were performed on a 3.2 GHz Intel(R) Core(TM) i7-8700 with 16 GB of RAM, running MATLAB R2019b. All optimization problems were solved via the MOSEK software [36], handled through the YALMIP interface [31].
Table 1 shows the results for each of the classification methods on each of the datasets. The columns are interpreted as follows:
• Dataset: as written.
• Instances: number of instances in the dataset.
• Model: the classification method used.
• Train Acc: Accuracy obtained on the training set, as a percentage.
• Test Acc: Accuracy obtained on the test set, as a percentage.
• Fit Time (s): CPU time (seconds) taken to solve the final optimization problem, after cross-validation.
• Features: the number of selected features from each model, out of the total number of features in the dataset.

Discussion
Overall, it is apparent from Table 1 (columns as defined above; the best result(s) for each dataset in each category are given in bold) that on all three datasets, for both testing accuracy and number of selected features, the standard SVM is outperformed by both the Box-SVM and Ball-SVM robust methods.

Box-SVM-Best performing robust method

Regarding best performance in terms of testing accuracy, this method performs best, producing the highest testing accuracy on all three datasets. It is also consistently economical in its selection of features. We see the method as the most consistent of the four.

Boxpq-SVM-Faster robust method
This method is remarkably efficient for reasonably sized datasets, and much faster than our other two robust methods.

Robust methods in applications
Regarding the choice of which of the three robust methods to utilize in an application, one should consider which subset of accuracy, feature selection and computational time is most desired.No one of our robust methods outperforms any other on all three counts.We do note that the Boxpq-SVM method requires storage of an (m + 2s) × (m + 2s) size matrix which, even if defined sparsely, can consume significant amounts of physical memory in computation for very large datasets.
Finally, we can also compare our results to others in the recent literature on robust optimization methods for SVMs, namely [6]. On WBCD and PIMA, our methods achieve near-identical accuracies to the feature-robust method presented in [6] (which is also designed for box uncertainty), with a slightly higher accuracy achieved by our methods on PIMA.
We note that whilst the heuristic we use for defining the upper bound of the tuning range for the radius of robustness is simple, it is both effective and computationally efficient, even for very large datasets. Characterizing an upper bound on the radius within a mathematical framework would need to be investigated in a further study.

Application to Characterization of Huntington Disease Onset
HD is a neurodegenerative movement disorder caused by a mutation in the HTT gene with motor, cognitive and psychiatric manifestations [1,11,33].It is one of the most common monogenic neurological diseases in the developed world [1].From its onset, typically in the fourth decade of life, the signs and symptoms of HD progress inexorably until certain death.No treatment currently available can alter this course.
Defining the onset of HD is of significant clinical and research importance. Studies tracking the progression of the motor manifestations of HD over time have shown that there is an acceleration in motor decline around the time of onset diagnosis [30,32]. Thus, making a clinical diagnosis of HD onset heralds a poor prognosis for the patient and, in addition to carrying significant emotional weight, can have important implications for key life decisions such as family planning. In addition, clearly defining disease onset is paramount for establishing endpoints in clinical trials, where the efficacy of putative disease-modifying therapies could in the future be measured by their ability to delay onset in subjects harbouring the HTT mutation [25].
The current formal diagnosis of HD onset, or "manifest HD" (mHD), is based on the motor manifestations of the disease according to the motor component of the Unified Huntington Disease Rating Scale (UHDRS). The UHDRS has been shown to have a high degree of internal consistency and interrater reliability [26]. It contains 31 items relating to characteristic motor abnormalities in HD, which are scored from 0 (normal) to 4 (significantly abnormal); their sum is the total motor score (TMS). Based on these scores, the clinician assigns a "diagnostic confidence level" (DCL) between 0 (normal) and 4 (≥ 99% confidence that the motor abnormalities are unequivocal signs of HD), representing their confidence that the subject's motor abnormalities are due to mHD. A diagnosis of mHD is made when the DCL is 4 [33]. However, there are currently no rules relating the scores from the 31 items and the TMS to the DCL, and so the DCL rating relies on the clinician's expertise. Given the significance of making a diagnosis of HD onset, a standardized and objective means of arriving at the diagnosis from the motor assessment is needed [11,33].

Conic Duality Methods
We applied the conic duality-based robust SVM methods to data from the Enroll-HD study, consisting of 36,953 sets of motor scores and corresponding DCLs from patients with HD around the world. Subjects with a DCL of 4 were defined as having mHD, whilst those with a DCL less than 4 (i.e., 0, 1, 2 or 3) were defined as having non-manifest HD (nHD). There were 19,303 (52%) cases of mHD and 17,650 (48%) cases of nHD defined in this way, a roughly even split between the two groups, minimizing classification bias. Data from the 31 motor items of the UHDRS and the TMS (32 features in total) for these subjects were used as potential features to predict classification by the conic duality-based robust SVM models.
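The labeling rule just described is simple to state in code; the plain-Python sketch below (illustrative names, not from the study pipeline) assigns the class labels used for training.

```python
def hd_label(dcl):
    """Label a subject from the diagnostic confidence level (DCL):
    +1 for manifest HD (DCL == 4), -1 for non-manifest HD (DCL in 0..3)."""
    if dcl not in (0, 1, 2, 3, 4):
        raise ValueError("DCL must be an integer between 0 and 4")
    return 1 if dcl == 4 else -1

labels = [hd_label(d) for d in (0, 2, 3, 4)]   # -> [-1, -1, -1, 1]
```

Only a DCL of exactly 4 maps to the manifest class, mirroring the study's definition.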
Due to the physical memory limitations of our hardware, and the size of the dataset, Boxpq-SVM was not applied to the Enroll-HD dataset.

Results
Both the box and ball robust SVM models achieved similarly high accuracies of over 95% in both the training and testing phases (Table 2). The features that were selected by each model are given in Table 3. The descriptions of all features are given in Appendix B.

Discussion
In this application, we have used conic duality-based robust SVM methods to establish a highly accurate classification of HD disease status based solely on UHDRS motor scores.
Despite the very high classification accuracies achieved, not all features were selected by the models, suggesting that the feature selection aspect of these models successfully eliminated unnecessary variables that may interfere with prediction. The Ball-SVM achieved a marginally higher classification accuracy than the Box-SVM at the expense of selecting more features. All of the features selected by the Box-SVM were also selected by the Ball-SVM.
The features that were selected by neither model, vertical saccade velocity and right-sided arm rigidity, are members of two pairs of features examining identical aspects of motor function on different sides of the body (left/right) or in different planes (horizontal/vertical). It would not be surprising if only one feature from each pair were sufficient, and more efficient, for prediction. Alternatively, this may reflect an inherent asymmetry in HD [37,44].
Notably, none of the five features relating to dystonia (involuntary muscle contractions) were selected by the Box-SVM model. This may reflect the particular disease phenotype or disease stage of the study population. In typical adult-onset HD, choreiform ("dance-like") movements, features that were selected by both models, dominate early in the disease course, whereas dystonia is not prominent until the later stages [1].
Similarly, gait abnormality, which is a composite effect of multiple other motor abnormalities, including late features such as dystonia, was not selected by the Box-SVM model [43]. The TMS was also not selected by this model, suggesting that it does not strongly influence classification when its component items are used as features.

Conclusion and Further Work
In this paper, by employing a support function approach and conic duality of convex optimization, we first presented a readily numerically solvable semi-definite program reformulation for a general robust SVM data classification problem in which the uncertainty sets are spectrahedra. A spectrahedron, an important generalization of a polyhedron, plays a key role in a range of fields, including algebraic geometry and semi-definite optimization. It also encompasses the most commonly used uncertainty sets of robust optimization models, such as boxes, balls and ellipsoids. We have shown that the conic duality-based robust SVMs with box and ball uncertainty sets achieved classification accuracies of over 95% on the large Enroll-HD dataset and provided important information about the features that characterize HD onset.
As an alternative to the second-order cone program reformulation of the robust SVM with box uncertainty sets, we also presented a new, efficient iterative scheme, Boxpq-SVM, which solves the robust SVM by reformulating it as a simple convex quadratic optimization problem via Lagrangian duality. Computational studies on a range of datasets demonstrated that these robust classification methods allow for greater classification accuracies than conventional support vector machines, in addition to selecting groups of highly correlated features.
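A standard identity underlies box-uncertainty reformulations of this kind: for perturbations δ with ‖δ‖_∞ ≤ ρ, the worst-case decrease of w^T x is exactly ρ‖w‖_1, attained at a corner of the box. Whether this matches the paper's derivation step for step is an assumption; the sketch below (with illustrative values) simply verifies the identity by brute-force corner enumeration:

```python
from itertools import product

def worst_case_drop(w, rho):
    """Minimize w^T delta over the box |delta_j| <= rho by enumerating
    all 2^n corners; the minimum of a linear function over a box is
    always attained at a corner."""
    return min(sum(wj * dj for wj, dj in zip(w, delta))
               for delta in product((-rho, rho), repeat=len(w)))

w = [1.5, -2.0, 0.5]   # illustrative weight vector
rho = 0.1              # illustrative box radius
corner_min = worst_case_drop(w, rho)
closed_form = -rho * sum(abs(wj) for wj in w)   # = -rho * ||w||_1
print(abs(corner_min - closed_form) < 1e-12)  # True
```

Because of this identity, the robust margin constraint under box uncertainty tightens the nominal constraint by the penalty term ρ‖w‖_1, which is what makes the problem reducible to a convex quadratic program.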
Further work is planned to examine the generalizability of the Enroll-HD robust classifier on other HD datasets, with the aim of producing a simple and reliable clinical decision support tool to aid in the identification of patients with manifest HD. It would also be of interest to study how the Boxpq-SVM approach can be extended to treat large-scale datasets.

Appendix A: Spectrahedra
A spectrahedron Θ ⊂ R^m is the intersection of the cone of positive semi-definite matrices with an affine-linear space, and is represented algebraically by Θ := {(x_1, ..., x_m) ∈ R^m : A_0 + x_1 A_1 + ... + x_m A_m ⪰ 0} for some symmetric matrices A_j, j = 0, 1, ..., m, where ⪰ 0 denotes positive semi-definiteness. If the A_j are all diagonal matrices, then Θ is a polyhedron; thus, every polyhedron is a spectrahedron, while the converse is not true. For instance, the closed unit ball {x ∈ R^m : ‖x‖_2 ≤ 1} is a bounded spectrahedron, as, by the Schur complement, it can be written as {x ∈ R^m : [[I_m, x], [x^T, 1]] ⪰ 0}. A spectrahedron is an important extension of a polyhedron, and it covers many convex infinite sets arising in a range of applications (see [42]). The reader is directed to [38] for more details on the links between polyhedra, spectrahedra and semi-definite linear programming. It is worth noting that, in general, a bounded spectrahedron can have infinitely many faces, whilst a bounded polyhedron has only finitely many faces (see Fig. 2 for an illustration). A three-dimensional example of a bounded spectrahedron with infinitely many faces is the three-dimensional elliptope [42], given by E := {(x_1, x_2, x_3) ∈ R^3 : [[1, x_1, x_2], [x_1, 1, x_3], [x_2, x_3, 1]] ⪰ 0}.
... where the first inequality follows from the non-expansiveness of the projection mapping onto the non-negative orthant. To finish the proof, it suffices to show that ... where the last strict inequality follows from our choice of α and the fact that d ≠ 0_{m+2s}. So, (8.2) implies that |d^T (I_{m+2s} − αQ) d| < ‖d‖^2 for all d ≠ 0_{m+2s}. Hence, all the eigenvalues of I_{m+2s} − αQ have absolute value strictly less than one, and so ‖I_{m+2s} − αQ‖ < 1. Thus, the conclusion follows.
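As a small worked instance of the definition of a spectrahedron, the closed unit disk in R^2 arises from A_0 = I_2, A_1 = diag(1, -1) and A_2 = [[0, 1], [1, 0]]: the matrix A_0 + x_1 A_1 + x_2 A_2 = [[1 + x_1, x_2], [x_2, 1 - x_1]] is positive semi-definite exactly when 1 - x_1^2 - x_2^2 ≥ 0. A minimal membership test (a symmetric 2×2 matrix is PSD iff its trace and determinant are non-negative):

```python
def in_spectrahedron(x1, x2):
    """Membership test for the LMI  [[1+x1, x2], [x2, 1-x1]] >= 0 (PSD).
    For a symmetric 2x2 matrix, PSD holds iff trace >= 0 and det >= 0,
    since the eigenvalues satisfy l1 + l2 = trace and l1 * l2 = det."""
    a, b, c = 1.0 + x1, x2, 1.0 - x1
    trace = a + c            # = 2, always non-negative here
    det = a * c - b * b      # = 1 - x1**2 - x2**2
    return trace >= 0 and det >= 0

print(in_spectrahedron(0.5, 0.5))  # True  (inside the unit disk)
print(in_spectrahedron(0.8, 0.8))  # False (outside the unit disk)
```

This illustrates why the spectrahedron class strictly contains polyhedra: the disk's boundary is curved, so it has infinitely many faces, yet it is cut out by a single linear matrix inequality.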

Fig. 1 a Classifier determined by the standard SVM. b Classifier determined by the robust SVM with box uncertainty. The uncertainty set around each data point is shown

Lemma 3.1 (Equivalent nonsmooth convex program) Consider the robust support vector machine problem (RSVM_r) with a bounded spectrahedron uncertainty set. Then,

Corollary 3.1 (Ball uncertainty: simple second-order cone program) For the robust support vector machine problem (RSVM_r) under ball uncertainty, as defined in (3.7), and its associated second-order cone problem (SOCP_{r,B}), it holds that min(RSVM_r) = min(SOCP_{r,B}).

Table 1
Results for each classification method on each dataset

Table 2
Results for each classification method on Enroll-HD dataset

Table 3
Motor features selected by the classifier found by each method