1 Introduction

Gliomas account for approximately 45% of primary brain tumors. The most deadly gliomas are classified by the World Health Organization (WHO) as grades III and IV, referred to as high-grade gliomas (HGG). Studies have shown that O6-methylguanine-DNA methyltransferase promoter methylation (MGMT-m) and isocitrate dehydrogenase 1 mutation (IDH1-m) are two strong molecular indicators associated with better prognosis (i.e., better sensitivity to treatment and longer survival time), compared to their counterparts, i.e., the unmethylated MGMT promoter (MGMT-u) and wild-type IDH1 (IDH1-w) [1, 2]. To date, identification of MGMT and IDH1 statuses has become clinical routine, but it is conducted via invasive biopsy, which limits its wider clinical implementation. For better treatment planning, non-invasive and preoperative prediction of MGMT and IDH1 statuses is highly desired.

A few studies have predicted MGMT/IDH1 status from preoperative neuroimaging. For example, Korfiatis et al. extracted tumor texture features from a single T2-weighted MRI modality and trained a support vector machine (SVM) to predict MGMT status [1]. Yamashita et al. extracted both functional information (i.e., tumor blood flow) from perfusion MRI and structural features from T1-weighted MRI, and employed a nonparametric approach to predict IDH1 status [2]. Zhang et al. extracted more voxel- and histogram-based features from T1-weighted, T2-weighted, and diffusion-weighted images (DWI), and employed a random forest (RF) classifier to predict IDH1 status [3].

However, all these studies predict either MGMT or IDH1 status alone using a single-task machine learning technique, which ignores the potential relationship between these two molecular markers that may help each other achieve more accurate predictions [4]. It is thus desirable to use a multi-task learning approach to jointly predict the MGMT and IDH1 statuses. Meanwhile, in clinical practice, complete molecular pathological testing is not always conducted; in several cases only one biopsy-proven MGMT or IDH1 status is available, which leads to incomplete training labels, i.e., a missing-label problem. Traditional methods usually simply discard the subjects with incomplete labels, which further reduces the number of training samples. The recently proposed Multi-label Transductive Matrix Completion (MTMC) model is an important multi-task classification method that can make full use of samples with missing labels [5] and has produced good performance in many previous studies [5, 6]. However, it generalizes poorly to studies with a limited sample size due to its inherent overfitting, a problem from which many phenotype-genotype studies inevitably suffer.

To address the above limitations, we propose a novel Multi-label Inductive Matrix Completion (MIMC) model by introducing an online inductive learning strategy into the MTMC model. The solution of MIMC is non-trivial, since the model contains both the non-smooth nuclear-norm and L21-norm regularizers. Therefore, we design an optimization algorithm based on the block coordinate descent method to solve the MIMC model. Note that, in this paper, we do not adopt the commonly used radiomics features derived from T1- or T2-weighted structural MRI, but instead use connectomics features derived from both resting-state functional MRI (RS-fMRI) and diffusion tensor imaging (DTI). The motivation is that structural MRI-based radiomics features are highly affected by tumor characteristics (e.g., location and size) and thus vary significantly across subjects, which is undesirable for both group studies and individual-based classification. In contrast, brain connectome features extracted from RS-fMRI and DTI reflect the inherent brain connectivity architecture and its alterations due to the highly diffusive HGG, and thus could serve as more consistent and reliable imaging biomarkers.

2 Materials, Preprocessing, and Feature Extraction

Our dataset includes 63 HGG patients recruited between 2010 and 2015. Each subject has at least one biopsy-proven MGMT or IDH1 status. We exclude subjects without complete RS-fMRI or DTI data, or with significant imaging artifacts or excessive head motion. As a result, 47 HGG subjects are used in this paper. We summarize the subjects’ demographic and clinical information in Table 1. For simplicity, MGMT-m and IDH1-m are labeled as “positive”, and MGMT-u and IDH1-w as “negative”. This study was approved by the local hospital’s ethics committee.

Table 1. Demographic and clinical information of the subjects involved in this study.

In this study, all RS-fMRI and DTI data are collected preoperatively with the following parameters. RS-fMRI: TR (repetition time) = 2 s, number of acquisitions = 240 (8 min), and voxel size = 3.4 × 3.4 × 4 mm³. DTI: 20 directions, voxel size = 2 × 2 × 2 mm³, and number of acquisitions = 2. SPM8 and DPARSF [7] are used to preprocess the RS-fMRI data and construct brain functional networks. FSL and PANDA [8] are used to process the DTI data and construct brain structural networks. Multi-modality images are first co-registered within each subject and then registered to the atlas space. All processing procedures follow the commonly accepted pipeline [9]. Specifically, we parcellate each brain into 90 regions of interest (ROIs) using the Automated Anatomical Labeling (AAL) atlas. The parcellated ROIs of each subject are regarded as nodes in a graph, and the Pearson’s correlation coefficient between the blood oxygenation level dependent (BOLD) time series of each pair of ROIs is calculated as the functional connectivity strength of the corresponding edge. Similarly, the structural network is constructed from whole-brain DTI tractography by calculating the normalized number of tracked streamlines between each pair of AAL ROIs as the structural connectivity strength.
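
For concreteness, the network construction step can be sketched in a few lines of Python. This is a minimal sketch, not the DPARSF/PANDA pipeline itself: `roi_ts` is assumed to be a T × 90 array of preprocessed ROI-mean BOLD signals, and normalizing streamline counts by their maximum is one of several possible schemes.

```python
import numpy as np

def functional_connectivity(roi_ts):
    """Build a 90 x 90 functional network from ROI-mean BOLD signals.

    roi_ts: array of shape (T, 90), one preprocessed time series per AAL ROI.
    Returns the matrix of pairwise Pearson correlation coefficients,
    with self-connections on the diagonal zeroed out.
    """
    fc = np.corrcoef(roi_ts.T)   # Pearson correlation between all ROI pairs
    np.fill_diagonal(fc, 0.0)    # discard self-connections
    return fc

def structural_connectivity(streamline_counts):
    """Normalize a 90 x 90 matrix of tracked streamline counts from DTI
    tractography into structural connectivity strengths (dividing by the
    maximum count is an illustrative normalization scheme)."""
    sc = streamline_counts / max(streamline_counts.max(), 1)
    np.fill_diagonal(sc, 0.0)
    return sc
```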

After network construction, we use GRETNA [10] to extract various network properties based on graph-theoretic analysis, including degree, shortest path length, clustering coefficient, global efficiency, local efficiency, and nodal centrality. These network properties are extracted as connectomics features for each node of each network. We also use 12 clinical features for each subject, such as age, gender, tumor size, tumor WHO grade, and tumor location. In total, each subject has 1092 features (6 metrics × 2 networks × 90 regions + 12 clinical features).
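
GRETNA is a MATLAB toolbox; purely for illustration, the nodal metrics can be approximated in Python with NetworkX as below. The fixed binarization threshold, the specific nodal definitions of global/local efficiency, and the use of betweenness as the nodal centrality measure are our assumptions, not GRETNA’s exact settings.

```python
import networkx as nx
import numpy as np

def nodal_features(conn, threshold=0.2):
    """Six nodal graph metrics from a 90 x 90 connectivity matrix.

    The fixed binarization threshold is an illustrative choice; GRETNA
    supports several thresholding schemes. Returns an array (90, 6).
    """
    g = nx.from_numpy_array((np.abs(conn) > threshold).astype(int))
    spl = dict(nx.all_pairs_shortest_path_length(g))
    bc = nx.betweenness_centrality(g)          # nodal centrality
    feats = []
    for v in g.nodes:
        dists = [l for u, l in spl[v].items() if u != v]
        neigh = list(g.neighbors(v))
        feats.append([
            g.degree[v],                                          # degree
            np.mean(dists) if dists else 0.0,                     # shortest path length
            nx.clustering(g, v),                                  # clustering coefficient
            np.mean([1.0 / l for l in dists]) if dists else 0.0,  # nodal (global) efficiency
            nx.global_efficiency(g.subgraph(neigh)) if len(neigh) > 1 else 0.0,  # local efficiency
            bc[v],                                                # betweenness centrality
        ])
    return np.array(feats)

# Per-subject feature vector, as used in this paper:
# 6 metrics x 2 networks x 90 ROIs + 12 clinical features = 1092
# x = np.concatenate([nodal_features(fc).ravel(),
#                     nodal_features(sc).ravel(), clinical])
```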

3 MIMC-Based MGMT and IDH1 Status Prediction

We first introduce the notations used in this paper. \( {\mathbf{X}}^{\left( i \right)} \) denotes the \( i \)-th column of matrix \( {\mathbf{X}} \), and \( {\mathbf{x}}_{ij} \) denotes the element in the \( i \)-th row and \( j \)-th column of \( {\mathbf{X}} \). \( 1 \) denotes the all-one column vector, and \( {\mathbf{X}}^{\text{T}} \) denotes the transpose of matrix \( {\mathbf{X}} \). \( {\mathbf{X}}_{train} = \left[ {{\mathbf{x}}_{1} , \cdots ,{\mathbf{x}}_{m} } \right]^{\text{T}} \in {\mathbb{R}}^{m \times d} \) and \( {\mathbf{X}}_{test} = \left[ {{\mathbf{x}}_{m + 1} , \cdots ,{\mathbf{x}}_{m + n} } \right]^{\text{T}} \in {\mathbb{R}}^{n \times d} \) denote the feature matrices associated with \( m \) training subjects and \( n \) testing subjects, respectively. Assume there are \( t \) binary classification tasks, and let \( {\mathbf{Y}}_{train} = \left[ {{\mathbf{y}}_{1} , \cdots ,{\mathbf{y}}_{m} } \right]^{\text{T}} \in \left\{ { - 1,1,?} \right\}^{m \times t} \) and \( {\mathbf{Y}}_{test} = \left[ {{\mathbf{y}}_{m + 1} , \cdots ,{\mathbf{y}}_{m + n} } \right]^{\text{T}} \in \left\{ ? \right\}^{n \times t} \) denote the label matrices associated with the \( m \) training subjects and \( n \) testing subjects, where ‘\( ? \)’ denotes an unknown label. For convenience, let \( {\mathbf{X}}^{obs} = \left[ {{\mathbf{X}}_{train} ;{\mathbf{X}}_{test} } \right] \), \( {\mathbf{Y}}^{obs} = \left[ {{\mathbf{Y}}_{train} ;{\mathbf{Y}}_{test} } \right] \), and \( {\mathbf{Z}}^{obs} = \left[ {{\mathbf{X}}^{obs} ,1,{\mathbf{Y}}^{obs} } \right] \) denote the observed feature matrix, label matrix, and stacked matrix, respectively. Let \( {\mathbf{X}}^{0} \in {\mathbb{R}}^{{\left( {m + n} \right) \times d}} \) denote the underlying noise-free feature matrix corresponding to \( {\mathbf{X}}^{obs} \), and let \( {\mathbf{Y}}^{0} \in {\mathbb{R}}^{{\left( {m + n} \right) \times t}} \) denote the underlying soft label matrix, with \( {\mathbf{sign}}\left( {{\mathbf{Y}}^{0} } \right) \) the underlying label matrix corresponding to \( {\mathbf{Y}}^{obs} \), where \( {\mathbf{sign}}\left( \cdot \right) \) is the element-wise sign function.
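
As a minimal NumPy sketch of this notation (the sizes and the random placeholder data are illustrative only; ‘\( ? \)’ entries are encoded as NaN):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d, t = 37, 10, 1092, 2                 # illustrative sizes, not the paper's split
X_train, X_test = rng.standard_normal((m, d)), rng.standard_normal((n, d))
Y_train = rng.choice([-1.0, 1.0, np.nan], size=(m, t))   # some labels missing ('?')

X_obs = np.vstack([X_train, X_test])                     # (m + n) x d
Y_obs = np.vstack([Y_train, np.full((n, t), np.nan)])    # test labels all unknown
Z_obs = np.hstack([X_obs, np.ones((m + n, 1)), Y_obs])   # stacked matrix [X, 1, Y]
Omega_Y = np.argwhere(~np.isnan(Y_obs))                  # observed label subscripts
```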

3.1 Multi-label Transductive Matrix Completion (MTMC)

MTMC is a well-known multi-label matrix completion model built on two assumptions. First, a linear relationship is assumed between \( {\mathbf{X}}^{0} \) and \( {\mathbf{Y}}^{0} \), i.e., \( {\mathbf{Y}}^{0} = \left[ {{\mathbf{X}}^{0} ,1} \right]{\mathbf{W}} \), where \( {\mathbf{W}} \in {\mathbb{R}}^{{\left( {d + 1} \right) \times t}} \) is the implicit weight matrix. Second, \( {\mathbf{X}}^{0} \) is assumed to be low-rank. Let \( {\mathbf{Z}}^{0} = \left[ {{\mathbf{X}}^{0} ,1,{\mathbf{Y}}^{0} } \right] \) denote the underlying stacked matrix corresponding to \( {\mathbf{Z}}^{obs} \); then, from \( {\text{rank}}\left( {{\mathbf{Z}}^{0} } \right) \le {\text{rank}}\left( { {\mathbf{X}}^{0} } \right) + 1 \), we can infer that \( {\mathbf{Z}}^{0} \) is also low-rank. The goal of MTMC is to estimate \( {\mathbf{Z}}^{0} \) given \( {\mathbf{Z}}^{obs} \). In real applications, where \( {\mathbf{Z}}^{obs} \) is contaminated by noise, MTMC is formulated as:

$$ \mathop {\hbox{min} }\nolimits_{{{\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1}} \mu \left\| {\mathbf{Z}} \right\|_{*} + \frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} , $$
(1)

where \( {\mathbf{Z}} = \left[ {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} ,{\mathbf{Z}}^{{\left( {d + 1} \right)}} ,{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} } \right] \) denotes the matrix to be optimized, \( {\mathbf{Z}}_{{\Delta {\mathbf{X}}}} \) denotes the noise-free feature submatrix, \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} \) denotes the soft label submatrix, \( \Omega_{{\mathbf{Y}}} \) denotes the set of subscripts of the observed entries in \( {\mathbf{Y}}^{obs} \), \( \left\| \cdot \right\|_{*} \) denotes the nuclear norm, \( \left\| \cdot \right\|_{F} \) denotes the Frobenius norm, and \( {\mathbb{C}}_{y} \left( { \cdot , \cdot } \right) \) denotes the logistic loss function. Once the optimal \( {\mathbf{Z}}^{opt} \) is found, the labels \( {\mathbf{Y}}_{test} \) of the testing subjects can be estimated by \( {\mathbf{sign}}\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} } \right) \), where \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} \) denotes the optimal soft labels of the testing subjects. From the formulation of MTMC, \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} \) is implicitly obtained as \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} = \left[ {{\mathbf{X}}_{test}^{opt} ,1} \right]{\mathbf{W}}^{opt} \), where \( {\mathbf{X}}_{test}^{opt} \) is the optimal noise-free counterpart of \( {\mathbf{X}}_{test} \), and \( {\mathbf{W}}^{opt} \) is the optimal estimate of \( {\mathbf{W}} \). Although \( {\mathbf{W}}^{opt} \) is not explicitly computed, it is implicitly determined by the training subjects and their known labels (i.e., through the third term of Eq. (1)). Therefore, for multi-label classification tasks with insufficient training subjects, as in our case, MTMC still suffers from inherent overfitting.

3.2 Multi-label Inductive Matrix Completion (MIMC)

To alleviate this overfitting, we employ an online inductive learning strategy to modify the MTMC model, and name the modified model the Multi-label Inductive Matrix Completion (MIMC) model. Specifically, we introduce an explicit predictor matrix \( {\tilde{\mathbf{W}}} \in {\mathbb{R}}^{{\left( {d + 1} \right) \times t}} \) into MTMC by adding the following objective to Eq. (1):

$$ \mathop {\hbox{min} }\nolimits_{{{\tilde{\mathbf{W}}}}} \,\lambda \left\| {{\tilde{\mathbf{W}}}} \right\|_{2,1} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} , $$
(2)

where \( \left\| \cdot \right\|_{2,1} \) denotes the L21-norm, which imposes row sparsity on \( {\tilde{\mathbf{W}}} \) to learn representations shared across all related classification tasks by selecting the common discriminative features. Note also that, in the second term of Eq. (2), we use all subjects (including the testing subjects) to learn the sparse predictor matrix \( {\tilde{\mathbf{W}}} \) based on the transductive soft labels \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} \). In other words, we leverage the testing subjects as an efficient supplement to the limited training subjects, thus alleviating the small-sample-size issue that often causes overfitting when training the classifier. The final MIMC model is given as:

$$ \mathop {\hbox{min} }\nolimits_{{\begin{array}{*{20}c} {{\mathbf{Z}}, {\tilde{\mathbf{W}}}} \\ {{\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1} \\ \end{array} }} \left\{ {\begin{array}{*{20}c} {\mu \left\| {\mathbf{Z}} \right\|_{*} + \frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} } \\ { + \lambda \left\| {{\tilde{\mathbf{W}}}} \right\|_{2,1} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} } \\ \end{array} } \right\}. $$
(3)

In this way, we can obtain the optimal sparse predictor matrix \( {\tilde{\mathbf{W}}}^{opt} \) by using our proposed optimization algorithm in Sect. 3.3 below, and estimate the labels \( {\mathbf{Y}}_{test} \) of the testing subjects \( {\mathbf{X}}_{test} \) by induction:

$$ {\mathbf{Y}}_{test} = {\mathbf{sign}}\left( {\left[ {{\mathbf{X}}_{test} ,1} \right]{\tilde{\mathbf{W}}}^{opt} } \right). $$
(4)

Compared with the overfitting-prone transductive labels \( {\mathbf{sign}}\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} } \right) \), the inductive labels in Eq. (4), which are learned from more subjects (by including the testing subjects) and benefit from joint feature selection (via the L21-norm), give more robust predictions and thus suffer less from the small-sample-size issue.

3.3 Optimization Algorithm for MIMC

The solution of MIMC is non-trivial, as Eq. (3) contains the all-one-column constraint (i.e., \( {\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1 \)) and two non-smooth penalties, the nuclear norm and the L21-norm. Here, we employ the block coordinate descent method to design an optimization algorithm for solving MIMC. The key step of this algorithm is to iteratively optimize the following two subproblems:

$$ {\mathbf{Z}}_{k} = { \arg }\mathop {\hbox{min} }\nolimits_{{{\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1}} \left\{ {\begin{array}{*{20}c} {\frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} } \\ { + \mu \left\| {\mathbf{Z}} \right\|_{ *} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}_{k - 1} } \right\|_{F}^{2} } \\ \end{array} } \right\}, $$
(5)
$$ {\tilde{\mathbf{W}}}_{k} = { \arg }\mathop {\hbox{min} }\nolimits_{{{\tilde{\mathbf{W}}} }} \lambda \left\| {{\tilde{\mathbf{W}}}} \right\|_{2,1} + \frac{\beta }{2}\left\| {\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} } \right)_{k} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} . $$
(6)
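
The overall alternating scheme can be sketched as follows (a minimal Python sketch; the iteration budget, stopping rule, and the `solve_Z`/`solve_W` interfaces standing for the two inner solvers detailed below are illustrative choices):

```python
import numpy as np

def mimc_bcd(Z_obs, d, solve_Z, solve_W, K=50, tol=1e-5):
    """Block coordinate descent for MIMC (Eq. 3): alternate Eqs. (5) and (6)."""
    t = Z_obs.shape[1] - d - 1           # number of tasks
    Z = np.nan_to_num(Z_obs)             # initialize unknown '?' entries to 0
    W = np.zeros((d + 1, t))             # sparse predictor matrix W~
    for k in range(K):
        Z_prev = Z.copy()
        Z = solve_Z(Z, W)                # Subproblem 1, Eq. (5)
        Z[:, d] = 1.0                    # enforce the all-one column constraint
        W = solve_W(Z, W)                # Subproblem 2, Eq. (6)
        if np.linalg.norm(Z - Z_prev) / max(np.linalg.norm(Z_prev), 1.0) < tol:
            break                        # stop once Z stabilizes
    return Z, W

# Inductive prediction, Eq. (4):
# Y_test = np.sign(np.hstack([X_test, np.ones((n, 1))]) @ W)
```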

We solve Subproblem 1 in Eq. (5) by employing the Fixed Point Continuation (FPC) method combined with a projection step, whose convergence was proven by Cabral et al. [6]. Specifically, each iteration \( t \) consists of two steps:

$$ \left\{ {\begin{array}{*{20}l} {\left( {{\mathbf{Z}}_{k} } \right)_{t} = {\mathbf{\mathcal{D}}}_{{\mu \tau_{{\mathbf{Z}}} }} \left( {\left( {{\mathbf{Z}}_{k} } \right)_{t - 1} - \tau_{{\mathbf{Z}}} \nabla {\text{G}}\left( {\left( {{\mathbf{Z}}_{k} } \right)_{t - 1} } \right)} \right) } \hfill \\ { \left( {\left( {{\mathbf{Z}}_{k} } \right)_{t} } \right)^{{\left( {d + 1} \right)}} = 1 } \hfill \\ \end{array} } \right., $$
(7)

where \( \tau_{{\mathbf{Z}}} = { \hbox{min} }\left\{ {1,4/\sqrt {32\beta^{2} + 2\gamma^{2} } } \right\} \) denotes the gradient step size, \( {\mathbf{\mathcal{D}}}_{{\mu \tau_{{\mathbf{Z}}} }} \left( \cdot \right) \) denotes the proximal operator of the nuclear norm [6], and \( \nabla {\text{G}}\left( {\mathbf{Z}} \right) \) is the gradient of \( {\text{G}}\left( {\mathbf{Z}} \right) \):

$$ {\text{G}}\left( {\mathbf{Z}} \right) = \left\{ {\begin{array}{*{20}c} {\frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}_{k - 1} } \right\|_{F}^{2} } \\ { + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} } \\ \end{array} } \right\}. $$
(8)
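
For illustration, one FPC update can be written in NumPy as below; the singular-value soft-thresholding form of \( {\mathbf{\mathcal{D}}}_{{\mu \tau_{{\mathbf{Z}}} }} \left( \cdot \right) \) is standard [6], while the NaN encoding of unobserved labels and all variable names are our illustrative choices.

```python
import numpy as np

def svt(A, thr):
    """Proximal operator of the nuclear norm: singular value soft-thresholding."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - thr, 0.0)) @ Vt

def grad_G(Z, W, X_obs, Y_obs, d, beta, gamma):
    """Gradient of the smooth part G(Z) in Eq. (8); NaN in Y_obs marks '?'."""
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])   # [X_obs, 1]
    G = np.zeros_like(Z)
    G[:, :d] = Z[:, :d] - X_obs                   # feature-fitting term
    G[:, d + 1:] = beta * (Z[:, d + 1:] - A @ W)  # coupling with the predictor
    mask = ~np.isnan(Y_obs)
    y, z = Y_obs[mask], Z[:, d + 1:][mask]
    # logistic loss gradient -y / (1 + exp(y z)) on observed label entries
    G[:, d + 1:][mask] += gamma * (-y / (1.0 + np.exp(y * z)))
    return G

# One FPC iteration of Eq. (7):
# Z = svt(Z - tau_Z * grad_G(Z, W, X_obs, Y_obs, d, beta, gamma), mu * tau_Z)
# Z[:, d] = 1.0    # re-project onto the all-one column constraint
```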

Subproblem 2 in Eq. (6) is a standard L21-norm-regularized problem, which can be solved via Nesterov's accelerated method, whose convergence is proven in [11]. Specifically, each iteration \( t \) performs the following step:

$$ \left( {{\tilde{\mathbf{W}}}_{k} } \right)_{t} = {\mathbf{\mathcal{J}}}_{{\lambda \tau_{{{\tilde{\mathbf{W}}}}} }} \left( {\left( {{\tilde{\mathbf{W}}}_{k} } \right)_{t - 1} - \tau_{{{\tilde{\mathbf{W}}}}} \nabla {\text{F}}\left( {\left( {{\tilde{\mathbf{W}}}_{k} } \right)_{t - 1} } \right)} \right), $$
(9)

where \( \tau_{{{\tilde{\mathbf{W}}}}} = 1/\sigma_{max} \left( {\beta \left[ {{\mathbf{X}}^{obs} ,1} \right]^{\text{T}} \left[ {{\mathbf{X}}^{obs} ,1} \right]} \right) \) denotes the gradient step size, \( \sigma_{max} \left( \cdot \right) \) denotes the maximal singular value of a matrix, \( {\mathbf{\mathcal{J}}}_{{\lambda \tau_{{{\tilde{\mathbf{W}}}}} }} \left( \cdot \right) \) denotes the proximal operator of the L21-norm [11], and \( \nabla {\text{F}}\left( {{\tilde{\mathbf{W}}} } \right) \) is the gradient of \( {\text{F}}\left( {{\tilde{\mathbf{W}}} } \right) \):

$$ {\text{F}}\left( {{\tilde{\mathbf{W}}} } \right) = \frac{\beta }{2}\left\| {\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} } \right)_{k} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} . $$
(10)
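
The proximal operator \( {\mathbf{\mathcal{J}}}_{{\lambda \tau_{{{\tilde{\mathbf{W}}}}} }} \left( \cdot \right) \) has a closed form (row-wise shrinkage). A minimal sketch of it and of the accelerated iteration in Eq. (9) follows; the FISTA-style momentum bookkeeping and the iteration budget are illustrative choices.

```python
import numpy as np

def prox_l21(W, thr):
    """Proximal operator of the L21-norm: shrink each row of W jointly."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - thr / np.maximum(norms, 1e-12))

def solve_W(Z, W, X_obs, d, beta, lam, T=100):
    """Nesterov-accelerated proximal gradient for Subproblem 2, Eq. (6)."""
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])  # [X_obs, 1]
    ZY = Z[:, d + 1:]                               # current soft labels (Z_dY)_k
    tau = 1.0 / (beta * np.linalg.norm(A, 2) ** 2)  # 1 / sigma_max(beta A^T A)
    V, t_prev = W.copy(), 1.0
    for _ in range(T):
        grad = beta * A.T @ (A @ V - ZY)            # gradient of F in Eq. (10)
        W_new = prox_l21(V - tau * grad, lam * tau)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        V = W_new + ((t_prev - 1.0) / t_new) * (W_new - W)  # momentum step
        W, t_prev = W_new, t_new
    return W
```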

Theoretically, for a jointly convex problem with separable non-smooth terms, Tseng [12] has shown that the block coordinate descent method is guaranteed to converge to a global optimum, as long as each subproblem is solvable. In our MIMC model, the objective function in Eq. (3) is jointly convex in \( {\mathbf{Z}} \) and \( {\tilde{\mathbf{W}}} \), and its non-smooth parts, i.e., \( \mu \left\|{\mathbf{Z}}\right\|_{ *} \) and \( \lambda\left\| {\tilde{\mathbf{W}}}\right\|_{2,1} \), are separable. Hence, our proposed optimization algorithm also enjoys provable convergence.

4 Results and Discussions

We evaluate the proposed MIMC model by jointly predicting MGMT and IDH1 statuses on our HGG dataset. Given the limited number of subjects (47), we use 10-fold cross-validation to obtain a relatively unbiased estimate of prediction performance on new testing subjects. We compare MIMC with widely used single-task machine learning methods (SVM with RBF kernel [13] and RF [14]) and state-of-the-art multi-task machine learning methods (Least_L21 [11] and MTMC [5]). All parameters involved in these methods are optimized via a nested 10-fold cross-validation procedure.
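
As a minimal sketch of this protocol for one task (`X`, `y`, and `fit_predict` are hypothetical stand-ins for the 1092-dim features, one task's {-1, +1} labels, and training MIMC on a fold; the actual experiments additionally handle subjects whose label for the other task is missing):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score

accs, aucs = [], []
for rep in range(20):                                  # 20 independent repetitions
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    for tr, te in cv.split(X, y):
        scores = fit_predict(X[tr], y[tr], X[te])      # soft labels for test fold
        accs.append(accuracy_score(y[te], np.sign(scores)))
        aucs.append(roc_auc_score(y[te], scores))
print(f"ACC {np.mean(accs):.3f}, AUC {np.mean(aucs):.3f}")
```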

We measure the prediction performance in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC). To avoid bias introduced by random dataset partitioning, each 10-fold cross-validation is independently repeated 20 times. The average results for MGMT and IDH1 status prediction are reported in Tables 2 and 3, respectively. The best results, and those not significantly worse than the best at the 95% confidence level, are highlighted in bold. Except that Least_L21 achieves slightly higher (but not statistically significant) specificity than MIMC in MGMT status prediction (70.75% vs. 70.00%), MIMC consistently outperforms SVM, RF, and MTMC on all performance metrics, which indicates that our proposed online inductive learning strategy improves prediction performance. In addition, all the multi-task methods consistently outperform the single-task RF method, but do not outperform the single-task SVM method in terms of ACC. We speculate that this is mainly due to the kernel trick of SVM, which implicitly performs a nonlinear feature mapping. In future work, we will extend the proposed MIMC model to a nonlinear version by employing the kernel trick to further improve MGMT and IDH1 status prediction.

Table 2. Performance comparison of different methods for MGMT status prediction.
Table 3. Performance comparison of different methods for IDH1 status prediction.

5 Conclusion

In this paper, we address the tasks of predicting MGMT and IDH1 statuses for HGG patients. Considering the strong correlation between MGMT promoter methylation and IDH1 mutation, we formulate the two prediction tasks jointly as a Multi-label Inductive Matrix Completion (MIMC) model, and design an optimization algorithm with provable convergence to solve it. Promising experimental results verify the advantages of the proposed MIMC model over widely used single- and multi-task classifiers. Moreover, for the first time, we show the feasibility of molecular biomarker prediction based on preoperative multi-modality neuroimaging and connectomics analysis.