1 Introduction

Gliomas account for approximately 45% of primary brain tumors. The most deadly gliomas are classified by the World Health Organization (WHO) as grades III and IV, referred to as high-grade gliomas (HGG). Studies have shown that O6-methylguanine-DNA methyltransferase promoter methylation (MGMT-m) and isocitrate dehydrogenase 1 mutation (IDH1-m) are two strong molecular indicators associated with better prognosis (i.e., better sensitivity to treatment and longer survival time), compared to their counterparts, i.e., the unmethylated MGMT promoter (MGMT-u) and wild-type IDH1 (IDH1-w) [1, 2]. To date, identification of MGMT and IDH1 statuses has become clinical routine, but it is conducted via invasive biopsy, which limits its wider clinical implementation. For better treatment planning, non-invasive and preoperative prediction of MGMT and IDH1 statuses is highly desired.

A few studies have predicted MGMT/IDH1 status from preoperative neuroimaging. For example, Korfiatis et al. extracted tumor texture features from a single T2-weighted MRI modality and trained a support vector machine (SVM) to predict MGMT status [1]. Yamashita et al. extracted both functional information (i.e., tumor blood flow) from perfusion MRI and structural features from T1-weighted MRI, and employed a nonparametric approach to predict IDH1 status [2]. Zhang et al. extracted more voxel- and histogram-based features from T1-weighted, T2-weighted, and diffusion-weighted images (DWI), and employed a random forest (RF) classifier to predict IDH1 status [3].

However, all these studies predict either MGMT or IDH1 status alone using a single-task machine learning technique, which ignores the potential relationship between these two molecular markers that may help each other achieve more accurate predictions [4]. It is thus desirable to use a multi-task learning approach to jointly predict the MGMT and IDH1 statuses. Meanwhile, in clinical practice, complete molecular pathological testing is not always conducted; in several cases only one biopsy-proven MGMT or IDH1 status is available, which leads to incomplete training labels, i.e., a missing-label problem. Traditional methods usually simply discard the subjects with incomplete labels, which further reduces the number of training samples. The recently proposed Multi-label Transductive Matrix Completion (MTMC) model is an important multi-task classification method that can make full use of samples with missing labels [5] and has produced good performance in many previous studies [5, 6]. However, it generalizes poorly to studies with a limited sample size due to its inherent overfitting, a problem from which many phenotype-genotype studies inevitably suffer.

To address the above limitations, we propose a novel Multi-label Inductive Matrix Completion (MIMC) model by introducing an online inductive learning strategy into the MTMC model. The solution of MIMC is non-trivial, since the model contains both the non-smooth nuclear-norm and L21-norm regularizers. Therefore, we design an optimization algorithm based on the block coordinate descent method to solve the MIMC model. Note that, in this paper, we do not adopt the commonly used radiomics features derived from T1- or T2-weighted structural MRI, but instead use connectomics features derived from both resting-state functional MRI (RS-fMRI) and diffusion tensor imaging (DTI). The motivation is that structural MRI-based radiomics features are highly affected by tumor characteristics (e.g., location and size) and thus vary significantly across subjects, which is undesirable for both group studies and individual-based classification. In contrast, brain connectome features extracted from RS-fMRI and DTI reflect the inherent brain connectivity architecture and its alterations due to the highly diffusive HGG, and thus could serve as more consistent and reliable imaging biomarkers.

2 Materials, Preprocessing, and Feature Extraction

Our dataset includes 63 HGG patients recruited between 2010 and 2015. Each subject has at least one biopsy-proven MGMT or IDH1 status. We exclude subjects without complete RS-fMRI or DTI data, or with significant imaging artifacts or excessive head motion. As a result, 47 HGG subjects are used in this paper. We summarize the subjects’ demographic and clinical information in Table 1. For simplicity, MGMT-m and IDH1-m are labeled as “positive”, and MGMT-u and IDH1-w as “negative”. This study was approved by the local hospital’s ethics committee.

Table 1. Demographic and clinical information of the subjects involved in this study.

In this study, all RS-fMRI and DTI data are collected preoperatively with the following parameters. RS-fMRI: TR (repetition time) = 2 s, number of acquisitions = 240 (8 min), and voxel size = 3.4 × 3.4 × 4 mm³. DTI: 20 directions, voxel size = 2 × 2 × 2 mm³, and number of acquisitions = 2. SPM8 and DPARSF [7] are used to preprocess the RS-fMRI data and construct brain functional networks. FSL and PANDA [8] are used to process the DTI data and construct brain structural networks. Multi-modality images are first co-registered within each subject and then registered to the atlas space. All processing procedures follow the commonly accepted pipeline [9]. Specifically, we parcellate each brain into 90 regions of interest (ROIs) using the Automated Anatomical Labeling (AAL) atlas. The parcellated ROIs of each subject are regarded as nodes in a graph, and the Pearson’s correlation coefficient between the blood oxygenation level dependent (BOLD) time series of each pair of ROIs is calculated as the functional connectivity strength of the corresponding edge. Similarly, the structural network is constructed from whole-brain DTI tractography by calculating the normalized number of tracked streamlines between each pair of AAL ROIs as the structural connectivity strength.
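
For concreteness, the network construction step can be sketched in a few lines of Python. This is a minimal sketch, not the DPARSF/PANDA pipeline itself: `roi_ts` is assumed to be a T × 90 array of preprocessed ROI-mean BOLD signals, and normalizing streamline counts by their maximum is one of several possible schemes.

```python
import numpy as np

def functional_connectivity(roi_ts):
    """Build a 90 x 90 functional network from ROI-mean BOLD signals.

    roi_ts: array of shape (T, 90), one preprocessed time series per AAL ROI.
    Returns the matrix of pairwise Pearson correlation coefficients,
    with self-connections on the diagonal zeroed out.
    """
    fc = np.corrcoef(roi_ts.T)   # Pearson correlation between all ROI pairs
    np.fill_diagonal(fc, 0.0)    # discard self-connections
    return fc

def structural_connectivity(streamline_counts):
    """Normalize a 90 x 90 matrix of tracked streamline counts from DTI
    tractography into structural connectivity strengths (dividing by the
    maximum count is an illustrative normalization scheme)."""
    sc = streamline_counts / max(streamline_counts.max(), 1)
    np.fill_diagonal(sc, 0.0)
    return sc
```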

After network construction, we use GRETNA [10] to extract various network properties based on graph-theoretic analysis, including degree, shortest path length, clustering coefficient, global efficiency, local efficiency, and nodal centrality. These network properties are extracted as connectomics features for each node of each network. We also use 12 clinical features for each subject, such as age, gender, tumor size, tumor WHO grade, and tumor location. In total, each subject has 1092 features (6 metrics × 2 networks × 90 regions + 12 clinical features).
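
GRETNA is a MATLAB toolbox; purely for illustration, the nodal metrics can be approximated in Python with NetworkX as below. The fixed binarization threshold, the specific nodal definitions of global/local efficiency, and the use of betweenness as the nodal centrality measure are our assumptions, not GRETNA’s exact settings.

```python
import networkx as nx
import numpy as np

def nodal_features(conn, threshold=0.2):
    """Six nodal graph metrics from a 90 x 90 connectivity matrix.

    The fixed binarization threshold is an illustrative choice; GRETNA
    supports several thresholding schemes. Returns an array (90, 6).
    """
    g = nx.from_numpy_array((np.abs(conn) > threshold).astype(int))
    spl = dict(nx.all_pairs_shortest_path_length(g))
    bc = nx.betweenness_centrality(g)          # nodal centrality
    feats = []
    for v in g.nodes:
        dists = [l for u, l in spl[v].items() if u != v]
        neigh = list(g.neighbors(v))
        feats.append([
            g.degree[v],                                          # degree
            np.mean(dists) if dists else 0.0,                     # shortest path length
            nx.clustering(g, v),                                  # clustering coefficient
            np.mean([1.0 / l for l in dists]) if dists else 0.0,  # nodal (global) efficiency
            nx.global_efficiency(g.subgraph(neigh)) if len(neigh) > 1 else 0.0,  # local efficiency
            bc[v],                                                # betweenness centrality
        ])
    return np.array(feats)

# Per-subject feature vector, as used in this paper:
# 6 metrics x 2 networks x 90 ROIs + 12 clinical features = 1092
# x = np.concatenate([nodal_features(fc).ravel(),
#                     nodal_features(sc).ravel(), clinical])
```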

3 MIMC-Based MGMT and IDH1 Status Prediction

We first introduce the notations used in this paper. \( {\mathbf{X}}^{\left( i \right)} \) denotes the \( i \)-th column of matrix \( {\mathbf{X}} \), and \( {\mathbf{x}}_{ij} \) denotes the element in the \( i \)-th row and \( j \)-th column of \( {\mathbf{X}} \). \( 1 \) denotes the all-one column vector, and \( {\mathbf{X}}^{\text{T}} \) denotes the transpose of matrix \( {\mathbf{X}} \). \( {\mathbf{X}}_{train} = \left[ {{\mathbf{x}}_{1} , \cdots ,{\mathbf{x}}_{m} } \right]^{\text{T}} \in {\mathbb{R}}^{m \times d} \) and \( {\mathbf{X}}_{test} = \left[ {{\mathbf{x}}_{m + 1} , \cdots ,{\mathbf{x}}_{m + n} } \right]^{\text{T}} \in {\mathbb{R}}^{n \times d} \) denote the feature matrices associated with \( m \) training subjects and \( n \) testing subjects, respectively. Assume there are \( t \) binary classification tasks, and let \( {\mathbf{Y}}_{train} = \left[ {{\mathbf{y}}_{1} , \cdots ,{\mathbf{y}}_{m} } \right]^{\text{T}} \in \left\{ { - 1,1,?} \right\}^{m \times t} \) and \( {\mathbf{Y}}_{test} = \left[ {{\mathbf{y}}_{m + 1} , \cdots ,{\mathbf{y}}_{m + n} } \right]^{\text{T}} \in \left\{ ? \right\}^{n \times t} \) denote the label matrices associated with the \( m \) training subjects and \( n \) testing subjects, where ‘\( ? \)’ denotes an unknown label. For convenience, let \( {\mathbf{X}}^{obs} = \left[ {{\mathbf{X}}_{train} ;{\mathbf{X}}_{test} } \right] \), \( {\mathbf{Y}}^{obs} = \left[ {{\mathbf{Y}}_{train} ;{\mathbf{Y}}_{test} } \right] \), and \( {\mathbf{Z}}^{obs} = \left[ {{\mathbf{X}}^{obs} ,1,{\mathbf{Y}}^{obs} } \right] \) denote the observed feature matrix, label matrix, and stacked matrix, respectively. Let \( {\mathbf{X}}^{0} \in {\mathbb{R}}^{{\left( {m + n} \right) \times d}} \) denote the underlying noise-free feature matrix corresponding to \( {\mathbf{X}}^{obs} \), and let \( {\mathbf{Y}}^{0} \in {\mathbb{R}}^{{\left( {m + n} \right) \times t}} \) denote the underlying soft label matrix, with \( {\mathbf{sign}}\left( {{\mathbf{Y}}^{0} } \right) \) the underlying label matrix corresponding to \( {\mathbf{Y}}^{obs} \), where \( {\mathbf{sign}}\left( \cdot \right) \) is the element-wise sign function.
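
As a minimal NumPy sketch of this notation (the sizes and the random placeholder data are illustrative only; ‘\( ? \)’ entries are encoded as NaN):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d, t = 37, 10, 1092, 2                 # illustrative sizes, not the paper's split
X_train, X_test = rng.standard_normal((m, d)), rng.standard_normal((n, d))
Y_train = rng.choice([-1.0, 1.0, np.nan], size=(m, t))   # some labels missing ('?')

X_obs = np.vstack([X_train, X_test])                     # (m + n) x d
Y_obs = np.vstack([Y_train, np.full((n, t), np.nan)])    # test labels all unknown
Z_obs = np.hstack([X_obs, np.ones((m + n, 1)), Y_obs])   # stacked matrix [X, 1, Y]
Omega_Y = np.argwhere(~np.isnan(Y_obs))                  # observed label subscripts
```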

3.1 Multi-label Transductive Matrix Completion (MTMC)

MTMC is a well-known multi-label matrix completion model built on two assumptions. First, a linear relationship is assumed between \( {\mathbf{X}}^{0} \) and \( {\mathbf{Y}}^{0} \), i.e., \( {\mathbf{Y}}^{0} = \left[ {{\mathbf{X}}^{0} ,1} \right]{\mathbf{W}} \), where \( {\mathbf{W}} \in {\mathbb{R}}^{{\left( {d + 1} \right) \times t}} \) is the implicit weight matrix. Second, \( {\mathbf{X}}^{0} \) is assumed to be low-rank. Let \( {\mathbf{Z}}^{0} = \left[ {{\mathbf{X}}^{0} ,1,{\mathbf{Y}}^{0} } \right] \) denote the underlying stacked matrix corresponding to \( {\mathbf{Z}}^{obs} \); then, from \( {\text{rank}}\left( {{\mathbf{Z}}^{0} } \right) \le {\text{rank}}\left( { {\mathbf{X}}^{0} } \right) + 1 \), we can infer that \( {\mathbf{Z}}^{0} \) is also low-rank. The goal of MTMC is to estimate \( {\mathbf{Z}}^{0} \) given \( {\mathbf{Z}}^{obs} \). In real applications, where \( {\mathbf{Z}}^{obs} \) is contaminated by noise, MTMC is formulated as:

$$ \mathop {\hbox{min} }\nolimits_{{{\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1}} \mu \left\| {\mathbf{Z}} \right\|_{*} + \frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} , $$
(1)

where \( {\mathbf{Z}} = \left[ {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} ,{\mathbf{Z}}^{{\left( {d + 1} \right)}} ,{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} } \right] \) denotes the matrix to be optimized, \( {\mathbf{Z}}_{{\Delta {\mathbf{X}}}} \) denotes the noise-free feature submatrix, \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} \) denotes the soft label submatrix, \( \Omega_{{\mathbf{Y}}} \) denotes the set of subscripts of the observed entries in \( {\mathbf{Y}}^{obs} \), \( \left\| \cdot \right\|_{*} \) denotes the nuclear norm, \( \left\| \cdot \right\|_{F} \) denotes the Frobenius norm, and \( {\mathbb{C}}_{y} \left( { \cdot , \cdot } \right) \) denotes the logistic loss function. Once the optimal \( {\mathbf{Z}}^{opt} \) is found, the labels \( {\mathbf{Y}}_{test} \) of the testing subjects can be estimated by \( {\mathbf{sign}}\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} } \right) \), where \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} \) denotes the optimal soft labels of the testing subjects. From the formulation of MTMC, \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} \) is implicitly obtained as \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} = \left[ {{\mathbf{X}}_{test}^{opt} ,1} \right]{\mathbf{W}}^{opt} \), where \( {\mathbf{X}}_{test}^{opt} \) is the optimal noise-free counterpart of \( {\mathbf{X}}_{test} \), and \( {\mathbf{W}}^{opt} \) is the optimal estimate of \( {\mathbf{W}} \). Although \( {\mathbf{W}}^{opt} \) is not explicitly computed, it is implicitly determined by the training subjects and their known labels (i.e., through the third term of Eq. (1)). Therefore, for multi-label classification tasks with insufficient training subjects, as in our case, MTMC still suffers from inherent overfitting.

3.2 Multi-label Inductive Matrix Completion (MIMC)

To alleviate this overfitting, we employ an online inductive learning strategy to modify the MTMC model, and name the modified model the Multi-label Inductive Matrix Completion (MIMC) model. Specifically, we introduce an explicit predictor matrix \( {\tilde{\mathbf{W}}} \in {\mathbb{R}}^{{\left( {d + 1} \right) \times t}} \) into MTMC by adding the following objective to Eq. (1):

$$ \mathop {\hbox{min} }\nolimits_{{{\tilde{\mathbf{W}}}}} \,\lambda \left\| {{\tilde{\mathbf{W}}}} \right\|_{2,1} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} , $$
(2)

where \( \left\| \cdot \right\|_{2,1} \) denotes the L21-norm, which imposes row sparsity on \( {\tilde{\mathbf{W}}} \) to learn representations shared across all related classification tasks by selecting the common discriminative features. Note also that, in the second term of Eq. (2), we use all subjects (including the testing subjects) to learn the sparse predictor matrix \( {\tilde{\mathbf{W}}} \) based on the transductive soft labels \( {\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} \). In other words, we leverage the testing subjects as an efficient supplement to the limited training subjects, thus alleviating the small-sample-size issue that often causes overfitting when training the classifier. The final MIMC model is given as:

$$ \mathop {\hbox{min} }\nolimits_{{\begin{array}{*{20}c} {{\mathbf{Z}}, {\tilde{\mathbf{W}}}} \\ {{\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1} \\ \end{array} }} \left\{ {\begin{array}{*{20}c} {\mu \left\| {\mathbf{Z}} \right\|_{*} + \frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} } \\ { + \lambda \left\| {{\tilde{\mathbf{W}}}} \right\|_{2,1} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} } \\ \end{array} } \right\}. $$
(3)

In this way, we can obtain the optimal sparse predictor matrix \( {\tilde{\mathbf{W}}}^{opt} \) by using our proposed optimization algorithm in Sect. 3.3 below, and estimate the labels \( {\mathbf{Y}}_{test} \) of the testing subjects \( {\mathbf{X}}_{test} \) by induction:

$$ {\mathbf{Y}}_{test} = {\mathbf{sign}}\left( {\left[ {{\mathbf{X}}_{test} ,1} \right]{\tilde{\mathbf{W}}}^{opt} } \right). $$
(4)

Compared with the overfitting-prone transductive labels \( {\mathbf{sign}}\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}_{test} }}^{opt} } \right) \), the inductive labels in Eq. (4), which are learned from more subjects (by including the testing subjects) and benefit from joint feature selection (via the L21-norm), give more robust predictions and thus suffer less from the small-sample-size issue.

3.3 Optimization Algorithm for MIMC

The solution of MIMC is non-trivial, as Eq. (3) contains the all-one-column constraint (i.e., \( {\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1 \)) and two non-smooth penalties, the nuclear norm and the L21-norm. Here, we employ the block coordinate descent method to design an optimization algorithm for solving MIMC. The key step of this algorithm is to iteratively optimize the following two subproblems:

$$ {\mathbf{Z}}_{k} = { \arg }\mathop {\hbox{min} }\nolimits_{{{\mathbf{Z}}^{{\left( {d + 1} \right)}} = 1}} \left\{ {\begin{array}{*{20}c} {\frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} } \\ { + \mu \left\| {\mathbf{Z}} \right\|_{ *} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}_{k - 1} } \right\|_{F}^{2} } \\ \end{array} } \right\}, $$
(5)
$$ {\tilde{\mathbf{W}}}_{k} = { \arg }\mathop {\hbox{min} }\nolimits_{{{\tilde{\mathbf{W}}} }} \lambda \left\| {{\tilde{\mathbf{W}}}} \right\|_{2,1} + \frac{\beta }{2}\left\| {\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} } \right)_{k} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} . $$
(6)
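
The overall alternating scheme can be sketched as follows (a minimal Python sketch; the iteration budget, stopping rule, and the `solve_Z`/`solve_W` interfaces standing for the two inner solvers detailed below are illustrative choices):

```python
import numpy as np

def mimc_bcd(Z_obs, d, solve_Z, solve_W, K=50, tol=1e-5):
    """Block coordinate descent for MIMC (Eq. 3): alternate Eqs. (5) and (6)."""
    t = Z_obs.shape[1] - d - 1           # number of tasks
    Z = np.nan_to_num(Z_obs)             # initialize unknown '?' entries to 0
    W = np.zeros((d + 1, t))             # sparse predictor matrix W~
    for k in range(K):
        Z_prev = Z.copy()
        Z = solve_Z(Z, W)                # Subproblem 1, Eq. (5)
        Z[:, d] = 1.0                    # enforce the all-one column constraint
        W = solve_W(Z, W)                # Subproblem 2, Eq. (6)
        if np.linalg.norm(Z - Z_prev) / max(np.linalg.norm(Z_prev), 1.0) < tol:
            break                        # stop once Z stabilizes
    return Z, W

# Inductive prediction, Eq. (4):
# Y_test = np.sign(np.hstack([X_test, np.ones((n, 1))]) @ W)
```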

We solve Subproblem 1 in Eq. (5) by employing the Fixed Point Continuation (FPC) method combined with a projection step, whose convergence was proven by Cabral et al. [6]. Specifically, each iteration \( t \) consists of two steps:

$$ \left\{ {\begin{array}{*{20}l} {\left( {{\mathbf{Z}}_{k} } \right)_{t} = {\mathbf{\mathcal{D}}}_{{\mu \tau_{{\mathbf{Z}}} }} \left( {\left( {{\mathbf{Z}}_{k} } \right)_{t - 1} - \tau_{{\mathbf{Z}}} \nabla {\text{G}}\left( {\left( {{\mathbf{Z}}_{k} } \right)_{t - 1} } \right)} \right) } \hfill \\ { \left( {\left( {{\mathbf{Z}}_{k} } \right)_{t} } \right)^{{\left( {d + 1} \right)}} = 1 } \hfill \\ \end{array} } \right., $$
(7)

where \( \tau_{{\mathbf{Z}}} = { \hbox{min} }\left\{ {1,4/\sqrt {32\beta^{2} + 2\gamma^{2} } } \right\} \) denotes the gradient step size, \( {\mathbf{\mathcal{D}}}_{{\mu \tau_{{\mathbf{Z}}} }} \left( \cdot \right) \) denotes the proximal operator of the nuclear norm [6], and \( \nabla {\text{G}}\left( {\mathbf{Z}} \right) \) is the gradient of \( {\text{G}}\left( {\mathbf{Z}} \right) \):

$$ {\text{G}}\left( {\mathbf{Z}} \right) = \left\{ {\begin{array}{*{20}c} {\frac{1}{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{X}}}} - {\mathbf{X}}^{obs} } \right\|_{F}^{2} + \frac{\beta }{2}\left\| {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}_{k - 1} } \right\|_{F}^{2} } \\ { + \gamma \sum\nolimits_{{\left( {i,j} \right) \in\Omega _{{\mathbf{Y}}} }} {{\mathbb{C}}_{y} \left( {{\mathbf{z}}_{{i\left( {d + 1 + j} \right)}} ,{\mathbf{y}}_{ij}^{obs} } \right)} } \\ \end{array} } \right\}. $$
(8)
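
For illustration, one FPC update can be written in NumPy as below; the singular-value soft-thresholding form of \( {\mathbf{\mathcal{D}}}_{{\mu \tau_{{\mathbf{Z}}} }} \left( \cdot \right) \) is standard [6], while the NaN encoding of unobserved labels and all variable names are our illustrative choices.

```python
import numpy as np

def svt(A, thr):
    """Proximal operator of the nuclear norm: singular value soft-thresholding."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - thr, 0.0)) @ Vt

def grad_G(Z, W, X_obs, Y_obs, d, beta, gamma):
    """Gradient of the smooth part G(Z) in Eq. (8); NaN in Y_obs marks '?'."""
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])   # [X_obs, 1]
    G = np.zeros_like(Z)
    G[:, :d] = Z[:, :d] - X_obs                   # feature-fitting term
    G[:, d + 1:] = beta * (Z[:, d + 1:] - A @ W)  # coupling with the predictor
    mask = ~np.isnan(Y_obs)
    y, z = Y_obs[mask], Z[:, d + 1:][mask]
    # logistic loss gradient -y / (1 + exp(y z)) on observed label entries
    G[:, d + 1:][mask] += gamma * (-y / (1.0 + np.exp(y * z)))
    return G

# One FPC iteration of Eq. (7):
# Z = svt(Z - tau_Z * grad_G(Z, W, X_obs, Y_obs, d, beta, gamma), mu * tau_Z)
# Z[:, d] = 1.0    # re-project onto the all-one column constraint
```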

Subproblem 2 in Eq. (6) is a standard L21-norm-regularized problem, which can be solved via Nesterov's accelerated method, whose convergence is proven in [11]. Specifically, each iteration \( t \) performs the following step:

$$ \left( {{\tilde{\mathbf{W}}}_{k} } \right)_{t} = {\mathbf{\mathcal{J}}}_{{\lambda \tau_{{{\tilde{\mathbf{W}}}}} }} \left( {\left( {{\tilde{\mathbf{W}}}_{k} } \right)_{t - 1} - \tau_{{{\tilde{\mathbf{W}}}}} \nabla {\text{F}}\left( {\left( {{\tilde{\mathbf{W}}}_{k} } \right)_{t - 1} } \right)} \right), $$
(9)

where \( \tau_{{{\tilde{\mathbf{W}}}}} = 1/\sigma_{max} \left( {\beta \left[ {{\mathbf{X}}^{obs} ,1} \right]^{\text{T}} \left[ {{\mathbf{X}}^{obs} ,1} \right]} \right) \) denotes the gradient step size, \( \sigma_{max} \left( \cdot \right) \) denotes the maximal singular value of a matrix, \( {\mathbf{\mathcal{J}}}_{{\lambda \tau_{{{\tilde{\mathbf{W}}}}} }} \left( \cdot \right) \) denotes the proximal operator of the L21-norm [11], and \( \nabla {\text{F}}\left( {{\tilde{\mathbf{W}}} } \right) \) is the gradient of \( {\text{F}}\left( {{\tilde{\mathbf{W}}} } \right) \):

$$ {\text{F}}\left( {{\tilde{\mathbf{W}}} } \right) = \frac{\beta }{2}\left\| {\left( {{\mathbf{Z}}_{{\Delta {\mathbf{Y}}}} } \right)_{k} - \left[ {{\mathbf{X}}^{obs} ,1} \right]{\tilde{\mathbf{W}}}} \right\|_{F}^{2} . $$
(10)
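
The proximal operator \( {\mathbf{\mathcal{J}}}_{{\lambda \tau_{{{\tilde{\mathbf{W}}}}} }} \left( \cdot \right) \) has a closed form (row-wise shrinkage). A minimal sketch of it and of the accelerated iteration in Eq. (9) follows; the FISTA-style momentum bookkeeping and the iteration budget are illustrative choices.

```python
import numpy as np

def prox_l21(W, thr):
    """Proximal operator of the L21-norm: shrink each row of W jointly."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.maximum(0.0, 1.0 - thr / np.maximum(norms, 1e-12))

def solve_W(Z, W, X_obs, d, beta, lam, T=100):
    """Nesterov-accelerated proximal gradient for Subproblem 2, Eq. (6)."""
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])  # [X_obs, 1]
    ZY = Z[:, d + 1:]                               # current soft labels (Z_dY)_k
    tau = 1.0 / (beta * np.linalg.norm(A, 2) ** 2)  # 1 / sigma_max(beta A^T A)
    V, t_prev = W.copy(), 1.0
    for _ in range(T):
        grad = beta * A.T @ (A @ V - ZY)            # gradient of F in Eq. (10)
        W_new = prox_l21(V - tau * grad, lam * tau)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        V = W_new + ((t_prev - 1.0) / t_new) * (W_new - W)  # momentum step
        W, t_prev = W_new, t_new
    return W
```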

Theoretically, for a jointly convex problem with separable non-smooth terms, Tseng [12] has shown that the block coordinate descent method is guaranteed to converge to a global optimum, as long as each subproblem is solvable. In our MIMC model, the objective function in Eq. (3) is jointly convex in \( {\mathbf{Z}} \) and \( {\tilde{\mathbf{W}}} \), and its non-smooth parts, i.e., \( \mu \left\|{\mathbf{Z}}\right\|_{ *} \) and \( \lambda\left\| {\tilde{\mathbf{W}}}\right\|_{2,1} \), are separable. Hence, our proposed optimization algorithm also enjoys provable convergence.

4 Results and Discussions

We evaluate the proposed MIMC model by jointly predicting MGMT and IDH1 statuses on our HGG dataset. Given the limited number of subjects (47), we use 10-fold cross-validation to obtain a relatively unbiased estimate of prediction performance on new testing subjects. We compare MIMC with widely used single-task machine learning methods (SVM with RBF kernel [13] and RF [14]) and state-of-the-art multi-task machine learning methods (Least_L21 [11] and MTMC [5]). All parameters involved in these methods are optimized via a nested 10-fold cross-validation procedure.
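
As a minimal sketch of this protocol for one task (`X`, `y`, and `fit_predict` are hypothetical stand-ins for the 1092-dim features, one task's {-1, +1} labels, and training MIMC on a fold; the actual experiments additionally handle subjects whose label for the other task is missing):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score

accs, aucs = [], []
for rep in range(20):                                  # 20 independent repetitions
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    for tr, te in cv.split(X, y):
        scores = fit_predict(X[tr], y[tr], X[te])      # soft labels for test fold
        accs.append(accuracy_score(y[te], np.sign(scores)))
        aucs.append(roc_auc_score(y[te], scores))
print(f"ACC {np.mean(accs):.3f}, AUC {np.mean(aucs):.3f}")
```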

We measure the prediction performance in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC). To avoid bias introduced by random dataset partitioning, each 10-fold cross-validation is independently repeated 20 times. The average results for MGMT and IDH1 status prediction are reported in Tables 2 and 3, respectively. The best results, and those not significantly worse than the best at the 95% confidence level, are highlighted in bold. Except that Least_L21 achieves slightly higher (but not statistically significant) specificity than MIMC in MGMT status prediction (70.75% vs. 70.00%), MIMC consistently outperforms SVM, RF, and MTMC on all performance metrics, which indicates that our proposed online inductive learning strategy improves prediction performance. In addition, all the multi-task methods consistently outperform the single-task RF method, but do not outperform the single-task SVM method in terms of ACC. We speculate that this is mainly due to the kernel trick of SVM, which implicitly performs a nonlinear feature mapping. In future work, we will extend the proposed MIMC model to a nonlinear version by employing the kernel trick to further improve MGMT and IDH1 status prediction.

Table 2. Performance comparison of different methods for MGMT status prediction.
Table 3. Performance comparison of different methods for IDH1 status prediction.

5 Conclusion

In this paper, we address the tasks of predicting MGMT and IDH1 statuses for HGG patients. Considering the strong correlation between MGMT promoter methylation and IDH1 mutation, we formulate the two prediction tasks jointly as a Multi-label Inductive Matrix Completion (MIMC) model, and design an optimization algorithm with provable convergence to solve it. Promising experimental results verify the advantages of the proposed MIMC model over widely used single- and multi-task classifiers. Moreover, for the first time, we show the feasibility of molecular biomarker prediction based on preoperative multi-modality neuroimaging and connectomics analysis.