1 Introduction

Studies of resting-state functional magnetic resonance imaging (rs-fMRI) have shown that the brain functional connection network (BFCN) exhibits disease-related changes [1]. A number of BFCN strategies have been proposed for the diagnosis and analysis of mental illness. Most BFCN-based disease diagnosis strategies first construct a brain network based on the correlations between pairwise brain areas, and then apply feature selection and classification algorithms.

Currently, most BFCN construction methods only consider the association between pairs of brain areas or voxels, i.e., first-order BFCN construction methods, such as the method based on the Pearson correlation (PC) coefficient [2]. The first-order BFCN is robust but not sensitive to small signal changes. Other BFCN construction methods consider the association among multiple brain areas, namely high-order BFCN construction methods. For example, Guo et al. [3] argue that the correlation between a pair of brain areas may be affected by a third brain area, and propose a method to eliminate this effect through partial correlation. High-order BFCN construction methods can capture subtle changes between brain networks, but they lack robustness. Zhu et al. [4] proposed a hybrid first-order/second-order BFCN construction method that performs a simple weighted combination of first-order and second-order brain networks. However, before the classification step, no machine learning technique such as feature selection is applied to process the brain network data.

In the field of machine learning, researchers have proposed various methods for processing multi-modality data. These methods may provide a theoretical basis for combining first-order and second-order BFCN information. In fact, although there appears to be no multi-modality work that integrates first-order and second-order BFCNs, several multi-modality methods have been proposed and applied to the diagnosis of brain diseases. For example, Huang et al. [5] proposed a sparse composite linear discrimination analysis model to jointly identify disease-related brain features from multi-modality data. Zhang et al. [6] proposed a multi-modality multi-task method for joint feature selection and classification of Alzheimer's disease data. However, these methods ignore the structural information of multi-modality data and do not consider the problem of data noise.

In view of the above problems, in this paper we propose a multi-modality low-rank representation learning framework to fuse first-order and second-order BFCN information, and apply it to the diagnosis of schizophrenia. Our contributions mainly include the following two points:

  1. We extract intrinsic structural information through a low-rank constraint, embed the correlation of multi-modality data into the learning model, and encourage cooperation between the first-order and second-order BFCNs by incorporating an ideal representation term, thereby obtaining better diagnostic performance.

  2. To the best of our knowledge, this is the first work to combine first-order and second-order BFCN information through a multi-modality learning strategy.

2 Background

2.1 First-Order and Second-Order Brain Functional Connection Network

At present, BFCN analysis has been widely used in the diagnosis of mental diseases, and constructing the brain functional network is the core step of BFCN analysis. BFCN construction methods can be divided into low-order and high-order methods: low-order methods are highly robust, while high-order methods are usually more sensitive to subtle changes in signals.

The most common low-order method is the BFCN construction method based on the PC coefficient, which reveals the first-order relationships between brain areas by calculating the correlation coefficient of each pair of brain areas. Let \( x_{i} \) and \( x_{j} \) represent the time series of a pair of brain areas; the PC coefficient can be calculated by the following formula:

$$ C_{ij}^{1} = \frac{{Cov\left( {x_{i} , x_{j} } \right)}}{{\sqrt {Var\left( {x_{i} } \right)Var\left( {x_{j} } \right)} }} $$
(1)

where \( Cov\left( { \cdot , \cdot } \right) \) is a function for calculating the covariance, and \( Var\left( \cdot \right) \) is a function for calculating the variance.
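As a concrete illustration, the following Python sketch computes the PC-based first-order BFCN of Eq. (1) from an ROI-by-time matrix of BOLD signals (the function and variable names are ours, not from the original work):

```python
import numpy as np

def first_order_bfcn(ts):
    """PC-based first-order BFCN of Eq. (1).

    ts: array of shape (n_rois, n_timepoints), one BOLD time series per brain area.
    Returns the (n_rois, n_rois) matrix with entries C1[i, j] = PC(x_i, x_j).
    """
    # np.corrcoef treats each row as a variable, which is exactly
    # Cov(x_i, x_j) / sqrt(Var(x_i) * Var(x_j)).
    return np.corrcoef(ts)

# Example: 90 brain areas (e.g., an AAL-style parcellation) and 200 time points.
rng = np.random.default_rng(0)
C1 = first_order_bfcn(rng.standard_normal((90, 200)))
```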

In our previous work, we proposed a triplet-based second-order BFCN strategy [8] for extracting high-order information from brain areas. Specifically, let the triplet \( \left( {x_{i} ,x_{u} ,x_{v} } \right) \) consist of \( x_{i} \) and its neighbors \( x_{u} \) and \( x_{v} \). \( S_{uv}^{i} \) defines the distance between \( x_{i} \) and \( x_{v} \) relative to \( x_{u} \):

$$ S_{uv}^{i} = dist\left( {x_{i} ,x_{v} } \right) - dist\left( {x_{i} ,x_{u} } \right) $$
(2)

where \( dist\left( { \cdot , \cdot } \right) \) calculates the squared Euclidean distance.

The triplet-based second-order BFCN strategy takes into account that a brain area usually interacts with its neighbors rather than with distant brain areas, so it considers second-order information only among neighbors. Let \( N_{i} \) be the index set of the \( k \) nearest neighbors of \( x_{i} \); then, relative to all \( k \) nearest neighbors of \( x_{i} \), the distance between \( x_{i} \) and \( x_{j} \) can be expressed as \( \sum\limits_{{u \in N_{i} }} {S_{uj}^{i} } \) (i.e., setting \( v = j \) in Eq. (2)). Thus, the triplet-based second-order BFCN can be expressed as:

$$ C_{ij}^{2} = \left\{ {\begin{array}{*{20}l} {norm\left( { - \mathop \sum \limits_{{u \in N_{i} }} S_{uj}^{i} } \right),\left( {j \in N_{i} } \right)} \hfill \\ {0,\left( {j \notin N_{i} } \right)} \hfill \\ \end{array} } \right. $$
(3)

where \( norm\left( \cdot \right) \) is a function that normalizes the data.
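A compact sketch of Eqs. (2)–(3) is given below. The choice of \( norm\left( \cdot \right) \) is not specified above, so the min–max scaling used here is our assumption, as is the use of ROI time series for the distance computation:

```python
import numpy as np

def second_order_bfcn(ts, k=10):
    """Triplet-based second-order BFCN of Eqs. (2)-(3) (illustrative sketch).

    ts: array of shape (n_rois, n_timepoints).
    """
    n = ts.shape[0]
    # dist(x_i, x_j): squared Euclidean distance between ROI time series (Eq. 2).
    diff = ts[:, None, :] - ts[None, :, :]
    d = np.sum(diff ** 2, axis=2)

    C2 = np.zeros((n, n))
    for i in range(n):
        Ni = np.argsort(d[i])[1:k + 1]   # k nearest neighbors of x_i, excluding itself
        # sum_{u in N_i} S_{uj}^i = k * dist(i, j) - sum_{u in N_i} dist(i, u)
        s = k * d[i, Ni] - d[i, Ni].sum()
        w = -s                           # the minus sign in Eq. (3)
        span = w.max() - w.min()
        C2[i, Ni] = (w - w.min()) / span if span > 0 else 0.0  # assumed min-max norm(.)
    return C2
```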

2.2 Low-Rank Representation

Recently, low-rank representation (LRR) has performed very well in feature extraction and subspace learning [7]. For a given set of samples, LRR seeks the low-rank components of all samples with respect to the bases of a dictionary, so that the data can be represented as a linear combination of those bases [9].

Let \( \varvec{X} = \left[ {\varvec{x}_{1} , \varvec{x}_{2} , \ldots , \varvec{x}_{n} } \right] \in R^{d \times n} \) denote a set of data vectors, each column of which can be represented by a linear combination of the bases in the dictionary \( \varvec{A} = \left[ {\varvec{a}_{1} , \varvec{a}_{2} , \ldots , \varvec{a}_{m} } \right] \):

$$ \varvec{X} = \varvec{AZ} $$
(4)

where \( \varvec{Z} = \left[ {\varvec{z}_{1} , \varvec{z}_{2} , \ldots , \varvec{z}_{n} } \right] \) is a coefficient matrix, and \( \varvec{z}_{i} \) is the representation coefficient vector of \( \varvec{x}_{i} \).

Considering that the dictionary \( \varvec{A} \) is usually over-complete, so that Eq. (4) admits multiple solutions, LRR seeks a low-rank solution by solving the following problem [10]:

$$ \mathop {min }\limits_{\varvec{Z}} \left\| { \varvec{Z}} \right\|_{ *} \;\;\;s.t. \varvec{X} = \varvec{AZ} $$
(5)

where \( \left\| \cdot \right\|_{ *} \) represents the nuclear norm of the matrix [12].
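For small problems, Eq. (5) can be solved directly with an off-the-shelf convex solver, which is a convenient way to see the low-rank effect. The sketch below assumes cvxpy with its bundled SCS solver; it is a sanity check, not the optimization used in Sect. 3:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 20, 15, 30
A = rng.standard_normal((d, m))           # dictionary A
X = A @ rng.standard_normal((m, n))       # data constructed to lie in span(A)

Z = cp.Variable((m, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(Z)), [X == A @ Z])  # problem (5)
prob.solve(solver=cp.SCS)
print("rank of the recovered Z:", np.linalg.matrix_rank(Z.value, tol=1e-4))
```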

2.3 Materials

In this study, three schizophrenia rs-fMRI datasets were used: the Center for Biomedical Research Excellence (COBRE) dataset (53 patients and 67 normal controls), the Nottingham dataset (32 patients and 36 normal controls), and the Xiangya dataset (83 patients and 60 normal controls). For detailed information on the subjects, image acquisition, data preprocessing, and anatomical parcellation, please refer to [4].

3 Method

3.1 Proposed Method

Let \( \varvec{X}_{k} = \left[ {\varvec{x}_{k,1} ,\varvec{x}_{k,2} , \ldots ,\varvec{x}_{k,n} } \right] \in R^{m \times n} \) denote the k-th modality of the training data. Assuming that \( \varvec{X}_{k} \) contains \( c \) classes, \( \varvec{X}_{k} \) can be divided into \( c \) subsets, expressed as \( \varvec{X}_{k} = \left\{ {\varvec{X}_{k}^{1} ,\varvec{X}_{k}^{2} , \ldots ,\varvec{X}_{k}^{c} } \right\} \). In the context of this study, \( \varvec{X}_{1} \) represents the first-order BFCN, and \( \varvec{X}_{2} \) represents the second-order BFCN. Taking \( \varvec{X}_{k} \) itself as the dictionary, \( \varvec{X}_{k} \) can be re-represented as \( \varvec{X}_{k} = \varvec{X}_{k} \varvec{Z}_{k} + \varvec{E}_{k} \), where \( \varvec{Z}_{k} = \left[ {\varvec{z}_{k,1} ,\varvec{z}_{k,2} , \ldots ,\varvec{z}_{k,n} } \right] \in R^{n \times n} \) is the LRR of \( \varvec{X}_{k} \) and \( \varvec{E}_{k} \) denotes the sparse noise matrix. To include the block-diagonal structure information in the learning process and enhance the cooperation between the two modalities, we introduce the ideal representation regularization term \( \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} \), where \( \varvec{P} = block\left( {\varvec{p}_{1} ,\varvec{p}_{2} , \ldots ,\varvec{p}_{c} } \right) \), \( \varvec{p}_{i} = ones\left( {n_{i} ,n_{i} } \right) \) is the code for \( \varvec{X}^{i} \), and \( n_{i} \) is the number of samples in \( \varvec{X}^{i} \). That is, if \( \varvec{x}_{k,j} \) belongs to class \( i \), then the coefficients in \( \varvec{p}_{i} \) are all 1s, while those elsewhere are all 0s. Thus, we have the following objective function:

$$ \mathop {\hbox{min} }\nolimits_{\varvec{Z}} \sum\nolimits_{k = 1}^{2} {\left( {\left\| {\varvec{Z}_{k} } \right\|_{*} + \beta \left\| {\varvec{Z}_{k} } \right\|_{1} + \gamma \left\| {\varvec{E}_{k} } \right\|_{2, 1} } \right)} + \lambda \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} \,\;\;\;\;\;{\text{s}}.{\text{t}}. \;\varvec{X}_{k} = \varvec{X}_{k} \varvec{Z}_{k} + \varvec{E}_{k} $$
(6)

where \( \left\| \cdot \right\|_{1} \) denotes the l1-norm, \( \left\| \cdot \right\|_{2, 1} \) denotes the l2,1-norm, \( \left\| \cdot \right\|_{F} \) denotes the Frobenius norm, and \( \beta \), \( \gamma \), \( \lambda \) are hyperparameters used to balance the different parts of the function.
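The target matrix \( \varvec{P} \) is easy to construct once the training samples are grouped class by class; a minimal sketch (the helper name is ours):

```python
import numpy as np
from scipy.linalg import block_diag

def ideal_representation(n_per_class):
    """Block-diagonal ideal representation P of Eq. (6).

    n_per_class: number of training samples in each class, assuming the
    columns of X_k are ordered class by class.
    """
    return block_diag(*[np.ones((ni, ni)) for ni in n_per_class])

P = ideal_representation([5, 4])   # e.g., 5 patients followed by 4 controls
```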

Combining Eq. (6) with the multi-modality supervised feature selection framework [6, 13], the following objective function can be obtained:

$$ \mathop {\hbox{min} }\nolimits_{{\varvec{W},\varvec{Z}}} \sum\nolimits_{k = 1}^{2} {\left( {\left\| {\varvec{H} - \varvec{Z}_{k}^{T} \varvec{W}_{k} } \right\|_{F}^{2} + \alpha \left\| {\varvec{Z}_{k} } \right\|_{*} + \beta \left\| {\varvec{Z}_{k} } \right\|_{1} + \gamma \left\| {\varvec{E}_{k} } \right\|_{2, 1} +\upxi\left\| {\varvec{W}_{k} } \right\|_{F}^{2} } \right)} + \lambda \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} \;\;\;\;\;{\text{s}}.{\text{t}}. \;\varvec{X}_{k} = \varvec{X}_{k} \varvec{Z}_{k} + \varvec{E}_{k} $$
(7)

where \( \varvec{H} \) denotes the label (ground-truth) matrix of \( \varvec{X} \), \( \varvec{W} = \left[ {\varvec{w}_{1} , \varvec{w}_{2} , \ldots ,\varvec{w}_{m} } \right] \in R^{c \times m} \) is the weight matrix, and \( \alpha \), \( \upxi \) denote the additional hyperparameters.

3.2 Optimization and Solution

Since problem (7) is non-convex, we first introduce the auxiliary variables \( \varvec{Z}_{k}^{{\prime }} \) and \( \varvec{Z}_{k}^{{\prime \prime }} \) to make the problem separable:

$$ \mathop {\hbox{min} }\nolimits_{\varvec{W}} \sum\nolimits_{k = 1}^{2} {\left( {\left\| {\varvec{H} - \varvec{Z}_{k}^{T} \varvec{W}_{k} } \right\|_{F}^{2} + \alpha \left\| {\varvec{Z}_{k}^{{\prime }} } \right\|_{*} + \beta \left\| {\varvec{Z}_{k}^{{\prime \prime }} } \right\|_{1} + \gamma \left\| {\varvec{E}_{k} } \right\|_{2, 1} +\upxi\left\| {\varvec{W}_{k} } \right\|_{F}^{2} } \right)} + \lambda \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} \;\;\;\;{\text{s}}.{\text{t}}. \varvec{X}_{k} = \varvec{X}_{k} \varvec{Z}_{k} + \varvec{E}_{k} , \varvec{Z}_{k}^{{\prime }} = \varvec{Z}_{k} , \varvec{Z}_{k}^{{\prime \prime }} = \varvec{Z}_{k} $$
(8)

Problem (8) can be solved by minimizing the following augmented Lagrangian function L, using the augmented Lagrange multiplier (ALM) method:

$$ \begin{aligned} L & = \left\| {\varvec{H} - \varvec{Z}_{k}^{T} \varvec{W}_{k} } \right\|_{F}^{2} + \alpha \left\| {\varvec{Z}_{k}^{{\prime }} } \right\|_{*} + \beta \left\| {\varvec{Z}_{k}^{{\prime \prime }} } \right\|_{1} + \gamma \left\| {\varvec{E}_{k} } \right\|_{2, 1} + \lambda \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} +\upxi\left\| {\varvec{W}_{k} } \right\|_{F}^{2} \\ & \quad + \left\langle {\varvec{Y}_{1, k} , \varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} } \right\rangle + \left\langle {\varvec{Y}_{2, k} , \varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime }} } \right\rangle + \left\langle {\varvec{Y}_{3, k} , \varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime \prime }} } \right\rangle \\ & \quad + \frac{\mu }{2}\left( {\left\| {\varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} } \right\|_{F}^{2} + \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime }} } \right\|_{F}^{2} + \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime \prime }} } \right\|_{F}^{2} } \right) \;\;\;k = 1, 2 \\ \end{aligned} $$
(9)

where \( \left\langle {\varvec{A}, \varvec{B}} \right\rangle = tr\left( {\varvec{A}^{T} \varvec{B}} \right) \), \( \varvec{Y}_{1, k} \), \( \varvec{Y}_{2, k} \) and \( \varvec{Y}_{3, k} \) are Lagrange multipliers and \( \mu \) is a balance parameter.

Further, the augmented Lagrangian function (9) can be rearranged into:

$$ \begin{aligned} L = & \left\| {\varvec{H} - \varvec{Z}_{k}^{T} \varvec{W}_{k} } \right\|_{F}^{2} + \alpha \left\| {\varvec{Z}_{k}^{{\prime }} } \right\|_{*} + \beta \left\| {\varvec{Z}_{k}^{''} } \right\|_{1} + \gamma \left\| {\varvec{E}_{k} } \right\|_{2, 1} + \lambda \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} +\upxi\left\| {\varvec{W}_{k} } \right\|_{F}^{2} \\ & + \frac{\mu }{2}\left( {\left\| {\varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} + \frac{{\varvec{Y}_{1, k} }}{\mu }} \right\|_{F}^{2} + \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime }} + \frac{{\varvec{Y}_{2, k} }}{\mu }} \right\|_{F}^{2} + \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime \prime }} + \frac{{\varvec{Y}_{3, k} }}{\mu }} \right\|_{F}^{2} } \right) - \frac{1}{2\mu }\left( {\left\| {\varvec{Y}_{1, k} } \right\|_{F}^{2} + \left\| {\varvec{Y}_{2, k} } \right\|_{F}^{2} + \left\| {\varvec{Y}_{3, k} } \right\|_{F}^{2} } \right) \,\,\,k = 1, 2 \\ \end{aligned} $$
(10)

The above problem can be solved by the inexact ALM (IALM) algorithm, an iterative method that updates each variable alternately while keeping the others fixed [14,15,16,17,18]. The stopping criteria are \( \left\| {\varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} } \right\|_{\infty } < \varepsilon \), \( \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime }} } \right\|_{\infty } < \varepsilon \) and \( \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime \prime }} } \right\|_{\infty } < \varepsilon \), where \( \left\| \cdot \right\|_{\infty } \) denotes the \( l_{\infty } \)-norm.

The solution process is as follows, and each step has a closed-form solution:

Step 1 (Update \( \varvec{Z}_{k}^{{\prime }} \) ):

$$ \varvec{Z}_{k}^{{\prime }} = \mathop {argmin }\limits_{{\varvec{Z}_{k}^{{\prime }} }} \alpha \left\| {\varvec{Z}_{k}^{{\prime }} } \right\|_{*} + \frac{\mu }{2}\left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime }} + \frac{{\varvec{Y}_{2, k} }}{\mu }} \right\|_{F}^{2} $$
(11)

Problem (11) can be solved by the singular value thresholding (SVT) operator [11]:

$$ \varvec{Z}_{k}^{{\prime }} = \varvec{US}_{\theta } \left[ \varvec{S} \right]\varvec{V}^{T} $$
(12)

where \( \theta = \frac{\alpha }{\mu } \), \( \varvec{USV}^{T} \) is the singular value decomposition (SVD) of \( \varvec{Z}_{k} + \frac{{\varvec{Y}_{2, k} }}{\mu } \), and \( \varvec{S}_{\theta } \left[ x \right] \) is the soft-thresholding (shrinkage) operator [17], which is defined as follows:

$$ \varvec{S}_{\theta } \left[ x \right] = \left\{ {\begin{array}{*{20}l} {x - \theta ,\;\;} \hfill & {if\;\;x > \theta } \hfill \\ {x + \theta ,} \hfill & {if\;\;x < - \theta } \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(13)

Step 2 (Update \( \varvec{Z}_{k}^{{\prime \prime }} \) ):

$$ \varvec{Z}_{k}^{{\prime \prime }} = \mathop {argmin }\limits_{{\varvec{Z}_{k}^{''} }} \beta \left\| {\varvec{Z}_{k}^{{\prime \prime }} } \right\|_{1} + \frac{\mu }{2}\left\| { \varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime \prime }} + \frac{{\varvec{Y}_{3, k} }}{\mu }} \right\|_{F}^{2} $$
(14)

According to [14], the above problem has the following closed-form solution:

$$ \varvec{Z}_{k}^{{\prime \prime }} = shrink\left( { \varvec{Z}_{k} + \frac{{\varvec{Y}_{3, k} }}{\mu },\frac{\beta }{\mu } } \right) $$
(15)
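For concreteness, Steps 1 and 2 reduce to the two shrinkage operators below (a numpy sketch with our own helper names):

```python
import numpy as np

def soft_threshold(x, theta):
    # Elementwise shrinkage operator S_theta[x] of Eq. (13).
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def svt(M, theta):
    # Singular value thresholding (Eq. 12): shrink the singular values of M.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft_threshold(s, theta)) @ Vt

# Step 1: Z'_k  = svt(Z_k + Y2_k / mu, alpha / mu)              (Eqs. 11-12)
# Step 2: Z''_k = soft_threshold(Z_k + Y3_k / mu, beta / mu)    (Eqs. 14-15)
```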

Step 3 (Update \( \varvec{Z}_{k} \) ):

\( \varvec{Z}_{k} \) is updated by solving optimization problem (16):

$$ \begin{aligned} \varvec{Z}_{k} = & \mathop {argmin }\limits_{{\varvec{Z}_{k} }} \left\| {\varvec{H} - \varvec{Z}_{k}^{T} \varvec{W}_{k} } \right\|_{F}^{2} + \lambda \left\| {\varvec{Z}_{1} \varvec{Z}_{2}^{T} - \varvec{P}} \right\|_{F}^{2} + \\ & \frac{\mu }{2}\left( {\left\| {\varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} + \frac{{\varvec{Y}_{1, k} }}{\mu }} \right\|_{F}^{2} + \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime }} + \frac{{\varvec{Y}_{2, k} }}{\mu }} \right\|_{F}^{2} + \left\| {\varvec{Z}_{k} - \varvec{Z}_{k}^{{\prime \prime }} + \frac{{\varvec{Y}_{3, k} }}{\mu }} \right\|_{F}^{2} } \right) \\ \end{aligned} $$
(16)

It is easy to derive the closed-form solution for \( \varvec{Z}_{k} \):

$$ \begin{aligned} \varvec{Z}_{k} = & \left( {2\left( {1 + \lambda + \mu } \right)\varvec{I} + \mu \varvec{X}_{k}^{T} \varvec{X}_{k} } \right)^{ - 1} \left( {2\varvec{HW}_{k}^{T} + 2\lambda \varvec{PZ}_{l} + \mu \varvec{X}_{k}^{T} \varvec{X}_{k} + \mu \varvec{X}_{k}^{T} \varvec{E}_{k} - \varvec{X}_{k}^{T} \varvec{Y}_{1, k} + \mu \varvec{Z}_{k}^{'} - \varvec{Y}_{2, k} + \mu \varvec{Z}_{k}^{''} - \varvec{Y}_{3, k} } \right) \\ & \left( {\varvec{W}_{k} \varvec{W}_{k}^{T} + \varvec{Z}_{l}^{T} \varvec{Z}_{l} + 3\varvec{I}} \right)^{ - 1} \\ \end{aligned} $$
(17)

where \( l \ne k \), \( \varvec{Z}_{k}^{T} = \varvec{Z}_{k} \) and \( \varvec{P}^{T} = \varvec{P} \).

Step 4 (Update \( \varvec{W}_{k} \) ):

\( \varvec{W}_{k} \) can be updated by solving optimization problem (18):

$$ \varvec{W}_{k} = \mathop {argmin }\limits_{{\varvec{W}_{k} }} \left\| {\varvec{H} - \varvec{Z}_{k}^{T} \varvec{W}_{k} } \right\|_{F}^{2} +\upxi\left\| {\varvec{W}_{k} } \right\|_{F}^{2} $$
(18)

Similar to the previous step, it is easy to get the solution:

$$ \varvec{W}_{k} = \left( {\varvec{Z}_{k} \varvec{Z}_{k}^{T} +\upxi\varvec{I}} \right)^{ - 1} \varvec{Z}_{k} \varvec{H} $$
(19)
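Equation (19) is a ridge-regression-type solution; assuming, consistent with the data-fitting term in Eq. (7), that \( \varvec{Z}_{k} \in R^{n \times n} \) and \( \varvec{H} \in R^{n \times c} \), it can be sketched as:

```python
import numpy as np

def update_W(Z, H, xi):
    # Closed form of Eq. (19): W_k = (Z_k Z_k^T + xi I)^{-1} Z_k H.
    n = Z.shape[0]
    return np.linalg.solve(Z @ Z.T + xi * np.eye(n), Z @ H)
```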

Step 5 (Update \( \varvec{E}_{k} \) ):

$$ \varvec{E}_{k} = \mathop {argmin }\limits_{{\varvec{E}_{k} }} \gamma \left\| {\varvec{E}_{k} } \right\|_{2, 1} + \frac{\mu }{2}\left\| {\varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} + \frac{{\varvec{Y}_{1, k} }}{\mu }} \right\|_{F}^{2} $$
(20)

To solve problem (20), the following lemma is required:

Lemma 1:

Let \( \varvec{Q} \) be a given matrix. If the optimal solution to

$$ \mathop {min }\limits_{\varvec{B}} \alpha \left\| \varvec{B} \right\|_{2,1} + \frac{1}{2}\left\| {\varvec{B} - \varvec{Q}} \right\|_{F}^{2} $$
(21)

is \( \varvec{B}^{ *} \), then the i-th column of \( \varvec{B}^{ *} \) is given as follows [19]:

$$ \left[ {\varvec{B}^{ *} } \right]_{:,i} = \left\{ {\begin{array}{*{20}l} {\frac{{\left\| {\varvec{Q}_{:,i} } \right\|_{2} - \alpha }}{{\left\| {\varvec{Q}_{:,i} } \right\|_{2} }}\varvec{Q}_{:,i} ,} \hfill & {if\left\| {\varvec{Q}_{:,i} } \right\|_{2} > \alpha } \hfill \\ {0, } \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(22)

Therefore, it is clear that the optimal solution of problem (20) is:

$$ \left[ \varvec{E} \right]_{:,i} = \left\{ {\begin{array}{*{20}l} {\frac{{\left\| {\varvec{J}_{:,i} } \right\|_{2} - \frac{\gamma }{\mu }}}{{\left\| {\varvec{J}_{:,i} } \right\|_{2} }}\varvec{J}_{:,i} , } \hfill & {if\left\| {\varvec{J}_{:,i} } \right\|_{2} > \frac{\gamma }{\mu }} \hfill \\ {0, } \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(23)

where \( \varvec{J} = \varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} + \frac{{\varvec{Y}_{1, k} }}{\mu } \).
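In code, Eq. (23) is a columnwise shrinkage of \( \varvec{J} \) (a sketch with our own names; the small constant guards against division by zero):

```python
import numpy as np

def update_E(J, tau):
    # Columnwise l2,1 shrinkage of Eq. (23), with J = X_k - X_k Z_k + Y1_k / mu
    # and tau = gamma / mu; columns whose l2-norm is <= tau are set to zero.
    norms = np.linalg.norm(J, axis=0)
    scale = np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)
    return J * scale
```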

Step 6 (Update multipliers):

The multipliers \( \varvec{Y}_{1, k} \), \( \varvec{Y}_{2, k} \), \( \varvec{Y}_{3, k} \) and the penalty parameter \( \mu \) (with iteration step size \( \rho > 1 \)) are updated by Eq. (24):

$$ \left\{ {\begin{array}{*{20}l} {\varvec{Y}_{1, k} = \varvec{Y}_{1, k} + \mu \left( {\varvec{X}_{k} - \varvec{X}_{k} \varvec{Z}_{k} - \varvec{E}_{k} } \right)} \hfill \\ {\varvec{Y}_{2, k} = \varvec{Y}_{2, k} + \mu \left( {\varvec{Z}_{k} - \varvec{Z}_{k}^{'} } \right)} \hfill \\ {\varvec{Y}_{3, k} = \varvec{Y}_{3, k} + \mu \left( {\varvec{Z}_{k} - \varvec{Z}_{k}^{''} } \right)} \hfill \\ {\mu = min\left( {\rho \mu ,\mu_{max} } \right)} \hfill \\ \end{array} } \right. $$
(24)
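Putting the six steps together, a minimal skeleton of the IALM iteration might look as follows. This is only an illustrative sketch under our own naming (modalities indexed 0 and 1 rather than 1 and 2): it reuses the `svt`, `soft_threshold`, `update_W`, and `update_E` helpers sketched above, and `update_Z` transcribes Eq. (17) as printed; it is not the authors' reference implementation:

```python
import numpy as np

def update_Z(Xk, H, Wk, P, Zl, Ek, Zpk, Zppk, Y1k, Y2k, Y3k, lam, mu):
    # Transcription of Eq. (17); Zl is the representation of the other modality.
    n = Xk.shape[1]
    left = 2 * (1 + lam + mu) * np.eye(n) + mu * Xk.T @ Xk
    mid = (2 * H @ Wk.T + 2 * lam * P @ Zl + mu * Xk.T @ Xk + mu * Xk.T @ Ek
           - Xk.T @ Y1k + mu * Zpk - Y2k + mu * Zppk - Y3k)
    right = Wk @ Wk.T + Zl.T @ Zl + 3 * np.eye(n)
    return np.linalg.solve(left, mid) @ np.linalg.inv(right)

def ialm(X, H, P, alpha, beta, gamma, xi, lam,
         mu=1e-2, rho=1.1, mu_max=1e6, eps=1e-6, max_iter=300):
    n, c = X[0].shape[1], H.shape[1]
    Z = [np.zeros((n, n)) for _ in range(2)]
    Zp = [np.zeros((n, n)) for _ in range(2)]    # Z'_k
    Zpp = [np.zeros((n, n)) for _ in range(2)]   # Z''_k
    W = [np.zeros((n, c)) for _ in range(2)]
    E = [np.zeros_like(Xk) for Xk in X]
    Y1 = [np.zeros_like(Xk) for Xk in X]
    Y2 = [np.zeros((n, n)) for _ in range(2)]
    Y3 = [np.zeros((n, n)) for _ in range(2)]
    for _ in range(max_iter):
        for k in (0, 1):
            l = 1 - k
            Zp[k] = svt(Z[k] + Y2[k] / mu, alpha / mu)                     # Step 1
            Zpp[k] = soft_threshold(Z[k] + Y3[k] / mu, beta / mu)          # Step 2
            Z[k] = update_Z(X[k], H, W[k], P, Z[l], E[k], Zp[k], Zpp[k],
                            Y1[k], Y2[k], Y3[k], lam, mu)                  # Step 3
            W[k] = update_W(Z[k], H, xi)                                   # Step 4
            E[k] = update_E(X[k] - X[k] @ Z[k] + Y1[k] / mu, gamma / mu)   # Step 5
            Y1[k] += mu * (X[k] - X[k] @ Z[k] - E[k])                      # Step 6
            Y2[k] += mu * (Z[k] - Zp[k])
            Y3[k] += mu * (Z[k] - Zpp[k])
        mu = min(rho * mu, mu_max)
        if all(np.abs(X[k] - X[k] @ Z[k] - E[k]).max() < eps and
               np.abs(Z[k] - Zp[k]).max() < eps and
               np.abs(Z[k] - Zpp[k]).max() < eps for k in (0, 1)):
            break
    return Z, W, E
```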

In short, we summarize the process of solving objective function (7) in Algorithm 1.

Algorithm 1. Solving objective function (7) via the IALM algorithm.

4 Experiments and Discussion

4.1 Comparison of Our Method and Baseline Classification Methods

We first compare our approach with several baseline feature selection and classification methods: the nearest neighbor (NN) classifier without feature selection, linear discriminant analysis (LDA) [20], the support vector machine (SVM) [21], and kernel discriminant analysis (KDA) [22].

We used a 10-fold cross-validation strategy in this experiment. Specifically, each dataset is divided equally into 10 subsets, and each subset is used as the test set in turn, with the remaining subsets used as the training set. This process is repeated 20 times to avoid deviations caused by the random partitioning of samples. Accuracy (ACC), sensitivity (SEN), specificity (SPE), the area under the receiver operating characteristic (ROC) curve (AUC), and their respective standard deviations (STD) are used to measure classification performance. For their calculation, please refer to [4].

For a fair comparison, the first-order and second-order BFCN construction steps are the same for all methods, i.e., the method in [4]. We apply thresholds from \( \left\{ {0,0.05, 0.1, \cdots ,0.4} \right\} \) to the first-order and second-order BFCNs on all datasets. For NN, LDA, SVM, and KDA, we follow [4] and combine the first-order and second-order BFCN features for the feature selection and classification steps. For SVM, we use a linear kernel; for KDA, we use a Gaussian kernel. For the proposed method, for simplicity, we use a greedy strategy to select \( \alpha \) and \( \upxi \) from \( \left\{ {10^{ - 3} , 10^{ - 2} , \cdots , 10^{3} } \right\} \) and set the other hyperparameters to 1.

Table 1 compares our method with the baseline feature selection and classification methods on the three schizophrenia datasets. Our proposed method achieves the best ACC, SEN, and AUC on all three datasets. As can be seen from Table 1, the SEN and SPE of the baseline methods are unbalanced: one of the two may be very high while the AUC results remain unsatisfactory. In contrast, the proposed method shows consistently good performance, probably because it uses a multi-modality learning strategy to better combine the first-order and second-order BFCN information.

Table 1. The classification results (ACC/SEN/SPE/AUC ± STD%) of our method and several comparison baseline classification algorithms. The best results are shown in bold.

4.2 Comparison of Our Method with State-of-the-Art Multi-modality Based Methods

In this experiment, we compare the proposed method with several state-of-the-art multi-modality based methods: the multi-kernel learning (MKL) SVM method [23], the multi-task feature selection (MTFS) model [6], the manifold regularized MTFS (M2TFS) model [13], and the multi-modal structured low-rank dictionary learning (MM-SLDL) method [24].

The experimental setup and the metrics for measuring classification performance are the same as in the previous experiment. For a fair comparison, the MKL parameters are chosen from \( \left\{ {0, 0.1, 0.2, \cdots , 1} \right\} \) using a greedy strategy. The parameter selection method for MTFS and M2TFS is the same as in [13], and the parameter selection range for MM-SLDL is the same as in [24].

The experimental results are reported in Table 2, and the corresponding ROC curves are plotted in Fig. 1. The proposed method performs best on all metrics, and the ROC curves of the comparison methods lie almost entirely to the lower right of ours. This may be because the multi-modality low-rank representation strategy can better learn and fuse first-order and second-order BFCN information.

Table 2. The classification results (ACC/SEN/SPE/AUC ± STD%) of our method and state-of-the-art multi-modality based algorithms. The best results are shown in bold.
Fig. 1. ROC curves for our proposed method and all state-of-the-art multi-modality based methods on all datasets.

4.3 Analysis of Convergence and Parameter Sensitivity

In this section, we first examine the convergence of the proposed method. Figure 2 shows the change in the value of the objective function as the number of iterations increases. It can be seen that our method essentially converges within 300 iterations.

Fig. 2. The convergence property of the proposed algorithm on different datasets.

We then evaluated the effect of the two hyperparameters \( \alpha \) and \( \upxi \) on the performance of our method, mainly in terms of ACC and AUC; the results are shown in Fig. 3. Both \( \alpha \) and \( \upxi \) range over \( \left\{ {10^{ - 3} , 10^{ - 2} , \cdots , 10^{3} } \right\} \). For simplicity, we only show the results on the Xiangya dataset. Figure 3 shows that our method reaches a steady state when \( \upxi \) takes a value of \( 10 \), \( 10^{2} \), or \( 10^{3} \); similar results were obtained on the other two datasets. In addition, parameter \( \upxi \) has a more critical impact on the results than parameter \( \alpha \).

Fig. 3. The effect of parameters \( \alpha \) and \( \upxi \) on our method. The left subgraph shows the ACC results and the right subgraph shows the AUC results.

5 Conclusion

In summary, we propose a multi-modality feature learning method based on low-rank representation to fuse first-order and second-order BFCN information and apply it to the classification of schizophrenia. Specifically, we combine the low-rank representation method with the multi-modality learning framework and add an ideal representation term to effectively learn the first-order and second-order BFCN feature information. Experiments on three schizophrenia datasets show that our approach is superior to existing multimodal feature learning methods.