Introduction

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that generally occurs in old age. It is one of the most common diseases in seniors, killing more than breast cancer and prostate cancer combined (Alzheimer’s Association, 2019). However, AD can only be formally diagnosed through an autopsy after a patient is deceased, encouraging the search for alternative proxies for AD diagnosis. There are currently three classes of potential biomarkers that could offer useful alternatives for assessing AD: \(\beta\)-amyloid (A), tau (T), and biomarkers of neurodegeneration or neuronal injury (N) (Jack et al., 2016), with the last class being the focus of this paper. Neurodegeneration can be measured noninvasively, for example through structural MRI, but it is not specific to AD, as it can reflect other diseases (Jack & Holtzman, 2013). For example, atrophy is a biomarker often associated with AD, but it also occurs in a variety of disorders such as epilepsy and anoxia (Jack & Holtzman, 2013). The lack of specificity of these neurodegeneration biomarkers poses a major limitation on their utility for early-stage clinical diagnosis of AD. A promising approach to improving specificity is to consider patterns of volumetric changes across the whole brain, rather than focusing on a small number of regions. For example, in discussing the diagnosis of Mild Cognitive Impairment due to AD, Albert et al. suggest the possibility of biomarkers describing “complex patterns of tissue loss” through “data driven statistical approaches in which many different brain regions are evaluated simultaneously” (Albert et al., 2011).

While rare in volumetric analysis, such patterns have been studied extensively in functional brain imaging. One popular approach to studying brain connectomics is to extract information about interactions between volumetric pixels (voxels) from time-series functional magnetic resonance imaging (fMRI) data collected over a timescale of weeks, months, or years (Cohen et al., 2017). Functional connectivity, in particular, studies temporal dependencies among anatomically separated regions (Van Den Heuvel & Pol, 2010). There are several approaches to studying functional connectivity in fMRI data. As our interest lies in the whole brain rather than a single voxel, we discuss only multivoxel pattern analysis methods that study networks as a whole (Lewis-Peacock & Norman, 2014). A traditional method in the field is seed-based analysis, in which a region of interest (ROI) is selected and all voxels correlated with the ROI are identified (Cole et al., 2010). For example, in an early work, Biswal et al. studied the motor cortex to identify the sensorimotor network (Biswal et al., 1995). Unsupervised clustering methods, including k-means, hierarchical, and graph-based methods, do not require an a priori ROI and instead group voxels together by the similarity of their time series (Khosla et al., 2019). For example, Lee et al. use fuzzy c-means clustering to identify resting state networks (Lee et al., 2012).

Over the years, graph-based approaches have gained popularity in studying functional connectivity. To convert a brain into a graph, regions are modeled as nodes and connections between regions as edges. Under this model, one can construct a matrix of all pairs of connections in the brain, known as the functional connectome (Fornito et al., 2016), to which decomposition or embedding methods can be applied to uncover latent variables. Independent component analysis decomposes data into linearly independent components, grouping brain regions into networks based on their voxel activation correlations (Calhoun et al., 2001). Non-negative matrix factorization is a dimensionality reduction method that places non-negativity constraints on the components (Khosla et al., 2019). Popular embedding methods include Adjacency Spectral Embedding (ASE) (Sussman et al., 2012), which embeds a single symmetric adjacency matrix using the eigenvectors corresponding to its largest eigenvalues, and Laplacian Eigenmap (LE) (Belkin & Niyogi, 2003), which embeds a single graph-Laplacian matrix using the eigenvectors corresponding to its smallest nonzero eigenvalues. However, these graph-based approaches have several limitations. First, they embed one graph at a time, and combining individual embeddings across multiple graphs is not a straightforward task. Second, the results are difficult to interpret, and further analysis is required (Yang et al., 2020).

One technique for embedding multiple graphs at once is omnibus embedding, in which the adjacency matrices of multiple graphs are combined into a single large matrix, which is then embedded (Levin et al., 2017). However, the combined matrix is usually very large and requires substantial computational power. Dictionary learning is another framework for uncovering low-dimensional embeddings across multiple graphs, allowing for group comparison. Drawing upon clustering and linear decomposition methods, it allows for additional constraints to achieve better formed solutions (Abraham et al., 2013). D’Souza et al. (D’Souza et al., 2019) use a dictionary learning method to model interactions between resting state functional MRI and behavioral data in Autism Spectrum Disorder. Their method finds shared dictionary elements across multiple graphs and a subject-specific loading onto the elements, which are then used as inputs to a neural network for disease prediction (D’Souza et al., 2019). Wang et al. (Wang et al., 2019) propose a joint graph embedding method that estimates a low dimensional embedding across multiple graphs and each graph’s projection onto that embedding, which we discuss in more detail in Section 2.2.

In this work, we shift our attention away from functional connectivity and propose a graph-based approach to studying neurodegeneration using correlations in volumetric data over time. In Alzheimer’s disease, tau tangle accumulation is known to follow a stereotyped pattern, beginning in the transentorhinal region (stages I-II), spreading to the limbic regions (stages III-IV), and eventually moving throughout the isocortex (stages V-VI) Braak and Braak (1991). Evidence is accumulating from digital pathology and brain morphology that patterns of neurodegeneration follow this tau deposition Tward et al. (2020); Stouffer et al. (2022, 2023); Sadaghiani et al. (2022); Xie et al. (2022); Lyu et al. (2023), and we therefore hypothesize that correlations in neurodegeneration among these temporal lobe structures may provide a signal that is specific to early Alzheimer’s. Our previous work Miller et al. (2015a) examined the timing of neurodegeneration throughout this medial temporal lobe network, but the spread of pathology throughout a characteristic network has been observed in other diseases as well, such as Huntington’s Ross et al. (2014), Parkinson’s Visanji et al. (2013); Kordower (2014), and depression Small et al. (2011), as well as in other work on Alzheimer’s disease Yin et al. (2014). The analysis we propose here provides an opportunity to identify these networks from whole brain data in Alzheimer’s and other disorders.

Other authors have previously proposed graph-based analyses of volumetric brain imaging data applied to neurological disease. For example, Zugman et al. (2015) describe the use of “structural covariance” and Yin et al. (2023) describe the use of a “morphological connectivity network” to quantify characteristic brain networks involved in schizophrenia. Earlier work Bullmore and Bassett (2011) proposed similar approaches termed “anatomical connectivity”. However, each of these methods creates a single graph that describes a population, whereas our approach leverages time series data to produce a connectivity graph for each individual. This allows us to model variability between individuals, and to perform statistical testing on a well-posed joint model for graph-valued random variables Chung et al. (2021). To our knowledge, this is the first time joint graph embeddings have been used to study volumetric brain data.

Similar to the methods reviewed above, we model each brain as a graph in which a node represents a structure of interest and an edge represents a correlation in atrophy patterns between two structures (Xu, 2021). We then apply a multigraph embedding technique to understand these patterns and uncover potential biomarkers of neurodegeneration over a long timescale (relative to fMRI measures). In addition, we hope to increase the specificity of neurodegeneration biomarkers by modeling the dataset with a more complex pattern than existing approaches. Rather than looking at volume changes in each structure individually, as in the traditional mass univariate method (Pengas et al., 2012), we add complexity by characterizing pairwise relationships between structures in the context of the uncovered networks. Furthermore, we illustrate how our embedding coefficients can be fed into machine learning algorithms for potential diagnostic applications, and how samples can be drawn from our model to visualize the brain’s typicality and variability, interpolating between (or extrapolating beyond) healthy and diseased states.

Material and Methods

Data Preprocessing

We obtain our data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). Specifically, we took the ADNI1 3Y1.5T Longitudinal FreeSurfer dataset (Wyman et al., 2013) prepared by the University of California, San Francisco, comprising 699 individuals in total.

We selected 108 regions of interest commonly studied in neurodegeneration, excluding non-brain and whole-brain structures, as we are interested in structures at a smaller scale. We selected a cohort in which each individual had at least 3 visits during a span of 3 years. Note that the number of time points required for each subject is not fixed, as long as it is greater than 2, since 2 time points do not give a meaningful correlation matrix. For subjects with missing volumes, we forward filled in time by taking the measure from the most recent previous visit; the initial visit had no missing values for any subject.

We model each individual’s brain as a graph, where each anatomical structure is a node and an edge exists between two nodes if they are highly correlated during neurodegeneration. For each individual in the cohort, we converted the volumetric data into a 108 by 108 correlation matrix, and then into an adjacency matrix by thresholding absolute correlations at 0.8, as shown in Fig. 1. We note that there is no “gold standard” for choosing a threshold: previous work has used 0.1 (Kiar et al., 2017), 0.8 (Zhuo et al., 2018), or thresholds based on graph density and statistical significance (Bullmore and Bassett, 2011). To investigate the potential sensitivity to this choice, we ran additional experiments with thresholds of 0.7 and 0.9 and produced similar results; in particular, the structures with the top 10 highest loadings in each significant network were the same.

For defining disease groups, we reference the Clinical Dementia Rating (CDR), which consists of 5 levels: 0 (None), 0.5 (Questionable), 1 (Mild), 2 (Moderate), and 3 (Severe) (Morris, 1991). We used a threshold of CDR \(\le 1\) to separate the cohort into none/mild and severe cognitive impairment groups. After filtering and pre-processing, the cohort contains 494 individuals: 322 in the none/mild group and 172 in the severe group.
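As a minimal sketch of this preprocessing step (the function name and array layout are our own), assuming each subject’s forward-filled volume measurements are stored as an array with one row per structure:

```python
import numpy as np

def volumes_to_adjacency(volumes, threshold=0.8):
    """Convert one subject's volume time series to a binary adjacency matrix.

    volumes: array of shape (n_structures, n_timepoints), forward-filled.
    """
    corr = np.corrcoef(volumes)                     # 108 x 108 correlation matrix
    return (np.abs(corr) >= threshold).astype(int)  # threshold on absolute value
```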

Fig. 1 An example of transformation for one individual’s selected structures

Multiple Random Eigengraphs and Joint Graph Embedding

We first review the mathematical model and original joint graph embedding algorithm proposed by Wang et al. (Wang et al., 2019), on which our algorithm is based. In this work, we refer to a random graph as a graph in which the edges are generated under a probability distribution. The Multiple Random Eigengraphs (MREG) framework is a mathematical model for multiple random graphs (Wang et al., 2019). Consider a set of m unweighted and undirected graphs with the same n vertices, denoted by \(\{G_{s} = (V_{s}, E_{s})\}_{s=1}^{m}\). Let \(h_1,...,h_d\) be normalized vectors in \(\mathbb {R}^{n}\) that span a subspace of dimension d, contributing to a large amount of the variability in the set of graphs, and let \(\lambda _1,\ldots ,\lambda _d\) be vectors in \(\mathbb {R}^{m}\), where \(\lambda _{k}[s]\) denotes the loading of graph s onto component k, such that \(\sum _{k=1}^d \lambda _{k}[s]h_{k}h^T_{k} \in [0,1]^{n \times n}\) for every subject s (Wang et al., 2019). With these parameters known, we can generate a sample graph as described below and illustrated in Fig. 2:

  1. Identify each group-specific network and subject-specific loadings

  2. Carry out the product to form a score for each edge

  3. Apply the sigmoid function to form a probability for each edge

  4. Sample independent Bernoulli random variables for each possible edge

Under this procedure, the adjacency matrix \(A_s\) for each graph \(G_s\) is modeled as follows (Wang et al., 2019):

$$\begin{aligned} A_{s}[i,j]&\sim Bernoulli\left( \sum _{k=1}^{d} \lambda _{k}[s]h_{k}[i]h_{k}[j]\right) , \text { for } i\ge j\end{aligned}$$
(1)
$$\begin{aligned} A_s[i,j]&= A_s[j,i], \text { for } i <j \end{aligned}$$
(2)

Note here each \(A_s\) is symmetric, as opposed to sampling independently above and below the diagonal. The h vectors span the latent subspace shared by the set of multiple graphs, and the \(\lambda\) vectors represent graph-specific loadings onto the subspace (Wang et al., 2019).
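As a minimal sketch of this generative procedure under Eqs. 1 and 2 (the function name and the clipping guard are our own additions; the model itself assumes the summed probabilities already lie in [0, 1]):

```python
import numpy as np

def sample_mreg_graph(lam_s, h, rng=None):
    """Sample one symmetric adjacency matrix under the MREG model (Eqs. 1-2).

    lam_s: loadings for subject s, shape (d,)
    h:     latent vectors, shape (d, n)
    """
    rng = np.random.default_rng() if rng is None else rng
    P = np.einsum('k,ki,kj->ij', lam_s, h, h)  # sum_k lam_k[s] h_k h_k^T
    P = np.clip(P, 0.0, 1.0)                   # guard; the model assumes this holds
    upper = rng.binomial(1, np.triu(P))        # independent Bernoulli, one triangle
    return upper + np.triu(upper, k=1).T       # mirror to enforce symmetry (Eq. 2)
```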

The original joint graph embedding algorithm by Wang et al. (Wang et al., 2019) estimates a low dimensional embedding of the latent space across multiple graphs and each graph’s projection onto that embedding under the MREG model. It estimates the subspace by minimizing the sum of squared errors (SSE) between the subspace and the adjacency matrices (Wang et al., 2019). In this work, we implemented a modified version of the Wang et al. algorithm, as discussed below.

Limitations of the Original Framework

The original framework assumes a number of vertices large enough to provide accurate estimates. In the original paper’s experiment analyzing brain data, there were 1105 vertices, but here we work with only 108. As such, we identify several limitations when applying the original framework and algorithm to neurodegeneration data, which we address in our modified version:

  1. The constraint that probabilities lie in [0, 1] is difficult to enforce in practice. The original method assumes a sample space large enough to give desirable results, an assumption that does not hold in smaller sample spaces. In fact, when we reanalyzed our data using the least squares estimator described in the original paper, the resulting matrix was not a probability matrix, having a maximum value of 1.53 and a minimum of \(-0.17\). While embeddings from this model still have useful applications, the model cannot be sampled from, for example to produce visualizations as in our Fig. 6.

  2. Incorrect estimation of diagonal entries contributes a negligible amount of error when the number of vertices is large, but contributes significantly in our case. We ran experiments applying our algorithm with and without considering diagonal entries and compared the binary cross entropy (BCE) loss between the two optimizations for various numbers of structures. When working with fewer than 40 structures, our method reduces the BCE loss by 29%. For our dataset, the BCE loss was reduced by 0.011%.

  3. The SSE loss function does not correspond to a log likelihood under the proposed Bernoulli model, and therefore the resulting parameter estimates do not have the desirable properties of maximum likelihood estimators. For example, they are not guaranteed to be asymptotically unbiased or efficient, whereas our estimators are. On the other hand, the SSE loss can be optimized more efficiently than our method.

Now we state our main contributions and the novelty of this work. First, we address the three limitations listed above by (1) introducing a sigmoid function to the model to guarantee edge probabilities are in [0, 1], (2) adding constraints on the diagonal such that parameter estimates are not forced to fit diagonal entries that carry no meaning (since the diagonal entries of a correlation matrix are always 1), and (3) extending the original algorithm (Wang et al., 2019) to identify maximum likelihood estimators by gradient descent. With these modifications, our embedding algorithm generates probabilities suitable for a Bernoulli model, which we describe in more detail below. Second, we develop and implement a novel statistical testing framework to detect complex patterns including network-structure pairs and triples, rather than a machine learning classifier. To our knowledge, our approach to testing patterns has not previously been applied to joint graph embedding results on brain imaging data.

Modifications to MREG

In this section, we precisely state our modifications to the MREG model of Wang et al. (Wang et al., 2019). Consider the set of m unweighted and undirected graphs with the same n vertices, denoted by \(\{G_{s} = (V_{s}, E_{s})\}_{s=1}^{m}\), where a vertex represents a brain structure of interest and an edge represents a strong correlation between structures. We modify the interpretation of h: the h vectors now span a space of parameters that encode edge probabilities when acted on by a sigmoid function. This sigmoid function guarantees \(sigmoid(\sum _{k=1}^d \lambda _{k}[s]h_{k}h^T_{k}) \in [0,1]^{n \times n}\) even in the case of small samples, addressing limitation 1. Second, we force diagonal entries to be 1, since a structure’s relation with itself is not of interest in this work, addressing limitation 2. Our modifications to Eqs. 1 and 2 are as follows:

$$\begin{aligned} A_{s}[i,j]&\sim Bernoulli\left( sigmoid\left( \sum _{k=1}^{d} \lambda _{k}[s]h_{k}[i]h_{k}[j]\right) \right) \text {, for } i>j \end{aligned}$$
(3)
$$\begin{aligned} A_{s}[i,j]&= A_{s}[j,i] \text {, for } i<j \end{aligned}$$
(4)
$$\begin{aligned} A_{s}[i,j]&= 1 \text {, for } i=j\end{aligned}$$
(5)
$$\begin{aligned} sigmoid(x)&= \frac{1}{1 + \exp (-x)} \end{aligned}$$
(6)
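As a direct transcription of Eqs. 3 to 6 into code (the function name is our own), the edge probability matrix for a subject can be computed as follows:

```python
import numpy as np

def edge_probabilities(lam_s, h):
    """Edge probability matrix under our modified MREG model (Eqs. 3-6).

    lam_s: subject-specific loadings, shape (d,)
    h:     latent vectors, shape (d, n)
    """
    score = np.einsum('k,ki,kj->ij', lam_s, h, h)  # sum_k lam_k[s] h_k[i] h_k[j]
    P = 1.0 / (1.0 + np.exp(-score))               # sigmoid (Eq. 6) keeps P in [0, 1]
    np.fill_diagonal(P, 1.0)                       # diagonal fixed to 1 (Eq. 5)
    return P
```

Because the sigmoid already guarantees valid probabilities, graphs can be sampled from this matrix without the clipping guard needed for the original model.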

An example transformation from the latent subspace and subject specific loading to adjacency matrices observed in our data is illustrated in Fig. 2.

Fig. 2 An example of MREG for the same individual and structure in Fig. 1

Our Maximum Likelihood Joint Graph Embedding Algorithm

We change the algorithm to better fit the constraints of our study and estimate the subspace by minimizing the binary cross entropy (BCE) loss, which is the negative log likelihood under the model. Let \(E_s[i,j]\) be a symmetric matrix representing edge probabilities between structures i and j for subject s. The BCE loss is minimized as in Eq. (7), where \(A_s\) are the observed adjacency matrices:

$$\begin{aligned} \mathop {\arg \min }\limits _{\lambda ,h} -&\frac{1}{m} \sum _{s=1}^m\sum _{i=1}^{n} \sum _{j=i+1}^{n} \left[ A_{s}[i,j] \log (E_s[i,j]) + (1-A_{s}[i,j]) \log (1-E_{s}[i,j])\right] \end{aligned}$$
(7)

Note here that the diagonal terms do not contribute to our loss function. By replacing the original loss function with BCE, we obtain a well characterized maximum likelihood estimator even in the small sample case, addressing limitation 3. We take the same greedy approach to finding the optimal representation of the latent space as the original method Wang et al. (2019): we start with a 1D optimization problem, then expand to the second dimension while keeping the first fixed, and so on. The algorithm is implemented as follows:

figure a (pseudocode of the greedy maximum likelihood joint graph embedding algorithm)
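A minimal PyTorch sketch of this greedy loop is given below; the initialization, the fixed iteration count, and the numerical clamp are illustrative simplifications of our implementation, with step-size defaults taken from the hyperparameter tuning described below:

```python
import torch

def greedy_embedding(A, d=4, n_iter=10_000, step_h=2.0, step_lam=20_000.0):
    """Greedy maximum likelihood joint graph embedding (illustrative sketch).

    A: observed adjacency matrices as a float tensor of shape (m, n, n).
    Returns h with shape (d, n) and lambda with shape (d, m).
    """
    m, n, _ = A.shape
    off_diag = ~torch.eye(n, dtype=torch.bool)  # diagonal excluded from the loss
    hs, lams = [], []
    for _ in range(d):  # add one dimension at a time, keeping earlier ones fixed
        fixed = torch.zeros(m, n, n)
        for lk, hk in zip(lams, hs):  # score contribution of the fixed dimensions
            fixed += lk[:, None, None] * hk[None, :, None] * hk[None, None, :]
        h = (0.01 * torch.randn(n)).requires_grad_()
        lam = (0.01 * torch.randn(m)).requires_grad_()
        for _ in range(n_iter):
            score = fixed + lam[:, None, None] * h[None, :, None] * h[None, None, :]
            P = torch.sigmoid(score).clamp(1e-6, 1 - 1e-6)  # guard against log(0)
            bce = -(A * P.log() + (1 - A) * (1 - P).log())
            loss = bce[:, off_diag].sum() / (2 * m)  # Eq. (7): each pair counted once
            loss.backward()
            with torch.no_grad():  # plain gradient descent steps
                h -= step_h * h.grad
                lam -= step_lam * lam.grad
                h.grad.zero_()
                lam.grad.zero_()
        hs.append(h.detach())
        lams.append(lam.detach())
    return torch.stack(hs), torch.stack(lams)
```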

We implemented the algorithm in Python and PyTorch, using automatically calculated gradients for gradient descent. The algorithm took 30 minutes to run on a computer with a 10-core CPU. Convergence in all 4 dimensions is shown in Fig. 3a: at every 10,000 iterations, we observe the loss dropping quickly after another dimension is added, then converging before the next dimension is added.

Fig. 3 a. Convergence in all 4 dimensions with the algorithm. b. Scree plot with elbow at 4 dimensions

Hyperparameter Tuning

For hyperparameter tuning, we first estimate an optimal dimension of the latent space that accounts for the majority of the variation in the observed correlation matrices. To do this, we computed an eigendecomposition of each subject’s correlation matrix, and found that 4 components provided a reasonable reconstruction accuracy. A scree plot for one typical subject is shown in Fig. 3b. Next, we performed a grid search over several orders of magnitude for the gradient descent step sizes corresponding to h and \(\lambda\), and selected the largest values that gave convergence without oscillation. We set the step size for h to 2 and the step size for \(\lambda\) to 20,000.
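As a small sketch of this dimension-selection step (the function name is our own), applied to one subject’s 108 by 108 correlation matrix:

```python
import numpy as np

def scree_fractions(corr):
    """Fraction of variance explained by each eigencomponent of a correlation matrix."""
    evals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, sorted largest first
    return evals / evals.sum()

# fracs = scree_fractions(corr); we chose d = 4 at the elbow of the scree plot
```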

Significance Testing

We develop and implement a novel test statistic similar to F-type statistics (i.e., comparing the sum of squared errors under two different models) to test for differences in networks and structures between the two groups, and use permutation testing on a maximum statistic to control the Familywise Error Rate (FWER) at 5% (Nichols & Hayasaka, 2003). In addition, we compare the difference between groups after accounting for confounders (age, intracranial volume, and APOE gene status) by least squares regression. APOE gene status was modeled as a categorical (as opposed to cardinal) variable, where an individual may have 0, 1, or 2 copies. Other studies of neurodegeneration have used similar covariates: for example, in studying morphometry in early Alzheimer’s, Miller et al. (2013) and Miller et al. (2015b) accounted for intracranial volume and sex, and in studying morphometry in schizophrenia, Yin et al. (2023) accounted for age and sex. Our choice of confounders was inspired by Judea Pearl’s backdoor criterion Neuberg (2003), regressing out a sufficient set of variables we believe have a causal effect on both disease status and morphology. In particular, we chose not to adjust for sex, as we believe its largest impact on morphology is mediated by brain size, which we have already adjusted for. We included APOE status as a covariate to understand what additional information morphology can tell us about disease beyond what is already known from genetics.

For a cohort with m subjects, d networks, and n structures, we define the test score as \(\text {Score}_{sij} = \sum _k \lambda _{k}[s] h_{k}[i] h_{k}[j]\) for subject s and structures i and j. We pass the score through a sigmoid to give probabilities E, and estimate these variables by maximum likelihood. We form the tensor \(T_{skij} = \lambda _{k}[s]h_{k}[i]h_{k}[j]\) to obtain a three-tuple for each subject s, on which we perform statistical tests by comparing how close it is to its group-dependent or group-combined average, after accounting for confounders. By summing over various combinations of indices in T before statistical testing, we are able to test for:

  1. networks only

  2. structures only

  3. network-structure pairs

  4. structure-structure pairs

  5. network-structure-structure triples

While networks only and structures only are fairly straightforward to interpret, as they represent the networks’ and structures’ association with neurodegeneration in AD, the rest are more difficult to interpret. We can view a network-structure pair as studying the structure’s association with disease status through its role in the network. We can view structure-structure pairs as discovering structures that, when correlated with another structure, have a significant effect on disease status. Lastly, we can interpret a triple as the effect of a pair of structures on disease status through their combined role in a network. We note that the tests for item 2 include the standard mass univariate approach of Hayasaka et al. (2004), and the tests for item 4 include permutation testing on the absolute value of correlation coefficients, described for example in Bullmore and Bassett (2011). As standard tests for items 2 and 4 exist, we focus our work on items 1, 3, and 5. To our knowledge, our framework for testing patterns involving networks, pairs, or triples has not previously been applied to brain imaging data.
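As a sketch of how the tensor T and its reductions can be formed (array names are our own), with the summed axes selecting which of the five patterns is tested:

```python
import numpy as np

# lam: (d, m) subject loadings; h: (d, n) latent vectors
T = np.einsum('ks,ki,kj->skij', lam, h, h)  # T[s,k,i,j] = lam_k[s] h_k[i] h_k[j]

T_network    = T.sum(axis=(2, 3))  # (m, d):       networks only
T_structure  = T.sum(axis=(1, 3))  # (m, n):       structures only
T_net_struc  = T.sum(axis=3)       # (m, d, n):    network-structure pairs
T_struc_pair = T.sum(axis=1)       # (m, n, n):    structure-structure pairs
T_triple     = T                   # (m, d, n, n): network-structure-structure triples
```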

Confounder Regression Analysis

We perform least squares regression analysis to test for true signals not caused by common confounders of AD and neurodegeneration. We start with a design matrix D containing a column of ones (for the mean) and columns for age, intracranial volume, and APOE gene status, and estimate a coefficient matrix \(\hat{C}\) for the confounders. From \(\hat{C}\), the fitted values for one group are calculated as follows:

$$\begin{aligned} \hat{C}^{combined}&= (D^TD)^{-1}D^T T \end{aligned}$$
(8)
$$\begin{aligned} \hat{T}^{combined}&= D \hat{C}^{combined} \end{aligned}$$
(9)

Here T is reshaped from a tensor into a matrix for this calculation. Next, we calculate the SSE when splitting the cohort into two groups: none/mild and severe AD. We first form \(D'\), which has an additional column relative to D indicating disease status. We then perform the same regression, with \(D'\) replacing D in Eqs. 8 and 9, to find \(\hat{T}^{twogroups}\). In the next sections, we use \(\hat{T}^{combined}\) and \(\hat{T}^{twogroups}\) to calculate the sums of squared errors (SSE) when considering one vs. two groups.
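As a sketch of Eqs. 8 and 9 (the function name is our own), with T reshaped so that rows index subjects:

```python
import numpy as np

def fitted_values(D, T_mat):
    """Least squares fit of the reshaped tensor T onto design matrix D (Eqs. 8-9).

    D:     design matrix of shape (m, p): ones, age, intracranial volume, APOE status
    T_mat: T reshaped to shape (m, d*n*n)
    """
    C_hat, *_ = np.linalg.lstsq(D, T_mat, rcond=None)  # (D^T D)^{-1} D^T T (Eq. 8)
    return D @ C_hat                                   # fitted values (Eq. 9)

# two-group model: append a disease-status indicator column to D
# D_prime = np.column_stack([D, status]); T_hat_two = fitted_values(D_prime, T_mat)
```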

Networks

We start by testing for significant networks and obtain the test statistic \(X_k\) for network k by reducing (i.e., summing over) the additional dimensions. Let \(\hat{T}^g_{kij}\) be the expected value under our linear model for dimension k, structures i and j, and subjects in group g. We remove extra dimensions by calculating the SSE under the one-group and two-group models, and form \(X_k\) as follows:

$$\begin{aligned} SSE_k^{g}&= \sum _{s \in g,i,j} (T_{skij} - \hat{T}^{g}_{kij})^2 \text { , } g \in \{ {\text {combined, twogroups}} \} \end{aligned}$$
(10)
$$\begin{aligned} X_k&= SSE_k^{combined} - SSE_k^{twogroups} \end{aligned}$$
(11)

To control the FWER at 5%, we use permutation testing: we take the max over k at each iteration and define the threshold as the 95th percentile of 10,000 permutation results (Nichols & Hayasaka, 2003).
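A sketch of this permutation procedure follows; the statistic function and its arguments are illustrative placeholders for the computation in Eqs. 10 and 11:

```python
import numpy as np

def fwer_threshold(stat_fn, labels, n_perm=10_000, alpha=0.05, rng=None):
    """FWER-controlling threshold via permutation of group labels.

    stat_fn: maps a (permuted) label vector to the vector of statistics X_k.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_stats = np.array([stat_fn(rng.permutation(labels)).max()  # max over k
                          for _ in range(n_perm)])
    return np.quantile(max_stats, 1 - alpha)  # 95th percentile of the max statistic
```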

Network Structure Pairs

Similar to the networks-only test, we calculate the SSE in the two settings, but sum over one fewer index:

$$\begin{aligned} SSE_{k,i}^{g}&= \sum _{s \in g,j} (T_{skij} - \hat{T}^{g}_{kij})^2 \text { , } g \in \{ {\text {combined, twogroups}} \} \end{aligned}$$
(12)
$$\begin{aligned} X_{k,i}&= SSE_{k,i}^{combined} - SSE_{k,i}^{twogroups} \end{aligned}$$
(13)

We follow the same permutation testing procedure, taking a max over (k, i) instead of just k at each iteration.

Network Structure Structure Triples

Lastly, for triples, we form the test statistic:

$$\begin{aligned} SSE_{k,i,j}^{g}&= \sum _{s \in g} (T_{skij} - \hat{T}^{g}_{kij})^2 \text { , } g \in \{ {\text {combined, twogroups}} \} \end{aligned}$$
(14)
$$\begin{aligned} X_{k,i,j}&= SSE_{k,i,j}^{combined} - SSE_{k,i,j}^{twogroups} \end{aligned}$$
(15)

We follow the same permutation testing procedure, taking a max over (k, i, j) at each iteration.

In our experiments, we performed the analysis twice: with and without adjusting for confounders. Note that not adjusting for confounders is a special case of this estimation, where \(\hat{T}^g\) is simply the mean over all subjects in group g. In the Results section, we present the former and make brief comparisons to the latter.

Code Availability

Our source code and documentation are available on GitHub at https://github.com/twardlab/joint_graph_embedding_AD. To replicate our work, first accept the ADNI data use agreement and download the ADNI1 3Y1.5T Longitudinal FreeSurfer dataset prepared by the University of California, San Francisco (Wyman et al., 2013). Place all files under a directory named dataset, then run the Jupyter notebook preprocess, followed by the notebook joint_graph_embedding_analysis. We document our code with Sphinx (Brandl, 2021) and save the documentation in docs. More details on how to replicate our work can be found in our GitHub repository.

Results

Significant Networks Associated with AD Neurodegeneration

Out of the 4 networks identified from joint graph embedding, we found the first 2 highly significant after accounting for confounders. As shown in Table 1, both networks are dominated by structures believed to be associated with AD neurodegeneration. For example, our previous work demonstrated the involvement of the amygdala (uncovered in network 1) in early Alzheimer’s Miller et al. (2015b). In defining criteria for diagnosing Alzheimer’s disease, McKhann et al. (2011) describe “disproportionate atrophy on structural magnetic resonance imaging in medial, basal, and lateral temporal lobe” (uncovered in network 2). We show the results from permutation testing and the statistics for each network in Fig. 4, and visualize the structures in networks 1 and 2 in Fig. 5. In Table 1, we include the top 10 structures by absolute value of loading in each network.

Fig. 4 Histogram of permutation testing after accounting for confounders

Fig. 5 Visualization of significant networks

Table 1 Top 10 structures ranked by loading of each network

Significant Network Structure Pairs Associated with AD Neurodegeneration

For network-structure pairs, we rejected the null hypothesis for 170 out of 432 pairs. While most rejected pairs occur in networks previously found significant, they are not exclusive to networks 1 and 2: three pairs in network 3 were also found significant. In Table 2, we show the top 5 significant pairs ranked by p-value.

Table 2 Top 5 significant pairs ranked by FWER-corrected p value

Significant Network Structure Structure Triples Associated with AD Neurodegeneration

For network-structure-structure triples, we rejected the null hypothesis for 753 out of 46,656 possible triples. The networks and structures found significant are again not a subset of those found significant in the network-structure pair analysis. In Table 3, we show the top 5 significant triples ranked by p-value.

Table 3 Top 5 significant triples ranked by FWER-corrected p value

Results Comparing Confounder Regression and No Regression

While the results were similar to those shown above, the analysis without confounder regression found more structures and networks significant than the analysis with it. For networks only, network 3 became marginally significant, while networks 1 and 2 remained highly significant. The top 10 structures were the same for networks 1 and 2 in both analyses, but those in networks 3 and 4 differed. For the pair and triple analyses, we found 205 instead of 170 significant pairs and 1171 instead of 753 significant triples. The networks and structures found significant in triples are not a subset of those found in pairs, and vice versa.

Results Comparing Significant Structures and Structure-Structure Pairs with Existing Methods

Here we compare the significant structures and structure-structure pairs found by our method with those found by existing methods. For structures only, we compare results to mass univariate analysis Hayasaka et al. (2004). We calculate the atrophy rate per patient for each structure and find each group’s SSE about its group mean. The test statistic is calculated as \(SSE_{combined} - (SSE_{AD} + SSE_{healthy})\), and the 95th percentile threshold is found using permutation testing. The mass univariate method rejected 36 structures out of 108, a subset of the structures our method found significant.

For structure-structure pairs, we compare to a method described in Bernal-Rusiel et al. (2013). For each pair, we fit a line under the combined-group model and under the separate AD/normal group model, and calculate the SSE difference. Again, we run permutation testing to find the 95th percentile threshold. This method rejected the null hypothesis for 127 pairs, 79 of which are shared with our findings (62% overlap).

Results Comparing Greedy and Non-Greedy Optimization

We considered the effect of greedy optimization (as described in the original implementation in Wang et al. (2019)) versus joint optimization over all dimensions simultaneously. The former has the advantage that network modes are ordered in terms of the variance they explain (similar to principal component analysis). We repeated our experiments using a non-greedy approach and saw that each recovered network had the same 10 structures with the highest loadings. On the other hand, with the non-greedy approach our statistical results were different: the null hypothesis was rejected for only one network, but for more network-structure pairs and network-structure-structure triples.

Subject Specific Analysis

To illustrate a possibility for further subject-specific analysis using the results of our method, we built a logistic regression model using the subject-specific loadings from our method along with age, APOE status, and intracranial volume (ICV). The classification model predicts each subject’s disease status, 1 for AD and 0 for normal. We compare two models with 10-fold cross validation: one with just age, APOE status, and ICV, and another that adds the loadings from our method. The first model had an ROC-AUC (area under the receiver operating characteristic curve) of 0.72, while the second had an improved ROC-AUC of 0.78, indicating added utility of including our results in subject-specific analysis.
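A sketch of this comparison (assuming the covariate matrix X_base, labels y, and loadings lam have already been assembled):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X_base: columns for age, APOE status, and ICV; lam: (d, m) subject loadings
X_full = np.column_stack([X_base, lam.T])
for name, X in [("covariates only", X_base), ("with loadings", X_full)]:
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=10, scoring="roc_auc").mean()
    print(f"{name}: mean ROC-AUC = {auc:.2f}")
```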

Discussion

In this work, we applied a joint graph embedding method (similar to a multivoxel dictionary embedding method), of the kind typically used to study functional connectivity, to volumetric data in neurodegeneration. We extended the original algorithm (Wang et al., 2019) and implemented our maximum likelihood joint graph embedding algorithm to identify significant structural networks from volumetric data in an Alzheimer’s disease cohort (Wyman et al., 2013). We showed that our version of the algorithm has promise in uncovering latent dimensions that are easy to interpret and visualize. In addition, we developed and implemented a novel testing procedure and tested for significant networks, network-structure pairs, and network-structure-structure triples. We performed analysis that regresses out common confounders in AD in the hope of gaining more discovery power, and make a few comparisons between our results here. We found fewer significant pairs and triples when taking confounders into account than when not. This is expected, as some structures previously found significant may have appeared significant due to confounders. While networks 1 and 2 remained unchanged and highly significant, indicating that changes in these networks are due to disease progression, network 3 was no longer significant. The change in the structures of networks 3 and 4 also indicates that these structures may have been significant due to common confounders such as age.

Our framework shows promise in that it discovered structures commonly believed to be associated with AD neurodegeneration. We point out several strengths of our framework. The method offers a new way to view neurodegeneration in AD, where we can not only study networks and structures by themselves, but also their interactions with one another in terms of pairs and triples. Each of the three groups of findings gives us more information on structural correlations than traditional methods (Pengas et al., 2012), and we also find structures not found by previous methods, such as mass univariate analysis (Bernal-Rusiel et al., 2013). Since these pairs and triples are a novel description of neurodegeneration patterns, we briefly state their interpretation. For example, our discovered pairs can be interpreted as “the right pallidum displays a significant association with disease status, through its role in network 1”. As another example, our discovered triples can be interpreted as “the interaction between the left inferior temporal lobe and right middle temporal lobe displays a significant association with disease status, through its role in network 2”. One future direction of this study is to examine more closely the interpretation and meaning of the networks and their loadings.

Furthermore, the results can be used as an additional source of information in clinical studies. We gave an example of how the results may be used in further subject-specific analysis by implementing a classifier that predicts disease status, and showed that it outperforms the classifier with the same architecture but the loadings removed. Another example is to sample from a model with group average loadings, and interpolate and extrapolate connectivity to generate networks for subjects along a continuum between healthy and diseased. We include a visualization in Fig. 6 using 5 different values of \(\lambda\) (from top to bottom) for a fixed h, generating networks by sampling from our probability matrix 3 times (from left to right) for each \(\lambda\). The \(\lambda\) values we used interpolate and extrapolate between the means for the AD and control groups: \(p\bar{\lambda }_{\text {control}}+(1-p)\bar{\lambda }_{\text {AD}}\) for \(p \in \{1.5,1.0,0.5,0.0,-0.5\}\). Values of \(p\) outside [0, 1] represent extrapolation, which is possible because our model is guaranteed to produce probability matrices with entries in [0, 1].
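A sketch of this interpolation and sampling, reusing the hypothetical edge_probabilities function sketched earlier (the group-mean loadings lam_control and lam_ad are assumed precomputed):

```python
import numpy as np

rng = np.random.default_rng(0)
for p in [1.5, 1.0, 0.5, 0.0, -0.5]:            # rows of Fig. 6, top to bottom
    lam_p = p * lam_control + (1 - p) * lam_ad  # interpolate/extrapolate loadings
    P = edge_probabilities(lam_p, h)            # guaranteed to lie in [0, 1]
    for _ in range(3):                          # 3 independent draws per row
        upper = rng.binomial(1, np.triu(P, k=1))
        A = upper + upper.T                     # one sampled network
```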

Fig. 6 Visualization of networks sampled from our model using a fixed h (measured from data), and values of \(\lambda\) along a continuum between the mean for control and the mean for AD. We include extrapolation (top and bottom row) as well as interpolation (middle row), and draw 3 independent samples to understand variability in each case (columns)

Secondly, the dataset used in our study, ADNI (Jack et al., 2008), is very well characterized; it is more robust and spans a longer period than most clinical data. Thirdly, though increased complexity in the algorithm and testing procedure often requires more computational time, our algorithm embeds 494 patients with 108 structures in just 30 minutes and runs significance testing in 4 hours. We experimented with cohort and structure sizes and show runtime results in Fig. 7. We note that while the algorithm’s runtime scales linearly in the number of subjects, it is exponential in the number of structures. However, as studies often do not include a large set of structures, this should not be a major concern for users. On the other hand, our algorithm scales well with increasing cohort sizes.

Fig. 7 Runtime with increasing parameters

By offering more flexibility in terms of patterns that we can identify, we hope that our framework will reveal biomarker patterns more sensitive to AD.

Next we discuss a few limitations of this work. First, due to our strict cohort selection criteria, this dataset may not be representative of clinical MRI subjects. As such, a natural future direction for the study is to apply this framework to clinical datasets, which represent a more diverse population. Secondly, our findings are on a population scale rather than an individual scale, which may introduce additional biases when applied to individual clinical diagnosis. We believe that using a more fine-grained separation of the cohort, as opposed to modeling disease status as only two groups, may help address this issue. Thirdly, since little work has been done applying graph embedding methods to volumetric data in Alzheimer’s disease cohorts, we do not have comparisons with “state-of-the-art” methods. As with most unsupervised methods, there is no ground truth for evaluation on which to conclude which method is best. Rather, we offer an additional method for analyzing group differences between healthy and diseased individuals based on neurodegeneration. As mentioned above, more work needs to be done to clearly and fully interpret the meaning of the network-structure pairs and triples.

In the fMRI community, network models enabled the transition from focusing on individual hotspots involved in specific tasks to modeling brain-wide activity in resting states Lee et al. (2013). These models helped to launch massive undertakings such as the Human Connectome Project Elam et al. (2021). We believe that the incorporation of network models such as the one presented here into the analysis of structural data may have a similar impact on the field of brain morphometry: transitioning from analysis of single regions in a mass univariate approach to analyzing brain-wide patterns of tissue loss. With the recent development of potential drugs to treat Alzheimer’s disease, methods like ours that can quantify complex patterns of neurodegeneration will be essential for noninvasively identifying the patients who would benefit most. We believe that the development and dissemination of algorithms such as this one, through open source code and well documented examples, will play an important role in helping to reduce the burden of Alzheimer’s disease on our aging population.

Information Sharing Statement

Our code is publicly available on GitHub; details on access are given in Sect. 2.8. The sample data can be downloaded from ADNI after accepting their data use agreement.