Background

The investigation of sex differences has a long tradition in neuropsychology and cognitive neuroscience. This pursuit is important given different vulnerabilities by sex in incidence, symptomatology, and progression of many neurological and psychiatric diseases. Chief among these is late onset Alzheimer’s disease dementia (ADD) [1,2,3,4,5,6,7]. Almost 70% of ADD patients are women [8], but the reasons for these sex disparities remain mostly unknown.

One hypothesis of sex disparities in ADD is that women’s brains are more vulnerable to ADD pathology. This is supported by longitudinal studies showing sex differences in volumetric change over time [9, 10], memory trajectories [11], and tau accumulation rates [12]. At the same time, work by our group and others has shown that cognitively normal women have a verbal memory advantage over men [13, 14] that persists in the presence of brain amyloid [1, 2, 4] and mild to moderate ADD pathological burden such as volume loss and brain hypometabolism [14,15,16]. It is unclear whether sex differences in brain structure or atrophy over time account for this pattern of women’s advantage followed by accelerated decline in memory between the MCI and the ADD stages.

Few studies have explored sex differences in cortical thickness (CT) in ADD [17, 18], and little is known about how sex and CT might interact to shape cognitive changes in ADD. Two concepts that can contribute to this pursuit are brain and cognitive reserve in aging. Brain reserve is defined as structural brain characteristics that protect, resist, or compensate against expression of pathology. Cognitive reserve (CR) refers to features of an individual such as years of education [19], which might provide means to better adapt and maintain cognitive performance despite early pathological brain changes [20].

Women in contemporary ADD studies often present with fewer years of education than men, suggesting general CR measures do not explain sex differences. From this, the question remains as to whether women have a more specific CR in memory and whether there are measurable neural underpinnings—or brain reserve—related to the memory reserve. Findings by our group have provided mixed results for brain reserve in the form of hippocampal volume [1, 3] and more consistent support for resting state functional connectivity differences [4, 5].

Recent classification approaches provide new ways to explore sex differences in brain atrophy using machine learning algorithms [21]. Statistical learning enables researchers to explore statistical patterns to build predictive systems and generalizable models using various features, for example, classifying individual subjects using surface area [22] or volumetric features [23] derived from structural magnetic resonance imaging (MRI).

The current investigation sought to determine sex differences in CT and memory in NC, MCI, and ADD subjects, in a large sample of men and women from the ADNI database. We hypothesized that CT declines over time in brain regions involved in memory and known to be impacted by ADD would show sex differences and that memory trajectories would differ by sex in a parallel fashion. We expected CT in brain regions implicated in the ADD pathology to relate to memory scores, particularly in women. We used machine learning as a complementary method to explore multivariate CT differences between women and men. We hypothesized that machine learning would differentiate the brains of men and women, with classification accuracies reducing as ADD progresses. We expected our hypotheses to apply best in individuals with confirmed brain amyloid aggregation, defined by amyloid Positron Emission Tomography (PET).

Materials and methods

Participants and data collection

A large sample of subjects from the ADNI database (http://www.adni-info.org) was included. Briefly, the ADNI is a multicenter, multi-phase study assessing clinical, imaging, and genetic biomarkers in AD. We assessed all 838 older participants with normal cognition (NC), mild cognitive impairment (MCI), or ADD from the ADNI2/GO database who had a clinical diagnosis, genetic information, verbal memory assessments, 3T structural MRI, and florbetapir amyloid PET imaging available at the same visit. ADD subjects with a negative amyloid status were excluded from the study due to the potential presence of pathologies other than ADD. Therefore, we included a final sample of 265 NC subjects, 442 MCI subjects, and 117 ADD subjects (total 824 subjects). NC and MCI subjects were further divided into amyloid negative (NCAβ− (N=177) and MCIAβ− (N=191)) and amyloid positive (NCAβ+ (N=88) and MCIAβ+ (N=251)) groups; all ADD subjects had positive amyloid status (ADDAβ+ (N=117)).

Demographics and ApoE genotype

Subjects’ demographics including sex, age, years of education (YOE), handedness, and ApoE genotype were downloaded from the ADNI website. Two categorical variables were created to code individuals’ ApoE-ε2 (ε2) and ε4 carrier status, respectively, with the presence or absence of at least 1 copy of the ApoE-ε2 or ε4 allele.
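The carrier coding described above can be illustrated with a minimal sketch. The function name and the input format (two alleles coded as the integers 2, 3, or 4) are assumptions made for illustration; they are not taken from the ADNI files or from the study's MATLAB code.

```python
def apoe_carrier_flags(allele_1, allele_2):
    """Return (e2_carrier, e4_carrier) as 0/1 flags for one subject.

    allele_1 and allele_2 are the two ApoE alleles coded as the integers
    2, 3, or 4 (a hypothetical input format chosen for this sketch).
    """
    alleles = (allele_1, allele_2)
    for allele in alleles:
        if allele not in (2, 3, 4):
            raise ValueError(f"unexpected ApoE allele: {allele!r}")
    # A subject is a carrier if at least one copy of the allele is present.
    e2_carrier = int(2 in alleles)
    e4_carrier = int(4 in alleles)
    return e2_carrier, e4_carrier
```

Under this coding, a subject with an ε2/ε4 genotype is counted as a carrier in both variables, which is why two separate flags are kept rather than a single genotype category.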

Verbal memory assessment

The Rey Auditory Verbal Learning Test (RAVLT) from the same visit as clinical diagnosis was used to assess verbal learning and memory performance. Both the total learning score across five learning trials (RAVLT-Immediate) and the delayed free recall score (RAVLT-Delayed) were used.

Structural MRI processing

Fully processed CT data for all brain regions, at the same visit as clinical diagnosis, were downloaded from ADNI, with methods described in the UCSF FreeSurfer Methods Quality Control document (www.adni.loni.usc.edu). Briefly, subject-specific T1-weighted MRI images at the corresponding visit were preprocessed by the Mayo Clinic, and FreeSurfer (version 5.1 http://surfer.nmr.mgh.harvard.edu/) was then employed to generate a subject-specific anatomical labeling. Cortical thickness measures of 68 cortical regions [24] were finally obtained. Details of these 68 regions are listed in Supplement S1.

Florbetapir PET image processing

Subjects’ amyloid status was determined from the florbetapir PET images at the same visit as clinical diagnosis. The summary standardized uptake value ratio (SUVR), normalized to the cerebellum, was obtained from the ADNI database, and amyloid positivity was defined as a global SUVR greater than 1.1.
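As a hedged illustration of how the amyloid cutoff and the exclusion rule combine into the five study groups, the sketch below assigns a group label from a diagnosis and a global SUVR. The function names and the label strings are hypothetical; only the 1.1 cutoff and the exclusion of amyloid-negative ADD subjects come from the text.

```python
SUVR_CUTOFF = 1.1  # global florbetapir SUVR threshold stated in the text


def amyloid_positive(global_suvr):
    """Amyloid positivity: global SUVR strictly greater than the cutoff."""
    return global_suvr > SUVR_CUTOFF


def assign_group(dx, global_suvr):
    """Return a group label, or None if the subject is excluded.

    dx is one of "NC", "MCI", or "ADD" (hypothetical coding for this sketch).
    """
    if dx not in ("NC", "MCI", "ADD"):
        raise ValueError(f"unknown diagnosis: {dx!r}")
    positive = amyloid_positive(global_suvr)
    # Amyloid-negative ADD subjects are excluded from the study sample.
    if dx == "ADD" and not positive:
        return None
    return f"{dx}Aβ{'+' if positive else '-'}"
```

For example, `assign_group("MCI", 1.25)` yields the MCIAβ+ label, whereas an ADD subject with a global SUVR of 1.05 is dropped.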

Statistical analysis

All statistical analysis was conducted in matrix laboratory (MATLAB) 2018b (https://www.mathworks.com/).

Demographic comparisons

In the NC, MCI, and ADD groups, differences between men and women were assessed for demographic variables including age, YOE, ApoE status, handedness, and amyloid status. A chi-square test was used to examine categorical variables (ε2 and ε4 carrier status, amyloid status, and handedness), and a two-sample t test was used to examine continuous variables (age and YOE).

Analysis of covariance (ANCOVA): sex specific cognition and brain structure changes

To investigate sex differences in CT measures across the NC, MCI, and ADD stages in all 824 subjects, we applied the following ANCOVA model (Eq. [1]) to examine whether CT measures were associated with sex, disease diagnosis (DX), or the interaction between sex and DX, with age, YOE, handedness, ε2 carrier status, ε4 carrier status, and total intracranial volume (TIV) as covariates:

$$\mathrm{CT}\ \mathrm{measures}\sim 1+\mathrm{Sex}+\mathrm{DX}+\mathrm{Sex}\times \mathrm{DX}+\mathrm{Age}+\mathrm{YOE}+\mathrm{Handedness}+\varepsilon 2+\varepsilon 4+\mathrm{TIV}\tag{1}$$

Since we were specifically interested in the effects of sex, DX, and the interaction between sex and DX in each of the 68 cortical thickness measures, uncorrected p values were corrected for 68 × 3 comparisons using the false discovery rate (FDR) method. The same ANCOVA model without TIV as a covariate was applied to RAVLT-Immediate and RAVLT-Delayed scores to investigate whether there were significant sex, DX, or interaction effects in memory scores. Every CT measure and memory score was first converted to a z-score across all 824 subjects before being entered into the ANCOVA model.
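The text does not state which FDR procedure was used. As an assumption, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment, which would be applied here to the pooled list of 68 × 3 = 204 uncorrected p values. The function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Assumption: the study's "FDR method" is taken to be Benjamini-Hochberg;
    the source does not name the specific procedure.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Number of tests pooled in the correction: 68 regions x 3 effects.
N_TESTS = 68 * 3
```

An effect would then be reported as significant when its adjusted value falls below 0.05, matching the pcorr<0.05 criterion used in the Results.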

Moderation analysis: sex-specific thickness-cognition associations

For regions demonstrating significant interaction effects of sex and DX in the ANCOVA analysis, we were further interested in whether sex would moderate the cognition-thickness relationships across all subjects from NC to ADD. To this end, a moderation regression analysis was performed on all subjects for the RAVLT-Immediate score, with each regional thickness measure that was significant in ANCOVA as the independent variable, sex as the moderator, and DX, age, YOE, handedness, ε2, and ε4 carrier status as covariates. Moderation regression analyses were followed by partial correlation analyses to evaluate cognition-thickness associations within men and women separately, partialling out the effects of the same set of covariates. The same moderation analysis was repeated in Aβ+ and Aβ− subjects separately to further delineate the sex moderation effect on cognition-thickness relationships on and off the ADD continuum.
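A moderation effect of this kind corresponds to the coefficient on the product term between thickness and sex in a linear regression. The following is a simplified sketch, not the study's MATLAB implementation: it omits the covariates (DX, age, YOE, handedness, ε2, ε4) and significance testing, codes sex as 0/1 by assumption, and solves the ordinary least squares normal equations directly. All function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_moderation(ct, sex, score):
    """Fit score ~ 1 + CT + Sex + CT*Sex by ordinary least squares.

    Returns [intercept, b_ct, b_sex, b_interaction]; the last coefficient
    is the moderation term. Covariates are omitted in this sketch.
    """
    rows = [[1.0, c, s, c * s] for c, s in zip(ct, sex)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, score)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

With sex coded 0/1, the CT coefficient is the thickness-memory slope in the group coded 0 and the interaction coefficient is the difference in slope for the group coded 1, so a nonzero interaction term indicates that the association differs by sex.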

Sex classification in each diagnostic group using cortical thickness features

To jointly evaluate multivariate sex differences in whole-brain CT measures from NC to ADD, the 68 CT measures were used as features to classify men from women in the NCAβ−, NCAβ+, MCIAβ−, MCIAβ+, and ADDAβ+ groups separately. Briefly, in each diagnostic group, a linear support vector machine (SVM) classifier was evaluated with a leave-one-out cross-validation strategy. More specifically, in each diagnostic group, CT measures were first adjusted for the covariate effects of age, YOE, handedness, ε2, and ε4 carrier status; a linear SVM classifier was then trained on N−1 subjects with the adjusted CT measures as features and tested on the remaining subject. In this strategy, every subject was left out as the testing subject once, and inverse probability weighting was applied to offset sex imbalances. The predicted labels for all subjects were finally compared with the true sex labels. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve were used to evaluate classifier performance.
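Two pieces of this procedure can be sketched without the classifier itself: the class weights and the performance summary computed from the pooled leave-one-out predictions. The exact weighting formula is not given in the text, so the sketch assumes each subject is weighted by the inverse of the relative frequency of its class. Which sex is treated as the positive class is also an assumption (label 1 here), and the function names are illustrative. The SVM training step and the ROC computation are not shown.

```python
def inverse_probability_weights(labels):
    """Weight each subject by the inverse of its class's relative frequency.

    Assumption: this is one common form of inverse probability weighting;
    the source does not specify the formula it used.
    """
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return [n / counts[y] for y in labels]


def classification_summary(true_labels, predicted_labels, positive=1):
    """Sensitivity, specificity, and accuracy from pooled predictions."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

With these weights, the total weight of each sex is equal regardless of group composition, so the minority sex is not dominated during training.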

Results

Demographics

Demographics and sex differences of demographic variables in NC, MCI, and ADD subjects are summarized in Table 1.

Table 1 Subjects’ demographics of 824 subjects

Handedness and ApoE genotypes are matched between men and women in NC, MCI, and ADD subjects (Table 1). Overall, men are older than women in NC (p=0.01), MCI (p=0.04), and ADD (p=0.02) subjects, with an average age difference of 2.04 years in NC subjects, 1.53 years in MCI subjects, and 3.44 years in ADD subjects. Furthermore, men have significantly more years of education than women in NC (p<0.001), MCI (p<0.001), and ADD (p=0.006) subjects.

ANCOVA analysis: sex specific cognition and brain structure changes along diagnostic groups

Cognition

Figure 1 plots the sex-specific trajectories of RAVLT-Immediate (left) and RAVLT-Delayed (right) scores across our diagnostic groups. Marginal means of the interaction effect in the ANCOVA model are plotted. Significant sex (p<0.001 and p<0.001) and DX (p<0.001 and p<0.001) effects in the ANCOVA model are observed for RAVLT-Immediate and RAVLT-Delayed scores, respectively. In addition, a statistically significant interaction effect is observed for the RAVLT-Delayed score (p=0.01) and a trend-level interaction effect is found for the RAVLT-Immediate score (p=0.058). Overall, women have significantly higher scores than men; sex differences in both scores are evident in the NCAβ−, NCAβ+, MCIAβ−, and MCIAβ+ groups, but these differences diminish in the ADDAβ+ group.

Fig. 1 Memory scores: sex-specific changing trajectories of RAVLT scores across stages. Significant sex (p<0.001) and DX (p<0.001) effects are observed for RAVLT-Immediate (left) and RAVLT-Delayed (right) scores, respectively. Statistically significant interaction effect is observed for RAVLT-Delayed score (p=0.01, right) and trend-level interaction effect is found for RAVLT-Immediate score (p=0.058, left). Estimated marginal means of the interaction effect in ANCOVA are plotted for women (red) and men (green)

Structural brain measures: cortical thickness

Table 2 summarizes the sex, DX, and interaction effects in the ANCOVA model for all 68 CT measures in our sample, with uncorrected p values<0.05 listed and significant p values after false discovery rate (FDR) correction (pcorr<0.05) highlighted in bold. Specifically, out of 68 brain regions, 55 regions demonstrate significant DX effects (pcorr<0.05), and 14 regions demonstrate significant sex effects (pcorr<0.05) in CT measures. For DX effect, significant declines across our diagnostic stages are evident in all regional CT measures, whereas for sex effects, women demonstrate greater CT measures in all 14 regions. More importantly, significant (pcorr<0.05) interaction effects of CT measures are found in 9 brain regions, including bilateral cingulate cortex, bilateral temporal regions, and left parietal regions including precuneus and inferior parietal cortex (4th and 8th columns in Table 2). Figure 2 plots the different trajectories between men and women of these 9 regional thickness measures along the ADD stages. Marginal means of the interaction effect in the ANCOVA model are plotted for each thickness feature.

Table 2 Cortical thickness measures ANCOVA results: p values of significant diagnosis (DX), sex, and interaction effects (uncorrected p<0.05) of cortical thickness measures in ANCOVA. Significant effects after FDR corrections are highlighted in bold. Out of 68 brain regions, 55 show significant DX effects, 14 show significant sex effects, and 9 show significant interaction effects in cortical thickness measures after FDR correction

Fig. 2 Cortical thickness measures: sex-specific changing trajectories of cortical thickness measures with significant interaction effects (FDR corrected p<0.05) between sex and diagnosis along NC, MCI, and AD stages in ANCOVA. Estimated marginal means of the interaction effect in ANCOVA are plotted for women (red) and men (green), respectively

Sex moderates cognition-thickness associations across diagnosis

Table 3 (A, left) summarizes the moderation analysis results for the 9 regions with significant interaction effects in ANCOVA. A significant sex moderation effect is observed between RAVLT-Immediate scores and CT of the right isthmus-cingulate (p=0.002) for all subjects across DX. As detailed in Fig. 3 (left) and Table 3 (A, right), in all subjects, partial correlation analyses reveal that greater CT of the right isthmus-cingulate is associated with better verbal learning in women (Pearson’s correlation (r) = 0.23, p<0.001), but not in men (r = 0.03). When we stratify subjects by Aβ status, we find that this significant cognition-thickness association is driven by Aβ+ subjects (i.e., subjects along the ADD continuum), with partial correlation analyses again showing significant positive correlations between these two measures in women only (Fig. 3 (center) and Table 3 (B, right)). These cognition-thickness relationships are not observed for Aβ− subjects (Fig. 3 (right) and Table 3 (C, right)).

Table 3 Sex-moderated relationships of cognition (RAVLT immediate learning score) with brain cortical thickness measures in all subjects (A), amyloid positive subjects (B), and amyloid negative subjects (C). Nine regions with significant interaction effect in ANCOVA analyses are selected. Significant p values (p≤0.05) of the moderation analyses (left) and post hoc partial correlation (r) analyses (right) are listed

Fig. 3 Relationships of verbal learning with regional thickness measures by sex in all subjects (left), amyloid positive subjects only (middle), and amyloid negative subjects only (right). Out of the 9 regions showing significant sex-dependent changing trajectories in ANCOVA, 4 regions also show significant sex moderation effects on associations between RAVLT-Immediate learning score and regional thickness measure in all subjects or in amyloid positive subjects, and are plotted here for women (red) and men (green) separately. P values for significant sex-moderation effects are listed in the insets. Significant (p≤0.05) post hoc partial correlations (r) in women or men are also listed in the insets and represented by solid lines

In Aβ+ subjects only, additional significant sex-moderation effects are found in right middle-temporal (p=0.02), left precuneus (p=0.05), and left superior temporal regions (p=0.008). A similar pattern of significantly stronger positive cognition-thickness associations in women than in men is observed for all three regions in these subjects (Fig. 3 (center)), as revealed by the partial correlation values listed in Table 3 (B, right).

Sex classification results using machine learning technique

Figure 4 shows the sensitivity, specificity, accuracy, and area under the ROC curve for classification between men and women using CT measures as features in each diagnostic group. In NC subjects, CT measures can classify men from women at an accuracy of 75.00% and 56.50% for NCAβ+ and NCAβ− subjects, respectively. These accuracies drop to 65.34% and 56.02% for MCIAβ+ and MCIAβ− subjects and to 59.83% for ADDAβ+ subjects. To determine the chance levels of these accuracies, we performed 1000-run permutation tests (Supplement S2). The 95th percentiles of the permutation accuracies are 56.44%, 59.72%, 56.31%, 55.47%, and 57.76% in NCAβ−, NCAβ+, MCIAβ−, MCIAβ+, and ADDAβ+, respectively. Thus, the observed classification accuracies in NCAβ+, MCIAβ+, and ADDAβ+ are statistically significant at the p<0.05 level relative to these chance levels.
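The chance-level comparison can be sketched as follows, assuming that the null accuracies come from repeating the full classification with shuffled sex labels (the shuffling and classification steps are not shown). The nearest-rank percentile definition and the function names are choices made for this sketch; the study's own procedure is described in its supplement and may differ in detail.

```python
import math


def permutation_threshold(null_accuracies, percentile=95.0):
    """Nearest-rank percentile of the permutation (null) accuracies."""
    ordered = sorted(null_accuracies)
    # Multiply before dividing to avoid floating-point error in the rank.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def exceeds_chance(observed_accuracy, null_accuracies, percentile=95.0):
    """True if the observed accuracy is above the null percentile."""
    return observed_accuracy > permutation_threshold(null_accuracies, percentile)
```

With 1000 permutation runs and the 95th percentile, the threshold is the 950th smallest null accuracy, and an observed accuracy above it corresponds to significance at the p<0.05 level.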

Fig. 4 Support vector machine performance of sex classification in each diagnostic group with 68 cortical thickness measures as input features using 824 subjects. Sensitivity, specificity, classification accuracy, and area under the ROC curve are shown. Abbreviation: AUC, area under the ROC curve

Discussion

As hypothesized, this study showed sex differences in CT and memory performance in NC, MCI, and ADD individuals. Women showed greater CT in several AD-relevant brain regions as well as more stable CT and memory performances, compared to men, from NC to MCI. However, women showed greater cross-sectional reduction in CT and memory from MCI to ADD. Where CT differed by sex, women, but not men, evidenced an association between greater CT in selected regions and better verbal learning, and this finding was particularly notable when analyses were limited to Aβ+ individuals (i.e., those on the AD continuum).

Regions where women showed different CT trajectories, compared to men, included the precuneus, the inferior parietal cortex, and isthmus-cingulate, located at the posterior end of the cingulate cortex, confirming the results from Sangha et al. (2021) [17] obtained in a larger sample (ADNI and AIBL datasets), albeit in that study the authors did not explore the cognitive relationships.

The precuneus is a complex area involved in recollection and episodic memory retrieval [25] and one of the first regions to be affected by Aβ deposition [26], an important observation since the post hoc plots within the ADD continuum groups (NCAβ+, MCIAβ+, and ADD subjects only) showed similar trajectories to the plots for all NC, MCI, and ADD individuals (Supplement S3). This area lies posterior and superior to the posterior cingulate cortex. The cingulate cortex plays a fundamental role in many cognitive, motor, and emotional functions [27], and its posterior part terminates at the isthmus of the cingulate gyrus. Although the specific function of the isthmus-cingulate cortex is not well understood, there is evidence of its involvement in episodic memory [28] and other cognitive functions, and of changes in its cortical anatomy with ADD pathology [29].

Precuneus, isthmus-cingulate and the inferior parietal cortex are all part of the posterior Default Mode Network (DMN), which supports autobiographical memory, future planning, records of bodily sensations, self-reported mental processes, and monitoring psychological states [30,31,32]. This network seems to play a key role in the vulnerability to ADD pathology [30, 33]. Our results indicate potential sex-dependent DMN-region differences along the ADD stages, with women showing a pattern of maintenance of CT from NC to MCI and steeper loss from MCI to ADD.

As with CT measures, women showed significantly higher learning and memory scores than men, with more stability from NC to MCI and greater cross-sectional decline from MCI to ADD. Our results are consistent with women showing ADD-relevant memory and brain-based reserve, which impact early trajectory when in place and later trajectory when lost. This pattern (i.e., early resilience, followed by steeper decline) is consistent with that found in studies of individuals with higher cognitive reserve based on education levels [34,35,36]. At the same time, education does not explain the current effects, and in fact, women in our sample have lower education than men, yet still evidence a reserve-like pattern. This provides further support for the idea of domain-specific cognitive reserve, specifically verbal memory reserve, in women.

The cross-sectional characteristics of potential CT-based brain reserve mirror the pattern seen in memory findings, and women but not men show a link between greater CT and better memory. Our CT reserve effect seems to act at the NC and MCI stages, whereas during the advanced stage of neurodegeneration (ADD), women’s brain and cognitive decline is greater than men’s. Overall, these findings lend support to the idea that regional CT maintenance may play a role in early memory resilience in women and are consistent with recent structural [17] and functional findings [5].

We also found that CT can classify men from women with descending accuracies in NC, MCI, and ADD individuals and that this approach is most successful in NCAβ+ individuals. This finding supports the hypothesis that early structural brain differences may contribute to different early trajectories of decline in men and women with ADD. It suggests that CT sex differences may be less relevant in normal aging as compared to MCI and ADD, as well as less relevant with progression to ADD. This finding indicates that it may be important to examine differences between Aβ+ and Aβ− women and to consider these differences when designing early interventions or clinical trials for women.

Limitations

Our study has limitations. For example, our interpretation of reserve does not consider other contributors, such as occupational status or other physically and cognitively stimulating activities, and does not speak to the reasons that women may gain memory or related brain reserve. In addition, the machine learning approach is unable to explore individual differences that may play a role in ADD pathogenesis. Finally, the ADNI sample is highly educated and predominantly White, and work in other samples will be important to ensure generalizability. On the other hand, strengths of our study include its consideration of multimodal sex differences in a relatively large sample and its consideration of the presence of brain amyloid.

Conclusion

In conclusion, we found that women show more stable memory and CT than men from NC to MCI, and steeper thickness declines from MCI to ADD in regions including the precuneus, temporal lobe, and cingulate gyrus, areas that play a key role in memory and are among the most affected by ADD neuropathology. Using CT as a structural measure, our machine learning approach classified men from women with an accuracy of up to 75% in NC Aβ+ subjects, with accuracy decreasing as cognitive impairment progressed. Future structural and functional MRI studies should consider sex as a factor of interest rather than a covariate and should consider domain-specific cognitive reserve and early CT-based brain reserve in women in ADD.