Introduction

A global deficit in cognitive function is characteristic of schizophrenia1,2 with evidence indicating greater impairment for specific cognitive domains such as executive function, attention, episodic memory, and motor speed, compared to others3,4. Cognitive impairment is a major determinant of functional outcomes in schizophrenia5 yet most existing therapies for schizophrenia do not address cognitive symptoms6. Thus, there has been increasing interest in identifying the biological basis of cognitive dysfunction in schizophrenia in order to facilitate the identification of novel therapeutic targets6.

Both schizophrenia and cognitive ability are heritable, with estimates from twin and family studies ranging from 0.6 to 0.8 for schizophrenia7,8 and 0.5–0.8 for general cognitive ability9,10. Previous research has demonstrated impaired cognitive performance in first-degree relatives of people with schizophrenia11,12, providing further evidence that there is a genetic contribution to cognitive dysfunction in the disorder. Molecular genetic studies have identified rare and common genetic variants that are associated with both schizophrenia and cognitive function, implicating genes involved in neurodevelopment, synaptic integrity, and neurotransmission13,14. However, findings from studies investigating the association between genetic liability for schizophrenia and cognitive impairment in individuals with schizophrenia have been inconsistent. Some studies have found that a polygenic risk score (PRS) for schizophrenia is negatively associated with cognitive ability15,16, while others have found no association17,18,19.

One possible explanation for the inconsistent results is that most previous studies have focused on the association between the genetic determinants of schizophrenia and a single measure of general cognitive ability, such as Spearman’s g factor. Given that performance across cognitive domains is known to be differentially affected by schizophrenia, the use of a broad measure of cognitive ability may prevent a more detailed examination of the genetic contributions to cognitive impairment in the disorder. Additionally, schizophrenia is a clinically heterogeneous disorder20, and there is evidence to suggest that unique biological processes contribute to the various schizophrenia symptom dimensions21,22. While it is plausible that the degree of genetic overlap with cognitive function will differ across the schizophrenia symptom dimensions, few genetic studies have taken a dimensional approach to assessing the relationship.

Here, we take a dimensional approach to investigating the shared genetic determinants between cognitive function and schizophrenia. We apply Genomic Structural Equation Modelling (Genomic SEM) to derive latent factors corresponding to broad dimensions of cognitive function from 12 cognitive traits measured in the UK Biobank (UKB). We use recently developed statistical approaches (bivariate MiXeR, LAVA, and conjFDR) to identify genetic variants shared between the dimensions of cognitive function and schizophrenia. Lastly, we use data from the Norwegian Thematically Organised Psychosis (TOP) Study23 to explore whether the phenotypic distinction between cognitive dysfunction and other schizophrenia symptoms may be explained by differences in the underlying genetic architecture.

Results

Genome-wide association analyses of UKB cognitive traits

We conducted univariate genome-wide association studies (GWASs) of 12 cognitive traits from the UKB. The number of genome-wide significant loci identified for each cognitive trait ranged from 0 to 87, with the highest number observed for mean reaction time and no significant loci for Paired Associate Learning (PAL) and UKB Trail Making Test Part B (TMT-B) (Fig. 1; Supplementary Table 1). The LD score regression intercept for all univariate GWASs was approximately 1 (range 0.99–1.02), consistent with minimal inflation of the test statistic due to population stratification (Supplementary Table 1).

Figure 1

Overview of univariate GWAS for UKB cognitive traits. Bar chart illustrating the sample size, number of genome-wide significant loci, and SNP-based heritability (h2SNP) estimates for univariate GWAS of 12 cognitive traits from the UK Biobank. All h2SNP estimates are significant. Traits are colour-coded according to the three-factor model derived using Genomic Structural Equation Modelling. Pairs Match Pairs matching, TMT A Trail making test-part A, TMT B Trail making test-part B, SDS symbol digit substitution, Tower tower rearranging, Matrix matrix pattern completion, PAL paired associate learning, Fluid Int fluid intelligence, Num Mem numeric memory, Pro Mem prospective memory, Mean RT mean reaction time, RTV reaction time variability.

Estimations of SNP-based heritability and genetic correlations

The SNP-based heritability (h2SNP) for each cognitive trait and the genetic correlations between cognitive traits were estimated using linkage disequilibrium score regression (LDSC). Estimates for h2SNP ranged from 3 to 22% (Fig. 1; Supplementary Table 1). We observed positive genetic correlations between all cognitive traits (range 0.08–0.89, mean = 0.49, SD = 0.2; Fig. 2A; Supplementary Table 2). All genetic correlations were significant at the Bonferroni-corrected threshold (α = 0.05/66 pairs of traits; p < 7.58 × 10–4) with the exception of the correlations between mean reaction time and PAL (rg = 0.11, SE = 0.04, p = 5.3 × 10–3), mean reaction time and numeric memory (rg = 0.07, SE = 0.03, p = 1.74 × 10–2), and UKB Trail Making Test Part A (TMT-A) and PAL (rg = 0.25, SE = 0.09, p = 6.2 × 10–3). We applied a hierarchical clustering algorithm to the genetic correlation matrix and identified three distinct clusters of cognitive phenotypes. The first cluster included Pairs Matching, TMT-A, TMT-B, Symbol Digit Substitution and Tower Rearranging; the second cluster included Fluid Intelligence, Numeric Memory, Prospective Memory, Matrix Reasoning, and PAL; and the third cluster comprised mean reaction time and reaction time variability (Fig. 2A).
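
The multiple-testing threshold quoted above can be reproduced with a few lines of arithmetic. The sketch below is only a check of the reported numbers, not part of the original analysis; the dictionary keys are shorthand labels chosen here for illustration.

```python
from math import comb

N_TRAITS = 12   # cognitive traits analysed
ALPHA = 0.05    # family-wise error rate before correction

n_pairs = comb(N_TRAITS, 2)      # number of unique trait pairs
threshold = ALPHA / n_pairs      # Bonferroni-corrected significance threshold

# p-values reported in the text for the three pairs described as exceptions
# (labels are shorthand introduced for this sketch)
reported = {
    ("Mean RT", "PAL"): 5.3e-3,
    ("Mean RT", "Num Mem"): 1.74e-2,
    ("TMT-A", "PAL"): 6.2e-3,
}
not_significant = [pair for pair, p in reported.items() if p >= threshold]

print(f"{n_pairs} pairs, threshold = {threshold:.2e}")
print(f"pairs above the threshold: {len(not_significant)}")
```

All three reported p-values exceed the corrected threshold, in line with the exceptions listed in the text.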

Figure 2

Multivariate structure of 12 cognitive traits from the UK Biobank. (A) Heatmap showing genetic correlations estimated using LDSC. A hierarchical clustering algorithm applied to the genetic correlation matrix identified three distinct clusters of cognitive phenotypes. The first cluster included Pairs Match, TMT-A, TMT-B, SDS and Tower; the second cluster included Fluid Int, Num Mem, Pro Mem, Matrix, and PAL; and the third cluster comprised Mean RT and RTV. (B) Path diagram showing standardized results from the correlated three-factor model. Circles represent latent variables that are inferred from the data. Single-headed arrows depict regression relationships with the arrows pointing from the independent variables to the dependent variables. Two-headed arrows represent covariance relationships between variables or the residual variance of a variable if the arrow connects the variable to itself. Standard errors of the estimates are in parentheses. Pairs Match Pairs matching, TMT A trail making test-part A, TMT B trail making test-part B, SDS symbol digit substitution, Tower tower rearranging, Matrix matrix pattern completion, PAL paired associate learning, Fluid Int fluid intelligence, Num Mem numeric memory, Pro Mem prospective memory, Mean RT mean reaction time, RTV reaction time variability.

Genomic structural equation modelling

We modeled the genetic covariance matrix for the 12 UKB cognitive traits using Genomic SEM. First, we fit a single common factor model in which the single latent factor represented a genetic g factor. Model fit was suboptimal for the single common factor model (chi-square, χ2(54) = 2106.75, AIC = 2154.75, CFI = 0.71, SRMR = 0.12; Supplementary Table 3; Supplementary Fig. 1). Next, we explored whether a correlated two- or three-factor model closely approximated the observed genetic covariance matrix. Consistent with the results from the hierarchical clustering of the genetic correlation matrix, we found that a correlated three-factor model fit the data best (chi-square, χ2(49) = 320.72, AIC = 378.72, CFI = 0.96, SRMR = 0.07; Fig. 2B; Supplementary Table 3). In the three-factor model, factor 1 and factor 2 exhibit the highest genetic correlation among the cognitive factors (rg = 0.65, SE = 0.02, p = 2.59 × 10–215) and these factors may be conceptualized as capturing cognitive traits related to the broad cognitive ability, fluid reasoning24. Factor 1 is primarily defined by cognitive phenotypes that relate to visuospatial aspects of fluid reasoning and includes Pairs Matching, TMT-A, TMT-B, Symbol Digit Substitution and Tower Rearranging. Factor 2 largely captures measures that assess the verbal analytic component of fluid reasoning and is defined by Fluid Intelligence, Numeric Memory, Prospective Memory, Matrix Reasoning, and PAL. Factor 3 is characterized by cognitive phenotypes related to the broad cognitive ability, “decision/reaction time/speed”24. Mean reaction time and reaction time variability load on the third factor, which is less correlated with factor 1 (rg = 0.37, SE = 0.03, p = 2.75 × 10–42) and factor 2 (rg = 0.29, SE = 0.02, p = 3.44 × 10–37).
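
The reported fit statistics can be cross-checked against one another. The sketch below assumes that the AIC is computed as χ2 plus twice the number of free parameters, and that the degrees of freedom equal the number of unique elements of the 12 × 12 genetic covariance matrix minus the number of free parameters; both are assumptions about the software's conventions rather than statements from the text, and the function names are illustrative.

```python
def n_unique_moments(n_traits: int) -> int:
    """Unique variances and covariances in an n x n covariance matrix."""
    return n_traits * (n_traits + 1) // 2

def implied_free_parameters(chi_square: float, aic: float) -> int:
    """Free parameters implied by AIC = chi-square + 2k (assumed convention)."""
    return round((aic - chi_square) / 2)

# Fit statistics as reported in the text
models = {
    "one-factor": {"chisq": 2106.75, "df": 54, "aic": 2154.75},
    "three-factor": {"chisq": 320.72, "df": 49, "aic": 378.72},
}

moments = n_unique_moments(12)
implied = {}
for name, fit in models.items():
    k = implied_free_parameters(fit["chisq"], fit["aic"])
    implied[name] = (k, moments - k)   # (free parameters, implied df)
    print(f"{name}: k = {k}, implied df = {moments - k}, reported df = {fit['df']}")
```

Under these assumptions, the implied degrees of freedom match the reported values for both models.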

Multivariate GWAS

We conducted multivariate GWASs of the three latent cognitive factors using Genomic SEM. The effective sample size ranged from 160,729 for factor 2 (verbal analytic reasoning) to 637,271 for factor 1 (visuospatial processing) (Supplementary Table 4). Substantial inflation of the test statistic was observed for all latent cognitive factors (Supplementary Fig. 2); however, LD score regression intercepts were 1, suggesting that the test-statistic inflation reflects high polygenicity and not other sources of bias (Supplementary Table 4).

Estimation of genetic overlap between cognitive factors and schizophrenia

We found a significant negative genetic correlation between all three latent cognitive factors and schizophrenia using LDSC25,26 (Supplementary Table 5). Bivariate MiXeR demonstrated substantial polygenic overlap between each latent cognitive factor and schizophrenia, beyond that captured by estimates of genetic correlation (Supplementary Fig. 3). Of the 9600 variants predicted to influence schizophrenia, almost all variants were also predicted to influence the latent cognitive factors. Notably, factor 1 (visuospatial processing) demonstrated the greatest negative global genetic correlation with schizophrenia (rg = − 0.38, SE = 0.025, p = 9 × 10–52) and the greatest percentage of shared variants (63%) with a discordant effect between a latent cognitive factor and schizophrenia (Supplementary Fig. 3).

We employed Local Analysis of [co]Variant Association (LAVA)27 to explore regional patterns of genetic correlation between the latent cognitive factors and schizophrenia. The number of genetic regions that were significantly heritable (p < 2.00 × 10–5) for schizophrenia and a latent cognitive factor ranged from 149 to 170 (Supplementary Table 6). Among the significantly heritable regions, we found the number of regions with a significant genetic correlation (α = 0.05/number of significantly heritable regions for both traits) between schizophrenia and a latent cognitive factor ranged from 6 to 21. Most significant local genetic correlations were negative (Supplementary Fig. 4; Supplementary Table 7) with the exception of a positive correlation between schizophrenia and factor 2 (verbal analytic reasoning) at two loci [chr 3: 47588462–50387742, chr 12: 77800464–79315178] (Supplementary Table 7).

Lastly, we conducted conjFDR analysis28,29 to identify individual SNPs that were jointly associated with each latent cognitive factor-schizophrenia pair. As shown in Fig. 3, schizophrenia shares loci with cognitive factor 1 (N = 93), cognitive factor 2 (N = 267), and cognitive factor 3 (N = 175). Consistent with the lowest degree of polygenic overlap demonstrated by bivariate MiXeR analysis, factor 1 shared the lowest number of loci with schizophrenia. Among the shared associations with schizophrenia, 46 were unique to cognitive factor 1, 189 were unique to factor 2, and 113 were unique to factor 3.
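
As a rough illustration of how a conjunctional FDR is used to call shared SNPs, the sketch below takes the conjFDR of a SNP to be the larger of its two conditional FDR values, which is the conventional definition of the statistic. The conditional FDR values are assumed to have been computed already, and the SNP identifiers and numbers are invented placeholders rather than results from this study.

```python
def conj_fdr(cond_fdr_trait1_given_trait2: float,
             cond_fdr_trait2_given_trait1: float) -> float:
    """Conjunctional FDR: the maximum of the two conditional FDR values."""
    return max(cond_fdr_trait1_given_trait2, cond_fdr_trait2_given_trait1)

THRESHOLD = 0.05  # significance threshold used for the Manhattan plots

# Hypothetical (condFDR factor | SCZ, condFDR SCZ | factor) values per SNP
example_snps = {
    "rs_example_1": (0.010, 0.030),
    "rs_example_2": (0.001, 0.200),
    "rs_example_3": (0.049, 0.049),
}

jointly_associated = [
    snp for snp, (fdr_a, fdr_b) in example_snps.items()
    if conj_fdr(fdr_a, fdr_b) < THRESHOLD
]
print(jointly_associated)
```

Because the maximum is taken, a SNP is only called as shared when the evidence is strong for both traits; the second placeholder SNP is excluded despite its very small conditional FDR for one trait.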

Figure 3

Discovery of loci jointly associated with each latent cognitive factor and schizophrenia. Manhattan plots showing the − log10 transformed conjFDR values for a joint association with a latent cognitive factor and schizophrenia (SCZ) for each SNP (y-axis) plotted against chromosomal position (x-axis). The dotted line indicates the significance threshold (conjFDR < 0.05). Lead SNPs are outlined in black and independent significant SNPs are represented by larger circles.

Functional annotation of identified loci

Genes were mapped to unique significant loci for each latent cognitive factor-schizophrenia pair using data provided by Ensembl30 (Supplementary Tables 8–10). For the unique significant loci shared between cognitive factor 1 (visuospatial processing) and schizophrenia, gene-set analysis for cellular components demonstrated significant results for the synapse (FDR = 1.39 × 10–2), the synaptic membrane (FDR = 3.05 × 10–2), the postsynaptic membrane (FDR = 3.05 × 10–2), neurons (FDR = 3.05 × 10–2), and neuron projections (FDR = 4.80 × 10–2) and no significant gene-sets for biological processes (Supplementary Table 11). For unique significant loci associated with cognitive factor 2 (verbal analytic reasoning) and schizophrenia, gene-set analysis revealed significant results for genes involved in axon guidance (FDR = 2.02 × 10–2) and cell adhesion via plasma adhesion molecules (FDR = 1.49 × 10–3) (Supplementary Table 12). Lastly, the genes mapped to unique significant loci associated with latent cognitive factor 3 (decision/reaction time) and schizophrenia were significantly enriched for two gene-sets involved in neuronal development: regulation of neuron differentiation (FDR = 4.22 × 10–2) and regulation of neuron projection development (FDR = 4.22 × 10–2) (Supplementary Table 13).

Polygenic prediction of schizophrenia symptom dimensions

We created polygenic scores (PGS) for the three latent cognitive factors and tested the ability of each cognitive factor-PGS to predict schizophrenia and schizophrenia symptom severity, as assessed by the Positive and Negative Syndrome Scale (PANSS)31, in individuals from the TOP study. There was a significant association between schizophrenia diagnosis and the PGS for latent cognitive factor 1 (visuospatial processing; R2 = 0.026, p = 2.48 × 10–6) and latent cognitive factor 2 (verbal analytic reasoning; R2 = 0.011, p = 1.80 × 10–3) (Fig. 4). There were no significant associations between the PGS for any latent cognitive factor and the schizophrenia symptom dimensions (Fig. 4; Supplementary Table 14).
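
For readers unfamiliar with polygenic scoring, the sketch below shows the generic additive form of a PGS: a weighted sum of an individual's effect-allele dosages, with weights taken from GWAS effect sizes. It is not the scoring pipeline used in this study, which is not described in this section; the SNP names, weights, and dosages are hypothetical, and treating a missing genotype as a dosage of zero is a simplifying assumption.

```python
def polygenic_score(dosages: dict, weights: dict) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) over the scored SNPs.

    SNPs missing from `dosages` contribute zero (a simplifying assumption).
    """
    return sum(beta * dosages.get(snp, 0.0) for snp, beta in weights.items())

# Hypothetical per-SNP weights (GWAS effect sizes) for one latent factor
weights = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.03}

# Hypothetical effect-allele dosages for one individual
individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(individual, weights)
print(f"PGS = {score:.3f}")
```

In practice, scores computed this way are usually standardized across the target sample before being entered into a regression on diagnosis or symptom scores.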

Figure 4

Association of polygenic scores for each latent cognitive factor with schizophrenia and three schizophrenia symptom dimensions. Associations of polygenic scores (PGS) for each latent cognitive factor with schizophrenia, and schizophrenia symptom dimensions (negative, positive, and general psychopathology) in individuals from the TOP study. Point estimates for beta values are shown with 95% confidence intervals. The dotted line represents a null model.

Discussion

In this study, we explored the common genetic determinants shared between schizophrenia and cognitive function using three latent factors that captured the genetic covariance structure of 12 cognitive measures from the UKB. We found evidence of substantial polygenic overlap between schizophrenia and the genetically determined latent cognitive factors. All three latent cognitive factors exhibited a negative genetic correlation with schizophrenia that was largely consistent for both global and local patterns of genetic correlation. We identified loci jointly associated with schizophrenia and the latent cognitive factors and biological annotation of the shared loci implicated genes involved in the development and functioning of the central nervous system. Additionally, we demonstrated that PGS for the latent cognitive factors were not predictive of schizophrenia symptoms, suggesting distinctions in the underlying genetic architecture of cognitive function and phenotypic dimensions in schizophrenia.

We applied Genomic SEM to GWAS summary statistics for 12 cognitive traits in the UKB and found that a three-factor model best explained the genetic correlations between the cognitive traits. This is consistent with findings from a previous study that applied structural equation modeling to phenotypic data from the UKB cognitive assessments and found that a three-factor solution fit the data best32. The three cognitive factors identified in the current study may be characterized using the framework provided by the Cattell–Horn–Carroll (CHC) theory of human cognitive abilities, which proposes a three-stratum model of human intelligence24,33. Consistent with the relatively high genetic correlation between factor 1 and factor 2, the first two latent cognitive factors appear to capture the same broad cognitive ability, fluid reasoning (Gf). However, factor 1 is largely defined by cognitive traits that capture visuospatial processing whereas factor 2 is defined by cognitive tests that measure verbal analytic reasoning. Factor 3 is less genetically correlated with the first two factors and measures a distinct cognitive ability, decision/reaction time/speed (Gt). The convergence of the genetic and phenotypic factor structures for the UKB cognitive traits suggests that the existing phenotypic structure is rooted in the genetic underpinnings of the traits.

Consistent with the literature, all three latent cognitive factors demonstrated a significant negative genetic correlation with the diagnosis of schizophrenia with the greatest negative correlation observed between the visuospatial factor (factor 1) and schizophrenia. The genetic correlation between the visuospatial factor and schizophrenia is of greater magnitude than that reported for general cognitive ability and schizophrenia (rg = − 0.23)34. This finding suggests that we may be missing unique patterns of genetic association between schizophrenia and cognitive abilities when using a composite measure of cognitive function. Patterns of local genetic correlation between the latent cognitive factors and schizophrenia were generally consistent with global genetic correlations. These findings indicate that, in general, genetic liability for schizophrenia has a negative effect on cognitive abilities and that the phenotypic relationship between schizophrenia and cognitive impairment has a genetic basis. LAVA analysis revealed that most significantly heritable regions demonstrated a negative association between the cognitive factors and schizophrenia and that most significant regional genetic correlations were negative. One of the regions that showed a positive genetic correlation between the verbal analytic factor and schizophrenia was located on chromosome 3 (Chr3p21). This region is enriched for genes that have been associated with intelligence35 and general cognitive ability34 in previous GWAS. Further research is required to explore the relationship between genes in this region and cognitive function in schizophrenia.

Results from bivariate MiXeR and conjFDR analysis converged with each cognitive factor demonstrating substantial polygenic overlap with schizophrenia. Notably, the visuospatial factor demonstrated the greatest genome-wide genetic correlation with schizophrenia despite conjFDR showing that this factor shared the lowest number of loci with schizophrenia. These findings demonstrate that a large genetic correlation does not necessarily correspond to the greatest overlap in genetic architecture. Instead, these findings imply that the variants that are shared between the visuospatial factor and schizophrenia demonstrate a more consistent direction of effect for the two traits than those shared between the other latent cognitive factors and schizophrenia. The conjFDR analysis showed that the majority of loci found to be jointly associated with each latent cognitive factor-schizophrenia pair were unique to the pair. Given the almost complete overlap between the genetic determinants of each cognitive factor and schizophrenia demonstrated by bivariate MiXeR, the difference in significant loci shared between each cognitive factor and schizophrenia is unlikely to reflect a unique set of variants associated with each cognitive factor. Instead, this result indicates that despite all cognitive factors sharing most causal variants, the magnitude and potentially direction of effects of these shared variants vary between the latent cognitive factors.

The findings from the conjFDR analysis extend current knowledge of the loci shared between schizophrenia and cognitive abilities. The use of well-powered GWAS and multiple latent cognitive factors, rather than a single g factor, facilitated the identification of loci shared between schizophrenia and cognitive abilities. This builds upon a previous study that used conjFDR to explore the overlap between intelligence and schizophrenia, which identified 75 shared loci13. Our study expands on these findings by identifying 70 new loci associated with factor 1, 213 with factor 2, and 145 with factor 3. Annotation of these loci has provided insights into the potential biological mechanisms linking cognitive abilities and schizophrenia. For instance, gene-set enrichment analysis revealed that loci associated with factor 1 (visuospatial processing) and schizophrenia are linked to synaptic structure and function. Meanwhile, loci associated with factor 3 (decision/reaction time) and schizophrenia are related to genes that play a role in neuronal development. These findings underscore the importance of synaptic function and neuronal development in schizophrenia pathophysiology and suggest these processes might also contribute to the cognitive impairments observed in the disorder. Furthermore, loci jointly associated with factor 2 (verbal analytic reasoning) and schizophrenia map to genes involved in diverse biological processes, including axon guidance and functions related to the immune system. Previous studies have linked immune process dysregulation in the central nervous system to schizophrenia risk and to impaired cognitive function in the disorder36,37,38; however, the impact of immune processes on cognition in schizophrenia remains to be fully explored. Additional research is necessary to identify the causal variants underlying these shared associations and to clarify the mechanisms by which these variants influence both cognitive function and the risk of schizophrenia.

We annotated the unique significant loci for each latent cognitive factor-schizophrenia pair and conducted gene-set enrichment analysis to explore putative biological mechanisms underlying the association between cognitive abilities and schizophrenia. Gene-set enrichment analysis implicated distinct gene-sets for each latent cognitive factor-schizophrenia pair but converged on processes and cellular components related to neurodevelopment and neuronal function. This is consistent with previous reports that have found that loci shared between schizophrenia and intelligence implicated genes involved in neurodevelopment, synaptic integrity, and neurotransmission13.

A further aim of the study was to understand the relationship between genetic liability to broad dimensions of cognitive function and schizophrenia symptoms. We calculated PGS for each latent cognitive factor and assessed their relationship with schizophrenia as well as positive symptoms, negative symptoms, and general psychopathology as measured by the PANSS in individuals with schizophrenia in the TOP study. We found that the visuospatial factor PGS and verbal analytic factor PGS significantly predicted schizophrenia but that there were no significant associations between any latent cognitive factor PGS and schizophrenia symptoms in our study. Previous research has consistently demonstrated a significant relationship between PGS for general cognitive ability and schizophrenia; however, findings on the relationship between PGS for specific cognitive abilities and schizophrenia are mixed39,40. The lack of consistent results may be due to differences in the methods of assessing and defining cognitive abilities or domains, and a consensus on the measurement of cognitive domains would improve comparability across studies. The lack of association between PGS for the latent cognitive factors and schizophrenia symptoms is in keeping with results from a previous study which found no association between an intelligence PGS and positive, negative, and disorganized symptoms in schizophrenia16. These findings suggest that there is a distinction between the genetic determinants of cognitive abilities and those of other symptoms of schizophrenia, and they extend findings from phenotypic analyses that show minimal association between positive and negative symptoms and cognitive impairment in schizophrenia6.

The results of the current study should be interpreted in the context of several limitations. First, the cognitive assessments from the UKB are brief and bespoke. While previous studies have demonstrated that the psychometric properties for most assessments are adequate41,42, the findings of the current study require replication in other samples with psychometrically valid measures of cognitive function. Second, the sample sizes and heritability estimates for the cognitive traits differed and differences in power for the univariate GWAS defining each factor may have affected the results of the multivariate GWAS for each factor. Third, the mean scores for the subscales of the PANSS were relatively low and variance amongst the scores was low, which is expected given that the TOP study limited enrolment to individuals with the capacity to provide informed consent. As the PANSS assesses current symptom severity, a lifetime measure of schizophrenia symptoms may have been more appropriate for assessing the relationship between cognitive abilities and schizophrenia symptoms. Fourth, the modest sample size of the target sample, the TOP study, and the small variation in symptom levels may have affected our power to detect associations between the polygenic scores for the latent cognitive factors and schizophrenia symptoms. Lastly, we restricted our analyses to individuals of European ancestry and our results may not be generalizable to other populations. Efforts to improve the representation of diverse global populations in genomic studies are ongoing43,44 and once the relevant large scale datasets for non-European populations become available, the generalizability of the results from the current study should be examined.

In summary, we estimated three genetically determined correlated cognitive factors and applied a variety of genomic methods to explore the relationship between the cognitive factors and schizophrenia. We found extensive polygenic overlap between the latent cognitive factors and schizophrenia and demonstrated that most shared common genetic variants have opposite directions of effect on cognitive abilities and schizophrenia risk. This study demonstrated that most loci shared between the latent cognitive factors and schizophrenia show unique patterns of association with each cognitive factor. Results from biological annotation of shared loci converged and implicated biological processes related to neurodevelopment and neuronal functioning. Lastly, we extended current knowledge with our polygenic risk score analyses which showed a distinction in the common genetic determinants of cognitive abilities and schizophrenia symptoms. Collectively, our results suggest that heterogeneity in the extent of cognitive impairment observed across cognitive domains in schizophrenia reflects differences in genetic risk sharing between specific cognitive domains and schizophrenia.

Materials and methods

Sample description

This study used data from the UKB, a large-scale biomedical database with genotype and phenotype data for approximately 500,000 people45. Data for this study was obtained under accession number 27412. As Genomic SEM requires well-powered GWAS and LD reference panels that match the ancestry of the GWAS population46, our analysis uses genotype and cognitive data for individuals with a self-reported ethnicity of “white British” or “white non-British”. Cognitive tests included in this study were completed during the baseline assessment, and later follow-up assessments. Sample sizes for each cognitive measure vary (n = 28,156–436,853; Fig. 1) and participants’ ages range from 40 to 70 years at baseline and 45–75 years at later assessments.

Definition of the cognitive phenotypes

This study included 12 cognitive measures that were derived from cognitive tests administered as part of the baseline and follow-up assessments for the UKB. The four cognitive tests that were administered at baseline are Fluid Intelligence, Reaction Time, Numeric Memory, and Pairs Matching. The six tests that were administered during a follow-up assessment are Prospective Memory, Matrix Pattern Completion, PAL, UKB Symbol Digit Substitution, Tower Rearranging, and the UKB Trail Making Test (TMT-A and TMT-B). The Reaction Time test contributed two measures (mean reaction time and reaction time variability), as did the Trail Making Test (TMT-A and TMT-B), giving 12 measures from 10 tests. All cognitive tests were fully automated and were designed to be administered with minimal supervision. A detailed description of each cognitive test is provided in the Supplementary Note.

Genome-wide association analyses

Version 3 of the UKB genetic data was used for this study. Genotyping, imputation, and central quality control procedures for the UKB genotypes are described in detail elsewhere47. A univariate GWAS for each cognitive phenotype was conducted using the REGENIE tool48, which consists of two steps. For step 1, polygenic predictors are calculated by fitting a whole-genome regression model to genotype data. Prior to conducting step 1, the following quality control filters were applied to the UKB genotype calls: removal of individuals with > 10% missing genotype data, removal of SNPs with > 10% genotype missingness, removal of SNPs failing the Hardy–Weinberg equilibrium test at p = 1 × 10–15, and removal of SNPs with a minor allele frequency (MAF) < 1%. After quality control, 581,299 variants remained for inclusion in step 1 of the analysis. For step 2, a linear regression model is used to test for phenotype–genotype associations using imputed genotype data, conditional upon the predictions of the model from step 1. Variants with an INFO score < 0.8 and minor allele count (MAC) < 20 were excluded from this step, leaving a maximum of 20,241,796 variants for analysis. Sex, age, age², age-by-sex interaction, assessment centre, genotype array, and the first 40 genetic principal components were included as covariates in each GWAS.
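
The SNP-level filters applied before step 1 can be expressed as a simple predicate. The sketch below is an illustration only: in practice such filters are applied with dedicated genotype-processing software, the record type and field names are hypothetical, and the Hardy–Weinberg criterion is read here as excluding SNPs with a test p-value below 1 × 10–15.

```python
from typing import NamedTuple

class SnpStats(NamedTuple):
    """Per-SNP summary statistics (hypothetical record layout)."""
    snp_id: str
    missing_rate: float   # fraction of individuals with a missing genotype
    hwe_p: float          # Hardy-Weinberg equilibrium test p-value
    maf: float            # minor allele frequency

def passes_step1_qc(snp: SnpStats,
                    max_missing: float = 0.10,
                    min_hwe_p: float = 1e-15,
                    min_maf: float = 0.01) -> bool:
    """True if the SNP survives the missingness, HWE, and MAF filters."""
    return (snp.missing_rate <= max_missing
            and snp.hwe_p >= min_hwe_p
            and snp.maf >= min_maf)

# Invented example records, one failing each filter
candidates = [
    SnpStats("rs_example_1", 0.02, 0.50, 0.25),    # passes all filters
    SnpStats("rs_example_2", 0.15, 0.50, 0.25),    # too much missingness
    SnpStats("rs_example_3", 0.01, 1e-20, 0.30),   # fails HWE
    SnpStats("rs_example_4", 0.00, 0.90, 0.005),   # MAF below 1%
]
kept = [s.snp_id for s in candidates if passes_step1_qc(s)]
print(kept)
```

The individual-level filter (removal of individuals with more than 10% missing genotype data) would be applied analogously on per-sample missingness.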

Estimations of SNP-based heritability and genetic correlations

The h2SNP for each cognitive phenotype from the UKB was estimated using LDSC25,26. LDSC was also used to estimate the genetic correlations between the 12 cognitive phenotypes. A hierarchical clustering algorithm was applied to identify clusters of correlated cognitive traits. The analysis was performed using the “clustermap” function of the seaborn Python library implemented with default parameters49.
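
With default parameters, seaborn's clustermap clusters the rows of the input matrix using average linkage and Euclidean distance. The sketch below re-implements that idea in plain Python on a small, invented six-trait correlation matrix so that the procedure can be followed step by step; it is not the authors' code, the trait names and correlation values are hypothetical, and cutting the tree at a fixed number of clusters is a choice made here for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two rows of the matrix."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(rows, n_clusters):
    """Agglomerative clustering of rows; stops when n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters
                total = sum(euclidean(rows[i], rows[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical genetic correlation matrix with three blocks of two traits
traits = ["trait_a1", "trait_a2", "trait_b1", "trait_b2", "trait_c1", "trait_c2"]
rg = [
    [1.0, 0.8, 0.3, 0.3, 0.1, 0.1],
    [0.8, 1.0, 0.3, 0.3, 0.1, 0.1],
    [0.3, 0.3, 1.0, 0.7, 0.2, 0.2],
    [0.3, 0.3, 0.7, 1.0, 0.2, 0.2],
    [0.1, 0.1, 0.2, 0.2, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.2, 0.9, 1.0],
]

clusters = average_linkage(rg, n_clusters=3)
named = [[traits[i] for i in c] for c in clusters]
print(named)
```

Traits with similar correlation profiles across the whole matrix end up in the same cluster, which is why the clustering and the factor model can be expected to agree when the block structure is strong.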

Genomic structural equation modelling

We conducted exploratory and confirmatory factor analysis of the 12 UKB cognitive phenotypes. First, the multivariable extension of LDSC employed in Genomic SEM was used to derive a genetic covariance matrix (S) and sampling covariance matrix (V). Next, exploratory factor analysis (EFA) with promax rotation was conducted on the standardized S matrix using the R package, stats50. Results from the EFA were used to guide confirmatory factor analysis (CFA) for one-, two-, and three-factor models. CFA was performed using Genomic SEM, and standardized factor loadings of > 0.4 were retained for CFA. Model fit for each factor model was assessed using recommended fit indices: standardized root mean square residual (SRMR), model χ2 statistic, Akaike Information Criterion (AIC), and Comparative Fit Index (CFI). Following published recommendations51, model fit was considered acceptable for CFI values ≥ 0.90 and SRMR values ≤ 0.10. A three-factor solution demonstrated superior model fit to a one- or two-factor solution and was selected for subsequent analysis.
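
The acceptance criteria can be applied directly to the fit indices reported in the Results. The sketch below assumes cut-offs of CFI ≥ 0.90 and SRMR ≤ 0.10; the SRMR cut-off is read here as the conventional value of 0.10, which is an assumption, and the function name is illustrative.

```python
CFI_MIN = 0.90    # minimum acceptable Comparative Fit Index
SRMR_MAX = 0.10   # maximum acceptable SRMR (assumed conventional cut-off)

def fit_is_acceptable(cfi: float, srmr: float) -> bool:
    """True if both fit indices meet the acceptance criteria."""
    return cfi >= CFI_MIN and srmr <= SRMR_MAX

# Fit indices reported in the Results for two of the candidate models
reported_fit = {
    "one-factor": {"cfi": 0.71, "srmr": 0.12},
    "three-factor": {"cfi": 0.96, "srmr": 0.07},
}

verdicts = {name: fit_is_acceptable(**fit) for name, fit in reported_fit.items()}
print(verdicts)
```

Under these cut-offs, only the three-factor model meets both criteria, consistent with its selection for subsequent analysis.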

Multivariate GWAS in genomic SEM

Following identification of the confirmatory factor model that best explained the genetic covariance structure among the UKB cognitive phenotypes, Genomic SEM was used to estimate the individual SNP associations with each latent factor in the model. As the cross-trait intercepts estimated by multivariable LDSC account for sample overlap, SNP association estimates derived using Genomic SEM are robust to varying and unknown degrees of sample overlap across the contributing univariate GWAS46,51,52. The multivariate GWAS was conducted using summary statistics for the univariate GWAS for each cognitive phenotype. Prior to conducting the multivariate GWAS, effect alleles were aligned across univariate GWAS and beta coefficients were standardized. Summary statistics for input into the multivariate GWAS were restricted to SNPs that were present for all 12 cognitive phenotypes and present in the 1000 Genomes Project Phase 3 release European reference panel53. After filtering, 8,041,728 SNPs remained for inclusion in the multivariate GWAS. The method for calculating the effective sample size for each latent factor is described in the Supplementary Note.
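The harmonisation described above (aligning effect alleles and restricting to SNPs present in every input) can be sketched as follows. This is a simplified illustration with hypothetical SNP identifiers and effect sizes. It ignores strand flips and beta standardization, both of which a real pipeline would need to handle.

```python
def align_beta(record, ref_effect, ref_other):
    """Return beta expressed relative to the reference effect allele.

    record is (effect_allele, other_allele, beta). Returns None when the
    allele pair does not match the reference in either orientation.
    """
    effect, other, beta = record
    if (effect, other) == (ref_effect, ref_other):
        return beta
    if (effect, other) == (ref_other, ref_effect):
        return -beta  # alleles are swapped, so the effect changes sign
    return None


def harmonise(gwas_list, reference):
    """Keep SNPs present in every GWAS and in the reference panel."""
    shared = set(reference)
    for gwas in gwas_list:
        shared &= set(gwas)
    aligned = {}
    for snp in sorted(shared):
        betas = [align_beta(g[snp], *reference[snp]) for g in gwas_list]
        if all(b is not None for b in betas):
            aligned[snp] = betas
    return aligned


# Hypothetical inputs: SNP -> (effect allele, other allele, beta).
gwas_1 = {"rs1": ("A", "G", 0.02), "rs2": ("C", "T", -0.01), "rs3": ("A", "C", 0.03)}
gwas_2 = {"rs1": ("G", "A", 0.04), "rs2": ("C", "T", 0.01)}
reference = {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("A", "C")}

aligned = harmonise([gwas_1, gwas_2], reference)
print(aligned)
```

Here `rs3` is dropped because it is missing from the second GWAS, and the sign of the `rs1` effect in the second GWAS is reversed because its effect allele is the reference's other allele.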

Estimation of genetic overlap between cognitive factors and schizophrenia

We explored the genetic overlap between the three latent cognitive factors and schizophrenia using summary statistics from the multivariate GWAS of each factor and from participants of European ancestry in the latest PGC schizophrenia GWAS (Ncase = 53,386, Ncontrol = 77,258)54.

First, global genetic correlations between the latent cognitive factors and schizophrenia were estimated using LDSC25,26. Bivariate MiXeR was used to estimate the number of phenotype-specific and shared causal variants between each cognitive factor and schizophrenia55. A bivariate Gaussian mixture model with four components was constructed using summary statistics for each cognitive factor and schizophrenia. The four components of the model represent (1) SNPs with a null effect for both phenotypes, (2 and 3) SNPs with a non-null effect for either the first or second phenotype, and (4) SNPs with a non-null effect for both phenotypes. Model fit was evaluated by the AIC.
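The four-component structure can be illustrated with a small generative simulation. This sketch only draws SNP effects from a mixture of the kind described. It does not fit the model to summary statistics, and it is not the MiXeR implementation. All mixture proportions, variances, and the within-component correlation are hypothetical.

```python
import math
import random


def simulate_bivariate_effects(n_snps, pi_first, pi_second, pi_both,
                               sd_first, sd_second, rho, seed=0):
    """Draw (component, beta_1, beta_2) for each SNP from a 4-part mixture."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_snps):
        u = rng.random()
        if u < pi_both:
            # Component 4: non-null for both phenotypes, correlated effects.
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            b1 = sd_first * z1
            b2 = sd_second * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
            draws.append(("both", b1, b2))
        elif u < pi_both + pi_first:
            # Component 2: non-null for the first phenotype only.
            draws.append(("first_only", rng.gauss(0.0, sd_first), 0.0))
        elif u < pi_both + pi_first + pi_second:
            # Component 3: non-null for the second phenotype only.
            draws.append(("second_only", 0.0, rng.gauss(0.0, sd_second)))
        else:
            # Component 1: null for both phenotypes.
            draws.append(("null", 0.0, 0.0))
    return draws


draws = simulate_bivariate_effects(
    10_000, pi_first=0.02, pi_second=0.03, pi_both=0.01,
    sd_first=0.05, sd_second=0.04, rho=-0.3,
)
counts = {}
for component, _, _ in draws:
    counts[component] = counts.get(component, 0) + 1
print(counts)
```

In the fitted model, the estimated sizes of the "first_only", "second_only", and "both" components correspond to the phenotype-specific and shared causal variants described above.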

Next, local genetic correlations between the latent cognitive factors and schizophrenia were estimated using LAVA27. For each phenotype, LAVA was used to estimate the genetic variance across 2495 semi-independent genetic loci of approximately equal size (~ 1 Mb) defined by Werme et al.27. Loci with significant local SNP-based heritability (α = 0.05/2495 loci; p < 2 × 10⁻⁵) for each phenotype were included in the bivariate analysis. LAVA estimates local genetic correlations for each phenotype pair by constructing a matrix of local genetic covariance for each locus using the method of moments.
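The locus selection rule can be written out as a short sketch. It computes the Bonferroni threshold for 2495 loci and keeps only loci whose local heritability is significant for both phenotypes in a pair, which is how the inclusion criterion is read here. The locus identifiers and p-values are hypothetical.

```python
N_LOCI = 2495
ALPHA = 0.05 / N_LOCI  # Bonferroni threshold, approximately 2 x 10^-5


def loci_for_bivariate_test(h2_p_first, h2_p_second, alpha=ALPHA):
    """Return loci with significant local heritability for both phenotypes."""
    shared = h2_p_first.keys() & h2_p_second.keys()
    return sorted(
        locus for locus in shared
        if h2_p_first[locus] < alpha and h2_p_second[locus] < alpha
    )


# Hypothetical local heritability p-values per locus.
p_cognitive_factor = {"locus_1": 1e-8, "locus_2": 3e-5, "locus_3": 1e-6}
p_schizophrenia = {"locus_1": 5e-7, "locus_2": 1e-9, "locus_3": 0.2}

selected = loci_for_bivariate_test(p_cognitive_factor, p_schizophrenia)
print(f"alpha = {ALPHA:.3e}, selected = {selected}")
```

Only `locus_1` passes in this example: `locus_2` narrowly misses the threshold for the cognitive factor and `locus_3` shows no heritability signal for schizophrenia.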

Lastly, to complement the estimates of genome-wide genetic overlap provided by bivariate MiXeR, we applied the conjFDR method, an extension of the condFDR approach28,56, which enables the identification of specific loci that are shared between each latent cognitive factor-schizophrenia pair. For each latent cognitive factor-schizophrenia pair, we used SNP associations with the latent cognitive factor to re-rank the test statistics and recalculate the significance of the SNP associations with schizophrenia. We then reversed the phenotypes and recalculated the strength of each SNP association with the cognitive factor conditional on the SNP association with schizophrenia. Next, conjFDR analysis was used to estimate the likelihood, represented as a conjFDR value, that a SNP has a non-null association with both phenotypes in a phenotype pair. A conjFDR value < 0.05 was considered significant.
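The final step of this procedure can be sketched briefly. In the conjFDR framework, the conjunctional value for a SNP is the larger of its two condFDR values, one for each conditioning direction. The sketch assumes the condFDR values have already been computed. The SNP identifiers and values are hypothetical.

```python
CONJFDR_THRESHOLD = 0.05  # significance threshold stated in the text


def conj_fdr(cond_fdr_scz_given_cog, cond_fdr_cog_given_scz):
    """conjFDR is the maximum of the two conditional FDR values."""
    return max(cond_fdr_scz_given_cog, cond_fdr_cog_given_scz)


# Hypothetical condFDR values: SNP -> (SCZ | cognition, cognition | SCZ).
cond_fdr = {
    "rs_a": (0.010, 0.030),  # supported in both directions
    "rs_b": (0.001, 0.400),  # strong for schizophrenia only
    "rs_c": (0.200, 0.020),  # strong for the cognitive factor only
}

conj = {snp: conj_fdr(*values) for snp, values in cond_fdr.items()}
shared_snps = sorted(snp for snp, value in conj.items() if value < CONJFDR_THRESHOLD)
print(conj, shared_snps)
```

Taking the maximum makes the statistic conservative: a SNP is declared shared only if it passes the threshold in both conditioning directions.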

Functional annotation

We used standard Functional Mapping and Annotation of Genome-wide Association Studies (FUMA) definitions to define genomic loci, lead SNPs, independent significant SNPs and candidate SNPs by clumping the conjFDR output for each latent cognitive factor-schizophrenia pair at an FDR of < 0.05. Next, we used Bedtools57 (v2.27.1) with default parameters to identify significant loci that were unique to a latent cognitive factor-schizophrenia pair. We mapped genes to the unique significant loci using data provided by Ensembl30 based on the GRCh38 reference genome. We used the GENE2FUNC function in FUMA to test for enrichment of the identified genes for each latent cognitive factor-schizophrenia pair in gene sets obtained from MSigDB58 (v7.0). The Benjamini–Hochberg correction for multiple testing was applied per category of gene sets.
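For reference, the Benjamini–Hochberg adjustment applied within each gene-set category can be sketched as below. This is a generic implementation of the procedure, not FUMA's code. The category names and enrichment p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical enrichment p-values, grouped by gene-set category.
enrichment_p = {
    "category_1": [0.01, 0.04, 0.03, 0.20],
    "category_2": [0.001, 0.50],
}

# The correction is applied separately within each category.
adjusted_by_category = {
    category: benjamini_hochberg(pvals)
    for category, pvals in enrichment_p.items()
}
print(adjusted_by_category)
```

Correcting per category means the number of tests `m` is the number of gene sets in that category rather than the total across all categories.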

Polygenic prediction of schizophrenia symptom dimensions

For PGS analyses, the target dataset comprised 306 individuals with schizophrenia and 1060 controls of European ancestry from the TOP Study23. For individuals with schizophrenia, symptoms were measured using the positive, negative, and general psychopathology subscales of the Positive and Negative Syndrome Scale (PANSS)31. Mean scores for the PANSS in the TOP sample were 14.37 (SD = 5.29) for the positive scale, 15.30 (SD = 6.26) for the negative scale, and 12.95 (SD = 4.33) for the general psychopathology scale. The recruitment and genotyping procedures for the TOP study are described in the Supplementary Note. A PGS for each of the latent cognitive factors was calculated from the effect size estimates from the multivariate GWAS summary statistics using PRS-CS-auto59. PRS-CS-auto is a Bayesian polygenic prediction method that estimates posterior effect sizes of SNPs by placing a continuous shrinkage prior on SNP effect sizes and incorporating information from an external LD reference panel. PRS-CS-auto automatically estimates the global shrinkage prior from the discovery dataset and does not require a validation dataset59. In the present study, the 1000 Genomes Phase 3 release European sample53 was used as the LD reference panel. SNPs with a minor allele frequency < 0.01 were excluded from the analysis, which left 964,446 SNPs for calculation of the PGS. We used regression models to examine the relationship between PGS for each latent cognitive factor and schizophrenia, as well as the three dimensions of schizophrenia symptoms. Specifically, we estimated a logistic regression model with a schizophrenia diagnosis as the outcome, and separate linear regression models for each of the symptom dimensions. Age, sex, and the first 20 genetic principal components were included as covariates in the model. 
Phenotypic variance explained by the PGS (Nagelkerke’s pseudo-R² for schizophrenia diagnosis and R² for PANSS subscale scores) was estimated as the difference between the R² of the full regression model (PGS and covariates) and the R² of the null model (covariates only). A Bonferroni correction was applied to account for 12 tests (3 polygenic scores × 4 outcomes; α = 0.05/12; p < 4.17 × 10⁻³).
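The incremental variance calculation can be sketched as follows for the diagnosis outcome. The sketch uses the standard definition of Nagelkerke's pseudo-R² from model log-likelihoods. The case and control counts are taken from the text, but the log-likelihoods of the covariate-only and full models are hypothetical placeholders, not results from the study.

```python
import math


def nagelkerke_r2(loglik_model, loglik_intercept_only, n):
    """Nagelkerke's pseudo-R2 from log-likelihoods and sample size."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_intercept_only - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_intercept_only / n)
    return cox_snell / max_cox_snell


def incremental_r2(r2_full, r2_null):
    """Variance attributed to the PGS: full model minus covariate-only model."""
    return r2_full - r2_null


n_cases, n_controls = 306, 1060  # target sample sizes from the text
n = n_cases + n_controls

# Maximized log-likelihood of an intercept-only logistic model.
ll_intercept = n_cases * math.log(n_cases / n) + n_controls * math.log(n_controls / n)

# Hypothetical log-likelihoods for the two fitted models.
ll_null = -700.0  # covariates only
ll_full = -690.0  # PGS plus covariates

r2_null = nagelkerke_r2(ll_null, ll_intercept, n)
r2_full = nagelkerke_r2(ll_full, ll_intercept, n)
delta_r2 = incremental_r2(r2_full, r2_null)

# Bonferroni threshold for 3 polygenic scores x 4 outcomes.
N_TESTS = 3 * 4
BONFERRONI_ALPHA = 0.05 / N_TESTS
print(f"delta R2 = {delta_r2:.4f}, alpha = {BONFERRONI_ALPHA:.2e}")
```

For the PANSS subscales, the same difference would be taken between the ordinary R² values of the full and covariate-only linear regression models.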

Ethical standards

This study was conducted in accordance with the principles outlined in the Declaration of Helsinki. This work was approved by the University of Cape Town Human Research Ethics Committee (reference number—734/2021). The UKB has ethical approval (REC reference number—11/NW/0382) and is overseen by an Independent Ethics and Governance council. The TOP Study was approved by the Norwegian Scientific Ethical Committee and the Norwegian Data Protection Agency. Informed consent was obtained from participants in the UKB and TOP Study.