The Netherlands Twin Register (NTR) was established around 1987 by the Department of Biological Psychology at the Vrije Universiteit Amsterdam and recruits approximately 40% of new-born twins or higher-order multiples in the Netherlands for longitudinal research. Parents of twins receive a survey about the development of their children every 2 to 3 years until the twins are 12 years old. Starting at age 7, parents are asked for consent to also approach the primary school teacher(s) of their twins and other children. The survey sent to mothers, fathers and teachers includes the age- and context-appropriate version of the Achenbach System of Empirically Based Assessment (ASEBA) (Achenbach et al. 2017). Adult twins were registered with the NTR through several approaches, including recruitment through city council offices in the Netherlands and advertising in NTR newsletters and on the internet. Parents, siblings, spouses and offspring of adult twins are also invited to take part. Since 1991, participants have received a survey every 2 to 3 years with questions on, amongst others, health, personality, and lifestyle. The NTR has also been collecting genotype data in both children and adults in several large projects. More details concerning the NTR’s data collection, the methods of recruitment, participants’ background and response rates are described elsewhere (Ligthart et al. 2019).
For 5900 offspring (from 2649 families), genotype data were available for the offspring themselves and for both of their parents. Genotyping in the NTR has been carried out in subsamples that are, in general, unselected for phenotype. Data were excluded if an individual was of non-European ancestry (n = 472). In this group of families, information on EA was available in people over 25 years of age for 1931 adult offspring (662 males and 1260 females) from birth cohorts 1946–1991. Childhood academic achievement was assessed around age 12 and was available for 1120 offspring (509 boys and 611 girls) from birth cohorts 1983–2002. ADHD symptoms were assessed at ages 10 and 12. If available, age 12 data were analyzed; otherwise age 10 data were used. Data on ADHD symptoms at home were available for 2518 children (1202 boys and 1316 girls) from birth cohorts 1986–2008. Data on ADHD symptoms at school were available for 1969 children (968 boys and 1001 girls) from birth cohorts 1986–2011.
EA in adults was measured by means of a self-report on highest obtained degree. The responses were recoded into four categories: primary education (level 0), lower secondary education (level 1), higher secondary education (level 2) and tertiary education (level 3).
Academic achievement in children was assessed by a nationwide standardized educational achievement test (Cito 2002). The results on this test are, in combination with teacher advice, used to determine the most suitable level of secondary education. Around 75% of Dutch children took this test in their final year of primary school as administration of the test was not compulsory. The test consisted of multiple choice items in four domains, namely Arithmetic, Language, Study Skills and Science and Social Studies. The first three test scales were combined into a Total Score, which was converted into a score ranging from 500 to 550, which reflects the child's standing relative to the total group of children who took the test in a given school year (van Boxtel et al. 2010).
ADHD symptoms were assessed with the empirically based Attention Problems syndrome scale of the ASEBA system (Achenbach et al. 2017). The Child Behavior Check List (CBCL) for school-aged children (6–18 years) was used to assess behavior at home, and the Teacher Report Form (TRF) to assess behavior at school. The ASEBA Attention Problems scale includes items (CBCL: 10 items; TRF: 26 items) on inattention (e.g. ‘Fails to finish things he/she starts’) and hyperactivity/impulsivity (e.g. ‘Can’t sit still’). The items are scored on a 3-point scale from 0 (‘not true or never’) to 2 (‘completely true or very often’). If fewer than 20% of a child's items on the scale were missing, the missing items were imputed with that child's average item score on the scale. The data showed an L-shaped distribution and were square root transformed prior to analyses.
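The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `attention_problems_score` is hypothetical, the scale score is assumed to be the sum of the item scores, and children at or above the 20% missingness threshold are assumed to receive no score.

```python
import math

def attention_problems_score(items, max_missing_fraction=0.20):
    """Square-root transformed scale score with person-mean imputation.

    items: list of item scores (0, 1 or 2), with None for a missing item.
    Returns None when too many items are missing (assumed handling).
    """
    observed = [x for x in items if x is not None]
    n_missing = len(items) - len(observed)
    # Impute only when missingness on the scale is below 20%.
    if not observed or n_missing / len(items) >= max_missing_fraction:
        return None
    item_mean = sum(observed) / len(observed)
    # Each missing item is replaced by the child's own average item score.
    total = sum(observed) + n_missing * item_mean
    # Square root transform to reduce the skew of the L-shaped distribution.
    return math.sqrt(total)

# Hypothetical CBCL-style record: 10 items, one of them missing.
example = [1, 1, None, 2, 0, 1, 1, 1, 1, 1]
score = attention_problems_score(example)
```

In this example the nine observed items sum to 9, the imputed item adds 1, and the transformed score is the square root of 10.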
The genotype data used for this study included 17,620 unique DNA samples, genotyped on several different platforms: Affymetrix-Perlegen (n = 1117), Illumina 660 (n = 1323), Illumina Omni Express 1 M (n = 234), Affymetrix 6.0 (n = 7086), Affymetrix Axiom (n = 2665) and Illumina GSA (n = 5195). Genotype calls were made with the platform-specific software (Birdseed, APT-Genotyper, Beadstudio) following the manufacturers' protocols. For the Affymetrix-Perlegen and Illumina 660 platforms, the single nucleotide polymorphisms (SNPs) were lifted over to build 37 (HG19) of the human reference genome.
Per platform, a sample was removed if the call rate for this person was < 90%, the Plink 1.07 F heterozygosity value was < −0.10 or > 0.10, the gender of the person did not match the DNA of the person, the IBD status did not match the expected familial relations, or the sample had > 20 Mendelian errors. If a subject was genotyped on multiple platforms, only the platform with the highest number of SNPs was selected, provided the genotypes were concordant (> 0.97). Allele and strand alignment of SNPs was done against the Dutch Genome of the Netherlands (GONL) reference panel for each platform (Boomsma et al. 2014). SNPs were removed in each platform when the Minor Allele Frequency (MAF) was < 0.01, the Hardy–Weinberg Equilibrium (HWE) test p-value was < 10⁻⁵ or the call rate of the SNP was < 95%. Subsequently, SNPs were retained only if their allele frequency deviated by < 0.10 from the GONL data. All palindromic SNPs with a MAF > 0.40 were also removed. The individual platform data were then merged into a single dataset. In this dataset, the sample IBD was compared again with the expected familial relations, and samples were removed if these did not match. Because the number of completely overlapping SNPs within this combined set of platforms is too small (~ 70 K) for imputation against 1000G, the data were first phased and imputed with Mach-admix, using GONL as a reference panel. This was done for those SNPs that survived quality control and were present on at least one platform, forcing missing genotype imputation for all SNPs. Best guess genotypes were generated from these data, and SNPs were selected if they had an R² > 0.90, an HWE p > 10⁻⁵, a Mendelian error rate < 2%, and a p-value > 10⁻⁵ in an association test of one platform (coded as cases) versus the other platforms (coded as controls), applied for each platform. This resulted in a genetic backbone of 1.2 M SNPs.
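As an illustration of the per-platform SNP filters listed above, the following Python sketch applies the stated thresholds to precomputed summary statistics for a single SNP. It is not the pipeline used in the study (which relied on Plink and related tools); the `SnpStats` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SnpStats:
    """Hypothetical per-platform summary statistics for one SNP."""
    maf: float          # minor allele frequency on this platform
    hwe_p: float        # Hardy-Weinberg equilibrium test p-value
    call_rate: float    # fraction of samples with a called genotype
    freq: float         # frequency of the aligned allele on this platform
    ref_freq: float     # frequency of the same allele in the reference panel
    alleles: tuple      # the two alleles, e.g. ("A", "G")

def is_palindromic(alleles):
    """A/T and C/G SNPs read the same on both strands."""
    return set(alleles) in ({"A", "T"}, {"C", "G"})

def passes_platform_qc(s):
    """Apply the SNP-level thresholds given in the text."""
    if s.maf < 0.01:
        return False
    if s.hwe_p < 1e-5:
        return False
    if s.call_rate < 0.95:
        return False
    # Keep only SNPs whose frequency deviates by less than 0.10
    # from the reference panel.
    if abs(s.freq - s.ref_freq) >= 0.10:
        return False
    # Strand cannot be resolved reliably for palindromic SNPs near MAF 0.5.
    if is_palindromic(s.alleles) and s.maf > 0.40:
        return False
    return True

# Hypothetical SNP that satisfies every filter.
kept = passes_platform_qc(SnpStats(0.20, 0.5, 0.99, 0.20, 0.22, ("A", "G")))
```

Sample-level filters (call rate, heterozygosity, sex and IBD checks, Mendelian errors) would be applied separately and are not covered by this sketch.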
After this step, 3017 DNA-confirmed monozygotic twin samples were returned to the dataset by duplicating the SNP data of their co-twin. Another 364 samples were added: 335 of the original 349 NTR samples that were also sequenced in the GONL reference population, plus 29 of their confirmed monozygotic co-twins. The result was a final dataset of 21,001 individuals from 6671 families with 1.2 M SNPs. The cross-chip imputed data were used to calculate genetic principal components (PCs) using SmartPCA software, and the PCs were subsequently used to determine whether a person was of non-European descent (Galinsky et al. 2016). The full set with 1.2 M SNPs was then aligned against the 1000 genomes phase 3 version 5 reference panel, and imputed on the Michigan imputation server (Das et al. 2016). From the imputed 1000G VCF files, best guess genotypes were calculated for all markers using Plink 1.96.
In total, there were 2649 families with two genotyped parents and 5900 offspring, including 1245 MZ twin pairs, for which allele transmission could be calculated on the 1000G imputed data. Before this calculation, the genotype data were filtered using the following criteria: only ACGT SNPs on the autosomes, no SNPs with duplicate positions, no SNPs with 3 or more alleles, MAF > 0.01, HWE p > 10⁻⁵ and genotype call rate > 0.99, leaving 7,411,699 SNPs. For the 5900 offspring, this is the transmitted alleles dataset. Subsequently, all children were defined as being a case, and the Plink --tucc option was used to generate a single TDT pseudo-control genotype for each child (given the 2 parents), resulting in the non-transmitted alleles dataset. Both datasets were then used to calculate PGSs.
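The idea behind the pseudo-control genotype can be illustrated per SNP. With genotypes coded as counts (0, 1 or 2) of one allele, the two parents carry four alleles in total, the child received two of them, and the remaining two form the non-transmitted genotype. The Python sketch below shows this arithmetic; it is not Plink's implementation, the function name is hypothetical, and returning None for missing or Mendelian-inconsistent trios is an assumption.

```python
def non_transmitted_count(father, mother, child):
    """Count of the coded allele among the two non-transmitted parental alleles.

    Genotypes are allele counts in {0, 1, 2}; None marks a missing call.
    Returns None for missing data or a Mendelian-inconsistent trio.
    """
    if father is None or mother is None or child is None:
        return None

    def gametes(genotype):
        # Number of coded alleles a parent can pass on (0 or 1).
        if genotype == 0:
            return {0}
        if genotype == 2:
            return {1}
        return {0, 1}

    possible_children = {a + b for a in gametes(father) for b in gametes(mother)}
    if child not in possible_children:
        return None
    # Parental alleles minus transmitted alleles.
    return father + mother - child

# Both parents heterozygous, child carries no copies: both copies untransmitted.
example = non_transmitted_count(1, 1, 0)
```

Because only counts are used, no phasing is needed: the non-transmitted genotype is determined even when all three family members are heterozygous.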
For the EA PGS calculation we used the GWA summary statistics from the EA meta-analysis (Lee et al. 2018) and for ADHD we used the statistics from the meta-analysis for ADHD (Demontis et al. 2019), both excluding the NTR and 23andMe cohorts. After excluding these cohorts, the meta-analysis was redone for EA and ADHD symptoms. Since the NTR was present in the quantitative EAGLE summary statistics, which were combined with the Psychiatric Genetics Consortium (PGC) case–control ADHD summary statistics, we also re-applied the correction method to join case–control and quantitative summary statistics (Demontis et al. 2019).
Based on these sets of summary statistics, linkage disequilibrium (LD) weighted betas were calculated using the LDpred package with different cut-offs for the fraction of SNPs with a causal effect (Vilhjálmsson et al. 2015) and an LD pruning window of 250 kb (see Fig. 1). The reference population used to calculate the LD patterns consisted of the first 2500 individuals who were unrelated at the 2nd degree, selected from the 5900 1000G-imputed NTR individuals used for scoring. Unrelated individuals were identified with the King software (Manichaikul et al. 2010). The resulting LD-corrected betas were used to calculate polygenic scores with the Plink 1.90 software, in the transmitted and non-transmitted alleles datasets.
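Conceptually, a polygenic score is a weighted sum of allele counts, with the LD-corrected betas as weights. The following Python sketch shows that sum for one individual; it is a simplified illustration rather than a reproduction of Plink's scoring (which, for example, has its own handling of missing genotypes and averaging). The SNP identifiers and weights are hypothetical.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts.

    allele_counts: dict mapping SNP id -> count (0, 1, 2) or None if missing.
    weights: dict mapping SNP id -> LD-corrected beta for the counted allele.
    Returns (score, number of SNPs used); missing genotypes are skipped
    (an assumption made for this sketch).
    """
    score = 0.0
    n_used = 0
    for snp, beta in weights.items():
        count = allele_counts.get(snp)
        if count is None:
            continue
        score += beta * count
        n_used += 1
    return score, n_used

# Hypothetical weights and two genotype sets for the same child.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}
transmitted = {"snp_a": 2, "snp_b": 1, "snp_c": None}
non_transmitted = {"snp_a": 0, "snp_b": 1, "snp_c": 2}

pgs_t, used_t = polygenic_score(transmitted, weights)
pgs_nt, used_nt = polygenic_score(non_transmitted, weights)
```

Applying the same weights to the transmitted and the non-transmitted genotypes gives two scores per child that are directly comparable.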
The current study included one offspring outcome in adulthood, i.e. EA, and two in childhood, i.e. academic achievement and ADHD symptoms. In adulthood, EA was regressed on the transmitted and non-transmitted EA PGS to test for replication of previous findings (Bates et al. 2018; Kong et al. 2018). In childhood, academic achievement, ADHD symptoms at home and ADHD symptoms at school were regressed on the transmitted and non-transmitted EA PGSs (model 1), on the transmitted and non-transmitted ADHD PGSs (model 2) and on the transmitted and non-transmitted EA plus ADHD PGSs (model 3). All outcome measures were residualized for the effects of sex, year of birth (only for EA), the interaction between sex and year of birth (only for EA), 10 principal components reflecting Dutch ancestry differences, and the genotyping platform. Within each analysis, the predictors and residualized outcome measures were standardized in the subset of individuals that had both PGS and phenotype data. A random intercept was added to correct for dependency of the observations due to family clustering. Generalized linear models were fitted in the statistical program SPSS Statistics for Windows 25.0 (IBM Corp. 2017) with maximum likelihood estimation. The type of model depended on the measurement level of the outcome: EA (ordinal logistic), academic achievement (linear) and ADHD symptoms (linear). To correct for multiple testing an alpha level of 0.01 was adopted.
The sample included twin pairs and their siblings, which meant that observations were not independent. To facilitate the power analysis, we used the effective sample size, i.e. NE = (N*M)/(1 + ICC*(M−1)) in which N = the number of families, M = the number of individuals in a family and ICC = the (average) phenotypic correlation within a family. We applied this separately for MZ and DZ (and siblings) families, given the expected differences in ICC. The power to detect a particular effect size (i.e. percentage of phenotypic variance explained) of the non-transmitted PGS was based on the non-central F-distribution. Power equals the percentage of significant tests of the regression coefficient given an alpha level of 0.01.
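The effective sample size formula can be made concrete with a small Python sketch. The family counts and ICC values below are hypothetical placeholders, not the values used in the study, and summing the MZ and DZ contributions is an assumption about how the two groups were combined. The power calculation itself (based on the non-central F-distribution) is not reproduced here.

```python
def effective_sample_size(n_families, family_size, icc):
    """NE = (N * M) / (1 + ICC * (M - 1)).

    n_families: number of families (N)
    family_size: number of individuals per family (M)
    icc: average phenotypic correlation within a family
    """
    return (n_families * family_size) / (1 + icc * (family_size - 1))

# Hypothetical inputs: pairs of offspring, with a higher within-family
# correlation in MZ families than in DZ/sibling families.
ne_mz = effective_sample_size(n_families=100, family_size=2, icc=0.5)
ne_dz = effective_sample_size(n_families=100, family_size=2, icc=0.25)

# Assumed combination rule: the group-specific values are added.
ne_total = ne_mz + ne_dz
```

With an ICC of 0 every individual counts fully (NE = N*M), and with an ICC of 1 each family contributes the information of a single individual (NE = N), which is why the more strongly correlated MZ families add less than DZ families of the same size.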