Exome sequencing reveals predominantly de novo variants in disorders with intellectual disability (ID) in the founder population of Finland

The genetics of autosomal recessive intellectual disability (ARID) has mainly been studied in consanguineous families; however, founder populations may also be of interest for studying intellectual disability (ID) and the contribution of ARID. Here, we used a genotype-driven approach to study the genetic landscape of ID in the founder population of Finland. A total of 39 families with syndromic and non-syndromic ID were analyzed using exome sequencing, which revealed a variant in a known ID gene in 27 families. Notably, 75% of these variants in known ID genes were de novo or suspected de novo (64% autosomal dominant; 11% X-linked) and 25% were inherited (14% autosomal recessive; 7% X-linked; and 4% autosomal dominant). A dual molecular diagnosis was suggested in two families (5%). Via additional analysis and molecular testing, we identified three cases with an abnormal molecular karyotype, including chr21q22.12q22.2 uniparental disomy with a mosaic interstitial 2.7 Mb deletion covering DYRK1A and KCNJ6. Overall, a pathogenic or likely pathogenic variant was identified in 64% (25/39) of the families. Last, we report an alternate inheritance model for three known ID genes (UBA7, DDX47, DHX58) and discuss potential candidate genes for ID, including SYPL1 and ERGIC3 with homozygous founder variants and de novo variants in POLR2F and DNAH3. In summary, similar to other European populations, de novo variants were the most common variants underlying ID in the studied Finnish population; the contribution of ARID to ID etiology was limited and mainly driven by founder and potential founder variation. Supplementary Information: The online version contains supplementary material available at 10.1007/s00439-021-02268-1.


Introduction
It is estimated that variants that affect the functions of more than 2500 genes can give rise to ID, and roughly half of these genes remain unknown. Identifying the genetic etiology of ID has been complicated by extreme genetic heterogeneity. In most studies from mixed populations, de novo variants have been reported to be the most common cause of ID (Rauch et al. 2012; de Ligt et al. 2012), whereas X-chromosomal ID (XLID) contributes 10-12% of cases (de Brouwer et al. 2007). Most evidence for ARID genes has been obtained from populations where consanguineous marriages are common (Monies et al. 2017; Martin et al. 2018), whereas data about genetic variants underlying ARID are rare in outbred populations (Martin et al. 2018).
Founder populations can serve as a middle ground between mixed and consanguineous populations in the identification of ARID genes, as the enrichment of a disease allele in these populations is strongly affected by genetic drift and founder effects. The Finnish population represents a founder population in which nearly 40 rare autosomal recessive (AR) diseases with one founder variant have been enriched (Peltonen et al. 1999). To further dissect the landscape of the genetic causes underlying ID in a founder population, a genomic sequence-based approach, exome sequencing (ES), was used.

Methods
A total of 39 families with mild to profound ID, and both non-syndromic and syndromic forms, were enrolled in the study. Of these, 27 were trios and 12 had one parent and/or one sibling available for the analysis. Affected individuals were clinically evaluated by a child neurologist and a clinical geneticist for the study. Photographs display syndromic features of affected individuals, and in relevant cases MRI was also obtained (Figures S1 & S2). The parents or legal guardians of all patients and their healthy siblings in this study provided written informed consent to participate and to publish photos of the patients. The study was approved by the ethics committees of the Hospital District of Helsinki and Uusimaa and the Institutional Review Boards of Columbia University (IRB-AAAS3433) and Baylor College of Medicine (protocol H-29697).

Exome sequencing (ES) and bioinformatic analysis
Exomic libraries were prepared using either the SureSelect Human All Exon V6 kit (Agilent Technologies, Santa Clara, CA, USA), the TruSeq DNA exome kit (Illumina Inc, San Diego, CA, USA) or the Baylor College of Medicine Human Genome Sequencing Center VCRome 2.1 design (42 Mb Nimblegen, Cat. No. 06266380001). Paired-end sequencing (100 bp) was performed on a HiSeq2500/4000/2000 instrument (Illumina Inc, San Diego, CA, USA). Details on the bioinformatic analyses of single nucleotide variants (SNVs), small insertions/deletions (InDels) and copy number variants (CNVs), and on variant filtering, can be found in the Supplemental Methods. In short, rare variants following several inheritance models (e.g. AR, autosomal dominant (AD), X-linked) with a predicted effect on protein function or pre-mRNA splicing were retained. Known and candidate genes for ID were prioritized (sysID database; https://sysid.cmbi.umcn.nl/), and if no known or candidate genes were found, variants were assessed further using additional annotations such as gene expression and literature. Sanger sequencing was performed using an ABI3130XL Genetic Analyzer to verify candidate SNV and InDel variants and to examine segregation amongst the family members who were not exome sequenced. Variants were classified according to the American College of Medical Genetics and Genomics (ACMG) recommendations (Richards et al. 2015).
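To illustrate how a trio genotype pattern maps onto the inheritance models named above, a minimal Python sketch follows. It is not the pipeline used in the study (that is described in the Supplemental Methods); the function name `inheritance_model`, the 0/1/2 genotype encoding, and the simplified rules (for example, ignoring compound heterozygosity and treating X-linked variants in females like autosomal ones) are assumptions made purely for illustration.

```python
from typing import Optional


def inheritance_model(child: int, mother: int, father: int,
                      chrom: str = "autosome",
                      child_sex: str = "F") -> Optional[str]:
    """Classify one variant in a trio (hypothetical helper, illustration only).

    Genotypes are alternate-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
    Returns a label, or None if the pattern fits none of the simple models.
    """
    if child == 0:
        return None  # the child does not carry the variant
    if chrom == "X" and child_sex == "M":
        # A male has a single X chromosome, which comes from his mother,
        # so only the maternal genotype is informative here.
        return "X-linked de novo" if mother == 0 else "X-linked inherited"
    if mother == 0 and father == 0:
        return "de novo"
    if child == 2 and mother == 1 and father == 1:
        return "autosomal recessive (homozygous)"
    if child == 1 and (mother >= 1 or father >= 1):
        return "inherited heterozygous"
    return None


if __name__ == "__main__":
    # Hemizygous son of a heterozygous carrier mother.
    print(inheritance_model(1, 1, 0, chrom="X", child_sex="M"))
    # Heterozygous child of two non-carrier parents.
    print(inheritance_model(1, 0, 0))
```

In practice, such a label would be only one filter among several (allele frequency, predicted effect, gene prioritization), and putative de novo calls would still need confirmation, as was done here by Sanger sequencing.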

Molecular karyotyping
Molecular karyotyping was performed in FIN10. In short, microarray analysis was performed on DNA extracted from a lymphoblastoid cell line using the HumanCytoSNP-12 (v2.1) array (Illumina, San Diego, CA). SNP genotype analysis of samples from FIN10 and her parents was done to evaluate the origin of the deletion and the uniparental disomies. FISH analysis was done on both uncultured (n = 200 interphases) and cultured (n = 300 interphases, 25 metaphases) peripheral blood lymphocytes using a probe mix detecting DNA sequences from the DYRK1A gene region and from the 21q21.1 control region. Additional details are available in the Supplementary information.

Runs of homozygosity analysis
Runs of homozygosity and inbreeding were assessed using PLINK (v1.90) (Chang et al. 2015) in the probands of the Finnish families and in an in-house collection of unrelated samples of European (N = 15; outbred) and South Asian ethnicity (N = 133; inbred) that were exome sequenced with the SureSelect Human All Exon V6 kit. In short, InDels were removed and only SNVs with a 90% genotyping rate, a Hardy-Weinberg equilibrium p value > 0.00001 and MAF > 0.01 were retained. Runs of homozygosity of 1 Mb or larger were assessed across the genome using a sliding window (5 Mb; 50 SNVs). Inbreeding coefficients were calculated for each sample using three different methods (Fhat1-3) after additional filtering (MAF > 0.05) and linkage disequilibrium pruning (window size: 100; step size: 10; r² > 0.5) (Chang et al. 2015). A Kruskal-Wallis test was used to compare more than two groups. Post-hoc pairwise comparisons were done with the Wilcoxon rank sum test with adjustment for multiple testing (false discovery rate). A t test or Mann-Whitney U test was used to compare two groups.
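The text states only that pairwise p values were adjusted by "false discovery rate". As an illustration of what such an adjustment does, the sketch below implements the widely used Benjamini-Hochberg procedure in plain Python. The choice of Benjamini-Hochberg is an assumption rather than something the methods specify, and the function name `benjamini_hochberg` and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Assumption: this is one common way to control the false discovery rate;
    the source does not name the exact procedure that was used.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values stay monotone (never larger than the one ranked above them).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Three hypothetical pairwise comparisons (e.g. between three groups).
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

With three groups there are three pairwise comparisons, so each raw p value is scaled by 3 divided by its rank before the monotonicity step.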

Results
Detailed phenotypic and clinical characteristics of all patients are provided in the Supplementary information. Following the analysis of the ES data, we identified a total of seven previously reported pathogenic (P) variants, 11 novel pathogenic or likely pathogenic (LP) variants, and four variants of unknown significance (VUS) in known genes in 19 families with neurodevelopmental disorders (Tables 1 and 2; Suppl information; Figures S1A & S1B). Additionally, six novel variants associated with a phenotypic expansion beyond that characterizing the known disease gene and three variants with an alternate inheritance model were identified (Tables 1 and 2; Figures S1C & S1D). Nine novel candidate genes for ID were also found (Tables 1, 2 and 3; Table S1; Figure S1E). For three cases where standard ES analysis did not reveal a putatively causal result, CNV analysis of ES data and/or molecular karyotyping revealed a rearrangement, including a 1.25 Mb deletion and a mosaic UPD21q22.12-22 with a 2.7 Mb deletion.

Novel variants in known genes
We identified 11 novel pathogenic or likely pathogenic variants and four variants of unknown significance (VUS) in known genes in 13 families with neurodevelopmental disorders with ID (Tables 1 and 2). In family FIN12, a novel in-frame hemizygous duplication [p.(Pro187dup)] in exon 2 of ARX (X-linked) was inherited from the healthy heterozygous mother. The phenotype of the index patient is in agreement with Partington syndrome, with mild ID, dystonic hand movements, and epileptic fits (OMIM # 309510). His brother, who has Down syndrome, carries the same ARX variant but has no signs resembling Partington syndrome; however, it is unknown whether his trisomy 21 may mask or rescue defects in ARX. The variant is rare, with no hemizygotes in gnomAD (Table 2; Table S1). Currently, its significance remains unknown.
Study subject FIN14-3 had a novel likely pathogenic de novo non-frameshift deletion in CTBP1 [p.(Phe53del)]. The subject's phenotype is characterized by DD/ID, frontal bossing (Figure S1B), hypotonia, feeding difficulties, psychomotor and growth delay, and ataxic gait. Brain atrophy was already found at one year of age. Interestingly, ophthalmological findings differed from previous cases, as FIN14-3 has severe myopia and was operated on for cataract at 27 years of age. In the literature, there is only one CTBP1 variant [p.(Arg331Trp)], which was observed in four unrelated patients who shared features with FIN14-3 (OMIM # 602618). Interestingly, no evidence for tooth enamel defects was detected in FIN14-3. Both variants are located in the PLDLS domains of CTBP1 and are critically related to transcriptional repression.
A novel pathogenic de novo nonsense variant [p.(Lys620*)] in CHAMP1 was identified in the DNA sample obtained from FIN20-3 (Figure S1B), who presented with moderate to severe ID, strabismus, constipation, gastroesophageal reflux (GER) and frontal hypoplasia.
For study subject FIN33-3, a de novo likely pathogenic missense variant [p.(Arg1198Ser)] in RAI1 was identified. This gene underlies Smith-Magenis syndrome (Figure S1B). The phenotypes of FIN20-3 and FIN33-3 were both consistent with previous cases with variants in these genes (OMIM # 616579; OMIM # 182290). A novel likely pathogenic homozygous missense variant in P4HTM [p.(Pro413Leu)] was found in family FIN42, which has two affected sons (Figure S1B; Tables 1 and 2). The variant was not present in the homozygous state in three healthy siblings. Interestingly, P4HTM was recently established as a human disease gene that causes HIDEA syndrome (OMIM # 618493). The hallmarks of HIDEA syndrome are hypotonia, ID, sleeping problems, eye abnormalities, and obesity, which were present in both affected siblings.
The elderly male patient FIN49-1 has a unique heterozygous missense variant in SCN1A [p.(Met631Val)]. Intriguingly, the participant started to move in a crouched gait at 22 years of age, which is a feature recently reported as a characteristic of Dravet syndrome (OMIM # 182389). Given the severe phenotype, he may represent one of the …
The only inherited autosomal dominant variant was found in family FIN-ID8, where a 12 bp deletion variant in GRIN2A [p.(Ile151_Ala155delinsThr)] was inherited by the daughter from her father. Both had mild epilepsy during childhood that resolved, similar to what has previously been seen in GRIN2A cases (OMIM # 245570; FESD). In addition to the GRIN2A variant, the daughter has an unknown syndrome (data not shown) that was not solved in this study.
The phenotype of the proband with the de novo missense variant p.(Ile908Val) in SAMD9L is characterized by moderate ID, clumsiness, and delayed speech development. Variants in SAMD9L have been reported to cause ataxia-pancytopenia syndrome (AP) (OMIM # 159550). None of the characteristic features of AP were detected in our subject. Due to the extensive phenotypic variability associated with SAMD9L variants, more cases need to be identified to properly define the phenotype caused by SAMD9L variants.
A heterozygous pathogenic frameshift deletion [p.(Val392fs)] in the C-terminal region of MECP2 was detected in FIN28-3 (Figure S1C). Variants affecting the function of MECP2 typically underlie classical Rett syndrome. However, the phenotype of the young female here was mild compared to classical Rett or Rett-like syndrome and resembles the phenotype described by Huppke et al. (2006). Consequently, we tested X-inactivation and found no evidence of skewed X-inactivation (ratio: 62:38) based on a blood sample. As the variant is in the 3′ region and last exon of MECP2, it is predicted to escape nonsense-mediated decay; the milder phenotype might therefore be due to a partially functional protein that is still expressed. The p.(Val392fs) variant was classified as pathogenic (SCV001168944.1) in the ClinVar database; however, no phenotypic details were provided.
Last, a de novo missense variant p.(Glu264Gln) in MYT1 was identified in FIN35-3. Previously, a de novo subtelomeric deletion on chromosome 20 containing MYT1 and PCMTD2 has been reported (VCV000058980.1). Both of these genes affect myelination and neural differentiation. Interestingly, these genetic changes share some common phenotypic features, including ID, abnormal facial features, lack of speech and communication, and structural abnormalities of the fingers (Figure S1C). Surprisingly, variants in MYT1 have been identified in patients with oculo-auriculo-vertebral spectrum (OAVS) who have normal intelligence. FIN35-3 also carried a de novo variant p.(Gly382Val) in COL9A2, a gene implicated in AR Stickler syndrome and AD epiphyseal dysplasia. It is unclear whether this variant contributes to the skeletal phenotype observed in this patient.
For the female patient FIN-AIC3-3, we identified a homozygous splice region variant c.1177+9T>C in ZC3H14 (OMIM # 617125; MRT56) of unknown significance. Although this gene has been implicated in non-syndromic AR mental retardation 56, the disorder in our patient is a much more severe multi-system disorder that, based on ophthalmological and brain MRI findings, resembled Aicardi syndrome. Interestingly, loss of dNab2, the Drosophila ortholog of ZC3H14, leads to morphological defects, including those of the eye, severely compromised flight behavior, and poor locomotor activity (Pak et al. 2011). The severe phenotype in the fly more closely resembles the severe phenotype in our patient.

Alternate inheritance model
Three genes followed a different mode of inheritance than was previously reported for a similar disorder (Table 1). Two of them (DDX47 and DHX58) are members of the DDX/DHX family, which has recently been implicated in neurodevelopmental disorders (Paine et al. 2019). In contrast to previously reported AR variants in DDX47 and DHX58 (Figure S1D), both of our cases have de novo variants. The two affected males, from families FIN46 and FIN-ID9, both display a severe neurodevelopmental disorder first noted in the newborn period. FIN46-3, with a variant in DDX47, has profound ID, no speech and hypnic jerks, and has been non-ambulatory from seven years of age. FIN-ID9-4, with a DHX58 variant, had severe feeding difficulties, delayed growth, no speech and epilepsy. As a child, he was prone to infections and was extensively studied for metabolic diseases. The third case (FIN7-3) has a mosaic (de novo somatic or gonosomal) variant predicted to impact splicing of UBA7 (c.1904+3A>G). The phenotype was characterized by moderate ID without syndromic features (Suppl information). Previously, a homozygous variant p.(Glu397*) in UBA7 was reported in a Pakistani family with an AR inheritance pattern (Harripaul et al. 2018). It has been speculated that heterozygosity for this nonsense variant is a risk factor for milder cognitive disability (Harripaul et al. 2018), and it is present at higher frequencies in South Asian populations (gnomAD MAF = 0.0047).

Candidate variants in potential novel genes
We identified variants of interest in a total of nine novel candidate ID genes (Figure S1E; Tables 1, 2 and 3; Table S1). Of them, three were autosomal recessive homozygous variants which originated from a sub-isolate of North Eastern Finland where an increased size and frequency in runs of homozygosity were detected (Table 4).
In short, we identified a de novo variant in NTRK1, also called TRKA [mosaic mixture of p.(Tyr757*) and p.(Tyr757=)], in a child with moderate to severe ID, unclear speech, and Lennox epilepsy (FIN4). NTRK1 is important in the development of the central and peripheral nervous system (Bibel 2000) and is currently only associated with insensitivity to pain (Indo et al. 1996).
In FIN23-3, a candidate de novo variant [p.(Lys181Glu)] in inositol 1,4,5-trisphosphate receptor, type 2 (ITPR2) was identified in the sample from the patient, who has a phenotype resembling Gillespie syndrome, which is caused by both AD and AR variants in ITPR1 (OMIM # 147265). Previously, only one variant [p.(Gly2498Ser)] had been reported in ITPR2; it causes AR anhidrosis with normal sweat glands in one family (OMIM # 106190) (Klar et al. 2014). Thus, the phenotype of FIN23-3, with neurological and ophthalmological abnormalities and deafness, differs markedly from the aforementioned phenotype.
FIN27-3, a young male, has a severe syndromic ID (Figure S1E). He has a de novo candidate missense variant [p.(Phe434Leu)] in ZKSCAN1 (OMIM # 601260). This gene has been found to regulate the expression of GABA type-A receptors, the receptors for the major inhibitory neurotransmitter in the brain (Mulligan et al. 2012).
FIN32-3, a young female who displays slender habitus, mild ID and neuropsychiatric symptoms, has a de novo ZFR missense variant [p.(Asp889Glu)] (OMIM # 615635). ZFR has been implicated in axon guidance, neurogenesis, and mRNA transport in neurons (Kjaergaard et al. 2015). It has previously been suggested as a candidate gene for spastic paraplegia (Novarino et al. 2014) and is also a strong candidate gene for ID.
For patient FIN45-3, a previously unknown de novo splice variant (c.294-2A>G) in POLR2F was identified (OMIM # 604414). The phenotype was detectable in FIN45-3 as a newborn and manifests as profound ID. Recently, Haijes et al. (2019) described 16 cases of neurodevelopmental syndromes, characterized by profound infantile-onset hypotonia and developmental delay, with de novo variants in POLR2A (OMIM # 180660), which encodes another RNA polymerase II subunit. In addition, heterozygous mice in the International Mouse Phenotyping Consortium (IMPC) show low circulating albumin levels. Interestingly, a low albumin level was also detected in FIN45-3.
For male patient FIN47-3, we identified a heterozygous variant in DNAH3 [p.(Ile3989Val)], which was absent from the unaffected mother and sibling. Six DNAH3 de novo missense variants were found in patients with a neurodevelopmental disorder and ID in the large Deciphering Developmental Disorders (DDD) Study (4293 families) (Deciphering Developmental Disorders Study 2017), and DNAH3 has been suggested as a candidate gene for ID (Kochinke et al. 2016). Unfortunately, the clinical details of these six patients are unavailable. Of note, several of the novel candidate genes show intolerance towards missense and/or loss-of-function (LOF) variants based on constraint metrics (Table 3), particularly ZKSCAN1, ZFR, KIF1B, and ZC3H14.
In family FIN-ID3, we identified a homozygous splice site variant in ERGIC3 (c.717+1G>A) (OMIM # 616971) in both affected sibs. This gene is important in mediating transport from the endoplasmic reticulum to the Golgi and has been mentioned as a possible AR ID candidate gene based on a single male patient with growth retardation, microcephaly, learning disability, facial dysmorphism and abnormal pigmentation (Monies et al. 2019).

Structural variants (SVs)
The phenotype of FIN10 is consistent with a pathogenic partial mosaic maternal uniparental disomy (UPD) with a 2.7 Mb deletion at 21q22.12-q22.2 on the paternal chromosome 21, which was identified via microarray analysis. The deletion was found in 22-23% of the cells analyzed from peripheral blood. Although the proportion of cells carrying the deletion is relatively small, all symptoms (profound ID, absence of speech, epilepsy, microcephaly, growth retardation, and dysmorphic features) are similar to those described in 21q22.12q22.2 deletions that include DYRK1A (Figure S1F). In addition to the DYRK1A deletion mosaicism, complex mosaicism of three different homozygosity regions, 21q22.11q22.3 (78%), 21q21.3q22.11 (30%), and 21q21.1q21.3 (15%), was detected, resulting from maternal UPD (Figure S4). The region also contains KCNJ6, which underlies Keppen-Lubinsky syndrome (OMIM # 614098), characterized by severe ID, seizures and microcephaly, and may have an additional effect on the phenotype. The finding was validated by FISH using a probe close to the DYRK1A region (see KF19-276) (Figure S3A). Parental samples were not available to rule out rare balanced rearrangements involving the 21q22.12q22.2 region using FISH.
Using both microarray analysis and exome sequencing, an inherited heterozygous deletion in NDE1 (16p13.11del) was found in FIN43 (Figure S1F; S2B). The deletion was inherited from the unaffected father. The phenotype (primary and severe microcephaly, partial agenesis of the corpus callosum and simplified gyral pattern) was compatible with what has been previously described (OMIM # 614019). Unfortunately, we were unable to detect a second variant in this gene.
The patient in family FIN48 has a 106 kb heterozygous deletion at 22q13.33 (Figure S1F; S2C), which was not found in his unaffected mother. The deletion covers SHANK3, of which 15 C-terminal exons are deleted. The participant's phenotype is characterized by normal growth, hypotonia, absent speech, severe ID, and autistic features (Figure S1F). He also has high pain tolerance, aggression, and hand movements resembling choreoathetosis. The phenotype is compatible with the 22q13.3 deletion syndrome (OMIM # 606232; PHMDS).

Runs of homozygosity analysis
As the Finnish population is an isolated population and only a limited number of homozygous variants were identified, we analyzed the absence of heterozygosity and level of inbreeding in comparison to an inbred and an outbred population (Table 4). Overall, the number (NSEG) and size (Mb: total length of runs in Mb; MbA: average length of runs in Mb) of runs of homozygosity were higher in the Finnish individuals with ID (FIN) compared to individuals of non-Finnish European (EUR) ancestry (Table 4; p NSEG = 0.003; p Mb = 3.4 × 10⁻⁴; p MbA = 0.002), and this was more pronounced in the Finnish individuals from the North Eastern sub-isolate (Kainuu region) versus individuals from other regions in Finland (Table 4; p NSEG = 0.014; p Mb = 0.006; p MbA = 0.015). However, the inbreeding coefficients (IBC; Fhat1-3; Table 4), all measures of inbreeding, did not demonstrate excess homozygosity or a higher level of inbreeding in the FIN groups compared to EUR (Table 4; p Fhat1 = 0.927; p Fhat2 = 0.370; p Fhat3 = 0.170). Increased IBCs were only seen in an inbred population from Pakistan compared to both EUR and FIN (Table 4; p Fhat1-3 < 1.0 × 10⁻¹⁵). Parental consanguinity was only reported in three families (FIN21, FIN-ID3 and FIN-ID9), two of which harbor AR candidate variants. No additional families were found to be consanguineous based on identity-by-descent analysis and KING.
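The three run-of-homozygosity summary measures compared above (NSEG, Mb, MbA) can be made concrete with a short Python sketch. It assumes that runs have already been called (for example by PLINK) and are available as start and end coordinates in base pairs; the function name `summarize_roh`, the input layout, and the example coordinates are hypothetical, and only the 1 Mb minimum length is taken from the methods.

```python
def summarize_roh(segments_bp, min_length_mb=1.0):
    """Summarize runs of homozygosity for one individual.

    segments_bp: iterable of (start, end) positions in base pairs
                 (hypothetical input layout, for illustration only).
    Returns NSEG (number of runs), Mb (total length in Mb) and
    MbA (average run length in Mb), counting only runs of at least
    min_length_mb megabases.
    """
    lengths_mb = [(end - start) / 1_000_000 for start, end in segments_bp]
    kept = [length for length in lengths_mb if length >= min_length_mb]
    nseg = len(kept)
    total_mb = sum(kept)
    mean_mb = total_mb / nseg if nseg else 0.0
    return {"NSEG": nseg, "Mb": total_mb, "MbA": mean_mb}


if __name__ == "__main__":
    # Made-up runs of 2.0 Mb, 1.5 Mb and 0.4 Mb; the last is below 1 Mb.
    example = [(0, 2_000_000), (5_000_000, 6_500_000), (10_000_000, 10_400_000)]
    print(summarize_roh(example))
```

Per-individual summaries of this kind are what the group comparisons (FIN versus EUR, Kainuu versus other regions) are then computed on.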

Discussion
Our study demonstrates that de novo variants are the most common cause of ID in the founder population of Finland. Of the 39 families, known or novel likely pathogenic and pathogenic variants and SVs in previously identified ID genes were found in 25 families (64%). We suggested a phenotypic extension in five families (13%), an alternate inheritance model in three families (8%), and an abnormal molecular karyotype finding in three families (8%). De novo (or suspected de novo) variants were identified in a total of 56% of families (22/39, including SVs and mosaic variants). Eighteen (46%) families had de novo variants in known ID genes, which is in line with previously published studies in European populations (Martin et al. 2018). The number of X-linked variants (5/39; 13%), two of them suspected de novo and three inherited, is in agreement with previous studies (de Brouwer et al. 2007). A dual molecular diagnosis was suggested in two families (FIN35, FIN53) (5%) and dual suspected genetic diagnoses in three families (FIN12, FIN-ID8, FIN-ID10) (Table 1). This is consistent with other reports in patients referred for exome sequencing in a clinical setting (Posey et al. 2017). There is evidence that 6% of individuals with autozygosity equivalent to first cousin marriage or greater have a plausibly pathogenic de novo variant in developmental disorders (Deciphering Developmental Disorders Study 2017). This is notable, as autosomal recessive variants are known to contribute to ID in populations with high consanguinity (Monies et al. 2017) and in isolated populations (Peltonen et al. 1999). Both situations can lead to absence of heterozygosity (AOH). Although our analysis shows that longer stretches of homozygosity exist in the Finnish population compared to mixed populations, no excess homozygosity was found based on IBC calculations (Table 4).
This finding may reflect the distant relationships in the Finnish population, which can be traced back to the internal migrations during the sixteenth century (Peltonen et al. 1999; Polla et al. 2019), and suggests that such distant relationships reduce the prevalence of ARID. In this study, 5% (2/39) of the families had diagnostic (pathogenic or likely pathogenic) variants with an AR mode of inheritance, and 15% (6/39) of the families had suspected causal AR variants (VUS) (Tables 1 and 2). This result suggests that the contribution of recessive ID-causing variants in Finns is similar to or slightly higher than that in mixed European populations (Martin et al. 2018), but much lower than that seen in inbred populations (49%) (Anazi et al. 2017). In fact, previous studies have indicated that recent consanguinity is more important than small population size for detecting a strong effect of AR variants (Mooney et al. 2018), and here, the same trend for ID as in outbred populations is seen, i.e. the majority of variants are de novo.
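As a quick arithmetic check, the family counts quoted in this Discussion can be converted back to percentages of the 39 families. The short Python sketch below does this; the dictionary keys are shorthand labels introduced here for illustration, while the counts and expected percentages are those stated in the text.

```python
TOTAL_FAMILIES = 39

# (count, percentage reported in the text); labels are illustrative shorthand.
reported = {
    "P/LP variant or SV in known ID gene": (25, 64),
    "de novo or suspected de novo": (22, 56),
    "de novo in known ID gene": (18, 46),
    "X-linked": (5, 13),
    "phenotypic extension": (5, 13),
    "alternate inheritance model": (3, 8),
    "abnormal molecular karyotype": (3, 8),
    "dual molecular diagnosis": (2, 5),
    "diagnostic AR variant": (2, 5),
    "suspected causal AR variant (VUS)": (6, 15),
}


def percent(count, total=TOTAL_FAMILIES):
    """Percentage of families, rounded to the nearest whole number."""
    return round(100 * count / total)


if __name__ == "__main__":
    for label, (count, stated) in reported.items():
        status = "ok" if percent(count) == stated else "MISMATCH"
        print(f"{label}: {count}/{TOTAL_FAMILIES} = {percent(count)}% "
              f"(text: {stated}%) {status}")
```

The categories overlap (a family can, for example, carry a de novo variant in a known ID gene and also count towards a dual diagnosis), so the percentages are not expected to sum to 100.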
We identified four homozygous variants which segregated with the disease trait in an AR manner and are either unique to or show enrichment in allele frequency in the Finnish population (Tables 1, 2; Table S1). Three homozygous variants in candidate genes originated from the North Eastern part of Finland, where an increased size and frequency of runs of homozygosity were detected (Table 4). First, a homozygous missense variant [p.(Cys51Tyr)] in SYPL1 (OMIM # 616665) was enriched in the Finnish population in gnomAD. Based on our exome data and additional screening in this sparsely populated region of Finland, we found a carrier frequency of 1:37, suggesting a founder effect for the SYPL1 variant. No homozygous variants were found in unaffected individuals. In line with several AR disorders that have been discovered in North-Eastern Finland (Peltonen et al. 1999), the variants in SYPL1, ERGIC3, and ZC3H14 may be novel founder variants in this sparsely populated region of Finland.
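To give a sense of scale for the 1:37 carrier frequency, the sketch below converts it into an allele frequency and an expected homozygote frequency. This is a back-of-the-envelope calculation that assumes Hardy-Weinberg proportions (random mating, no selection), which may not hold in a regional sub-isolate; the function name `allele_freq_from_carrier_freq` is introduced here for illustration and the calculation is not part of the original study.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2*q*(1 - q) = carrier_freq for the minor allele frequency q.

    Assumes Hardy-Weinberg proportions; returns the smaller root.
    """
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


if __name__ == "__main__":
    carrier_freq = 1 / 37              # carrier frequency reported in the text
    q = allele_freq_from_carrier_freq(carrier_freq)
    homozygote_freq = q ** 2           # expected under Hardy-Weinberg
    print(f"allele frequency q = {q:.4f}")
    print(f"expected homozygotes = 1 in {1 / homozygote_freq:.0f}")
```

Under these assumptions the allele frequency is close to 1/74 and the expected frequency of homozygotes in that region is on the order of 1 in 5,300, which illustrates why even an enriched founder variant is expected to explain only a small number of families.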
The high proportion of de novo variants supports the clan genomics hypothesis in the development of ID, i.e. the concept that novel rare variation contributes more significantly to disease in populations (Lupski et al. 2011). However, in addition to the known Finnish founder variant in CRADD [p.(Arg170His)] that we identified in family FIN38 (Polla et al. 2019), all the homozygous variants identified were also present at a low frequency in gnomAD, either exclusively or at a higher frequency in the Finnish population (Table 2). Therefore, some of these might represent older founder alleles enriched due to the unique history of the Finnish population.
We also identified three genes that followed a different inheritance model while still showing a similar disorder (Table 1). DDX47 and DHX58 are members of the DDX/DHX family, which has recently been implicated in neurodevelopmental disorders (Paine et al. 2019). In contrast to the previously reported AR variants in these two genes, both of our cases have de novo variants. However, the severe clinical phenotype, present from the newborn period, resembles previous cases (Paine et al. 2019). In fact, a majority of the published variants in the DExD/H-box RNA helicase genes have been de novo. Several examples of both autosomal dominant and recessive inheritance in neurodevelopmental disorders, leading to a slight variability in phenotype, have been described (Harel et al. 2016). The DDX/DHX family also includes several genes in which both mono- and biallelic variants have been suggested to be implicated in neurodevelopmental disease (Paine et al. 2019). Whether a phenotype presents in an AR or AD manner is likely related to the impact of the variant on protein function. Due to the LOF tolerance of several of the DDX/DHX genes implicated in neurodevelopment, including DDX47 and DHX58, AD variants are more likely to have a gain-of-function or dominant-negative effect.
The majority of novel candidates also displayed de novo variants. Several of these genes had striking similarities with syndromes associated with a human paralog gene. For example, in FIN23-3, a candidate de novo variant [p.(Lys181Glu)] in inositol 1,4,5-trisphosphate receptor, type 2 (ITPR2) was identified. Previously, only one variant in ITPR2 [p.(Gly2498Ser)] had been reported in the OMIM database; it causes AR anhidrosis with normal sweat glands in one family (OMIM # 106190). A closer look at the phenotype suggests that the clinical features more closely resemble Gillespie syndrome (Table 1; Suppl information; Figure S1B and S2C), which is caused by both AD and AR variants in the paralog gene ITPR1 (OMIM # 147265). The domain structures of ITPR1 and ITPR2 appear similar. The p.(Lys181Glu) ITPR2 variant is located in the N-terminal region (IP3 binding domain), where many of the causative variants are located. The homologous region and the phenotypic similarities suggest that ITPR2 may cause the phenotype observed in FIN23-3.
SVs were found in three families (8%). The result is in line with other molecular karyotyping studies in ID (6%-12%) (Moeschler et al. 2014; Cheng et al. 2019). Perhaps the most interesting SV is the complex upd21q22.12-22mat/2.7 Mb del mosaicism that covers DYRK1A and KCNJ6. The maternal UPD involving the 21q22.11q22.3 region arose mitotically. This is probably a mechanism correcting for the deleterious 21q22.12q22.2 deletion that originated on the paternal chromosome (Jongmans et al. 2012), and the proportions of the 21q22.12q22.2 deletion and the UPD might differ between tissues. DYRK1A deletion mosaicism has been described earlier in at least five cases (Oegema et al. 2010; Yamamoto et al. 2011); of these, the clinical picture of the patient with mosaicism for the largest deletion (11 Mb) was more severe. UPD of chromosome 21 is not known to cause any specific syndrome, but isodisomic regions are at risk of homozygosity for variants in recessive disease genes. In this patient, the most relevant region is 21q22.11q22.3, which presents 78% homozygosity. Individual cell populations cannot be detected by array analysis, but the homozygous regions were most probably formed independently, presenting as cell lines with UPD of 21q22.11qter (48%), 21q21.3qter (15%), and 21q21.1qter (15%). Still, three successive recombination events, leading first to UPD (21q22.11q22.3) homozygosity followed by the formation of the UPD (21q21.3q22.11) and UPD (21q21.1q21.3) regions, cannot be ruled out. The formation of three different homozygosity regions might also disrupt the function of genes located in the recombination areas.
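The relation between the proposed cell lines (48%, 15%, 15%) and the observed regional homozygosity (78%, 30%, 15%) is simple cumulative arithmetic: because each cell line is isodisomic from its breakpoint to the end of the long arm, a region is homozygous in every cell line whose breakpoint lies at or proximal to it. The Python sketch below works this through; the function name `homozygosity_by_region` and the data layout are illustrative, and the percentages are those given in the text.

```python
def homozygosity_by_region(cell_lines):
    """Percent of cells homozygous in each region between breakpoints.

    cell_lines: list of (breakpoint_band, percent_of_cells), ordered from
    the most proximal breakpoint to the most distal one. Each cell line is
    assumed to be isodisomic from its breakpoint to qter.
    """
    bands = [band for band, _ in cell_lines] + ["qter"]
    result = {}
    cumulative = 0
    for i, (band, percent) in enumerate(cell_lines):
        # Every cell line with a breakpoint at or proximal to this band
        # contributes homozygosity to the region that starts here.
        cumulative += percent
        result[f"{band}-{bands[i + 1]}"] = cumulative
    return result


if __name__ == "__main__":
    lines = [("21q21.1", 15), ("21q21.3", 15), ("21q22.11", 48)]
    for region, percent in homozygosity_by_region(lines).items():
        print(f"{region}: {percent}% of cells homozygous")
```

The three regions come out at 15%, 30% and 78%, matching the mosaic levels reported for 21q21.1q21.3, 21q21.3q22.11 and 21q22.11q22.3. The same totals would also arise from three successive recombination events in one lineage, so the arithmetic alone cannot distinguish the two scenarios discussed above.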
In conclusion, our study shows that de novo variants represent the most frequent cause of intellectual disability in the Finnish founder population. In addition, we expand the phenotypic and genotypic spectrum of several ID genes and present novel candidate genes that could be involved in ID etiology.
… receives revenue from clinical genetic testing conducted at Baylor Genetics (BG) Laboratories. Other authors declare no conflict of interest.
Code availability (software application or custom code)
Bioinformatic analyses were done with publicly available software packages as described in the methods. Custom code to aid in variant filtering is available upon request.

Consent to participate
The parents or legal guardians of all patients and their healthy siblings in this study provided written informed consent to participate in the study. The study was approved by the ethics committees of the Hospital District of Helsinki and Uusimaa (# HUS/2532/2017) and the Institutional Review Boards of Columbia University (IRB-AAAS3433) and Baylor College of Medicine (protocol H-29697).

Consent to publish
The parents or legal guardians of all patients and their healthy siblings in this study provided written consent to publish the results of this study. A separate written informed consent was obtained from the parents or legal guardians to publish the photos of the patients. The study was approved by the ethics committees of the Hospital District of Helsinki and Uusimaa (# HUS/2532/2017) and the Institutional Review Boards of Columbia University (IRB-AAAS3433) and Baylor College of Medicine (protocol H-29697).
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.