Loss of UGP2 in brain leads to a severe epileptic encephalopathy, emphasizing that bi-allelic isoform-specific start-loss mutations of essential genes can cause genetic diseases
Developmental and/or epileptic encephalopathies (DEEs) are a group of devastating genetic disorders, resulting in early-onset, therapy-resistant seizures and developmental delay. Here we report on 22 individuals from 15 families presenting with a severe form of intractable epilepsy, severe developmental delay, progressive microcephaly, visual disturbance and similar minor dysmorphisms. Whole exome sequencing identified a recurrent, homozygous variant (chr2:64083454A > G) in the essential UDP-glucose pyrophosphorylase (UGP2) gene in all probands. This rare variant results in a tolerable Met12Val missense change of the longer UGP2 protein isoform but causes a disruption of the start codon of the shorter isoform, which is predominant in brain. We show that the absence of the shorter isoform leads to a reduction of functional UGP2 enzyme in neural stem cells, leading to altered glycogen metabolism, upregulated unfolded protein response and premature neuronal differentiation, as modeled during pluripotent stem cell differentiation in vitro. In contrast, the complete lack of all UGP2 isoforms leads to differentiation defects in multiple lineages in human cells. Reduced expression of Ugp2a/Ugp2b in vivo in zebrafish mimics visual disturbance and mutant animals show a behavioral phenotype. Our study identifies a recurrent start codon mutation in UGP2 as a cause of a novel autosomal recessive DEE syndrome. Importantly, it also shows that isoform-specific start-loss mutations causing expression loss of a tissue-relevant isoform of an essential protein can cause a genetic disease, even when an organism-wide protein absence is incompatible with life. We provide additional examples where a similar disease mechanism applies.
Keywords: Epileptic encephalopathy; UGP2; ATG mutations; Start-loss mutation; Genetics; Whole exome sequencing; Microcephaly; Recurrent mutation; Founder mutation; Essential gene
Developmental and/or epileptic encephalopathies (DEEs) are a heterogeneous group of genetic disorders, characterized by severe epileptic seizures in combination with developmental delay or regression. Genes involved in multiple pathophysiological pathways have been implicated in DEEs, including synaptic impairment, ion channel alterations, transporter defects and metabolic processes such as disorders of glycosylation. Mostly, dominant-acting, de novo mutations have been identified in children suffering from DEEs, and only a limited number of genes with a recessive mode of inheritance are known so far, with a higher occurrence rate in consanguineous populations. A recent cohort study on DEEs employing whole exome sequencing (WES) and copy number analysis, however, found that up to 38% of diagnosed cases might be caused by recessive genes, indicating that the importance of this mode of inheritance in DEEs has been underestimated.
The human genome contains ~20,000 genes, of which more than 5000 have been implicated in genetic disorders. Wide-scale population genomic studies and CRISPR–Cas9-based loss-of-function (LoF) screens have identified around 3000–7000 genes that are essential for the viability of the human organism or result in profound loss of fitness when mutated, consistent with their depletion for LoF variants in the human population. For some of these essential genes, it is believed that LoF variants are incompatible with life and are, therefore, unlikely to be implicated in genetic disorders presenting in postnatal life. One such example is the UDP-glucose pyrophosphorylase (UGP2) gene on chromosome 2. UGP2 is an essential octameric enzyme in nucleotide sugar metabolism [38, 39, 121], as it is the only known enzyme capable of catalyzing the conversion of glucose-1-phosphate to UDP-glucose [36, 108]. UDP-glucose is a crucial precursor for the production of glycogen by glycogen synthase (GYS) [2, 44], and also serves as a substrate for UDP-glucose:glycoprotein transferases (UGGT) and UDP-glucose-6-dehydrogenase (UGDH), thereby playing important roles in glycoprotein folding control, glycoconjugation and UDP-glucuronic acid synthesis. The latter is an obligate precursor for the synthesis of glycosaminoglycans and proteoglycans of the extracellular matrix [65, 110], of which aberrations have been associated with DEEs and neurological disorders [4, 24, 77, 98]. UGP2 has previously been identified as a marker protein in various types of malignancies including gliomas, where its upregulation is correlated with a poor disease outcome [27, 59, 61, 101, 103, 111, 112, 122], but it has so far not been implicated in genetic diseases, and it has been speculated that this is because of its essential role in metabolism.
Many genes are differentially expressed amongst tissues, regulated by non-coding regulatory elements . In addition, it has become clear that there are more than 40,000 protein isoforms encoded in the human genome, whose expression levels vary amongst tissues. Although there are examples of genetic disorders caused by the loss of tissue-specific protein isoforms [41, 47, 57, 100], it is unknown whether a tissue-relevant loss of an essential gene can be involved in human disease. Here, we report on such a scenario, providing evidence that a novel form of a severe DEE syndrome is caused by the brain-relevant loss of the essential gene UGP2 due to an isoform-specific and germ line-transmitted start codon mutation. We present data that this is likely a more frequent disease mechanism in human genetics, illustrating that essential genes for which organism-wide loss is lethal can still be implicated in genetic disease when only absent in certain tissues due to expression misregulation.
All affected probands were investigated by their referring physicians and all genetic analyses were performed in a diagnostic setting. Legal guardians of affected probands gave informed consent for genomic investigations and publication of their anonymized data.
Next-generation sequencing of index patients
Genomic DNA was isolated from peripheral blood leukocytes of the proband and both parents, and exome-coding DNA was captured with the Agilent SureSelect Clinical Research Exome (CRE) kit (v2). Sequencing was performed on an Illumina HiSeq 4000 with 150-bp paired-end reads. Reads were aligned to hg19 using BWA (BWA-MEM v0.7.13) and variants were called using the GATK haplotype caller (v3.7; reference: https://www.broadinstitute.org/gatk/). Detected variants were annotated, filtered and prioritized using the Bench lab NGS v5.0.2 platform (Agilent Technologies). Initially, only genes known to be involved in epilepsy were analyzed, followed by a full exome analysis revealing the homozygous UGP2 variant.
Individuals 2, 3 and 4
Using genomic DNA from the proband and parents (individual 4) or the proband, parents, and affected sibling (individuals 2 and 3), the exonic regions and flanking splice junctions of the genome were captured using the SureSelect Human All Exon V4 (50 Mb) (individual 4) or the IDT xGen Exome Research Panel v1.0 (individuals 2 and 3). Massively parallel (NextGen) sequencing was done on an Illumina system with 100 bp or greater paired-end reads. Reads were aligned to human genome build GRCh37/UCSC hg19 and analyzed for sequence variants using a custom-developed analysis tool. Additional sequencing technology and variant interpretation protocol has been previously described . The general assertion criteria for variant classification are publicly available on the GeneDx ClinVar submission page (https://www.ncbi.nlm.nih.gov/clinvar/submitters/26957/).
Diagnostic exome sequencing was done at the Departments of Human Genetics of the Radboud University Medical Center Nijmegen, The Netherlands, and performed essentially as described previously.
Individuals 6, 7, 8, 9, 10, 14, 15, 16, 17 and 18
After informed consent, we collected blood samples from the probands, their parents and unaffected siblings, and extracted DNA using standard procedures. To investigate the genetic cause of the disease, WES was performed in the affected proband. The Nextera Rapid Capture Enrichment kit (Illumina) was used according to the manufacturer’s instructions. Libraries were sequenced on an Illumina HiSeq3000 using a 100-bp paired-end reads protocol. Sequence alignment to the human reference genome (UCSC hg19), variant calling, and annotation were performed as described elsewhere. After removing all synonymous changes, we filtered single nucleotide variants (SNVs) and indels, only considering exonic and donor/acceptor splicing variants. In accordance with the pedigree and phenotype, priority was given to rare variants [< 1% in public databases, including 1000 Genomes project, NHLBI Exome Variant Server, Complete Genomics 69, and Exome Aggregation Consortium (ExAC v0.2)] that fit a recessive or a de novo model. After identifying the UGP2 variant in the proband, Sanger sequencing was used to confirm segregation in other affected and unaffected family members.
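The prioritization steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Variant` fields, the consequence labels and the simplification of the recessive model to homozygous calls only (compound heterozygosity is not modeled) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names are hypothetical; real annotation output will differ.
    gene: str
    consequence: str     # e.g. "missense", "synonymous", "splice_donor"
    max_pop_freq: float  # highest allele frequency across the public databases
    zygosity: str        # "hom" or "het" in the proband
    de_novo: bool        # True if absent from both parents


# Assumed labels for exonic, non-synonymous and canonical splice-site changes.
KEPT_CONSEQUENCES = {
    "missense", "nonsense", "frameshift", "inframe_indel",
    "start_lost", "stop_lost", "splice_donor", "splice_acceptor",
}


def passes_filter(v: Variant, max_freq: float = 0.01) -> bool:
    """Keep rare, non-synonymous exonic/splice variants that fit a
    (homozygous) recessive or a de novo model."""
    if v.consequence not in KEPT_CONSEQUENCES:
        return False  # drops synonymous and non-coding changes
    if v.max_pop_freq >= max_freq:
        return False  # drops variants at or above 1% in any database
    return v.zygosity == "hom" or v.de_novo


candidates = [
    Variant("UGP2", "missense", 0.00005, "hom", False),
    Variant("GENE_A", "synonymous", 0.0, "hom", False),
    Variant("GENE_B", "missense", 0.02, "hom", False),
]
prioritized = [v.gene for v in candidates if passes_filter(v)]
```

With these toy records only the homozygous rare missense call is retained; a real analysis would also need to handle compound heterozygous variants and per-database frequency thresholds.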
Individuals 11, 20, 21 and 22
Whole exome sequencing was performed at CENTOGENE AG, as previously described.
Individuals 12 and 13
High-quality DNA was used to capture exons using the SureSelect kit (Agilent, Santa Clara, CA, US). Then genomic libraries were created according to the manufacturer’s protocols. Sequences were read on Proton (Life Technologies Inc., Carlsbad, CA, US). Downstream analyses such as sequence alignment, indexing and raw variant calling were done using publicly and commercially available tools such as Ion Reporter, SAMTools, and Genomic Analysis ToolKit. Moreover, variant interrogations were done using sequence-variant databases, such as dbSNP, Ensembl, and the National Heart, Lung, and Blood Institute (NHLBI) Exome Variant Server (EVS), 1000 genome project.
Whole exome sequencing was performed in a diagnostic setting at MEDGENOME, India. DNA extracted from blood was used to perform targeted gene capture using the Agilent SureSelect V5 exome capture kit. The libraries were sequenced to a mean coverage of > 80–100 × on an Illumina sequencing platform. The GATK best practices framework was followed for variant identification using Sentieon (v201808.01): sequences were aligned to GRCh37/hg19 using the Sentieon aligner and processed with Sentieon for duplicate removal, recalibration and indel re-alignment. The Sentieon haplotypecaller was used to identify variants relevant to the clinical indication. Gene annotation of the variants was performed using the VEP program against the Ensembl release 91 human gene model.
Human brain samples
Tissue was obtained, upon informed consent, and used in a manner compliant with the Declaration of Helsinki and the Research Code provided by the local ethical committees. Fetal brains were preserved after spontaneous or induced abortions with appropriate written consent for brain autopsy and use of rest material for research. We performed a careful histological and immunohistochemical analysis, and evaluation of clinical data (including genetic data, when available). We only included specimens displaying a normal cortical structure for the corresponding age and without any significant brain pathology.
Brain tissue immunohistochemistry
For immunohistochemical analysis, we used two cases from the first trimester (GW6 and GW9), four cases from the second trimester (GW21, GW23, GW24 and GW26) and two cases from the third trimester (GW33 and GW36). Anatomical regions were determined according to the atlas of human brain development [11, 12, 13, 14]. We cut 4-µm sections from formalin-fixed, paraffin-embedded whole fetuses (GW6 and GW9) and brain tissue from cerebral, mesencephalic, cerebellar and brain stem regions (from GW21 to GW36). Slides were stained with mouse anti-UGP2 (C-6) in a 1:150 dilution (Santa Cruz) and visualized using Mouse and Rabbit Specific HRP/DAB (ABC) Detection IHC kit (Abcam). Mayer’s hematoxylin was used as a counterstain for immunohistochemistry followed by mounting and coverslipping (Bio-Optica) for slides. Prepared slides were analyzed and scanned under a VisionTek® Live Digital Microscope (Sakura).
Cloning of UGP2 cDNA
RNA was isolated using TRI reagent (Sigma) from whole peripheral blood of index patient 1 and her parents, after red blood cell depletion with RBC lysis buffer (168 mM NH4Cl, 10 mM KHCO3, 0.1 mM EDTA). cDNA was synthesized following the iSCRIPT cDNA Synthesis Kit (Bio-Rad) protocol, and the coding sequence of the long and short UGP2 isoform (wild type or mutant) was PCR amplified together with homology arms for Gibson assembly (see Supplementary Table 8, online resource, for primer sequences) using Phusion High-Fidelity DNA polymerase (NEB). PCR-amplified DNA was then cloned by Gibson assembly as previously described  in a pPyCAG-IRES-puro plasmid (a kind gift from Ian Chambers, Edinburgh) opened with EcoRI for experiments in mammalian cells. All obtained plasmids were sequence verified by Sanger sequencing (complete plasmid sequences available upon request).
Fibroblast cell culture
Fibroblasts from index patient 1 and her parents were obtained using a punch biopsy according to standard procedures, upon informed consent (IRB approval MEC-2017-341). Fibroblasts from the parents of index patients 2 and 3 were also obtained upon informed consent at McMaster Children’s Hospital. All fibroblasts were cultured in standard DMEM medium supplemented with 15% fetal calf serum, MEM non-essential amino acids (Sigma), 100 U/ml penicillin and 100 µg/ml streptomycin, as done previously , in routine humidified cell culture incubators at 20% O2. Fibroblast cell lines were transfected using Lipofectamine 3000 (Invitrogen) with the indicated plasmid constructs. All cell lines used in this report were regularly checked for the presence of mycoplasma and were negative during all experiments.
Genome engineering in human embryonic stem cells
H9 human embryonic stem cells were cultured as previously described [8, 9]. In short, cells were maintained under feeder-free conditions in mTeSR-1 medium (STEMCELL technologies) on Matrigel (Corning)-coated culture dishes. To engineer the patient-specific UGP2 mutation by homologous recombination, ESCs were transfected using Lipofectamine 3000 with a plasmid expressing eSpCas9-t2a-GFP (a kind gift of Feng Zhang) and a gRNA targeting the UGP2 gene (see Supplementary Table 8, online resource, for the sequence), together with a 60-bp single-stranded oligonucleotide (ssODN) homology template encoding the patient mutation (synthesized at IDT). To increase the stability of the ssODN and, therefore, homologous recombination efficiency, the first two 5′ and 3′ nucleotides were synthesized using phosphorothioate bonds. 48 h post-transfection, GFP-expressing cells were sorted, and 6000 single GFP-positive cells were plated on a Matrigel-coated six-well plate in the presence of 10 µM ROCK-inhibitor (Y27632, Millipore). After approximately 10 days, single colonies were manually picked, expanded and genotyped using Sanger sequencing (see Supplementary Table 8, online resource, for primer sequences). As a by-product of non-homologous end joining, knockout clones were obtained which showed a single nucleotide A insertion at position 42 of UGP2 transcript 1 (chr2:64083462_64083463insA), leading to an out-of-frame transcript and a premature termination of the protein at amino acid position 47 (D15Rfs*33). Western blotting confirmed the absence of all UGP2 proteins in knockout clones and the loss of the short UGP2 isoform in clones with the patient mutation. To produce a stable rescue cell line, ESCs were transfected as previously described with the pPyCAG-IRES-puro plasmid expressing either the long WT or mutant UGP2 isoform. After 48 h, the population of cells with the transgene integration was selected with 1 µg/ml puromycin.
Engineered ESC clones had a normal colony morphology and pluripotency factor expression.
Patient-specific induced pluripotent stem cell generation
Patient fibroblast cell lines were reprogrammed using the CytoTune™-iPS 2.0 Sendai Reprogramming Kit (Thermo Scientific, A16517) expressing the reprogramming factors OCT4, SOX2, KLF4 and C-MYC on Matrigel-coated cell culture plates, upon informed consent (IRB approval MEC-2017–341). After approximately 4–5 weeks, emerging colonies were manually picked and expanded. Multiple clones were assessed for their karyotype, pluripotency factor expression and three lineage differentiation potential (Stem Cell Technologies, #05230), following the routine procedures of the Erasmus MC iPS Cell core facility, as previously described . Sanger sequencing was used to verify the genotype of each obtained iPSC line. We used three validated clones for each individual in our experiments.
Neural stem cell differentiation
Pluripotent cells were differentiated into neural stem cells (NSCs), using a modified dual SMAD inhibition protocol. In short, 18,000 cells/cm2 were plated on Matrigel-coated cell culture dishes in mTeSR-1 medium in the presence of 10 µM Y27632. When cells reached 90% confluency, the medium was switched to differentiation medium (KnockOut DMEM (Gibco), 15% KnockOut serum replacement (Gibco), 2 mM l-glutamine (Gibco), MEM non-essential amino acids (Sigma), 0.1 mM β-mercaptoethanol, 100 U/ml penicillin and 100 µg/ml streptomycin) supplemented with 2 µM A 83-01 (Tocris) and 2 µM Dorsomorphin (Sigma-Aldrich). At day 6, medium was changed to an equal ratio of differentiation medium and NSC medium (KnockOut DMEM-F12 (Gibco), 2 mM l-glutamine (Gibco), 20 ng/ml bFGF (Peprotech), 20 ng/ml EGF (Peprotech), 2% StemPro Neural supplement (Gibco), 100 U/ml penicillin and 100 µg/ml streptomycin) supplemented with 2 µM A 83-01 (Tocris) and 2 µM Dorsomorphin (Sigma-Aldrich). At day 10, cells were passaged (NSC p = 0) using Accutase (Sigma) and maintained in NSC medium. We used commercially available H9-derived NSCs (Gibco) as a control (a kind gift from Raymond Poot, Rotterdam).
Other stem cell differentiation experiments
ESCs were differentiated into hematopoietic stem cells and cardiomyocytes using commercially available STEMCELL technology kits (STEMdiff Hematopoietic kit #05310, STEMdiff Cardiomyocyte differentiation kit #05010) according to the manufacturer’s instructions. Cells were finally harvested and lysed with TRI reagent to isolate RNA for further qRT-PCR analysis.
RNA-sequencing and data analysis
For patient RNA-seq, peripheral blood was obtained from index patient 1 and her parents, collected in PAX tubes and RNA was isolated following standard diagnostic procedures in the diagnostics unit of the Erasmus MC Clinical Genetics department. RNA-seq was carried out in a diagnostic setting, and sequencing was performed at GenomeScan (Leiden, The Netherlands). For RNA-seq of in vitro-cultured cell lines, RNA was obtained from six-well cultures using TRI reagent, and further purified using column purification (Qiagen, #74204). mRNA capture, library prep including barcoding and sequencing on an Illumina HiSeq2500 machine were performed according to standard procedures of the Erasmus MC Biomics facility. Approximately 20 million reads were obtained per sample. For cell line experiments, two independent H9 wild-type cultures, two independent knockout clones harboring the same homozygous UGP2 genetic alteration and two independent clones harboring the patient homozygous UGP2 mutation were used. Each cell line was sequenced in two technical replicates at ESC state and differentiated NSC state (at passage 5). FASTQ files obtained after de-multiplexing of single-end, 50-bp sequencing reads were trimmed by removing possible adapters using Cutadapt after quality control checks on raw data using the FastQC tool. Trimmed reads were aligned to the human genome (hg38) using the HISAT2 aligner. To produce Genome Browser Tracks, aligned reads were converted to bedgraph using bedtools genomecov, after which the bedGraphToBigWig tool from the UCSC Genome Browser was used to create a bigwig file. Aligned reads were counted for each gene using htseq-count, and GenomicFeatures was used to determine the gene length by merging all non-overlapping exons per gene from the Homo_sapiens.GRCh38.92.gtf file (Ensembl). Differential gene expression and RPKM (Reads Per Kilobase per Million) values were calculated using edgeR after removing low-expressed genes and normalizing data.
The threshold for significant differences in gene expression was FDR < 0.05. To obtain a list of ESC and NSC reference genes used in Supplementary Fig. 6F, online resource, we retrieved genes annotated in the following GO terms using GSEA/MSigDB web site v7.0: GO_FOREBRAIN_NEURON_DEVELOPMENT (GO:0021884), GO_CEREBRAL_CORTEX_DEVELOPMENT (GO:0021987), GO_NEURAL_TUBE_DEVELOPMENT (GO:0021915), BHATTACHARYA_EMBRYONIC_STEM_CELL (PMID: 15070671) and BENPORATH_NOS_TARGETS (PMID: 18443585).
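As an illustration of the gene-length and RPKM steps described above, the following Python sketch computes the union length of a gene's exons and the textbook RPKM value. It is not the GenomicFeatures or edgeR implementation (edgeR additionally applies library-size normalization factors); the function names and the 1-based inclusive coordinate convention are assumptions.

```python
def merged_exon_length(exons):
    """Total length (bp) of the union of exon intervals.

    `exons` is a list of (start, end) tuples, assumed 1-based and inclusive
    as in GTF files. Overlapping exons are counted once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # Close the previous merged block before opening a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene length per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Toy example: three exons, two of which overlap, in a 20-million-read library.
length = merged_exon_length([(1, 100), (50, 150), (200, 250)])
value = rpkm(read_count=2000, gene_length_bp=2000, total_mapped_reads=20_000_000)
```

In this toy example the merged exon length is 201 bp, and a 2000-bp gene with 2000 reads in a library of 20 million mapped reads has an RPKM of 50.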
Functional enrichment analysis
Metascape, g:profiler and Enrichr were used to assess functional enrichment of differentially expressed genes. Supplementary Table 4, online resource, reports all outputs in LogP, log(q value) and Adjusted p value (q value) for Metascape and g:profiler, and in p value, Adjusted p value (q value) and combined score (which is the estimation of significance based on the combination of Fisher's exact test p value and z score deviation from the expected rank) for Enrichr. All tools were used with default parameters and whole genome set as background.
Genome-wide homology search
To make a genome-wide list of transcripts sharing a similar structure as UGP2 transcripts, 42,976 transcripts from 21,522 genes (Human genes GRCh38.p12) were extracted using BioMart of Ensembl (biomaRt R package). 11,056 out of 21,522 genes had only 1 transcript; the remaining 31,920 transcripts from 10,466 genes were selected, their protein sequences were obtained with the biomaRt R package and homology analysis was performed using NCBI’s blastp on the command line (formatting option: -outfmt 6). We grouped the longest and shorter transcripts of each gene based on coding sequence length and only kept those that matched a pairwise homology comparison between the longest and the shorter transcript with the following criteria: 100 percent identity, without any gap or mismatch, and the starting ATG codon of the shortest transcript being part of the longest transcript(s). 1766 genes met these criteria. We then filtered these genes for published essential genes, leaving us with 1197 genes. Using BioMart (Attributes: Phenotype description and Study external reference) of Ensembl we then evaluated whether these genes had been implicated in disease and identified 850 genes that did not have an association with a disease phenotype/OMIM number. Of those, 247 genes encoded proteins of which the shorter isoform differed by fewer than 50 amino acids from the longer isoform. We chose this arbitrary threshold to exclude those genes where the two isoforms differ largely in size and might, therefore, encode functionally completely different proteins (although we cannot exclude that this will also hold true for some of the genes in our selection).
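The pairwise criteria above can be approximated at the protein level with a short Python sketch. This is an illustrative simplification, not the blastp-based procedure used here: it treats "100 percent identity without gaps or mismatches" as exact substring containment, and the sequences and function names are invented toy examples.

```python
def is_nested_isoform_pair(long_protein: str, short_protein: str) -> bool:
    """True if the short isoform is an exact, ungapped stretch of the long
    isoform and begins at a methionine that is also present in the long one."""
    if len(short_protein) >= len(long_protein):
        return False
    if not short_protein.startswith("M"):
        return False  # the short isoform must begin at a start codon
    return short_protein in long_protein


def length_difference(long_protein: str, short_protein: str) -> int:
    """Number of amino acids by which the two isoforms differ in length."""
    return len(long_protein) - len(short_protein)


# Toy sequences (not real UGP2): the short isoform starts at Met12 of the long one.
long_iso = "MAAAAAAAAAAMKLVT"
short_iso = long_iso[11:]  # "MKLVT"

nested = is_nested_isoform_pair(long_iso, short_iso)
small_gap = length_difference(long_iso, short_iso) < 50  # arbitrary threshold from the text
```

A pair passing both checks mirrors the UGP2 situation, in which a single nucleotide change can be a missense variant in the long isoform and a start-loss variant in the short one.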
Differential isoform expression in fetal tissues
Publicly available RNA-seq data from various fetal tissue samples (Supplementary Table 2, online resource) were analyzed using the same workflow as described for the RNA-seq data analysis above. To determine differential isoform expression in these tissues, we calculated a ratio between the unique exon(s) of the shortest and longest transcript for each gene and assessed its variability across different fetal tissue samples. The number of reads for each unique exon of a transcript was calculated by mapping aligned RNA-seq reads against the unique exon coordinate using bedtools multicov. The longest and shortest transcripts were separated and the transcript ratio (number of counts of shortest transcript/(number of counts of shortest transcript + number of counts of longest transcript)) for each gene was obtained from the average reads of RNA-seq samples per tissue. 382 out of 1197 genes showed high variability across different samples (defined as a difference between highest and lowest ratio > 0.5); 277 of those highly variable genes were not associated with a disease phenotype/OMIM number, and of these, 83 genes had an isoform length difference of less than 50 amino acids (a subset of the 247 genes with no OMIM number and a length difference of less than 50 amino acids).
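The transcript ratio and the variability criterion defined above reduce to a few lines of arithmetic, sketched here in Python. The tissue names and read counts are invented for illustration, and the function names are assumptions.

```python
def short_isoform_ratio(short_counts: float, long_counts: float) -> float:
    """short / (short + long), using counts over the isoform-unique exons."""
    total = short_counts + long_counts
    if total == 0:
        raise ValueError("no reads over either unique exon")
    return short_counts / total


def is_highly_variable(ratios_by_tissue: dict, threshold: float = 0.5) -> bool:
    """True if the highest and lowest tissue ratios differ by more than the threshold."""
    values = list(ratios_by_tissue.values())
    return max(values) - min(values) > threshold


# Invented per-tissue average counts: (short-isoform exon, long-isoform exon).
counts = {"brain": (80, 20), "liver": (20, 80), "heart": (45, 55)}
ratios = {tissue: short_isoform_ratio(s, l) for tissue, (s, l) in counts.items()}
variable = is_highly_variable(ratios)
```

Here the ratio ranges from 0.2 to 0.8 across tissues, so the toy gene would be classed as highly variable under the > 0.5 criterion.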
ROH around the UGP2 variant was identified in all five probands examined. The minimum ROH in common between all samples was a 5-Mb region at chr2: 60679942–65667235. We note that targeted sequencing leads to uneven SNP density, so the shared ROH may, in fact, be larger or smaller. Next, we used recombination maps from deCODE to estimate the size of the region in centiMorgans (cM). We then used the region size in cM to estimate the time to event in generations using methods previously described.
RNA was obtained using TRI reagent, and cDNA prepared using iSCRIPT cDNA Synthesis Kit according to the manufacturer’s instructions. qPCR was performed using iTaq universal SYBR Green Supermix in a CFX96RTS thermal cycler (Bio-Rad). Supplementary Table 8, online resource, summarizes all primers used in this study. Relative gene expression was determined following the ΔΔCt method. To calculate the ratio of the short isoform, we performed absolute quantification as previously described. Briefly, we performed qPCR on known copy numbers, ranging from 10^3 to 10^8 copies, of a plasmid containing the short UGP2 isoform (5′ UTR included) using primers specifically detecting either the total or the short isoform. After plotting the log copy number versus the Ct, we obtained a standard curve that we used to extrapolate the copy number of the unknown samples. To test for significance, we used Student’s t test and considered p < 0.05 as significant.
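The two quantification approaches mentioned above can be sketched in Python as follows. This is a generic illustration, not the analysis code used in the study: the standard curve is fitted by ordinary least squares on log10 copy number, the ΔΔCt calculation assumes 100% amplification efficiency, and all Ct values are invented.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the ΔΔCt method (assumes a doubling per cycle)."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Invented dilution series from 10^3 to 10^8 plasmid copies.
standards = [10 ** k for k in range(3, 9)]
standard_cts = [40 - 3.3 * k for k in range(3, 9)]
slope, intercept = fit_standard_curve(standards, standard_cts)

# Fraction of short isoform = short-specific copies / total copies.
short_copies = copies_from_ct(25.0, slope, intercept)
total_copies = copies_from_ct(23.5, slope, intercept)
short_fraction = short_copies / total_copies
```

In practice, a separate standard curve would be fitted for each primer pair (total and short-specific), since amplification efficiencies can differ between them; a single curve is used here only to keep the example short.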
Proteins were extracted with NE buffer (20 mM HEPES, pH 7.6, 1.5 mM MgCl2, 350 mM KCl, 0.2 mM EDTA and 20% glycerol) supplemented with 0.5% NP40, 0.5 mM DTT, cOmplete Protease Inhibitor Cocktail (Roche) and 150 U/ml benzonase. Protein concentration was determined by BCA (Pierce) and 20–50 µg of proteins was loaded onto a 4–15% Criterion TGX gel (Bio-Rad). Proteins were then transferred to a nitrocellulose membrane using the Trans-Blot Turbo Transfer System (Bio-Rad). The membrane was blocked in 5% milk in PBST and subsequently incubated overnight at 4 °C with primary antibody diluted in milk. After PBST washes, the membrane was incubated 1 h at RT with the secondary antibody and imaged with an Odyssey CLX scanning system (Li-Cor). Band intensities were quantified using Image Studio (Li-cor). Antibodies used were Ms-α-UGP2 (sc-514174) 1:250; Ms-α-Vinculin (sc-59803) 1:10,000; Gt-α-actin (sc-1616) 1:500; Ms-α-LAMP2 (H4B4) 1:200; IRDye 800CW Goat anti-Mouse (926-32210) 1:5000; IRDye 680 Donkey anti-Goat (926-32224) 1:5000.
Zebrafish disease modeling
Animal experiments were approved by the Animal Experimentation Committee at Erasmus MC, Rotterdam. Zebrafish embryos and larvae were kept at 28 °C on a 14–10‐h light–dark cycle in 1 M HEPES buffered (pH 7.2) E3 medium (34.8 g NaCl, 1.6 g KCl, 5.8 g CaCl2·2H2O, 9.78 g MgCl2·6H2O). For live imaging, the medium was changed at 1 dpf to E3 + 0.003% 1‐phenyl 2‐thiourea (PTU) to prevent pigmentation. Ugp2a and ugp2b were targeted by Cas9/gRNA RNP complex as described previously. Briefly, fertilized oocytes from a tgBAC(slc1a2b:Citrine)re01tg reporter line maintained on a TL background strain were obtained, and injected with Cas9 protein and crRNA and tracrRNA synthesized by IDT (Alt-R CRISPR–Cas9 System), targeting the open reading frame of zebrafish ugp2a and ugp2b. DNA was extracted from fin clips and used for genotyping using primers flanking the gRNA location (Supplementary Table 8, online resource) followed by sequencing. Mutants with a high level of out-of-frame indels in both genes were identified using TIDE and intercrossed to obtain germ line transmission. Upon re-genotyping, mutant zebrafish with the mutations indicated in Fig. 6 were selected and further intercrossed. In this study, we describe two new mutant fish lines containing deletions in ugp2a (ugp2aΔ/Δ) and ugp2b (ugp2bΔ/Δ): ugp2are08/re08 containing a 37 bp deletion in exon 2 and ugp2bre09/re09 containing a 5 bp deletion in exon 2. Intravital imaging, and analysis of eye movement, was performed as previously described. Briefly, zebrafish larvae anesthetized in tricaine were mounted in low-melting point agarose-containing tricaine and imaged using a Leica SP5 intravital imaging setup with a 20 × /1.0 NA water-dipping lens. To assess the locomotor activity of zebrafish larvae from 3 to 5 dpf, locomotor activity assays were performed using an infrared camera system (DanioVision™ Observation chamber, Noldus) and EthoVision® XT software (Noldus) as described.
Briefly, control (n = 24) and ugp2aΔ/Δ; ugp2bΔ/Δ (n = 24) zebrafish larvae, in 48-well plates, were subjected to gradually increasing (to bright light) and decreasing light conditions (darkness) as in Kuil et al. Distance traveled (mm) per second was measured. For 4-AP (Sigma) stimulation, animals were treated with 4-AP dissolved in DMSO 30 min before the onset of the experiments. For these experiments, locomotor activity was measured over 35 min, with the first 5 min going from dark to light, followed by 30 min under constant light exposure.
Periodic acid–Schiff (PAS) staining
ESCs or differentiated NSCs (wild type, KO, KI or rescue) were incubated under hypoxia conditions (3% O2) for 48 h. Cells were fixed with 5.2% formaldehyde in ethanol, incubated 10 min with 1% periodic acid, 15 min at 37 °C with Schiff’s reagent (Merck) and 5 min with hematoxylin solution (Klinipath) prior to air drying and mounting. Every step of the protocol is followed by a 10-min wash with tap water. Imaging occurred on an Olympus BX40 microscope. Images were acquired at a 100 × magnification, and ImageJ software was used for quantification. For ESCs, we used a minimum of 20 images per genotype for the quantification, containing on average 20 cells each, calculating the percentage of PAS-positive area. For NSCs, we imaged between 80 and 100 cells per genotype, counting the number of glycogen granules in the cytoplasm. We report the average of two independent experiments at 48 h low oxygen.
UGP2 enzymatic activity
The measurement of UGP2 enzyme activity was performed according to a modified GALT enzyme activity assay as described previously. Frozen cell pellets were defrosted and homogenized on ice. 10 µl of each cell homogenate (around 0.5 mg protein/ml as established by BSA protein concentration determination) was pre-incubated with 10 µl of dithiothreitol (DTT) for 5 min at 25 °C. 80 µl of a mixture of glucose-1-phosphate (final concentration 1 mM), UTP (0.2 mM), magnesium chloride (1 mM), glycine (125 mM) and Tris–HCl (pH 8) (40 mM) was added and incubated for another 15 min at 25 °C. The reaction was stopped by adding 150 µl of 3.3% perchloric acid. After 10 min on ice, the mixture was centrifuged (10,000 rpm for 5 min at 4 °C), and the supernatant was isolated and neutralized with 8 µl of ice-cold potassium carbonate for 10 min on ice. After centrifugation, the supernatant was isolated and diluted 1:1 with eluent B (see below), after which the mixture was added to a MilliPore Amicon centrifugal filter unit. After centrifugation, the supernatant was stored at − 20 °C until use. The separation was performed by injection of 10 µl of the defrosted supernatant onto an HPLC system with UV/VIS detector (wavelength 262 nm) equipped with a reversed-phase Supelcosil LC-18-S 150 mm × 4.6 mm, particle size 5 µm, analytical column and Supelguard LC18S guard column (Sigma-Aldrich). During the experiments, the temperature of the column was maintained at 25 °C. The mobile phase consisted of eluent A (100% methanol) and eluent B (50 mM ammonium phosphate buffer pH 7.0 and 4 mM tetrabutylammonium bisulphate). A gradient of 99% eluent B (0–20 min), 75% eluent B (20–30 min) and 99% eluent B (30–45 min) at a flow rate of 0.5 ml/min was used. The reaction product UDP-glucose was quantified using a calibration curve with known concentrations of UDP-glucose. UGP2 activity was expressed as the amount of UDP-glucose formed per mg protein per min.
Experiments were performed in duplicate and for every cell line two independently grown cell pellets were used.
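The quantification described above (a calibration curve, then activity per mg protein per min) can be sketched as follows. This is an illustrative calculation only: the function names, the calibration standards, the sample peak area and the amount of product are invented, and the conversion from measured concentration to total product (which depends on the dilution steps) is deliberately left to the user. The protein amount follows from the stated 10 µl of homogenate at roughly 0.5 mg/ml, and the 15 min incubation is taken from the protocol.

```python
def fit_calibration(concentrations, peak_areas):
    """Least-squares line: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(peak_area, slope, intercept):
    """Invert the calibration line to read a concentration from a peak area."""
    return (peak_area - intercept) / slope


def specific_activity(product_nmol, protein_mg, minutes):
    """UDP-glucose formed per mg protein per min."""
    return product_nmol / (protein_mg * minutes)


# Hypothetical UDP-glucose standards (µM) and their peak areas (arbitrary units).
slope, intercept = fit_calibration([0, 5, 10, 20], [0, 50, 100, 200])
sample_um = concentration_from_area(150, slope, intercept)

# 10 µl of homogenate at about 0.5 mg protein/ml, as in the protocol.
protein_mg = 0.010 * 0.5
# Hypothetical total product in the reaction after dilution correction.
product_nmol = 1.5
activity = specific_activity(product_nmol, protein_mg, minutes=15)
print(sample_um, activity)
```

Averaging the duplicate measurements and the two independently grown pellets per cell line would then be a separate step on top of these per-sample values.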
For immunofluorescence staining, cells were seeded on coverslips coated with 100 µg/ml poly-d-lysine (Sigma) overnight. For ESCs, coverslips were further coated with Matrigel (Corning) for 1 h at 37 °C. At 70% confluency, cells were fixed with 4% PFA for 15 min at RT. Cells were then permeabilized with 0.5% Triton in PBS, incubated for 1 h in blocking solution (3% BSA in PBS) and then overnight at 4 °C with the primary antibody diluted in blocking solution. The next day, coverslips were incubated for 1 h at room temperature in the dark with a Cy3-conjugated secondary antibody and mounted using ProLong Gold antifade reagent with DAPI (Invitrogen) to counterstain the nuclei. Images were acquired with a ZEISS Axio Imager M2 using a 63× objective.
RNA-seq data from the in vitro studies are publicly available through the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) under accession number GSE137129. Due to privacy regulations and consent, raw RNA-seq data from patient blood and genomic sequencing data cannot be made available. To retrieve tissue-wide expression levels of UGP2, the GTEx Portal was accessed on 16/07/2019 (https://gtexportal.org/home/). RNA-seq data from various tissues were downloaded from previous publications [46, 83, 94, 118]. All publicly available data that were re-analyzed here are summarized in Supplementary Table 2, online resource.
A recurrent ATG mutation in UGP2 in 22 individuals presenting with a severe DEE
The identification of at least 22 individuals with an almost identical clinical phenotype and an identical homozygous variant in the same gene led us to pursue UGP2 as a candidate gene for a new genetic form of DEE. UGP2 is highly expressed in various brain regions (Fig. 1f), and is also widely expressed amongst other tissues, including liver and muscle, according to data from the GTEx portal (Supplementary Fig. 1d, online resource). The chr2:64083454A>G variant is predicted to cause a missense change (c.34A>G, p.Met12Val) in UGP2 isoform 1 (NM_006759) and a translation start loss (c.1A>G, p.?) in UGP2 isoform 2 (NM_001001521), referred to as the long and short isoforms, respectively. The variant has not been reported in the Epi25 web browser, ClinVar, LOVD, Exome Variant Server, DECIPHER, GENESIS, GME variome or Iranome databases, is absent from our in-house databases and is found only 15 times in a heterozygous, but not homozygous, state in the 280,902 alleles present in gnomAD (MAF: 0.00005340). In the GeneDx unaffected adult cohort, the variant was found heterozygous 10 times out of 173,502 alleles (MAF: 0.00005764), in the ~10,000 exomes of the Queen Square Genomic Center database two heterozygous individuals were identified, and out of 45,921 individuals in the Centogene cohort, 10 individuals are heterozygous for this variant. The identified variant has a CADD score (v1.4) of 19.22, and Mutation Taster predicted this variant as disease causing. The nucleotide is strongly conserved over multiple species (Fig. 1g). Analysis of WES data from 5 patients provided evidence of a shared ROH between patients from different families (including the Dutch family), indicating that this same variant might represent an ancient mutation that originated some 26 generations ago (Supplementary Fig. 1c, online resource). 
Interestingly, since most families originally came from regions of India, Pakistan and Iran, overlapping with an area called Balochistan, this could indicate that the mutation originated there around 600 years ago. As Dutch traders settled in that area in the seventeenth century, it is tempting to speculate that this could explain the co-occurrence of the variant in these distant places.
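The minor allele frequencies quoted for the population cohorts follow directly from the allele counts, as the short check below illustrates. The function names are hypothetical. The Centogene frequency is not stated in the text; it is derived here under the assumptions that each heterozygous individual carries one variant allele and that each of the 45,921 individuals contributes two alleles.

```python
def allele_frequency(variant_alleles, total_alleles):
    """Fraction of all sampled alleles that carry the variant."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return variant_alleles / total_alleles


def alleles_from_individuals(individuals):
    """Autosomal locus: two alleles per diploid individual."""
    return 2 * individuals


gnomad = allele_frequency(15, 280_902)
genedx = allele_frequency(10, 173_502)
# Derived value, not reported in the text (see assumptions above).
centogene = allele_frequency(10, alleles_from_individuals(45_921))

for name, value in [("gnomAD", gnomad), ("GeneDx", genedx), ("Centogene", centogene)]:
    print(f"{name}: {value:.8f}")
```

All three cohorts give a carrier allele frequency on the order of 5 to 10 per 100,000 alleles, consistent with a rare variant that is only seen in the heterozygous state in unaffected populations.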
Short UGP2 isoform is predominantly expressed in brain and absent in patients with ATG mutations
Lack of the short UGP2 isoform leads to transcriptome changes upon differentiation into neural stem cells
To model the disease in vitro, we first engineered the homozygous A>G mutation in H9 ESCs to study the mutation in a patient independent genetic background and compare it to isogenic parental cells. We obtained two independent clones harboring the homozygous A>G change (referred to as knock-in, KI, mutant) and two cell lines harboring an insertion of an additional A after nucleotide position 42 of UGP2 transcript 1 (chr2:64083462_64083463insA) (Supplementary Fig. 3a, b, online resource) (referred to as knockout, KO). This causes a premature stop codon at amino acid position 47 (D15Rfs*33), leading to nonsense-mediated mRNA decay and complete absence of UGP2 protein (Supplementary Fig. 3c, online resource). All derived ESCs had a normal morphology and remained pluripotent as assessed by marker expression (Supplementary Fig. 3d, e, online resource), indicating that the absence of UGP2 in ESCs is tolerated, in agreement with genome-wide LoF CRISPR screens which did not identify UGP2 as an essential gene in ESCs [66, 119]. We differentiated wild type, KI and KO ESCs into NSCs, using dual SMAD inhibition (Supplementary Fig. 4a–c, online resource). Wild-type cells could readily differentiate into NSCs, having a normal morphology and marker expression, whereas differentiation of KI and KO cells was more variable and not all differentiations resulted in viable, proliferating NSCs. KO cells could not be propagated for more than five passages under NSC culture conditions (data not shown), which could indicate that the total absence of UGP2 protein is not tolerated in NSCs. When assessed by Western blotting, total UGP2 protein levels were reduced in KI cells and depleted in KO cells compared to wild type (Supplementary Fig. 4d, e, online resource).
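The coordinate arithmetic behind these variant names can be checked with a small sketch. The helper names are hypothetical; the only conventions assumed are 1-based coding positions and the HGVS rule that, in a frameshift such as p.D15Rfs*33, the first changed residue is counted as position 1 of the new reading frame.

```python
def codon_number(coding_position):
    """1-based codon index that contains a 1-based coding nucleotide position."""
    if coding_position < 1:
        raise ValueError("coding positions are 1-based")
    return (coding_position - 1) // 3 + 1


def frameshift_stop_position(first_changed_residue, frame_length):
    """Residue position of the new stop codon for a p.<aa><N>...fs*<L> variant."""
    return first_changed_residue + frame_length - 1


# c.34A>G in the long isoform falls in codon 12 (p.Met12Val).
print(codon_number(34))
# Position 42 closes codon 14, so an insertion after it first alters codon 15.
print(codon_number(42), codon_number(43))
# p.D15Rfs*33: the premature stop lies at residue 47.
print(frameshift_stop_position(15, 33))
```

The same arithmetic shows why the knock-in and knockout alleles behave differently: the A>G change keeps the long-isoform reading frame intact, whereas the single-base insertion shifts it from codon 15 onwards.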
Absence of short UGP2 isoform leads to metabolic defects in neural stem cells
ugp2a and ugp2b double mutant zebrafish recapitulate metabolic changes during brain development, have an abnormal behavioral phenotype, visual disturbance, and increased seizure susceptibility
UGP2 is an essential gene in humans and ATG mutations of tissue-specific isoforms of essential genes potentially cause more rare genetic diseases
Here we describe a recurrent variant in 22 individuals from 15 families, affecting the start codon of the shorter isoform of the essential gene UGP2, as a novel cause of a severe DEE. Using in vitro and in vivo disease modeling, we provide evidence that the reduction of UGP2 expression in brain cells leads to global transcriptome changes, a reduced ability to produce glycogen, alterations in glycosylation and increased sensitivity to ER stress, which together can explain the phenotype observed in the patients. Most likely, our findings in vitro underestimate the downstream effects in patient cells, as in fetal brain the expression of the longer isoform is almost completely silenced and virtually all UGP2 comes from the shorter isoform, which cannot be translated in patient cells. During our in vitro NSC differentiation, this isoform switch is less complete, leaving cells with the patient mutation with some residual UGP2. Strikingly, the clinical phenotype seems to be very similar in all cases, including intractable seizures, absence of developmental milestones, progressive microcephaly and a disturbance of vision, with retinal pigment changes observed in all patients who had undergone ophthalmological examination. Also, all patients seem to share similar, although mild, dysmorphisms, possibly making this condition a recognizable syndrome.
The involvement of UGP2 in genetic disease is surprising. Given its central role in nucleotide-sugar metabolism, it is expected that loss of this essential protein would be incompatible with life and, therefore, that loss-of-function variants should not be found in association with postnatal disease. Our data argue that a total absence of UGP2 in all cells is indeed lethal, but that tissue-specific loss, as caused here by the start codon alteration of an isoform important for brain, can be compatible with postnatal development while still resulting in a severe phenotype. Given that any other LoF variant across this gene would most likely affect both protein isoforms, this could also explain why only a single mutation is found in all individuals. The fact that the Met12Val long isoform was able to rescue the full KO phenotype indicates that the missense change introduced into the long protein isoform does not affect UGP2 function. As other variants at this start codon, even heterozygous, are not found, missense variants encoding leucine, lysine, threonine, arginine or isoleucine (i.e., the amino acids that would be encoded by alternative changes affecting the ATG codon) at this position in the long isoform possibly could not produce a functional protein and are, therefore, not tolerated. Although start codon mutations have previously been implicated in disease [16, 19], there are no reports, to our knowledge, of disorders caused by start codon alterations of other essential genes that lead to alterations of tissue-specific isoforms. Using a genome-wide homology search, we have identified a large list of other essential genes with a similar locus structure and variable isoform expression amongst tissues, where similar ATG-altering variants could affect tissue-relevant expression. An intriguing question is why evolution has resulted in a large number of genes encoding almost identical protein isoforms. 
It will be interesting to further explore the mutational landscape of these genes in cohorts of currently unexplained patients.
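The alternative amino acids mentioned above for changes affecting the ATG codon can be enumerated directly from the standard genetic code. The sketch below is illustrative only; the function name is hypothetical, and the codon table is restricted to ATG and its nine single-nucleotide neighbours.

```python
BASES = "ACGT"

# Standard genetic code, limited to ATG and its single-nucleotide variants.
CODON_TO_AA = {
    "ATG": "Met",
    "CTG": "Leu", "GTG": "Val", "TTG": "Leu",  # first position changed
    "AAG": "Lys", "ACG": "Thr", "AGG": "Arg",  # second position changed
    "ATA": "Ile", "ATC": "Ile", "ATT": "Ile",  # third position changed
}


def single_nucleotide_variants(codon):
    """Return every codon that differs from the input at exactly one base."""
    variants = []
    for index, reference_base in enumerate(codon):
        for base in BASES:
            if base != reference_base:
                variants.append(codon[:index] + base + codon[index + 1:])
    return variants


atg_variants = single_nucleotide_variants("ATG")
for variant in atg_variants:
    print(variant, CODON_TO_AA[variant])

encoded = sorted({CODON_TO_AA[variant] for variant in atg_variants})
print(encoded)
```

Every one of the nine substitutions abolishes the start codon of the short isoform, and none introduces a stop codon in the long isoform. Apart from valine, the change observed in the patients, the possible replacements are leucine, lysine, threonine, arginine and isoleucine.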
We are indebted to the parents of the patients for their kind cooperation. We thank Virginie Verhoeven and Gerben Schaaf for critically reading our manuscript and Grazia Mancini for helpful discussions. We thank Gerben Schaaf for providing the LAMP2 antibody, and Eskeatnaf Mulugeta for bioinformatics advice. We would like to thank Reviewer 1 for proposing the name “Barakat-Perenthaler-syndrome of developmental epileptic encephalopathy” for this new disorder. DP was supported by an Erasmus+ Traineeship Programme. MAS was supported by the King Saud University (RSP-2019/38). AGES was supported by the Yale Center for Mendelian Genomics (NIH Grant M#UM1HG006504-05). HH is supported by the Rosetree Trust, Ataxia UK, MSA Trust, Brain Research UK, Muscular Dystrophy UK, Muscular Dystrophy Association, Higher Education Commission of Pakistan, The MRC (MR/S01165X/1, MR/S005021/1, G0601943), Wellcome Trust (WT093205MA, WT104033AIA, Synaptopathies Strategic Award, 165908) and National Institute for Health Research University College London Hospitals Biomedical Research Centre. Families 5–8 were collected as part of the SYNaPS Study Group collaboration funded by The Wellcome Trust and strategic award (Synaptopathies) funding. Research for these families was conducted as part of the Queen Square Genomics group at University College London, supported by the National Institute for Health Research University College London Hospitals Biomedical Research Centre. NK is supported by intramural funds provided by King Faisal Specialist Hospital and Research Center, the National Plan for Science, Technology and Innovation program under King Abdulaziz City for Science and Technology and the King Salman Center for Disability Research. TVH is supported by an Erasmus University Rotterdam (EUR) fellowship. 
TSB’s lab is supported by the Netherlands Organisation for Scientific Research (ZonMW Veni, Grant 91617021), a NARSAD Young Investigator Grant from the Brain & Behavior Research Foundation, an Erasmus MC Fellowship 2017 and Erasmus MC Human Disease Model Award 2018. TSB, IC and EA acknowledge support from COST action CA16118 that facilitated this collaboration.
EP performed molecular biology experiments, with help from AN and DP. HvdL, WB and TvH performed zebrafish work. PvdB and EHJ performed enzymatic analyses. IC performed brain immunohistochemistry and supplied tissue samples. EA supplied tissue samples. MG generated iPSCs. WvI and WGdV performed and SY analyzed RNA-seq. SY performed gene homology search. Patient recruitment and diagnosis was performed in the different families as follows: Family 1: TSB, ASB, and EM phenotyped patient 1, MvS analyzed WES; Family 2: LB and MK phenotyped patients 2 and 3, KGM, AB, KR analyzed WES; Family 3: JNK and JB phenotyped patient 4, KGM, AB, KR analyzed WES. Family 4: AaF, FaM, RM and FaA phenotyped patient 5, EJK analyzed WES; Family 5: FZ and NR phenotyped patient 6, SE, HH analyzed WES; Family 6, family 7 and family 8: MM, AE, ZK, FMD, MD, EGK phenotyped patients 7–10, JV, RM, HH analyzed WES; Family 9: JH phenotyped patient 11, KKK, ABA analyzed WES; Family 10: MA, MAA, MAS, MA, RA, LAQ, WQ, SC, KA, MHAH, SA, KA, AD, FA, DC and NK phenotyped patients 12 and 13, performed WES analysis and PGD; Family 11: MDe, MYVM, MG, AGES and RM performed WES and phenotyped patients 14–17; Family 12: GRP phenotyped patient 19; Family 13: HAC phenotyped patient 20, KKK, ABA analyzed WES; Family 14, patient 21, and family 15, patient 22: KKK, ABA analyzed WES. RT, KR, KKK, PB, ABA, RM, HH provided genetic data for population analysis. TSB identified patient 1, conceived the study, obtained funding, supervised the lab work and wrote the manuscript, with input from all main authors. All authors approved the final version of the manuscript.
Compliance with ethical standards
Conflict of interest
KGM, AB, RT and KR are employees of GeneDx, Inc. KR holds stock in OPKO Health, Inc. KKK, PB and ABA are employees of CENTOGENE AG.
- 1. (2010) Baluchistan i. Geography, history and ethnography. Encyclopædia Iranica City, pp fasc. 6, pp 598–632
- 9. Barakat TS, Halbritter F, Zhang M, Rendeiro AF, Perenthaler E, Bock C et al (2018) Functional dissection of the enhancer repertoire in human embryonic stem cells. Cell Stem Cell 23(276–288):e278
- 11. Bayer SA, Altman J (2004) Atlas of human central nervous system development: the human brain during the third trimester, vol 2. CRC Press, New York
- 14. Bayer SA, Altman J (2008) Atlas of human central nervous system development: the human brain during the early first trimester, vol 5. CRC Press, New York
- 15. Bertomeu T, Coulombe-Huntington J, Chatr-Aryamontri A, Bourdages KG, Coyaud E, Raught B et al (2018) A high-resolution genome-wide CRISPR/Cas9 viability screen reveals structural features and contextual diversity of the human cell-essential proteome. Mol Cell Biol 38:10
- 31. Epi25 Collaborative (2019) Ultra-rare genetic variation in the epilepsies: a whole-exome sequencing study of 17,606 individuals. Am J Hum Genet 1:4
- 33. Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (accessed July 2019)
- 34. Fattahi Z, Beheshtian M, Mohseni M, Poustchi H, Sellars E, Nezhadi SH et al (2019) Iranome: a catalog of genomic variations in the Iranian population. Hum Mutat 1:4
- 42. Guo H, Zhang B, Nairn AV, Nagy T, Moremen KW, Buckhaults P et al (2017) O-Linked N-acetylglucosamine (O-GlcNAc) expression levels epigenetically regulate colon cancer tumorigenesis by affecting the cancer stem cell compartment via modulating expression of transcriptional factor MYBL1. J Biol Chem 292:4123–4137
- 58. Li M, Chen T, Gao T, Miao Z, Jiang A, Shi L et al (2015) UDP-glucose pyrophosphorylase influences polysaccharide synthesis, cell wall components, and hyphal branching in Ganoderma lucidum via regulation of the balance between glucose-1-phosphate and UDP-glucose. Fungal Genet Biol 82:251–263
- 107. Turnbull J, Tiberia E, Striano P, Genton P, Carpenter S, Ackerley CA et al (2016) Lafora disease. Epilept Disord 18:38–62
- 109. Turton KB, Esnault S, Delain LP, Mosher DF (2016) Merging absolute and relative quantitative PCR data to quantify STAT3 splice variant transcripts. J Vis Exp 1:4
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.