Introduction

The genetic basis for multiple Mendelian conditions was initially identified by studying individuals harboring chromosomal translocations, which provided a signpost for where in the genome a gene was disrupted. It quickly became apparent that many of these chromosomal translocations did not disrupt coding sequence, but rather disrupted the positioning of coding sequence relative to a distal regulatory element or gene promoter (Vortkamp et al. 1991; Wallis et al. 1999; Fang et al. 2000; Crisponi et al. 2001). These initial studies helped establish that non-coding genetic variation can cause numerous Mendelian conditions, and work over the past several decades has solidified the central role of non-coding genetic variation in the pathogenesis of hundreds of Mendelian conditions.

In this review, we compiled hundreds of non-coding genetic variants from ClinVar and the literature that cause rare human diseases via the disruption of gene regulatory patterns. In doing so, we recognized that there is no unified vocabulary for describing how this class of genetic variation contributes to Mendelian conditions. Specifically, as detailed in the section “Functional categorization of gene regulatory variants”, the current functional classification system used for coding variants (i.e., loss-of-function, gain-of-function, and dominant-negative) is not well suited to non-coding variants, because it does not capture the diversity of functional consequences associated with this class of genetic variation. Furthermore, simply describing non-coding variants based on their distance to gene promoters is similarly inadequate. We present a new functional classification system for describing non-coding variants impacting regulatory elements (Table 1) and provide specific examples of variants that fall into each category of this new functional classification system (Tables 2, 3, 4).

Table 1 Regulatory variant functional classifications
Table 2 Variants causing LOE
Table 3 Variants causing mLOE
Table 4 Variants causing GOE

Of note, non-coding variants can cause Mendelian conditions via a myriad of mechanisms, and this review specifically focuses on rare variants that cause Mendelian conditions by disrupting gene regulatory elements. Other classes of non-coding genetic variants include intronic variants that disrupt transcript splicing, 5’UTR genetic variants that alter initiation codon usage, 3’UTR and/or 5’UTR genetic variants that impact transcript stability, localization, or signal response, genetic variants within non-coding RNAs (ncRNAs), and non-coding repeat expansions that form altered RNA products (Stenson et al. 2017; French and Edwards 2020). Additionally, CpG methylation alterations at imprinted loci cause a handful of Mendelian conditions without altering the underlying DNA sequence (Cerrato et al. 2020). Common non-coding variants associated with common disease risk are covered elsewhere (Zhang and Lupski 2015; Spielmann and Mundlos 2016; French and Edwards 2020).

Structure of gene regulatory elements

Although the DNA content of a gene is present in every cell of the body, each gene may be expressed only within certain cell types and/or developmental time windows (herein referred to as the ‘intrinsic’ expression pattern of a gene). The intrinsic expression pattern of a gene is governed by regulatory elements, which are short DNA segments (often less than 400 bp in size) containing short binding elements (less than 30 bp) that govern the occupancy of sequence-specific transcription factors (TFs). Regulatory elements are distinguished based on their location relative to the transcriptional start site (TSS) of a gene (i.e., promoter-proximal vs distal), as well as their functional impact on transcription (i.e., enhancers vs insulators vs silencers).

Promoters overlap the TSS of a gene and contain binding elements (e.g., TATA, CAAT, GC, and CACCC boxes) that modulate RNA polymerase II binding and transcription (Juven-Gershon et al. 2008). Distal regulatory elements have complex roles in regulating the activity of promoters. For example, enhancers upregulate the transcriptional activity of a gene, and can be located adjacent to or over a megabase away from their target gene (Panigrahi and O’Malley 2021). Expression of each gene can be dependent on multiple enhancers, some of which may be common to several cell types. This combinatorial system allows different cell types to express the same gene through overlapping, but distinct, regulatory mechanisms. For example, there are nine different neural enhancers with overlapping spatial expression domains that drive Sonic Hedgehog (SHH) expression in the brain. Between different brain regions, the set of active enhancers is overlapping, but not identical. This mechanism allows for nuanced control of SHH expression in different parts of the brain (Amano 2020).

Distal regulatory elements can also serve as insulators, which function to compartmentalize adjacent gene regulatory domains along the genome (Gaszner and Felsenfeld 2006). For example, binding of the sequence-specific TF CTCF can create a barrier that limits an enhancer from regulating genes located on the opposite side of a CTCF boundary (Kim et al. 2015). CTCF-bound insulator sequences are often located at the boundaries of topologically associating domains (TADs), which are large chromatin loops (often > 100 kb in size) that enable the creation of transcriptionally independent chromatin domains wherein the activity of regulatory elements is primarily restricted to genes within the same TAD.

In addition, regulatory elements can exhibit context-specific activity. For example, some regulatory elements can act as either enhancers or silencers depending on their cellular context (Erceg et al. 2017; Huang and Ovcharenko 2022). Furthermore, the classification of regulatory elements is actively evolving as we learn how different regulatory elements influence gene expression in different cellular contexts, or in conjunction with neighboring regulatory elements (Ngan et al. 2020). Finally, although the overwhelming majority of variants that impact gene regulation are in the non-coding genome, coding variants can also impact gene regulatory elements (Lango Allen et al. 2014) as ~ 3% of all TF binding elements are located within coding sequences (Stergachis et al. 2013).

Functional categorization of gene regulatory variants

The functional consequences of coding variants are classified into three distinct categories: loss-of-function (LOF), gain-of-function (GOF), and dominant-negative (DN). LOF variants result in the loss of the normal biological function of a protein via either complete (amorphic) or partial (hypomorphic) LOF. In contrast, GOF variants create a protein with a function distinct from that of the wild-type protein by increasing protein activity (hypermorphic) or creating a completely new function (neomorphic). DN variants create a protein that either directly or indirectly blocks the normal function of the remaining wild-type protein (antimorphic). Notably, both GOF and DN variants are largely defined based on their alterations to the protein product of a gene. In contrast, since gene regulatory variants do not alter the protein sequence of a gene, but rather modulate gene expression patterns, these coding-centric categorizations are often ill suited for gene regulatory variants and create confusion.

We propose an alternative framework designed to functionally categorize genetic variants that disrupt gene regulatory elements. Specifically, we identify three distinct classes based on their impact on gene regulation at the level of gene transcripts (Table 1): (1) non-modular loss-of-expression (LOE) variants; (2) modular loss-of-expression (mLOE) variants; and (3) gain-of-ectopic-expression (GOE) variants. LOE variants are defined as variants that diminish or completely abolish the expression of a gene universally across all cell types that intrinsically express that gene. In contrast, mLOE variants are defined as variants that diminish or completely abolish the expression of a gene within a limited subset of the cell types or developmental windows that intrinsically express it (i.e., a modular loss of expression). GOE variants are defined as variants that result in the ectopic spatial and/or temporal expression of a gene (Fig. 1). Of note, although we distinguish modular from non-modular LOE variants, we chose not to subdivide GOE variants in the same way, as it is challenging to obtain the clinical and molecular data necessary to firmly state that a GOE variant is limited to a specific developmental window or cell type. For example, with mLOE variants, one can infer that there is a modular loss of expression based on a modular phenotype when compared to coding LOF variants in the same gene, which cause loss of protein function in all tissues or developmental windows. In contrast, with GOE variants, one cannot readily infer from clinical data that the ectopic gain of expression is limited to only a specific cell type. Specifically, a gene product may gain ectopic expression across all cell types, but only have a functional consequence in a select number of cell types, which limits the utility of tissue-selective phenotypes for inferring modular ectopic expression.
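The three definitions above can be expressed as simple set comparisons between a gene's intrinsic expression pattern and the pattern observed with a variant. The following Python sketch is illustrative only: the function name `classify_variant`, the context labels, and the extra `MIXED` and `NONE` bookkeeping values are assumptions made for this example and are not part of the proposed classification. A "context" here stands for any cell type or developmental window.

```python
from enum import Enum


class RegulatoryClass(Enum):
    LOE = "non-modular loss-of-expression"
    MLOE = "modular loss-of-expression"
    GOE = "gain-of-ectopic-expression"
    MIXED = "both loss and ectopic gain"      # bookkeeping value (assumption)
    NONE = "no change in expression pattern"  # bookkeeping value (assumption)


def classify_variant(intrinsic, reduced=(), ectopic=()):
    """Classify a regulatory variant from context-level expression changes.

    intrinsic: contexts that normally express the gene
    reduced:   intrinsic contexts where expression is diminished or abolished
    ectopic:   contexts outside the intrinsic pattern that gain expression
    """
    intrinsic, reduced, ectopic = set(intrinsic), set(reduced), set(ectopic)
    if not reduced <= intrinsic:
        raise ValueError("reduced contexts must belong to the intrinsic pattern")
    if ectopic & intrinsic:
        raise ValueError("ectopic contexts must lie outside the intrinsic pattern")

    if reduced and ectopic:
        return RegulatoryClass.MIXED
    if ectopic:
        return RegulatoryClass.GOE
    if not reduced:
        return RegulatoryClass.NONE
    # Loss in every intrinsic context is non-modular; loss in a subset is modular.
    return RegulatoryClass.LOE if reduced == intrinsic else RegulatoryClass.MLOE


# Hypothetical gene expressed in heart and brain, as in the Fig. 1 schematic.
intrinsic = {"heart", "brain"}
print(classify_variant(intrinsic, reduced={"heart", "brain"}).name)  # LOE
print(classify_variant(intrinsic, reduced={"heart"}).name)           # MLOE
print(classify_variant(intrinsic, ectopic={"limb"}).name)            # GOE
```

In practice, the sets passed to such a function are rarely measured directly in patients, which is why the modularity of a variant is usually inferred from phenotype rather than from expression data.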

Fig. 1 Schematic of the proposed functional classifications of gene regulatory variants. a Schematic depicting the intrinsic expression pattern of a gene that is limited to the heart and brain, as well as the gene regulatory elements driving this intrinsic expression pattern. b–d Schematics depicting the impact of b LOE, c mLOE, and d GOE variants on the expression pattern of that gene across different tissues, as well as how various genetic variants can cause these changes

As opposed to the traditional LOF, GOF, and DN categories, this functional categorization more intuitively reflects the mechanisms by which disruptions in gene expression patterns cause Mendelian conditions. Of note, these functional classifications can be related to LOF or GOF variant types. For example, LOE variants can correspond to either amorphic or hypomorphic LOF variants. Although mLOE variants could be categorized as hypomorphic LOF variants, the mechanism by which mLOE variants cause disease is quite distinct from that of coding hypomorphic LOF variants, making ‘hypomorphic LOF’ an imprecise label for mLOE variants. In contrast, GOE variants can correspond to hypermorphic GOF, neomorphic GOF, and even LOF variants. Notably, to our knowledge, there are no reported examples of DN gene regulatory variants causing Mendelian conditions in humans. However, this type of regulatory variant has been observed in other organisms and is termed “transvection” (Lewis 1954), which is a phenomenon where a regulatory element on one chromosome interacts with and enhances or silences its corresponding regulatory element on the homologous chromosome. More recently, this mechanism has been described in human cancers, wherein strong enhancers encoded on extrachromosomal circular DNA (ecDNA) can enhance the expression of autosomal genes (Zhu et al. 2021). Examples of transvection as a cause of Mendelian disease may yet be described. The LOE, mLOE, and GOE functional categorizations represent the molecular consequences of regulatory variants more closely than the traditional LOF, DN, and GOF classification, and provide an improved framework for conceptualizing the putative role of novel gene regulatory variants in the pathogenesis of Mendelian conditions.

Non-modular loss-of-expression (LOE) variants

LOE variants diminish or abolish the expression of a gene across all cell types that intrinsically express that gene. Consequently, these variants often mirror the clinical manifestations of coding LOF variants for the same gene, as both LOE and LOF variants result in reduced/absent functional protein levels within the cell, albeit via distinct mechanisms. While some LOE variants cause complete loss of expression (analogous to amorphic LOF variants), others reduce the intrinsic expression level of a gene (analogous to hypomorphic LOF variants). The latter class of variants often results in a more attenuated clinical phenotype compared to variants that result in complete LOE. We have provided examples of over a hundred LOE variants (Table 2), and detail below some key examples of diverse LOE variants.

Many LOE variants are located within the gene promoter, where they disrupt essential TF binding elements required for the intrinsic expression of a gene. For example, genetic variants that disrupt the TATA box and/or CACCC box within the HBB gene promoter decrease the intrinsic expression of HBB by abrogating the ability of TFs to bind these elements. Notably, these variants do not completely abolish HBB transcription, and consequently, individuals harboring these variants in trans with HBB LOF variants often still produce adult hemoglobin (HbA), resulting in a milder form of beta-thalassemia (i.e., beta-thalassemia intermedia) compared to individuals with biallelic HBB LOF coding variants (Ropero et al. 2017).

Of note, different variants within the same gene promoter can cause varying magnitudes of LOE. For example, variants within the UROS promoter that disrupt the GATA1 or CP2-binding elements significantly reduce UROS transcription and cause a severe form of congenital erythropoietic porphyria (CEP), whereas other UROS promoter variants that do not disrupt these elements only cause a modest reduction in UROS transcription and mild cutaneous manifestations (Solis et al. 2001).

LOE variants can also disrupt distal regulatory elements. For example, monocytopenia and mycobacterial infection (MonoMAC) syndrome is typically caused by LOF coding variants within the gene GATA2. However, MonoMAC syndrome can also be caused by small deletions or single-nucleotide variants (SNVs) in a GATA2 intronic enhancer 9.5 kb downstream of the GATA2 promoter. These variants result in the loss of GATA2 expression via the disruption of enhancer TF binding elements that are essential for GATA2 transcription (i.e., an E-box, GATA, and ETS binding element) (Johnson et al. 2012; Hsu et al. 2013).

Variants within regulatory elements that are quite distal to a gene promoter can also cause LOE. For example, most cases of hereditary aniridia are caused by heterozygous LOF coding variants within PAX6. However, hereditary aniridia can also be caused by SNVs within a PAX6 enhancer located 150 kb downstream of the PAX6 promoter that disrupt a PAX6 autoregulatory element, causing loss of enhancer activity and subsequent loss of PAX6 transcription (Bhatia et al. 2013). Furthermore, some patients with hereditary aniridia have deletions or chromosomal translocations that disrupt this PAX6 enhancer (Fantes et al. 1995), highlighting the diversity of genetic variant classes that can cause LOE.

Variants that disrupt insulators can also cause LOE. For example, a homozygous deletion of a CTCF-binding site within the first intron of LRBA has been reported to cause autoantibody-mediated pancytopenia, a phenotype associated with biallelic coding LOF LRBA variants (Turro et al. 2020). It is presumed that loss of this CTCF insulator element alters a TAD boundary, permitting heterochromatin spreading to silence LRBA promoter activity.

In summary, LOE variants can be located within promoter or distal gene regulatory elements, can completely mimic coding LOF variants or cause an attenuated phenotype relative to complete LOF variants, and are caused by diverse classes of genetic variants.

Modular loss-of-expression (mLOE) variants

In contrast to non-modular LOE variants, mLOE variants reduce or abolish the expression of a gene only in a subset of cell types that intrinsically express that gene. mLOE variants represent a disease mechanism largely unique to gene regulatory variants, as coding LOF variants typically disrupt the function of a gene across all cell types that intrinsically express that gene, with the exception of coding LOF variants within exons that are alternatively spliced only within certain tissues or somatic coding LOF variants that only exist within certain tissues (Poduri et al. 2013; Biesecker and Spinner 2013; Jaiswal and Ebert 2019). As a result of their modular impact on gene expression, mLOE variants can produce a subset of features associated with coding LOF variants in that same gene (i.e., phenotype modularity) (Table 3). As gene expression patterns are not typically measured across multiple tissues or developmental stages in individuals with Mendelian conditions, the modular nature of these variants is often inferred based on their phenotypic spectrum relative to individuals with coding LOF variants.

To illustrate the functional impact of mLOE variants, it is helpful to compare the full phenotype associated with coding LOF variants to the modular phenotype associated with mLOE variants in a gene regulatory element for the same gene. For example, coding LOF variants in GATA1 result in both severe platelet and red blood cell abnormalities, because GATA1 expression is critical for both of these cell types (Gutiérrez et al. 2020). In contrast, a 4 kb deletion of a megakaryocyte-specific enhancer element for GATA1 is associated with platelet abnormalities, but normal red blood cell parameters (Turro et al. 2020), as this enhancer is necessary for GATA1 expression within megakaryocytes but not within red blood cells.

Similarly, whereas coding LOF variants in PTF1A cause both pancreatic and cerebellar agenesis (Sellick et al. 2004), deletions or single-nucleotide variants within a pancreas-specific enhancer located 25 kb downstream of PTF1A cause only isolated pancreatic agenesis, likely because PTF1A expression during cerebellar neurogenesis is maintained (Weedon et al. 2014).
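The comparison used in these examples, in which the phenotype of a regulatory variant is read against the phenotype of coding LOF variants in the same gene, amounts to a subset test. The sketch below is a minimal illustration under stated assumptions: the function name `phenotype_relation`, its three return labels, and the free-text feature strings are hypothetical, and a real comparison would need a controlled phenotype vocabulary and clinical judgment.

```python
def phenotype_relation(variant_features, lof_features):
    """Compare a regulatory variant's phenotype with the coding LOF phenotype.

    Returns one of three illustrative labels:
      "full"      - same features as coding LOF (consistent with LOE)
      "modular"   - a proper, non-empty subset of the LOF features
                    (consistent with mLOE)
      "divergent" - anything else, including features never seen with LOF
    """
    variant, lof = set(variant_features), set(lof_features)
    if variant == lof:
        return "full"
    if variant and variant < lof:  # '<' on sets tests for a proper subset
        return "modular"
    return "divergent"


# Feature lists taken from the PTF1A and GATA1 examples in the text.
ptf1a_lof = {"pancreatic agenesis", "cerebellar agenesis"}
ptf1a_enhancer = {"pancreatic agenesis"}
print(phenotype_relation(ptf1a_enhancer, ptf1a_lof))  # modular

gata1_lof = {"platelet abnormalities", "red blood cell abnormalities"}
gata1_enhancer_deletion = {"platelet abnormalities"}
print(phenotype_relation(gata1_enhancer_deletion, gata1_lof))  # modular
```

A "modular" result is only suggestive: it is an inference from phenotype, not a measurement of expression across tissues.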

mLOE variants can also be located in promoters. For example, LOF variants in APC cause familial adenomatous polyposis, a condition associated with adenocarcinoma and numerous polyps in the stomach and colon. However, APC has two distinct promoters termed 1A and 1B, and APC transcription within the stomach mucosa is selectively initiated via promoter 1B. Consequently, individuals with variants in APC promoter 1B are at risk for developing gastric adenocarcinoma and proximal polyposis isolated to the stomach (GAPPS) without colon polyposis as a comorbidity (Li et al. 2016).

By selectively disrupting the expression of a gene in only a particular cell type, mLOE variants have the potential to produce a disease phenotype mediated by genes associated with embryonic lethality in the context of coding LOF variants. For example, biallelic LOF variants in PIGM are embryonic lethal in mice. In contrast, biallelic variants within the PIGM promoter that disrupt an SP1-binding element cause an inherited glycosylphosphatidylinositol deficiency characterized by a propensity for venous thrombosis and seizures (Almeida et al. 2006). The modular phenotype associated with this promoter variant results from the differential importance of this SP1 element in PIGM expression across cell types (Costa et al. 2014).

In addition to cell type selectivity, mLOE variants can also cause loss of expression at particular developmental stages. For example, variants that disrupt a C/EBP or HNF4-binding element within the F9 gene promoter cause Hemophilia B Leyden, which is characterized by severe factor IX deficiency at birth that ameliorates after puberty (Veltkamp et al. 1970). The affected C/EBP- and HNF4-binding elements are essential for F9 transcription in early childhood. However, after puberty, androgen-responsive TFs bind to an androgen response element within the F9 promoter, dramatically increasing F9 transcription to levels that largely resolve the disease phenotype (Crossley et al. 1992). Consequently, Hemophilia B Leyden is caused by a modular loss of F9 expression only within the prepubescent developmental stage.

In summary, mLOE variants are located within promoter or distal gene regulatory elements, can restrict the disease phenotype associated with coding LOF variants to only a specific tissue or developmental stage, and can result in a disease phenotype for genes wherein coding LOF variants would be embryonic lethal.

Gain-of-ectopic-expression (GOE) variants

GOE variants cause ectopic spatial and/or temporal expression patterns and represent a disease mechanism that is largely unique to regulatory variants (Table 4). Notably, some GOE variants can mimic Mendelian conditions caused by duplications of the target gene. For example, autosomal-dominant adult-onset demyelinating leukodystrophy (ADLD) is caused by overexpression of LMNB1 protein usually attributed to duplication of the LMNB1 gene. However, an ADLD family was discovered to have a deletion that begins 66 kb upstream of the LMNB1 promoter. This deletion encompasses a TAD boundary and results in overexpression of LMNB1 protein via a mechanism termed ‘enhancer adoption’. Specifically, a strong enhancer that typically does not regulate LMNB1 is now brought into the same TAD as the LMNB1 promoter, resulting in LMNB1 overexpression analogous to that seen with LMNB1 duplication (Giorgio et al. 2014).

Enhancer adoption is a common mechanism through which structural variants can cause regulatory element GOE (Fig. 1d). For example, structural variants within the WNT6/IHH/EPHA4/PAX3 locus can cause distinct phenotypes depending on where a strong cluster of limb enhancers for EPHA4 is situated relative to the WNT6, IHH, or PAX3 genes. Specifically, deletion of a TAD boundary between EPHA4 and PAX3 results in PAX3 adopting this cluster of limb enhancers, resulting in ectopic PAX3 expression and brachydactyly. In contrast, inversions or duplications involving IHH and the TAD boundary between IHH and EPHA4 result in WNT6 adopting this cluster of limb enhancers, resulting in ectopic WNT6 expression and F-syndrome (Lupiáñez et al. 2015).
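The logic of enhancer adoption after a TAD boundary deletion can be illustrated with a toy one-dimensional model in which an enhancer can act only on genes that share its TAD. This is a deliberately simplified sketch: the coordinates are arbitrary placeholders rather than real genomic positions, the helper names `same_tad` and `enhancer_targets` are hypothetical, and treating boundaries as absolute barriers ignores the graded nature of real enhancer-promoter contacts.

```python
def same_tad(pos_a, pos_b, boundaries):
    """True if no TAD boundary lies strictly between the two positions."""
    lo, hi = sorted((pos_a, pos_b))
    return not any(lo < b < hi for b in boundaries)


def enhancer_targets(enhancer_pos, genes, boundaries):
    """Names of genes that share a TAD with the enhancer (toy model)."""
    return sorted(
        name for name, pos in genes.items()
        if same_tad(enhancer_pos, pos, boundaries)
    )


# Arbitrary coordinates that preserve only the gene order given in the text.
genes = {"WNT6": 10, "IHH": 20, "EPHA4": 50, "PAX3": 80}
limb_enhancers = 45                 # enhancer cluster near EPHA4 (placeholder)
wild_type_boundaries = [35, 65]     # IHH|EPHA4 and EPHA4|PAX3 boundaries

print(enhancer_targets(limb_enhancers, genes, wild_type_boundaries))
# ['EPHA4']

# Deleting the boundary between EPHA4 and PAX3 merges the two domains.
after_deletion = [b for b in wild_type_boundaries if b != 65]
print(enhancer_targets(limb_enhancers, genes, after_deletion))
# ['EPHA4', 'PAX3']
```

The model covers only the boundary deletion case. Inversions and duplications rearrange the positions of genes, enhancers, and boundaries themselves, so representing them would require updating the coordinates rather than simply removing a boundary.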

SNVs within distal regulatory elements can also cause GOE. For example, the zone of polarizing activity regulatory sequence (ZRS), located in intron 5 of the LMBR1 gene, regulates SHH. SNVs within the ZRS located ~ 1 Mb upstream of SHH cause preaxial polydactyly (Lettice et al. 2002; Gurnett et al. 2007; Furniss et al. 2008) via the introduction of novel ETV2-binding sites in the ZRS, resulting in ectopic SHH expression within the developing limb bud (Koyano-Nakagawa et al. 2022). However, it is important to recognize that for a given gene, not all SNVs within distal regulatory elements result in the same phenotype, as non-coding SNVs within the SHH brain enhancer-2 (SBE2) located 460 kb upstream of SHH cause holoprosencephaly via an LOE mechanism (Jeong et al. 2008).

In addition to distal regulatory elements, GOE variants can also affect promoters. For example, glucocorticoid-remediable aldosteronism (GRA) is caused by ‘promoter switching’ between the genes CYP11B1 and CYP11B2, resulting in a chimeric gene wherein the adrenocorticotropic hormone (ACTH)-responsive promoter of the 11-beta-hydroxylase gene (CYP11B1) is fused with the coding region of the aldosterone synthase gene (CYP11B2) (Lifton et al. 1992). This results in ectopic expression of aldosterone synthase in zona fasciculata cells of the adrenal cortex, where it is overexpressed and inducible by ACTH, producing a state of hyperaldosteronism that normalizes upon treatment with glucocorticoids.

GOE variants can also cause Mendelian conditions for which the target gene does not have a known human phenotype associated with coding LOF variants, such as when coding LOF variants would result in embryonic lethality. This is notable, because the clinical identification of non-coding variants that cause Mendelian conditions is often informed by comparison to known LOF phenotypes. For example, complete loss of OVOL2 expression has been associated with embryonic lethality in mice, likely because OVOL2 is a transcription factor critical for epithelial cell lineage determination and differentiation (Mackay et al. 2006). Meanwhile, in humans, OVOL2 promoter variants that result in GOE can cause autosomal-dominant corneal endothelial dystrophies. These promoter variants result in the creation of binding elements for several activating TFs within the OVOL2 promoter, resulting in the inappropriate ectopic expression of OVOL2 in the developing or adult corneal endothelium (Davidson et al. 2016).

Promoter GOE variants can also disrupt the ability of transcriptional repressors to appropriately silence a gene at a particular developmental stage. For example, the gamma-globin genes HBG1 and HBG2 encode a component of fetal hemoglobin (HbF) and are normally expressed only during fetal erythropoiesis, as their promoters are silenced during adult erythropoiesis by the transcriptional repressors BCL11A and ZBTB7A. However, regulatory variants within the HBG1 and HBG2 promoters that disrupt BCL11A- and ZBTB7A-binding elements result in the hereditary persistence of fetal hemoglobin (HPFH) into adulthood (Martyn et al. 2018). As HbF is capable of preventing red blood cell sickling from sickle hemoglobin (HbS) and can compensate for deficient HbA as seen in beta-thalassemia, these HPFH variants can attenuate the phenotype of sickle cell disease and beta-thalassemia (Jackson et al. 1961; Cappellini et al. 1981; Labie et al. 1985; Weatherall 2001; Thein 2008; Thein et al. 2009). The discovery of HPFH variants has fortuitously enabled the development of gene editing therapies, which introduce these variants into adult erythroid progenitor cells to reactivate HbF as treatment for sickle cell disease and beta-thalassemia (Traxler et al. 2016; Li et al. 2021).

GOE variants and LOE variants impacting the same gene can lead to similar phenotypes. For example, a “Goldilocks” level of FOXG1 expression is likely required for normal brain development, because both FOXG1 duplications and deletions are associated with Rett-like phenotypes (Florian et al. 2012). Thus, it is unsurprising that GOE variants that remove a silencer and LOE variants that remove an enhancer have both been reported to cause Rett-like phenotypes via increasing and decreasing FOXG1 expression, respectively (Kortüm et al. 2011; Allou et al. 2012).

Finally, variants within gene regulatory elements can cause mixed effects. For example, the POMP gene typically has a short 5’UTR that originates from a TSS located at position c.-81. A single-nucleotide deletion in the POMP promoter at position c.-95 does not change the overall transcript levels of POMP, but results in decreased utilization of the canonical TSS and increased utilization of an upstream TSS located at position c.-181. This results in POMP transcripts that preferentially contain a long 5’UTR with reduced translational efficiency. Consequently, POMP expression within the granular layer of the epidermis is reduced, causing keratosis linearis with ichthyosis congenita and sclerosing keratoderma (KLICK) syndrome (Dahlqvist et al. 2010). This example illustrates how non-coding variants can have mixed effects, wherein they result in GOE of one transcript, LOE of a different transcript, and LOF at the protein level. In contrast, coding GOF variants in POMP result in proteasome-associated autoinflammatory syndrome 2 (PRAAS2) which has a quite distinct clinical presentation (Poli et al. 2018), demonstrating that gene regulatory GOE and coding GOF variants involving the same gene can cause completely different clinical phenotypes.

In summary, GOE can result from structural variants and SNVs located within promoters and distal gene regulatory elements. GOE variants often arise from the ectopic activity of enhancers (e.g., enhancer adoption) or promoters, or the disruption of normal repressive gene regulatory machinery. Furthermore, variants can cause complex gene regulatory outcomes wherein they cause GOE for one transcript, but LOE for a different one. Importantly, GOE variants often result in clinical phenotypes that markedly diverge from those of coding variants, complicating efforts to systematically identify this class of genetic variation using our current catalog of phenotypes associated with coding variants.

Concluding thoughts

In this review, we summarize the literature on gene regulatory variants that are known to cause Mendelian conditions and present a framework for categorizing these variants based on their proximate impact on gene expression patterns. We highlight that certain classes of gene regulatory variants can mimic coding LOF variants and gene duplication variants. However, gene regulatory variants can also create novel phenotypes. Specifically, the phenotypes associated with GOE and mLOE variants may markedly differ from those associated with LOE or LOF variants impacting the same gene. Consequently, extrapolating our knowledge of coding variants to the other 99% of the genome is insufficient for resolving how variants within gene regulatory elements cause Mendelian conditions. The current practice for identifying non-coding variants that cause Mendelian conditions often relies upon phenotypic similarity to known coding LOF phenotypes, which delays or prevents the identification of non-coding genetic variants when the resulting phenotype differs substantially from that of coding LOF variants in the same target gene. A functional classification system tailored to the impact of non-coding variants can facilitate the organization of knowledge, so that novel non-coding variants are more readily identified. Additionally, this functional classification system has the potential to improve how we integrate results from regulatory element mutational scanning experiments with observed genetic variants in databases like ClinVar. Specifically, it can serve as a standardized framework for articulating the functional impact of non-coding variants in relation to different disease phenotypes.

Although it has been well established for several decades that gene regulatory variants cause numerous Mendelian conditions in a dominant, recessive, or X-linked inheritance pattern, our current catalog of disease-causing variants is overwhelmingly populated with coding variants. Specifically, whereas ClinVar contains over 150,000 pathogenic or likely pathogenic coding variants (Landrum et al. 2018), our non-systematic review of the literature identified only several hundred genetic variants known to disrupt gene regulatory elements (Fig. 2). It is possible that this imbalance accurately reflects the relative contributions of coding and gene regulatory variants to Mendelian conditions. However, it is notable that the rate of discovery of non-coding regulatory variants has only modestly increased since the transition in 2010 from family-based linkage analysis to exome sequencing as the predominant mode for gene discovery and clinical testing (Fig. 2). In contrast, the rate of discovery of pathogenic coding variants has substantially increased over the past 10 years (Landrum et al. 2018; Bamshad et al. 2019). Consequently, we hypothesize that the current imbalance in the identification of pathogenic coding variants over gene regulatory variants more likely reflects the inadequacy of exome sequencing, and of current tools for analysis and interpretation, to implicate this class of variation in disease. As the use of genome sequencing and epigenetic profiling becomes more common within clinical genomics, we anticipate that more examples of gene regulatory variation causing Mendelian conditions will emerge.

Fig. 2 Pace of discovery of gene regulatory variants causing Mendelian conditions. Histogram of the year of publication for all variants cited in this manuscript