Paired involvement of human-specific Olduvai domains and NOTCH2NL genes in human brain evolution
Sequences encoding Olduvai (DUF1220) protein domains show the largest human-specific increase in copy number of any coding region in the genome and have been linked to human brain evolution. Most human-specific copies of Olduvai (119/165) are encoded by three NBPF genes that are adjacent to three human-specific NOTCH2NL genes that have been shown to promote cortical neurogenesis. Here, employing genomic, phylogenetic, and transcriptomic evidence, we show that these NOTCH2NL/NBPF gene pairs evolved jointly, as two-gene units, very recently in human evolution, and are likely co-regulated. Remarkably, while three NOTCH2NL paralogs were added, adjacent Olduvai sequences hyper-amplified, adding 119 human-specific copies. The data suggest that human-specific Olduvai domains and adjacent NOTCH2NL genes may function in a coordinated, complementary fashion to promote neurogenesis and human brain expansion in a dosage-related manner.
The increasing availability of primate genomic data is providing a unique opportunity to identify lineage-specific sequence differences between humans and other primates. As part of these efforts, genomic factors are beginning to be uncovered that potentially contributed to the evolutionary expansion of the human brain (Sikela 2006; O’Bleness et al. 2012a). While there are a number of genomic mechanisms that may have been critical to brain expansion, several candidate sequences have been reported that involve human-specific gene duplications (Dennis et al. 2012; Florio et al. 2015). A number of these sequences map to 1q21, a region on chromosome 1 that is highly enriched for human lineage-specific gene duplications (Fortna et al. 2004; Dumas et al. 2007). This finding should not be unexpected, as unique evolutionary features of the region have been known to cytogeneticists for decades: it is the site of a human-specific pericentric inversion and is adjacent to a human-specific C-band (Yunis and Prakash 1982).
Among the duplicated genes and coding regions in 1q21 that have been associated with brain evolution, the most dramatically changed are those specifying Olduvai protein domains (formerly DUF1220) (Popesco et al. 2006; Sikela and van Roy 2017). Encoded primarily by the NBPF gene family (Vandepoele et al. 2005), Olduvai sequences have undergone the largest human lineage-specific increase in copy number of any coding region in the genome (~ 300 total copies of which ~ 165 are human-specific) (Popesco et al. 2006; O’Bleness et al. 2012a, b). They have been implicated, in a dose-dependent manner, in brain size [both evolutionarily as well as within the human population (Dumas et al. 2012; Keeney et al. 2014; Zimmer and Montgomery 2015)] and cognitive function (Davis et al. 2014a), and have been shown to promote proliferation in neural stem cells (Keeney et al. 2015). In addition, variation in Olduvai copy number has been associated with cognitive disease: autism, schizophrenia, microcephaly and macrocephaly (Dumas et al. 2012; Davis et al. 2014b, 2015, 2019; Quick et al. 2015). This ability to potentially confer both beneficial and detrimental effects has led to the proposal that the Olduvai family may constitute a cognitive genomic trade-off specific to the human lineage (Dumas and Sikela 2009; Sikela and Quick 2018).
Genomic and phylogenetic analysis
Genome coordinates for the locations of the NBPF and NOTCH genes were obtained from the hg38 human genome assembly at UCSC and also from several publications (O’Bleness et al. 2014; Sikela and Quick 2018; Fiddes et al. 2018; Suzuki et al. 2018). For phylogenetic analyses of NBPF genes, sequences were used that included the predicted start codon and 1 kb of sequence that flanked the start codon. This avoided complications that would arise from inclusion of the highly duplicated human Olduvai sequences, and allowed the analysis to focus specifically on the evolutionary relatedness of NBPF genes. Phylogenetic analysis of NBPF gene sequences was carried out with the Geneious Treebuilder program using the genetic distance model of Tamura-Nei (Tamura and Nei 1993) and the neighbor-joining tree build method (Geneious version 11.1 created by Biomatters. Available from https://www.geneious.com).
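The neighbor-joining step named above can be illustrated with a small pure-Python sketch. It is not the Geneious implementation, and it takes a precomputed distance matrix rather than computing Tamura-Nei distances from sequence. The function name `neighbor_joining`, the five taxa, and their distances are hypothetical toy values chosen only to exercise the algorithm; they are not NBPF data.

```python
def neighbor_joining(labels, pairwise):
    """Minimal neighbor-joining sketch (illustrative, not Geneious).

    labels   : list of taxon names
    pairwise : dict {(x, y): distance} with one entry per unordered pair
    Returns (joins, last_edge): joins is a list of
    (node_x, node_y, branch_len_x, branch_len_y); internal nodes are
    represented as tuples of the two nodes they join.
    """
    dist = {frozenset(k): float(v) for k, v in pairwise.items()}

    def d(x, y):
        return dist[frozenset((x, y))]

    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others
        r = {x: sum(d(x, y) for y in nodes if y != x) for x in nodes}
        # Choose the pair minimizing Q = (n - 2) * d(x, y) - r[x] - r[y]
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                x, y = nodes[i], nodes[j]
                q = (n - 2) * d(x, y) - r[x] - r[y]
                if best is None or q < best[0]:
                    best = (q, x, y)
        _, x, y = best
        # Branch lengths from x and y to the new internal node
        len_x = 0.5 * d(x, y) + (r[x] - r[y]) / (2 * (n - 2))
        len_y = d(x, y) - len_x
        u = (x, y)
        # Distances from the new node to every remaining node
        for k in nodes:
            if k not in (x, y):
                dist[frozenset((u, k))] = 0.5 * (d(x, k) + d(y, k) - d(x, y))
        nodes = [k for k in nodes if k not in (x, y)] + [u]
        joins.append((x, y, len_x, len_y))
    last_edge = (nodes[0], nodes[1], d(nodes[0], nodes[1]))
    return joins, last_edge


# Toy distances (arbitrary values, not from the study)
toy = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
joins, last_edge = neighbor_joining(["a", "b", "c", "d", "e"], toy)
print(joins[0])  # first pair joined, with its two branch lengths
```

In practice the distance matrix would come from an alignment of the start-codon-flanking sequences under the chosen substitution model; the sketch only shows how pairs are selected and collapsed.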
Transcriptional analysis of correlations between NBPF and NOTCH2NL genes
Recently generated scRNA-seq data for human developing cortex were obtained from Nowakowski et al. (2017), available through the UCSC cell browser (https://cells.ucsc.edu/?ds=cortex-dev). Because reads mapping to NOTCH2 and NOTCH2NL transcripts were not well differentiated, we re-aligned reads to an aggregate NOTCH2NL gene model and to NOTCH2 by mapping to the collapsed hg19 version of NOTCH2NL. We then performed a gene-by-gene Pearson correlation between NOTCH2NL expression and the profiles of all human genes detected across all cells (n = 22,988), and of all genes detected across radial glia only (n = 22,750).
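The gene-by-gene correlation described above can be sketched as follows. This is a minimal illustration, assuming expression is held as a mapping from gene name to a list of per-cell values; the function names `pearson` and `rank_by_correlation`, the gene labels, and the numbers are hypothetical and are not taken from the Nowakowski et al. data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(expr, target):
    """Correlate every gene with the target gene across cells, highest first."""
    reference = expr[target]
    scores = []
    for gene, values in expr.items():
        if gene == target:
            continue
        r = pearson(reference, values)
        if r is not None:
            scores.append((gene, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy expression values across four cells (made up for illustration)
toy_expr = {
    "TARGET": [1, 2, 3, 4],
    "geneA": [2, 4, 6, 8],   # rises with the target
    "geneB": [4, 3, 2, 1],   # falls as the target rises
    "geneC": [5, 5, 5, 5],   # constant, so it is skipped
}
ranked = rank_by_correlation(toy_expr, "TARGET")
print(ranked)
```

Restricting the analysis to radial glia would amount to subsetting the per-cell lists to those cells before calling the same function.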
For 3′ pileup studies, expression coverage tracks were obtained from the UCSC human cortex single-cell expression track hub for 572 radial glia cells. For each subtype of radial glia cells, the coverage track was normalized to sum to 1 and then multiplied by the number of cells in that subtype. The resulting cell and sequencing depth normalized coverage tracks were then summed to produce a final radial glial cell expression track.
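The normalization just described (scale each subtype track to sum to 1, weight by its cell count, then sum) can be sketched in a few lines. The function name `combine_subtype_tracks`, the subtype labels, and the coverage values below are hypothetical; real tracks would be read from the track hub files.

```python
def combine_subtype_tracks(tracks, cell_counts):
    """Combine per-subtype coverage tracks into one weighted track.

    tracks      : dict subtype -> list of per-position coverage values
    cell_counts : dict subtype -> number of cells in that subtype
    Each track is scaled to sum to 1, multiplied by its cell count,
    and the scaled tracks are summed position by position.
    """
    length = len(next(iter(tracks.values())))
    combined = [0.0] * length
    for subtype, coverage in tracks.items():
        total = sum(coverage)
        if total == 0:
            continue  # an empty track contributes nothing
        weight = cell_counts[subtype] / total
        for i, value in enumerate(coverage):
            combined[i] += value * weight
    return combined


# Toy example with two subtypes and three positions (made-up values)
toy_tracks = {"subtype1": [2, 2, 4], "subtype2": [0, 1, 3]}
toy_counts = {"subtype1": 4, "subtype2": 10}
combined = combine_subtype_tracks(toy_tracks, toy_counts)
print(combined, sum(combined))
```

One consequence of this scheme is that the combined track sums to the total number of cells, so each cell subtype contributes in proportion to its size rather than its sequencing depth.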
Genomic and phylogenetic evidence for paired evolution
Current human genomic data indicate that there are four human-specific NOTCH2NL genes: NOTCH2NL-A, NOTCH2NL-B and NOTCH2NL-C, located on 1q21.1, and NOTCH2NL-R located on 1p11.2 (Fiddes et al. 2018; Suzuki et al. 2018). While chimpanzee and gorilla have copies of NOTCH2NL, none are functional (Fiddes et al. 2018). Immediately adjacent to, and downstream of, each of these four NOTCH2NL paralogs is an NBPF gene in the same orientation as its NOTCH2NL partner (O’Bleness et al. 2014; Fiddes et al. 2018; Suzuki et al. 2018) (Fig. 1a). This striking genomic arrangement suggests that each of the additional copies of NOTCH2NL that appeared in the human genome did not duplicate as a single gene, but rather did so as a two-gene module, composed of one NOTCH2NL gene and one NBPF gene.
While there remains some ambiguity about the precise steps by which the NOTCH2NL paralogs and adjacent NBPF genes appeared, several solutions have been proposed (Fig. 1a, Supplemental Fig. 1). Due to the rapid evolution of this region in both humans and other great apes, as well as the observed gene conversion between paralogs, it is difficult to conclusively determine which scenario happened. However, there are some lines of evidence in the human lineage that favor scenarios where NOTCH2NL-A/-B/-C derived from NOTCH2NL-R. Because all NOTCH2NL genes in the human lineage are associated with PDE4DIP, and both chimpanzee and gorilla have one PDE4DIP-associated copy, the four human copies must derive from that copy. A previous analysis of Simons Diversity Project data (Fiddes et al. 2018) showed that gene conversion between NOTCH2NL-A and NOTCH2NL-B is prevalent in the population, and that gene conversion between NOTCH2NL-C and NOTCH2NL-A/-B happens infrequently. No gene conversion was observed between either NOTCH2 or NOTCH2NL-R and the three paralogs on 1q21.1. This is supported by the sequence identity observed in pairwise comparisons of NOTCH2 and NOTCH2NL paralogs. In GRCh38, NOTCH2 has 98.19% sequence identity to NOTCH2NL-R, while NOTCH2NL-A/-B/-C have 98.89% to 98.91% identity to each other.
These lines of evidence suggest that a likely model for NOTCH2NL evolution in hominids involves first a gene conversion event between NOTCH2NL-R and NOTCH2 restoring a functional 5′ end, and then a subsequent duplication event either shortly before or shortly after the pericentric inversion of chromosome 1 (Yunis and Prakash 1982). Following this event, NOTCH2NL-R ceased gene converting with NOTCH2, while the new NOTCH2NL in 1q21.1 quickly gave rise to the other two paralogs. Gene conversion events among these three paralogs are still segregating in the human population, which makes determining the exact order of their appearance difficult. Further supporting this model, in the human assembly GRCh38 NOTCH2NL-R is 1.8% diverged from NOTCH2, while NOTCH2NL-A/-B/-C are 1.1% diverged, i.e., about 61% as many accumulated differences (roughly 39% fewer). This is in line with expectation if the three loci are undergoing continual gene conversion and, as a result, evolving together.
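The divergence comparison can be checked with simple arithmetic. The sketch below uses only the percentages quoted in the text (98.19% identity, and 1.8% versus 1.1% divergence); the function name `relative_divergence` is a hypothetical helper introduced for illustration.

```python
def relative_divergence(reference_pct, other_pct):
    """Return (ratio, reduction) of other_pct relative to reference_pct."""
    ratio = other_pct / reference_pct
    return ratio, 1.0 - ratio


# Percent identity converts to percent divergence as 100 - identity
print(round(100 - 98.19, 2))  # 1.81

# Divergence values quoted in the text: 1.8% versus 1.1%
ratio, reduction = relative_divergence(1.8, 1.1)
print(f"{ratio:.3f} {reduction:.3f}")  # 0.611 0.389
```

The ratio of the two divergence values is about 0.611, so the less diverged loci carry roughly 61% as many differences as the more diverged one.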
If the four human-specific NOTCH2NL genes and four adjacent human-specific NBPF genes duplicated together as two-gene units, the phylogenetic history of these two sets of genes should be identical. To examine the relationships among the four NBPF genes that are paired with NOTCH2NL genes, we focused on sequence near the start codon that did not contain repeated Olduvai domains and was, therefore, less likely to be subject to gene conversion. These comparisons indicate that the three NBPF genes on 1q21.1-2 (NBPF10, NBPF14, and NBPF19) are more similar to one another than they are to NBPF26 on 1p12 (Fig. 1b). In addition, NBPF26 is the most related to ancestral NBPF genes, suggesting that the most plausible history involves NOTCH2NL-R/NBPF26 being the original PDE4DIP-associated copy that survived from the most recent common ancestor with chimpanzee. Taken together, the data indicate that the NOTCH2NL-R/NBPF26 gene pair on 1p12 duplicated as a unit, with the resulting new pair (NOTCH2NL-B/NBPF14) moving to the 1q21.1 region (Fig. 1a). Subsequently, NOTCH2NL-B/NBPF14 duplicated two more times in the 1q21.1-2 region, generating NOTCH2NL-A/NBPF10 and NOTCH2NL-C/NBPF19.
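The expectation that paired genes share one phylogenetic history can be expressed as a topology comparison. The sketch below is an illustration only: trees are written as nested tuples, the pairings are those stated in the text, and the coarse topologies (three 1q21 copies grouped together, with the 1p copy outside) encode the description above rather than the actual trees in Fig. 1b. The function names are hypothetical.

```python
# Gene pairings as stated in the text
PARTNER = {
    "NOTCH2NL-A": "NBPF10",
    "NOTCH2NL-B": "NBPF14",
    "NOTCH2NL-C": "NBPF19",
    "NOTCH2NL-R": "NBPF26",
}


def leaves(tree):
    """Set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Set of all clades (as leaf sets) defined by internal nodes."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def relabel(tree, mapping):
    """Replace each leaf name using the mapping, keeping the shape."""
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def congruent(tree_a, tree_b, mapping):
    """True if tree_a, relabeled by mapping, has the same clades as tree_b."""
    return clades(relabel(tree_a, mapping)) == clades(tree_b)


# Coarse topologies as described in the text; the order within the
# three 1q21 copies is left unresolved (a three-way split)
notch_tree = (("NOTCH2NL-A", "NOTCH2NL-B", "NOTCH2NL-C"), "NOTCH2NL-R")
nbpf_tree = ("NBPF26", ("NBPF19", "NBPF10", "NBPF14"))
print(congruent(notch_tree, nbpf_tree, PARTNER))  # True
```

Comparing clade sets rather than the tuples themselves makes the check insensitive to the order in which branches are written.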
A striking difference becomes evident, however, when one compares the human-specific NOTCH2NL/NBPF gene increases with the Olduvai increases encoded by these NBPF genes. While the NOTCH2NL paralogs (and their NBPF partners) went from one gene to four in humans, Olduvai copies encoded by these NBPF genes underwent human-specific hyper-amplification, increasing from 13 copies (encoded by NBPF26) to 132 (i.e., adding 119 copies encoded by NBPF10, NBPF14, and NBPF19) (Fig. 1a, b).
The two most plausible scenarios for how these expansions occurred are the following: either the long tandem Olduvai expansions found on NBPF10, NBPF14, and NBPF19 occurred independently on each of the three new human-specific NBPF genes after the genes appeared in 1q21.1-2, or a duplicated copy of NBPF26 appeared in 1q21.1-2 and, after adding many Olduvai copies (i.e., becoming NBPF14), duplicated twice more (i.e., producing NBPF10 and NBPF19). The latter scenario is the most parsimonious and, if true, implies that each time one of these two new expanded duplicate genes appeared it would have instantaneously added large numbers of Olduvai copies to the human genome. In either case, the human-specific Olduvai hyper-amplification would have had to take place very recently (within the past 3 million years, as has been proposed for the three NOTCH2NL genes on 1q21.1) and, thus, would correlate with the extreme enlargement of the human brain that occurred during this time (e.g., a threefold increase in size over the past 1.8 million years) (Florio et al. 2017).
Transcriptional evidence for coordinated regulation
The most probable model of how the four human-specific NBPF/NOTCH2NL gene pairs appeared in the human genome (Fig. S1C) not only fits well with available phylogenetic data, but also may help explain the existence of two well-established cytogenetic landmarks associated with human chromosome 1: the human-specific pericentric inversion and the human-specific C-band at 1q12. In this model, the NBPF26/NOTCH2NL-R gene pair appeared on 1p12 and then duplicated, forming NBPF14/NOTCH2NL-B, which was then moved to the 1q21.1-2 region as a result of the human-specific pericentric inversion event. This placed NBPF14/NOTCH2NL-B next to the large human-specific band of constitutive heterochromatin (C-band) at 1q12. The C-band is rich in highly repeated sequences and is a strong promoter of non-allelic homologous recombination (NAHR) events. Thus, proximity to the C-band could have driven both the production of the two additional human-specific 1q21.1 copies (NBPF10/NOTCH2NL-A and NBPF19/NOTCH2NL-C) and the extreme expansions of the human-specific tandemly repeated copies of Olduvai that are found in NBPF10, NBPF14, and NBPF19.
In addition to these three expanded NBPF genes, there is a fourth gene in 1q21.1-2, NBPF20, that also contains long tandemly arranged human-specific copies of Olduvai. However, NBPF20 does not have an adjacent NOTCH2NL gene and, while the other expanded NBPF genes (NBPF10, NBPF14, and NBPF19) show high sequence relatedness to one another, NBPF20 is phylogenetically more distant (Fig. 1b). Given these findings, it is likely that the intragenic Olduvai expansions found on NBPF20 arose independently, separate from the expansions found on NBPF10, NBPF14, and NBPF19. However, the fact that all four of the most highly expanded human-specific NBPF genes are found in the 1q21.1 region (and no highly expanded NBPF genes are found outside of this region) is also consistent with the possibility that proximity to the human-specific C-band at 1q12 may have promoted these extreme tandem expansions of Olduvai copy number in humans.
The tandemly expanded Olduvai sequences are found typically as Olduvai triplets in NBPF10, NBPF14, NBPF19, and NBPF20 genes and, while triplet sequences are different between these genes, they are highly similar within each gene. This is likely the result of gene conversion and concerted evolution, which tends to homogenize tandemly repeated sequences within a gene (Ganley and Kobayashi 2007). In addition, the fact that, within an NBPF gene, the triplet sequences, as a unit, are more highly conserved than the individual domains supports the idea that homogenization is occurring at the level of the triplets within a gene.
While the 1q21 region is enriched for human-specific Olduvai sequences, it has also been linked with a number of disease-associated copy number variations. Among these, 1q21-associated duplications have been implicated in autism and macrocephaly, while reciprocal deletions have been linked with schizophrenia and microcephaly (Dumas et al. 2012; Brunetti-Pierri et al. 2008; Girirajan et al. 2013; Mefford et al. 2008; Stefansson et al. 2008). It has been suggested that the NOTCH2NL-A and NOTCH2NL-B genes provide the breakpoints for this syndrome, and, thus, have been instrumental in the recombination events that led to these gains/losses (Fiddes et al. 2018). However, because of the highly duplicated nature of the Olduvai sequences encoded by the adjacent NBPF genes (NBPF10 and NBPF14), the involvement of Olduvai sequences in providing the breakpoints for this 1q21-associated duplication/deletion syndrome has not yet been directly examined. Given that highly duplicated, tandemly arranged sequences, such as the Olduvai repeats on NBPF10 and NBPF14, are especially prone to recombination, it is likely that, in some cases, the breakpoints are within these NBPF genes. Higher resolution analyses may be able to more definitively address this issue in the future.
Because human-specific Olduvai sequences are found primarily as long tandemly arranged repeats, one might suspect that they serve only to promote recombination and, thus, are functionally unimportant. While such a recombinogenic architecture may have been instrumental in the generation of the newly duplicated NOTCH2NL paralogs and, as mentioned above, may be a significant contributor to the 1q21-duplication/deletion syndrome, there are several reasons why it is unlikely that Olduvai sequences are functionally silent partners of the NOTCH2NL genes. The Olduvai sequences found in the three NBPF genes adjacent to the NOTCH2NL genes encode long open-reading frames (O’Bleness et al. 2012a, b; O’Bleness et al. 2014), express a protein product that in brain is restricted to neurons (Popesco et al. 2006), and show strong signals of positive selection at the protein sequence level (Popesco et al. 2006). Functional data also support a more active role for Olduvai sequences in corticogenesis, as introduction of Olduvai sequences has been shown to promote proliferation of neural stem cells (Keeney et al. 2015). These observations, when combined with the findings reported here, suggest that human-specific Olduvai domains and adjacent NOTCH2NL genes may function in a coordinated, complementary fashion to promote neurogenesis and brain expansion in a dosage-related manner.
We thank Nate Anderson for help with figures and manuscript preparation and Ilea Heft for providing NBPF gene sequences. We also thank Bruce Appel, Santhosh Girirajan, Kirk Hansen, David Haussler, Aaron Issaian, Mark Johnston, Kent Riemondy, and Debby Silver for helpful discussions. This research was supported by NIH award R01MH108684 to JMS.
JMS conceived and supervised the study. ITF, JMD, and JMS analyzed genomic and phylogenetic data. JMD and AAP carried out statistical analysis of scRNA-seq data. AAP, ITF, and JMS obtained and analyzed scRNA-seq data. All authors discussed the data and contributed to data interpretation. JMS wrote the paper with contributions from ITF, AAP, and JMD. JMS is founder of GATC Science, a biotech company focused on genomics. The other authors declare no competing financial interests. Requests for materials should be addressed to the authors.
- Florio M, Heide M, Pinson A, Brandl H, Albert M, Winkler S, Wimberger P, Huttner WB, Hiller M (2018) Evolution and cell-type specificity of human-specific genes preferentially expressed in progenitors of fetal neocortex. eLife 7:e32332. https://doi.org/10.7554/eLife.32332
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.