Approximately ninety-eight percent of our genome is noncoding. Contrary to initial descriptions of this vast sea of sequence as “junk DNA” (Ohno 1972), comparative genomics and various next-generation sequencing studies have revealed millions of transcription factor binding sites (TFBS) (Vierstra et al. 2020) and tens of thousands of noncoding genes, most notably the class of long noncoding RNAs (LncRNAs), currently defined as processed transcripts longer than 200 nucleotides with no protein-coding capacity (Rinn and Chang 2020; Statello et al. 2021). The widespread transcription of LncRNAs and abundance of regulatory sequences such as enhancers support the concept of a genome that is largely functional (ENCODE Project Consortium 2012). Such a dynamic genome should not be surprising given the complex nature of gene expression and gene function necessary for embryonic and postnatal development as well as disease processes.

Unlike coding genes, which are ultimately translated into proteins with conserved domains predictive of function, most LncRNAs lack conserved sequence motifs that foretell biological utility. Consequently, the study of LncRNA genes has been challenging, with few examples of well-defined functions in an in vivo setting (Rinn and Chang 2020; Statello et al. 2021). At a minimum, mechanistic insight into the biological role of an LncRNA requires an understanding of (a) where the processed LncRNA accumulates in a cell (Kopp and Mendell 2018), (b) the molecular docking sites of an LncRNA for nucleic acid or protein association (McDonel and Guttman 2019), and (c) phenotypes (e.g., developmental, metabolic, transcriptomic) manifested following LncRNA loss-of-function in vivo (Sauvageau et al. 2013). It should be noted that in some cases, the mere act of transcribing the LncRNA confers functionality on the expression of an adjacent transcription unit, with the processed LncRNA perhaps having an independent role (Ali and Grote 2020; Anderson et al. 2016; Paralkar et al. 2016). Mature LncRNAs, or regulatory elements embedded within the LncRNA locus, may activate or repress local gene transcription (Gil and Ulitsky 2020). Further, a number of LncRNA loci are host genes for other genic units such as microRNAs that provide another level of finely-tuned gene expression (Sun et al. 2021).

Most wet-lab studies of human-specific LncRNAs are confined to cells in a dish. For example, a frequently reported role of human LncRNAs in vitro relates to their competition with mRNAs for microRNA binding. These so-called competing endogenous RNAs fine-tune gene expression by “sponging” microRNAs that otherwise bind the 3’ untranslated region of an mRNA, targeting the mRNA for degradation. However, interpretation of most data ascribing a competing RNA function to LncRNAs is difficult in the absence of careful stoichiometric measures of the LncRNA, target mRNA, and associated microRNA (Denzler et al. 2014). Gene editing of a microRNA response element (MRE), the microRNA binding site, within an LncRNA represents a rigorous approach to testing a competing endogenous RNA mechanism of action. Surprisingly, very few studies have targeted an endogenous MRE via editing tools such as CRISPR, and none has yet done so in the mouse (Bassett et al. 2014; Broughton et al. 2016; Ohtsuki et al. 2021). Given the expansive number of human-specific LncRNAs reported to function as competing endogenous RNAs, largely through standard luciferase assays that interrogate an MRE out of normal sequence context, there should be increased efforts to formally demonstrate the importance of an MRE in vivo through genome editing approaches (Wu et al. 2017). This is of particular interest since mammalian MREs may carry functionally relevant single-nucleotide polymorphisms (Miller et al. 2014).
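The stoichiometric concern raised above can be illustrated with a back-of-the-envelope calculation. The following Python sketch estimates what fraction of the cellular pool of binding sites for a single microRNA would be contributed by an LncRNA. The function name and all copy numbers are hypothetical placeholders chosen for illustration; they are not measured values from any of the cited studies.

```python
def mre_fraction(lnc_copies, lnc_sites, other_site_pool):
    """Fraction of all MREs for one microRNA that reside on the LncRNA.

    lnc_copies: LncRNA transcripts per cell (hypothetical input)
    lnc_sites: MREs per LncRNA transcript
    other_site_pool: total MREs on all other transcripts per cell
    """
    if lnc_copies < 0 or lnc_sites < 0 or other_site_pool < 0:
        raise ValueError("inputs must be non-negative")
    lnc_pool = lnc_copies * lnc_sites
    total = lnc_pool + other_site_pool
    # Guard against an empty pool rather than dividing by zero.
    return lnc_pool / total if total else 0.0


# Hypothetical example: 50 LncRNA copies carrying 2 MREs each,
# competing against 100,000 sites on other transcripts.
fraction = mre_fraction(50, 2, 100_000)
print(f"LncRNA share of MRE pool: {fraction:.4%}")
```

Under these assumed inputs, a very small share would argue against the LncRNA appreciably shifting microRNA occupancy, which is why measured copy numbers, rather than reporter assays alone, are needed before invoking a sponge mechanism.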

Growth, migration, differentiation, and MRE functionality should be assayed in cell culture or organoid model systems to gain some foundational insight into the biology of human-specific LncRNAs. However, illuminating the function of human-specific LncRNAs in the complex milieu of a multisystem organism requires a combination of evolving technologies in mouse genetics and genome editing. Herein, some strengths and weaknesses of mouse transgenesis and genome editing are briefly summarized in the context of elucidating expression and regulation of LncRNAs. Two examples are then presented as to how specialized transgenesis, combined with genome editing, may afford important insight into the biological role of human-specific LncRNAs in the mouse.

Transgenic Human LncRNAs in Mice

Traditional approaches to study gene regulation and function in the mouse involve pronuclear injection of a cDNA encoding a protein or a reporter gene such as beta galactosidase under the control of a strong heterologous or cell-restricted promoter (Brinster et al. 1989). Transgenic mice carrying the human hepatitis C virus regulated 1 LncRNA exhibited deleterious expression of the mouse sterol regulatory element binding protein and reduced lipid metabolism (Li et al. 2017). In a similar fashion, overexpression of the human colon cancer associated transcript 2 LncRNA caused chromosomal instability with resultant myeloid malignancies (Shah et al. 2018). Although these examples offer some insight into the in vivo function of human LncRNAs, they are limited by the heterologous nature of the promoter driving widespread expression of the LncRNA. Moreover, even if the endogenous promoter were to have been utilized, distal regulatory regions may be absent from the transgene precluding full recapitulation of the LncRNA’s expression profile. To circumvent these constraints, artificial chromosome vectors have evolved to better capture all regulatory elements and avoid the ambiguity of a strong heterologous promoter which often directs supraphysiological levels of an LncRNA that otherwise exhibits low-level, cell compartment-specific expression.

The development of yeast artificial chromosome (YAC) and bacterial artificial chromosome (BAC) vectors represented a significant advance in mouse transgenesis (Giraldo and Montoliu 2001; Heaney and Bronson 2006). Artificial chromosome vectors can harbor large (> 100 kilobases) sequences, thus enabling the integration of human transgenes that exceed the cloning capacity of conventional vectors into the mouse genome. In addition, the transgene within an artificial chromosome will contain most, if not all, regulatory sequences, including enhancers and insulators in their correct sequence context, ensuring proper spatiotemporal expression of the transgene (Long and Miano 2007). Relatively few human LncRNAs have been incorporated into the mouse genome through artificial chromosome transgenesis. The human X inactivation specific transcript, XIST, is 32 kilobases in length and the processed 19 kilobase transcript drives X chromosome dosage compensation in females through propagated hypoacetylation. The human XIST LncRNA was packaged in a 480 kilobase YAC for transfer into the mouse genome, and results revealed expression and X chromosome inactivation in the mouse, demonstrating the conservation of XIST function between human and mouse (Migeon et al. 1999). The imprinted human H19 LncRNA is a host gene for microRNA-675 (Cai and Cullen 2007). This LncRNA was studied in the context of a 100 kilobase artificial chromosome and found to be correctly expressed in the mouse, but incorrectly imprinted, suggesting species-specific mechanisms for methylation-dependent repression of H19 (Jones et al. 2002). Using a BAC scanning reporter assay in mice, the human moesin pseudogene 1 antisense (MSNPS1AS) LncRNA was found to be expressed in cortex, striatum, and cerebellum, and expression was ascribed to enhancer regions that overlap a series of single-nucleotide polymorphisms implicated in autism spectrum disorder (ASD) (Inoue and Inoue 2016). These findings suggest that elevated levels of MSNPS1AS, shown recently to provoke neuronal phenotypes considered important in ASD (Luo et al. 2020), may occur through altered enhancer activities. Of note, the BAC transgenes under study contained the variants associated with ASD; however, expression levels of MSNPS1AS were not assessed in the context of a wild-type allele (Inoue and Inoue 2016).

While YAC/BAC integration of human LncRNAs has the advantage of native promoter and enhancer sequences for proper expression levels, pronuclear transgenes insert randomly in the genome, often as concatemers and sometimes in more than one locus, complicating the genotyping of mice homozygous for the transgene (Nakanishi et al. 2002). The emergence of PacBio and Oxford Nanopore Technologies sequencing platforms (Amarasinghe et al. 2020) allows for the determination of the site of transgene integration as well as transgene copy number, thus permitting facile breeding strategies to distinguish heterozygous from homozygous mice (Nicholls et al. 2019). These third-generation sequencing platforms will be of great utility in pinpointing the integration site of many of the 95% of reported transgenes that remain unmapped in mouse models (Nicholls et al. 2019). Another challenge to overcome with random integration of a BAC/YAC carrying an LncRNA is the possible disruption of coding or noncoding genic units or regulatory sequences such as enhancers or individual transcription factor binding sites (TFBS). The disruption of regulatory cassettes is of particular concern given widespread transcription of the genome and the presence of millions of predicted TFBS (Jensen et al. 2013; Vierstra et al. 2020). Beyond the obvious perturbation in local sequence topology, random insertion of a transgene can result in loss of host genome sequence with unpredictable consequences (Suzuki et al. 2020). Finally, phenotyping of mice could be confounded by disruption of a genic unit exhibiting haploinsufficiency. To circumvent these limitations, it should be possible to target a human LncRNA and associated coding gene/regulatory regions to the corresponding mouse region using a recombinase-mediated strategy wherein an entire mouse genomic region is swapped out for the orthologous human sequence (Devoy et al. 2011). This method of orthologous gene replacement has yet to be done in the context of a BAC-containing human LncRNA, though we shall introduce a potentially important candidate below. However, before introducing this idea, the power of genome editing of LncRNAs is summarized.
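To illustrate how a mapped integration site can simplify genotyping of randomly integrated transgenes, the Python sketch below classifies a mouse from long-read counts at a single known site. It is a minimal illustration under stated assumptions: the transgene is assumed to sit at one locus, the function name is hypothetical, and the read-support threshold is an arbitrary placeholder rather than an established cutoff.

```python
def call_zygosity(junction_reads, intact_reads, min_reads=3):
    """Classify a mouse from long-read counts at a mapped integration site.

    junction_reads: reads spanning a transgene-genome junction
    intact_reads: reads spanning the uninterrupted wild-type site
    min_reads: hypothetical support threshold (an assumption)
    """
    has_transgene = junction_reads >= min_reads
    has_wild_type = intact_reads >= min_reads
    if has_transgene and has_wild_type:
        return "heterozygous"
    if has_transgene:
        return "homozygous"
    if has_wild_type:
        return "wild-type"
    # Too few reads of either kind to make a call.
    return "undetermined"


# Hypothetical read counts for three littermates.
for counts in [(12, 10), (15, 0), (0, 20)]:
    print(counts, call_zygosity(*counts))
```

The same logic underlies junction-spanning PCR assays; either approach requires that the integration site first be mapped, and neither applies directly when a transgene has integrated at more than one locus.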

Genome Editing of LncRNAs in Mice

The clustered regularly interspaced short palindromic repeat (CRISPR) platform of gene editing (Jinek et al. 2012) has forever transformed the development of genetically modified mouse models (Harms et al. 2014; Miano et al. 2016; Singh et al. 2015). Whereas germline transmission of a genetic modification in mice, using traditional embryonic stem cell targeting, can take a year or more and may never be achieved, a CRISPR edit enables germline transmission in a matter of just a few months (Miano et al. 2019). Since the initial reporting of CRISPR editing in mice (Shen et al. 2013), additional gene editing systems have been developed, including base editing and the very recent prime editing (Anzalone et al. 2020).

The absence of well-annotated functional motifs in most LncRNAs renders CRISPR targeting of this class of genes in the mouse challenging, though not insurmountable (Miano et al. 2019). Indeed, several LncRNAs have been targeted with CRISPR in rodents through large deletions of multiple exons or the entire LncRNA locus (Han et al. 2014; Zhou et al. 2021b; Zhuang et al. 2021). The approach of removing such large sequences runs the risk of deleting regulatory elements or small intronic RNAs that may compromise accurate interpretation of phenotypes. Alternatively, smaller deletions such as in the promoter region or a single exon of an LncRNA have been reported that minimize the risk of removing other functionally important sequences (Allou et al. 2021; Li et al. 2021; Saba et al. 2021). In addition, CRISPR-mediated insertion of a polyadenylation signal that arrests transcription of an LncRNA can be used to address the role of active transcription in LncRNA function (Allou et al. 2021; Anderson et al. 2016; Ballarino et al. 2018). An alternative approach to permanently silence transcription of an LncRNA is through strategic nucleotide substitutions across a key TFBS (Choi et al. 2020). Using the prime editing platform (Anzalone et al. 2019), a recent study showed that a single-nucleotide substitution in a TFBS nearly extinguished expression of an LncRNA. Interestingly, this single base change also nullified the expression of a divergently transcribed protein-coding gene (Gao et al. 2021). The latter finding highlights the need for careful deliberation over the specific strategy implemented in gene editing of an LncRNA in mice (Miano et al. 2019). For example, there could be a TFBS embedded inside the LncRNA locus that controls the expression of another locus independent of the transcribed LncRNA (Ali and Grote 2020). As of this writing, there has been no report of the editing of a human-specific LncRNA in mice. Below, we introduce two examples of human-specific LncRNA integration in the mouse and how genome editing may unveil important regulatory and functional features of each LncRNA.

A Humanized Mouse Model for SENCR

The Smooth muscle and Endothelial cell-enriched migration/differentiation-associated long Non-Coding RNA (SENCR, pronounced sen-sər) was first reported in early 2014 from an RNA-seq study of human coronary artery smooth muscle cells (Bell et al. 2014). This 3-exon LncRNA overlaps the 5’ end of Friend Leukemia Integration 1 (FLI1), a member of the E26 transformation specific family of DNA-binding transcription factors. SENCR and FLI1 display similar patterns of tissue-specific RNA expression (Fig. 1). However, data thus far suggest that the RNA expression of one is independent of the other (Bell et al. 2014). Further, whereas FLI1 is a nuclear transcription factor, most SENCR transcripts are cytoplasmic suggesting each gene product exerts distinct functions (Bell et al. 2014). Knockdown studies combined with RNA-seq revealed functions of SENCR related to the maintenance of a non-motile, differentiated smooth muscle cell phenotype (Bell et al. 2014). A subsequent study demonstrated SENCR to promote the commitment of human embryonic stem cells to an endothelial cell lineage (Boulberdaa et al. 2016). SENCR also facilitated endothelial cell proliferation and migration, key processes in angiogenesis (Boulberdaa et al. 2016). In this context, patients with critical limb ischemia or premature coronary artery disease showed reduced levels of SENCR in ischemic tissue or in endothelial cells derived from blood vessels, respectively (Boulberdaa et al. 2016). The latter report provided some intriguing insight into the in vivo function of SENCR. However, these proposed functions and others require validation and further study of SENCR in an animal model.

Fig. 1

Tissue RNA profile of SENCR and FLI1. Data obtained from the GTEx portal website (https://www.gtexportal.org/home/)

To date, there has been no compelling evidence for a mouse ortholog of human SENCR. CRISPR-directed SENCR deletion studies in an immortalized human endothelial cell line (EA.hy926 cells) were thwarted by the presence of four copies of the host chromosome 11 (unpublished). However, the in vivo function of SENCR could be revealed by its introduction into the mouse genome, with the assumption that spatial expression and function of SENCR in the mouse would mirror SENCR expression and function in the human body. To begin to address these important points, a recent study reported the integration of a 217 kilobase BAC harboring the entire human FLI1 and SENCR genes into the mouse using the piggyBac transposase system of transgene integration (Lyu et al. 2019). Studies in cultured human endothelial cells revealed an increase in SENCR expression under laminar flow conditions, which approximated the biophysical forces endothelial cells encounter with blood flow in vivo (Lyu et al. 2019). Notably, immuno-RNA fluorescence in situ hybridization experiments disclosed expected increases in SENCR expression where laminar flow conditions exist across the aortic arch of the humanized mouse model (Lyu et al. 2019). These results demonstrated the utility of studying proposed functions of SENCR as a mediator of smooth muscle and endothelial cell homeostasis in vivo. In addition, the opportunity now exists to uncouple FLI1 and SENCR through BAC editing in the background of a Fli1 null mouse. Since genetic loss of Fli1 is embryonic lethal (Spyropoulos et al. 2000), the expectation is that human FLI1 will rescue the lethal phenotype. One important caveat to the BAC editing of the FLI1/SENCR human transgene is the need for a single-copy BAC transgene. The piggyBac system for in vivo BAC integration supports a single-copy integration event (Jung et al. 2016). However, third-generation sequencing platforms (Amarasinghe et al. 2020) will be required to establish transgene copy number and the site of integration, and thereby determine the suitability for BAC editing and the breeding of heterozygous mice to homozygosity for gene dosage effects. As discussed next, targeting a single-copy human LncRNA-mRNA gene pair to a defined locus obviates the need for such mapping studies.

An ACE2-LncRNA Gene Pair and Development of a New Mouse Model for COVID-19

Over the last two years, Severe Acute Respiratory Syndrome CoronaVirus-2 (SARS-CoV-2), the etiologic agent underlying the COronaVIrus Disease-2019 (COVID-19) pandemic, has ravaged the world, precipitating economic, sociological, and political upheaval as well as an unprecedented ‘infodemic’ that has hampered efforts to disseminate scientific facts regarding SARS-CoV-2 infection, COVID-19, and the vaccination campaign (Tentolouris et al. 2021). Moreover, health care systems around the world have been overstrained, making prioritization of health care delivery ever more challenging. Cumulatively, as of January 27, 2022, the COVID-19 pandemic has resulted in 363,582,071 positive cases of SARS-CoV-2 infection and 5,629,317 deaths, 15% of which have occurred in the United States (https://coronavirus.jhu.edu/map.html).

The receptor mediating SARS-CoV-2 entry into human cells is angiotensin-converting enzyme 2 (ACE2) (Zhou et al. 2020). There are at least three isoforms of human ACE2, spanning ~41 kilobases of DNA on the X chromosome, each of which appears to be under control of a distinct promoter (Fig. 2). The longest isoform of ACE2 comprises 19 exons, encoding an 805 amino acid protein. A slightly shorter isoform of ACE2 exists, encoding the same number of amino acids (Fig. 2). High-level expression of ACE2 protein is seen in human small intestine and kidney (Fig. 3A). Several human cell lines also express ACE2 protein, though levels of ACE2 are undetectable in vascular smooth muscle cells and endothelial cells (Fig. 3A). The latter cell type has been the focus of numerous studies given the mounting evidence for SARS-CoV-2-induced endotheliopathy, considered an important contributor to the pathogenesis of COVID-19 (Goshua et al. 2020). The undetectable levels of ACE2 protein in human endothelial cells shown here are consistent with a recent report that failed to detect ACE2 mRNA in several human endothelial cell types (McCracken et al. 2021), but inconsistent with other reports (Hamming et al. 2004; Targosz-Korecka et al. 2021; Wagner et al. 2021). These disparate findings highlight the ongoing controversy over whether endothelial cells are prone to SARS-CoV-2 infection (Goldsmith et al. 2020; McCracken et al. 2021; Targosz-Korecka et al. 2021; Varga et al. 2020; Wagner et al. 2021).

Fig. 2

ACE2 locus. Modified UCSC Genome Browser (http://genome.ucsc.edu/) screenshot showing the three isoforms of ACE2 and the upstream GS1-594A7.3 LncRNA. Vertebrate conservation (bottom green track) reveals conservation of ACE2 coding exons, but a notable lack of conservation in the two exons of GS1-594A7.3 (cream-colored rectangles overlapping exons)

Fig. 3

ACE2 protein expression and localization of GS1-594A7.3 LncRNA. A Western blotting shows ACE2 protein (molecular weight 120 kDa) in the indicated cell lines and human tissue types. B Cellular localization of GS1-594A7.3 LncRNA by two RNA-FISH methods, (i) ViewRNA which combines fluorescence in situ hybridization and sequential branched-DNA amplification in HEK-293 cells and (ii) biotin-labeled probes in Caco-2 cells pre-treated with a siRNA control (ii) or a siRNA targeting exon 2 of GS1-594A7.3 (iii). Scale bar is 10 µm. C Real-time quantitative PCR with standard curve to determine nuclear and cytoplasmic abundance of GS1-594A7.3 LncRNA with GAPDH as internal control

In addition to the two long isoforms of ACE2, there is at least one shorter isoform (Fig. 2). This shorter deltaACE2 (dACE2) isoform is elevated following interferon stimulation of several human cell lines, including nasal epithelial cells (Onabajo et al. 2020). Similar induction of the dACE2 isoform is observed upon stimulation of Caco-2 cells (immortal colorectal adenocarcinoma cell line) with interferon alpha, interferon beta, or interferon gamma (unpublished). The dACE2 isoform lacks ~350 N-terminal amino acids and does not bind SARS-CoV-2 (Onabajo et al. 2020).

Interestingly, a non-overlapping antisense LncRNA, designated GS1-594A7.3, resides just upstream of the human ACE2 locus (Fig. 2). This LncRNA, which is poorly conserved across vertebrate species (Fig. 2), is only 722 base pairs upstream of the longest ACE2 isoform, suggesting the ACE2-GS1-594A7.3 mRNA-LncRNA gene pair may share a common promoter. Evidence in support of such a bifunctional promoter exists with the partial overlap in RNA expression of ACE2 and GS1-594A7.3 across human tissues (Fig. 4). Rapid amplification of cDNA ends and long range qRT-PCR validated the annotation of GS1-594A7.3 as an independently transcribed LncRNA (unpublished). Of intriguing importance is the finding that the GS1-594A7.3 LncRNA is confined largely to the nucleus of several human cell lines (Fig. 3B–C). This observation suggests that GS1-594A7.3 possesses the potential to regulate ACE2 levels in cis. However, repeated attempts to CRISPR edit this LncRNA in cultured cells have been unsuccessful, likely because of the known difficulties in establishing stable cell lines in Caco-2 and Calu-3 cells and their state of aneuploidy.
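The spacing argument for a shared promoter can be expressed as a simple coordinate check. The Python sketch below tests whether two head-to-head (divergently transcribed) genes have transcription start sites close enough to be candidates for a common promoter. The function name, the coordinates, and the 1,000 base pair cutoff are assumptions for illustration; the coordinates are arbitrary placeholders chosen to reproduce a 722 base pair gap and are not the genomic positions of ACE2 or GS1-594A7.3.

```python
def shared_promoter_candidate(minus_tss, plus_tss, max_gap=1000):
    """Check whether a divergent gene pair could share a promoter.

    minus_tss: start site of the minus-strand gene (lower coordinate)
    plus_tss: start site of the plus-strand gene (higher coordinate)
    max_gap: assumed cutoff in base pairs, used here as a working
             threshold rather than a fixed rule
    Returns (is_candidate, gap_in_bp).
    """
    gap = plus_tss - minus_tss
    # A non-positive gap means the genes overlap or are not head-to-head.
    return (0 < gap <= max_gap, gap)


# Placeholder coordinates giving the 722 bp spacing described in the text.
print(shared_promoter_candidate(10_000, 10_722))
```

A positive result only nominates a candidate region. Co-expression data and promoter editing, as discussed elsewhere in this review, would still be needed to establish that the two transcripts are co-regulated.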

Fig. 4

Tissue RNA profile of ACE2 and GS1-594A7.3 LncRNA. Data obtained from the GTEx portal website (https://www.gtexportal.org/home/). Insets show heat maps of RNA expression for each transcript in human tissues

X-ray crystallographic analysis of the receptor binding domain of SARS-CoV-2 bound to human ACE2 revealed critical contact residues that are not conserved in the mouse ACE2 protein, rendering mice resistant to SARS-CoV-2 infection and disease (Lan et al. 2020). Accordingly, several humanized ACE2 mouse models for SARS-CoV-2 infection and COVID-19 exist (Lutz et al. 2020). Most of these mouse models were generated through pronuclear transgenesis (Table 1). As discussed earlier, limitations of mouse transgenesis include the unknown site of integration and copy number of the transgene. Moreover, the majority of humanized ACE2 mouse models utilize chimeric or cell-specific promoters that likely do not fully recapitulate the pattern of ACE2 expression in humans (Table 1), though at least one of these models has proved useful for testing vaccines and therapeutics (Chen et al. 2021; Hoffmann et al. 2021). To overcome the inherent limitations of transgenesis and more closely approximate the endogenous expression profile of human ACE2, two models targeted exon 2 of the endogenous mouse Ace2 locus with a human ACE2 cDNA (Table 1). These knockin models not only safeguard against multiple copies of the transgene, but also take advantage of the mouse Ace2 regulome, thus better modeling the true spatiotemporal pattern of ACE2 protein expression. However, there may be differences between promoter/enhancer sequences in the mouse Ace2 regulome versus the human ACE2 regulome. Moreover, the GS1-594A7.3 LncRNA appears to be a human-specific LncRNA as there is no similarly arranged LncRNA in the mouse, and analysis of sequencing data around the 5’ region of mouse Ace2 has failed to reveal transcription of an LncRNA. Since there presently is no evidence for a mouse Ace2-associated LncRNA, humanized BAC transgenic studies, as described above for the SENCR LncRNA, offer a unique opportunity to assess the expression and function of GS1-594A7.3 in the mouse.

Table 1 List of published humanized ACE2 rodent models for study of SARS-CoV-2 and COVID-19

Beginning in the summer of 2020, this lab set out to develop a new humanized ACE2 mouse model in order to capture the entire human ACE2 locus (Fig. 2) as well as the upstream GS1-594A7.3 LncRNA. However, rather than risk random integration of the BAC harboring the ACE2-GS1-594A7.3 mRNA-LncRNA gene pair (BAC clone CTD-2522M16), a different strategy was used. The basic approach involves swapping in the entire human ACE2 locus for the mouse Ace2 locus. A CRISPR-mediated method has been used for targeting large sequences, such as BACs, to defined gene loci in the rat genome (Yoshimi et al. 2016). An alternative method, and the one we adopted, uses recombinase-mediated genomic replacement (RMGR) in mouse embryonic stem cells, which are then injected into blastocysts for generation of chimeric mice (Wallace et al. 2007). In this model, all human ACE2 coding exons and noncoding introns are present in their proper sequence context, allowing for all isoforms, including the interferon-induced dACE2 (Fig. 2), to be expressed. Important validations are required, including correct spatiotemporal expression of ACE2 mRNA and ACE2 protein using molecular probes and scRNA-seq; susceptibility of mice to SARS-CoV-2 infection and attending pathology in the lung, blood, intestinal tract, and brain; phenotyping homozygous ACE2 mice for evidence of developmental defects, altered blood pressure regulation, or behavioral deficits due to loss of the mouse Ace2 locus and unannotated critical noncoding genes that are unable to be rescued by the human ACE2 locus; and, most importantly, the expression and localization of the GS1-594A7.3 LncRNA.

Beyond the targeting of a single-copy transgene to a defined locus, there are several advantages to this more fully humanized ACE2 mouse model. First, the definitive role of the upstream GS1-594A7.3 LncRNA can be studied with genome editing, either through deletion of the entire LncRNA locus, insertion of a polyadenylation signal sequence in the first exon, or more subtle editing of a TFBS as reported in other mouse models of LncRNA regulation (Choi et al. 2020; Gao et al. 2021). The hypothesis would be that loss of GS1-594A7.3 LncRNA will alter normal expression of human ACE2, rendering mice either more or less susceptible to SARS-CoV-2 infection and COVID-like symptoms. Second, a more representative human ACE2 expression profile would likely reflect the nuanced expression of this receptor, especially under conditions that model human comorbidities (e.g., type 2 diabetes, hypertension, and obesity), where the risk for severe COVID-associated pathology and death is high. Third, mechanisms underlying so-called long COVID (Nalbandian et al. 2021) may be illuminated with the correct spatiotemporal expression of human ACE2 and multisystem infection and pathology; the increasingly problematic ‘long COVID’ has been barely touched upon in mouse models. Finally, several noncoding variants associated with altered ACE2 expression (Bakhshandeh et al. 2021; Brest et al. 2020) can be addressed with conventional CRISPR, as was done for a variant in the atherosclerosis-associated risk allele, SORT1 (Wang et al. 2018). Alternatively, coding and noncoding single-nucleotide variant modeling can be accomplished in and around the human ACE2 locus, with low on-target and off-target collateral damage, using the prime editing platform (Anzalone et al. 2019; Gao et al. 2021). No currently published humanized ACE2 model affords such versatility.

Challenges, Limitations, and Alternative Approaches to Humanized BAC Mice

The study of human-specific LncRNAs has been confined mainly to cell culture models. However, most cell culture systems are either transformed or phenotypically altered with poor reproduction of their natural in vivo state. Further, cells in a dish lack correct integration with neighboring cell types encountered in an in vivo setting as well as neuronal- and circulatory-derived inputs. To circumvent these limitations, whole organ or human embryonic stem cell-derived organoid model systems have been developed to interrogate human LncRNAs. For example, the human-specific LncRNA, SMILR, was investigated in organ cultures of human saphenous vein grafts to define its role in mediating smooth muscle cell proliferation (Mahmoud et al. 2019). Meanwhile, the PAUPAR LncRNA was studied in human organoids and shown to regulate cortical differentiation (Xu et al. 2021). These ex vivo model systems represent a higher order level of investigation over simple, two-dimensional cell culture models. To determine whether what is observed in vitro or in organ culture models applies to a complex living animal, humanized BAC rodent models offer another level of exploration.

To be sure, there are several limitations and challenges with humanized BAC transgenic mouse experiments. First, BAC transgenesis, whether via pronuclear injection or RMGR, requires highly skilled methods of handling and delivery into the mouse genome with no guarantee of targeting or germline transmission. Beyond academic cores, commercial vendors can perform these genetic manipulations, typically at a cost exceeding $30,000. Second, some LncRNAs (e.g., the 363 kilobase STXBP5-AS1) exceed the cloning capacity of BAC vectors, thus requiring larger cloning capacity vectors such as YACs (see above). The latter limitation serves as a reminder that annotation of many LncRNAs may be incomplete, with rapid amplification of cDNA ends needed to fully extend the LncRNA transcript at both the 5’ and 3’ ends (Freedman and Miano 2017). Third, phenotypic analysis of a mouse carrying a human LncRNA can be challenging if insertion of the BAC disrupts a critical regulatory or coding sequence or if human-specific sequences such as enhancers or other genic units within the BAC create a phenotype unrelated to that of the LncRNA. Fourth, BAC models of human LncRNAs may confer phenotypes not easily discerned in the mouse (e.g., cognitive functions). Fifth, human LncRNAs may not fully recapitulate their spatiotemporal expression profile in the mouse, due to the absence of human-specific regulatory cassettes or cofactors. Finally, the random, multicopy integration of BAC transgenes in the mouse requires mapping analysis using, for example, third-generation sequencing platforms, in order to optimize breeding schedules and learn of any potential genetic confounders such as disruption of a protein-coding gene or regulatory sequence. Where conserved LncRNAs exist, we suggest replacement of a mouse locus with the orthologous human sequence using RMGR, as described for the ACE2-GS1-594A7.3 mRNA-LncRNA gene pair, as an alternative approach to pronuclear injection of a BAC for the study of human LncRNA expression, regulation, and function in the mouse. In addition to a single integration event at a known genomic location, thus facilitating genotyping of heterozygous intercrosses for the generation of homozygous animals, RMGR renders the mouse more amenable to genome editing strategies (Fig. 5). The development of new mouse models, coupled with genome editing, holds promise for advancing our understanding of the expression and function of human LncRNAs under normal and pathological conditions.

Fig. 5

Schematic of two methods of generating humanized BAC transgenic mice. Hypothetical LncRNA-mRNA gene pair within a human BAC (top). At least two methods exist for incorporating a BAC-containing LncRNA transgene into the mouse genome (arrows). One involves standard pronuclear injection with attending random integration, often as multiple copies (right arrow). An alternative method is recombinase-mediated genomic replacement (RMGR) (left arrow). Features of each method are indicated at bottom. See text for details