The emerging importance of mobile DNA elements in disease

It has been generally held that about half of the human genome is derived from mobile genomic elements, but according to a recent estimate over two-thirds of our genome may result from the presence or ancient activity of 'jumping genes' [1]. This massive amount of DNA includes domesticated elements with evolutionarily spectacular functions, such as the RAG (recombination activating) genes that form the basis of V(D)J recombination in our immune system [2, 3], and the ERVWE1 (endogenous retrovirus group W, member 1) gene that plays a role in placental development [4]. Contrary to their evolutionary gifts, mobile elements are most notorious for being junk or for causing life-threatening human disorders by insertional mutagenesis and homologous recombination. As we usher in the era of genome-scale studies, it is clear that these elements have the potential to cause intra-individual and inter-individual variation and probably common disease through structural variation, deregulated transcriptional activity or epigenetic effects. Furthermore, large-scale studies have expanded the pool of human disorders resulting from retrotransposon-mediated insertional mutagenesis. Recent reviews have discussed the technical aspects of these new methods [5-8]. We focus here on the known, as well as inferred, potential health impact of their novel findings.

Types of mobile elements in the human genome and the main disease mechanisms

Human mobile elements can be categorized as DNA transposons or retrotransposons. DNA transposons move by a cut-and-paste mechanism, while retrotransposons mobilize by a copy-and-paste mechanism via an RNA intermediate, a process called retrotransposition. Retrotransposons can be further subdivided into long terminal repeat (LTR) and non-LTR elements. LTR retrotransposons are human endogenous retroviruses (HERVs) that have an intracellular existence as a result of a non-functional envelope gene. Non-LTR elements are classified as long interspersed elements (LINEs; the prototype of which is the RNA polymerase II transcribed LINE-1 (L1)), and short interspersed elements (SINEs), the latter consisting essentially of RNA polymerase III transcribed Alus. SVAs (SINE-R/VNTR (variable number of tandem repeat)/Alu) are also active non-LTR retrotransposable elements that are intermediate in size relative to Alus and L1s, and are likely to be transcribed by RNA polymerase II. Thus, they are best thought of as neither LINEs nor SINEs. L1s are the only known autonomously active human retrotransposons, as only they encode proteins (open reading frame 1 (ORF1) and ORF2) with which they can be mobilized. Retrotransposition occurs through a process called target primed reverse transcription (TPRT) [9]. L1s are also responsible for the mobilization of the non-autonomous Alus [10], SVAs [10-13] and processed pseudogenes, which are cellular mRNAs that are reverse transcribed and inserted into the genome [14].
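
The classification above can be summarized as a small data structure. The following is only an illustrative sketch: the class name, field names and the helper function are hypothetical, and each entry records only what is stated in the text (a polymerase is left as None where none is given).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MobileElement:
    """One class of human mobile element (hypothetical record layout)."""
    name: str
    element_class: str                  # DNA transposon, LTR or non-LTR retrotransposon
    mechanism: str                      # cut-and-paste or copy-and-paste
    polymerase: Optional[str] = None    # transcribing RNA polymerase, if stated
    autonomously_active: bool = False   # encodes its own mobilization proteins and is active
    mobilized_by: Optional[str] = None  # element whose proteins mobilize it in trans


ELEMENTS = [
    MobileElement("DNA transposon", "DNA transposon", "cut-and-paste"),
    MobileElement("HERV", "LTR retrotransposon", "copy-and-paste"),
    MobileElement("L1", "non-LTR retrotransposon", "copy-and-paste",
                  polymerase="RNA polymerase II", autonomously_active=True),
    MobileElement("Alu", "non-LTR retrotransposon", "copy-and-paste",
                  polymerase="RNA polymerase III", mobilized_by="L1"),
    MobileElement("SVA", "non-LTR retrotransposon", "copy-and-paste",
                  polymerase="RNA polymerase II (likely)", mobilized_by="L1"),
]


def l1_dependents(elements):
    """Names of the elements that rely on L1 proteins for mobilization."""
    return [e.name for e in elements if e.mobilized_by == "L1"]


autonomous = [e.name for e in ELEMENTS if e.autonomously_active]
dependents = l1_dependents(ELEMENTS)
print(autonomous, dependents)
```

Processed pseudogenes are omitted from the list because they are reverse-transcribed cellular mRNAs rather than a class of mobile element, although they are likewise mobilized by L1.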

DNA transposons are considered to be immobile in the human genome. Accordingly, no human disease is known to arise as a result of their activity. Also, HERVs are thought to be retrotransposition defective in humans, but some may have retained their ability to move. For example, HERV-K113 has intact ORFs and has insertional polymorphisms in the human population, implying recent evolutionary activity [15]. Although no insertional mutagenesis by HERVs has been described, oncogenic ETV1 (ets variant 1)-HERV-K fusions generated by chromosomal translocation have been observed in prostate cancer [16, 17], and HERV expression has been suggested as a potential contributor to autoimmune diseases [18]. Furthermore, syncytin, an endogenous retroviral envelope protein playing a role in placental trophoblast cell fusion, is involved in breast cancer-endothelial cell and endometrial carcinoma cell fusions [19, 20]. Also, a LTR of a MaLR human endogenous retrovirus has been shown to aberrantly activate a proto-oncogene, thereby causing lymphoma [21].

The predominant mechanism by which L1s cause disease is insertional mutagenesis into or near genes [22, 23]. L1 insertions are often accompanied by 3' transduction, the co-mobilization of DNA sequences downstream of an L1 as a consequence of transcriptional read-through resulting from a weak L1 poly(A) signal [24-27]. Alu sequences predominantly cause disease by homologous recombination between two Alu sequences, but insertion into or near exons, and aberrant Alu splicing from introns, also frequently result in pathological conditions [28-30]. Furthermore, Alu RNA toxicity, a new disease-causing phenomenon, has been recently proposed to result in macular degeneration by DICER1 deficit [31]. Regarding the role of Alu elements in eye disorders, it is interesting that an Alu insertion polymorphism in the ACE (angiotensin I converting enzyme) gene has been associated with protection from the dry/atrophic form of age-related macular degeneration [32]. SVA elements also have the ability to interrupt genes through insertional mutagenesis that can be coupled with 3' transduction, genomic deletion or aberrant splicing [11, 33-35]. The wide spectrum of disease cases caused by retrotransposons ranges from hemophilia to muscular dystrophy and cancer, and has been thoroughly reviewed [36-38]. There have been 96 known retrotransposon insertions in disease cases, of which 25 are caused by L1s, while the other 71 are also L1-mediated. Among the latter, 60 cases are attributable to Alus, 7 to SVAs, and 4 to truncated inserts with only poly(A) sequence remaining [38]. Overall, retrotransposon insertions account for about 1 in 250 (0.4%) of disease-causing mutations [29].
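
As a quick consistency check of the counts cited above, the breakdown of disease-causing insertions can be tallied as follows. This is only a sketch; the variable names are ours, and the numbers are those given in the text.

```python
# Disease-causing retrotransposon insertions by inserted sequence, as cited in the text.
insertions = {"L1": 25, "Alu": 60, "SVA": 7, "poly(A) only": 4}

total = sum(insertions.values())
# Insertions other than L1 itself, all of which are mobilized by L1 proteins.
l1_mediated_in_trans = total - insertions["L1"]

# "About 1 in 250" disease-causing mutations, expressed as a percentage.
percent_of_mutations = 100 * 1 / 250

print(total, l1_mediated_in_trans, percent_of_mutations)
```
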
Processed pseudogenes have not yet been found to cause human disease by de novo insertional mutagenesis, but facioscapulohumeral dystrophy has been demonstrated to arise as a result of the contraction of macrosatellite repeats leading to aberrant expression of an array of DUX4 retrogenes residing within the repeats [39, 40]. In addition, mutations in functional processed pseudogenes can cause disease. For instance, mutations in UTP14C have been associated with male infertility [41], mutations in TACSTD2/M1S1 result in gelatinous drop-like corneal dystrophy [42, 43], and PTENP1 is selectively lost in human cancers [44, 45]. The main characteristics of mobile elements capable of causing human disease are summarized in Figure 1.

Figure 1

Types of retroelements implicated in human disease. ENV, envelope; GAG, group specific antigen; HERV, human endogenous retrovirus; kb, kilobase; LTR, long terminal repeat; ORF, open reading frame; POL, polymerase; SINE, short interspersed element; SVA, SINE-R/VNTR/Alu; TPRT, target primed reverse transcription; UTR, untranslated region; VNTR, variable number of tandem repeats [99]. Black triangles indicate target site duplications.

In the next two sections, we discuss large-scale genome, transcriptome and methylation profiling studies of mobile elements in human diseases. We also discuss some non-genome-scale studies that support or contradict the implications of these novel findings.

Genome-scale approaches to identify new retrotransposon insertions

High-throughput sequencing has increased our capacity to generate large datasets at an unprecedented resolution. It is now possible to characterize genome sequences of scarce samples or even single cells. A next-generation sequencing technique with a high coverage of germline polymorphic human-specific L1 (L1Hs) retrotransposition events has been developed by Ewing and Kazazian, comprising hemi-specific PCR coupled to Illumina sequencing [46]. Using this approach, it has been demonstrated that many L1Hs elements are population-specific [46, 47], and recapitulate genetic ancestry similar to Alu insertion polymorphisms [48]. Retrotransposons are not only excellent markers for exploring population history, but can also give rise to population-specific diseases. For example, a homozygous Alu insertion in an exon of the MAK (male germ-cell-associated kinase) gene has been identified in 21 patients of Jewish ancestry who were diagnosed with retinitis pigmentosa [49]. Paradoxically, in the discovery of this mutation using Agilent exome capture and subsequent Illumina and ABI sequencing, it was the attempt to remove repetitive sequences from the analysis that led to the identification of the insertion [49]. Another population-specific disease caused by retrotransposon mutagenesis is Fukuyama-type congenital muscular dystrophy. It is one of the most common autosomal recessive disorders in Japan and was the first human disease found to result from ancestral insertion of an SVA element [35, 50, 51]. A similar example of an apparently ethnic-specific retrotransposon allele-mediated disease is an L1-mediated orphan 3' transduction into the dystrophin gene leading to Duchenne muscular dystrophy in a Japanese boy [52, 53].

An unexpected finding of state-of-the-art, large-scale approaches to study retrotransposon insertions has been that highly active (or 'hot' [54]) L1s are much more abundant in humans than previously appreciated. The outcome of a fosmid-based paired-end DNA sequencing strategy, coupled with a cell culture assay for retrotransposition, was that over half of newly identified L1s are hot, expanding the number of known hot L1s from 6 to 43 [55]. These L1s are not only expected to be a major source of inter-individual genetic variation [55], but hot L1s account for most examples of disease-causing insertions [54].

Another unusual finding of genome-scale approaches in retrotransposon biology has been that retrotransposition occurs at a very high frequency in somatic cells. Specifically, the brain was announced to be a bona fide territory for retrotransposition. Among three somatic tissues tested, this organ supported the highest level of endogenous L1 copy number, as assessed by quantitative PCR (qPCR) [56]. In another study that awaits further validation, over 7,700 L1, 13,600 Alu and 1,300 SVA putative somatic insertions were found in the hippocampus of three individuals using retrotransposon capture sequencing, which is based on transposon array capture followed by Illumina paired-end sequencing [57]. Surprisingly, in this study, L1 and Alu insertions were over-represented in protein-coding genes and targeted genes, such as HDAC1 (histone deacetylase 1) and RAI1 (retinoic acid induced 1), which are known to be mutated in neurological disorders [57]. These findings suggest that if a retrotransposon inserts into a gene that functions in neurological development or psychological functioning early in development, it might affect a large enough area of the brain to lead to disease. One might further speculate that retrotransposition in a single brain cell could have some physiological consequences or impact memory formation through altered extracellular signaling to neighboring neurons. If such neuronal plasticity exists, it could affect behavioral phenotypes, and could be modulated by environmental factors [58]. Conversely, knockdown of an L1-regulating cellular factor has demonstrated an effect on L1 retrotransposition in the neurodevelopmental disorder Rett syndrome [59]. MeCP2 (methyl CpG binding protein 2) has been shown to repress L1 expression and retrotransposition [60], and increased L1 retrotransposition has been observed in induced pluripotent stem cells of patients with Rett syndrome who carry MECP2 mutations [59].

Ten brain tumors were examined for somatic L1 insertions by 454 pyrosequencing, but interestingly no retrotransposon insertions were discovered [61]. However, nine somatic L1 insertions were found in 6 out of 20 lung tumors with the same technique [61]. It was not determined though whether the normal tissues also contained some number of L1 insertions relative to the tumor tissue, and thus whether insertions in the cancer represented an elevated level of retrotransposon mobilization. Furthermore, it is not known if these insertions are transcribed or affect gene expression, and whether they were drivers or merely passengers of the tumorigenic process. The genome-wide methylation status of the lung tumors and adjacent normal tissue was also examined using an Illumina platform. All 6 patient DNA samples exhibiting tumor-specific L1 insertions were clustered together as hypomethylated, compared with 13 out of the remaining 14 samples that lacked somatic insertions. These data imply that a methylation signature distinguishes L1-permissive tumors from non-permissive tumors [61].

Another genome-scale method to genotype common retrotransposon insertion polymorphisms (RIPs) to identify genotype-phenotype associations uses array-based technology. Commonly, single nucleotide polymorphisms (SNPs) and copy-number variants have been used as markers in genome-wide association studies (GWASs) to map loci involved in human disease. RIPs are a valuable resource to investigate the role of these elements in phenotypic variation and disease. Also, generally a RIP is much more likely to be the causal variant than a SNP, because a large insertion is more likely to be disruptive of gene function than a single nucleotide alteration, and retrotransposons have many features that can interfere with gene expression (reviewed by Goodier and Kazazian [62]). On the other hand, strong selection exists against retroelement insertions into coding regions, where they are under-represented compared with SNPs [63]. Currently, one array-based approach has been conducted to detect retrotransposon insertions in human disease. Using transposon insertion profiling by microarray (TIP-chip), several novel L1 insertions on the X chromosome were discovered in male probands with presumptively X-linked intellectual disability [64]. Interestingly, one of the insertions occurred in the NHS gene, which is mutated in Nance-Horan syndrome, a condition associated with intellectual disability. Another promising insertion occurred in the DACH2 (dachshund homolog 2) gene that regulates neuronal differentiation [64]. However, confirmation studies are needed to demonstrate whether these insertions are the underlying cause of intellectual disability in these patients.

Except for the Baillie et al. study [57], which analyzed L1, Alu and SVA somatic insertions, the studies mentioned above concentrated on genome-wide detection of new L1 insertions. A notable study by Witherspoon et al. [65] developed a robust technique, termed mobile element scanning, to find new insertions of young Alu elements using PCR methods coupled with high-throughput sequencing. The group found approximately 500 de novo Alu insertions [65]. Their technique is applicable to all mobile elements, and is amenable to significant multiplexing of a number of DNA samples in one sequencing run.

Genome-wide methylation studies and transcriptome analysis of retrotransposons

It is speculated that one of the main roles of DNA methylation, in addition to epigenetic reprogramming, is to silence transposable elements [66]. Most methylation studies of human transposons have investigated malignancies and showed consistent hypomethylation (for example, [67, 68]), the extent of which, however, was variable in different tissues [69]. As the malignant phenotype is inherently associated with global as well as tumor-type-specific methylation changes [70], and transposable elements comprise the majority of the human genome, it is difficult to establish the role of transposon demethylation per se in tumorigenesis, especially without accompanying functional studies. It is possible that pathogenic cellular stress responses could result in local or global transposon deregulation - for example, via demethylation or chromatin modification. Once out of control, such an epigenetic deregulation might result in single or multiple retrotransposition events.

Retrotransposons located 5' of protein coding loci frequently function as alternative promoters. They might also express non-coding RNAs, and retrotransposons in the 3' UTR (untranslated region) of genes show strong evidence of reducing the expression of the respective gene, as assessed by cap analysis gene expression and pyrosequencing [71]. Thus, an altered retrotransposon methylation state is expected to affect either the transcription of the retrotransposon itself or that of nearby genes. Accordingly, it has been shown that hypomethylation of L1s can cause altered gene expression. Specifically, an L1 is located in the MET (hepatocyte growth factor receptor) oncogene, and hypomethylation of a promoter in this L1 induced an alternative MET transcript within the urothelium of tumor-bearing bladders. At the same time, in the bladder epithelium of cancer-free donors the methylation level of this L1 promoter was high and expression of the alternative MET transcript was low [72].

There are few studies that correlate human retrotransposon methylation with their transcription level on a genome-wide scale. According to one study, expression of L1 5' and 3' UTR sequences in prostate cancer was rather decreased, despite significant hypomethylation of the L1 promoter. Different HERV-K families showed opposite trends in expression levels, and the expression of evolutionarily young Alu families was restricted to individual prostate tumors as assessed by RT-qPCR and pyrosequencing [73]. In agreement with that study, transcriptional activation of L1s was not observed in globally hypomethylated hepatocellular carcinoma compared with matched normal tissue, as assessed by RT-qPCR [74]. Of note, the quantification of some types of expressed retroelements by using classical methods may prove ambiguous. Since Alu sequences are abundant in RNA polymerase II transcripts, quantification of the relatively rare RNA polymerase III transcribed Alu transcripts by RT-qPCR is fraught with potential error [75]. Transcribed L1 sequences embedded in genes may similarly confound the results of such quantitative measurements of L1 expression.

Using a custom GeneChip microarray for transcriptome analysis of several HERV families, it was shown that numerous HERV-W loci were overexpressed in testicular cancer [76]. Interestingly, one of these was an ERVWE1 transcript whose expression is usually restricted to the placenta. Methylation was severely or completely diminished at HERV-W sequences in the tumor DNA, suggesting that DNA methylation and HERV-W expression are interrelated in this tumor context [76]. With a genome-wide technique termed selective differential display of RNAs containing interspersed repeats and with its modified version, termed L1 chimera display, it has also been demonstrated that the levels of many HERV-K LTR transcripts differ between normal and testicular germ cell tumor tissues [77], and that the L1 antisense promoter gives rise to novel chimeric transcripts that are unique to tumor samples [78]. Furthermore, the cancer-specific chimeric L1 transcripts could be induced in non-malignant cells by using the demethylating drug 5-azacytidine [78].

It will be interesting to learn if tumor-specific retrotransposon profiles reveal enhanced retroelement mobility. For example, L1 retrotransposition is associated with genetic instability [79], a hallmark of cancer [80]. Thus, local or global overactivation of L1s could have the potential to contribute to tumorigenesis. In particular, germ cell tumors are good candidates to examine cancer-specific retroelement activity, because the genome of germ cells goes through epigenetic reprogramming through methylation at CpG sites. Thus, deregulation of this process might easily lead to the derepression of transposable elements, and potentially to germ cell tumors. In support of this hypothesis, the L1 ORF1 protein was overexpressed in all 62 cases of investigated childhood malignant germ cell tumors relative to adjacent normal tissue and was associated with poor differentiation [81]. Testicular germ cell tumors should also be examined for L1-conferred hereditary disease, as no high penetrance susceptibility genes have been identified in this condition. With pyrosequencing of bisulfite-treated DNA using L1-specific primers, transgenerational inheritance of L1 methylation was implicated in testicular cancer risk [82]. Thus, L1s are attractive candidates for both somatic drivers and hereditary predisposition factors in germ cell tumors and possibly in other cancer types. However, their functional impact in malignancy is currently poorly understood.

Concluding remarks and future directions

Genome-scale technologies now provide us with the opportunity to investigate retrotransposon biology in unprecedented detail. Ultimately, it will be important to test the functional consequences of these results, such as the effect of RIPs on gene function, and their role in cancer and neurological disorders. This outcome might be accomplished by classical functional studies, or by combining the results of several genome-scale experiments. For instance, if comprehensive RIP profiles were coupled with next-generation RNA sequencing data, it would allow testing of hypotheses pertaining to retrotransposons and their effects upon gene expression. Such platforms would also be useful to explore whether there is a role for common RIPs in common disease and if these RIPs convey the disease phenotype through expression. In a similar manner, one could incorporate chromatin/methyl-seq/RNA/ChIP-seq profiles for DNA-binding or RNA-binding proteins with the respective RIP profiles. It would also be advantageous to carry out studies to explore whether any overlap between a GWAS hit and a known RIP exists, as the RIP might indeed be the causal variant.
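
The proposed search for overlap between GWAS hits and known RIPs could, in its simplest form, be a positional lookup. The sketch below is purely illustrative: the function name, the 50 kb window and all coordinates and identifiers are hypothetical, and a real analysis would more likely define proximity by linkage disequilibrium than by a fixed distance.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def rips_near_gwas_hits(gwas_hits, rips, window=50_000):
    """Map each GWAS hit to the RIPs lying within `window` bp on the same chromosome.

    gwas_hits and rips are lists of (chromosome, position, identifier) tuples.
    """
    # Group RIPs by chromosome and sort them by position for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos, rip_id in rips:
        by_chrom[chrom].append((pos, rip_id))
    positions = {}
    for chrom, sites in by_chrom.items():
        sites.sort()
        positions[chrom] = [pos for pos, _ in sites]

    result = {}
    for chrom, pos, snp_id in gwas_hits:
        chrom_positions = positions.get(chrom, [])
        lo = bisect_left(chrom_positions, pos - window)
        hi = bisect_right(chrom_positions, pos + window)
        result[snp_id] = [rip_id for _, rip_id in by_chrom.get(chrom, [])[lo:hi]]
    return result


# Hypothetical coordinates, for illustration only.
example_hits = [("chr1", 1_000_000, "snp_A"), ("chr2", 5_000_000, "snp_B")]
example_rips = [
    ("chr1", 1_020_000, "Alu_1"),
    ("chr1", 1_200_000, "L1_1"),
    ("chr2", 4_960_000, "SVA_1"),
    ("chrX", 100, "L1_2"),
]
candidates = rips_near_gwas_hits(example_hits, example_rips)
print(candidates)
```

Any RIP returned by such a lookup is only a candidate; whether it is the causal variant would still need to be tested, for instance against expression data as suggested above.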

As an alternative genome-scale approach to understand the impact of human transposons on disease, functional genetic screening strategies could be developed in cell culture. For instance, haploid cell lines [83, 84] and BLM (Bloom syndrome, RecQ helicase-like)-deficient cells that can be converted to generate a genome-wide library of homozygous mutant cells [85-87] are available to be mutagenized and screened for any desirable phenotype, such as altered retrotransposition activity, using suitable read-out systems. One such system could be a retrotransposition assay, where an L1 reporter construct has been designed so that translation of the reporter (drug-resistance gene or enhanced green fluorescent protein) occurs only after L1 reverse transcription and insertion of its cDNA copy into the genome [88, 89]. Also, genome-wide mutagenesis might be accomplished with mobile elements themselves, such as retroviruses or DNA transposons [85-87]. Similarly, large-scale small interfering RNA (siRNA) and cDNA functional genetics screening strategies could be designed to identify host cell factors modulating L1 activity. One should also investigate whether some host factors elicit a disease phenotype through deregulated retrotransposon activity. For example, the remarkable finding of the role of Alu RNA toxicity due to DICER1 deficiency in macular degeneration [31] needs to be replicated by alternative methods. Those methods should exclude the possibility that what is really being detected is amplification of the closely related 7SL RNA or Alu sequences contained in RNA polymerase II transcripts, which vastly outnumber the RNA polymerase III-transcribed Alu elements [75]. Also, functional genetic follow-up studies should circumvent - if at all feasible - non-specific toxicity arising as a result of ectopic Alu overexpression or antisense oligo-mediated downregulation of essential RNA polymerase II transcripts with embedded Alu sequences.
DICER1 may also have a role in tumorigenesis through retrotransposon overexpression, as germline mutations in this gene have been found in familial pleuropulmonary blastoma [90] and in familial multinodular goiter with ovarian Sertoli-Leydig cell tumors [91]. Sense and antisense transcripts derived from L1 promoters could be processed to siRNAs that might suppress retrotransposition by RNA interference [92], and DICER1 has been implicated in this process [93]. These data raise the possibility that genomic instability in some malignancies could arise - at least partly - from retrotransposon overdose as a consequence of a mutated small non-coding RNA pathway. This could lead eventually to retrotransposon RNA toxicity [31], genotoxic stress through DNA nicking by ORF2 [94], or elevated insertional mutagenesis [61].

For the future of personalized medicine it will be vital not to exclude the transposon profile of patients, as exemplified by the case of an Alu insertion in a retinitis pigmentosa proband [49]. Another aspect of personalized medicine is gene therapy. In one form of gene therapy, the use of antisense oligonucleotides that block aberrant splicing into an intronic SVA that causes Fukuyama muscular dystrophy has been suggested [35]. A second form of gene therapy is the use of DNA transposons, which hold the promise of lower immunogenicity, an enhanced safety profile and reduced manufacturing costs compared with viral vectors [95]. Two DNA transposons from non-mammalian species have emerged as gene therapy tools based on their efficient transposition in humans: the reconstructed Tc1/mariner element Sleeping Beauty from salmonid fish and piggyBac from the baculovirus genome [95, 96]. The first ex vivo gene therapy clinical trial using Sleeping Beauty has been approved [97], and induced pluripotent stem cells are now being generated after targeted gene correction using piggyBac technology [98]. Once the potential side-effects of these therapies - such as secondary mutagenesis resulting from transposon hopping or activation of nearby genes - are overcome, the roles of mobile elements can be redefined from being just 'junk' or 'enemy' to 'life-guards' of our genomes.