Gene editing in the context of an increasingly complex genome
The reporting of the first draft of the human genome in 2000 brought with it much hope for the future, in what was felt to be a paradigm shift toward improved health outcomes. Indeed, we have now mapped the majority of variation across human populations with landmark projects such as 1000 Genomes; in cancer, we have catalogued mutations across the primary carcinomas; and, for other diseases, we have identified the genetic variants with the strongest associations. Despite this, we are still waiting for the genetic revolution in healthcare to materialise and deliver the health benefits for which we had hoped. A major problem we face is that we have underestimated the complexity of the genome, and of biological mechanisms generally. Fixation on DNA sequence alone and a ‘rigid’ mode of thinking about the genome have meant that the folding and structure of the DNA molecule, and how these relate to regulation, have been underappreciated. Projects like ENCODE have additionally taught us that regulation at the level of RNA is just as important as that at the spatiotemporal level of chromatin.
In this review, we chart the course of the major advances in the biomedical sciences before and after the release of the first draft sequence of the human genome, with an emphasis on technology and how its development has influenced these advances. We additionally focus on gene editing via CRISPR/Cas9 as a key technique, in particular its use in the context of complex biological mechanisms. Our aim is to shift the mode of thinking about the genome toward one that encompasses a greater appreciation of the folding of the DNA molecule, of DNA-RNA and DNA-protein interactions, and of how these regulate expression and elaborate disease mechanisms.
In composing this work, we recognise that technological improvement is conducive to a greater understanding of biological processes and of life within the cell. We believe that we now have the technology at our disposal to permit a better understanding of disease mechanisms, achievable through integrative data analyses. Finally, only with a greater understanding of disease mechanisms can techniques such as gene editing be faithfully conducted.
Keywords: Gene editing; Genomic complexity; Genome; Transcriptome; Epigenome; Sequencing technology development; Complex genetics; CRISPR; Integrated omics
Abbreviations
3C: Chromosome conformation capture
4C: Chromosome conformation capture on chip
5C: Chromosome conformation capture carbon copy
ACS: Acute coronary syndrome
AID: Activation-induced cytidine deaminase
AMI: Acute myocardial infarction
ATAC-seq: Assay for Transposase Accessible Chromatin sequencing
BNP: B-type natriuretic peptide
CAD: Coronary artery disease
cfDNA: Circulating free DNA
CHF: Congestive heart failure
ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag sequencing
ChIRP-seq: Chromatin isolation by RNA purification sequencing
CIP-TAP: Calf Intestinal alkaline Phosphatase Tobacco Acid Pyrophosphatase
CLASH: Crosslinking, ligation, and sequencing of hybrids
CRISPR: Clustered regularly interspaced short palindromic repeats
csRNAs: Capped small RNAs
CTCs: Circulating tumour cells
ctDNA: Circulating tumour DNA
DNase I HS site: DNase I hypersensitive site
DNase-seq: DNase I HS site sequencing
ENCODE: ENCyclopedia Of DNA Elements in the human genome
FAIRE-seq: Formaldehyde-Assisted Isolation of Regulatory Elements sequencing
FANTOM: Functional ANnoTation Of the Mammalian genome
GGE: Gradient gel electrophoresis
GRO-seq: Global Run-On sequencing
GWAS: Genome-Wide Association Studies / Study
HGP: Human Genome Project
Hi-C: High-throughput chromosome conformation capture
HITS-CLIP: High Throughput Sequencing Crosslinking and Immunoprecipitation
HOTAIR: HOX transcript antisense RNA
HPLC: High performance liquid chromatography
ICE: Inosine Chemical Erasing
ICGC: International Cancer Genome Consortium
iCLIP: Individual-nucleotide resolution UV cross-linking and immunoprecipitation
LCA: Leber congenital amaurosis
lincRNA: Long intergenic non-coding RNA
LOH: Loss of heterozygosity
m6A: Methylation of the N6 position of adenosine
MAINE-seq: MNase-Assisted Isolation of Nucleosomes Sequencing
MeRIP-seq: Methylated RNA Immunoprecipitation sequencing
MPSS: Massively parallel signature sequencing
NET-seq: Native elongating transcript sequencing
NHGRI: The National Human Genome Research Institute
NHS: National Health Service
NMR: Nuclear magnetic resonance
ONT: Oxford Nanopore Technologies
PAMs: Protospacer adjacent motifs
PAR-CLIP: Photoactivatable Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation
PARE-seq: Parallel Analysis of RNA Ends sequencing
PARS: Parallel analysis of RNA structure
PCSK9: Proprotein convertase subtilisin/kexin type 9
PRE1 / PRE2: Putative regulatory element 1 / 2
RBP: RNA binding protein
RC-seq: Retrotransposon Capture sequencing
RIP-seq: RNA immunoprecipitation sequencing
SAGE: Serial analysis of gene expression
SAM: Synergistic Activation Mediator
SBS: Sequencing by synthesis
SHAPE-seq: Selective 2′-Hydroxyl Acylation analyzed by Primer Extension sequencing
SHM: Somatic cellular hypermutation
T-ALL: T-cell acute lymphoblastic leukaemia
TCGA: The Cancer Genome Atlas
TC-seq: Translocation Capture sequencing
TIF-seq: Transcript Isoform Sequencing
TRAP-seq: Translating Ribosome Affinity Purification sequencing
TSS: Transcription start site
US NCEP: US National Cholesterol Education Program
VAP: Vertical auto profile
vCJD: Variant Creutzfeldt-Jakob Disease
XIST: X-Inactive Specific Transcript
Life is more complex than we had previously thought. We have mapped the entire healthy human genome [1, 2], but many unanswered questions and challenges remain in terms of the genome’s relationship with disease [3, 4, 5]. Indeed, when former President Clinton exited the White House to announce the first draft of the human genome, his words were met with the belief that we had made a paradigm shift toward a better understanding of human disease, with DNA being likened by Clinton to “the language in which God created life”. Now, fast approaching 20 years since that announcement from the White House in June 2000, it may feel as if the fanfare that accompanied the occasion was premature. Perspective is a luxury, though, and although it can feel like research in the biological and medical sciences (‘biomedical sciences’) has been slower than expected since that time, we have nevertheless made huge progress, even looking far beyond the genome.
Indeed, international landmark projects such as the encyclopaedia of DNA elements in the human genome (ENCODE) and functional annotation of the mammalian genome (FANTOM) have shed much light on life’s complexity through their studies of the transcriptome and epigenome, confirming the early conclusions of Lander and colleagues in their summary of the first human genome sequence: “The potential numbers of different proteins and protein–protein interactions are vast, and their actual numbers cannot readily be discerned from the genome sequence. Elucidating such system-level properties presents one of the great challenges for modern biology”. The challenge to which Lander alludes is still very much felt today, and his words are being borne out as we delve ever further into disease mechanisms and pathobiology.
Projects like ENCODE and FANTOM show that it is no longer sufficient to think of DNA as the Holy Grail. Despite this, much focus and attention is still given to the genome and its use in tackling disease through ‘genomic medicine’ and ‘personalized medicine’ [9, 10, 11, 12]. However, doubts have been raised [13, 14, 15], and it has become apparent that simply knowing the sequence of DNA is not enough to fully understand disease and to drive us forward.
Breast cancer CCND1 locus. Status: unsolved
In breast cancer, germline SNPs at 11q13 in the vicinity of CCND1 have puzzled researchers for years. Cyclin D1 (CCND1) is key to cancer development: over-expression of CCND1 has been found in numerous cancers, whilst repression of CCND1 impairs homologous recombination-mediated DNA repair, making cells more sensitive to DNA-damaging agents.
From GWAS, rs614367 is one of the SNPs most strongly associated with ER+ (oestrogen receptor-positive) breast cancer (p = 10⁻³⁹). The problem with rs614367 is that it lies in a large intergenic region upstream of CCND1, and its function, including how it alters CCND1 expression, remains unknown. A separate study subsequently found further intergenic SNPs at 11q13 in linkage disequilibrium with rs614367. These newly identified SNPs are located within known enhancers and silencers of CCND1: PRE1 and PRE2 (putative regulatory elements 1 and 2). They are thought to modulate the binding of the ELK4 and GATA3 transcription factors, most likely modifying transcription of CCND1.
Conclusion: The exact mechanism remains to be understood.
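Linkage disequilibrium, invoked above for the SNPs around rs614367, can be made concrete with a short calculation. The sketch below computes the classic LD statistics D and r² from phased two-SNP haplotypes. It is a minimal illustration and not the method used in the cited studies. The allele labels (A/a, B/b), the function name `ld_r2`, and the haplotype counts are all hypothetical.

```python
from collections import Counter

def ld_r2(haplotypes):
    """Compute D and r^2 between two biallelic SNPs.

    `haplotypes` is a list of two-character strings such as "AB" or "ab",
    one per phased chromosome: the first character is the allele at SNP 1
    (A or a), the second is the allele at SNP 2 (B or b).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n                                     # A-B haplotype frequency
    p_a = sum(v for h, v in counts.items() if h[0] == "A") / n  # allele A frequency
    p_b = sum(v for h, v in counts.items() if h[1] == "B") / n  # allele B frequency
    d = p_ab - p_a * p_b                        # departure from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0    # undefined for monomorphic SNPs
    return d, r2

# Hypothetical sample in which alleles A and B mostly travel together.
sample = ["AB"] * 45 + ["ab"] * 45 + ["Ab"] * 5 + ["aB"] * 5
d, r2 = ld_r2(sample)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

An r² close to 1 means that genotyping one SNP largely predicts the other. This is why a GWAS hit such as rs614367 can act as a proxy for functional variants elsewhere in the same block.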
In genomics, many studies have currently shifted focus to rare variants, in the belief that these will help us to better understand disease. The Department of Health in England has also launched a company, Genomics England, which is in the process of sequencing the genomes of patients recruited from within the National Health Service (NHS). The emphasis of Genomics England is on the study of rare diseases and the contribution of genomic variants to these (Genomics England, available from: http://www.genomicsengland.co.uk [Accessed March 4, 2017]). With the aim of sequencing 100,000 genomes, this project will undoubtedly add much to our knowledge of rare variants and rare disease. However, as with other landmark sequencing projects, it will equally leave us with many questions and will not bring us much closer to fully understanding disease mechanisms. The hypothesis that rare variants contribute greatly to disease at all should be questioned, and it has been [32, 33, 34, 35, 36]. Results from recent studies suggest that complex phenotypes and diseases are in fact brought about by a mixture of both common and rare variants, each with different effect sizes [37, 38, 39, 40, 41]. Additionally, monogenic diseases appear to be in the minority, with most phenotypic traits and diseases appearing to be dictated by complex genetics. Sequencing projects will therefore never greatly advance our knowledge of these without thinking beyond the genome. Nevertheless, we cannot abandon these genome sequencing efforts, because the information they provide is complementary to everything observed elsewhere in the cell.
Combining knowledge of the transcriptome with that of the genome can help to narrow down the list of genomic regions that are likely to be implicated in disease and, as we’ll see, the transcriptome and genome are inextricably connected. Again, in cancer, past studies of gene expression have been very successful both in segregating cancer into subtypes and in identifying the key oncogenic drivers of each [42, 43, 44]. Yet, despite this, they still fail to complete our understanding of the underlying biological mechanisms for most findings. In fact, the results from ENCODE show that regulation at the level of the transcriptome is just as complex as that at the level of the genome, a finding echoed in an earlier study by Mercer et al. Indeed, the original estimate of the number of protein-coding genes upon the completion of the Human Genome Project (HGP) was 30,000–40,000. This is a reasonable estimate, but it fails to take into account the almost 200,000 transcripts and splice isoforms now identified, which are either protein coding or have regulatory potential. In fact, we now realise that only a small fraction of the genome, up to 2%, is actually transcribed into mRNA and then translated into protein. Surprisingly, a much larger fraction, up to 70%, is transcribed into RNA but not translated into protein: these transcripts are the non-coding RNAs (ncRNAs). The function (if any) of most of these ncRNAs remains unknown, but some have been known for a long time, such as X-inactive specific transcript (XIST), which acts as an effector of X-chromosome inactivation in females. Others, such as HOX transcript antisense RNA (HOTAIR), are strongly implicated in cancer.
In addition, regulation at the level of the transcriptome is intertwined both with itself and with the genome through ncRNA interactions. These include interactions involving micro-RNA (miRNA), antisense RNA, and long intergenic non-coding RNA (lincRNA) [51, 52, 53]. This regulation also extends further afield, to the level of chromatin and the proteome.
One could argue that the complexity of the transcriptome in fact far exceeds that of the genome, owing to the almost innumerable potential interactions that RNA can form with DNA, proteins, and other RNA species, echoing Lander’s earlier words. Transcription at a given locus is also quantitative. Different levels of a transcript may play key roles in determining pathways and cell-type lineages (e.g. Sox2, Oct4, and Nanog), and may also act as buffers and dictate the transcription of other RNA species, as is seen with antisense RNA. Antisense RNA transcripts are of particular interest because they challenge the long-held belief that transcription at a given locus occurs on only one DNA strand. Transcription factors do not follow the rules that we assume they follow: they bind wherever there is an accessible matching motif, be it on the coding or the non-coding strand, so transcription on both strands is to be expected. At certain genomic regions, transcription may even be physically ‘blocked’ when the same locus is transcribed concurrently on the sense and antisense strands and the two RNA polymerases collide.
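The point that a motif can be recognised on either strand can be illustrated with a simple scan. Searching the given (forward) strand for the reverse complement of a motif is equivalent to searching the opposite strand for the motif itself. The sketch below is a simplification that assumes exact string matching and ignores chromatin accessibility, which the text stresses is also decisive. The function names, the "GATA" core motif, and the example sequence are illustrative assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))

def find_motif_both_strands(genome, motif):
    """Report every exact occurrence of `motif` on either strand of `genome`.

    Positions are 0-based forward-strand coordinates of the leftmost base of
    the match; "+" means the motif reads 5'->3' on the given strand, "-" means
    it reads 5'->3' on the opposite strand.
    """
    hits = []
    k = len(motif)
    rc_motif = reverse_complement(motif)
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif:      # palindromic motifs are reported on both strands
            hits.append((i, "-"))
    return hits

print(find_motif_both_strands("AAGATACCTATCGG", "GATA"))
```

Here the same motif is found once on each strand. A factor able to bind both sites could, in principle, nucleate transcription in either direction.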
Table 2. A gamut of technological methods to interrogate the genome’s complexity in every possible way
RNA transcription, translation, and binding
Chromatin Isolation by RNA Purification sequencing (ChIRP-seq) is used to determine regions of the genome that are bound by a specific RNA species.
Crosslinking, Ligation, And Sequencing of Hybrids (CLASH) determines RNA-RNA binding interactions.
Active RNA transcription: Global Run-On sequencing (GRO-seq) determines the sites in the genome at which active transcription is occurring by targeting transcriptionally engaged RNA polymerases.
Active RNA transcription: Native elongating transcript sequencing (NET-seq) determines, at nucleotide resolution, the sites in the genome at which active transcription is occurring by targeting the 3′ ends of nascent transcripts associated with RNA polymerases.
Active RNA translation: Ribosome sequencing (Ribo-seq) identifies ribosome-bound messenger RNAs (mRNAs), i.e., mRNAs that are under active translation.
Active RNA translation: Translating Ribosome Affinity Purification sequencing (TRAP-seq) quantifies all mRNAs that are associated with 80S ribosomes.
RNA Immunoprecipitation sequencing (RIP-seq) is used to determine RNA species that are bound to an RNA binding protein (RBP) of interest.
High Throughput Sequencing Crosslinking and Immunoprecipitation (HITS-CLIP) is used to determine RNA species that are bound to an RBP of interest. HITS-CLIP is similar to RIP-seq, with an added in vivo UV crosslinking step that improves specificity at the RNA-protein boundary.
Photoactivatable Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) determines RNA species that are bound to an RBP of interest. PAR-CLIP improves on HITS-CLIP and RIP-seq through the inclusion of photoreactive ribonucleoside analogs, which further improves specificity at the RNA-protein boundary during crosslinking.
Individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) determines RNA species that are bound to an RBP of interest and provides nucleotide-level resolution at the RNA-protein boundary.
miRNA target RNA: Parallel Analysis of RNA Ends sequencing (PARE-seq) sequences the 5′ ends of polyadenylated products of miRNA-mediated mRNA decay to identify miRNA-target RNA pairs.
RNA transcript isoforms: Transcript Isoform Sequencing (TIF-seq) identifies transcript isoforms by mapping their exact 5′ start and 3′ end boundaries.
RNA form and structure
RNA secondary and tertiary conformation: Selective 2′-Hydroxyl Acylation analyzed by Primer Extension sequencing (SHAPE-seq) uses SHAPE chemistry, followed by multiplexed paired-end deep sequencing of primer extension products, to infer secondary and tertiary RNA structure. The structure is then inferred through bioinformatic analysis using a maximum likelihood model.
RNA secondary structure: Parallel analysis of RNA structure (PARS) determines RNA secondary structure simultaneously for thousands of RNA molecules via enzymatic footprinting with different RNases.
RNA secondary structure: Fragmentation sequencing (Frag-seq) determines RNA secondary structure transcriptome-wide via nuclease P1, which cleaves single-stranded nucleic acids.
Inosine Chemical Erasing (ICE) identifies inosines in RNA species arising from adenosine-to-inosine (A-to-I) conversion, a post-transcriptional modification that diversifies the transcriptome in various pathways.
RNA methylation of the N6 position of adenosine (m6A): Methylated RNA Immunoprecipitation sequencing (MeRIP-seq) identifies RNA species carrying m6A, a post-transcriptional RNA modification.
RNA 5′ capping: Cap sequencing (Cap-seq) and Calf Intestinal alkaline Phosphatase Tobacco Acid Pyrophosphatase (CIP-TAP) both enrich for the 5′ ends of Pol II RNA species. They differ as follows: Cap-seq is selective for long capped RNAs, whereas CIP-TAP is selective for capped small RNAs (csRNAs). Both therefore define Pol II transcription start sites (TSSs).
Global mapping of active regulatory chromatin, i.e., nucleosome-depleted regions: DNase-seq identifies regulatory regions by targeting DNase I hypersensitive (HS) sites.
Global mapping of active regulatory chromatin, i.e., nucleosome-depleted regions: Formaldehyde-Assisted Isolation of Regulatory Elements sequencing (FAIRE-seq) identifies regions of active chromatin that coincide with DNase I HS sites, among others.
Global mapping of histone-bound DNA, i.e., nucleosome positioning: MNase-Assisted Isolation of Nucleosomes Sequencing (MAINE-seq) identifies histone-bound DNA via digestion with micrococcal nuclease (MNase).
Global mapping of both active regulatory chromatin and histone-bound DNA: Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) identifies accessible regions of DNA using a hyperactive Tn5 transposase, which inserts sequencing adapters into accessible regions of chromatin.
Global chromatin interactions and inferred 3-D structure: Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) isolates chromatin interactions by formaldehyde cross-linking, sonication, and chromatin immunoprecipitation (ChIP). Paired chromatin DNA fragments are then connected with linkers.
Interactions within and between chromosomes and inferred 3-D structure: Chromosome conformation capture (3C), chromosome conformation capture on chip (4C), 3C-carbon copy (5C), and high-throughput chromosome conformation capture (Hi-C) identify chromatin interactions. These range from short-range interactions between 2 loci (3C) to long-range interactions across multiple loci (Hi-C).
Retrotransposon Capture sequencing (RC-seq) enriches for the 5′ and 3′ termini of mobile genetic elements.
Mariner transposon insertions: Transposon sequencing (TN-seq) and Insertion sequencing (INseq) study insertions of the Himar1 Mariner transposon.
DNA double-strand break-mediated rearrangements: Translocation Capture sequencing (TC-seq) identifies AID-dependent chromosomal rearrangements.
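As an example of how one of the chromatin assays in Table 2 becomes an accessibility signal, the sketch below turns ATAC-seq read ends into Tn5 insertion sites and counts them per genomic bin. It assumes the commonly used convention of shifting plus-strand reads by +4 bp and minus-strand reads by −5 bp to account for the 9-bp duplication Tn5 creates on insertion. The function names, bin size, and read coordinates are hypothetical, and real pipelines work from aligned BAM files rather than tuples.

```python
from collections import Counter

def tn5_cut_sites(alignments):
    """Convert read 5' ends into approximate Tn5 insertion (cut) sites.

    `alignments` holds (strand, five_prime_pos) tuples with 0-based positions;
    for a minus-strand read the 5' end is its rightmost aligned base. The
    +4 / -5 shift is an assumed convention, not part of the original text.
    """
    cuts = []
    for strand, pos in alignments:
        cuts.append(pos + 4 if strand == "+" else pos - 5)
    return cuts

def accessibility_profile(cut_sites, bin_size=100):
    """Count insertion sites per genomic bin; dense bins suggest open chromatin."""
    return Counter(site // bin_size for site in cut_sites)

reads = [("+", 96), ("+", 150), ("-", 210), ("-", 390)]
print(accessibility_profile(tn5_cut_sites(reads)))
```

Because Tn5 can only insert where DNA is not protected, bins with many insertions are candidate nucleosome-depleted, regulatory regions.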
Chromatin structure and folding
The transcriptome and its innumerable potential interactions operate within the spatiotemporal confines of densely packed chromatin, i.e., DNA tightly wound around histones. Chromatin itself is ever changing in relation to cell cycle processes and in preparation for, and response to, transcription [61, 62]. Although research at the level of chromatin is still not a primary interest for many research groups, we are nevertheless beginning to better appreciate the 3-dimensional structure and folding of the DNA molecule and the role that this plays in regulation and disease mechanisms. DNA ‘accessibility’ is also key, as much of the genome remains inaccessible to factors in the nucleus. These regions, including any binding motifs within them, are thus shielded from transcription factors and other proteins.
Mercer and Mattick provide an outstanding review of genomic complexity, highlighting the importance of DNA-protein interactions and ncRNAs in, literally, shaping the genome and regulating gene expression in diverse ways. The 3-dimensional structure of a portion of chromatin can be captured through chromosome conformation capture (3C) technology. Other, more complex ways of interrogating chromatin and its interactions are listed in Table 2, including chromosome conformation capture on chip (4C), chromosome conformation capture carbon copy (5C), and high-throughput chromosome conformation capture (Hi-C). Achieving this genome-wide to produce a ‘structural reference chromatin’, akin to the feats achieved by the HGP and ENCODE for the genome and transcriptome, respectively, is currently over-ambitious and poses a major challenge. Moreover, based on what we now understand, DNA in its chromatin state is a ‘fluid’ molecule, not ‘fixed’ and static. It constantly alters its structure inside the nucleus in relation to protein, ncRNA, and environmental interactions.
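The logic of 3C-type methods, in which pairs of distant loci are joined by proximity ligation and then counted, can be sketched as a binned contact matrix. This is a minimal illustration assuming ligation junctions have already been reduced to pairs of coordinates on one chromosome. Normalisation, mapping, and filtering, which real Hi-C pipelines require, are omitted. The function name, region size, bin size, and junctions are hypothetical.

```python
def contact_matrix(pairs, region_length, bin_size):
    """Bin ligation junctions from a 3C/Hi-C-style experiment into a contact matrix.

    `pairs` is a list of (pos1, pos2) tuples: the two 0-based coordinates on
    the same chromosome joined by a single ligation product. Entry [i][j]
    counts how often bin i was ligated to bin j, so frequently contacting
    loci (e.g. the two anchors of a chromatin loop) stand out.
    """
    n_bins = -(-region_length // bin_size)          # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1                       # contacts are unordered
    return matrix

# Hypothetical junctions in a 1 kb region: three link the two ends (a loop).
pairs = [(50, 950), (120, 980), (60, 910), (400, 450)]
for row in contact_matrix(pairs, region_length=1000, bin_size=250):
    print(row)
```

The symmetric off-diagonal peak between the first and last bins is the signature of two loci that are far apart in sequence but close in 3-D space.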
The inherent genetic makeup of each individual’s genome, mainly in terms of copy number variation, SNPs, short tandem repeats, retrotransposons, etc., would additionally translate into subtle variation in chromatin structure. Delineating this level of subtlety accurately would require entering the realm of quantum chemistry. It would also require shifting the view of DNA from a sequence of letters to a large, complex deoxyribonucleic acid molecule, as it was viewed when first discovered. This molecule interacts with proteins and other nucleic acids in the nucleus via diverse electrochemical and electromagnetic interactions. Such work is being done in the quantum chemical and mechanical sciences [66, 67, 68], but is beyond the scope of this review. In addition, although modelling an entire human DNA molecule in this way would be useful, it is computationally infeasible.
With a greater appreciation of the importance and complexity of the genome, transcriptome, and epigenome, one can thus begin to imagine a very dynamic environment within the nucleus, a cellular ‘microcosm’ of activity. In this environment, transcription is a pervasive process: transcription factors bind at numerous loci in the genome and initiate transcription wherever the electromagnetic potential, i.e., ‘binding strength’, is sufficiently strong for transcription of downstream targets to occur. This binding strength is mediated via certain DNA motifs or interactions with other proteins. Where the binding is not sufficiently strong, transcription of targets may be weak or may not occur at all. The ‘pillars’ that give chromatin its shape and form, i.e., histones, respond to environmental stressors in a cell type-specific manner. In this way, they increase or decrease the accessibility of certain DNA regions to nuclear factors, ‘opening up’ or ‘closing’ loops and thus modifying expression profiles. Finally, chemical modification of DNA bases, e.g., the addition of methyl groups (‘methylation’), is again brought about via environmental interactions. It actively hampers the expression of genes by, in part, reducing the binding of transcription factors [70, 71].
The technology that has driven research
A historical perspective: c.1980s onwards
Our understanding of the mechanisms that drive the structure and function of nucleic acids, i.e., DNA and RNA, has been limited largely by the available technology. We now have numerous ways of interrogating the secrets of the genome (Table 2). Even so, DNA sequence information has long relied on the dideoxy chain-termination method of Sanger, first described in 1977, and on the automated sequencers later built around it. The first successful automated sequencing runs used the Applied Biosystems (ABI) 370A to sequence two cDNA clones from a rat heart cDNA library, encoding the muscarinic cholinergic receptor and the β-adrenergic receptor. At the time, it was claimed that one sequencer could obtain > 30,000 bases in five overnight sequencing runs. The haploid human genome is approximately 3 billion base pairs. In 1987, sequencing one human genome at single-pass (1×) coverage on 100 of these instruments would therefore have taken 5000 days, or 13.7 years, at a cost of undoubtedly astronomic proportions.
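The 1987 estimate can be reproduced with a few lines of arithmetic. The sketch assumes one overnight run per instrument per day, no failed runs, and a single pass over a genome of roughly 3 billion base pairs, which is the size that yields the 5000-day figure. The function name is illustrative.

```python
def days_to_sequence(genome_bp, bases_per_run, runs_per_day, n_instruments):
    """Days needed to read a genome once (1x coverage) on parallel instruments."""
    bases_per_day = bases_per_run * runs_per_day * n_instruments
    return genome_bp / bases_per_day

# Figures from the text: >30,000 bases over five overnight runs (6,000 per run),
# assumed one run per night, 100 instruments, and a ~3 billion bp genome.
days = days_to_sequence(3_000_000_000, 30_000 / 5, 1, 100)
print(f"{days:.0f} days, or {days / 365:.1f} years")
```

Any realistic assembly needs several-fold coverage rather than a single pass, so the true figure would have been a multiple of this.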
Thus, whilst sequencing the cellular genome was first discussed as early as 1984 and was a chief goal of the HGP, clearly no one intended to sequence an entire human genome with the ABI 370A on a routine basis. However, innovations ensued, and detection methods were enhanced with the advent of capillary electrophoresis. In 2001, with multiple high-throughput DNA sequencers (ABI 3700) running in parallel, the human genome was sequenced in two efforts [1, 2]. These achieved roughly 90–95% genomic coverage in a relatively short amount of time: 15 months and 9 months.
These efforts marked a momentous event in our quest to understand DNA, colloquially referred to as ‘the code of life’, and they provided impetus to sequence and understand DNA at an even quicker pace in the future. That said, the first attempt to move beyond ABI’s automated sequencer was not driven by efforts to sequence the human genome. Rather, it aimed “to discover and understand the function and variation of genes”. The term massively parallel signature sequencing (MPSS) was used to describe a sequencing platform that would become the prototype for what was to follow as we entered the twenty-first century. This platform was able to sequence millions of DNA strands at one time in conjunction with in vitro cloning of cDNA on microbeads. The instrument employed an innovative system that used a charge-coupled device (CCD) detector followed by image processing of fluorescent signals corresponding to each of the 4 deoxynucleotides. The method harnessed biochemical and enzymatic reactions to deliver short tags, 16 to 20 bases long, referred to as ‘signature sequences’. Prior to MPSS, this tag-based approach was known as serial analysis of gene expression (SAGE), which originally relied on short tags of 9 nucleotide bases. SAGE was developed as an alternative to the highly variable probe hybridisation methods of microarray chips. Each of these methods relied upon prior knowledge of the mRNA sequences that code for the genes of interest: MPSS, SAGE, and the hybridisation of arrayed cDNA libraries (microarrays). In a strict sense, these platforms were not, and are not, DNA sequencers as a sequencer is defined today. Thus, it was impractical to expect MPSS to carry out de novo sequencing of the genomes of organisms that had not yet been deciphered.
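The tag-counting idea behind SAGE, and why it depends on prior knowledge of mRNA sequences, can be sketched in a few lines. The sketch assumes, as in the original SAGE protocol, that the anchoring enzyme is NlaIII (site CATG) and that each tag is the 9 bases immediately 3′ of the last such site in a cDNA. The function names and example cDNAs are hypothetical. The resulting counts only become gene-level expression values once each tag is looked up in a database of known transcripts.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site, assumed anchoring enzyme
TAG_LENGTH = 9    # tag length of the original SAGE method

def sage_tag(cdna):
    """Return the tag immediately 3' of the last anchor site, or None."""
    site = cdna.rfind(ANCHOR)
    if site == -1:
        return None                       # no anchor site: transcript yields no tag
    start = site + len(ANCHOR)
    tag = cdna[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None

def count_tags(cdnas):
    """Tally tags; each count is a proxy for that transcript's abundance."""
    return Counter(t for t in map(sage_tag, cdnas) if t is not None)

cdnas = ["AACATGGGATTCAAAGGCC", "TTCATGAACATGGGATTCAAATT", "CATGTTTAAACCC", "GGGG"]
print(count_tags(cdnas))
```

The first two cDNAs share a tag and so count as two copies of one transcript. A tag with no match in the reference database would remain uninterpretable, which is the limitation noted above.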
In 2005 and 2006, after years of academic research into improved biochemical processes, two sequencing platforms emerged: the 454 sequencer and the Illumina/Solexa Genome Analyzer. Both used sequencing by synthesis (SBS). This method, outlined by Hyman, involves detecting the base-by-base addition of each of the 4 nucleotide bases by a biochemically engineered DNA polymerase. The detection method used in the 454 sequencer takes advantage of the pyrophosphate (PPi) released upon the addition of each base. ATP sulfurylase converts the PPi into ATP, which then drives a luciferase reaction that releases light. Another group, at the University of Cambridge, developed a platform based on a novel single-molecule approach with a laser detection system. It used nucleotides adapted with fluorescent and reversible 3′ terminator moieties, which in effect preserved the viability of the growing DNA molecule as it was replicated from the double-stranded template. This sequencing method became the driving force behind the technology spawned by engineers at Solexa, later acquired by Illumina. A similar detection method involving fluorescently labelled nucleotide bases was developed by a group at Columbia University [85, 86]. At the time, several competing technologies were attempting to replace the dideoxy Sanger sequencing method, then considered the gold standard for DNA sequencing.
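In pyrosequencing, nucleotides are flowed one type at a time. The light released in each flow is roughly proportional to the number of bases incorporated, so the raw output is a series of intensities (a ‘flowgram’) rather than one base per cycle. The sketch below decodes an idealised flowgram by rounding each intensity to a whole number of incorporations. The flow order "TACG", the function name, and the intensities are assumptions for illustration. Real base callers model signal noise far more carefully.

```python
def call_bases(flow_order, intensities):
    """Decode a pyrosequencing flowgram into a DNA sequence.

    Nucleotides are flowed in the repeating `flow_order`; each flow's light
    intensity is taken to be proportional to the number of bases incorporated,
    so a signal of about 2 is read as a two-base homopolymer.
    """
    sequence = []
    for i, signal in enumerate(intensities):
        base = flow_order[i % len(flow_order)]
        n = max(0, round(signal))        # nearest whole number of incorporations
        sequence.append(base * n)
    return "".join(sequence)

print(call_bases("TACG", [1.02, 0.05, 1.96, 0.98, 0.03, 1.01, 0.10, 2.90]))
```

The rounding step is where homopolymer-length errors arise: the longer the run of identical bases, the harder it is to tell, say, 6 from 7 from the light intensity alone. Reversible-terminator chemistry sidesteps this by adding only one base per cycle.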
What was driving this profusion of technological innovation? The goal of all of the competing technologies was to introduce a massively parallel sequencing platform that could sequence a genome in a matter of days instead of months. Thus, one could argue that our intense interest in the relationship of DNA sequence to disease stems in part from the fact that the first technological successes were specifically designed to read DNA sequence quickly. This is reminiscent of the series of technological advances that came from the Apollo Program. Indeed, the concept of the ‘personal genome’, which envisions a world where everyone can have their genome sequenced for as little as $1000, has propelled much of the change and innovation of the past 15 years. The technologies introduced by 454 Life Sciences in 2005 and Illumina/Solexa in 2006 could sequence DNA orders of magnitude faster than the ABI sequencers. Even so, they did not deliver the $1000 genome.
Then, in 2008, Baylor College of Medicine reported the sequencing of Dr. James Watson’s complete genome on the 454 platform to a depth of 7.4-fold; it took 2 months and cost less than US$1 million. Comparative bioinformatics revealed 3.3 million SNPs, as well as structural variation, in Dr. Watson’s genome. Also in 2008, a report outlined the SBS method first developed by Balasubramanian and Klenerman at Cambridge. In it, the genome of a male Yoruba from Nigeria was sequenced to > 30× with the Genome Analyzer (Illumina/Solexa), taking 8 weeks to complete at a cost of US$250,000.
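What a given sequencing depth buys can be estimated with the Lander–Waterman model. Here, mean coverage is c = N·L/G for N reads of length L over a genome of size G. If reads fall uniformly at random, the fraction of bases that receive no reads at all is about e^(−c). The sketch below applies this to the 7.4× and 30× depths mentioned above. The uniform-placement assumption is an idealisation: GC bias and repeats leave real genomes less evenly covered than this estimate suggests.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Expected sequencing depth c = N * L / G."""
    return n_reads * read_length / genome_size

def fraction_uncovered(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)

for depth in (1, 7.4, 30):
    print(f"{depth:>5}x: ~{fraction_uncovered(depth):.2e} of the genome unsequenced")
```

Under these assumptions, 1× leaves roughly a third of the genome unread, whereas 7.4× leaves well under 0.1%. Depth beyond that serves mainly to make individual base calls trustworthy rather than to reach unread bases.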
Modern technological advances: c.2010 onwards
The utilitarian needs that drive technology forward often result in unanticipated discoveries that carry research in new directions. Pacific Biosciences (PacBio) developed a platform based on single-molecule real-time (SMRT) sequencing that could successfully sequence very long fragments of DNA. In 2010, it was recognised that SMRT technology would be able to achieve read lengths greater than 1 Kbp. This far surpassed the capability of SBS at that time, i.e., 100–150 bp (Genome Analyzer) and 330 bp (Roche 454). Soon thereafter, SMRT technology was used in a de novo sequencing method, demonstrating its ability to sequence an entire bacterial genome using only a single, long-insert shotgun DNA library. The mean read length in this work was 5777 bp, with a mean accuracy of 99.9%. Prior to this research by Chin et al., the SMRT platform was already deemed valuable as a tool for microbial phylogenetic profiling. The platform has inherent advantages over Sanger and Roche 454 sequencing for the 16S ribosomal RNA (rRNA) genes within microbial populations, which require longer reads to give finer resolution. The SMRT platform gives reads four times longer than the 454 platform and does not require a library amplification step. For these reasons, its cost was at that time significantly lower than that of other sequencing technologies.
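Sequencing accuracy figures such as the 99.9% quoted above are usually expressed on the Phred scale, Q = −10·log10(error probability). This convention is not named in the text and is introduced here only as a common reference point. The sketch converts in both directions. The function names are illustrative.

```python
import math

def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (0-1) to a Phred quality score Q = -10*log10(error)."""
    return -10 * math.log10(1 - accuracy)

def phred_to_accuracy(q):
    """Inverse conversion: Q30 corresponds to 1 error in 1,000 bases."""
    return 1 - 10 ** (-q / 10)

print(round(accuracy_to_phred(0.999)))   # prints 30
```

Each additional 10 Phred units corresponds to a tenfold reduction in the error rate, which makes the scale convenient for comparing platforms with very different error profiles.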
In addition to the recent proliferation of research in microbial profiling, longer-read sequencing technologies have been used in attempts to produce haplotype-resolved genome sequences, i.e., haplotype phasing. The need for this type of sequence information becomes apparent when considering hereditary disorders, which are invariably linked to the haplotype and mode of inheritance. In addition to SMRT, Oxford Nanopore Technologies (ONT) also developed a platform that provides haplotype phasing. However, the high error rates of both platforms proved a difficult hurdle, particularly once it was discovered that assembly software did not detect PCR-chimera formation. An alternative way to obtain long contiguous sequences is to modify the upfront library preparation. In this method, a molecular barcode is assigned to very long (> 50 Kbp) DNA fragments, which are then sequenced on a short-read NGS platform. This approach is designed to prevent excessive chimera formation. After sequencing, bioinformatic algorithms assemble the fragments into a haplotype-resolved genomic sequence, e.g., 10× sequencing (10× Genomics, Pleasanton, USA). This method (from c.2015), along with single-cell DNA and RNA sequencing, represents the current state of the art in sequencing technology since the HGP in 2000. It involves attaching several million synthetic barcodes, each to one DNA fragment within the genome of interest. These can then furnish a de novo assembly of any genome and incidentally provide the haplotype phasing of that genome.
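The way barcodes recover haplotype phase from short reads can be sketched for the simplest case: two heterozygous SNPs. The sketch assumes, as described above, that each barcode marks a single long molecule, so alleles sharing a barcode come from the same parental chromosome. In practice, several molecules can share a barcode, and production tools handle this statistically. The function name, barcode labels, and reads are hypothetical.

```python
from collections import Counter, defaultdict

def phase_two_snps(reads):
    """Decide whether two heterozygous SNPs are in cis or trans using barcodes.

    `reads` holds (barcode, snp_id, allele) tuples, with snp_id in {1, 2} and
    allele in {"ref", "alt"}. Reads sharing a barcode are assumed to derive
    from one long DNA molecule, i.e. from one parental haplotype.
    """
    alleles_by_molecule = defaultdict(dict)
    for barcode, snp, allele in reads:
        alleles_by_molecule[barcode][snp] = allele
    votes = Counter()
    for alleles in alleles_by_molecule.values():
        if 1 in alleles and 2 in alleles:            # molecule spans both SNPs
            # ref/ref or alt/alt on one molecule means the alt alleles share a haplotype
            votes["cis" if alleles[1] == alleles[2] else "trans"] += 1
    if not votes:
        return "unphased"                            # no molecule links the SNPs
    return votes.most_common(1)[0][0]                # majority vote; ties not resolved

reads = [("BC1", 1, "alt"), ("BC1", 2, "alt"), ("BC2", 1, "ref"), ("BC2", 2, "ref"),
         ("BC3", 1, "alt"), ("BC3", 2, "ref"), ("BC4", 1, "alt")]
print(phase_two_snps(reads))
```

Majority voting tolerates the occasional discordant molecule, such as BC3 here, which might reflect a sequencing error or a barcode shared by two molecules.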
Regarding the role of PCR in NGS, it is important to grasp that, for most if not all sequencing methods, DNA amplification is a necessary preliminary step to increase the detection signal, whether that signal originates from the excitation of a fluorescently labelled molecule (e.g., SBS), light emitted by an enzymatic reaction (e.g., via PPi release), or the disruption of an electrical current (e.g., ONT). However, PCR-driven amplification introduces artefacts such as the chimera formation mentioned above, as well as random base misincorporation errors. To overcome base errors, NGS methods are designed to sequence at great depth of coverage so that these errors (and indeed basecalling errors arising from the sequencing process itself) can be bioinformatically removed from the final data or, at least, have their influence reduced. For example, during variant calling, a threshold can be set for the minimum read depth at each base position so that isolated errors carry less weight. PCR-chimera formation, on the other hand, cannot be entirely eliminated from any NGS method without algorithms designed to target each region of interest in the sequencing data and computationally identify chimeric events. Importantly, amplicon length affects the prevalence of chimera formation, with shorter amplicons yielding fewer chimeric sequences. That said, when NGS is used to detect SNPs without regard to how these variants relate to one another as haplotypes, chimeric artefacts pose far less of a problem than when definitive haplotype phasing is the goal.
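The depth threshold described above can be sketched as a simple filter over per-position read counts. This is a minimal illustration that assumes reference and alternate allele counts have already been tallied from aligned reads; the function name and the default thresholds (20 reads, 20% alternate-allele fraction) are hypothetical choices rather than recommended values, and production variant callers use considerably richer statistical models.

```python
def filter_variant_calls(pileup, min_depth=20, min_alt_fraction=0.2):
    """Keep candidate variants passing simple depth and allele-fraction thresholds.

    pileup: {position: (ref_count, alt_count)} tallied from aligned reads.
    Returns {position: alt allele fraction} for positions that pass.
    """
    calls = {}
    for pos, (ref_count, alt_count) in pileup.items():
        depth = ref_count + alt_count
        if depth < min_depth:
            continue  # too few reads: a single PCR or basecalling error could dominate
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            calls[pos] = round(fraction, 3)
    return calls

# Hypothetical pileup counts at three positions
pileup = {
    1001: (3, 2),    # alt fraction 0.40 but only 5 reads: rejected on depth
    1002: (48, 2),   # 50 reads, alt fraction 0.04: consistent with random error
    1003: (26, 24),  # 50 reads, alt fraction 0.48: plausible heterozygous SNV
}
print(filter_variant_calls(pileup))  # prints: {1003: 0.48}
```

Lowering `min_depth` to 5 would admit position 1001, illustrating how a shallow-coverage error can masquerade as a variant.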
Cutting-edge gene editing technology
Complex genetics, complex disease: Room for gene editing?
Crisis ‘bee’. Status: imminent problem
In recent years, domesticated honeybees (Apis mellifera) and commercially reared bumblebees (Bombus terrestris) have become increasingly important in global crop production through enhanced pollination, at a time when global agriculture faces the major challenge of maintaining food security for an ever-increasing human population. The challenge is compounded by severe declines in these pollinators, driven by land-use change (causing habitat loss, fragmentation and degradation, and loss of resource diversity), pesticides, the introduction of alien species for crop pollination and honey production (causing declines in native pollinators) and, with these, the introduction of bee pests and pathogens. Despite extensive research efforts, no single factor has been identified as the definitive cause of bee colony decline [228, 229], and it is likely that the interaction of all these factors drives bee losses. At a global level, however, most managed A. mellifera colonies are infected with the ectoparasitic mite Varroa destructor, while other important bee pathogens (e.g., Nosema spp. and several viruses) have global distributions. These parasites and pathogens can therefore interact with other colony-decline factors anywhere in the world, intensifying the problem.
The arrival of the powerful gene editing tool CRISPR could help alleviate this situation, particularly now that honeybee and bumblebee genomes are available. Certain bee populations practise ‘hive hygiene’ by removing sick and infested larvae, and such populations are less likely to succumb to parasites and pathogens.
Conclusion: Identifying genes associated with hygienic behaviour and editing them in less hygienic populations could help improve the health of hives globally.
However, these screens have also highlighted a major issue: researchers have found little correlation between the results of CRISPR/Cas9-driven screens and those previously obtained with techniques such as RNA interference (RNAi). For example, a recent CRISPR/Cas9 screen for genes essential to tumour growth revealed that MELK, previously thought to be essential for tumour growth, does not in fact drive cancer cell proliferation. As CRISPR/Cas9 and RNAi act through different mechanisms, it is not surprising that they can yield different results, although drawing conclusions from contradictory results is problematic. RNAi has a well-documented tendency towards off-target effects [111, 112, 113, 114, 115]. This underlines the need to validate results using complementary shRNA and CRISPR/Cas9 screening approaches to produce a more comprehensive analysis.
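One way to quantify agreement between two screens is a rank correlation of their gene-level scores. The sketch below computes a Spearman correlation (the Pearson correlation of ranks, with tied values sharing an average rank) without external libraries. The gene names and scores are hypothetical placeholders, chosen only to show how two screens can rank the same genes differently.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical gene-level dependency scores (more negative = more essential)
crispr = {"GENE_A": -2.1, "GENE_B": -1.5, "GENE_C": -0.2, "GENE_D": 0.1, "GENE_E": 0.3}
rnai = {"GENE_A": -1.8, "GENE_B": 0.2, "GENE_C": -1.1, "GENE_D": -0.4, "GENE_E": 0.0}
genes = sorted(crispr)
rho = spearman([crispr[g] for g in genes], [rnai[g] for g in genes])
print(f"Spearman rho = {rho:.2f}")  # prints: Spearman rho = 0.40
```

A rank-based measure is used because the two technologies produce scores on different scales, so only the ordering of genes is directly comparable.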
The generation of a catalytically inactive, or ‘dead’, Cas9 (dCas9) made it possible to fuse functional proteins to dCas9 and target them in a sequence-specific manner without introducing a double-strand break. This has led to innovative adaptations of the CRISPR system that have greatly expanded the molecular biology toolkit and advanced both the scope and effectiveness of genome editing. For instance, an inventive strategy termed ‘CRISPR-X’ provides a rapid new approach to investigating protein function. It fuses dCas9 to activation-induced cytidine deaminase (AID), which mediates somatic hypermutation (SHM), and can be used to rapidly generate a diverse library of mutants with improved or novel functions for further investigation. A related approach uses cytidine deaminase enzymes to achieve ‘base editing’, a programmable way to directly change a single base with greater efficiency than correcting point mutations by homology-directed repair. However, as described previously, a full appreciation of complex disease requires looking beyond the genome. To facilitate this, researchers have now developed adaptations of the CRISPR system that allow interrogation of both the transcriptome and the epigenome.
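As a rough illustration of how base editing differs from cutting, the sketch below predicts which cytosines a cytosine base editor (which converts a C:G pair to T:A without a double-strand break) might convert within a 20-nt protospacer. The editing window of positions 4 to 8, counted from the PAM-distal (5′) end, is an approximation often cited for early cytosine base editors and is an assumption here; real outcomes depend on the editor variant and sequence context. The function name and protospacer are hypothetical.

```python
def predict_cbe_edits(protospacer, window=(4, 8)):
    """Predict C->T conversions by a cytosine base editor in a 20-nt protospacer.

    Positions are 1-based from the PAM-distal (5') end. The default window
    (positions 4-8) is an assumed approximation, not a fixed rule.
    Returns (list of edited positions, edited sequence).
    """
    seq = protospacer.upper()
    if len(seq) != 20:
        raise ValueError("expected a 20-nt protospacer")
    start, end = window
    positions = [i + 1 for i, base in enumerate(seq)
                 if base == "C" and start <= i + 1 <= end]
    # Every C inside the window is converted, including 'bystander' Cs
    edited = "".join("T" if (i + 1) in positions else base
                     for i, base in enumerate(seq))
    return positions, edited

positions, edited = predict_cbe_edits("GACCTCAGTTGCACCAGTAA")  # hypothetical target
print(positions, edited)  # prints: [4, 6] GACTTTAGTTGCACCAGTAA
```

The C at position 3 is left untouched because it lies outside the assumed window, while both Cs inside the window are converted, which is why bystander edits are a design consideration.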
CRISPR and the transcriptome
Transcriptional regulation offers a powerful route to understanding gene function and regulatory networks. However, transcriptional regulation in eukaryotic cells is complex and involves the interaction of many different transcription factors at DNA regulatory elements that can span large regions of DNA. Earlier techniques such as RNAi have been used to investigate gene repression but, as mentioned, they are prone to off-target effects that complicate interpretation. In addition, RNAi is limited to targeting protein-coding transcripts, whereas CRISPR interference (CRISPRi), which fuses dCas9 to a repressive KRAB effector domain, allows transcriptional repression beyond the coding sequence, including of miRNAs, lincRNAs, and other ncRNAs. Conversely, fusing dCas9 to transcriptional activation domains such as VP64 can upregulate gene expression, an approach known as CRISPR activation (CRISPRa) [120, 121].
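Guide placement differs between the two approaches: CRISPRi guides are usually placed just downstream of the transcription start site (TSS), and CRISPRa guides upstream of it. The sketch below selects forward-strand SpCas9 (NGG) sites within an approximate window relative to the TSS, using the PAM position as a proxy for the guide position. The window boundaries (about -50 to +300 bp for CRISPRi and -400 to -50 bp for CRISPRa) are commonly reported values treated here as assumptions, the function name is hypothetical, and a real design tool would also scan the reverse strand and score specificity.

```python
import re

# Approximate effective windows (bp relative to the TSS); assumptions to be
# checked against current guide-design recommendations.
WINDOWS = {"CRISPRi": (-50, 300), "CRISPRa": (-400, -50)}

def candidate_guides(seq, tss_index, mode="CRISPRi", guide_len=20):
    """Return (offset_from_TSS, protospacer) for forward-strand NGG sites in the window.

    offset is the 0-based position of the PAM's N minus tss_index.
    """
    seq = seq.upper()
    lo, hi = WINDOWS[mode]
    guides = []
    # Zero-width lookahead so overlapping PAMs (e.g. 'AGGG') are all found
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough upstream sequence for a full protospacer
        offset = pam_start - tss_index
        if lo <= offset <= hi:
            guides.append((offset, seq[pam_start - guide_len:pam_start]))
    return guides

# Hypothetical sequence with a single NGG PAM (TGG) starting at index 301
seq = "A" * 300 + "C" + "TGG" + "A" * 300
print(candidate_guides(seq, tss_index=200, mode="CRISPRi"))  # PAM at +101: in window
print(candidate_guides(seq, tss_index=200, mode="CRISPRa"))  # []: not upstream enough
```

Moving the hypothetical TSS to index 400 places the same site at -99 bp, inside the CRISPRa window instead.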
Building on this initial approach, researchers considered how transcriptional activation occurs in vivo, where transcription factors act in synergy with multiple co-factors. This led to a CRISPR complex termed ‘Synergistic Activation Mediator’ (SAM), which combines VP64 with additional activation domains to achieve higher levels of activation. The capacity to upregulate selected genes offers vast possibilities for reprogramming cellular identity as well as for understanding gene function. Furthermore, whilst wild-type Cas9 can be used to implement genome-wide loss-of-function screens, no technology was previously available for conducting large-scale gain-of-function (GOF) screens in a reliable and cost-effective way. Indeed, SAM has been used for genome-scale transcriptional activation, identifying genes whose activation may confer resistance to a BRAF inhibitor.
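Gain-of-function screens of this kind are typically read out by counting sgRNAs before and after selection (for example, drug treatment) and looking for guides, and hence genes, that become enriched. The sketch below is a simplified version of that analysis, assuming raw sgRNA read counts are available; the function names, guide and gene names, and counts are hypothetical, and dedicated tools such as MAGeCK apply more robust statistics.

```python
import math
from collections import defaultdict

def guide_log2_fold_changes(before, after, pseudocount=1.0):
    """Per-guide log2 fold change of reads-per-million (RPM), after vs before selection."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for guide, count in before.items():
        # Normalise to library size; the pseudocount avoids log(0) for lost guides
        rpm_before = count / total_before * 1e6 + pseudocount
        rpm_after = after.get(guide, 0) / total_after * 1e6 + pseudocount
        lfc[guide] = math.log2(rpm_after / rpm_before)
    return lfc

def rank_genes(lfc, guide_to_gene):
    """Average guide-level fold changes per gene; most enriched genes first."""
    per_gene = defaultdict(list)
    for guide, value in lfc.items():
        per_gene[guide_to_gene[guide]].append(value)
    scores = {gene: sum(v) / len(v) for gene, v in per_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sgRNA counts before and after drug selection
before = {"g1": 500, "g2": 500, "g3": 500, "g4": 500}
after = {"g1": 1600, "g2": 1400, "g3": 50, "g4": 50}
guide_to_gene = {"g1": "GENE_X", "g2": "GENE_X", "g3": "GENE_Y", "g4": "GENE_Y"}

lfc = guide_log2_fold_changes(before, after)
ranking = rank_genes(lfc, guide_to_gene)
print(ranking)
```

Averaging over several guides per gene is a simple guard against a single guide being enriched by chance or by an off-target effect.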
CRISPR and the epigenome
The epigenome is a complex regulatory layer that acts in concert with the underlying DNA sequence to produce the immense variation that exists between cells. The epigenome has well-documented, strong links to disease status, for example through its role in imprinting disorders and neurological disease [123, 124]. For many diseases, the problem may lie within this additional regulatory layer rather than in the genomic sequence itself. Until recently, progress in epigenetics was limited by a lack of molecular biology techniques for investigating the functional impact of depositing or removing chromatin modifications. Recent developments use dCas9 as a targeting domain fused to chromatin-modifying enzymes such as Dnmt3a, Tet1, Lsd1, or the histone acetyltransferase (HAT) catalytic domain of p300 [126, 127, 128]. This provides the ability to add or remove chromatin modifications in a site-specific manner, offering new insight into the downstream effects of specific sequences on chromatin state and gene expression, and a better understanding of the role that epigenetics plays in disease. In addition, dCas9 has been fused to EGFP or to a combination of fluorescent proteins, a system called CRISPRainbow [129, 130], which provides an insightful way to visualise native chromatin. The spatiotemporal organisation and dynamics of chromatin have a direct role in genome function, and the ability to track specific loci in real time will add another dimension to our understanding of chromatin structure. Although these advances open new possibilities for epigenetics, such as advanced cellular reprogramming and functional studies, epigenome editing is still at a very early stage. A stably bound dCas9 may itself affect chromatin state and chromatin modifications, thus complicating interpretation.
Indeed, although much remains to be elucidated about the chromatin modification network, these advances offer promising steps in unravelling the complexity of the genome.
CRISPR in a therapeutic setting
Thus, whilst the genome engineering revolution is clearly fast living up to its potential, and the wild-type CRISPR/Cas system, along with its ever-growing list of adaptations, has massively expanded our ability to investigate the genome in depth, two central issues persist: specificity and delivery. For CRISPR/Cas9 to be used in a therapeutic setting, both need to be thoroughly addressed. Off-target cleavage is a known caveat of the CRISPR/Cas system, with many groups reporting indels at off-target sites [131, 132]. However, initial guide design is clearly critical for achieving both efficient on-target cleavage and low off-target cleavage [133, 134, 135]. Efforts to rationally engineer Cas9 for improved specificity have produced high-fidelity Cas9 (HF-Cas9), enhanced Cas9 (eCas9), and hyper-accurate Cas9 (HypaCas9); in all cases, off-target cleavage was greatly reduced [136, 137, 138].
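Guide design tools screen candidate off-target sites largely by their sequence similarity to the guide. The sketch below shows the simplest form of this: counting mismatches against candidate sites and keeping those within a threshold. The function names, guide, site names, and the 3-mismatch cut-off are hypothetical; real tools additionally require a PAM, weight mismatches by their position relative to the PAM, and consider DNA or RNA bulges.

```python
def count_mismatches(guide, site):
    """Number of mismatched positions between two equal-length sequences."""
    if len(guide) != len(site):
        raise ValueError("sequences must be the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)

def flag_off_targets(guide, candidate_sites, max_mismatches=3):
    """Return (mismatches, site name) for sites within the threshold, closest first."""
    hits = []
    for name, site in candidate_sites.items():
        mm = count_mismatches(guide, site)
        if mm <= max_mismatches:
            hits.append((mm, name))
    return sorted(hits)

guide = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt guide
sites = {
    "on_target": "GACGCATAAAGATGAGACGC",  # perfect match
    "site_2": "GACGCATAAAGATGAGACGT",     # 1 mismatch
    "site_3": "GTCGCATTAAGATGAGTCGC",     # 3 mismatches
    "site_4": "TTTTCATAAAGATGAGACGC",     # 4 mismatches: excluded
}
print(flag_off_targets(guide, sites))
# prints: [(0, 'on_target'), (1, 'site_2'), (3, 'site_3')]
```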
Furthermore, Cas9 orthologues from species other than S. pyogenes can be considered; these recognise longer, more complex PAMs (protospacer adjacent motifs) and thus have fewer potential off-target sites within the genome. Following the adoption of Cas9 for use in mammalian cells, an additional Class 2 nuclease, Cas12a (formerly known as Cpf1), was discovered. Cas12a differs from Cas9 in several ways, such as its use of T-rich PAMs and its generation of staggered double-strand breaks with 5′ overhangs. Interestingly, Cas12a has been shown to be more specific than S. pyogenes Cas9, offering a promising alternative [141, 142].
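The difference in PAM requirements can be illustrated by scanning a sequence for each nuclease's PAM. The sketch below assumes the canonical SpCas9 PAM (NGG, 3′ of the protospacer) and the TTTV PAM commonly used for AsCas12a and LbCas12a (5′ of the protospacer, V = A, C or G); it scans the forward strand only, and the function name and sequence are hypothetical.

```python
import re

# Assumed PAM patterns on the forward strand
PAM_PATTERNS = {"SpCas9": "[ACGT]GG", "Cas12a": "TTT[ACG]"}

def find_pams(seq, enzyme):
    """Return 0-based start positions of PAM matches (overlapping matches allowed)."""
    # Wrapping the pattern in a lookahead lets overlapping PAMs be reported
    pattern = re.compile("(?=" + PAM_PATTERNS[enzyme] + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]

seq = "GCTTTAGGCGGTTTCAAGG"  # hypothetical sequence
print(find_pams(seq, "SpCas9"))  # prints: [5, 8, 16]
print(find_pams(seq, "Cas12a"))  # prints: [2, 11]
```

For a sequence with equal base frequencies, NGG is expected at a given position on one strand with probability (1/4)^2 = 1/16, whereas TTTV is expected with probability (1/4)^3 × 3/4 = 3/256, roughly five times less often. This is one way to see why nucleases with longer or more complex PAMs have fewer candidate target and off-target sites.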
Another hurdle is delivery of the CRISPR/Cas system. For productive gene editing, an optimal delivery vehicle should be highly specific and efficient for a particular cell type, should not provoke an immune response, should exhibit minimal genotoxicity and, to minimise off-target effects, should not express its cargo for an extended period. Currently, no vehicle meets all of these requirements; however, gene editing is a nascent field and delivery options are continually evolving, so the current limitations of delivery vehicles may well be overcome. Current strategies for delivering CRISPR/Cas9 components have been reviewed extensively by Glass et al.
Moreover, genome editing can only be implemented where there is a high level of understanding of the underlying disease mechanism. We now focus on three major disease areas in which genome editing could be applicable.
Complex genetics: A focus on 3 disease areas
Asthma is a heterogeneous syndrome characterised by chronic airway inflammation, airway hyperresponsiveness and intermittent airway obstruction that result in recurrent episodes of breathlessness, wheeze and cough. Asthma is emblematic of a truly complex genetic disease, thought to develop through the interaction of multiple genetic loci and environmental factors, and is estimated to affect approximately 300 million people worldwide. Asthma most often begins in early childhood and is currently the most common chronic disease of childhood; its heritability is estimated to be up to 70% [146, 147].
The earliest childhood asthma disease-gene mapping approaches, including linkage and candidate-gene studies, had mixed results, identifying only a handful of reproducible loci. However, the advent of technical and statistical methods for comprehensive GWAS has led to the identification of numerous reproducible asthma-susceptibility loci, including ORMDL3, IL1RL1, WDR36, PDE4D, DENND1B, RAD50, IL13, IL18R1, SMAD3, HLA-DQB1, GSDMB, IL33, IL2RB, RORA, HLA-DPA1, IL6R, LRRC32, C11orf30, and TNIP1 [146, 148, 149, 150]. More recently, two consortia, one European (GABRIEL) and one North American (EVE), conducted independent large-scale meta-analyses of nearly all available asthma GWAS data and reported striking overlap in the above-mentioned loci, which predominantly reside in regulatory regions of the genome and are involved in immune regulation, an integral part of asthma pathogenesis. However, as in virtually all complex diseases, the asthma loci identified to date explain only a small proportion of the total observed heritability, suggesting that novel approaches are required to identify the additional risk variants underlying this ‘missing heritability’.
Childhood asthma and the 17q21 locus. Status: partially solved
Childhood asthma is the most common chronic childhood disorder, with up to 50% of all children experiencing asthma-like symptoms before the age of 6 years and 15% being diagnosed with persistent asthma by school age. Asthma is considered a heterogeneous syndrome consisting of several endophenotypes with distinct clinical features, divergent underlying molecular causes, and different prevention and treatment options. There is a substantial genetic contribution to asthma susceptibility, and studies have implicated more than 100 genes.
Importantly, one of the first GWAS focusing on childhood-onset asthma discovered a risk locus at 17q21 that increases the risk of asthma by 20%, which has since been robustly replicated across different ethnicities in large meta-GWAS consortia [151, 152]. It was subsequently shown that genetic risk variants in the 17q21 locus up-regulate transcription of the ORMDL3 gene in EBV-transformed lymphoblastoid cell lines, and that rs12936231 is the functional SNP, which modulates ORMDL3 expression via allele-specific changes in chromatin binding of the insulator protein CTCF. However, the mechanistic link between the ORMDL3 gene and asthma susceptibility remained unknown.
Further studies showed that the ORMDL3 protein is expressed in airway epithelial cells and that ORMDL3 and related Orm proteins in the endoplasmic reticulum have a major role in sphingolipid homeostasis through inhibition of serine palmitoyltransferase (SPT), the rate-limiting enzyme in de novo sphingolipid biosynthesis [238, 239]. This finding prompted the hypothesis that ORMDL3 increases asthma risk through effects on sphingolipid metabolism, which has been confirmed in mouse studies showing that decreased sphingolipid biosynthesis in lung epithelial tissue and SPT knockout are associated with airway hyper-reactivity via altered levels of ceramides, sphingosine-1-phosphate, and sphingomyelins, subsequently affecting lung magnesium homeostasis.
Conclusion: Investigating the biology underlying the initial GWAS discovery of 17q21 as a strong childhood asthma susceptibility locus has led to the recognition that the ORMDL3 protein, the SPT enzyme, and sphingolipid metabolism are important players in airway reactivity and asthma pathogenesis, which may lead to novel therapeutics targeting this pathway. However, it is still unknown exactly how sphingolipid homeostasis is regulated by ORMDL3 expression and by external environmental perturbants; this presumably involves a network of interconnected mechanisms that metabolomics studies could help disentangle.
More recently, a genome-wide association study identified CDHR3 as a novel susceptibility locus for early childhood asthma with severe exacerbations. The CDHR3 gene is highly expressed in airway epithelium and was shown in a subsequent study to be a receptor for rhinovirus C, important for both binding and replication of the virus. Thus, novel therapeutics targeting this gene product may alleviate the burden of acute virus-induced exacerbations in children carrying the risk variant.
Another important field in asthma genetics is pharmacogenomics, the study of the role of genetic determinants in variable, inter-individual responses to medications. Pharmacogenomic studies are of particular interest because up to one-half of children with asthma do not respond to treatment with inhaled β2-agonists, leukotriene modifiers, or inhaled corticosteroids. There have been numerous studies and findings, including ADRB2 and CRISPLD2, the latter having been shown to regulate the anti-inflammatory effects of corticosteroids in airway smooth muscle cells.
All of the above findings highlight how genetic studies in asthma have provided important, clinically applicable knowledge that CRISPR-based approaches may exploit in the future.
Ocular genetic disease offers distinct benefits as a test bed for genome engineering. A high proportion of the genes causing ocular diseases have been identified, and many of these diseases result from mutations in a single gene [158, 159]. In addition, the eye has unique anatomical and physiological qualities that make it amenable to treatment: it is easily accessible, has a small surface area and holds immune-privileged status, making ocular diseases an ideal system in which to develop CRISPR/Cas9 gene therapy.
Gene therapy for recessive retinal diseases, which are largely caused by loss-of-function mutations, is more advanced than that for dominant, gain-of-function diseases. There are several ongoing clinical trials for retinal diseases, including choroideremia, Leber congenital amaurosis (LCA), retinitis pigmentosa, Usher syndrome, and Stargardt disease [161, 162, 163, 164, 165]. These therapies all employ a gene-replacement strategy in which a functional copy of the gene is introduced into target cells by either adeno-associated virus (AAV) or lentiviral vectors.
Gene replacement is not always a viable approach: vector carrying capacity restricts the spectrum of disorders that can be treated and, although lentivirus has a larger carrying capacity, its potential to integrate into the genome raises safety concerns. A much more attractive strategy would be to correct the defect itself using CRISPR technology. Editas Medicine has planned a clinical trial for LCA in which CRISPR will be used to delete a cryptic splice site and restore normal splicing, and has subsequently announced plans for a similar trial in Usher syndrome.
An innovative allele-specific approach emerged when Courtney et al. identified the potential to exploit a mutation that generates a novel PAM to achieve allele specificity. Although this work focused on corneal dystrophy, the technique has also been applied to retinal disease by Bakondi et al. This approach provides a highly specific treatment strategy for certain autosomal dominant disorders. As CRISPR technology develops at a rapid pace, it is conceivable that an array of therapeutics will soon emerge that allow safe and efficient correction of a range of genetic defects.
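The allele-specific strategy rests on a simple sequence comparison: if a pathogenic substitution creates a PAM that is absent from the wild-type allele, a guide using that PAM should target only the mutant allele. The sketch below finds forward-strand SpCas9 (NGG) PAMs present in a mutant allele but not in the reference. The function names and sequences are hypothetical, only substitutions are handled, and a real design would also scan the reverse strand and require 20 nt of protospacer upstream of the PAM.

```python
import re

def spcas9_pam_starts(seq):
    """Set of 0-based start positions of NGG PAMs on the forward strand."""
    return {m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq.upper())}

def allele_specific_pams(ref_seq, alt_seq):
    """PAM positions present in the mutant allele but absent from the reference.

    Assumes a single substitution, so both alleles have equal length and
    positions are directly comparable.
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("this sketch handles substitutions only")
    return sorted(spcas9_pam_starts(alt_seq) - spcas9_pam_starts(ref_seq))

ref = "ACTTCAGTCAAGTCATCA"  # hypothetical wild-type allele (no NGG)
alt = "ACTTCAGGCAAGTCATCA"  # T>G substitution at index 7 creates 'AGG'
print(allele_specific_pams(ref, alt))  # prints: [5]
```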
The future for ocular disorders looks bright and, as we begin to understand the key players and interactions in complex disease, treatment strategies using genome editing technologies will become apparent. The optimisation groundwork laid using well-characterised diseases as models should allow a smooth translation to treatment.
In the field of cancer, the primary issue in the future will be tumour heterogeneity and how it complicates treatment strategies. The revelation that a single tumour biopsy in fact contains multiple distinct tumour cell populations was a pivotal moment in cancer research. Since then, a variety of studies have confirmed that metastases from the primary tumour invariably represent only one or a few subpopulations. The concept of clonal evolution in cancer dates back to 1976 and has been adopted in the field to explain these recent findings [172, 173]. This is a startling realisation when one considers the implications for personalised medicine: whilst we may be able to identify a metastatic clone with a key driver mutation and eradicate it with a specific drug or therapy (if available), where the primary tumour is highly heterogeneous, eradicating the initial metastatic clone may merely pave the way for a different clone to emerge, which may require an entirely different treatment strategy [168, 172]. Thus, tumour heterogeneity and its driver, genomic instability, have been, and will continue to be, key research foci.
Identifying and functionally validating such driver mutations among the large number of passenger mutations therefore remains an ongoing challenge. Genome editing technology such as CRISPR/Cas9 is going some way towards addressing this. It is now possible to reproduce the complex genome states observed in human tumours, such as translocations and inversions, as well as point mutations and deletions, in both cell lines and mouse models. Until recently, generating cancer mouse models was slow, laborious, and costly, requiring the injection of genetically modified embryonic stem cells into blastocysts. CRISPR has enabled the generation of both germline and somatic knockout and knock-in mouse models in as little as four weeks.
Taking breast cancer as just one example, CRISPR has facilitated the discovery of point mutations conferring endocrine therapy resistance and, in doing so, has helped researchers understand the underlying mechanism. Further, CRISPR-engineered mouse models have been used to identify secondary mutations that confer resistance to PARP inhibitors in initially responsive BRCA1- and BRCA2-mutant cancers. Others have shown that, in a HER2-positive model, a CRISPR-induced mutation within an amplified HER2 region instead has a dominant negative effect, inhibiting cell growth via the MAPK/ERK axis with no effect on HER2 protein levels. That this response is potentiated by PARP inhibition, and is distinct from the pathways targeted by current HER2 therapies such as Trastuzumab, gives some idea of the potential of CRISPR-mediated engineering for identifying new therapeutic targets. However, whilst cancer research has been catapulted forward by the discovery of CRISPR, delivery of Cas9 remains a significant obstacle, both in generating cancer mouse models and in delivering therapeutic Cas9/guide RNA systems to treat cancer.
Our desire to understand the genome better over the past 3 decades has been the main driver of technological development in this area. Now that we have achieved a greater understanding, we are realising that the genome is not the end of the line in terms of understanding disease. Indeed, one could argue that understanding DNA has opened a Pandora’s box and that the real work has only just begun. Fortunately, the technological advances that allowed us to understand the genome have indirectly given us opportunities to study what lies beyond it, specifically the transcriptome and epigenome (see Table 2 for a list of these), and further layers besides.
One striking revelation from the deluge of data already produced in the biomedical sciences is how much we do not yet understand about disease and how much work remains. Biological data are complex, with diverse internal structures that scientists have struggled to interpret using traditional methods and approaches. Moreover, whereas we are attempting to define how life within the cell functions in a relatively short space of time in order to better understand disease, life itself has had millions of years for various processes to diversify and become ‘fixed’, giving rise to the wide diversity of life we now see. The main players in this diversity are the genome, transcriptome, epigenome, and environment, and the number of possible configurations among these is practically limitless.
Cardiovascular disease and gene editing. Status: gene editing’s clinical utility in the cardiovascular realm
Cardiovascular disease (CVD) encompasses acute coronary syndrome (ACS), acute myocardial infarction (AMI), angina, arrhythmia, atherosclerosis, congestive heart failure (CHF), coronary artery disease (CAD), and myocardial ischemia, among other conditions. In the USA, approximately 700,000 people suffer their first AMI and 500,000 a second or recurrent AMI each year, and 1.7 million are hospitalised annually due to ACS. Clinical laboratories play a vital role in detecting and characterising cardiovascular risk, and a gamut of tests is already available for this purpose. For example, cardiac troponin is an important test for detecting myocardial injury, whilst B-type natriuretic peptide (BNP) and the N-terminal portion of proBNP (NT-proBNP) are used to detect CHF and the risk of an acute event. Numerous other biomarkers are used to monitor various cardiovascular conditions.
However, not all biochemical tests are accurate; for example, half of AMIs are known to occur in individuals with normal lipid panels. The lipid panel (total, LDL, and HDL cholesterol, as well as triglycerides), together with apolipoproteins (ApoA1 and ApoB), Lp(a), hsCRP, homocysteine, and Lp-PLA2, is used to manage and monitor coronary heart disease (CHD). These tests can all be run using commercially available reagents on various biochemical analysers, some of which may give inaccurate results, possibly owing to the complexity and instability of lipid molecules. To improve the quality of results, alternative and more accurate methods have been developed to measure HDL and LDL subclasses, such as: (1) the β-quantification method, i.e., the reference method of the U.S. National Cholesterol Education Program (NCEP); (2) gradient gel electrophoresis (GGE) [245, 246]; (3) the vertical auto profile (VAP); (4) nuclear magnetic resonance (NMR) spectroscopy; (5) ion mobility (IM); and (6) high-performance liquid chromatography (HPLC).
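Part of the variability in routine results may arise because LDL cholesterol is frequently calculated from the other lipid panel values rather than measured directly. As an illustration not drawn from the studies cited here, the widely used Friedewald equation estimates LDL cholesterol as total cholesterol minus HDL cholesterol minus triglycerides/5, with all values in mg/dL; it is generally considered unreliable when triglycerides are 400 mg/dL or higher. The function name and example values below are hypothetical.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) using the Friedewald equation.

    LDL = total cholesterol - HDL - triglycerides / 5, all in mg/dL.
    Returns None when triglycerides >= 400 mg/dL, where the estimate is
    generally considered unreliable and direct measurement is preferred.
    """
    if triglycerides >= 400:
        return None
    return total_chol - hdl - triglycerides / 5

print(friedewald_ldl(200, 50, 150))  # prints: 120.0
print(friedewald_ldl(220, 40, 450))  # prints: None (triglycerides too high)
```

When values are expressed in mmol/L, the triglyceride term is usually divided by 2.2 instead of 5.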
Improved pharmacological therapy has lessened the impact of cardiovascular disease; however, limitations including poor patient compliance, side effects, and the need for repeat procedures leave many patients symptomatic. Gene and stem cell therapies used in combination have shown promise in animal models of myocardial ischemia. CRISPR/Cas9-mediated loss-of-function editing of proprotein convertase subtilisin/kexin type 9 (PCSK9) has also been shown to reduce LDL cholesterol levels, with the potential to protect against cardiovascular disease. The major advantage of gene therapy is that permanent benefits can be obtained from a single administration, and with advances in molecular research, further genes associated with lipoproteins and CVD risk have been discovered, e.g., APOA1, APOA5, APOE, CETP, GALNT2, LIPC, LPL, and MLXIPL, which may prove to be future targets of gene therapy.
Current gene therapy clinical trials have demonstrated short-term safety; however, long-term safety over a period of decades remains under investigation. The cost-effectiveness of gene therapy must also be considered, given the laborious nature of the procedures. Current pharmacological approaches may still offer a more favourable cost-benefit ratio, at least for cardiovascular disease treatment.
T-cell acute lymphoblastic leukaemia. Status: solved
In T-cell acute lymphoblastic leukaemia (T-ALL), 25% of cases exhibit high expression of the TAL1 oncogene, owing to a large deletion at 1p33 that brings the coding sequences of TAL1 (a transcription factor) into proximity with the promoter of STIL, a ubiquitously expressed gene. This results in ectopic overexpression of TAL1, which drives the cancer. In many cases of T-ALL, however, TAL1 overexpression is observed without the large deletion; in these cases, H3K27ac marks (characteristic of enhancer regions) are also found upstream of TAL1. Despite this information, the exact disease mechanism in these cases remained elusive for many years. Mansour and colleagues studied these cases and found small heterozygous insertions of varying lengths in the same region as the previously identified H3K27ac marks. These insertions introduced new binding sites for the MYB transcription factor family, resulting in TAL1 overexpression and driving the cancer.
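The mechanism identified by Mansour and colleagues, an insertion that creates a new transcription factor binding site, can be illustrated with a simple motif scan comparing the reference sequence with the insertion allele. The sketch assumes the consensus YAACNG (Y = C or T; N = any base) as a stand-in for the MYB binding site and scans both strands. The function names and sequences are hypothetical, and genuine analyses would use position weight matrices and experimental evidence such as ChIP-seq.

```python
import re

# Assumed MYB consensus YAACNG, written as a zero-width lookahead so that
# overlapping matches are counted
MYB_MOTIF = re.compile(r"(?=[CT]AAC[ACGT]G)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_hits(seq):
    """Count motif matches on both strands of a DNA sequence."""
    seq = seq.upper()
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    return len(MYB_MOTIF.findall(seq)) + len(MYB_MOTIF.findall(rev_comp))

def insertion_creates_motif(ref_seq, insert_pos, insertion):
    """True if inserting `insertion` at insert_pos adds motif matches."""
    alt_seq = ref_seq[:insert_pos] + insertion + ref_seq[insert_pos:]
    return motif_hits(alt_seq) > motif_hits(ref_seq)

ref = "GGATCCGAGCTCGGA"  # hypothetical upstream region with no motif
print(motif_hits(ref))                             # prints: 0
print(insertion_creates_motif(ref, 7, "CAACTG"))   # prints: True
```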
Conclusion: The Mansour study shows how data on DNA, RNA, and DNA-binding interactions can be combined to clearly define a disease mechanism. In this example, the intergenic upstream insertion variants (DNA), the heightened expression of TAL1 (RNA), or the acetylation marks (DNA-binding interactions), observed alone, would not explain the disease mechanism. The Mansour study, although difficult and representing years of work, was made relatively easier by the fact that only a single gene, TAL1, was involved; technically, therefore, no expert analytics or bioinformatics input was required. For complex diseases such as most other cancers and cardiovascular diseases, however, describing disease mechanisms is extremely difficult because any number of variants (be they SNPs, insertions, deletions, translocations, or copy number variants) may contribute to disease risk, none of them individually contributing much to the disease phenotype. Thus, for complex diseases, there is much room for computational methods to help define disease mechanisms clearly, but this requires an appreciation that extends beyond the genome alone.
Many thanks to John Mattick (Genomics England & Garvan Institute of Medical Research) and David Guttery (University of Leicester) for their advice on shaping the structure of the review.
KB conceived the original idea to compose the review, formed and managed the collaboration, wrote the background, conclusions, and Table 6, provided additional text to link all contributors’ sections together, produced the artworks, and provided final editing across all sections. LDD wrote the section on technology, and Table 2 together with KB. KAC, MAN, and CBTM wrote the section on gene editing and CRISPR, and ocular genetics. SS, VH, LC, and JS wrote the section on cancer and Table 1 together with KB. TK-D wrote Table 3 on CRISPR’s utility in bees. CCS wrote Table 5 on cardiovascular disease. BC, JAL-S, and RSK jointly wrote the section on asthma and Table 4. All authors have reviewed and approved the final version of the review.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.