Bionano Genome Mapping: High-Throughput, Ultra-Long Molecule Genome Analysis System for Precision Genome Assembly and Haploid-Resolved Structural Variation Discovery
Next Generation Sequencing (NGS) has rapidly advanced genomic research by reading fragmented genome content in a massively parallel fashion, greatly increasing throughput and reducing cost. We can now sequence genomes and map them to reference sequences with relative ease compared to the past. However, this mapping is accurate only in the single-copy regions of the genome, leaving out most duplicated genes and structural variation. Additionally, assembly of long genomic segments remains elusive, since multi-copy regions of the genome create ambiguity when short-read sequence is used.
Most large genomes are complex. Besides millions of single- or multiple-base variants, called SNPs (single nucleotide polymorphisms) and indels (small insertions and deletions), they contain many thousands of much larger structural variants: repetitive regions composed of identical or similar stretches of sequence, mobile elements such as transposons, and large insertions, deletions, translocations and inversions of up to millions of bases, sometimes altering partial or entire chromosomes. In genomes such as the human genome, more than half of the sequence is composed of these non-unique and highly variable regions, and in certain plants the proportion reaches 90% (Jiao et al. 2017). By studying thousands upon thousands of genomes, we have come to realize that each individual’s genome bears the mark of its own evolutionary journey and environment. This is seen in the sequence differences between the two haplotypes inherited within each family and in ethnicity-specific signatures in populations (Sudmant et al. 2015). Even the genomes of different cells derived from the same zygote carry non-static sequence variation accumulated over a lifetime, sometimes leading to tumorigenesis and contributing to the natural aging process.
It is not enough just to re-sequence each genome by aligning short NGS reads to an existing, relatively contiguous reference genome and calling only SNPs and small indels. To realize the full potential of so-called precision medicine, we need true, accurate and complete de novo genome information to understand how large structural variations might affect biological function. The first step is to develop and identify technologies that can preserve and access native long-range genomic content, including SNPs, small indels and all classes of SVs, without gaps.
Large structural variations (SVs) are less numerous than SNPs and indels in the population, but collectively they account for a much larger number of variant base pairs, although their impact on genetic variation and disease is not yet fully understood. While single nucleotide mutations may affect the roughly 2% of the genome that codes for protein and key small regulatory elements such as transcription factor binding sites, larger structural variations can have additional large effects. They can eliminate, truncate or alter coding regions or regulatory elements directly, and they can change the copy number, position or orientation of genes or promoters, placing them in a different genomic context. Moreover, large SVs can alter the complex three-dimensional folding of chromatin within the cell, and with it the way genomic, epigenomic and protein elements interact dynamically in time and space. However, the prevailing methods cannot comprehensively and cost-effectively detect large structural variations because of the limited read lengths of existing technologies (Huddleston and Eichler 2016).
To address these challenges, Bionano Genomics applies a high-throughput, native, single-molecule genome mapping technology to comprehensively determine genome-wide structure by de novo assembly of long molecules (>150 kb) that are labeled at a specific sequence motif and linearized in massively parallel nanofluidic channels fabricated on a solid-state material (Lam et al. 2012). With this technology, structurally accurate whole-genome de novo assemblies can be generated. Typically, compared with the human reference, thousands of SVs (>500 bp) are found in a single human genome. Because ultra-long read technology is so new, only a fraction of these SVs are currently recorded in SV databases, while a large portion are novel. Because the molecules are so long, haplotypes that preserve native structural information can be phased in a straightforward fashion by clustering molecules that share the same label patterns. In addition, calls can be validated directly against the raw images of the long molecules that support them, rather than being inferred by an algorithm (Bickhart et al. 2017). Without the extensive processing of a typical sample prep, such as PCR, adaptor addition, cloning or library construction, the longest possible genomic DNA molecules (150 kb to megabases) are isolated directly from cells, labeled and imaged at the single-molecule level, preserving the integrity of native information at the genomic and epigenomic level (Grunwald et al. 2017).
The genome mapping technology enabled by NanoChannel arrays, Bionano optical mapping, provides valuable genome-wide information, without a priori knowledge, on endogenous highly variable regions such as those related to immunity (MHC, KIR, TCR, etc.) as well as on exogenous elements such as free or integrated viral sequence (Cao et al. 2014). Furthermore, dynamic intracellular genomic events such as DNA replication can be imaged on this platform. DNA replication is often implicated as a major source of genomic errors that can eventually cause genome instability and cancer (Klein et al. 2017).
Bionano maps often reach chromosome arm lengths and are therefore highly informative for de novo genome assembly projects where they scaffold fragmented NGS assemblies and correct assembly errors. Many of the resulting assemblies are among the most contiguous and accurate assembled to date.
By employing Bionano mapping’s long-range genome analysis, large SVs can be identified in each individual and across multiple ethnic populations where population-specific structural variation sets are seen. These results highlight the need for a comprehensive set of alternate haplotypes derived from different populations to resolve structural variation patterns in complex regions of the genome, providing evidence for population genomic based diagnosis and drug development.
Bionano mapping is a high-throughput, high-fidelity and versatile platform with high potential to transform clinical cytogenetic and genetic analysis through fully automated, standardized and cost-effective workflows. It has recently been demonstrated that large SVs can be comprehensively profiled in prostate cancer samples, where novel potential causal events were discovered efficiently de novo (Jaratlerdsiri et al. 2017). In a separate study of rare and undiagnosed diseases, a 5.1 Mbp inversion in the genome of a patient with Duchenne Muscular Dystrophy was discovered with Bionano technology in a single one-week experiment, establishing the molecular mechanism of disease: truncation of the dystrophin gene (Barseghyan et al. 2017). This inversion had previously evaded a wide range of standard clinical and molecular tests.
These clinical studies demonstrate the potential of routine, comprehensive genomic analysis for complex diseases in the precision medicine era.
Existing technologies, including chromosomal microarrays and whole genome sequencing, diagnose less than 50% of patients with genetic disorders (Lee et al. 2014; Miller et al. 2010), leaving the majority of patients without a molecular diagnosis. Undiagnosed disorders are individually rare, but their combined incidence and the associated diagnostic odyssey, with resulting delays in treatment, are a drain on families and the healthcare system. Many of these diseases remain medical mysteries with no known root cause or clear basis for treatment.
To close this diagnostic sensitivity gap and better understand the genetic causes of disease, we need tools that access the entire genome, and large translational research studies that apply these tools to discover novel biomarkers. Genetic disorders with no currently known molecular basis are caused either by genomic events that current technology detects poorly, by events in inaccessible parts of the genome, or by combinations of events too complex to analyze with existing tools. Molecular tools that analyze the entire range of genomic variation are therefore needed, and, armed with such tools, large translational studies can identify disease-correlated biomarkers spanning all classes of genomic variants in patients with genetic disorders.
Structural variants make up the majority of human genomic variation in terms of base pairs affected, but Next-Generation Sequencing technology often cannot identify them correctly. Clinical exome sequencing solves about 30% of rare diseases (Lee et al. 2014). NGS, including Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS), reliably identifies single nucleotide variants and small insertions and deletions. However, NGS relies on short-read sequences that are mapped to a reference human genome, and it fails to identify most large insertions, deletions, or copy-number variations in repetitive regions of the genome. Nor can it easily detect other structural variations (SVs) such as inversions and translocations. Non-allelic homologous recombination between repetitive sequences is thought to be a predominant mechanism for the origin of many large SVs, and the non-unique sequences flanking these SVs often make them invisible to sequencing-based detection methods. Together, structurally variable regions cover 13% of the genome, and two individuals can differ by structural variation covering as much as 30 Mbp (Sudmant et al. 2015).
Mechanism of Bionano Technology and Workflow
Ultra-Long Range Linear DNA Analysis Technology Enabled by NanoChannel Array Technology
Since structurally accurate genome interrogation and assembly require long molecules, traditional purification methods are not suitable for isolating DNA for optical mapping. Bionano Genomics therefore adapted the plug lysis strategy commonly used to construct BAC libraries. Briefly, cells or nuclei are embedded in an agarose matrix that protects the DNA from mechanical shearing during purification. The agarose is then melted and solubilized, and the resulting megabase DNA is further cleaned by drop dialysis before labeling at sequence-specific sites.
Megabase-size genomic DNA molecules are labeled at a specific 6- or 7-base-pair sequence motif, which occurs approximately 8–28 times per 100 kbp depending on its frequency in a particular genome. The label patterns allow each long molecule to be uniquely identified and aligned.
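Because label density depends on how often the motif occurs in a given genome, it can be estimated directly from sequence. The sketch below counts motif occurrences on both strands and normalizes to 100 kbp. The motif strings in the example are illustrative assumptions rather than an enzyme specification, and the sketch assumes every site is labeled, ignoring labeling efficiency.

```python
# Sketch: estimate label density for a labeling motif from DNA sequence.
# Assumptions: the motifs used in the example are illustrative, and every
# motif occurrence is assumed to be labeled.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions where the motif occurs on either strand."""
    seq, motif = seq.upper(), motif.upper()
    hits = set()
    # A palindromic motif equals its reverse complement; the set prevents
    # counting the same site twice.
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def labels_per_100kb(seq: str, motif: str) -> float:
    """Label density if every site were labeled."""
    return len(find_sites(seq, motif)) * 100_000 / len(seq)


# Toy usage; a real analysis would read a genome FASTA file.
toy = "GCTCTTCAAGAAGAGC"
print(find_sites(toy, "GCTCTTC"))  # [0, 9]
```

Counting both strands matters because a non-palindromic motif on the bottom strand appears in the top-strand string as its reverse complement.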
Labeled DNA is loaded onto the Saphyr chip, which is placed into the Saphyr instrument. The instrument initiates electrophoresis to move megabase-length molecules from bulk solution into the microenvironment of the silicon chip, where the DNA is unwound and linearized in the NanoChannel arrays. The instrument uses machine learning at the start of and throughout the run to adapt DNA loading, optimize run conditions and maximize throughput.
When molecules are fully loaded into the NanoChannels of one flow cell, electrophoresis is halted and the entire surface of the NanoChannel array of that flow cell is rapidly imaged. During the imaging phase of the run, electrophoresis in the second flow cell is initiated. Cycles of loading of the NanoChannels followed by imaging are performed until sufficient data is collected.
Bioinformatics of Bionano Mapping Using Sequence Motif Pattern Specific Labeling
Bionano image detection software extracts molecules from the raw image data. The backbone stain signal of the DNA molecules is used to identify molecules and to determine their position and size. The distances between the labels on each molecule are recorded in an extracted-molecule file called a BNX file. The BNX file is the only input needed for the Bionano de novo assembly process.
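To make the idea of an extracted-molecule file concrete, the sketch below reads a simplified, BNX-like text record and turns label positions into inter-label distances. The layout is a hypothetical simplification loosely modeled on BNX (a molecule line starting with `0`, a label line starting with `1` whose last value repeats the molecule length); consult the official BNX specification for the real columns.

```python
# Sketch: read a simplified, BNX-like record and compute label intervals.
# Hypothetical layout (not the full BNX specification):
#   '#...'                          -> header/comment, skipped
#   '0 <molecule_id> <length_bp>'   -> molecule record
#   '1 <pos> <pos> ... <length_bp>' -> label positions; last value is the length
from dataclasses import dataclass


@dataclass
class Molecule:
    molecule_id: str
    length: float
    labels: list

    def intervals(self) -> list:
        """Distances between consecutive labels: the pattern used for alignment."""
        return [b - a for a, b in zip(self.labels, self.labels[1:])]


def parse_bnx_like(text: str) -> list:
    molecules = []
    pending = None  # (molecule_id, length) waiting for its label line
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "0":
            pending = (fields[1], float(fields[2]))
        elif fields[0] == "1" and pending is not None:
            positions = [float(x) for x in fields[1:]]
            molecule_id, length = pending
            # Drop the trailing value, which repeats the molecule length.
            molecules.append(Molecule(molecule_id, length, positions[:-1]))
            pending = None
    return molecules


example = """# hypothetical two-line record
0 mol1 50000
1 1200 9800 21000 44000 50000
"""
print(parse_bnx_like(example)[0].intervals())  # [8600.0, 11200.0, 23000.0]
```

Storing intervals rather than absolute positions makes the pattern independent of where imaging happened to start on the molecule.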
Images generated by Saphyr are sent to the analysis server for real time data extraction during the run. Image detection is typically completed shortly after the run is finished and de novo assembly can be automatically initiated (for human genomes).
Using pairwise alignment of single molecules, an assembly graph is constructed, and a consensus genome map is produced, refined, extended and merged. Where there is heterozygous structural variation, molecules are then clustered into two alleles, and a diploid assembly is created to allow heterozygous SV detection. Genome maps can also be created with different enzymes labeling different sequence motifs, to generate broader coverage and higher label density.
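The pairwise alignment step compares label-interval patterns between molecules. The toy sketch below slides one molecule’s intervals against another’s and counts intervals that agree within a relative sizing tolerance. It assumes an ungapped alignment and a 5% tolerance, both illustrative choices; production optical-map aligners use dynamic programming with models for sizing error and missing or extra labels.

```python
# Toy pairwise overlap between two molecules' label-interval patterns.
# Assumptions: ungapped alignment, relative sizing tolerance of 5%.


def intervals_match(a: float, b: float, rel_tol: float = 0.05) -> bool:
    """True if two interval lengths agree within a relative tolerance."""
    return abs(a - b) <= rel_tol * max(a, b)


def best_overlap(x: list, y: list, min_overlap: int = 3, rel_tol: float = 0.05):
    """Return (offset, matched) for the shift of y against x with most matches.

    offset k places y[0] against x[k]; k is negative when y starts first.
    Returns (None, 0) if no sufficiently long overlap has any match.
    """
    best = (None, 0)
    for k in range(-(len(y) - 1), len(x)):
        overlap = matched = 0
        for j, y_len in enumerate(y):
            i = j + k
            if 0 <= i < len(x):
                overlap += 1
                if intervals_match(x[i], y_len, rel_tol):
                    matched += 1
        if overlap >= min_overlap and matched > best[1]:
            best = (k, matched)
    return best


x = [8600, 11200, 23000, 5000, 14000]
y = [11300, 22800, 5050, 9000]
print(best_overlap(x, y))  # (1, 3)
```

The unmatched last interval in the example (9000 vs 14000) is the kind of local disagreement that, when supported by many molecules, points to a structural difference rather than a bad overlap.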
A standard automated pipeline for de novo assembly and SV calling was developed by Bionano Genomics to enable comprehensive SV analysis. The Python-based pipeline manages job submission, drives execution of alignment and assembly tools, and provides data summary information. It features a haplotype-aware assembler designed to detect and differentiate parental alleles. The de novo assembly algorithm is a custom implementation of the overlap-layout-consensus strategy. The assembler assembles extracted molecules from raw image data, and the final consensus maps are used as input for SV calling.
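In an overlap-layout-consensus assembler, the final step turns many laid-out molecules into one consensus map. The sketch below assumes molecules have already been placed on a common frame, so each row lists the interval a molecule shows for each consensus slot, or `None` where it has no coverage. Taking the per-slot median is an illustrative choice that resists single-molecule sizing outliers; it is not Bionano’s refinement algorithm.

```python
# Toy consensus step for label-interval maps.
# Assumption: layout rows are already aligned to common consensus slots.
from statistics import median
from typing import List, Optional


def consensus_intervals(layout: List[List[Optional[float]]],
                        min_support: int = 2) -> List[Optional[float]]:
    """Per-slot median interval; None where fewer than min_support molecules cover it."""
    n_slots = max(len(row) for row in layout)
    consensus = []
    for slot in range(n_slots):
        observed = [row[slot] for row in layout
                    if slot < len(row) and row[slot] is not None]
        consensus.append(median(observed) if len(observed) >= min_support else None)
    return consensus


layout = [
    [8600, 11200, None],
    [8700, 11100, 23000],
    [None, 11300, 22900],
]
print(consensus_intervals(layout))  # [8650.0, 11200, 22950.0]
```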
Examples of Applications with Bionano Mapping
De novo Assembly of Complex Genomes – Long Contiguity with Accurate Complex Structural Context
Hybrid Assembly Combining Mapping and Sequencing Data Derived from All Platforms – Single and Multiple Sequence Motif-Based Assembly
The de novo Bionano genome maps constitute a whole-genome de novo assembly and can be used to learn about characteristics of the genome such as its size, repetitive content and extent of heterozygosity. They can also be integrated with a sequence assembly to order and orient sequence fragments, identify and correct potential chimeric joins in the sequence assembly, and estimate the size of gaps between adjacent sequences. To do so, the Bionano Solve software imports the sequence assembly and locates the predicted nick sites in the sequence from the recognition motif of the nicking endonuclease. The resulting in silico maps of the sequence contigs are then aligned to the de novo Bionano genome maps. Conflicts between the two are identified and resolved, and hybrid scaffolds are generated in which sequence maps bridge Bionano maps and vice versa. Finally, the sequence assembly corresponding to each hybrid scaffold is generated and exported as FASTA and AGP files.
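Gap sizing follows from placing two adjacent contigs on the same Bionano map. The sketch assumes the label alignments are already known as (contig position, map position) pairs in bp and that both contigs align in forward orientation; each contig’s placement is the mean offset over its aligned labels, and orientation handling and sizing-error models are left out.

```python
# Sketch: estimate the gap between two adjacent contigs on one Bionano map.
# Assumptions: forward orientation only; alignment pairs are given.
from statistics import mean


def placement_offset(pairs) -> float:
    """Map coordinate of contig position 0, averaged over aligned labels."""
    return mean(map_pos - ctg_pos for ctg_pos, map_pos in pairs)


def estimate_gap(pairs_a, length_a: float, pairs_b) -> float:
    """Gap in bp from the end of contig A to the start of contig B.

    A negative value suggests the contigs overlap.
    """
    end_a_on_map = placement_offset(pairs_a) + length_a
    start_b_on_map = placement_offset(pairs_b)
    return start_b_on_map - end_a_on_map


contig_a = [(10_000, 110_000), (35_000, 135_100), (52_000, 151_900)]
contig_b = [(4_000, 178_000), (20_000, 194_000)]
print(estimate_gap(contig_a, 60_000, contig_b))  # 14000
```

Averaging over several labels smooths out the small sizing differences between individual label pairs (here ±100 bp).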
The pipeline is fully integrated with Bionano Access which provides a convenient interface for running Hybrid Scaffold and viewing scaffolding results.
The hybrid scaffolding process considerably reduces the number of contigs found in the initial NGS assembly, improving assembly accuracy and quality while reducing the need for deep sequencing coverage.
The hybrid scaffolding approach can yield significant improvements in contiguity, as expressed by the assembly N50 values. Assembly contiguity can be further increased by performing hybrid scaffolding with maps generated using two separate nicking enzymes. Two sets of Bionano maps, each generated with a different nicking enzyme, can be integrated with the NGS sequences simultaneously. This enables the NGS sequences to act as a bridge that merges single-enzyme Bionano maps into two-enzyme maps containing the sequence motif patterns of both nicking enzymes. Since the Bionano maps are generated independently, they serve as orthogonal sources of evidence to detect and correct assembly errors in the input data. The complementarity of the different data also greatly improves the contiguity of the merged Bionano map while doubling its information density, which substantially increases the ability to anchor short NGS sequences in the final scaffolds.
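N50 is the length L such that sequences of length L or longer together make up at least half of the total assembly length. The short sketch below computes it; the contig lengths in the example are made up to show how joining contigs into scaffolds raises N50 while the total length stays the same (gap sizes are ignored for simplicity).

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that sequences >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


contigs = [100, 80, 50, 30, 20, 10, 10]   # hypothetical contig lengths (kbp)
scaffolds = [230, 30, 20, 10, 10]         # first three contigs joined
print(n50(contigs), n50(scaffolds))  # 80 230
```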
At the time of writing, all published data using Bionano mapping has been generated by labeling DNA with nicking endonucleases. These highly sequence-specific enzymes create a single-stranded nick at each occurrence of a 6- or 7-bp motif. At each nick site, fluorescently labeled nucleotides are incorporated by polymerization and the molecule is repaired. This method (Nick, Label, Repair and Stain, or NLRS) is extremely specific but can create double-stranded breaks when nick sites occur within about 200 bp of each other on opposite strands. Recently, Bionano Genomics has developed a novel labeling technology that avoids nicking and instead uses a direct labeling method in which the fluorophore is attached directly to the DNA at a specific sequence motif. Since this Direct Labeling and Staining (DLS) method does not create systematic double-stranded breaks, Bionano maps created from molecules labeled with DLS typically show a 50x improvement in contiguity compared to NLRS maps. Bionano maps now typically reach chromosome arm length, and sequence assemblies built using DLS reach chromosome arm or full chromosome contiguity in a variety of species.
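The double-stranded break risk of NLRS can be predicted from sequence: for a non-palindromic nicking motif, top-strand nicks fall at occurrences of the motif and bottom-strand nicks at occurrences of its reverse complement. The sketch flags top/bottom pairs within a window, using the ~200 bp figure from the text. The motif in the example is an illustrative assumption, and positions are motif starts rather than exact nick offsets.

```python
# Sketch: flag predicted fragile sites (opposite-strand nicks close together).
# Assumptions: non-palindromic motif; positions are motif start coordinates.
from bisect import bisect_left, bisect_right

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def occurrences(seq: str, pattern: str) -> list[int]:
    """Sorted 0-based start positions of pattern in seq."""
    hits, start = [], seq.find(pattern)
    while start != -1:
        hits.append(start)
        start = seq.find(pattern, start + 1)
    return hits


def fragile_sites(seq: str, motif: str, window: int = 200) -> list[tuple[int, int]]:
    """(top_nick, bottom_nick) pairs lying within `window` bp of each other."""
    seq, motif = seq.upper(), motif.upper()
    top = occurrences(seq, motif)                                  # top-strand nicks
    bottom = occurrences(seq, motif.translate(COMPLEMENT)[::-1])   # bottom-strand nicks
    pairs = []
    for t in top:
        lo = bisect_left(bottom, t - window)
        hi = bisect_right(bottom, t + window)
        pairs.extend((t, b) for b in bottom[lo:hi])
    return pairs


seq = "GCTCTTC" + "A" * 100 + "GAAGAGC" + "T" * 500 + "GAAGAGC"
print(fragile_sites(seq, "GCTCTTC"))  # [(0, 107)]
```

Only the first bottom-strand site falls inside the window; the second, about 600 bp away, would be labeled without a break.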
Error Correction and Validation of Sequencing Data
The Bionano hybrid scaffold pipeline detects and resolves chimeric joins. Chimeric joins are typically formed when short reads, molecules, or paired-end inserts are unable to span across long DNA repeats. The errors appear as conflicting junctions in the alignment between the Bionano map and NGS assemblies.
Automated cuts using Bionano Solve help to resolve conflicts with a high level of accuracy. The majority of cuts made using Bionano Solve can be confirmed by comparison to the species’ reference assembly. There are several reasons why some cuts cannot be confirmed: the reference assembly is incomplete, the two separate input assemblies may represent different alleles, or the chimeric joins may have been caused by segmental duplications that are too long for Bionano molecules to resolve.
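A conflict junction shows up where a contig’s labels stop agreeing with a single, consistent placement on one Bionano map. The toy detector below takes aligned labels as (contig position, map ID, map position), sorted along the contig and in forward orientation, and flags a junction where the map changes or the implied contig-to-map offset jumps by more than a threshold. The 5 kbp threshold and the midpoint cut are illustrative assumptions, not the Bionano Solve conflict-resolution rules.

```python
# Toy conflict detector for hybrid scaffolding (illustrative only).
# Assumptions: forward orientation; aligned labels sorted by contig position.


def find_conflicts(aligned, max_shift: float = 5_000) -> list[float]:
    """Suggested cut positions (contig coordinates) at inconsistent junctions."""
    cuts = []
    for (p1, m1, q1), (p2, m2, q2) in zip(aligned, aligned[1:]):
        offset_jump = abs((q2 - p2) - (q1 - p1))
        if m1 != m2 or offset_jump > max_shift:
            cuts.append((p1 + p2) / 2)  # cut midway between the two labels
    return cuts


aligned = [
    (10_000, "map7", 510_000),
    (30_000, "map7", 530_200),
    (52_000, "map7", 551_900),
    (61_000, "map12", 8_000),    # contig continues on a different map
    (80_000, "map12", 27_100),
]
print(find_conflicts(aligned))  # [56500.0]
```

A real tool must also decide whether to cut the sequence, the map, or neither, which is where the independent second-enzyme evidence helps.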
The two-enzyme scaffolding method improves error correction even further. Since the Bionano maps were generated independently, they serve as orthogonal sources of evidence to detect and correct assembly errors in the input data. Compared to the published one-enzyme hybrid scaffolds, the two-enzyme approach corrects up to 50% more assembly errors in NGS sequences.
Users can manually inspect all conflict resolution results. Bionano Solve records, in a file, the IDs and coordinates of the sequences and maps where conflicts were detected, together with the resolution approach taken for each. This file can be edited and then run again through the hybrid scaffold pipeline to produce a new set of scaffolds based on the manual conflict resolution. This manual refinement can be repeated multiple times, giving users fine control in generating high-quality, complete hybrid scaffolds.
Comprehensive Genomic Structural Variation Discovery and Identification
Detecting All Classes of Structural Variants, Mobile Elements and Repeats, at Haploid Resolved Level
Bionano genome mapping is the only technology that detects all SV types, homozygous and heterozygous, from 500 bp up to millions of bp. Bionano maps are built completely de novo, without any reference guidance or bias. This differentiates Bionano from NGS, where short-read sequences are typically aligned to a reference. This alignment often fails to detect true structural variants because it forces short reads to map to an incorrect or too divergent reference, or excludes mismatched reads from the alignment. Only de novo constructed genomes, like Bionano maps, allow for a completely unbiased, accurate assembly.
Bionano’s SVs are observed, and not inferred as with NGS. When short-read NGS sequences are aligned to the reference genome, algorithms piece together sequence fragments in an attempt to rebuild the actual structure of the genome. SVs are inferred from the fragmented data, with mixed success. With Bionano mapping, megabase-size native DNA molecules are imaged, and most large SVs or their breakpoints (in the case of inter-chromosomal translocations) can be observed directly in the label pattern on the molecules. If a native-state DNA molecule with a specific SV exists, then that SV call cannot be wrong.
Gain/Loss of material: Labels moving closer together, with or without loss of labels, are evidence of deletions. Increased label spacing, with or without additional labels, is called as an insertion.
Copy number change: Expansions or contractions of tandem arrays or segmental duplications. Duplications are called automatically in direct or inverted orientation.
Balanced events: Genome maps aligning partially with two or more different chromosomes or genomic locations indicate translocations. When label patterns are inverted relative to the reference, an inversion is called.
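The gain/loss rule in the first item can be expressed as a comparison of the distance between the same two aligned flanking labels in the sample and in the reference. The toy classifier below assumes those anchor labels are already aligned and uses the 500 bp lower size limit quoted in this section as its threshold; real callers additionally model sizing error and require statistical support across molecules.

```python
# Toy gain/loss classifier between two aligned flanking labels.
# Assumption: anchors are already aligned; 500 bp threshold is illustrative.


def classify_indel(ref_interval: float, sample_interval: float,
                   min_size: float = 500) -> tuple[str, float]:
    """Return (call, size_bp) from the change in spacing between anchor labels."""
    delta = sample_interval - ref_interval
    if delta >= min_size:
        return ("insertion", delta)    # labels moved apart: material gained
    if delta <= -min_size:
        return ("deletion", -delta)    # labels moved closer: material lost
    return ("no call", abs(delta))     # within the size threshold


print(classify_indel(12_000, 18_500))  # ('insertion', 6500)
print(classify_indel(12_000, 9_000))   # ('deletion', 3000)
```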
Zygosity and confidence are assigned to each SV call to facilitate downstream analysis. An SV call can be labeled as homozygous, heterozygous, or unknown. Confidence scores are scaled such that they range from 0 to 1.
SV calls can be exported in a dbVar-compliant VCF file. The VCF format can hold all classes of genomic variants identified in a sample, including SNVs, small indels and SVs of various sizes, so the VCF file generated by Bionano Access can be used in downstream analysis with a variety of existing tools.
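For readers unfamiliar with how SVs appear in VCF, the sketch below formats a call as a data line using the standard VCF 4.2 conventions for structural variants: a symbolic ALT allele such as `<DEL>` plus `SVTYPE`, `END` and `SVLEN` INFO keys, with SVLEN negative for deletions. The placeholder REF base `N`, setting END equal to POS for insertions, and the `CONF` INFO key are simplifying assumptions (a custom key would need its own `##INFO` header line), and a complete file also needs the `##fileformat` and other meta-information lines. This is not the Bionano Access exporter.

```python
# Sketch: format an SV call as a VCF 4.2 data line (illustrative only).
# 'CONF' is a hypothetical INFO key; REF is a placeholder 'N'.

VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "SAMPLE"]
GENOTYPES = {"homozygous": "1/1", "heterozygous": "0/1", "unknown": "./."}


def sv_to_vcf_line(sv: dict) -> str:
    svtype = sv["svtype"]  # one of "DEL", "INS", "INV", "DUP"
    svlen = -sv["size"] if svtype == "DEL" else sv["size"]
    end = sv["pos"] if svtype == "INS" else sv["pos"] + sv["size"]
    info = f"SVTYPE={svtype};END={end};SVLEN={svlen};CONF={sv['confidence']:.2f}"
    genotype = GENOTYPES.get(sv["zygosity"], "./.")
    fields = [sv["chrom"], str(sv["pos"]), sv["id"], "N", f"<{svtype}>",
              ".", "PASS", info, "GT", genotype]
    return "\t".join(fields)


call = {"id": "sv1", "chrom": "chr1", "pos": 1_000_000, "svtype": "DEL",
        "size": 6_500, "zygosity": "heterozygous", "confidence": 0.93}
print("\t".join(VCF_COLUMNS))
print(sv_to_vcf_line(call))
```

Mapping the zygosity labels (homozygous, heterozygous, unknown) to `GT` values is what lets downstream tools treat these calls like any other genotyped variant.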
Bionano algorithms call SVs by comparing genome structures. To identify a structural variation, a de novo genome map assembly can be aligned to a reference genome, or two samples can be aligned to each other directly. When aligning a genome map to a reference assembly, Bionano software identifies the location of the same recognition sequence used to label the DNA molecules in the reference genome and aligns matching label patterns in the sample and reference. This alignment provides all the annotation of the reference to the de novo assembled genome.
By observing changes in label spacing and comparisons of order, position, and orientation of label patterns, Bionano’s automated structural variation calling algorithms detect all major structural variation types.
Bionano detects seven times more SVs larger than 5 kbp compared to NGS. Professor Pui-Yan Kwok at the University of California, San Francisco, demonstrated the robustness of Bionano mapping for genome-wide discovery of SVs in a trio from the 1000 Genomes Project. Since high quality NGS data on these samples is publicly available, structural variation analysis using short-read data has been performed with over a dozen different algorithms. Using Bionano maps, hundreds of insertions, deletions, and inversions greater than 5 kbp were uncovered, 7 times more than the large SV events previously detected by NGS (Mak et al. 2016). Several are located in regions likely leading to disruption of gene function or regulation.
Bionano mapping has exceptional sensitivity and specificity for detecting insertions and deletions over a wide size range, as demonstrated using simulated data. Insertions and deletions were randomly introduced into an in silico map of the human reference genome hg19. The simulated events were at least 500 kbp from each other and from N-base gaps. They ranged from 200 bp to 1 Mbp, with smaller SVs more frequent than larger ones.
Based on the edited and the unedited hg19, molecules were simulated to resemble actual molecules collected on a Bionano system and mixed such that all events would be heterozygous. Two sets of molecules were simulated, each labeled with a different nicking endonuclease. Datasets with 70x effective coverage were generated. The simulated molecules were used as input to the Bionano Solve pipeline and SV calls were made by combining the single-enzyme SV calls from both nicking endonucleases using the SV Merge algorithm. SV calls were compared to the ground truth.
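Comparing calls to ground truth requires a matching rule. The sketch below assumes an SV is a (chromosome, start, end, type) tuple and that a call matches a truth event of the same type on the same chromosome when both breakpoints fall within a tolerance; matching is greedy and one-to-one. The 10 kbp tolerance and the example events are arbitrary illustrations, not the criteria used in the study.

```python
# Sketch: score SV calls against simulated ground truth.
# Assumptions: SV = (chrom, start, end, type); breakpoint tolerance is arbitrary.


def matches(call, truth, tol: int) -> bool:
    return (call[0] == truth[0] and call[3] == truth[3]
            and abs(call[1] - truth[1]) <= tol
            and abs(call[2] - truth[2]) <= tol)


def score(calls, truths, tol: int = 10_000) -> tuple[float, float]:
    """Return (sensitivity, precision) using greedy one-to-one matching."""
    used = set()
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in used and matches(call, truth, tol):
                used.add(i)
                true_positives += 1
                break
    sensitivity = true_positives / len(truths) if truths else 0.0
    precision = true_positives / len(calls) if calls else 0.0
    return sensitivity, precision


truths = [("chr1", 100_000, 106_000, "DEL"), ("chr1", 900_000, 900_000, "INS"),
          ("chr2", 50_000, 52_000, "DEL"), ("chr3", 10_000, 40_000, "INS")]
calls = [("chr1", 101_500, 107_000, "DEL"), ("chr1", 902_000, 902_000, "INS"),
         ("chr2", 400_000, 405_000, "DEL")]
print(score(calls, truths))
```

Here two of four true events are recovered (sensitivity 0.5) and two of three calls are correct (precision about 0.67).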
Bionano mapping has exceptional sensitivity and specificity to detect heterozygous insertions and deletions over a wide size range as demonstrated using experimental data. Since there is no perfectly characterized human genome that can be considered the ground truth, a diploid human genome was simulated by combining data from two hydatidiform mole derived cell lines. These moles occur when an oocyte without nuclear DNA gets fertilized by a sperm. The haploid genome in the sperm gets duplicated, and the cell lines resulting from this tissue (CHM1 and CHM13) are therefore entirely homozygous.
Structural variants detected in the homozygous cell lines were considered the (conditional) ground truth. An equal mixture of single molecule data from two such cell lines was assembled to simulate a diploid genome, and SV calls made from this mixture were used to calculate the sensitivity to detect heterozygous SVs.
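The logic of this conditional ground truth can be written down directly: in an equal mixture of two haploid genomes, an SV present in both lines should appear homozygous, and one present in only one line should appear heterozygous. The sketch assumes each SV is reduced to (chromosome, start, type) and binned so that the same event called in both lines gets the same key; the 5 kbp bin is an illustrative assumption, and fixed bins can split events that fall near a bin boundary.

```python
# Sketch: expected zygosity in an equal mixture of two homozygous lines.
# Assumptions: SV = (chrom, start, type); events matched by coarse binning.


def sv_key(sv, bin_size: int = 5_000):
    chrom, start, svtype = sv
    return (chrom, round(start / bin_size), svtype)


def expected_zygosity(line_a, line_b, bin_size: int = 5_000) -> dict:
    keys_a = {sv_key(sv, bin_size) for sv in line_a}
    keys_b = {sv_key(sv, bin_size) for sv in line_b}
    truth = {k: "homozygous" for k in keys_a & keys_b}       # in both lines
    truth.update({k: "heterozygous" for k in keys_a ^ keys_b})  # in one line only
    return truth


chm1 = [("chr1", 100_000, "DEL"), ("chr2", 500_000, "INS")]
chm13 = [("chr1", 101_000, "DEL"), ("chr5", 70_000, "DEL")]
print(expected_zygosity(chm1, chm13))
```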
Figure: Two homozygous cell lines, CHM1 and CHM13, were independently de novo assembled, and insertions and deletions >1.5 kbp were called (panels: CHM1 and CHM13 assemblies).
A similar experiment on PacBio long-read sequencing was described recently (Huddleston et al. 2017). Structural variants were called with the SMRT-SV algorithm in CHM1 and CHM13, and compared to those called in an equal mixture of both. The sensitivity to detect homozygous SVs using PacBio was 87%, compared to 99.2% using Bionano. The sensitivity to detect heterozygous SVs using PacBio was only 41%, which is less than half the 86% sensitivity for heterozygous SV detection using Bionano. Even when the PacBio SV calls were limited to insertions and deletions larger than 1.5 kbp, the sensitivity for homozygous SVs was only 78%, and for heterozygous SVs 54% (Table 1).
Bionano genome mapping detects 98% of large inversions. Inversions are the invisible variants and have traditionally been the hardest structural events to detect. They are balanced, without gain or loss of sequence, and unlike translocations they don’t create easily visible changes in genomic context. Inversions often escape detection by traditional cytogenetic techniques: chromosomal microarrays cannot identify balanced events, and metaphase chromosome spreads can only visualize some megabase-size inversions. Next Generation Sequencing approaches tend to miss inversions because reads from inside the inversion map back to the reference without any indication that the orientation has changed. Detection of the breakpoints often fails, especially if the inversion is flanked by segmental duplications, repeat arrays or other non-unique sequences.
Bionano’s imaging of extremely long molecules overcomes these obstacles to identifying inversions. Simulations of thousands of heterozygous inversions of various sizes demonstrated that our SV detection algorithms have high sensitivity to detect inversions larger than 30 kbp, reaching 98% sensitivity to pick up inversions larger than 70 kbp throughout the genome.
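An inversion shows up as a stretch of label intervals that matches the reference read backwards. The toy check below assumes the candidate segment has already been delimited and that its breakpoints lie close to labels, so that intervals inside it are simply reversed; the 5% sizing tolerance is an illustrative choice. It also shows one reason small inversions are harder: a segment with few labels or a symmetric interval pattern matches in both orientations and carries no orientation signal.

```python
# Toy inversion check on label intervals (illustrative only).
# Assumption: segment boundaries known; breakpoints close to labels.


def _close(a: float, b: float, rel_tol: float) -> bool:
    return abs(a - b) <= rel_tol * max(a, b)


def is_inverted(ref_segment, sample_segment, rel_tol: float = 0.05) -> bool:
    """True if the sample matches the reference segment backwards but not forwards."""
    if not ref_segment or len(ref_segment) != len(sample_segment):
        return False
    forward = all(_close(r, s, rel_tol) for r, s in zip(ref_segment, sample_segment))
    backward = all(_close(r, s, rel_tol)
                   for r, s in zip(reversed(ref_segment), sample_segment))
    return backward and not forward


ref = [5000, 12000, 3000, 20000]
sample = [20100, 2950, 12100, 4900]
print(is_inverted(ref, sample))                           # True
print(is_inverted([5000, 8000, 5000], [5000, 8000, 5000]))  # False: symmetric pattern
```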
Bionano far outperforms other technologies in the detection of translocations. Thousands of translocations were simulated similarly to insertions and deletions in an in-silico map of the human reference genome hg19. The sensitivity for heterozygous translocations was shown to be 98% for breakpoint detection in both balanced and unbalanced translocations. Genome mapping can define the true positions of breakpoints within a median distance of 2.9 kbp, which is approximately 1000 times more precise than karyotyping and FISH. This accuracy is often sufficient for PCR and sequencing if single nucleotide resolution of the fusion point is desired for subsequent gene function studies.
Bionano Genomics developed a variant annotation pipeline (VAP) to help prioritize variants and to determine whether a variant is relevant to the disease or phenotype of interest. It is particularly useful for family-based and case-control studies. The two main components of the VAP are (1) variant annotation and (2) variant validation. The pipeline provides gene annotation and compares a given variant with variants detected in phenotypically normal control samples or, in cancer studies, with a matched control sample from the same patient. For a trio analysis, the pipeline annotates whether variants in the proband are found in the parents, to help identify inherited and de novo variants. To validate variants, the pipeline examines assembly quality scores and aligns molecules against the assembly of interest to determine whether the detected variants are well supported.
By using a control database of common variants, the VAP filters the thousands of identified variants down to hundreds that are rare, or to a handful of de novo variants. It also identifies the genes they overlap or lie closest to in the genome. The VAP is part of Bionano Access, which provides an interface for setting up experiments on Saphyr, starting and monitoring instrument runs, launching de novo assemblies and SV calling, visualizing SVs, and annotating variants with the VAP. The results can be exported as a dbVar-compliant VCF file for easy integration with variants identified with NGS or other methods.
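The filtering idea behind the variant annotation pipeline can be sketched in a few lines: drop proband SVs seen in controls, then label the rest as inherited or candidate de novo by checking the parents. The sketch assumes SVs are (chromosome, start, end, type) tuples with start < end and that two SVs are the same event when they share chromosome and type and overlap reciprocally by at least 50%. Both the representation and the 50% threshold are illustrative choices, not the VAP’s actual matching rules.

```python
# Toy trio/control filtering in the spirit of the VAP (illustrative only).
# Assumptions: SV = (chrom, start, end, type) with start < end;
# "same event" = same chrom/type and >= 50% reciprocal overlap.


def reciprocal_overlap(a, b, min_frac: float = 0.5) -> bool:
    if a[0] != b[0] or a[3] != b[3]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def seen_in(sv, others) -> bool:
    return any(reciprocal_overlap(sv, other) for other in others)


def annotate_proband(proband, mother, father, controls):
    """Remove SVs common in controls; label the rest by inheritance."""
    results = []
    for sv in proband:
        if seen_in(sv, controls):
            continue
        inherited = seen_in(sv, mother) or seen_in(sv, father)
        results.append((sv, "inherited" if inherited else "candidate de novo"))
    return results


proband = [("chr1", 100_000, 110_000, "DEL"), ("chr2", 200_000, 260_000, "INV"),
           ("chr3", 5_000, 9_000, "DEL")]
mother = [("chr1", 101_000, 111_000, "DEL")]
father = []
controls = [("chr3", 5_200, 9_100, "DEL")]
for sv, origin in annotate_proband(proband, mother, father, controls):
    print(sv, origin)
```

In the example, the chr3 deletion is filtered out as a common variant, the chr1 deletion is traced to the mother, and the chr2 inversion remains as a candidate de novo event.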
SV Detection in Cancer and Genetic Disease
Bionano mapping correctly diagnoses genetic disorders: In a publication in Genome Medicine, Professor Eric Vilain of Children’s National Medical Center, Washington, DC, presents molecular diagnoses using Bionano mapping of patients with Duchenne Muscular Dystrophy (DMD) (Barseghyan et al. 2017).
Only one-tenth of the large SVs found with Bionano mapping were detected using high-coverage short-read NGS and a combination of the best SV calling algorithms for NGS data. A manual inspection of the NGS reads corresponding to the Bionano-derived target regions verified 94% of the total SVs called with Bionano mapping. Many SVs detected with Bionano were flanked by repetitive sequences, making them all but invisible to short-read sequencing.
Targeted Known SV Detection as Biomarkers in Diagnostics and Companion Tests – Cytogenetics, Immuno-Repertoire Variation Mapping
Custom Labeling of Specific Sequences
A team from Drexel University has published several papers on a novel method to label any sequence of choice before imaging on a Bionano system (McCaffrey et al. 2016, 2017). In vitro, a CRISPR/Cas9 nickase, directed by guide RNAs designed against the sequence of interest, nicks the DNA at that sequence so that it can be specifically labeled. In one application, they labeled human (TTAGGG)n tracts in genomes that had also been barcoded using Bionano’s standard labeling kits. High-throughput imaging and analysis of large single DNA molecules labeled in this fashion on Bionano’s Irys or Saphyr systems permits mapping through subtelomere repeat element (SRE) regions into unique chromosomal DNA while simultaneously measuring the (TTAGGG)n tract length at the end of each large telomere-terminal DNA segment. This enables global, subtelomere- and haplotype-resolved analysis of telomere length at the single-molecule level. Similarly, the team labeled HIV insertion sites and a variety of other repeat sequences.
With this custom labeling method, virtually any part of the genome can be studied in detail with Bionano mapping, even those parts which don’t have identifiable patterns using Bionano’s standard motif labeling.
Targeted Enriched Genomic Regions
Bionano mapping is typically performed on a whole genome scale. To enable collection of higher depth coverage of genomic regions of interest, or map a region much faster, a team from Tel Aviv University published a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis (Gabrieli et al. 2017). The isolated gel fragment is then used in Bionano’s standard DNA isolation and labeling workflow. The result is a highly enriched sample that can be mapped with Bionano or sequenced. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of Bionano maps and sequencing data at a fraction of the cost of whole genome sequencing.
Immune Repertoire Mapping
The MHC region of the genome has a higher density of genes and of identified disease-causing variants than any other part of the human genome. It is prone to rearrangements, and sequencing-based methods are unable to correctly identify and phase the structure of this region. In a Nature Biotechnology paper, the authors describe constructing Bionano maps covering the 4.7 Mbp MHC region from two individuals and performing de novo sequence assembly using NGS reads (Lam et al. 2012). The maps and NGS contigs were then compared to the reference sequences reported by the MHC Haplotype Consortium to confirm them and to uncover potential differences.
Employing this method, the study found and confirmed a number of interesting genomic features, including a 4 kb error in one reference sequence, anchoring and gap sizing of four NGS contigs, identification of misassembled NGS contigs, differentiation of the two HLA-DRB1 variants, and definition of numerous structural variants, such as a 5 kb insertion and 30 kb tandem duplication.
Ultra-Long Range Epigenetic Pattern Mapping
Ultra-long range epigenetic mapping opens up an entirely new field of research: we can now study whether the methylation status of one gene’s promoter influences that of another promoter hundreds of kbp away on the same single molecule. This compares extremely favorably with standard methylation analysis methods, in which DNA is chemically converted using sodium bisulfite, followed by array hybridization or sequencing. Bisulfite conversion damages the DNA, so only highly fragmented molecules can be recovered, and single-molecule methylation patterns can be measured over a few hundred base pairs at most.
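One way to pose the long-range question quantitatively is to tabulate, across molecules that span both promoters, how often their methylation states agree. The sketch assumes each molecule has already been scored at promoter A and promoter B as methylated (`True`), unmethylated (`False`) or not covered (`None`), and summarizes the resulting 2x2 table with the phi coefficient; the scoring step and the example counts are hypothetical.

```python
# Sketch: association between methylation states at two distant promoters,
# measured on single molecules that span both (hypothetical input scores).
from math import sqrt


def phi_coefficient(calls) -> float:
    """Phi coefficient of the 2x2 table of (promoter A, promoter B) states."""
    n11 = n10 = n01 = n00 = 0
    for a, b in calls:
        if a is None or b is None:
            continue  # only molecules covering both promoters are informative
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


calls = ([(True, True)] * 8 + [(False, False)] * 8
         + [(True, False)] * 2 + [(False, True)] * 2 + [(True, None)] * 5)
print(phi_coefficient(calls))  # 0.6
```

Values near +1 indicate the two promoters tend to share the same state on a molecule, values near 0 indicate independence; this is only measurable because each molecule covers both loci.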
The proof of concept study presented here demonstrates that we can now read the genome wide methylation profile of cells on long, single molecules while simultaneously mapping major structural variation on these same molecules.
Dynamic Mapping of Genome Functions – Replication Imaging
- Gabrieli T, et al. Cas9-assisted targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping. 2017. https://doi.org/10.1101/110163.
- Grunwald A, et al. Reduced representation optical methylation mapping (R2OM2). 2017. https://doi.org/10.1101/108084.
- Klein K, et al. Genome-wide identification of early-firing human replication origins by optical replication mapping. 2017. https://doi.org/10.1101/214841.