
Bionano Genome Mapping: High-Throughput, Ultra-Long Molecule Genome Analysis System for Precision Genome Assembly and Haploid-Resolved Structural Variation Discovery

  • Sven Bocklandt
  • Alex Hastie
  • Han Cao
Chapter
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1129)

Abstract

Next Generation Sequencing (NGS) has rapidly advanced genomic research with tremendously increased throughput and reduced cost, through reading the fragmented genome content in massively parallel fashion. We have been able to sequence and map genomes to reference sequences with relative ease compared to the past. However, this mapping can only be accurately accomplished in the single copy regions of the genome, leaving out most duplicated genes and structural variation. Additionally, assembly of long genomic segments remains elusive since multi copy regions of the genome produce ambiguity when short read sequence is used.

Most large genomes are complex: they contain not only millions of single- or multiple-base variants, called SNPs (single nucleotide polymorphisms) and indels (small insertions and deletions), but also many thousands of much larger structural variants. These include repetitive regions composed of identical or similar stretches of sequence, mobile elements such as transposons, and large insertions, deletions, translocations and inversions of up to millions of bases, as well as altered partial or entire chromosomes. Often more than half of the genome is composed of these non-unique and highly variable regions, as in human, and up to 90% in certain plants (Jiao et al. 2017). Through studying thousands upon thousands of genomes, we have come to realize that the genome of each individual bears the mark of its own evolutionary journey and environment. This is seen in the different code carried by each of the two haplotypes within a family and in ethnicity-specific signatures in populations (Sudmant et al. 2015). Even the genomes of different cells derived from the same gamete carry non-static sequence variation accumulated throughout the lifetime, sometimes leading to tumorigenesis and contributing to the natural aging process.

It is not enough just to re-sequence each genome by aligning short reads from NGS to an existing, relatively contiguous reference genome, calling only the SNPs and small indels. To realize the full potential of so-called precision medicine, we need to obtain true, accurate, and complete genome information de novo to understand how these large structural variations might affect biological functions. The first step is creating and identifying technologies that are able to preserve and access native long-range genomic content, including SNPs, small indels and all classes of SVs, without gaps.

Large structural variations (SVs) are less common than SNPs and indels in the population in numbers of events but collectively account for a significantly larger number of base pair variations, although the impact on genetic variation and diseases is yet unknown. While single nucleotide mutations might impact the 2% of protein coding regions and key small regulatory elements such as the transcription factor binding sites, larger structural variations could have additional large effects, including eliminating, truncating or altering the coding regions or regulatory elements directly, and also changing the copy number, position or orientation of these genes or promoters, placing them into different genomic context. Moreover, large SVs can alter the complex three-dimensional folding of the chromatin within the cell and how genomic, epigenomic and protein elements interact with each other dynamically in a much more profound time and spatial order. However, the existing prevailing methods cannot comprehensively and cost effectively detect all of the large structural variations due to the limited read lengths of the existing technologies (Huddleston and Eichler 2016).

To address these challenges, Bionano Genomics applies a high-throughput, native, single-molecule genome mapping technology to comprehensively determine genome-wide structure using de novo assembly of long molecules (>150 kb) labeled at specific sequence motifs and linearized in massively parallel nanofluidic channels fabricated on a solid-state material (Lam et al. 2012). Exploiting this technology, structurally accurate whole genome de novo assemblies can be generated. Typically, compared to the human reference, thousands of SVs (>500 bp) are obtained in a single human genome. Because ultra-long read technology is so new, currently only a fraction of these SVs is verified in SV databases while a large portion is novel. Due to the nature of ultra-long molecules, haplotypes preserving native structural information are phased in a straightforward fashion by clustering molecules that share the same labeling patterns. In addition, these data are validated instantly by supporting raw images of the long molecules, not inferred by an algorithm (Bickhart et al. 2017). Without the excessive processing performed in a typical sample prep, such as PCR, adaptor addition, cloning, or library construction, the longest possible genomic DNA molecules (150 kb to megabases) are isolated directly from cells, labeled, and imaged at the single-molecule level, ensuring that the integrity of the most native information is preserved at the genomic and epigenomic level (Grunwald et al. 2017).

The genome mapping technology enabled by NanoChannel arrays, Bionano optical mapping, provides valuable information for endogenous highly variable regions such as areas related to immunity (MHC, KIR, TCR, etc.) as well as exogenous elements such as free or integrated viral sequence without a priori knowledge on a whole genome scale (Cao et al. 2014). Furthermore, dynamic intracellular genomic events such as DNA replication can be imaged with this platform process. DNA replication is often implicated as a major cause of genomic error generation eventually causing genome instability and cancers (Klein et al. 2017).

Bionano maps often reach chromosome arm lengths and are therefore highly informative for de novo genome assembly projects where they scaffold fragmented NGS assemblies and correct assembly errors. Many of the resulting assemblies are among the most contiguous and accurate assembled to date.

By employing Bionano mapping’s long-range genome analysis, large SVs can be identified in each individual and across multiple ethnic populations where population-specific structural variation sets are seen. These results highlight the need for a comprehensive set of alternate haplotypes derived from different populations to resolve structural variation patterns in complex regions of the genome, providing evidence for population genomic based diagnosis and drug development.

Bionano mapping technology is a high throughput, high fidelity and versatile platform with high potential to transform clinical cytogenetic and genetic analysis in a fully automated and standardized fashion in a cost-effective way. It has been recently demonstrated that comprehensive large SVs can be profiled in prostate cancer samples, where novel potential causal events were discovered efficiently de novo (Jaratlerdsiri et al. 2017). In a separate study involving rare and undiagnosed diseases, a very large 5.1 Mbp inversion in the genome of a patient with Duchenne Muscular Dystrophy was discovered with Bionano technology in a single, 1 week experiment, leading to the definitive molecular mechanism caused by a truncation of Dystrophin gene (Barseghyan et al. 2017). This inversion had previously evaded a wide range of standard clinical and molecular tests.

These clinical studies have paved the way for demonstrating the potential of routine comprehensive genomic analysis for complex diseases in the precision medicine era.

Background

Existing technologies including chromosomal microarrays and whole genome sequencing diagnose less than 50% of patients with genetic disorders (Lee et al. 2014; Miller et al. 2010). This leaves a majority of patients without ever receiving a molecular diagnosis. Undiagnosed disorders are individually rare but their combined incidence and the associated diagnostic odyssey, with resultant delays in treatment, are a drain on families and the healthcare system. Many of these diseases remain medical mysteries with no root cause or clear basis for treatment.

To close this diagnostic sensitivity gap and get a better understanding of the genetic causes of disease, we need better tools to access the entire genome, and large translational research studies to apply these tools to the discovery of novel biomarkers. Genetic disorders for which no molecular basis is currently known are either caused by genomic events that are poorly detected with current technology, events occurring in inaccessible parts of the genome, or a combination of events that is too complex to analyze using existing tools. Better molecular tools are needed to analyze the entire range of genomic variations. Armed with such tools, large translational research studies are needed to identify disease correlated biomarkers spanning all genomic variants in patients with genetic disorders.

Two thirds of the human genome consists of repetitive sequences (Fig. 1). Exome sequencing accesses just 1.5% of the genome (de Koning et al. 2011), and reads from Whole Genome Sequencing (WGS) do not align correctly to the repetitive parts of the genome. The most common repetitive sequences in the genome are LINEs, SINEs, retrotransposons and segmental duplications. The short reads that Next-Generation Sequencing (NGS) provides map with poor accuracy to these repeats, and alignment algorithms typically fail to identify the exact genomic location to which they belong. When they do align, the limited 100–150 bp read length and the spacing of paired-end reads do not allow correct sizing of larger repeats.
Fig. 1

Repetitive structures in the human genome

Structural variants make up the majority of human genomic variation, but Next-Generation Sequencing technology can’t correctly identify them. Clinical exome sequencing solves about 30% of rare diseases (Lee et al. 2014). NGS, consisting of Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS), reliably identifies single nucleotide variants and small insertions and deletions. However, NGS relies on short-read sequences that are mapped to a reference human genome and fails to identify most large insertions, deletions, or copy-number variations in repetitive regions of the genome. It is incapable of easily detecting other structural variations (SVs) such as inversions and translocations. Non-allelic homologous recombination of repetitive sequences is thought to be a predominant mechanism for the origin of many large SVs. The non-unique sequences flanking these SVs often make them invisible to sequencing-based detection methods. Together, structurally variable regions cover 13% of the genome, and individuals show structural variation covering as much as 30 Mbp between each other (Sudmant et al. 2015).

Methods

Mechanism of Bionano Technology and Workflow

Ultra-Long Range Linear DNA Analysis Technology Enabled by NanoChannel Array Technology

An overview of the molecular and bioinformatics method is shown in Fig. 2.
Fig. 2

Bionano workflow for DNA isolation, labeling, imaging, analysis

Since structurally accurate genome interrogation and assembly requires long molecules, traditional purification methods are not suitable for DNA isolation for optical mapping. Bionano Genomics adapted the plug lysis strategy commonly used to construct BAC libraries for optical mapping. Briefly, cells/nuclei are embedded into an agarose matrix to protect DNA from mechanical shearing during the purification process. Agarose is then melted and solubilized, and the resulting megabase DNA is further cleaned by drop dialysis prior to labeling at sequence-specific sites.

Megabase-size molecules of genomic DNA are labeled at a specific 6 or 7 basepair sequence motif, occurring approximately 8–28 times per 100 kbp, depending on its frequency in a particular genome. The label patterns allow each long molecule to be uniquely identified and aligned.
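The label density quoted above can be made concrete with a short calculation. The following is a minimal sketch, assuming a molecule is represented simply as a length in base pairs plus a list of label positions; the function names are hypothetical and are not part of any Bionano software.

```python
def labels_per_100kbp(label_positions_bp, molecule_length_bp):
    """Return the label density of one molecule in labels per 100 kbp."""
    if molecule_length_bp <= 0:
        raise ValueError("molecule length must be positive")
    return len(label_positions_bp) * 100_000 / molecule_length_bp


def label_intervals(label_positions_bp):
    """Return the distances between consecutive labels, in bp.

    This spacing pattern, rather than the absolute positions, is what
    makes a long molecule recognizable when it is compared with others.
    """
    ordered = sorted(label_positions_bp)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Illustrative molecule: 200 kbp long with one label every 10 kbp.
example_positions = list(range(0, 200_000, 10_000))
example_density = labels_per_100kbp(example_positions, 200_000)
```

In this illustrative case the density is 10 labels per 100 kbp, which falls inside the 8–28 range given in the text.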

Labeled DNA is loaded onto the Saphyr chip and placed into the Saphyr instrument where Saphyr initiates electrophoresis to move megabase length molecules from bulk solution into the silicon chip micro environment before unwinding and linearizing the DNA in the NanoChannel arrays. The instrument uses machine learning initially and throughout the run to provide adaptive loading of DNA, optimize run conditions and maximize throughput.

When molecules are fully loaded into the NanoChannels of one flow cell, electrophoresis is halted and the entire surface of the NanoChannel array of that flow cell is rapidly imaged. During the imaging phase of the run, electrophoresis in the second flow cell is initiated. Cycles of loading of the NanoChannels followed by imaging are performed until sufficient data is collected.

Bioinformatics of Bionano Mapping Using Sequence Motif Pattern Specific Labeling

Bionano image detection software extracts molecules from raw image data. The backbone stain signal of the DNA molecules is used to identify molecules and to determine their position and size. The distance between the labels on each molecule is recorded to generate an extracted molecule file called a BNX file. The BNX file is the only input needed for the Bionano de novo assembly process.

Images generated by Saphyr are sent to the analysis server for real time data extraction during the run. Image detection is typically completed shortly after the run is finished and de novo assembly can be automatically initiated (for human genomes).

Using pairwise alignment of the single molecules, an assembly graph is constructed and a consensus genome map is produced, refined, extended and merged. Molecules are then clustered into two alleles, where there is heterozygous structural variation, and a diploid assembly is created to allow for heterozygous SV detection. Genome maps can be created using different enzymes labeling different sequence motifs to generate broader coverage and higher label density.

A standard automated pipeline for de novo assembly and SV calling was developed by Bionano Genomics to enable comprehensive SV analysis. The Python-based pipeline manages job submission, drives execution of alignment and assembly tools, and provides data summary information. It features a haplotype-aware assembler designed to detect and differentiate parental alleles. The de novo assembly algorithm is a custom implementation of the overlap-layout-consensus strategy. The assembler takes the molecules extracted from raw image data as input, and the final consensus maps are used as input for SV calling.

Examples of Applications with Bionano Mapping

De novo Assembly of Complex Genomes – Long Contiguity with Accurate Complex Structural Context

Hybrid Assembly Combining Mapping and Sequencing Data Derived from All Platforms – Single and Multiple Sequence Motif-Based Assembly

The de novo Bionano genome maps are a whole genome de novo assembly and can be used to learn about various characteristics of the genome such as size, repetitive content, and extent of heterozygosity. They can also be integrated with a sequence assembly to order and orient sequence fragments, identify and correct potential chimeric joins in the sequence assembly, and estimate the gap size between adjacent sequences. In order to do so, the Bionano Solve software imports the sequence assembly and identifies the putative nick sites in the sequence based on the recognition site specific to the nicking endonuclease. These in silico maps for the sequence contigs are then aligned to the de novo Bionano genome maps. Conflicts between the two are identified and resolved, and hybrid scaffolds are generated in which sequence maps are used to bridge Bionano maps and vice versa. Finally, the sequence assembly corresponding to this hybrid scaffold is generated and exported as FASTA and AGP files.
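The in silico mapping step described above amounts to locating every occurrence of the enzyme recognition motif in a sequence contig. The sketch below illustrates the idea under stated assumptions: it is not the Bionano Solve implementation, the function names are hypothetical, and the 7-bp motif `GCTCTTC` is used purely as an example. Both strands are searched because the motif can occur on either strand of the double-stranded molecule.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_label_sites(contig_seq, motif):
    """Return sorted 0-based start positions of the motif on either strand."""
    seq = contig_seq.upper()
    motif = motif.upper()
    # A site on the reverse strand appears in the forward sequence as the
    # reverse complement of the motif.
    patterns = {motif, reverse_complement(motif)}
    sites = set()
    for pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


# Toy contig containing the example motif once on each strand.
toy_contig = "AAGCTCTTCAAAGAAGAGCTT"
toy_sites = in_silico_label_sites(toy_contig, "GCTCTTC")
```

The resulting list of positions plays the role of an in silico map for the contig: its spacing pattern can be compared with the label spacing observed on the genome maps.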

The pipeline is fully integrated with Bionano Access which provides a convenient interface for running Hybrid Scaffold and viewing scaffolding results.

The hybrid scaffolding process considerably reduces the number of contigs found in the initial NGS assembly, improving assembly accuracy and quality while reducing the need for deep sequencing coverage.

The hybrid scaffolding approach can yield significant improvements in contiguity, as expressed by the assembly N50 values. Assembly contiguity can be further increased by performing hybrid scaffolding with maps using two separate nicking enzymes. Two sets of Bionano maps, each generated with a different nicking enzyme, can be integrated with NGS sequences together. This enables the NGS sequences to function as a bridge to merge single-enzyme Bionano maps into two-enzyme maps that contain the sequence motif patterns from both nicking enzymes. Since the Bionano maps are generated independently they serve as orthogonal sources of evidence to detect and correct assembly errors in input data. The complementarity of different data also greatly improves the contiguity of the merged Bionano map while doubling the information density, which substantially increases the ability to anchor short NGS sequences in the final scaffolds.
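Since contiguity is expressed here as N50, a short worked example may help. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The sketch below is a generic calculation with made-up lengths, not output from any of the assemblies discussed in this chapter.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units) before scaffolding.
contigs_before = [10, 20, 30, 40]
# The same sequence after three contigs are joined into one scaffold.
scaffolds_after = [10 + 20 + 30, 40]

n50_before = n50(contigs_before)
n50_after = n50(scaffolds_after)
```

In this toy case, joining three contigs into one scaffold raises the N50 from 30 to 60 without adding any sequence, which is the effect that hybrid scaffolding has on a fragmented assembly.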

The two-enzyme approach was validated on the human NA12878 genome, a model data set for which sequence data is publicly available. Three different assemblies were tested: Illumina-D, 51x of 250 bp pair-end sequence; Illumina-S, 40x of 101 bp pair-end and 25x of 2.5–2.5 kbp mate-pair sequence; and PacBio, 46x with mean read length of 3.6 kbp. Compared to the input NGS assemblies, the two-enzyme approach improves the scaffold contiguity (up to 100-fold, Fig. 3), anchors 30% more sequence contigs in the final scaffolds and corrects 50% more assembly errors in NGS sequences. The pipeline performs robustly in both animal and plant genomes as well (Fig. 4). This approach greatly expands the type of NGS data that can be integrated with Bionano maps to produce highly accurate and contiguous assemblies for complex genomes.
Fig. 3

Improvements in NA12878 assembly contiguity after hybrid scaffold with one-enzyme and two-enzyme genome maps. Illumina-D: 51x of 250 bp pair-end sequence; Illumina-S: 40x of 101 bp pair-end and 25x of 2.5–2.5 kbp mate-pair sequence; PacBio: 46x with mean read length of 3.6 kbp

Fig. 4

Improvements in sugar beet and hummingbird assembly contiguity after hybrid scaffolding with Bionano genome maps using one-enzyme and two-enzymes. For sugar beet, the fold coverage of the PacBio de novo assemblies is shown

At the time of writing, all published data using Bionano mapping have been generated by labeling DNA with nicking endonucleases. These highly sequence-specific enzymes create a single-stranded nick at each occurrence of a 6- or 7-bp motif. At the site of the nicked DNA, fluorescently labeled nucleotides are inserted by polymerization and the molecules are repaired. This method (Nick Label Repair Stain, or NLRS) performs with extremely high specificity but can create double-stranded breaks when nick sites appear within about 200 bp of each other on opposite strands. Recently, Bionano Genomics has developed a novel labeling technology that avoids nicking and instead uses a direct labeling method in which the fluorophore is attached directly to the DNA at the location of a specific sequence motif. Since this Direct Labeling and Staining (DLS) method does not create systematic double-stranded breaks, Bionano maps created from molecules labeled with DLS typically show a 50x improvement in contiguity compared to NLRS maps. Bionano maps now typically reach chromosome arm length, and the contiguity of sequence assemblies built using DLS reaches chromosome arm or full chromosome length in a variety of species.

Error Correction and Validation of Sequencing Data

The Bionano hybrid scaffold pipeline detects and resolves chimeric joins. Chimeric joins are typically formed when short reads, molecules, or paired-end inserts are unable to span across long DNA repeats. The errors appear as conflicting junctions in the alignment between the Bionano map and NGS assemblies.

When the hybrid scaffold pipeline detects a conflict, it analyzes the single-molecule data that underlies a Bionano map and assesses which assembly was incorrectly formed. If the Bionano map has long molecule support at the conflict junction, the sequence contig is automatically cut, removing the putative chimeric join (Fig. 5). If it does not have strong molecule support, then the Bionano map is automatically cut. Both assemblies must have coverage spanning both sides of a chimeric join to detect and resolve these conflicts.
Fig. 5

Example of a conflict between a sequence contig and a Bionano map. The conflict junction is shown by the red arrow in the alignment between the sequence contig and the Bionano map. There is strong molecule support spanning the junction region on the genome map, so the sequence is cut at the label indicated
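The conflict-resolution rule described above can be summarized as a simple decision. The sketch below is an illustration only: the representation of molecule alignments as (start, end) coordinates on the genome map, the flank length, and the support threshold are all hypothetical choices made for the example, and the actual criteria used by the hybrid scaffold pipeline are not specified in this chapter.

```python
def count_spanning_molecules(molecule_alignments, junction_pos, min_flank_bp=50_000):
    """Count molecules covering the junction with a minimum flank on both sides.

    molecule_alignments is a list of (start, end) coordinates on the genome map.
    min_flank_bp is a hypothetical requirement, not a documented parameter.
    """
    return sum(
        1
        for start, end in molecule_alignments
        if start <= junction_pos - min_flank_bp and end >= junction_pos + min_flank_bp
    )


def resolve_conflict(molecule_alignments, junction_pos, min_support=10):
    """Decide which assembly to cut at a conflict junction.

    min_support is a hypothetical threshold for "strong molecule support".
    """
    support = count_spanning_molecules(molecule_alignments, junction_pos)
    if support >= min_support:
        # The genome map is supported by single molecules, so the
        # sequence contig is treated as the chimeric assembly.
        return "cut_sequence_contig"
    return "cut_bionano_map"


# Three example molecules around a junction at 200 kbp; only two span it
# with the required flank on both sides.
example_alignments = [(0, 300_000), (100_000, 400_000), (190_000, 260_000)]
example_support = count_spanning_molecules(example_alignments, 200_000)
```

With a support threshold of 2, this example would lead to a cut in the sequence contig; with the default threshold of 10, the genome map would be cut instead.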

Automated cuts using Bionano Solve help to resolve conflicts with a high level of accuracy. The majority of cuts made using Bionano Solve can be confirmed by comparison to the species’ reference assembly. There are several reasons why some cuts cannot be confirmed: the reference assembly is incomplete, the two separate input assemblies may represent different alleles, or the chimeric joins may have been caused by segmental duplications that are too long for Bionano molecules to resolve.

The two-enzyme scaffolding method improves the error correction even further. Since the Bionano maps were generated independently, they serve as orthogonal sources of evidence to detect and correct assembly errors in input data. Compared to the published one-enzyme hybrid scaffolds, the two-enzyme approach corrects up to 50% more assembly errors in NGS sequences.

Users can manually inspect all conflict resolution results. Bionano Solve notes the IDs and coordinates of the sequences and maps where conflicts have been detected and the corresponding resolution approaches taken. This file can be edited and modified, and then run again in the hybrid scaffold pipeline to produce a new set of scaffolds based on the manual conflict resolution. This manual enhancement process can be performed multiple times, giving users fine control in generating high-quality, complete hybrid scaffolds.

Comprehensive Genomic Structural Variation Discovery and Identification

Detecting All Classes of Structural Variants, Mobile Elements and Repeats, at Haploid Resolved Level

Bionano genome mapping is the only technology that detects all SV types, homozygous and heterozygous, starting at 500 bp up to millions of bp. Bionano maps are built completely de novo, without any reference guidance or bias. This differentiates Bionano from NGS, where short-read sequences are typically aligned to a reference. This alignment often fails to detect true structural variants by forcing the short-reads to map to an incorrect or too divergent reference, or by excluding mismatched reads from the alignment. Only de novo constructed genomes, like Bionano maps, allow for a completely unbiased, accurate assembly.

Bionano’s SVs are observed, and not inferred as with NGS. When short-read NGS sequences are aligned to the reference genome, algorithms piece together sequence fragments in an attempt to rebuild the actual structure of the genome. SVs are inferred from the fragmented data, with mixed success. With Bionano mapping, megabase-size native DNA molecules are imaged, and most large SVs or their breakpoints (in the case of inter-chromosomal translocations) can be observed directly in the label pattern on the molecules. If a native-state DNA molecule with a specific SV exists, then that SV call cannot be wrong.

SV calls are made based on analyses of a multiple local alignment between consensus maps and the reference (Fig. 6). The pipeline supports calling of major SV types: insertions, deletions, inversions, and translocation breakpoints. Bionano Access also supports visualization and confidence-based filtering of these SV types. Poorly aligned or unaligned regions flanked by well-aligned regions are called as deletions or insertions, depending on whether there is gain or loss of sequence relative to the reference. Junctions of neighboring alignments with opposite orientations are identified as inversion breakpoints. Fusion points between distant regions of the genome are identified as translocation breakpoints. Intrachromosomal translocation breakpoints involve regions on the same chromosome but at least 5 Mbp away from each other. Interchromosomal translocation breakpoints involve regions on different chromosomes.
Fig. 6

Structural variant types detected by Bionano mapping. SVs are identified by comparing label patterns in the sample of interest (blue) with those in the reference genome, or in a reference sample (green). Major types detected are

Gain/Loss of material: Labels moving closer together, with or without loss of labels, are evidence of deletions. Label spacing that increases with or without additional labels detected are called as insertions.

Copy number change: Expansions or contractions of tandem arrays or segmental duplications. Duplications are called automatically in direct or inverted orientation.

Balanced events: Genome maps aligning partially with two or more different chromosomes or genomic locations indicate translocations. When label patterns are inverted relative to the reference, an inversion is called.
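The junction rules described in this section can be written down as a small classifier. The following sketch is a deliberately simplified illustration and not the Bionano SV caller: the `Alignment` record, the order in which the rules are checked, and the returned labels are assumptions made for the example. Only the 5 Mbp threshold for intrachromosomal translocation breakpoints is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One local alignment of a consensus map to the reference (hypothetical record)."""
    chrom: str
    ref_start: int
    ref_end: int
    map_start: int
    map_end: int
    orientation: str  # "+" or "-"


INTRA_TRANSLOCATION_MIN_BP = 5_000_000  # threshold stated in the text


def classify_junction(left, right):
    """Classify the junction between two neighboring alignments on one map."""
    if left.chrom != right.chrom:
        return "interchromosomal_translocation_breakpoint"
    ref_gap = right.ref_start - left.ref_end
    if abs(ref_gap) >= INTRA_TRANSLOCATION_MIN_BP:
        return "intrachromosomal_translocation_breakpoint"
    if left.orientation != right.orientation:
        return "inversion_breakpoint"
    # Same chromosome, same orientation: compare the unaligned stretch on
    # the map with the corresponding stretch on the reference.
    map_gap = right.map_start - left.map_end
    if map_gap > ref_gap:
        return "insertion"
    if map_gap < ref_gap:
        return "deletion"
    return "no_size_change"


anchor = Alignment("chr1", 0, 1_000_000, 0, 1_000_000, "+")
shorter = Alignment("chr1", 1_050_000, 2_000_000, 1_020_000, 1_970_000, "+")
example_call = classify_junction(anchor, shorter)
```

Here the map has 20 kbp between the two aligned blocks where the reference has 50 kbp, so the junction is classified as a deletion.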

Zygosity and confidence are assigned to each SV call to facilitate downstream analysis. An SV call can be labeled as homozygous, heterozygous, or unknown. Confidence scores are scaled such that they range from 0 to 1.

SV calls can be exported in a dbVar compliant VCF file. This file format can contain all genomic variants identified in a sample, including SNVs, small indels, and SVs of various sizes. The VCF file generated by Bionano Access can be used in downstream analysis using a variety of existing tools.

Bionano algorithms call SVs by comparing genome structures. To identify a structural variation, a de novo genome map assembly can be aligned to a reference genome, or two samples can be aligned to each other directly. When aligning a genome map to a reference assembly, Bionano software identifies the location of the same recognition sequence used to label the DNA molecules in the reference genome and aligns matching label patterns in the sample and reference. This alignment provides all the annotation of the reference to the de novo assembled genome.

By observing changes in label spacing and comparisons of order, position, and orientation of label patterns, Bionano’s automated structural variation calling algorithms detect all major structural variation types.

Bionano detects seven times more SVs larger than 5 kbp compared to NGS. Professor Pui-Yan Kwok at the University of California, San Francisco, demonstrated the robustness of Bionano mapping for genome-wide discovery of SVs in a trio from the 1000 Genomes Project. Since high quality NGS data on these samples is publicly available, structural variation analysis using short-read data has been performed with over a dozen different algorithms. Using Bionano maps, hundreds of insertions, deletions, and inversions greater than 5 kbp were uncovered, 7 times more than the large SV events previously detected by NGS (Mak et al. 2016). Several are located in regions likely leading to disruption of gene function or regulation.

Bionano has exceptional sensitivity and specificity to detect insertions and deletions over a wide size range as demonstrated using simulated data. Insertions and deletions were randomly introduced into an in-silico map of the human reference genome hg19. The simulated events were at least 500 kbp from each other or N-base gaps. They ranged from 200 bp to 1 Mbp, with smaller SVs more frequent than larger ones.

Based on the edited and the unedited hg19, molecules were simulated to resemble actual molecules collected on a Bionano system and mixed such that all events would be heterozygous. Two sets of molecules were simulated, each labeled with a different nicking endonuclease. Datasets with 70x effective coverage were generated. The simulated molecules were used as input to the Bionano Solve pipeline and SV calls were made by combining the single-enzyme SV calls from both nicking endonucleases using the SV Merge algorithm. SV calls were compared to the ground truth.

Figure 7 shows sensitivity and positive predictive value (PPV) for heterozygous insertions and deletions within a large size range. SV size estimates were typically within 500 bp of the actual SV sizes, while reported breakpoints were typically within 10 kbp of the actual breakpoint coordinates. Additional large insertions (>200 kbp) were found but classified as end-calls.
Fig. 7

Heterozygous SV calling performance from a simulated dataset. Molecules were simulated from unedited and edited versions of hg19 (with insertions and deletions of different sizes) and used for assembly and SV calling
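Sensitivity and PPV in a simulation of this kind are obtained by matching SV calls against the known, simulated events. The sketch below shows one way such a comparison could be set up. The matching tolerances are hypothetical defaults chosen only to be of the same order as the accuracies reported above, and the tuple representation of an SV is an assumption; the chapter does not describe the matching criteria actually used.

```python
def evaluate_calls(truth, calls, pos_tol_bp=10_000, size_tol_bp=500):
    """Return (sensitivity, ppv) for SV calls compared with simulated truth.

    truth and calls are lists of (sv_type, position_bp, size_bp) tuples.
    Each true event can be matched by at most one call.
    """
    matched_truth = set()
    true_positive_calls = 0
    for sv_type, pos, size in calls:
        for index, (t_type, t_pos, t_size) in enumerate(truth):
            if index in matched_truth:
                continue
            same_type = sv_type == t_type
            close_position = abs(pos - t_pos) <= pos_tol_bp
            close_size = abs(size - t_size) <= size_tol_bp
            if same_type and close_position and close_size:
                matched_truth.add(index)
                true_positive_calls += 1
                break
    # Sensitivity: fraction of true events that were recovered.
    sensitivity = len(matched_truth) / len(truth) if truth else 0.0
    # PPV: fraction of calls that correspond to a true event.
    ppv = true_positive_calls / len(calls) if calls else 0.0
    return sensitivity, ppv


# Made-up example: four simulated events and three calls, two of which match.
simulated_truth = [
    ("insertion", 1_000_000, 5_000),
    ("deletion", 3_000_000, 20_000),
    ("deletion", 9_000_000, 2_000),
    ("insertion", 12_000_000, 800),
]
example_calls = [
    ("insertion", 1_004_000, 5_200),
    ("deletion", 3_001_000, 19_900),
    ("deletion", 7_000_000, 3_000),
]
example_sensitivity, example_ppv = evaluate_calls(simulated_truth, example_calls)
```

In this made-up example, two of the four simulated events are recovered (sensitivity 0.5) and two of the three calls are correct (PPV of about 0.67).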

Bionano mapping has exceptional sensitivity and specificity to detect heterozygous insertions and deletions over a wide size range as demonstrated using experimental data. Since there is no perfectly characterized human genome that can be considered the ground truth, a diploid human genome was simulated by combining data from two hydatidiform mole derived cell lines. These moles occur when an oocyte without nuclear DNA gets fertilized by a sperm. The haploid genome in the sperm gets duplicated, and the cell lines resulting from this tissue (CHM1 and CHM13) are therefore entirely homozygous.

Structural variants detected in the homozygous cell lines were considered the (conditional) ground truth. An equal mixture of single molecule data from two such cell lines was assembled to simulate a diploid genome, and SV calls made from this mixture were used to calculate the sensitivity to detect heterozygous SVs.

Table 1 shows the number of insertions and deletions larger than 1.5 kbp detected in the CHM1 and CHM13 homozygous cell lines relative to the reference, and in the in silico CHM1/13 mixture. SVs detected in CHM1 only or CHM13 only are heterozygous and those detected in both are homozygous. Bionano has a sensitivity of 92% for heterozygous deletions and 84% for heterozygous insertions larger than 1.5 kbp. The largest detected deletion was 4.28 Mbp in size and the largest insertion 412 kbp.
Table 1

Two homozygous cell lines, CHM1 and CHM13, were independently de novo assembled and insertions and deletions >1.5 kbp called

| SV type | PacBio: CHM1 and CHM13 assemblies | PacBio: Mixture assembly | PacBio: Sensitivity (%) | PacBio: PPV (%) | Bionano: CHM1 and CHM13 assemblies | Bionano: Mixture assembly | Bionano: Sensitivity (%) | Bionano: PPV (%) |
|---|---|---|---|---|---|---|---|---|
| Homozygous insertions | 467 | 353 | 75.6 | 96.1 | 707 | 700 | 99.0 | 97.9 |
| Heterozygous insertions | 586 | 252 | 43.0 | | 663 | 554 | 83.6 | |
| Homozygous deletions | 221 | 183 | 82.8 | 94.9 | 269 | 268 | 99.6 | 97.1 |
| Heterozygous deletions | 501 | 337 | 67.3 | | 517 | 477 | 92.3 | |

Raw data was mixed together, assembled and SVs called (Mixture assembly column). The sensitivity and positive predictive value (PPV) to detect heterozygous relative to homozygous SVs are shown
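The sensitivities reported in Table 1 are consistent with a simple ratio: the number of SVs recovered in the mixture assembly divided by the number called in the separately assembled CHM1 and CHM13 cell lines. The short sketch below reproduces that arithmetic from the counts in the table. The dictionary layout and function name are illustrative choices, and the ratio is an interpretation that matches the published values rather than a stated formula.

```python
# (platform, SV class) -> (calls in separate assemblies, calls in mixture assembly)
TABLE_1_COUNTS = {
    ("PacBio", "homozygous insertions"): (467, 353),
    ("PacBio", "heterozygous insertions"): (586, 252),
    ("PacBio", "homozygous deletions"): (221, 183),
    ("PacBio", "heterozygous deletions"): (501, 337),
    ("Bionano", "homozygous insertions"): (707, 700),
    ("Bionano", "heterozygous insertions"): (663, 554),
    ("Bionano", "homozygous deletions"): (269, 268),
    ("Bionano", "heterozygous deletions"): (517, 477),
}


def sensitivity_percent(separate_calls, mixture_calls):
    """Return the percentage of separately called SVs recovered in the mixture."""
    return round(100 * mixture_calls / separate_calls, 1)


# Recompute the sensitivity column for every row of the table.
table_1_sensitivity = {
    key: sensitivity_percent(separate, mixture)
    for key, (separate, mixture) in TABLE_1_COUNTS.items()
}
```

For example, 554 of 663 heterozygous insertions recovered with Bionano corresponds to 83.6%, and 477 of 517 heterozygous deletions corresponds to 92.3%, matching the values quoted in the text.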

A similar experiment on PacBio long-read sequencing was described recently (Huddleston et al. 2017). Structural variants were called with the SMRT-SV algorithm in CHM1 and CHM13, and compared to those called in an equal mixture of both. The sensitivity to detect homozygous SVs using PacBio was 87%, compared to 99.2% using Bionano. The sensitivity to detect heterozygous SVs using PacBio was only 41%, which is less than half the 86% sensitivity for heterozygous SV detection using Bionano. Even when the PacBio SV calls were limited to insertions and deletions larger than 1.5 kbp, the sensitivity for homozygous SVs was only 78%, and for heterozygous SVs 54% (Table 1).

Bionano genome mapping detects 98% of large inversions. Inversions are the invisible variants and have traditionally been the hardest structural events to detect. They are balanced, without gain or loss of sequence, and unlike translocations they don’t create easily visible changes in genomic context. Inversions often escape detection by traditional cytogenetic techniques. Chromosomal microarrays cannot identify balanced events, and metaphase chromosome spreads can only visualize some megabase-size inversions. Next Generation Sequencing approaches tend to miss inversions because reads from inside the inversion map back to the reference without any indication that the orientation has changed. Detection of the breakpoints often fails, especially if the inversion is flanked by segmental duplications, repeat arrays or other non-unique sequences.

Bionano’s imaging of extremely long molecules overcomes these obstacles to identifying inversions. Simulations of thousands of heterozygous inversions of various sizes demonstrated that our SV detection algorithms have high sensitivity to detect inversions larger than 30 kbp, reaching 98% sensitivity to pick up inversions larger than 70 kbp throughout the genome.

Bionano far outperforms other technologies in the detection of translocations. Thousands of translocations were simulated similarly to insertions and deletions in an in-silico map of the human reference genome hg19. The sensitivity for heterozygous translocations was shown to be 98% for breakpoint detection in both balanced and unbalanced translocations. Genome mapping can define the true positions of breakpoints within a median distance of 2.9 kbp, which is approximately 1000 times more precise than karyotyping and FISH. This accuracy is often sufficient for PCR and sequencing if single nucleotide resolution of the fusion point is desired for subsequent gene function studies.

In addition, translocation detection sensitivity was verified in two reference samples, NA16736 and NA21891, which are lymphoblast cell lines produced from blood cells from patients. One patient had a developmental disorder resulting in deafness with DNA repair deficiency caused by a t(9;22) translocation, and a second patient had Prader-Willi syndrome associated with a t(4;15) translocation. Both cell lines had been characterized by traditional cytogenetic methods. Bionano was able to detect both expected translocations as well as the reciprocal translocation breakpoints. Additionally, NA16736 contained a t(12;12) rearrangement which flanked an inverted segmental duplication. In NA21891, one translocation breakpoint could be localized within a gene, resulting in a predicted truncation (Fig. 8).
Fig. 8

Example of a translocation detected by Bionano mapping, associated with Prader-Willi syndrome. Blue bars are Bionano maps, and vertical lines represent Nt.BspQI label sites. For each of the reciprocal translocation breakpoints, maps are shown with alignments of the maps to chromosome 4 (top) and chromosome 15 (bottom) of the human reference hg19. Breakpoint resolution can be determined by the distance between matched and unmatched labels
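The last sentence of the caption can be made concrete with a small sketch: the breakpoint must lie between the last label that still matches the reference and the first label that no longer does, so the width of that interval bounds the breakpoint resolution. The Python function below is a minimal illustration under that assumption; the function name, the input layout and the label positions are hypothetical and are not taken from Bionano software or from the data in Fig. 8.

```python
def breakpoint_interval(label_positions, matched):
    """Locate the interval on a map that must contain the breakpoint.

    label_positions: label coordinates along the map, in bp, sorted ascending.
    matched: one boolean per label, True if the label aligns to the
             reference chromosome under consideration.
    Returns (start, end, width) for the first switch between matched and
    unmatched labels, or None if the alignment state never changes.
    """
    if len(label_positions) != len(matched):
        raise ValueError("need exactly one matched flag per label")
    for i in range(len(matched) - 1):
        if matched[i] != matched[i + 1]:
            start, end = label_positions[i], label_positions[i + 1]
            return start, end, end - start
    return None


if __name__ == "__main__":
    # Made-up map: the first three labels align to one chromosome, the rest do not.
    positions = [1_000, 5_200, 9_100, 12_600, 18_500]
    flags = [True, True, True, False, False]
    print(breakpoint_interval(positions, flags))
```

In this toy case the breakpoint is confined to the 3.5 kbp between the labels at 9,100 and 12,600 bp. Denser labeling around the junction narrows the interval.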

Bionano Genomics developed a variant annotation pipeline (VAP) to help prioritize variants and to determine if a variant is relevant to the disease or phenotype of interest. In particular, it is useful for family-based and case-control studies. The two main components of the VAP are: (1) variant annotation, and (2) variant validation. The pipeline provides gene annotation and compares a given variant to variants detected in phenotypically normal control samples, including tumor versus control from the same patient. For a trio analysis, the pipeline annotates whether variants in the proband are found in the parents to help identify inherited and de novo variants. To validate variants, the pipeline examines assembly quality scores and aligns molecules against the assembly of interest to determine if the detected variants are well supported.

By using a control database of common variants, VAP filters the thousands of identified variants down to hundreds that are rare, or to a handful of de novo variants. It also identifies the genes they overlap with or are closest to in the genome. The VAP is part of Bionano Access, which provides an interface for setting up experiments on Saphyr, starting and monitoring instrument runs, launching de novo assemblies and SV calling, visualizing SVs, and annotating variants with the VAP. The results can be exported as a dbVar-compliant VCF file, for easy integration with variants identified with NGS or other methods.
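The filtering logic described above can be sketched in a few lines of Python. This is not the VAP itself: the `SV` record, the 50% reciprocal-overlap rule used to decide that two calls are the same variant, and the three output labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    sv_type: str


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 for different chromosomes or SV types."""
    if a.chrom != b.chrom or a.sv_type != b.sv_type:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    # max(..., 1) guards against zero-length intervals.
    return min(overlap / max(a.end - a.start, 1), overlap / max(b.end - b.start, 1))


def seen_in(sv, others, min_overlap=0.5):
    """True if any call in `others` matches `sv` (assumed 50% reciprocal overlap)."""
    return any(reciprocal_overlap(sv, other) >= min_overlap for other in others)


def annotate_trio(proband, mother, father, controls, min_overlap=0.5):
    """Label each proband SV as common, inherited, or a candidate de novo call."""
    labeled = []
    for sv in proband:
        if seen_in(sv, controls, min_overlap):
            label = "common"
        elif seen_in(sv, mother, min_overlap) or seen_in(sv, father, min_overlap):
            label = "inherited"
        else:
            label = "candidate de novo"
        labeled.append((sv, label))
    return labeled


if __name__ == "__main__":
    # All coordinates below are made up for the example.
    proband = [
        SV("chr1", 1_000, 5_000, "deletion"),
        SV("chr2", 100, 900, "deletion"),
        SV("chrX", 31_000_000, 31_200_000, "deletion"),
    ]
    controls = [SV("chr1", 1_100, 5_100, "deletion")]
    mother = [SV("chr2", 150, 950, "deletion")]
    for sv, label in annotate_trio(proband, mother, [], controls):
        print(sv, label)
```

A real pipeline would additionally weigh assembly quality scores and molecule support, as described above, before a call is reported as de novo.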

SV Detection in Cancer and Genetic Disease

Bionano mapping correctly diagnoses genetic disorders: In a publication in Genome Medicine, professor Eric Vilain of Children’s National Medical Center, Washington, DC, presents molecular diagnoses using Bionano mapping of patients with Duchenne Muscular Dystrophy (DMD) (Barseghyan et al. 2017).

His team successfully mapped deletions, a duplication, and an inversion affecting the X-linked dystrophin gene, identifying deletions 45–250 kbp in size and an insertion of 13 kbp. The Bionano maps refined the location of deletion breakpoints within introns compared to current PCR-based clinical techniques. They also detected heterozygous SVs in carrier mothers of DMD patients, demonstrating the ability of Bionano mapping to ascertain carrier status for large SVs. Vilain’s team identified a 5.1 Mbp inversion involving the DMD gene, previously only identified by RNA sequencing of a muscle biopsy sample but missed by standard clinical methods (Fig. 9).
Fig. 9

A 5.1 Mbp inversion affecting the dystrophin gene detected in a patient with Duchenne Muscular Dystrophy. The inversion was detected twice, independently, in maps generated from patient DNA labeled after nicking with Nb.BssSI (top) and Nt.BspQI (bottom) nicking endonucleases. In both cases the inverted alignment of patient maps (top and bottom) relative to the reference (middle) is shown. Label sites are represented by red (Nb.BssSI) or black (Nt.BspQI) vertical lines in patient maps and reference, with grey match lines showing the aligned sites. RefSeq genes (orange) and the location of the inversion on the X-chromosome are shown at the top. (Barseghyan et al. 2017)
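The reference tracks in figures such as this one are in silico maps: the positions at which a nickase recognition motif occurs in the reference sequence. A minimal Python sketch of such a motif scan is given below. The motifs in the usage example (GCTCTTC for Nt.BspQI and CACGAG for Nb.BssSI) are stated here as assumptions and should be checked against the enzyme datasheets. The scan reports the start of each motif occurrence on either strand and does not model the exact nick position or the limited optical resolution of neighboring labels.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def label_sites(sequence, motif):
    """Sorted 0-based start positions of `motif` on either strand of `sequence`."""
    sequence = sequence.upper()
    motif = motif.upper()
    sites = set()
    # Searching for the reverse complement finds motif copies on the opposite strand.
    for pattern in {motif, reverse_complement(motif)}:
        start = sequence.find(pattern)
        while start != -1:
            sites.add(start)
            start = sequence.find(pattern, start + 1)
    return sorted(sites)


def label_intervals(sites):
    """Distances between consecutive label sites, the pattern that is aligned."""
    return [b - a for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    toy_reference = "AAGCTCTTCAAAGAAGAGCTT"  # made-up sequence
    sites = label_sites(toy_reference, "GCTCTTC")  # assumed Nt.BspQI motif
    print(sites, label_intervals(sites))
```

Running the scan with two different motifs on the same reference gives two independent label patterns, which is why an event seen with both enzymes, as in Fig. 9, is supported twice.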

Bionano mapping also identifies genomic rearrangement in prostate cancer: Professor Vanessa Hayes at the Garvan Institute of Medical Research published a complete tumor-normal comparison from a primary prostate cancer (Jaratlerdsiri et al. 2017). Her team identified 85 large somatic deletions and insertions, of which half directly impact potentially oncogenic genes or regions. One such insertion, disrupting a gene known to be involved in cancer, is shown in Fig. 10.
Fig. 10

A 4-kbp somatic insertion within the CHL1 gene on chromosome 3 identified in the prostate tumor of UP2153 using Bionano Mapping. The tumor map (blue track) shows a 2.5-kbp insertion (Chr3: 302.9–305.4 kbp) relative to hg19 (blue track), defined by a tandem repeat interval (inset). However, direct comparison of the tumor to genome maps derived from blood of the same patient (red track) found a larger 4-kbp insertion. (Jaratlerdsiri et al. 2017)

Only one-tenth of these large SVs were detected using high-coverage short-read NGS and bioinformatics analyses using a combination of the best SV calling algorithms for NGS data. A manual inspection of NGS reads corresponding with the Bionano derived target regions verified 94% of the total SVs called with Bionano mapping. Many SVs detected with Bionano were flanked by repetitive sequences, making them all but invisible to short-read sequencing.

Targeted Known SV Detection as Biomarkers in Diagnostics and Companion Tests – Cytogenetics, Immuno-Repertoire Variation Mapping
Custom Labeling of Specific Sequences

A team from Drexel University has published several papers on a novel method to label any sequence of choice before imaging on a Bionano system (McCaffrey et al. 2016, 2017). An in vitro CRISPR/Cas9 RNA-directed nickase labels the specific sequence motif against which the guide RNAs are designed. In one application, they label human (TTAGGG)n DNA tracts in genomes that have also been barcoded using Bionano’s standard labeling kits. High-throughput imaging and analysis of large single DNA molecules from genomes labeled in this fashion using Bionano’s Irys or Saphyr permits mapping through subtelomere repeat element (SRE) regions to unique chromosomal DNA while simultaneously measuring the (TTAGGG)n tract length at the end of each large telomere-terminal DNA segment. This method enables global subtelomere and haplotype-resolved analysis of telomere lengths at the single-molecule level. Similarly, this team labeled HIV insertion sites and a variety of other repeat sequences.

With this custom labeling method, virtually any part of the genome can be studied in detail with Bionano mapping, even those parts which don’t have identifiable patterns using Bionano’s standard motif labeling.

Targeted Enriched Genomic Regions

Bionano mapping is typically performed on a whole genome scale. To enable collection of higher depth coverage of genomic regions of interest, or map a region much faster, a team from Tel Aviv University published a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis (Gabrieli et al. 2017). The isolated gel fragment is then used in Bionano’s standard DNA isolation and labeling workflow. The result is a highly enriched sample that can be mapped with Bionano or sequenced. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of Bionano maps and sequencing data at a fraction of the cost of whole genome sequencing.

Immune Repertoire Mapping

The MHC region of the genome has a higher density of genes and of identified disease-causing variants than any other part of the human genome. It is prone to rearrangements, and sequencing based methods are unable to correctly identify and phase the structure of this region. In a Nature Biotech paper, the authors describe constructing Bionano maps covering the 4.7 Mbp MHC region from two individuals and performing de novo sequence assembly using NGS reads (Lam et al. 2012). The maps and NGS contigs were then compared to the reference sequences reported by the MHC Haplotype Consortium as confirmation and to uncover potential differences.

Employing this method, the study found and confirmed a number of interesting genomic features, including a 4 kb error in one reference sequence, anchoring and gap sizing of four NGS contigs, identification of misassembled NGS contigs, differentiation of the two HLA-DRB1 variants, and definition of numerous structural variants, such as a 5 kb insertion and 30 kb tandem duplication.

A second team studied the MHC region and other complex parts of the genome in the YH reference genome (Cao et al. 2014). They used Bionano maps to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules. They analyzed the structure of some complex regions of the human genome, including the MHC, also called Human Leukocyte Antigen (HLA), the Killer-cell Immunoglobulin-like Receptor (KIR) region, and IGL/IGH. The YH genome had Asian-specific structural variants in each of these regions. In addition to the MHC region, we also detected Asian/YH-specific structural differences in KIR (Fig. 11), compared to the reference genome.
Fig. 11

Consensus genome maps compared to hg19 in the KIR region. The green bars represent the hg19 in silico motif map; the blue bars represent consensus genome maps. The YH genome map shows a huge variation relative to hg19 and HuRef human reference sequences. KIR: killer cell immunoglobulin-like receptor. (Cao et al. 2014)

Other Applications

Ultra-Long Range Epigenetic Pattern Mapping

In a recent prepublication (Grunwald et al. 2017), a team from Tel Aviv University working with Bionano scientists presents a method to fluorescently label DNA molecules based on their methylation patterns. Using the methylation-sensitive methyltransferase M.TaqI, a green fluorescent dye is attached to megabase-size DNA when the enzyme’s recognition sequence is present without CpG methylation. Bionano’s standard nickase is then used with a red dye to allow for identification of the molecules and for assembly of the genome. In Fig. 12, a green signal is repeated every 50 kbp – this is an unmethylated CpG island in a 50 kbp repeat.
Fig. 12

Individual DNA molecules (blue) are stretched horizontally in NanoChannel arrays. Sequence motifs are labeled red, unmethylated sequences are labeled green, showing a repeating sequence with 50 kbp spacing. (Grunwald et al. 2017)

This technology opens up an entirely new field of research: we can now study whether the methylation status of the promoter of a gene influences that of another promoter hundreds of kbp away on a single molecule. This compares extremely favorably to the standard methylation analysis methods, in which DNA is chemically converted using sodium bisulfite, followed by array hybridization or sequencing. Bisulfite conversion damages the DNA, so only very fragmented DNA molecules can be isolated, and single-molecule methylation patterns can be measured over no more than a few hundred base pairs at best.

The proof of concept study presented here demonstrates that we can now read the genome wide methylation profile of cells on long, single molecules while simultaneously mapping major structural variation on these same molecules.
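One way to quantify the long-range relationship mentioned above is to tabulate, for every molecule that spans two loci, whether each locus carries the green (unmethylated) label, and then compute the phi coefficient of the resulting 2x2 table. The Python sketch below illustrates this; the choice of the phi coefficient and the input format are assumptions made for this example and are not taken from Grunwald et al.

```python
from math import sqrt


def phi_coefficient(pairs):
    """Association between the label states of two loci on single molecules.

    pairs: iterable of (locus_a_unmethylated, locus_b_unmethylated) booleans,
           one tuple per molecule spanning both loci.
    Returns a value from -1 (always opposite) to +1 (always identical), or
    None when one locus shows no variation and phi is undefined.
    """
    n11 = n10 = n01 = n00 = 0
    for a, b in pairs:
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return None
    return (n11 * n00 - n10 * n01) / denominator


if __name__ == "__main__":
    # Made-up observations: both loci share the same state on every molecule.
    concordant = [(True, True)] * 5 + [(False, False)] * 5
    print(phi_coefficient(concordant))
```

Because both states are read from the same physical molecule, such a statistic reflects linkage on one haplotype in one cell, which fragmented bisulfite reads cannot provide over these distances.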

Dynamic Mapping of Genome Functions – Replication Imaging

Cell replication is essential to life, and uncontrolled replication of cells is the cause of cancer. Exactly where eukaryotic cells initiate replication is hard to analyze. Studies looking into replication origins have largely focused on simple organisms with smaller genomes. Observing this process in large genomes is difficult because eukaryotic cells have up to 50,000 replication start points per cell per cycle, and even the most commonly observed replication origin in the genome functions as such in just 10% of cells. Several groups have demonstrated visualization of these replication origins on Saphyr in the bacteriophage Lambda (De Carli et al. 2017) and in human cells (Klein et al. 2017). Synchronized and arrested HeLa cells are transfected with red fluorescent nucleotides, the cell cycle is allowed to resume, and DNA is prepared using Bionano’s standard workflow. Sequence motifs are then labeled green using Bionano’s NLRS or DLS kits and imaged on Saphyr. The green signal is used to assemble and align the molecules to the reference, while the red signal shows where on those molecules replication originated. The resulting images (Fig. 13) are stunning, and the 290x coverage of the genome allows the team to identify early-firing human replication origins that occur in as few as 1% of cells.
Fig. 13

Individual DNA molecules (blue) are stretched vertically in NanoChannel arrays. Sequence motifs are labeled green, DNA replication origins are shown in red. (Klein et al. 2017)
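The link between the 290x coverage and the detection of origins that fire in 1% of cells can be illustrated with simple arithmetic. The Python sketch below assumes that each molecule covering a locus independently carries the red label with probability equal to the firing fraction, and that the number of labeled molecules is approximately Poisson distributed. This model is an assumption made for illustration and is not the analysis used by Klein et al.

```python
from math import exp, factorial


def expected_labeled_molecules(coverage, firing_fraction):
    """Expected number of covering molecules that carry the replication label."""
    return coverage * firing_fraction


def prob_at_least(k, mean):
    """Poisson probability of observing at least k labeled molecules."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(k))


if __name__ == "__main__":
    mean = expected_labeled_molecules(290, 0.01)
    print(f"expected labeled molecules per locus: {mean:.1f}")
    print(f"chance of seeing at least one: {prob_at_least(1, mean):.3f}")
```

Under these assumptions an origin used in 1% of cells is expected to appear on about three molecules at 290x coverage, and at least one labeled molecule is seen at roughly 94% of such loci. At 30x coverage the expectation drops to 0.3 molecules, which shows why deep coverage is needed for rare origins.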

References

  1. Barseghyan H, et al. Next-generation mapping: a novel approach for detection of pathogenic structural variants with a potential utility in clinical diagnosis. Genome Med. 2017;9:90.
  2. Bickhart DM, et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet. 2017;49:643–50.
  3. Cao H, et al. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience. 2014;3:34.
  4. De Carli F, Menezes N, Berrabah W, Barbe V, Genovesio A, Hyrien O. High-throughput optical mapping of replicating DNA. Small Methods. 2017;2(9):1800146. https://doi.org/10.1101/239251.
  5. de Koning AP, et al. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 2011;7(12):e1002384.
  6. Gabrieli T, et al. Cas9-assisted targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping. 2017. https://doi.org/10.1101/110163.
  7. Grunwald A, et al. Reduced representation optical methylation mapping (R2OM2). 2017. https://doi.org/10.1101/108084.
  8. Huddleston J, Eichler EE. An incomplete understanding of human genetic variation. Genetics. 2016;202(4):1251–4.
  9. Huddleston J, et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res. 2017;27(5):677–85.
  10. Jaratlerdsiri W, et al. Next generation mapping reveals novel large genomic rearrangements in prostate cancer. Oncotarget. 2017;8:23588–602.
  11. Jiao Y, Peluso P, Ware D. Improved maize reference genome with single-molecule technologies. Nature. 2017;546(22):524–7. https://doi.org/10.1038/nature22971.
  12. Klein K, et al. Genome-wide identification of early-firing human replication origins by optical replication mapping. 2017. https://doi.org/10.1101/214841.
  13. Lam ET, et al. Genome mapping on NanoChannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol. 2012;30:771–6.
  14. Lee H, et al. Clinical exome sequencing for genetic identification of rare mendelian disorders. JAMA. 2014;312(18):1880–7.
  15. Mak A, et al. Genome-wide structural variation detection by genome mapping on NanoChannel arrays. Genetics. 2016;202:351–62.
  16. McCaffrey J, et al. CRISPR-CAS9 D10A nickase target-specific fluorescent labeling of double strand DNA for whole genome mapping and structural variation analysis. Nucleic Acids Res. 2016;44:e11.
  17. McCaffrey J, et al. High-throughput single-molecule telomere characterization. Genome Res. 2017;27:1904–15.
  18. Miller DT, et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010;86(5):749–64.
  19. Sudmant PH, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526(7571):75–81.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Bionano Genomics, San Diego, USA