Introduction

In the mid-twentieth century, the discovery of the structure of DNA created a demand to sequence it [1,2,3]. The two most popular early methods, Sanger and Maxam–Gilbert sequencing, were based on chain-termination reactions and chemical cleavage, respectively [4,5,6,7]. The Sanger method, which terminates the growing nucleotide chain with chain-terminating dideoxynucleotides such as dideoxythymidine triphosphate (ddTTP), came to dominate over the Maxam–Gilbert method [8, 9, 32, 33]. In automated, high-throughput form, it was also used in the Human Genome Project (HGP) [10,11,12]. Because of technological shortcomings, the human genome could not be fully sequenced by 2003 [13, 14]. Recently, the Telomere-to-Telomere (T2T) consortium used next-generation sequencing (NGS) to produce a gapless assembly covering the whole genome. Oxford Nanopore sequencing, with its ultra-long-read capacity, is one of the NGS technologies that made this advance possible [15, 16].

Pocket-sized nanopore sequencers, which need no reverse transcription step and no highly skilled workflow, have been in growing demand since their commercial introduction in 2014 [17, 18]. The technology enabled viral genome sequencing during the Ebola virus outbreak in remote areas of West Africa and the Zika virus outbreak in deeply forested regions of Brazil [19, 20]. More recently, it has been used in China to sequence and identify SARS-CoV-2 [21, 22].

The direct, single-molecule nature of nanopore-based sequencing makes it well suited to reading epigenetic marks, which play a significant role in driving cancer and its heterogeneity [23,24,25]. Methyl-CpG-binding proteins recognize methylcytosine residues and recruit transcriptional repressor complexes such as histone deacetylases (HDACs). By linking DNA methylation with histone modification, these proteins lie at the foundation of epigenetic regulation [26, 27]. Here, we review the development of NGS technology based on nanopore sequencing and its application to identifying epigenetic tumor heterogeneity [28]. We also discuss the most studied and most impactful cytosine modifications, methylation and related marks, found in CpG islands.

Advancement of nanopore sequencing as the 4th-generation sequencer

Sequencing from Sanger to 4th-generation NGS

DNA sequencing technology has undergone half a century of advancement, from the Sanger and Maxam–Gilbert methods to fourth-generation NGS, and nanopore sequencing is regarded as the beginning of the fourth generation of sequencing technology [29,30,31].

When the HGP started in 1990, it needed a well-established sequencing technology, and the automation and scale-up of sequencing made the project feasible [34, 35]. Completing the HGP delivered the reference human genome sequence and also advanced sequencing technology [10, 31]. The first-generation sequencing used for the HGP required long run times and high costs and offered limited throughput. As sequencing demanded higher throughput at lower cost, the field shifted in the mid-2000s to second-generation sequencing (SGS) [31, 36,37,38]. The shift was achieved by massively parallel sequencing systems, starting with Roche 454 pyrosequencing [39,40,41]. Because SGS is limited to short reads (35–1000 bases) and, like Sanger's method, requires PCR amplification, it struggles with regions of high or low G + C content, tandem repeats, and interspersed repeats [36, 38]. These difficulties in resolving repetitive sequences, which leave assemblies highly fragmented, led to the next era of gene sequencing: third-generation sequencing (TGS), such as PacBio [44,45,46]. TGS is marked by single-molecule real-time (SMRT) sequencing, with read lengths increased from tens of bases to tens of thousands of bases, sequencing time reduced from days to hours, and PCR-induced sequencing biases eliminated [44, 47] (Fig. 1).

Fig. 1

A Diagrammatic examples of first-, second-, and third-generation sequencing. Image reprinted from [48] with permission of the publisher (Request ID 600061564, 25 Nov 2021). B DNA sequencing timeline showing landmark events in DNA sequencing. Image adapted from [49]

In 2007, Illumina/Solexa introduced the sequencing-by-synthesis (SBS) method on the Genome Analyzer platform, and Applied Biosystems (ABI) later introduced the sequencing-by-ligation SOLiD system [42, 43]. Combined with bisulfite treatment, SBS can identify cytosine methylation. However, it cannot discriminate 5-methylcytosine (5mC) from 5-hydroxymethylcytosine (5hmC), because both resist bisulfite conversion [59, 60].
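Why bisulfite-based SBS cannot separate 5mC from 5hmC can be seen with a toy model of the conversion. The sketch assumes complete, idealized conversion and uses ad hoc symbols (`m` for 5mC, `h` for 5hmC) purely for illustration; it is not a model of any specific protocol.

```python
# Toy model of idealized bisulfite conversion followed by sequencing readout.
# Symbols are illustrative assumptions, not a standard alphabet:
#   "C" = unmodified cytosine, "m" = 5mC, "h" = 5hmC.

def bisulfite_readout(strand: str) -> str:
    """Return the bases reported after complete, idealized bisulfite conversion."""
    read = []
    for base in strand:
        if base == "C":
            read.append("T")   # unmodified C is deaminated to U and read as T
        elif base in ("m", "h"):
            read.append("C")   # 5mC and 5hmC are both protected and read as C
        else:
            read.append(base)  # A, G and T are unaffected
    return "".join(read)


if __name__ == "__main__":
    print(bisulfite_readout("ACGTmGhG"))  # prints ATGTCGCG
```

Both modified bases come out as C, so the readout alone cannot tell them apart, whereas unmodified C becomes T.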

PacBio RS II, the first commercialized third-generation DNA sequencer, works by directly observing DNA synthesis. Its advantages include long read lengths, high consensus accuracy, a low degree of bias, and simultaneous epigenetic characterization, including direct detection of base modifications such as methylation [36, 38, 44, 47, 50, 54,55,56]. PacBio RS II is therefore well suited to whole-genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetic characterization. It works without PCR amplification and, compared with first- and second-generation platforms, offers read lengths above 20 kb, with maximum read lengths above 60 kb. The PacBio system can also directly detect and discriminate between epigenetic modifications [28, 54]. Moreover, many hybrid sequencing strategies have been developed that couple PacBio with other platforms to make it more affordable and scalable. PacBio's notable limitations include lower throughput, higher error rates, and higher cost per base [51,52,53].

In PacBio single-molecule sequencing, four fluorescently labeled nucleotides with distinct emission spectra are added to a chip called a SMRT cell, and a light pulse is captured in a zero-mode waveguide (ZMW) whenever a base is incorporated (Fig. 2). Each pulse is then interpreted as a base [38, 54].
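The pulse-to-base step can be sketched as follows. The sketch assumes one intensity trace per dye channel, a fixed detection threshold, a minimum pulse width to reject brief noise spikes, and a hypothetical channel-to-base order (`CHANNEL_TO_BASE`); PacBio's actual pulse and base calling is more elaborate and is not reproduced here.

```python
# Minimal sketch: turn four-colour fluorescence traces into base calls.
# Threshold, minimum width and channel order are illustrative assumptions.
from typing import List, Optional, Sequence

CHANNEL_TO_BASE = "ACGT"  # hypothetical: channel i labels base CHANNEL_TO_BASE[i]


def call_pulses(traces: Sequence[Sequence[float]],
                threshold: float = 0.5,
                min_frames: int = 2) -> str:
    """Call one base per fluorescence pulse that is at least min_frames wide."""
    n_frames = len(traces[0])
    calls: List[str] = []
    current: Optional[int] = None  # channel of the ongoing pulse (None = background)
    width = 0
    for t in range(n_frames + 1):  # one extra step flushes the final pulse
        channel: Optional[int] = None
        if t < n_frames:
            values = [trace[t] for trace in traces]
            brightest = max(range(len(values)), key=values.__getitem__)
            if values[brightest] >= threshold:
                channel = brightest
        if channel == current:
            width += 1
            continue
        # The previous pulse has ended; keep it only if it was wide enough.
        if current is not None and width >= min_frames:
            calls.append(CHANNEL_TO_BASE[current])
        current, width = channel, 1
    return "".join(calls)


if __name__ == "__main__":
    traces = [
        [0.9, 0.9, 0, 0, 0, 0, 0, 0],    # channel 0: a two-frame pulse
        [0, 0, 0, 0.8, 0.8, 0.8, 0, 0],  # channel 1: a three-frame pulse
        [0, 0, 0, 0, 0, 0, 0, 0.7],      # channel 2: one-frame spike, rejected
        [0] * 8,
    ]
    print(call_pulses(traces))  # prints AC
```

The minimum-width rule is one simple way to trade sensitivity against false calls; lowering `min_frames` to 1 would also accept the single-frame spike.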

Fig. 2

Principle of single-molecule real-time sequencing. A A single Phi29 DNA polymerase bound to a DNA template is immobilized at the bottom of a zero-mode waveguide (ZMW) nanophotonic structure, which is illuminated by laser light. B Diagrammatic order of the phospholinked dNTP incorporation cycle. (1) A phospholinked nucleotide binds the template in the polymerase active site. (2) Fluorescence output rises in the corresponding color channel. (3) Phosphodiester bond formation releases the dye-linker-pyrophosphate product, ending the fluorescence pulse in the ZMW. (4) The polymerase translocates to the next nucleotide of the template strand. (5) The next cognate nucleotide binds the polymerase active site, continuing the cycle [57, 58]. Image reprinted from [57] with permission of the publisher (Order license ID: 1164215-1, 25 Nov 2021)

The most recent NGS approach, nanopore sequencing (the main subject of this review), uses a thin membrane containing nanoscale holes. When biological molecules small enough to fit through a nanopore pass through it, the device detects the electrical signal of each molecule as a change in ionic current [31, 61, 62]. Four companies compete to dominate the NGS market on price, method, and average read length (Table 1).

Table 1 Comparison of different NGS technologies. Adapted from [36]

The development of Oxford Nanopore sequencing technology

In 2012, nanopore technology began to be applied to RNA sequencing using reverse transcription and amplification. Oxford Nanopore Technologies (ONT) then developed a device based on an array of biological nanopores that enables reliable decoding of long sequences with an acceptable error rate, low cost, and improved miniaturization [64, 94]. Its long-read capacity makes it a landmark in the history of sequencing [63,64,65,66]. The method is direct, highly parallel, real-time, and single-molecule, and it offers improved read lengths [95,96,97,98].

Nanopores can reduce the time needed for sample amplification, as well as the enzymes, reagents, and optics used in sequencing-by-synthesis methods. Nanopore sensors are purely electrical and can be applied to DNA from blood or saliva samples [67,68,69,70]. A biological nanopore is a nanoscale opening formed by a protein channel in a lipid membrane; synthetic pores can instead be made by ion-track etching or straightforward planar lithography. The membrane separates two chambers, labeled cis and trans, and a sensitive patch-clamp amplifier measures the ionic current through a single pore [71,72,73]. A voltage applied across the membrane drives this ionic current through the nanopore [67, 72].

Because it departs fundamentally from previous sequencing methods, nanopore sequencing has become an essential tool in medicine, for example in cancer research and diagnosis [73,74,75,76]. Pore-based sequencing can also be used to sequence and assemble genomes, analyze structural variants, and detect epigenetic marks, opening the way to point-of-care implementation in future human genomics applications [75, 77,78,79] (Fig. 3).

Fig. 3

The MinION sequencing device. DNA sequencing is performed by adding a sample to the flow cell. The sensor measures the change in the magnitude of the current through the nanopore as the DNA molecule passes through it. The data streams are passed to the application-specific integrated circuit (ASIC) and to the MinKNOW software, which generate the signal-level data. Image reprinted from [63] with permission of the publisher (Request ID 600062077, 01 Dec 2021)

Nanopore sequencing technology resulted from gradual, long-term, multidisciplinary efforts from different directions [80]. The first step came in 1976, when Erwin Neher and Bert Sakmann developed methods to record and measure the current flowing through a single ion channel embedded in a biological membrane [81, 82]. However, the idea of sequencing by measuring ionic current through a membrane-embedded nanopore was first proposed by David Deamer in 1989 [83, 84].

Deamer lacked an ion channel that would allow a nucleotide to pass through, a problem solved when he came across John Kasianowicz's studies of α-hemolysin, a protein toxin secreted by Staphylococcus aureus (Fig. 4A, B) [85, 86]. A phospholipid bilayer embedded with α-hemolysin nanopores separates two chambers filled with KCl solution. The applied electric potential drives an ionic current (Fig. 5) and pushes the negatively charged DNA toward the positive electrode through the pore until it translocates (Fig. 4C) [87]. Translocation velocity depends on the applied potential, the nanopore used, and whether the DNA is single- or double-stranded. The optimal velocity is around 2 nucleotides per millisecond, at which a 10 × 10 array of pores could sequence a human genome in 8 h [88]. The four nucleotides are distinguished by the characteristic disturbances, or blockades, they cause in the ionic current as they translocate. The amplitude and duration of the blockades depend on the length and width of the translocating polymer [89, 90].
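As a rough consistency check on the 8 h figure, the arithmetic can be written out. Beyond what the text states, the sketch assumes that all 100 pores read continuously at 2 nt/ms with no idle time, and that "a human genome" means one pass over either a haploid (~3.1 Gb) or a diploid (~6.2 Gb) genome; the function name is illustrative.

```python
# Back-of-the-envelope sequencing time for an array of pores.
# Assumes every pore reads continuously at the quoted rate (no idle time).

def hours_to_sequence(genome_bases: float, n_pores: int, nt_per_ms: float) -> float:
    """Hours needed to read genome_bases once with n_pores working in parallel."""
    bases_per_second = n_pores * nt_per_ms * 1000.0
    return genome_bases / bases_per_second / 3600.0


if __name__ == "__main__":
    pores = 10 * 10  # the 10 x 10 array
    print(round(hours_to_sequence(3.1e9, pores, 2.0), 1))  # haploid: prints 4.3
    print(round(hours_to_sequence(6.2e9, pores, 2.0), 1))  # diploid: prints 8.6
```

Under these assumptions, the quoted 8 h is roughly consistent with reading both genome copies once; real runs would also lose time to pore occupancy and would need higher coverage.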

Fig. 4

Representation of α-hemolysin from Staphylococcus aureus. Reprinted from [91] with permission of the publisher (CCC License ID: 5196920847168, 27 Nov 2021). A Side view of the α-hemolysin heptameric complex, indicating the location of the phospholipid bilayer. B View of α-hemolysin from the cis entrance of the pore [86]. C Structure of the α-hemolysin nanopore embedded in a phospholipid bilayer. In nanopore sequencing, the motor protein guides the DNA strand through the pore, causing current fluctuations across the membrane. The nanopore signal is later converted into a nucleic acid sequence by the base caller. The DNA substrate (violet) is drawn into the pore by the applied electric field. Image adapted from [92]

Fig. 5

Biological nanopore instruments with representative ssDNA and protein-bound DNA events [67, 93]. Image reprinted from [67] with permission of the publisher (CCC License ID: 600061571, 25 Nov 2021). A The nanopore instrument uses an amplifier to apply a command voltage Vc and measure the ionic current Ip through the nanopore channel. B At 120 mV in 1 M buffered KCl solution, the 120 pA open-channel current is attenuated to 15 pA for 0.2 ms when ssDNA is captured into the channel from the cis chamber, until the DNA passes through the pore. With ExoI and ssDNA in the cis chamber, bound events are also observed in which the current shift is extended to 2 ms [67]
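Using the numbers in panel B, the open-pore conductance follows from G = I/V (120 pA / 120 mV = 1 nS), and a blockade can be picked out of a current trace as a run of samples below a threshold. The sketch below assumes a 100 kHz sampling rate and a threshold at 50% of the open-pore current; both are illustrative choices, and `detect_events` is a hypothetical helper, not part of any instrument software.

```python
# Minimal sketch: open-pore conductance and threshold-based blockade detection.
# Values follow Fig. 5 (120 mV bias, 120 pA open current, 15 pA blocked, 0.2 ms);
# the sampling rate and the 50 % threshold are illustrative assumptions.
from typing import List, Tuple


def conductance_nS(current_pA: float, voltage_mV: float) -> float:
    """G = I / V; a current in pA divided by a voltage in mV gives nS."""
    return current_pA / voltage_mV


def detect_events(trace_pA: List[float], open_pA: float, fs_hz: float,
                  frac: float = 0.5) -> List[Tuple[float, float]]:
    """(dwell time in ms, mean blocked current in pA) for each dip below frac * open_pA."""
    threshold = frac * open_pA
    events: List[Tuple[float, float]] = []
    start = None
    for i, value in enumerate(trace_pA + [open_pA]):  # sentinel closes a final event
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            segment = trace_pA[start:i]
            events.append((1000.0 * len(segment) / fs_hz,
                           sum(segment) / len(segment)))
            start = None
    return events


if __name__ == "__main__":
    fs = 100_000.0  # samples per second (assumed)
    trace = [120.0] * 50 + [15.0] * 20 + [120.0] * 50  # 20 samples = 0.2 ms
    print(conductance_nS(120.0, 120.0))     # prints 1.0
    print(detect_events(trace, 120.0, fs))  # prints [(0.2, 15.0)]
```

The residual current of 15 pA corresponds to about 12.5% of the open-pore level, which is the kind of depth-and-duration signature the detector reports.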

To compete with sequencing-by-synthesis platforms, ONT designed a more stable membrane to support the nanopores, which were initially housed in lipid membranes. Because lipid was extremely sensitive to pH and temperature, it was replaced by hand-fabricated lipid-coated Teflon [99]. Such a membrane lasted only seconds to minutes before collapsing, so a whole day of membrane production yielded about half an hour of data. ONT therefore moved to a synthetic membrane material, which proved more effective. In February 2012, ONT presented its GridION and MinION platforms, which were later followed by PromethION and Flongle [100, 104, 105]. MinION attracted the most attention, as it could decipher almost a billion DNA bases in 6 h at a price of $900 (Fig. 6) [102, 103].

Fig. 6

Comparison of currently available Oxford Nanopore Technologies. Adapted from [23]

Biological versus solid-state nanopores

Biological nanopores, the first to be used, still give the best results: they are easy to produce, highly modifiable, and structurally reproducible, which allows repeatable current measurements [109,110,111]. Inorganic nanopores, in turn, offer temperature tolerance, solvent compatibility, robustness, and the ability to be integrated with semiconductor electronics [112,113,114]. More broadly, solid-state nanopores have several advantages over their biological counterparts: greater thermal, mechanical, and chemical stability; ease of modification; tunable pore size and morphology; ready integration into nanofluidic and other nanodevices; and scalable fabrication [115,116,117,118]. The most common solid-state nanopores are made of SiO2 or low-stress silicon-rich nitride (SiNx). Besides the well-developed processing of these materials in semiconductor microelectronic fabrication, silicon-based nanopores are preferred for their robustness, good resistivity, and dielectric strength [119,120,121]. Other materials tried for nanopore membranes include Al2O3 and HfO2, which offer alternative membrane fabrication options [122,123,124,125].

Solid-state pores, first made by ion-beam sculpting and later by transmission electron microscopy (TEM) drilling or dielectric breakdown, are limited because the membranes cannot be made as thin as desired while remaining mechanically stable [107]. Compared with biological pores, solid-state nanopores have lower single-molecule detection sensitivity owing to their intrinsic thickness and the lack of control over surface charge distribution [126].

A versatile nanopore membrane based on MoS2 has been developed with a signal amplitude five times higher than that of solid-state Si3N4 membranes; unlike graphene nanopores, it needs no special surface treatment to avoid strong interactions between DNA and the surface [126, 127]. Monolayer 2D materials such as graphene, MoS2, WS2, and hexagonal boron nitride (h-BN) are about as thick as the spacing between nucleotides [128, 129]. Compared with traditional solid-state nanopore membranes, monolayer 2D membranes are ideal for nanopore devices because they exhibit a high ionic-current signal-to-noise ratio and relatively large sensing regions [129, 130]. The channels of conventional solid-state nanopores are around 100 times longer than the distance between two bases in a DNA molecule (0.5 nm) [131, 132]. Although DNA tends to stick to graphene during translocation, ultrathin monolayer graphene membranes, placed on silicon nitride and drilled with an electron beam, are a preferred solid-state nanopore technology [131, 133].
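The thickness comparison can be made concrete by dividing membrane thickness by the 0.5 nm inter-base distance given above, which roughly estimates how many bases sit in the channel at once. The membrane thicknesses used here (about 0.34 nm for monolayer graphene and 50 nm for a conventional membrane, i.e. the "100 times" case) are assumptions for illustration, and the estimate ignores the access region and field shape near the pore.

```python
# Rough count of bases inside the channel at one time.
# The 0.5 nm inter-base distance is from the text; thicknesses are assumed values.
BASE_SPACING_NM = 0.5


def bases_in_channel(thickness_nm: float,
                     spacing_nm: float = BASE_SPACING_NM) -> float:
    """Membrane thickness divided by the inter-base distance."""
    return thickness_nm / spacing_nm


if __name__ == "__main__":
    for name, thickness_nm in [("monolayer graphene (~0.34 nm)", 0.34),
                               ("solid-state membrane (~50 nm)", 50.0)]:
        print(f"{name}: ~{bases_in_channel(thickness_nm):.1f} bases")
```

Fewer than one base in a monolayer channel versus about a hundred in a thick membrane is why 2D materials promise better spatial resolution.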

After α-hemolysin was identified as a biological pore, stable membrane nanopores that let fewer nucleotides occupy the pore at a time were needed [116]. Funnel-shaped Mycobacterium smegmatis porin A (MspA) was therefore introduced as an alternative to α-hemolysin [116, 134]. Unlike mushroom-shaped α-hemolysin, with its long stem, MspA holds fewer nucleotides in its narrow constriction [135]. To improve the readout of ONT nanopores, CsgG, an Escherichia coli outer membrane lipoprotein encoded by one of the curli-specific genes (csg), was also introduced [136]. Of tens of nanopores and thousands of mutants tested, the CsgG pore had a very narrow, well-defined passage for a DNA strand and outperformed all the other pores ONT tried [137, 138]. The CsgG pore was later engineered with reading heads that improved the signal and the accuracy of the sequence readout [139, 140]. Other protein nanopores that have been tried include outer membrane protein F (OmpF), outer membrane protein G (OmpG), aerolysin, Nocardia farcinica peptide A/B (NfpA/NfpB), and cytolysin A (ClyA) [112, 141] (Fig. 7, Table 2).

Fig. 7

Examples of biological nanopores. a α-HL, b MspA, c Phi 29, d ClyA, e FhuA, f aerolysin, g SP1. Reprinted from [142] with permission of the publisher (CCC License ID: 1164219-1, 25 Nov 2021)

Table 2 Different biological nanopores characteristics. Adapted from [142]

Diversifying nanopores across building materials to achieve greater precision and tailored size and chemical properties has widened their applications beyond sequencing [143]. Self-assembled pores are produced from a variety of materials, including proteins, peptides, synthetic organic compounds, and DNA [144]. Several companies are also developing nanopore-based approaches, including Genia Technologies (acquired by Roche for $300 million, aiming to combine biological nanopores with optical detection), Quantapore, Quantum Biosystems (associated with Prof. T. Kawai, combining a tunneling-current detector with nanopore sequencing), Base4, and Noblemen Biosciences; some aim to cleave single nucleotides into droplets in a water–oil emulsion and detect them through a cascade of chemical reactions [89, 145] (Fig. 8).

Fig. 8

Various types and geometries of nanopores. Reprinted from [67] with permission of the publisher (CCC License ID: 600061571, 25 Nov 2021)

Controlling DNA translocation through a nanopore

A crucial hurdle to making nanopores a reliable DNA analysis tool is the ultrafast, stochastic nature of DNA translocation, which has required motor proteins that move DNA base by base, along with other experimental modifications [107]. The problem originates in velocity fluctuations caused by random Brownian diffusion, which combine with directed motion to produce a drift–diffusion process [146]. To achieve single-nucleotide resolution, translocation should take about 1–100 ms per nucleotide [107, 147]. Two approaches have been tried: incorporating a biological motor or nanobead, and regulating the driving voltage by adjusting pore geometry and experimental conditions [107, 148]. The problems of sensing each nucleotide and of delivering the strand into the nanopore in a controlled manner have also been addressed by modifying macroscopic properties such as solvent viscosity, ion concentration, or temperature [149, 150]. A design with a series of metal-dielectric layers, studied with molecular dynamics simulations, has also been proposed as an additional option [151].
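The drift–diffusion picture can be illustrated with a small simulation in which each step of the strand's position combines a deterministic drift (the field-driven motion) with a Gaussian Brownian kick. The units, parameter values, and function names below are illustrative assumptions rather than values for any particular pore; for this model the first-passage time to a distance L has mean L/v and variance 2DL/v³, which the simulation should approximately reproduce.

```python
# Minimal drift-diffusion sketch of DNA translocation (Euler-Maruyama scheme).
# All units and parameter values are arbitrary, illustrative assumptions.
import math
import random
import statistics
from typing import Tuple


def first_passage_time(L: float, v: float, D: float, dt: float,
                       rng: random.Random) -> float:
    """Time for a particle starting at x = 0 to first reach x = L."""
    x, t = 0.0, 0.0
    kick_sd = math.sqrt(2.0 * D * dt)  # SD of the Brownian displacement per step
    while x < L:
        x += v * dt + rng.gauss(0.0, kick_sd)  # directed drift + random diffusion
        t += dt
    return t


def simulate(n: int = 1000, L: float = 1.0, v: float = 1.0, D: float = 0.05,
             dt: float = 1e-3, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of n simulated translocation times."""
    rng = random.Random(seed)
    times = [first_passage_time(L, v, D, dt, rng) for _ in range(n)]
    return statistics.mean(times), statistics.stdev(times)


if __name__ == "__main__":
    mean_t, sd_t = simulate()
    # Analytical values for these parameters: mean 1.0, SD sqrt(2*D*L/v**3) ~ 0.316.
    print(f"simulated mean {mean_t:.3f}, simulated SD {sd_t:.3f}")
```

Even with a constant driving force, individual translocation times scatter widely around the mean, which illustrates why ratcheting the strand base by base is attractive compared with simply lowering the voltage.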

Incorporation of a biological motor or nanobead

To enhance base recognition, DNA exonuclease (Escherichia coli exonuclease I, ExoI) and DNA polymerase enzymes were used as motors with α-HL [152]. Exonucleases were abandoned early in motor protein studies because of their drawbacks: the strand is completely digested, so it cannot be read more than once, and the cleaved nucleotides must be fed precisely into the pore [153, 154]. The first polymerase used, a member of the A family, was the Klenow fragment (KF) of E. coli DNA polymerase I, used with α-HL pores [155]. However, because of stability and processivity issues, the A-family polymerase was replaced by a B-family polymerase, phi29 [156]. Bacteriophage phi29 DNA polymerase (phi29 DNAP) has a high affinity for DNA substrates and works well with α-HL and MspA pores [157]. Unlike polymerases, which require a partial duplex in which new nucleotides are added to the 3′ end of a primer, helicases can bind single-stranded nucleic acids directly [158]. Helicases also have better affinity and can eliminate double-read and skipped bases caused by fluctuations in the synthesis rate and by the proofreading activity of phi29 DNAP [159, 160] (Fig. 9).

Fig. 9

HEL308 helicase as a motor protein translocating ssDNA. A Mechanism. B Domain organization and motions of HEL308. The two RecA (recombination protein A)-like domains, 1 and 2, form the motor: ATP binds between them and drives or rectifies the mechanochemical cycle, while the auxiliary ratchet domain makes several contacts with ssDNA and may determine potential sequence specificity [159, 160]. Image adapted from [159]

An integrated nanopore platform with a nanobead structure has been reported to decelerate DNA movement, with noise reduced by a polyimide layer and the nanopore fabricated by controlled dielectric breakdown (CDB) [161]. The second way of controlling translocation, as mentioned above, relies on regulating the driving voltage by adjusting pore geometry and experimental conditions [162,163,164].

Adjusting pore geometry

The limited geometries of biological pores pushed research toward solid-state nanopores, which offer a greater diversity of pore shapes. However, solid-state pores have reduced spatial resolution because of the thickness required for membrane stability [107, 119, 165]. Decreasing the nanopore diameter to nearly that of ssDNA (about 1.4 nm) slows translocation to 1.4 microseconds per base, making pore narrowing one effective way to improve translocation control [166]. Reducing the pore diameter also increases the amplitude of the current signals from DNA. Continuum modeling shows that conical nanopores produce greater signal amplitudes from biomolecule translocation than cylindrical nanopores [167].

Adjusting experimental conditions

The ultrafast translocation of single-stranded DNA (ssDNA) through solid-state nanopores is a major problem, and there are various ways to slow it [161, 166, 168]. In one, a silicon nitride nanopore fabricated by controlled dielectric breakdown (CDB) and used with divalent metal cations, especially Ca2+, slowed translocation to about 100 microseconds per base [169]. Pore dwell time can also be increased by varying the cation species and molarity of the electrolyte. In solid-state pores, as the cation size decreases from K+ to Na+ to Li+, translocation time increases strongly for both dsDNA and ssDNA, because smaller cations bind more strongly to the DNA strand [170]. A LiCl salt gradient combined with a nanofiber mesh has been used to slow DNA translocation and keep the molecule within the sensing region of the nanopore for longer. Compared with other alkali chloride solutions, LiCl extends the dwell time by 20 ms (five times longer than with NaCl or KCl), reaching 100 ms at higher concentrations, and the nanofiber mesh retards translocation further, by 162 to 185 ms [171]. Increasing the LiCl concentration 15-fold slows ssDNA translocation because counter-ion binding lowers the effective charge of the DNA, which in turn reduces the electrophoretic driving force. Slowing translocation in this way improved resolution enough for 5mC to be distinguished from C without methyl-specific labels [172]. Conversely, in silicon nitride nanopore systems, decreasing the KCl concentration from 1 to 0.1 M shortened the transit time, whereas a low concentration of MgCl2 lengthened it [173].

Enhancing the signal-to-noise ratio (SNR)

The major hurdle in the progression of nanopore technology is noise in the ionic current, which limits the signal-to-noise ratio (SNR). Solid-state nanopores have the highest SNR because of the large currents at which they can be operated and their relatively low noise at high frequencies. Slowing translocation nonetheless plays a major role; with MspA, for example, the SNR was shown to increase more than 160-fold [174] (Fig. 10).

Fig. 10

Noise in biological and solid-state nanopores. Image adapted from [174]

The nanopore noise power spectral density (PSD) is composed of 1/f noise, white noise, dielectric noise, and amplifier noise, each dominating in a different frequency range. 1/f noise arises from surface and bulk effects; white noise from thermal and shot noise; dielectric noise from current leakage and loss in the dielectric membrane; and amplifier noise from the capacitance of the chip and the amplifier [175] (Fig. 11).
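A common way to reason about these contributions is to model the PSD as a sum of terms with different frequency dependences: a 1/f term, a frequency-independent white term, a dielectric term rising roughly as f, and an amplifier term rising roughly as f². The sketch below assumes exactly that form with hypothetical coefficients, chosen only so that each term dominates in a different band. It reports which term dominates at a given frequency and integrates the PSD analytically over a bandwidth to obtain the RMS current noise, from which an SNR for a given blockade can be estimated.

```python
# Minimal sketch of a four-term nanopore current-noise PSD model.
# Coefficients are hypothetical and chosen only to separate the bands:
# 1/f below ~1 kHz, white ~1-10 kHz, dielectric ~10-100 kHz, amplifier above.
import math
from typing import Dict

COEFFS = {"flicker": 1e-26, "white": 1e-29, "dielectric": 1e-33, "amplifier": 1e-38}


def psd_terms(f: float, c: Dict[str, float] = COEFFS) -> Dict[str, float]:
    """Current-noise PSD components (A^2/Hz) at frequency f (Hz)."""
    return {
        "flicker": c["flicker"] / f,         # 1/f: surface and bulk effects
        "white": c["white"],                 # thermal and shot noise
        "dielectric": c["dielectric"] * f,   # dielectric loss of the membrane
        "amplifier": c["amplifier"] * f**2,  # amplifier noise on chip capacitance
    }


def dominant_component(f: float) -> str:
    terms = psd_terms(f)
    return max(terms, key=terms.get)


def rms_noise(f_lo: float, f_hi: float, c: Dict[str, float] = COEFFS) -> float:
    """RMS current (A) from integrating each PSD term analytically over [f_lo, f_hi]."""
    variance = (c["flicker"] * math.log(f_hi / f_lo)
                + c["white"] * (f_hi - f_lo)
                + c["dielectric"] * (f_hi**2 - f_lo**2) / 2
                + c["amplifier"] * (f_hi**3 - f_lo**3) / 3)
    return math.sqrt(variance)


if __name__ == "__main__":
    for f in (100, 5e3, 5e4, 5e5):
        print(f"{f:>8.0f} Hz: {dominant_component(f)}")
    noise = rms_noise(1.0, 10e3)  # 10 kHz bandwidth, as in Fig. 11
    # 105 pA blockade = the 120 pA -> 15 pA step quoted for Fig. 5.
    print(f"rms noise {noise * 1e12:.2f} pA, SNR {105e-12 / noise:.0f}")
```

Because the dielectric and amplifier terms grow with frequency, widening the measurement bandwidth to capture fast events raises the integrated noise quickly, which is why reducing chip capacitance and dielectric loss matters at high bandwidth.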

Fig. 11

Ionic current noise in solid-state SiNx nanopores and biological α-HL pores (a), measured at a constant applied bias of 100 mV in 1 M KCl buffer at pH 7 with a bandwidth of 10 kHz (b). Image adapted from [174]

Options for improving SNR are constrained: the nanopore diameter is limited by the size of the molecule, and the membrane thickness by material properties [176,177,178]. Amorphous Si membranes, which can be thinned toward their theoretical thickness limits, are emerging as a leading nanopore material for increasing ionic conductance and achieving high SNR in sequencing applications [179, 180]. Various approaches have been used to overcome noise limitations: in biological nanopores, increasing conformational stiffness and decreasing pore size [174]; in solid-state nanopores, functionalizing the SiNx surface with a hydrophilic layer such as Al2O3 or SiO2 [124], applying high electric fields to the pore [181], and choosing a pH far from the isoelectric point of the pore material [174], all of which have been shown to reduce noise [178]. Dielectric noise can also be suppressed by minimizing the capacitance and dielectric loss of the chip [174, 175].

Another area for improvement at ONT is the computation required for higher SNR and throughput [182, 183], which demands better algorithms for base calling, mapping, and variant calling [184]. Low SNR, which results from the technological limitations of nanopore sequencers, can prevent the nucleotide sequence from being read and determined accurately [182].

Expanding the range to long reads

Long reads are needed to span repetitive elements of the genome unambiguously [187, 188]. A method that pipettes reagents as slowly as possible to minimize shearing forces and preserve long DNA templates during library preparation has been developed and named SNAILS (slow nucleic acid instrument for long sequences) [187]. SNAILS automates the slow pipetting of library preparation reagents to increase the consistency and throughput of long-read nanopore sequencing [187, 189]. By optimizing DNA extraction and enzymatic reactions, read lengths can be increased further, from 50–70 kb with mechanical shearing to 90–100 kb with transposase-mediated fragmentation [190].

The initial draft of the human genome, produced at the beginning of the millennium, was incomplete and remained so until Oxford Nanopore sequencing was used to complement PacBio sequencing [191]. As a result, a complete human genome sequence is now available. The remaining 8% of the genome, addressed by the telomere-to-telomere (T2T) consortium, included all centromeric satellite arrays and the short arms of the five acrocentric chromosomes, yielding gapless assemblies for all 22 autosomes plus chromosome X [16, 192]. Long-read sequencing reaches previously inaccessible parts of the genome, such as centromeres [101], telomeres, and acrocentric regions [193]. In these regions, massive arrays of tandem repeats predominate and show the highest mutation rates in both the germline and the soma [194]. The development of techniques that give access to these regions has been a boon for genomic research and industry [101].

Computational advancements

Many tools are available for computational analysis in sequencing experiments [104, 105], but choosing among them is not straightforward, separate tools are needed for individual steps, and managing and integrating them is difficult. Combining tools into pipelines can help with mapping sequencing reads, calculating methylation levels, and identifying differentially methylated positions or regions [106]. Because DNA movement must be slow enough for individual nucleotides to be identified, another challenge was creating a well-controlled ratchet that moves the strand through the pore [87, 107, 108].
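Calculating methylation levels typically reduces to counting, at each position, the fraction of reads called as methylated. The sketch assumes a hypothetical input layout (a dictionary from genomic position to per-read 0/1 calls), an arbitrary minimum coverage of three reads, and a simple absolute-difference cut-off for flagging differentially methylated positions; real pipelines apply statistical tests rather than a fixed cut-off.

```python
# Minimal sketch: per-position methylation level and a naive differential flag.
# Input layout, coverage minimum and cut-off are hypothetical choices.
from typing import Dict, List

Calls = Dict[int, List[int]]  # genomic position -> per-read calls (1 = methylated)


def methylation_levels(calls: Calls, min_coverage: int = 3) -> Dict[int, float]:
    """Fraction of methylated reads at each sufficiently covered position."""
    return {pos: sum(reads) / len(reads)
            for pos, reads in calls.items() if len(reads) >= min_coverage}


def differential_positions(a: Calls, b: Calls, min_diff: float = 0.3) -> List[int]:
    """Positions covered in both samples whose levels differ by at least min_diff."""
    la, lb = methylation_levels(a), methylation_levels(b)
    return sorted(pos for pos in la.keys() & lb.keys()
                  if abs(la[pos] - lb[pos]) >= min_diff)


if __name__ == "__main__":
    tumor = {100: [1, 1, 1, 0], 250: [0, 0, 1, 0], 400: [1, 1]}
    normal = {100: [0, 0, 1, 0], 250: [0, 0, 0, 1], 400: [0, 0, 0]}
    print(methylation_levels(tumor))              # prints {100: 0.75, 250: 0.25}
    print(differential_positions(tumor, normal))  # prints [100]
```

Position 400 is skipped in the tumor sample because two reads fall below the assumed coverage minimum.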

Nanopore sequencers can generate enormous amounts of data in a short time, thanks to computational systems that incorporate nanopore chemistry and base-calling software [182, 184]. Once reads are generated, they are analyzed by one of two approaches: read mapping or de novo assembly [345]. Read mapping aligns reads against a reference genome to identify variations in the sequenced genome [383]. De novo assembly combines reads to reconstruct the original sequence in the absence of a reference genome [384]. In 2014, Oxford Nanopore Technologies (ONT) launched a beta-testing program for the MinION, which was followed by the development of novel computational approaches for base calling, data handling, read mapping, de novo assembly, and variant discovery for this new generation of data [15, 195]. These approaches improve de novo genome assembly and make it possible to investigate structural variants with unrivaled accuracy and resolution. They can also offset the relatively high error rate of nanopore sequencing [196].
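The difference between the two approaches can be shown with a toy example: mapping looks each read up in a known reference, whereas assembly reconstructs the sequence from overlaps between the reads alone. The sketch below uses exact substring search for mapping and a simple greedy suffix-prefix overlap merge for assembly; both are illustrative simplifications, and the function names are hypothetical. Real mappers and assemblers tolerate sequencing errors and use indexes or graphs.

```python
# Toy contrast between read mapping and de novo assembly (illustrative only).
# Mapping = exact substring search against a reference; assembly = greedy
# merging of the pair of sequences with the longest suffix-prefix overlap.
from typing import Dict, List, Optional


def map_reads(reads: List[str], reference: str) -> Dict[str, Optional[int]]:
    """First exact-match position of each read in the reference (None if absent)."""
    positions: Dict[str, Optional[int]] = {}
    for read in reads:
        pos = reference.find(read)
        positions[read] = pos if pos >= 0 else None
    return positions


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the best-overlapping pair until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    reference = "ATGGCGTACGTTAG"
    reads = ["ATGGCGT", "GCGTACG", "TACGTTAG"]
    print(map_reads(reads, reference))  # prints {'ATGGCGT': 0, 'GCGTACG': 3, 'TACGTTAG': 6}
    print(greedy_assemble(reads))       # prints ['ATGGCGTACGTTAG']
```

Here assembly recovers the reference without ever seeing it, which is the situation de novo assembly addresses; long reads make such overlaps far less ambiguous in repetitive regions.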

Nanopore chemistry software

Changes in the sequencing chemistry of devices such as MinION and GridION have substantially improved error rates. Before MinION, sequencing through a biological nanopore allowed 1D sequencing of the template strand after a motor protein unwound the double strand [182, 387]. Early MinION models, however, offered 2D sequencing, which reads both the template and the complementary strand for proofreading; this was achieved by ligating a hairpin structure that links the two strands. 2D reads were more than 5% more accurate than 1D reads (reads of the template strand alone) [64]. More recently, ONT developed 1D² sequencing, which sequences the template and complementary strands without physical ligation, and 1D² showed a 7% increase in accuracy over 1D [182, 385, 386].
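The benefit of reading both strands can be illustrated with a toy consensus: the complement read is reverse-complemented into the template orientation and compared position by position, so errors that occur on only one strand show up as disagreements. The sketch assumes equal-length, pre-aligned reads with substitution errors only and simply flags disagreements as N; ONT's actual 2D and 1D² consensus algorithms align the reads and weigh per-base evidence, which is not reproduced here.

```python
# Toy two-strand consensus (illustrative; not ONT's 2D/1D-squared algorithm).
# Assumes equal-length, pre-aligned reads with substitution errors only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def two_strand_consensus(template: str, complement_read: str) -> str:
    """Keep bases where both strands agree; flag disagreements as N."""
    other = reverse_complement(complement_read)
    if len(other) != len(template):
        raise ValueError("sketch assumes equal-length, pre-aligned reads")
    return "".join(a if a == b else "N" for a, b in zip(template, other))


if __name__ == "__main__":
    # True sequence ACGGTCAT; the template read has an error at position 2 and
    # the complement read has one that maps to the last position.
    print(two_strand_consensus("ACTGTCAT", "CTGACCGT"))  # prints ACNGTCAN
```

Positions where the strands agree can be trusted more than a single-strand call, and a real caller would use quality information to resolve the flagged positions rather than leaving them as N.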

Base calling software

Base calling, the computational conversion of raw current signals into nucleotide sequences, is essential for detecting epigenetic modifications [388]. ONT's base-calling software has accordingly gone through several stages of development. Early base callers applied hidden Markov models (HMMs) to segmented current data, and recurrent neural networks were introduced in 2016 [389]. In 2017, base calling began to use the raw current signal directly. As demand for accuracy increased, flip-flop models and customized base-calling models were introduced in 2018 and 2019, respectively [184, 390].
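To make the HMM idea concrete, the following is a toy Viterbi decoder over segmented current levels (events). It assumes a hypothetical pore model in which each event's mean current depends on the two bases in the pore (real models use longer contexts), Gaussian emission noise with a fixed SD, and equally likely transitions between overlapping 2-mers. It is an illustration of the technique, not ONT's base caller.

```python
# Toy HMM base caller: Viterbi decoding over overlapping 2-mer states.
# The pore model (levels), noise SD and uniform transitions are hypothetical.
import math
from itertools import product
from typing import Dict, List

BASES = "ACGT"
KMERS = ["".join(p) for p in product(BASES, repeat=2)]  # 16 hidden states


def level(kmer: str) -> float:
    """Hypothetical mean current (pA) when this 2-mer occupies the pore."""
    return 60.0 + 10.0 * BASES.index(kmer[0]) + 2.5 * BASES.index(kmer[1])


def log_gauss(x: float, mu: float, sd: float) -> float:
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def viterbi_basecall(events: List[float], sd: float = 1.0) -> str:
    """Most probable base sequence for a list of event mean currents."""
    log_trans = math.log(0.25)  # XY -> YZ: four equally likely successors
    score = {k: math.log(1 / 16) + log_gauss(events[0], level(k), sd) for k in KMERS}
    back: List[Dict[str, str]] = []
    for x in events[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for k in KMERS:
            # Valid predecessors end with the base that k starts with.
            preds = [p for p in KMERS if p[1] == k[0]]
            best = max(preds, key=lambda p: score[p])
            new_score[k] = score[best] + log_trans + log_gauss(x, level(k), sd)
            pointers[k] = best
        score = new_score
        back.append(pointers)
    # Trace back the best path of 2-mers, then collapse it to bases.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path[0] + "".join(k[1] for k in path[1:])


if __name__ == "__main__":
    # Noisy events generated from "ACGTTA" under the hypothetical pore model.
    print(viterbi_basecall([62.9, 74.7, 87.7, 97.0, 90.3]))  # prints ACGTTA
```

Dynamic programming lets the decoder combine the evidence of every event with the constraint that consecutive states must overlap, rather than calling each event in isolation.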

Real-time base calling is becoming simpler, but current file formats such as BAM/CRAM (Binary Alignment Map/Compressed Reference-oriented Alignment Map) cannot yet fully accommodate ultra-long reads [77]. In addition, up to five neighboring bases influence the current level of a single DNA strand as it traverses MspA [185]. Such limitations motivated the use of dynamic programming methods such as the Viterbi algorithm [186]. Genotyping accuracy, however, still trails that of short-read sequencing instruments because of an insufficient ability to discriminate between heterozygous and homozygous alleles. This creates a need for structural variant genotyping tools designed for long, single-molecule sequencing reads [77].

MinION's computational pipeline converts the electronic signal data into nucleotide sequences in several steps [63]. First, the motor protein located above the nanopore unwinds the dsDNA so that ssDNA passes properly through the nanopore (Fig. 12A). Second, the ionic current signal recorded as the nucleotides are read is segmented into events, each summarized by its mean, standard deviation, and length (Fig. 12B); the signal is sampled at a constant frequency of 5000 Hz. Third, the segmented events are passed to a machine learning model that translates them into template and complement reads (Fig. 12C). Finally, the resulting sequences are displayed on the computer (Fig. 12D). The performance of each step can be evaluated with plots of throughput, read length, and accuracy (Fig. 12E, F, G, H) [195].
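The segmentation step (Fig. 12B) can be illustrated with a minimal Python sketch that splits a raw current trace into events wherever the level jumps, then summarizes each event by its mean, standard deviation, and duration at the 5000 Hz sampling rate stated in the text. The jump threshold `jump_pa` and minimum event length `min_len` are hypothetical parameters; production segmenters use more robust statistical change-point tests.

```python
import statistics

SAMPLE_RATE_HZ = 5000  # sampling frequency stated in the text


def segment_events(signal, jump_pa=5.0, min_len=3):
    """Split a raw current trace (pA) into events at abrupt level changes.

    A new event starts when a sample deviates from the running mean of the
    current event by more than jump_pa, provided the current event already
    has at least min_len samples (both thresholds are illustrative).
    """
    events, current = [], [signal[0]]
    for x in signal[1:]:
        if abs(x - statistics.fmean(current)) > jump_pa and len(current) >= min_len:
            events.append(current)
            current = [x]
        else:
            current.append(x)
    events.append(current)
    # Summarize each event as (mean, standard deviation, duration in seconds).
    return [
        {
            "mean": statistics.fmean(ev),
            "stdv": statistics.pstdev(ev),
            "length_s": len(ev) / SAMPLE_RATE_HZ,
        }
        for ev in events
    ]


trace = [80.0, 81.0, 79.0, 80.0] + [100.0, 101.0, 99.0, 100.0, 100.0] + [60.0] * 5
for event in segment_events(trace):
    print(event)
```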

Fig. 12

Steps for computational sequencing of DNA using a nanopore. Image reprinted from [195] with permission of the publisher (License ID: 1164222-1, 25 Nov 2021)

Current challenges and opportunities of nanopore sequencing technology

Two challenges remain to be solved in nanopore sequencing: enzyme turnover and the interval over which the nanopore current is recorded for each base [67, 186]. The enzyme that steps the DNA through the pore base by base turns over stochastically, so it acts as an imperfect ratchet in which the interval between successive advances of the DNA is variable [197]. Some intervals may be so short that they are lost in system noise, while in repetitive stretches of identical bases, successive steps may not be recognized during long intervals. Improved ratcheting mechanisms for accurate nanopore sequencing might solve this issue [186].
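The effect of a stochastic ratchet can be quantified under a simple assumption: if enzyme turnover is a single-rate process, dwell times are exponentially distributed with mean tau, so the fraction of steps shorter than a minimum detectable duration t_min is 1 - exp(-t_min/tau). The Python sketch below computes this analytically and checks it by simulation; the 2 ms mean dwell and 0.6 ms detection limit (three samples at 5000 Hz) are hypothetical values chosen for illustration.

```python
import math
import random


def fraction_missed(mean_dwell_s, min_detectable_s):
    """Analytic fraction of steps too brief to detect, assuming exponentially
    distributed dwell times (single-rate enzyme turnover)."""
    return 1.0 - math.exp(-min_detectable_s / mean_dwell_s)


def simulate_missed(mean_dwell_s, min_detectable_s, n_steps=100_000, seed=0):
    """Monte Carlo estimate of the same fraction from simulated dwell times."""
    rng = random.Random(seed)
    dwells = [rng.expovariate(1.0 / mean_dwell_s) for _ in range(n_steps)]
    return sum(d < min_detectable_s for d in dwells) / n_steps


print(fraction_missed(0.002, 0.0006))  # about 0.26
print(simulate_missed(0.002, 0.0006))
```

With these numbers roughly a quarter of steps would be too brief to resolve, which illustrates why a more regular ratchet would help. In homopolymer stretches the problem is different: successive identical k-mers produce the same current level, so step boundaries are invisible regardless of dwell time.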

Methods for modifying and functionalizing solid-state nanopores to mimic important characteristics of biological pores are advancing. However, such nanopores are currently single-use, and more effort is required to achieve reversible functionalization [198, 199]. A hybrid biological/artificial nanopore is therefore considered the most promising strategy for combining robustness and selectivity [200,201,202]. In terms of consensus base calling accuracy, nanopore technology cannot yet compete with other sequencing platforms [203, 204]. Single-molecule sequencing (SMS) has difficulty producing sufficient signal, and as a result the error rates of individual reads are higher than those of sequencing-by-synthesis (SBS) data [205, 206]. Nevertheless, nanopores have enabled genome-wide and transcriptome-wide analysis of base modifications in epigenomics. Additionally, because nanopore technology is also being applied to protein sequencing for proteomics, it offers the opportunity to bring multi-omics onto a single platform, which could make nanopore sequencing the future of sequencing for all applications, including human health and medicine [207,208,209].

Competition with PacBio and with Illumina, the largest market shareholder, is intense. Whereas SMRT sequencing requires high coverage, nanopore sequencing can achieve high-accuracy detection with low-coverage reads [209, 210]. Oxford Nanopore has so far prevailed against both Illumina and PacBio in patent litigation, and this seems likely to continue given the dense patent portfolio the company has built up and defended for more than a decade [211,212,213].

Even though many solutions to the challenges mentioned in Sect. 3 have emerged, after a decade-long journey the challenges of nanopore sequencing remain a concern for experts working on the technology. Many of the challenges that Daniel Branton and colleagues described in their 2008 paper "The potential and challenges of nanopore sequencing" still exist, although great advances have also been made [108].

Workflow for Nanopore sequencing

All relevant regulations for working with human subjects should be complied with before sample and library preparation for nanopore sequencing proceeds [214]. Nucleic acid extraction is then followed by library preparation and base calling [66]. When large DNA fragments are to be sequenced and assembled from short DNA oligonucleotides, a common preliminary step is to increase the nanopore sequencing throughput of small DNA amplicons [214, 215] (Fig. 13).

Fig. 13

The workflow for nanopore sequencing. Image adapted from [216]

Mapping of nanopore reads is done by alignment to the reference genome with Minimap2. For reads matching known genes, the gene name is added to the corresponding SAM record using the Sicelore Add Gene Name Tag method; here, the genes are annotated with their nanopore read sequence and read qualities [217] (Fig. 14).
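The gene-tagging step can be sketched in Python: compute the reference span of each mapped read from its SAM POS and CIGAR fields, find an overlapping gene, and append a gene-name tag. The annotation format (chrom, start, end, name with 1-based inclusive coordinates), the `GE` tag name, and the first-overlap rule are assumptions for illustration; this is not Sicelore's implementation.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")


def add_gene_tag(sam_line, genes, tag="GE"):
    """Append a gene-name tag to a SAM alignment line if the read overlaps a gene.

    genes: list of (chrom, start, end, name), 1-based inclusive (hypothetical format).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":  # unmapped read: leave unchanged
        return "\t".join(fields)
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in REF_CONSUMING)
    end = pos + ref_len - 1  # last reference base covered by the alignment
    for chrom, g_start, g_end, name in genes:
        if chrom == rname and g_start <= end and pos <= g_end:
            fields.append(f"{tag}:Z:{name}")  # first overlapping gene wins
            break
    return "\t".join(fields)


genes = [("chr1", 1, 50, "GENEA"), ("chr1", 150, 300, "GENEB")]
line = "read1\t0\tchr1\t100\t60\t50M10N40M\t*\t0\t0\t*\t*"
print(add_gene_tag(line, genes))
```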

Fig. 14

Workflow and time required for MinION nanopore sequencing and assembly. The estimates are based on a rapid barcoding sequencing kit, which can pool twelve samples in a single run. The time needed for base calling and de novo assembly depends on the capacity of the computer used. Image adapted from [218]

Epigenetic tumor heterogeneity and sequencing technologies

Epigenetics and tumor heterogeneity

Epigenetics

Epigenetic components can be conceived as writers, readers, and erasers. Writers add chemical groups to histones or DNA (e.g., histone acetyltransferases (HATs), histone methyltransferases (HMTs), or DNA methyltransferases) [219]. Erasers such as histone deacetylases (HDACs) or histone demethylases (HDMTs) remove the added chemical groups [220,221,222]. Readers are effector proteins whose domains bind specific modified sites, e.g., methyl-binding domain proteins or bromodomain and extra-terminal (BET) domain proteins [220, 223, 224]. Of these mechanisms, DNA methylation, which refers to the modified nucleotide 5-methylcytosine (5mC) [225], was the first epigenetic mark to be identified and is the main focus here. 5mC can occur in any sequence but is highly enriched where a cytosine is immediately followed by a guanine in the 5′ to 3′ direction [226]. Such a CG dinucleotide is called a CpG site, and regions with a high density of CpG sites, known as CpG islands, are found in over two-thirds of gene promoters and can serve as epigenetic regulatory switches that restrict gene expression when methylated [227, 228]. Methylation of promoter CpG islands silences genes both for normal developmental requirements and during tumorigenesis [229, 230]. Unlike the relatively plastic transcriptional regulation mediated by histone modification, gene silencing through DNA methylation is durable and persistent [231]. As a consequence, methylation is the primary epigenetic silencing mechanism used to repress endogenous transposons, imprinted genes, and pluripotency-related genes in somatic cells [232, 233] (Fig. 15).
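A widely used way to operationalize the CpG island definition is the classic Gardiner-Garden and Frommer criteria: a region of at least 200 bp with GC content above 50% and an observed-to-expected CpG ratio above 0.6, where observed/expected = (number of CpGs × length) / (number of C × number of G). The Python sketch below applies these thresholds to a single sequence window; genome-wide island callers additionally slide and merge windows, which is omitted here.

```python
def cpg_stats(seq):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a window."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / len(seq)
    obs_exp = cpg * len(seq) / (c * g) if c and g else 0.0
    return cpg, gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply Gardiner-Garden and Frommer style thresholds to one window."""
    _, gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


print(is_cpg_island("CG" * 100))    # dense CpGs, 200 bp
print(is_cpg_island("AT" * 100))    # no C or G at all
print(is_cpg_island("CG" * 10))     # too short
```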

Fig. 15

The linkage between DNA methylation and histone modification in pluripotency genes. In embryonic stem cells, pluripotency genes such as Oct 3/4 and Nanog have unmethylated CpG islands. These islands are associated with acetylated (Ac) histones H3 and H4 and with methylated (Me) lysine 4 (K4) of histone H3. At the onset of differentiation, the histone methyltransferase G9a, together with a histone deacetylase (HDAc), binds to the complex. This binding leads to deacetylation of H3 and H4. At the same time, demethylation of K4 is catalyzed by HDAc and methylation of K9 is catalyzed by G9a. This modification creates a binding site for the chromodomain protein heterochromatin protein 1 (HP1). Finally, G9a recruits the methylases DNMT3A and DNMT3B (dark purple circles), which mediate de novo methylation of the deacetylated DNA [232, 234]. The process favors epigenetic silencing and methylation while blocking heterochromatinization. Image reprinted from [235] with permission of the publisher (Request ID 600061575 25 Nov 2021)

Besides methylation, there are additional cytosine modifications with potential regulatory roles, such as 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) [236]. DNA methylation at the 5th position of cytosine forms 5-methylcytosine (5mC), the main DNA modification, which in mammals occurs mostly at CpG dinucleotides. 5mC can be converted to 5hmC, 5fC, and 5caC by the ten-eleven translocation (TET) family of α-ketoglutarate-dependent dioxygenases [237, 238]. 5hmC is found in the gene bodies and promoters of protein-coding genes and of long non-coding RNAs (lncRNAs) (Fig. 16) [239].

Fig. 16

The landscape of epigenetic mechanisms. A Cytosine and adenine modifications: cytosines can be methylated, hydroxymethylated (hmC), formylated (fC), or carboxylated (caC), and adenines can be methylated. B Histone modifications and repositioning of nucleosomes containing different histone variants. C Non-coding RNAs play an important role in transcriptional regulation and are sometimes considered epigenetic mechanisms. D RNA modifications can also be considered part of epigenetics. Image adapted from [240]

The regulatory function of methylation, especially hypermethylation, lies in the recruitment of co-repressors once the promoter region of a gene becomes hypermethylated [241]. This extra methylation leads to transcriptional silencing. The process is directed by DNA methyltransferases (DNMT 1, 2, 3A, and 3B) and by methyl-CpG-binding proteins, which recognize methylcytosine residues and attract transcriptional repressor complexes such as histone deacetylases (HDAC) [27, 242, 243]. Histone acetyltransferases (HATs) and histone deacetylases (HDACs) ultimately regulate gene transcription [27]. Small RNAs that are complementary to nascent transcripts act as scaffolds that guide histone and DNA methyltransferases [244]. Apart from small RNAs, chromatin-associated long non-coding RNA scaffolds play an independent, co-transcriptional silencing role that provides a system to detect and silence inappropriate transcriptional events [245]. This system also allows a memory of these events to be recorded as self-reinforcing epigenetic loops [246] (Table 3).

Table 3 Chromatin modifications, readers, and their functions. Adapted from [247] with permission of the publisher (License ID 1165278-1, 01-Dec-2021)

The role of the oxidized forms of 5-methylcytosine was long controversial, but the discovery of reader proteins that bind these sites has begun to reveal their functions [248, 249]. For 5hmC, a reader protein, UHRF2 (ubiquitin-like with PHD and ring finger domains 2), has been identified [250]. However, the downstream biological effects of this binding have not yet been determined [248, 251]. 5fC and 5caC are present at low levels, specifically at certain genomic locations such as enhancers and promoters, and targeted studies have identified binding proteins for these modified nucleotides [252].

Association of epigenetic dysregulation with cancer and targeted therapeutics

Advances in molecular sequencing technologies for characterizing epigenetic features have helped establish epigenetic dysregulation as another hallmark of cancer [253, 254]. DNA methylation regulates key cellular processes such as apoptosis, lipogenesis, and the downstream transcriptional effects of the MAPK pathway [255]. Dysregulated methylation of these gene regions results in tumor cell growth in colorectal cancer (CRC) [256]. Further, methylation-associated epigenetic driver genes have been implicated in the early stages of tumorigenesis in CRC. CRC tumors display CpG island methylator phenotypes (CIMPs), which show high concordance with specific genetic changes, disease risk factors, and patient outcomes [257]. In general, hypermethylation of CpG islands silences tumor suppressor genes and thereby promotes tumor cell growth [258], whereas hypomethylation of CpG islands promotes transcription of oncogenes [259]. Dysregulated epigenetic mechanisms, including methylation and histone modification, are also strongly associated with the occurrence of glioblastoma [260].

5hmC has specific properties that suit it to biological functions, chiefly blocking the interaction of 5mC-binding proteins with DNA [261, 262]. As a transient intermediate, it facilitates DNA demethylation during germ cell and early embryonic development [263,264,265,266]. During cell differentiation and reprogramming, TET-mediated DNA demethylation starts with the oxidation of 5mC to 5hmC [267,268,269]. Further oxidation converts 5hmC via 5fC to 5caC, and DNA demethylation is completed when the modified base is finally replaced by unmodified cytosine [266,267,268,269,270].

In gene bodies and promoters, 5-hydroxymethylcytosine (5hmC) has various roles in cancer hallmarks, and differential 5hmC levels have been correlated with clinical outcomes and tumor status in colorectal cancer (CRC) patients [239]. 5hmC also has a role in regulating DNA functions, which makes it a promising future marker for early cancer diagnosis and prognosis [271, 272]. This expectation follows from the recognition of 5hmC as an intermediate state in the demethylation process of gene regulation [263, 273].

Generally, epigenetic aberrations of DNA methylation, histone modifications, chromatin remodeling, and micro-RNA can reflect cancer development and progression and can be used as biomarkers for patient stratification [274, 275]. They are also used in predictive models that enable the use of cancer epigenetics in the diagnosis, prognosis, and treatment of patients [274, 276] (Table 4).

Table 4 Epigenetics role in tumorigenesis and progression. Adapted from [247] with permission of the publisher (License ID 1165278-1, 01-Dec-2021)

Epigenetic research is increasingly exploring epigenetic aberrations as targets for anticancer therapy, an approach suited to the reversible nature of epigenetic changes [277, 278]. Several epigenetic inhibitors have been developed and approved for routine clinical practice [253, 254, 279]. Epigenetic therapies comprise inhibitors of the methylation or demethylation of DNA and histones and of histone acetylation or deacetylation [253, 280,281,282]. Inhibitors of epigenetic regulatory mechanisms include analogs of adenosine, cytidine, or deoxycytidine and non-nucleoside small-molecule inhibitors of DNMT, as well as hydroxamic acids such as trichostatin A (TSA) and suberoylanilide hydroxamic acid (SAHA) that inhibit HDAC [27, 283]. Epidrugs in use include the HDAC inhibitors SAHA (vorinostat) and romidepsin for refractory cutaneous T cell lymphoma [284, 285], belinostat for peripheral T cell lymphoma [286, 287], and panobinostat for multiple myeloma, as well as the DNMT inhibitor decitabine for hematological malignancies such as myelodysplastic syndromes, acute myeloid leukemia, and chronic myelomonocytic leukemia [220, 288, 289].

Numerous epigenetic biomarkers with cancer detection, diagnosis, and/or prognosis capability have been identified [290, 291]. However, few of them are clinically available. Lack of independent validation and variable experimental designs across multicenter groups have hindered translational studies that would convert these markers into clinically useful tools [292]. The lack of validation also limits the availability of easy and affordable cancer testing [290].

Tumor epigenetic heterogeneity

Tumor heterogeneity can occur among patients (inter-patient heterogeneity), among multiple tumors of the same origin in one patient (intra-patient heterogeneity), or among subpopulations within a single tumor (intra-tumor heterogeneity) [23, 293]. Variation in DNA modification among individual cells, a survival mechanism under varying environmental conditions, is an important epigenetic factor that can regulate phenotypic heterogeneity [294, 295]. Substantial heterogeneity in expression is found even among morphologically indistinguishable cells, and it plays an important functional role in tissue biology and in disease states such as cancer [233].

In human cancer, aberrant epigenetic changes occur more frequently than gene mutations [23, 296, 297]. However, most cancer research focuses on the genetic basis, particularly the mutational activation of oncogenes or the inactivation of tumor suppressor genes (TSG) [23]. Epigenetic mechanisms are integral to the differentiation programs of several tumor cell lineages and provide a potential molecular link between cancer, stem cell biology, and drug resistance [24].

The level of intra-tumor methylation heterogeneity was found to correlate with relapse-free and overall survival in 79 colorectal tumors [298, 299]. Abundant evidence indicates that tumors are frequently composed of heterogeneous cell types, to which drug resistance appears to be linked [300, 301], and there is compelling evidence that epigenetic mechanisms mediate drug resistance in subpopulations of cancer cells [24, 302] (Figs. 17, 18).
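Methylation heterogeneity in a region is often quantified from the methylation patterns of individual reads covering the same few CpGs. The Python sketch below computes two measures used in the literature on such patterns: epipolymorphism (the probability that two reads drawn at random, with replacement, carry different patterns) and methylation entropy (the Shannon entropy of the pattern distribution divided by the number of CpGs). Patterns are written as strings such as `"1011"` (1 = methylated); this encoding is an assumption for illustration, and the specific measure used in the cited colorectal study is not stated here.

```python
import math
from collections import Counter


def pattern_frequencies(patterns):
    """Relative frequency of each per-read methylation pattern."""
    counts = Counter(patterns)
    n = sum(counts.values())
    return {p: c / n for p, c in counts.items()}


def epipolymorphism(patterns):
    """1 - sum(p_i^2): chance two randomly drawn reads differ in pattern."""
    return 1.0 - sum(f * f for f in pattern_frequencies(patterns).values())


def methylation_entropy(patterns):
    """Shannon entropy (bits) of the pattern distribution per CpG site."""
    freqs = pattern_frequencies(patterns)
    n_sites = len(next(iter(freqs)))  # all patterns cover the same CpGs
    return -sum(f * math.log2(f) for f in freqs.values()) / n_sites


homogeneous = ["1111"] * 4
mixed = ["1111", "0000", "1111", "0000"]
print(epipolymorphism(homogeneous), epipolymorphism(mixed))
print(methylation_entropy(homogeneous), methylation_entropy(mixed))
```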

Fig. 17

Tumor cell heterogeneity results in a drug-tolerant phenotype of the tumor. A Selection of a subset of drug-tolerant persister cells (DTPs) after treatment. B Epigenetic changes mediate the transition from drug-sensitive to drug-tolerant states. Image adapted from [24]

Fig. 18

Identification of methylated cytosine residues using solid-state nanopores fabricated from 2D graphene or MoS2. Image adapted from [351]. a Discrimination of C and mC structures with the help of the MBD1 protein. Methylation occurs at the fifth carbon position of the cytosine ring, and most mC nucleotides are found at CpG dinucleotides. b Schematic model of mC detection during nanopore sequencing of DNA. Identification relies on differences in the ionic current measured under an applied voltage

Mapping epigenetic heterogeneity in tumor

Roles of epigenetic sequencing in tumor heterogeneity

Regarding the physiological functions of the TET proteins and their regulation of DNA methylation and transcription, two of the three TET genes, TET1 and TET2, have been shown to be expressed at low levels in hepatocellular carcinoma (HCC) tissues [303, 304]. Studies have also revealed that global genomic 5hmC levels are down-regulated in HCC tissues and cell lines [305, 306]. 5hmC signatures in HCC tissues and in circulating cell-free DNA are important for designing early detection and therapeutic strategies [305]. The functions of 5-hydroxymethylcytosine (5hmC) in gene regulation and cancer pathogenesis were studied by sequencing cell-free 5hmC from 49 patients with seven different cancer types. The study found distinct features that predicted cancer types and stages with high accuracy, and it suggested that cell-free 5hmC signatures may potentially be used to track tumor stages in some cancer types [307].

Cancer-associated 5hmC signatures have been identified in cfDNA [308, 309]. These signatures are characteristic of specific cancer types and are highly predictive of colorectal, gastric, lung, and pancreatic cancers [307, 308]. This marker also has great potential for the diagnosis and prognosis of cancer from blood samples [308, 310]. Further work is therefore needed to show that 5hmC outperforms comparable conventional biomarkers.

Conventional sequencing methods in epigenetics

DNA methylation can be assessed by chemical conversion of DNA (bisulfite reactions), digestion with methylation-sensitive restriction enzymes, or affinity enrichment of methylated DNA fragments [311, 312]. A strategy that can distinguish 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine from 5-methylcytosine is important, and many strategies have been developed, each with its advantages and limitations [236]. Methylation sequencing and microarray-based profiling strategies are used together with NGS techniques [313]. Epigenetic sequencing methods for mapping 5mC have relied on next-generation sequencing; newer long-read sequencing of both DNA and RNA can directly read out the modifications in a single pass [314].

Bisulfite sequencing (BS-Seq) relies on the difference in reactivity between methylated and unmethylated cytosine under bisulfite treatment: bisulfite deaminates unmethylated cytosine to uracil (U), whereas methylated cytosine is preserved [315]. During PCR amplification, methylated cytosine therefore remains cytosine, while unmethylated cytosine is read out as T [314].
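The read-out logic of bisulfite sequencing can be captured in a few lines of Python. The sketch assumes complete conversion, a single forward strand, and reads aligned to the reference without gaps; under those assumptions, a C in the read at a reference C means methylated, and a T means unmethylated.

```python
def bisulfite_read(reference, methylated_positions):
    """Read expected after bisulfite treatment and PCR (forward strand,
    complete conversion assumed): unmethylated C -> T, methylated C stays C."""
    return "".join(
        "C" if base == "C" and i in methylated_positions
        else "T" if base == "C"
        else base
        for i, base in enumerate(reference)
    )


def call_methylation(reference, read):
    """Infer methylation at each reference C from a gaplessly aligned read."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"  # True = methylated
    return calls


ref = "ACGTTCGACC"
read = bisulfite_read(ref, methylated_positions={1})
print(read)                         # ACGTTTGATT
print(call_methylation(ref, read))  # {1: True, 5: False, 8: False, 9: False}
```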

Although base-resolution bisulfite sequencing is regarded as the gold standard, its harsh chemical treatment degrades the majority of the DNA and limits the complexity of the resulting epigenetic sequencing libraries [316]. It also has inherent limitations, starting with its inability to distinguish between 5mC and 5hmC [317]. In addition, bisulfite conversion reduces sequence complexity, leading to low mapping rates, uneven genome coverage, and inherent biases [314, 318]. These drawbacks arise because 95% of the total cytosine in the mammalian genome is converted to thymine [314]. The most serious problems still awaiting a solution are the degradation of most of the DNA during bisulfite treatment, low conversion efficiency, and the inability of bisulfite conversion to distinguish 5mC from 5hmC [319].

As alternatives to bisulfite techniques, bisulfite-free, base-resolution sequencing methods such as TET-assisted pyridine borane sequencing (TAPS) have been developed for both 5mC and 5hmC [316, 320]. TAPS combines TET oxidation of 5mC and 5hmC to 5-carboxylcytosine (5caC) with pyridine borane reduction of 5caC to dihydrouracil (DHU) [321]. The C-to-T transition is completed when PCR converts DHU to thymine, so TAPS detects modifications directly with high sensitivity and specificity without affecting unmodified cytosine [322]. The method preserves DNA fragments up to 10 kilobases long, enabling cheaper methylome analysis [316].
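TAPS inverts the bisulfite logic: modified cytosines become T, while unmodified cytosines are read as C. The Python sketch below mirrors that conversion under simplifying assumptions (complete conversion, one forward strand, gapless alignment). Because standard TAPS oxidizes both 5mC and 5hmC to 5caC, the sketch calls a site simply as "modified".

```python
def taps_read(reference, modified_positions):
    """Read expected after TAPS and PCR: 5mC/5hmC -> 5caC -> DHU -> T,
    while unmodified C is left untouched (complete conversion assumed)."""
    return "".join(
        "T" if base == "C" and i in modified_positions else base
        for i, base in enumerate(reference)
    )


def call_modifications(reference, read):
    """True at a reference C if the read shows T (modified), False if C."""
    return {
        i: read_base == "T"
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base in ("C", "T")
    }


ref = "ACGTTCGACC"
read = taps_read(ref, modified_positions={1})
print(read)                           # ATGTTCGACC
print(call_modifications(ref, read))  # {1: True, 5: False, 8: False, 9: False}
```

Because only the small fraction of modified cytosines is converted, most of the sequence keeps its four-letter complexity, which is why TAPS avoids much of the complexity loss described above for bisulfite sequencing.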

Another method, oxidative bisulfite sequencing (oxBS-Seq), uses the oxidizing capacity of potassium perruthenate (KRuO4) to convert 5hmC to 5fC, which subsequent bisulfite treatment converts to U, with a conversion rate of 94.5% [314]. The 5hmC level and position can then be obtained by subtracting the oxBS-Seq signal from the BS-Seq signal [323,324,325]. Potassium perruthenate is more damaging to DNA than potassium ruthenate, which is therefore more useful for nanoscale genomic mapping in limited biological and clinical samples [320]. This approach has reportedly been applied to cell-free DNA (cfDNA) from healthy donors and cancer patients, yielding base-resolution hydroxymethylomes of human cfDNA for the first time [314, 326].
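The subtraction step can be written per CpG site. With bisulfite alone, both 5mC and 5hmC are read as C, so the BS-Seq level estimates 5mC + 5hmC; after oxidation, 5hmC is read as T, so the oxBS-Seq level estimates 5mC alone. The Python sketch below computes 5hmC = BS level - oxBS level from (C, T) read counts. Clipping negative differences to zero is a simple choice made here for illustration; more careful analyses model the sampling noise statistically.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads showing C at a site; requires non-zero coverage."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


def hydroxymethylation_level(bs_counts, oxbs_counts):
    """Estimate (5hmC, 5mC) at one CpG from (C, T) counts in each library.

    BS-Seq level   ~ 5mC + 5hmC
    oxBS-Seq level ~ 5mC
    Negative differences (sampling noise) are clipped to 0.
    """
    bs = methylation_level(*bs_counts)
    oxbs = methylation_level(*oxbs_counts)
    return max(0.0, bs - oxbs), oxbs


hmc, mc = hydroxymethylation_level(bs_counts=(80, 20), oxbs_counts=(60, 40))
print(round(hmc, 3), round(mc, 3))  # 0.2 0.6
```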

Analyzing methylation in bisulfite sequencing datasets requires efficient tools. The recently developed tool BSPAT (bisulfite pattern analysis) replaces multiple/pairwise sequence alignment with fast alignment of sequence reads. To support the exploration of DNA methylation mechanisms and regulation, BSPAT summarizes and visualizes DNA methylation co-occurrence patterns [327].
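Co-occurrence patterns of the kind BSPAT summarizes can be derived from aligned reads as in the Python sketch below: each read is reduced to a string with one symbol per CpG (1 = C, methylated; 0 = T, unmethylated; - = other or uncovered), and identical strings are counted. The sketch assumes bisulfite reads already aligned to the start of the reference region without gaps, and it is not BSPAT's implementation.

```python
from collections import Counter


def cpg_positions(reference):
    """0-based positions of the C in every CpG of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def read_pattern(read, positions):
    """Encode one gaplessly aligned bisulfite read as a methylation pattern."""
    symbols = []
    for pos in positions:
        base = read[pos] if pos < len(read) else None
        symbols.append("1" if base == "C" else "0" if base == "T" else "-")
    return "".join(symbols)


def co_occurrence_patterns(reference, reads):
    """Count identical patterns across reads, most frequent first."""
    positions = cpg_positions(reference)
    return Counter(read_pattern(r, positions) for r in reads).most_common()


ref = "ACGTTCGACG"
reads = ["ACGTTCGACG", "ATGTTTGATG", "ACGTTTGACG", "ACGTTCGACG"]
print(co_occurrence_patterns(ref, reads))
```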

Improving the cost, accessibility, and genome coverage of these approaches is important, especially for bisulfite-free methods with base-pair resolution, which now extend to single-molecule and single-cell analysis [328]. Large portions of the methylome can be addressed genome-wide by microarrays and next-generation sequencing technologies to generate base-resolution maps of 5mC and its oxidation derivatives in genomic samples [329, 330]. For this purpose, quantitative bisulfite-based approaches have been established, such as classical bisulfite sequencing and pyrosequencing [331,332,333].

Before PCR amplification, CpG methylation at single-base resolution can be determined by methylation-sensitive restriction endonucleases [332]. Affinity-based methods enrich methylated regions, but they cannot directly determine the exact methylated site. Moreover, the bisulfite method requires DNA denaturation and causes DNA degradation, which decreases its efficiency [334, 335]. PCR-induced mapping inefficiencies of bisulfite-treated DNA and bisulfite conversion rates must also be considered [311].
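A classic example of methylation-sensitive restriction analysis is the isoschizomer pair HpaII and MspI, which both recognize CCGG: HpaII is blocked when the internal CpG cytosine is methylated, whereas MspI cuts regardless of that methylation. Comparing the two digests therefore reports on methylation at each CCGG site. The Python sketch below encodes only this rule and ignores other factors that can affect cutting (for example, methylation of the outer cytosine).

```python
def ccgg_sites(seq):
    """0-based start positions of every CCGG recognition site."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def digest_profile(seq, methylated_c):
    """Predict HpaII and MspI cutting at each CCGG site.

    methylated_c: set of 0-based positions of methylated cytosines.
    The internal C of C-CG-G (site start + 1) is the CpG cytosine that
    blocks HpaII when methylated; MspI is treated as insensitive to it.
    """
    return {
        i: {"MspI": True, "HpaII": (i + 1) not in methylated_c}
        for i in ccgg_sites(seq)
    }


seq = "AACCGGTTCCGGAA"
print(digest_profile(seq, methylated_c={9}))
# {2: {'MspI': True, 'HpaII': True}, 8: {'MspI': True, 'HpaII': False}}
```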

Using bisulfite to convert unmethylated cytosines to uracil increases the complexity of library preparation and introduces biases from incomplete chemical conversion [25, 336]. Illumina-based sequencing is limited by short read lengths, which hinder the analysis of allele-specific methylation. PacBio long-read sequencing, on the other hand, lacks the high sequence coverage needed to detect methylated nucleotides. Oxford Nanopore sequencing, however, is emerging as the technology best suited to this situation [25].

Nanopore sequencing for epigenetic tumor heterogeneity

Nanopore sequencing advancing epigenetic mapping

DNA methylation is one of the most common epigenetic modifications and can be used in epigenetic mapping [337, 338]. Methylation also plays a vital role in mammalian gene expression [339, 340], including roles in cell development, aging, and the regulation of tumor suppressor genes [341,342,343]. However, most DNA sequencing technologies cannot differentiate methylated from unmethylated nucleotides in a DNA strand [25, 344]. The Oxford Nanopore MinION sequencer, by contrast, can sequence methylated regulatory marks without special sample preparation and with a long-read, single-molecule nature [345]. This feature makes MinION well suited to studying allele-specific methylation in heterogeneous cancer samples [25, 54, 346]. Identified limitations include signals that reflect several nucleotides at once, because about five nucleotides occupy the pore at a time, and overlapping currents from methylated and unmethylated bases [186]. These drawbacks have been addressed by base-calling software based on hidden Markov models (HMMs) [64, 347]. Because the corresponding current distributions differ visibly, the software can distinguish three cytosine variants (C, 5mC, and 5hmC) and two adenine variants (A and 6mA) [348,349,350]. Beyond HMM-based approaches, clear detection of DNA methylation by solid-state nanopore sensors constructed from two-dimensional (2D) graphene or molybdenum disulfide has further broadened the applicability of the process [351]. To detect an mC nucleotide as it passes through such a nanopore, the methylation site must also be labeled with an adaptor of methyl-CpG-binding domain protein (MBD1) (Fig. 19).
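A minimal sketch of how small current differences can be turned into a methylation call: given the event levels aligned to the k-mers overlapping a CpG, compare their likelihood under an unmethylated and a methylated Gaussian pore model and sum the per-event log-likelihood ratios. The model values below are hypothetical, and the event-to-k-mer alignment is assumed to be known already; in HMM-based tools that alignment is itself part of the inference.

```python
import math


def gaussian_logpdf(x, mu, sd):
    """Log-density of a normal distribution N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def methylation_llr(levels, unmethylated_model, methylated_model):
    """Sum over events of log P(level | methylated) - log P(level | unmethylated).

    Each model is a list of (mean_pA, sd_pA), one entry per event/k-mer.
    Positive values favour methylation, negative values favour unmethylated C.
    """
    return sum(
        gaussian_logpdf(x, *methylated_model[i]) - gaussian_logpdf(x, *unmethylated_model[i])
        for i, x in enumerate(levels)
    )


unmethylated = [(80.0, 2.0), (90.0, 2.0)]  # hypothetical C-containing k-mers
methylated = [(84.0, 2.0), (86.0, 2.0)]    # hypothetical 5mC-containing k-mers
print(methylation_llr([84.0, 86.0], unmethylated, methylated))  # about 4.0
print(methylation_llr([80.0, 90.0], unmethylated, methylated))  # about -4.0
```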

Fig. 19

Direct reading of DNA methylation by nanopore sequencing. The ionic current changes as single-stranded DNA passes through the pore, and the small current changes caused by methyl groups are interpreted by a new set of algorithms. Image reprinted from [25] with permission of the publisher (Request ID: 600061678, 27 Nov 2021)

For studying CpG methylation and chromatin accessibility on long DNA fragments, nanopore sequencing can cover difficult-to-sequence regions, enabling characterization of genomic elements such as repetitive elements [352, 353]. In the CCN1 gene, whose expression correlates with poor prognosis in colorectal cancer, methylation heterogeneity was observed in three enhancer regions, with the highest activity in Enhancer 3, which is responsible for CCN1 up-regulation. This heterogeneity could be deciphered only with long-read nanopore technology [346]. Using nanopore sequencing data, the most complete human methylome was produced by pairing long-read chromatin accessibility measurements (nanoNOMe) with CUT&RUN data [354, 355]. The hypomethylated region is extremely inaccessible and associated with CENP-A/B binding [354]. Long reads also made it possible to decipher allele-specific, long-range epigenetic patterns in complex macrosatellite arrays, such as those involved in X chromosome inactivation. These single-molecule measurements, with reads clustered by methylation status into epigenetically heterogeneous and homogeneous groups, provide a framework for investigating the most ambiguous regions of the human genome [354].

Combining the DNA bisulfite method with high-throughput sequencing technologies has extended methylation analysis to the whole genome rather than only to selected CpG sites and CpG islands [356, 357]. Genome-wide DNA methylation studies show differential methylation at genomic sites such as promoters, CGIs, and repetitive elements [358]. Such differential methylation gives rise to distinct clonal cell populations that create heterogeneity [359, 360]. Nanopore sequencing, the simplest method for identifying modifications, has had a positive impact on epigenetics and shows excellent reproducibility and correlation with bisulfite sequencing. It has been suggested that nanopore sequencing could become the gold standard for detecting methylation patterns. Whereas statistical methods for assessing differential methylation exist for short-read bisulfite sequencing, comparable methods are still lacking for long-read sequencing; such methods would also allow nanopore-detected modifications to be analyzed in haplotypes [77, 361].

The MethyQA software package addresses problems that arise when unmethylated cytosine is converted to U and then read as T in the bisulfite conversion technique [360, 362]. Methylation sequencing data from NGS technologies can have quality issues such as low per-base sequencing quality at the 3′ end, PCR amplification bias, and low bisulfite conversion rates, which this software helps alleviate [362, 363].
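One standard quality check for bisulfite data, shown here as an illustrative Python sketch rather than MethyQA's documented method, estimates the conversion rate from cytosines outside CpG context, on the assumption that these are essentially unmethylated (broadly true in many mammalian somatic cells, but not, for example, in neurons or embryonic stem cells). Any such C that is still read as C is counted as unconverted.

```python
def conversion_rate(reference, read):
    """Fraction of non-CpG reference cytosines read as T in a gaplessly
    aligned bisulfite read (assumes non-CpG C is unmethylated)."""
    converted = total = 0
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        in_cpg = reference[i:i + 2] == "CG"
        if ref_base == "C" and not in_cpg and read_base in ("C", "T"):
            total += 1
            converted += read_base == "T"
    return converted / total if total else float("nan")


# Non-CpG C at positions 0, 5, 6; position 6 escaped conversion.
print(conversion_rate("CACGTCCA", "TACGTTCA"))  # 0.666...
```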

Limitations in 5hmC detection have hindered assessment of its physiological functions and its role in demethylation pathways [364]. They have also limited in-depth characterization of 5hmC: its location and its roles in the regulation of transcription, replication, and epigenetic reprogramming [365]. Determining these functions therefore requires single-molecule DNA sequencing technologies, for which nanopore sequencing is best suited [365, 366].

Accuracy measurements for the detection of epigenetic modifications through nanopore sequencing

Of the methods discussed above, MinION nanopore sequencing combined with a hidden Markov model (HMM) has been reported to be capable of differentiating among all the modified cytosine bases [63, 347]. Building on the HMM, an HMM with a hierarchical Dirichlet process (HMM-HDP) has been developed, and the accuracy with which it detects modified bases in MinION sequencing has been measured (Fig. 20a–d) [64, 348, 367]. Ionic current measurements from low-throughput nanopore sensors have been used to discriminate among all five C5-cytosine variants [368]. In HMM-HDP, base modifications are detected as changes in the ionic current signal of the ONT MinION. MinION records the ionic current at a high sampling rate and divides it into segments called events. The model treats each event as arising from a nucleotide string of length k, called a k-mer [369]. Each k-mer is associated with a distribution of ionic current in picoamperes (pA). To measure detection accuracy, individual C, mC, and hmC bases in synthetic oligonucleotides were classified from changes in the ionic current signal. The separation between the current distributions of modified and unmodified bases is then measured to determine how well they can be discriminated (Fig. 20e–h). The model was also used to map 5mC in CC(A/T)GG motifs and 6mA in GATC motifs in E. coli genomic DNA [367].

Fig. 20

a–d Accuracy of MinION detection of cytosine methylation variants in synthetic oligonucleotides. Classification outputs are shown for 6,966 C, 294 5mC, and 467 5hmC strands sequenced on similar MinION flow cells. a Per-read accuracy distributions comparing normal distributions fitted by maximum-likelihood estimation (MLE) with HDP model distributions. Distributions are shown by triangles. b Three-way classification (C, mC, and hmC) across all sites for template and complement reads. c Confusion matrix showing the performance of HMM-HDP three-way cytosine classification on template reads of synthetic oligonucleotides. d Correlation between the log-odds of correct classification and the mean pairwise Hellinger distance between the methylation-status-specific distributions of the 6-mers overlapping a cytosine. e–h Ionic current distributions and the effect of read quality for the left (6mA in GATC) and right (5mC in CC(A/T)GG) motifs. The ionic current distributions differ between A and 6mA (e) and between C and 5mC (f). Ionic current levels from 100 alignments are shown as histograms for A and 6mA (g) and for C and 5mC (h); learned HDP probability densities are shown as curves. Image reprinted from [348] with permission of the publisher (Request ID 600061677, 27 Nov 2021)
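Panel d relates classification accuracy to the Hellinger distance between the current distributions of different methylation states: the further apart the distributions, the easier the call. As a sketch, if each state's k-mer current distribution is approximated by a normal distribution (an assumption; the HDP densities in the figure are non-parametric), the Hellinger distance has a closed form. The function names and example values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def hellinger_gaussian(mu1: float, sd1: float, mu2: float, sd2: float) -> float:
    """Hellinger distance between N(mu1, sd1^2) and N(mu2, sd2^2).

    Closed form:
        H^2 = 1 - sqrt(2*sd1*sd2 / (sd1^2 + sd2^2))
                  * exp(-(mu1 - mu2)^2 / (4 * (sd1^2 + sd2^2)))
    """
    var_sum = sd1 * sd1 + sd2 * sd2
    # Bhattacharyya coefficient of the two normal distributions.
    bc = math.sqrt(2 * sd1 * sd2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4 * var_sum)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def mean_pairwise_hellinger(states: Dict[str, Tuple[float, float]]) -> float:
    """Mean Hellinger distance over all pairs of states (e.g. C, mC, hmC)."""
    pairs = list(combinations(states.values(), 2))
    if not pairs:
        raise ValueError("need at least two states")
    total = sum(hellinger_gaussian(m1, s1, m2, s2) for (m1, s1), (m2, s2) in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical (mean, sd) in pA for one 6-mer under three cytosine states.
    print(mean_pairwise_hellinger({"C": (80.0, 1.5), "mC": (83.0, 1.5), "hmC": (78.0, 1.5)}))
```

The distance ranges from 0 (identical distributions) to 1 (no overlap), which makes it a convenient per-k-mer measure of how separable the states are.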

Single-cell tumor epigenetic mapping using nanopore sequencer

The field of single-cell epigenomics is in its infancy. However, given the increasingly recognized importance of intercellular heterogeneity in tumors and the rapid pace of technological development, the field is expected to make enormous progress over the next few years [370]. Single-cell epigenomics combines epigenetic profiling with single-cell isolation, cellular barcoding, and high-throughput sequencing of the isolated cell's genome [371]. Because most cancer cells carry epigenetically modified genes, simple, lower-cost methods to identify these modifications are essential [372]. With recently upgraded technologies, nanopore sequencing has become a simpler and preferred method for detecting the epigenetic modifications that occur in specific cancer types across various organs [373].
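Barcoding means that each read carries a short cell-specific tag that must be matched back to its cell of origin before per-cell epigenetic profiles can be built. A minimal sketch, assuming fixed-length barcodes at the very start of each read and a known whitelist, assigns a read to a barcode only if exactly one whitelist entry lies within a small Hamming distance. Real nanopore reads contain frequent insertions and deletions, so practical tools typically use edit distance and locate barcodes more flexibly; all names and parameters here are hypothetical.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, whitelist: List[str], max_mismatch: int = 1) -> Optional[str]:
    """Return the unique whitelist barcode matching the read prefix, else None."""
    length = len(whitelist[0])
    if len(read) < length:
        return None
    prefix = read[:length]
    hits = [bc for bc in whitelist if hamming(prefix, bc) <= max_mismatch]
    # No hit, or an ambiguous match to several barcodes: leave unassigned.
    return hits[0] if len(hits) == 1 else None


def demultiplex(
    reads: List[str], whitelist: List[str], max_mismatch: int = 1
) -> Dict[str, List[str]]:
    """Group reads by cell barcode, trimming the barcode from assigned reads."""
    if not whitelist or len({len(bc) for bc in whitelist}) != 1:
        raise ValueError("whitelist must contain barcodes of one fixed length")
    bins: Dict[str, List[str]] = {bc: [] for bc in whitelist}
    bins["unassigned"] = []
    for read in reads:
        bc = assign_barcode(read, whitelist, max_mismatch)
        if bc is None:
            bins["unassigned"].append(read)
        else:
            bins[bc].append(read[len(bc):])
    return bins


if __name__ == "__main__":
    result = demultiplex(["AAATGTAC", "ACCCTT", "ACGT"], ["AAAA", "CCCC", "GGGG"])
    print(result)
```

Requiring a unique match trades some lost reads for fewer reads being attributed to the wrong cell, which matters when the goal is cell-to-cell heterogeneity.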

Deletions, amplifications, inversions, and translocations of nucleotides in a DNA sequence are four DNA replication-related causes of gene mutations. Nanopore sequencing can detect tumor heterogeneity resulting from these changes, together with the alterations in epigenetic modifications that accompany them [391]. Additionally, nanopore sequencing has been highlighted as one of the primary areas of focus for next-generation approaches to understanding cancer heterogeneity [392].

Beyond well-recognized genetic alterations, tumor heterogeneity driven by epigenetic reprogramming gives rise to drug-resistant subpopulations of tumor cells [374]. This highlights the need for single-cell epigenetic technologies capable of tracking drug-induced tumor evolution so that treatment can be adjusted in time [293]. In hepatocellular carcinoma (HCC), nanopore sequencing of the modification status of tumor suppressor genes identified around 10 candidate tumor suppressor genes, together with the glucokinase gene, which was further validated as being involved in HCC development [375]. Nanopore sequencing also enables whole-genome sequencing with identification of epigenetic modifications in the lung cancer cell line LC2/ad [376], as well as detection of epigenetically modified genes in various cancer types (Table 5).

Table 5 Nanopore sequencing for epigenetic modification study of various cancer types

Main results in the epigenetics–cancer field enabled by nanopore technology

Nanopore sequencing is still in its infancy as a tool for cancer research, and applications in molecular cancer research are particularly scarce. So far, the technology has been used more extensively in fields such as plant science and microbiology. However, its use is gradually extending from cell lines to human samples [395,396,397]. Although the technology has overcome several obstacles, there is still room for improvement, and computational techniques for detecting whole-genome DNA alterations still need to be benchmarked [398]. That benchmarking work pointed to a pressing need for methods able to predict CpG methylation in multiple genomic contexts, particularly in genes involved in tumor heterogeneity and tumor suppression.
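Benchmarking of this kind commonly compares per-CpG methylation frequencies from nanopore calls with a bisulfite-sequencing reference, separately for each genomic context. The sketch below computes the Pearson correlation and root-mean-square error (RMSE) per context. The record format and context labels are hypothetical, and assigning each CpG to a context is assumed to have been done beforehand.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def benchmark_by_context(
    records: Iterable[Tuple[str, float, float]],
) -> Dict[str, Tuple[float, float, int]]:
    """Return {context: (Pearson r, RMSE, number of CpGs)}.

    Each record is (context, nanopore_frequency, bisulfite_frequency) for
    one CpG, with both frequencies in [0, 1].
    """
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for context, nano, bisulfite in records:
        groups[context][0].append(nano)
        groups[context][1].append(bisulfite)
    result = {}
    for context, (nano, bisulfite) in groups.items():
        rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(nano, bisulfite)) / len(nano))
        result[context] = (pearson(nano, bisulfite), rmse, len(nano))
    return result


if __name__ == "__main__":
    records = [
        ("island", 0.1, 0.1), ("island", 0.5, 0.5), ("island", 0.9, 0.9),
        ("shore", 0.2, 0.4), ("shore", 0.6, 0.4),
    ]
    print(benchmark_by_context(records))
```

Reporting metrics per context, rather than genome-wide, exposes contexts where a caller performs poorly even when its overall correlation looks good.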

Profiling epigenome patterns on individual copies of DNA segments has been a pioneering effort, and nanopore sequencing is still used alongside established standard methods. Nanopore sequencing determines these patterns and allows reads to be assigned to haplotypes, enabling chromosome-scale, allele-specific profiles of CpG methylation and chromatin accessibility (which reflect nucleosome positioning and DNA accessibility) in four human cell lines: GM12878, MCF-10A, MCF-7, and MDA-MB-231. The application of nanopore sequencing was then expanded to detect heterogeneity in breast cancer model tumors [346]. Because it can recognize and sequence nucleotides carrying even small modifications, nanopore sequencing is becoming a standard in the sequencing market [399]. Future complete methylation-mapping software, such as NanoMetPhase, is expected to provide an additional signal for detecting 5mC and 6mA. The software uses 2× coverage to identify DNA base methylation states that serve as reliable markers for more accurate detection of tumor heterogeneity [400]. These developments will support the parallel implementation of nanopore-based computational and experimental methods.
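Once reads have been assigned to haplotypes, allele-specific methylation can be summarized per CpG. The sketch below assumes each read already carries a haplotype label (1, 2, or None for unphased reads) and binary per-CpG methylation calls; it reports CpGs whose methylation frequency differs between the two haplotypes. The function name, data layout, and the `min_cov` and `min_diff` thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Call = Tuple[int, bool]  # (CpG position, methylated?)


def allele_specific_methylation(
    reads: Iterable[Tuple[Optional[int], List[Call]]],
    min_cov: int = 2,
    min_diff: float = 0.5,
) -> Dict[int, Tuple[float, float]]:
    """Return {position: (freq_hp1, freq_hp2)} for differentially methylated CpGs."""
    # position -> haplotype -> [methylated count, total count]
    counts: Dict[int, Dict[int, List[int]]] = defaultdict(lambda: {1: [0, 0], 2: [0, 0]})
    for haplotype, calls in reads:
        if haplotype not in (1, 2):
            continue  # unphased reads cannot be assigned to an allele
        for pos, methylated in calls:
            tally = counts[pos][haplotype]
            tally[0] += int(methylated)
            tally[1] += 1
    result = {}
    for pos, by_hp in sorted(counts.items()):
        (m1, n1), (m2, n2) = by_hp[1], by_hp[2]
        if n1 < min_cov or n2 < min_cov:
            continue  # too few reads on one allele to compare
        f1, f2 = m1 / n1, m2 / n2
        if abs(f1 - f2) >= min_diff:
            result[pos] = (f1, f2)
    return result


if __name__ == "__main__":
    reads = [
        (1, [(100, True), (200, True)]),
        (1, [(100, True), (200, False)]),
        (2, [(100, False), (200, True)]),
        (2, [(100, False), (200, False)]),
        (None, [(100, True)]),
    ]
    print(allele_specific_methylation(reads))  # {100: (1.0, 0.0)}
```

Because a single long read spans both phasing variants and many CpGs, the haplotype label can be applied to every methylation call on that read.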

Oxford Nanopore sequencing can be used for whole-genome sequencing to identify insertions, deletions, inversions, and intrachromosomal translocations in liver cancer. Because the instrument sequences the genome and epigenome in parallel, the same data can then be used for epigenome analysis to characterize the complex heterogeneity and variation of tumor cells [401]. The potential of this pocket-sized device was demonstrated by low-pass whole-genome sequencing that captured the genome and epigenome simultaneously, generating diagnostic copy number (CN) and methylation profiles from the same sequencing run on the same day. This marks the beginning of the broader use of nanopores for important molecular classifications in cancer, supporting better diagnosis, prognosis, and treatment decisions in the clinic [75].

Another study found that nanopore Cas9-targeted sequencing (nCATS) is more effective at detecting isocitrate dehydrogenase 1 and 2 (IDH1, IDH2) mutations and O6-methylguanine-DNA methyltransferase (MGMT) methylation status in diffuse glioma, delivering results within 36 h [402]. The combination of Cas9-based targeting and library preparation appears to be the most effective coupling currently available, and it could aid in identifying single-nucleotide variants (SNVs), structural variants (SVs), and CpG methylation [403]. To enable long-range amplification and nanopore sequencing, the Cas9-assisted targeting of chromosomal segments (CATCH) method has been used to isolate the body and flanking regions of the BRCA1 breast cancer gene from peripheral blood cells. It is reasonable to expect that this technology will eventually be available in medical offices and even in patients' pockets [404]. Because CpG methylation controls the transposable elements (TEs) involved in the evolution of tumor growth, it is crucial to sequence the epigenome of tumor-specific LINE-1 insertions and their retrotransposon signatures [405].
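Copy number profiles from low-pass whole-genome data are commonly derived by counting reads in fixed-size genomic bins and comparing each bin with a typical bin. The sketch below does this for a single chromosome, normalizing by the median bin count. It assumes a diploid genome and a pure tumor sample, and it omits the GC-content and mappability corrections that real pipelines apply; the function name and parameters are hypothetical.

```python
import math
from statistics import median
from typing import Iterable, List, Tuple


def copy_number_profile(
    read_starts: Iterable[int], chrom_length: int, bin_size: int, ploidy: int = 2
) -> Tuple[List[float], List[float]]:
    """Return (log2 ratio per bin, estimated copy number per bin).

    Assumes the median bin reflects the baseline `ploidy` and that the
    sample contains no admixed normal cells.
    """
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    log2_ratios = [math.log2(c / baseline) if c > 0 else float("-inf") for c in counts]
    copy_numbers = [ploidy * c / baseline for c in counts]
    return log2_ratios, copy_numbers


if __name__ == "__main__":
    starts = [10, 20, 110, 120, 210, 220, 230, 240, 310, 320]
    ratios, cn = copy_number_profile(starts, chrom_length=400, bin_size=100)
    print(ratios)  # [0.0, 0.0, 1.0, 0.0]
    print(cn)      # [2.0, 2.0, 4.0, 2.0]
```

Low-pass coverage suffices here because copy number changes affect many consecutive bins, so counts aggregated over large bins remain informative even when individual positions are barely covered.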

Nanopore whole-genome sequencing for intraoperative neuropathological classification has been found to improve the accuracy of practical intraoperative diagnosis, thereby informing surgical decisions [406]. Epigenomic tumor signatures accumulated from earlier whole-genome analyses using chemical methods and Illumina sequencing now provide a foundation on which nanopore sequencing can build in the near future.
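Low-pass nanopore runs cover only a sparse, essentially random subset of CpGs, so an intraoperative classifier has to work with whichever sites the reads happen to cover. A deliberately simple sketch, assuming binary per-CpG calls for the sample and reference tumor classes summarized as mean methylation levels (beta values), scores each class by the average Bernoulli log-likelihood of the observed calls at shared CpGs. This is not the classifier used in the cited study; the function name, class names, CpG identifiers, and values are hypothetical.

```python
import math
from typing import Dict, Tuple


def classify_sparse_sample(
    sample_calls: Dict[str, int],
    references: Dict[str, Dict[str, float]],
    eps: float = 0.01,
    min_overlap: int = 1,
) -> Tuple[str, Dict[str, float]]:
    """Return the best reference class and the mean log-likelihood per class.

    sample_calls maps CpG id -> 0/1 call; references maps class -> {CpG id: beta}.
    """
    scores: Dict[str, float] = {}
    for cls, betas in references.items():
        shared = [cpg for cpg in sample_calls if cpg in betas]
        if len(shared) < min_overlap:
            continue  # too few overlapping CpGs to score this class
        total = 0.0
        for cpg in shared:
            # Clip beta away from 0 and 1 so that log() stays finite.
            p = min(max(betas[cpg], eps), 1 - eps)
            total += math.log(p) if sample_calls[cpg] else math.log(1 - p)
        scores[cls] = total / len(shared)
    if not scores:
        raise ValueError("no reference class shares enough CpGs with the sample")
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    refs = {
        "class_A": {"cg1": 0.9, "cg2": 0.9, "cg3": 0.1},
        "class_B": {"cg1": 0.1, "cg2": 0.2, "cg3": 0.8},
    }
    sample = {"cg1": 1, "cg2": 1, "cg3": 0, "cg9": 1}
    print(classify_sparse_sample(sample, refs)[0])  # class_A
```

Averaging rather than summing the log-likelihood keeps classes comparable when they share different numbers of CpGs with the sample.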

For high-level identification of epigenetic heterogeneity in cancer, nanopore sequencing is increasingly being combined with nanostructured components and materials such as glass nanopipettes, nanostraws, carbon nanotube probes, and other nanomaterials [381]. By creating channels between the intracellular and extracellular sides of the cell membrane, these nanocomponents enable single-cell sampling and thereby facilitate sequencing [382]. Like these nanocomponents, bisulfite sequencing applied at the single-cell level addresses inter- and intratumor heterogeneity, although it is accompanied by significant DNA degradation [382]. To accurately identify gene heterogeneity for future cancer treatments, further research on combining nanopore sequencing and nanostructured components with bisulfite or direct sequencing is therefore recommended.

Conclusion and future perspective

Epigenetic modification is a significant regulator of gene expression and warrants thorough sequencing. The multi-omics-based medicine of the future will not be complete without epigenetic sequencing, particularly in cancer biology. Furthermore, research and individualized, evidence-based medical services would benefit from using epigenetic marks as diagnostic biomarkers and as pharmaceutical targets. Because epigenetics influences cancer heterogeneity, epigenetic sequencing is crucial. Conventional methods have been used for sequencing until now, but nanopore sequencing is expected to become a more specialized method in the future. Earlier research indicates that the Oxford Nanopore sequencer is the best method for advancing both genomic and epigenomic sequencing and that it offers advantages over rival technologies for representing epigenetics in the multi-omics space. Moreover, Oxford Nanopore Technologies' devices, which permit direct sequencing without extensive reagents, are better suited than any other sequencing device for exploring the roles of epigenetics in cancer heterogeneity.

In the multi-omics age, Oxford Nanopore sequencing will be highly effective at covering both epigenetics and genomics. It is a rapidly developing method that is fiercely challenging Illumina's sequencing technology, and because of its smaller size and lower price, it is predicted to overtake Illumina sequencing. Ultimately, a single nanopore sequencing platform may be able to perform epigenomics, genomics, transcriptomics, and proteomics.

Finally, future cancer medicine studies will need to consider incorporating different nano-biomaterials with nanopore sequencing technologies to detect epigenetic changes in cancer more accurately. Beyond incorporating biomaterials, such nano-combined sequencing procedures must also take clinical viability and delivery mechanisms into account.