DNA sequencing with nanopores from an ab initio perspective
Advances in materials research mean that we find ourselves on the verge of constructing nano-scale devices capable of electrically addressing individual molecules in order to identify or utilize their electrical or electromechanical properties. An important application in the life sciences would be the electromechanical translocation of a DNA molecule through a nanopore, between nano-scale electrodes, allowing the base sequence (genome) to be read out electrically. This approach promises to drastically lower the cost per genome, allowing for extensive application in medical diagnostics. Owing to the extremely small dimensions involved, which require nanometer resolution in fabrication, atomistic modeling plays a crucial role in testing hypothetical device architectures for their performance in nucleobase distinction. First-principles simulations are ideally suited to explore the interactions involved in such scenarios and lay the foundation for electronic transport calculations. This role of computations is even more important here, since it is not experimentally possible to observe directly the kinetics occurring during translocation of a DNA molecule through a nanopore. Here, we provide a brief review of the state of the field, focusing on ab initio studies of nanopore-based DNA sequencing, in particular on the promising recent development regarding graphene nanopores and nanogaps.
Experimental challenges: theoretical chances
The fabrication of a nano-scale device to rapidly identify the entire base sequence in a genome can be regarded as one of the most challenging tasks for materials science today. One of the main difficulties lies in the formation and alignment of electrodes that are sufficiently sharp to couple to the nucleotide units of a DNA molecule which is guided through a tiny opening in a membrane, a so-called nanopore. To appreciate the challenge involved, it should be kept in mind that, e.g., the human genome contains three billion base pairs, and that adjacent nucleotides in either one of the two strands that make up the double helix are separated by less than one nanometer (about 3.4 Å in double-stranded DNA; a larger separation in the more flexible single-stranded DNA).
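To make the scale of the task concrete, the two numbers quoted above can be multiplied out. The short sketch below only uses the three billion base pairs and the 3.4 Å spacing from the text; the variable names are our own, and treating the genome as one straight double-stranded molecule is a simplifying assumption.

```python
# Back-of-the-envelope estimate using only the figures quoted in the text.
BASE_PAIRS = 3_000_000_000   # size of the human genome in base pairs
RISE_NM = 0.34               # spacing of adjacent base pairs in dsDNA (3.4 Å)

# Assumption: the genome is treated as one straight double-stranded molecule.
contour_length_m = BASE_PAIRS * RISE_NM * 1e-9
print(f"Total contour length: {contour_length_m:.2f} m")
```

In other words, a read-out device has to resolve features a few ångström apart along a molecule that is roughly one meter long in total.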
Significant research efforts have been undertaken in the past decade to realize this ambitious goal, but despite appreciable progress, no group can yet claim to have electrically sequenced DNA in the envisioned fashion. While the initial device design was built around nanopores in an insulating material such as silicon nitride (Si3N4) with embedded gold electrodes, the focus has shifted very recently to the material at the center of the 2010 Nobel Prize in Physics: graphene [2, 3, 4, 5, 6]. Indeed, its one-atom thickness combined with its electrical properties could make graphene an ideal system to scan one nucleobase at a time.
Owing to the extreme challenges faced in fabricating a setup capable of electrically sequencing DNA, computational materials theory in the form of first-principles calculations fills an important need in simulating possible device architectures and evaluating their potential performance as nucleotide sensors. On the one hand, the dimensions of the system (typically several nanometers) allow the core parts to be treated entirely ab initio; on the other hand, a wide range of computational methods is required to cover all relevant aspects (dynamics, electronic structure, transport). This review aims to summarize and discuss the role of a few selected first-principles investigations in the field of nanopore-based DNA sequencing. To present those efforts in the proper frame, we will first briefly introduce the benefits which a novel approach to whole-genome analysis could offer and discuss some of the landmark experiments carried out in this direction. The remainder of our review is then dedicated to the first-principles evaluation of nanopore-DNA systems, focusing on the methods used and on selected results. We conclude with a summary and a short outlook on what the future in this fast-moving field may hold.
The genetic code
All known life runs on complex algorithms encoded in its respective genome. Since the discovery of the DNA structure, much has been found out about the significance of certain base sequences; however, vastly more remains to be understood before we can claim a true insight into the source code that runs us and other bio-machines.
What is holding us back from reaching a deeper understanding? To a large extent, it is the cost associated with determining the entire DNA sequence of one individual (human or otherwise). An orders-of-magnitude lower cost would mean that large-scale studies with many participants could be conducted in which the base order from the whole genome could be compared via data mining with disease histories and with lifestyle circumstances and choices. Such studies could reveal close links between certain sequence patterns (including mutations) and the risk of falling ill with a particular disease. Preventive countermeasures could be taken early on to avoid the outbreak of the disease, or, if a patient has already become ill, personalized medicine could be applied that would be tailored to the specific genome to minimize negative side effects and maximize the intended effect of the drug. Staying with the bio-machine analogy, it becomes clear that any error (bug) in the code can lead to dramatic negative consequences in the long run.
For that reason, scientists are actively looking for new ways to sequence DNA faster and for less money than is currently possible. State-of-the-art sequencing machines have become incredibly fast and the associated price tag for a whole genome has dropped considerably over the years. Still, the cost is too high to allow for large-scale studies involving whole-genome analysis of many individuals. Likewise, a reduction in the cost per DNA sequence would greatly benefit studies that aim to deconstruct the genomes of admixed populations, because researchers in the future will need to analyze many thousands of genomes to detect weak natural selection signals in the patterns of single-nucleotide polymorphisms, or SNPs.
Alternative and cheaper methods to sequence a genome center around pulling single-stranded DNA through a nanopore. The potential benefits of this approach are manifold. For example, nanopore-based sequencing is truly a single-molecule technique, meaning that one can in principle sequence an individual DNA molecule, without any need for amplification. Furthermore, it allows for simple DNA library preparation. Finally, and perhaps most importantly, the long read length possible in nanopore-based DNA sequencing (as compared to shorter read lengths in traditional methods) means a tremendous reduction in the bioinformatic effort of assembling the genome.
Sequencing with nanopores: the principle
The remaining challenge is then to design a read-out mechanism that can identify the nucleobases as they pass through the nanopore (just like the magnetic read-head does in a tape player device, to use this analogy one more time). Here, different proposals have been made: to measure the ionic current blockage, or to embed electrodes inside the nanopore to measure the transverse tunneling current. In this focused review, we restrict ourselves to the second proposal, referring the reader to some very comprehensive review articles [15, 16, 17, 18] which have already discussed the ionic current approach.
Two seminal experiments ought to be emphasized here. The first one, by the group of Kawai, provided important experimental evidence that DNA sequencing through measurements of electron transport may indeed be possible. They reported the electrical detection of single nucleotides using two configurable nanoelectrodes and demonstrated that electron transport through individual nucleotides occurs via tunneling. They statistically identified the different nucleotides based on their characteristic electrical conductivity. The second one, by the group of Lindsay, made an equally important experimental demonstration. In their measurements, nucleosides diffused through a 2 nm electron-tunneling junction. This resulted in short current spikes (<1 ms) exhibiting a broad distribution of maximum current values. Very interestingly, when one of the electrodes is functionalized with a molecule that can trap nucleosides via H-bonds, this current distribution narrows by an order of magnitude. When the second electrode is functionalized as well, the contact resistance to the nucleosides is reduced, and one can identify them via their peak currents. The order of the peak currents (highest for A, followed by C, G, and finally lowest for T) has been successfully predicted by first-principles calculation.
The work on nanopore-embedded electrodes concentrated mainly on gold electrodes [21, 22]. A main challenge with nanopore-embedded electrodes is, however, to make them sharp enough to couple to only one base at a time. This requirement for single-nucleobase resolution appears to be mandatory to determine the sequence. With gold, it is difficult to imagine fabricating sufficiently sharp electrodes (and embedding them in the nanopore) for this purpose. Therefore, it was a tremendous step forward when at the beginning of 2010,¹ Postma suggested utilizing graphene nano-electrodes for DNA sequencing. Graphene has the obvious advantage of being merely one atom thick, thus offering the best possibility to electrically couple to only one base at a time. Postma’s idea was centered on employing graphene in a double function: as the separating membrane and (by fabrication of a narrow gap) as the electrodes, thus avoiding the difficulties of embedding and alignment.
Merely half a year after the proposal of using graphene for DNA sequencing, three papers by three research teams appeared [24, 25, 26], independently demonstrating an important proof-of-principle, namely that DNA can be translocated through nanopores fabricated in graphene (the fabrication of nanopores in graphene having been previously mastered [27, 28]). Very interestingly, it was later also demonstrated that DNA can be translocated through stacked layers of graphene and Al2O3, apparently resulting in a better signal-to-noise ratio.
Molecular electronics sensor device: design, advantages, and operational principle
Measuring the electric current through a molecule or nanowire provides significant information about its structure, state and environment, which can be used in molecular sensor applications. Interchanging target molecules between the same pair of electrodes provides information about the differences in molecular structure, enabling differentiation between molecular species. This is the case in the nanopore DNA sequencing setup studied by our group, in which a DNA strand, guided by the nanopore, is driven between two electrodes, and the distinction between nucleotides (the four DNA bases) is made on the basis of the current measurement. In another setup, the target molecule is adsorbed on a pre-existing probe (molecular or nanowire) connecting two electrodes [37, 38, 39, 40, 41]. Here, the effect of adsorption on the structure must be large enough to produce a measurable change in the current, and the probe has to be designed in such a way that only specific targets can bind.
DNA sequencing in a molecular electronics setup has, however, specific challenges to be solved. In the idealized situation, where a single DNA base could be imagined weakly connected between a pair of electrodes, the expected IV curve at low temperature is a step function, with no current flowing at low bias and a steep current onset at the bias at which the nearer of the HOMO and LUMO levels enters the voltage window. Such a setup can be thought of as an idealized model for an STM scan of a ssDNA molecule stretched out on a conductive surface. The distinction between different bases could be made by measuring the position of each current onset; in other words, a scan over a voltage range is required for each base. This is in conflict with the desire to make a single measurement per base while the DNA strand is sliding between the contacts and through the nanopore. Surprisingly, in the more realistic model this difficulty is somewhat relaxed.
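The idealized step-like IV curve described above can be illustrated with a minimal single-level sketch. It is not taken from any of the works reviewed here: we assume one molecular level at energy `level_ev` relative to the Fermi level, a Lorentzian transmission of width `gamma_ev`, zero temperature, and a bias that drops symmetrically over both contacts. The numerical values are hypothetical placeholders and not properties of any nucleobase.

```python
import math

G0_SIEMENS = 7.748091729e-5  # conductance quantum 2e^2/h

def current(bias_v, level_ev, gamma_ev):
    """Zero-temperature Landauer current (in A) through one Lorentzian level.

    Assumes symmetric coupling and a symmetric bias drop, so the bias
    window spans [-bias_v/2, +bias_v/2] around the Fermi level (0 eV).
    """
    half_width = gamma_ev / 2.0
    upper = math.atan((bias_v / 2.0 - level_ev) / half_width)
    lower = math.atan((-bias_v / 2.0 - level_ev) / half_width)
    # Closed-form integral of the Lorentzian transmission over the window.
    return G0_SIEMENS * half_width * (upper - lower)

# Hypothetical level 1.0 eV above the Fermi level, weakly coupled (10 meV).
LEVEL_EV, GAMMA_EV = 1.0, 0.01
for bias in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"V = {bias:.1f} V  I = {current(bias, LEVEL_EV, GAMMA_EV):.3e} A")
```

Under these assumptions the current stays small until the level enters the bias window at a bias of twice `level_ev` (in volts) and then saturates, which is the step discussed above; a larger `gamma_ev` smooths the step.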
Introducing functional groups helps to differentiate between different bases in the sequence, and is also expected to decrease current fluctuations by stabilizing the molecular geometry. However, introducing the additional molecular unit decreases the tunneling current and requires a more sensitive measurement setup. With a more elaborate setup, it is expected to be possible to directly measure the position of the molecular states via the current onset, through a differential conductance measurement. This method has additional advantages over the current (conductance) measurement: the moment when the base enters (or exits) the pore is seen in the differential conductance spectrum, thus allowing for the use of thicker electrodes. Also, the close-lying LUMO states of adenine and guanine can be avoided in favor of states lying further away from the Fermi level. Interestingly, the distortion of the target base by the functionalizing groups or molecules works in about the same way for the differential conductance measurements as for the direct current/conductance measurement. The characteristic molecular orbitals can be shifted to very similar, non-distinguishable energies and still yield very different differential conductance values, ranging from negative differential conductance, through very low absolute values due to very smooth conduction peaks, to a steep current onset.
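How a differential conductance scan can locate a molecular state may be sketched with the simplest possible model. We assume, purely for illustration, a single level with Lorentzian transmission, zero temperature, and a symmetric bias drop, in which case dI/dV = (G0/2)[T(+V/2) + T(−V/2)]. The function names and the two level energies are hypothetical and do not correspond to actual nucleobases.

```python
G0_SIEMENS = 7.748091729e-5  # conductance quantum 2e^2/h

def transmission(energy_ev, level_ev, gamma_ev):
    """Lorentzian transmission of a single, symmetrically coupled level."""
    half_width = gamma_ev / 2.0
    return half_width**2 / ((energy_ev - level_ev) ** 2 + half_width**2)

def differential_conductance(bias_v, level_ev, gamma_ev):
    """Zero-temperature dI/dV (in S) for a symmetric bias drop."""
    return 0.5 * G0_SIEMENS * (
        transmission(bias_v / 2.0, level_ev, gamma_ev)
        + transmission(-bias_v / 2.0, level_ev, gamma_ev)
    )

def peak_bias(level_ev, gamma_ev, v_max=4.0, steps=4000):
    """Scan 0..v_max and return the bias with the largest dI/dV."""
    grid = [v_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda v: differential_conductance(v, level_ev, gamma_ev))

# Two hypothetical targets whose levels differ by 0.2 eV.
for name, level_ev in (("target 1", 1.0), ("target 2", 1.2)):
    v_peak = peak_bias(level_ev, gamma_ev=0.05)
    print(f"{name}: dI/dV peak near {v_peak:.3f} V, level near {v_peak / 2:.3f} eV")
```

In this toy picture the peak position, rather than the magnitude of the current, carries the information about the level, which is why states further from the Fermi level can be chosen when two targets have close-lying levels.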
The advantage of these setups lies not only in the subnanometer size of the probe or in its sensitivity, although the ability to scan a single DNA molecule for its nucleotide sequence is already remarkable. The advantage over traditional chemical analysis is the possibility to monitor the signal (current measurement) from the probe continuously and in real time. This should be compared to a set of manipulations (often complex, but possibly automated, as for traditional DNA sequencing) which yield a spectrum of the target, allowing for determination of the target structure, but only for the moment of the actual measurement. Often the differentiation between two targets relies on monitoring a color change, which is a much more difficult task than monitoring an electric current.
Developing molecular electronics (ME) for sensing opens vast perspectives for the real-time monitoring of processes in vanishingly small volumes, including, e.g., living cells. Considering the micrometer size of the cells and the nanometer size of the probe, it provides an interesting alternative to existing methods for in vivo monitoring. Targets for pH sensors include the plasma around growing cancer cells and blood acidity for the diagnosis of diabetes. ME sensors would provide instantaneous readout and eliminate the fluorescent probe molecules used traditionally, which can potentially coordinate to proteins in the cytoplasm and thus give a false readout. Monitoring the environment (seawater acidity) is another possible application, where both logistics (considering the required size of the probes) and accuracy could be improved.
DNA sequencing with graphene
In one of the above sections, we discussed recent experiments which led to the demonstration that DNA molecules can be pulled through nanopores fabricated in graphene. Although there is no doubt that translocation occurs, the translocation events of DNA can only be identified indirectly via their temporary blockage of the ionic current through the nanopore. In order to understand the translocation dynamics at the atomic level and use the knowledge thus gained for guiding the design of DNA sequencing devices based on graphene nanopores, molecular dynamics (MD) simulations are extremely useful. In a work by the research group of Schulten, such an investigation was carried out in which the effects of applied voltage (4.3, 2.5, and 0.8 V), DNA conformation (in the form of partially folded DNA) as well as positive or negative pore charge (+0.1 e or −0.1 e, respectively, distributed equally over the edge carbon atoms) on the kinetics of DNA translocation through the 2.4-nm wide graphene nanopore were studied. A very interesting feature observed in the simulations was that of significant graphene membrane fluctuations, which were seen to be of the same magnitude as the nanopore thickness. The spatial resolution is obviously negatively affected by these fluctuations, which are caused by collisions of graphene with K+ and Cl− ions (fluctuations become larger when the ionic concentration is increased) and with water molecules (fluctuations become larger when the temperature is increased). Although the authors suggest that it may be possible to use graphene nanopores for distinguishing between A–T and G–C base pairs in double-stranded DNA, it should be noted that the simulations on which such conclusions are based considered polymers of 20 repeated A–T base pairs and 20 repeated G–C base pairs, respectively.
While such homogeneous polymer model systems are quite common in MD simulations of DNA translocation through nanopores, the results do not imply that the sequence of base pairs in a heterogeneous double-stranded DNA molecule could be deduced in this manner from the ionic current blockage alone. In any case, the physical effect behind different blockage currents appears to be that for certain bias voltages, poly-AT stretches (or tilts) more than poly-GC, thus permitting a larger ionic current in the former case.
It is also typical for MD simulations that the bias voltage is much larger than what is applied in experiments (here, e.g., 0.8 V, the smallest considered voltage, was still 4–8 times larger than what is commonly found in experiments [24, 25]). More realistic voltage values would lead to such a slow DNA translocation that the required MD simulation time for the translocation of a 45-base-pair-long DNA molecule would be beyond what is typically computationally affordable (translocation of three base pairs was in fact simulated and took 50 ns). In the simulations, it was clearly observable (despite significant signal noise, which requires averaging) that the ionic current is reduced during the time that DNA translocates through the graphene nanopore and, once DNA has passed completely through, returns to its open-pore value. The blockage by DNA is found to be more effective (ionic current reduced by over 50 % compared to its open-pore value) for lower bias voltage, presumably because DNA is less stretched than for higher bias voltages, and thus covers more cross-sectional area of the pore.
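The way a blockage event is extracted from a noisy ionic current trace by averaging can be sketched as follows. This is a generic illustration and not the analysis used in the simulations discussed above: the trace is synthetic, all numerical values (open-pore level, blocked level, noise amplitude, window length) are hypothetical, and a centered moving average followed by a midpoint threshold is only one of several possible choices.

```python
import random

def moving_average(trace, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        smoothed.append(sum(trace[lo:hi]) / (hi - lo))
    return smoothed

def find_blockades(smoothed, threshold):
    """Return (start, end) index pairs where the signal is below threshold."""
    events, start = [], None
    for i, value in enumerate(smoothed):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(smoothed)))
    return events

# Synthetic trace in units of the open-pore current (all values hypothetical).
rng = random.Random(0)
OPEN, BLOCKED, NOISE = 1.0, 0.5, 0.05
trace = [
    (BLOCKED if 100 <= i < 200 else OPEN) + rng.gauss(0.0, NOISE)
    for i in range(300)
]

smoothed = moving_average(trace, window=15)
events = find_blockades(smoothed, threshold=0.5 * (OPEN + BLOCKED))
print(events)
```

The window length trades noise suppression against time resolution: a longer window gives a cleaner plateau but blurs the entry and exit of the molecule.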
When partially folded DNA is translocated through the pore, some evidence for a double plateau in the ionic current can be observed from the simulations, in agreement with experimental observation and interpretation, namely that the folded part of DNA, due to its larger diameter, blocks proportionally more ions. Because of the naturally occurring negative charges on the DNA backbone, an equally negatively charged pore leads to an effectively reduced pore diameter by way of Coulomb repulsion. This leads to a stretching of the DNA which allows more K+ ions to move through the pore, but since they move in the direction opposite to that of the negatively charged DNA, this creates a drag, effectively slowing down the DNA translocation. This is a very important finding, since control over translocation speed is a major goal in nanopore-based sequencing, and the findings suggest that charging of the pore could be a feasible approach.
One shortcoming of the study, which the authors readily admit, is that the graphene membrane was not allowed to undergo polarization. Polarization is, however, a likely consequence of the effect of the charged DNA molecule on the delocalized electrons in graphene. Future studies should indeed consider this effect, as it can affect the force exerted on the DNA and (even more importantly for the purpose of DNA sequencing) potentially be used as another signal source to distinguish between different nucleotides.
Prezhdo and co-workers studied the sensitivity of electrical conductance toward different nucleobases when DNA translocates through the nanopore in a graphene nanoribbon. This study represents probably the first attempt to look into the interaction of DNA with a graphene nanopore from first principles. In contrast to the previously discussed molecular dynamics study, this ab initio study only considered individual nucleobases inside the nanopore of a graphene nanoribbon. Such an approach is quite typical for first-principles investigations due to the higher computational cost associated with such studies.
It is also quite typical that the calculations omitted counterions and water molecules. While absolutely essential in MD simulations to properly treat the kinetics of the system, for ab initio calculations in which no dynamics are considered, the purely electronic effects can apparently be described without including the solvent environment.
The authors identified two criteria for DNA sequencing to be achievable: first, the conductance spectra G(V) of individual nucleobases need to be sufficiently different, and second, the conductance of a nucleobase inside the pore should not significantly depend on its orientation, since any orientation dependence of the signal would drown out the latter in the resulting noise.
The authors claim that the conductance does not change much when nucleobases are reoriented in a way that simulates extremely different configurations that could randomly be assumed during the dynamics of DNA translocation. This could be related to the assumption made there that the transmission probability is fixed to one.
The paper by Saha et al. looked again at the conductance change due to different nucleobases inside a nanopore in a metallic graphene nanoribbon with zigzag edges, but furthermore tested different possibilities for the termination of the dangling bonds in the pore. In addition to hydrogen, nitrogen was also considered. Since the local current density is concentrated around the edges of the zigzag graphene nanoribbon, the presence of the nanopore does not lower the conductance significantly, which leads to larger currents (microampere for a bias voltage of around 0.1 V) than what is usually reported in comparable devices (and hence a better signal-to-noise ratio). The detection mechanism of a nucleobase then involves altering the charge density in the area around the nanopore, thus affecting the edge conduction currents.
Finally, a noteworthy, original and quite different idea comes from the research group of Kim, namely that of utilizing a graphene nanoribbon without any nanopore. The detection would again occur via changes in the conductance, but here due to contact made between the nucleobases and the graphene nanoribbon (which possesses distinct conductance characteristics), as the former are pulled through a nanochannel underneath the latter (like a bridge over a canal). Processing the signal with a special data-mining technique results in a distinction of the four different nucleobases.
Conclusions and outlook
What, then, are the main challenges for nanopore-based DNA sequencing? Mainly two: first, to control the translocation of DNA, in particular slowing it down sufficiently to provide enough time for a proper electrical read-out of each base; and second, to improve the electrode setup in such a manner that the distinction between the different nucleobase types becomes more robust. The major issue for this latter aspect is the signal noise caused by more or less random orientations and fluctuations of the nucleotides between the electrodes.
What can be done to meet those challenges and overcome them, and how can computational theory contribute to any progress? Simulations can provide important insights on how gating could lead to a better control over DNA translocation [47, 48, 49]. Multi-layered structures of graphene and insulating materials could play a very important role in this regard. In this sense, nanopore-embedded electrodes could act as an electronic ratchet, leading to a stop-and-go motion of DNA. Such a ratchet function is also readily achieved by certain enzymes.
Different atoms or functional groups at the pore edges can play a crucial role for the electrical properties as well as for the dynamic properties. In this sense, functionalization of the graphene edges could be an important tool to favorably influence the sequencing capability, if it could be implemented in a sufficiently straightforward manner.
In the end, what can be said about the role of simulations and first-principles calculations for explaining experimental observations in nanopore-based DNA sequencing? Of course, it is difficult to give a general answer. But, in certain cases, theory was able to correctly predict (or explain) features of the experiment, such as the order of the peak currents in the above-discussed measurement by the group of Lindsay. We most definitely learned something from the molecular dynamics simulations, e.g., how the temporary formation of weak H-bonds on edge-hydrogenated graphene electrodes can contribute to a tighter signal distribution and overall higher conductance. This suggests that such simple functionalization can be a very powerful means of improving the sequencing process. When it comes to designing the simulation, it is important not to leave out important features of the setup. The direct effect of the solvent on the conductance is perhaps still not fully understood, but it is certainly necessary to sample over the different possible configurations that the nucleotide under investigation can assume relative to the electrodes. For that purpose, molecular dynamics simulations are ideally suited, and of course there, the solvent is always taken into account.
As for the future, it remains to be seen whether graphene nanopores and graphene-based electrodes are indeed the best choice for nanopore-based DNA sequencing. The current trend is pointing in this direction and we will likely see a lot more exploration of different designs of graphene electrodes that could accomplish improved nucleotide sensing and distinction. Although it may seem impossible at the moment, one should not give up research in this direction, since, to paraphrase Schneider and Dekker when they commented on a recent success in sequencing with biological pores, “DNA sequencing using graphene nanopores is currently still science fiction, but so was sequencing with biological pores two decades ago”. Our expectation is that with the rapid progress witnessed these days, we will not have to wait another twenty years to see a breakthrough in DNA sequencing with a graphene-based device.
¹ Or rather, already in 2008, when the corresponding preprint was posted: arXiv:0810.3035.
The success of any large science endeavor these days depends on teamwork and we would like to acknowledge our direct collaborators here, as well as the fast-growing group of scientists with whom we had the pleasure to discuss nanopore-based DNA sequencing. Thanks go to, in alphabetical order: Tobias Blom, Gustavo Troiano Feliciano, Roman Gorbachev, Haiying He, Yuhui He, S. Hassan M. Jafri, Shashi P. Karna, Kwang Soo Kim, Klaus Leifer, Ming Liu, Henrik Löfås, Henrik Ottosson, Manuel Melle-Franco, Ravi Pandey, Biswarup Pathak, Henk W. Ch. Postma, Jariyanee Prasongkit, Alexandre Reily Rocha, Stefano Sanvito, and Gregory Schneider. Furthermore, the possibility to carry out research on this fascinating topic was enabled through the generous financial support from various Swedish sources, in particular the Wenner-Gren Foundations, the Swedish Research Council (VR, Grant No. 621-2009-3628), the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), the Carl Tryggers Foundation, and the Uppsala University UniMolecular Electronics Center (U3MEC). Finally, since the calculations and simulations discussed in this article heavily depend on the availability of sufficient computational power, we would also like to thank the Swedish National Infrastructure for Computing (SNIC) and the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for providing the necessary CPU hours.
- 15. Branton D, Deamer DW, Marziali A, Bayley H, Benner SA, Butler T, Di Ventra M, Garaj S, Hibbs A, Huang X, Jovanovich SB, Krstic PS, Lindsay S, Ling XS, Mastrangelo CH, Meller A, Oliver JS, Pershin YV, Ramsey JM, Riehn R, Soni GV, Tabard-Cossa V, Wanunu M, Wiggin M, Schloss JA (2008) Nat Biotechnol 26:1146
- 36. He H, Scheicher RH, Pandey R, Rocha AR, Sanvito S, Grigoriev A, Ahuja R, Karna SP (2008) J Phys Chem C 112:3456 (Preprint: cond-mat/0708.4011)
- 37. Prasongkit J, Grigoriev A, Ahuja R, Wendin G (2011) Phys Rev B 84:165437 (Preprint: cond-mat arXiv:1104.1441v2)
- 43. Prasongkit J, Grigoriev A, Pathak B, Ahuja R, Scheicher RH (submitted) (Preprint: arXiv:1202.3040)