INTRODUCTION

The polymerase chain reaction (PCR) is one of the basic methods without which the work of a modern molecular biology laboratory is almost impossible to imagine. A distinctive feature of this method is its versatility, that is, its ability to efficiently amplify a wide variety of nucleotide sequences from DNA isolated from different organisms. However, there are applications for which PCR is less efficient, one of which is the amplification of long DNA sequences [1]. Amplicons larger than 4–5 kb are usually referred to as long. Amplification of such sequences is not among the routine applications of PCR, but it may be required for various scientific tasks related to molecular cloning or genetic analysis. For example, high-throughput sequencing methods allow parallel targeted sequencing of a large number of amplicons. This makes it possible to characterize a selected genome region in many biological samples and is a relatively simple and inexpensive alternative to whole-genome sequencing [2–6]. Sequencing of long amplicons makes it possible to analyze complete sequences of eukaryotic genes or even entire gene clusters [7–11], as well as complete virus genomes [12–15].

Various approaches, some of them quite exotic, have been proposed for efficient amplification of long fragments. Examples include primers carrying special tags that promote intramolecular hybridization of amplicons [16, 17] and the use of additional enzymes [18]. One widely applicable way to increase the efficiency of long-fragment synthesis in PCR is to use special DNA polymerases [8, 15, 19, 20]. Although amplification of long fragments is considered difficult for most DNA polymerases, manufacturers currently offer many enzyme variants that are claimed to be specifically designed, or at least suitable, for the synthesis of long amplicons. However, as noted in [8], the scientific literature usually contains little information on how suitable a given polymerase is for amplifying long sequences, even though such independent data can generally be considered more objective than the information provided by DNA polymerase manufacturers.

Many agricultural plants have large, complex polyploid genomes. Genomic DNA isolated from plants by standard methods often contains large amounts of various impurities that reduce PCR efficiency [21].

The goal of the present work was to compare the efficiency of six DNA polymerases in amplifying long sequences from genomic DNA of potato, Solanum tuberosum, a plant characterized by a high content of PCR-inhibiting phenols [22–24]. In addition, we compared the efficiency of these DNA polymerases in amplifying potato virus Y (PVY) sequences from cDNA synthesized from total RNA of PVY-infected plants.

MATERIALS AND METHODS

Isolation of nucleic acids. Genomic DNA was isolated from potato leaf tissue using a modified CTAB method [25]. Total RNA was isolated from leaf tissue of PVY-infected potato plants using the ExtractRNA reagent (Evrogen, Russia) according to the manufacturer's protocol. The quality of genomic DNA and total RNA was checked by agarose gel electrophoresis.

Preparation of potato virus Y cDNA. PVY infection of potato plants of the Vector variety was verified using the PV-001 reagent kit for differential diagnosis and detection of potato virus RNA by RT-PCR (Synthol, Russia). Total RNA was isolated from PVY-infected S. tuberosum plants of the Vector variety using the ExtractRNA reagent (Evrogen, Russia), and PVY cDNA was synthesized using the ProtoScript® II First Strand cDNA Synthesis Kit (NEB, United States) according to the manufacturer's protocol.

Primer selection. Primers for amplification of the full-length sequences of the genes encoding translation initiation factors eIF4E1, eIF4E2, eIF(iso)4E, and nCBP were designed to anneal to highly conserved regions in the 5'-noncoding regions (Fw primer) and 3'-noncoding regions (Rv primer) of these genes. Highly conserved regions were identified by analyzing the available genome assemblies of the cultivated potato S. tuberosum varieties MSH14-112 (GenBank: CP046696.1), P8 (GenBank: CP046682.1), and Solyntus (GenBank: CP055236.1), and of the wild potato Solanum pinnatisectum CGN17745 (GenBank: CP047566.1). Primers for amplification of the full-length PVY genome and its fragments were designed for regions of the PVY genome whose nucleotide sequence is conserved among isolates of different geographical origin (GenBank accession numbers: OUNKS, AB711155.1; T13, AB714135.1; v942490, EF016294.1; JVW-186, KF770835.1; GZ, MN381731.1; SL50V, MW595187.1). The primer sequences are shown in Table 1.

Table 1. Primer sequences used for amplification
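The procedure used to locate conserved regions in the genome assemblies is not detailed here. As one possible approach, the sketch below scans a multiple sequence alignment, assumed to have been built beforehand with any alignment tool, for runs of gap-free columns that are identical in all sequences; such runs are candidate primer sites. The function name `conserved_windows`, the minimum window length, and the toy sequences are hypothetical illustrations, not software or data from this study.

```python
from typing import List, Optional, Tuple


def conserved_windows(alignment: List[str], min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column intervals that are gap-free and
    identical in every aligned sequence, keeping runs of at least min_len columns.

    Hypothetical helper: assumes the sequences are already aligned to equal length.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(length):
        column = {seq[i].upper() for seq in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i  # a conserved run begins
        elif not conserved and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None  # the run is broken by a variable or gapped column
    # Close a run that extends to the last column
    if start is not None and length - start >= min_len:
        windows.append((start, length))
    return windows


if __name__ == "__main__":
    # Toy alignment (hypothetical): column 8 differs between the sequences
    toy = ["ACGTACGTAAGGCC", "ACGTACGTTAGGCC", "ACGTACGTCAGGCC"]
    print(conserved_windows(toy, min_len=5))  # [(0, 8), (9, 14)]
```

For real primers, `min_len` would be set to at least the primer length (typically around 18–25 nt), and windows falling in the 5'- and 3'-noncoding regions would then be checked for primer properties.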

Polymerase chain reaction. The composition of the reaction mixture followed the recommendations of the DNA polymerase manufacturers. Optimal primer annealing temperatures were calculated using the Tm calculators from NEB (https://tmcalculator.neb.com/#!/main) and ThermoFisher Scientific (https://www.thermofisher.com/de/de/home/brands/-thermo-scientific/molecular-biology/molecular-biology-learning-center/molecular-biology-resource-library/thermo-scientific-web-tools/tm-calculator.html). For Encyclo polymerase (Evrogen, Russia), the annealing temperature was calculated using the formula recommended by the manufacturer. PCR was carried out in a MiniAmp™ Plus Thermal Cycler (ThermoFisher Scientific, United States). Amplification conditions are shown in Table 2. A pooled mixture of genomic DNA isolated from ten potato plants of the Zhukovsky early variety, diluted 10-fold with TE buffer, was used as the template for amplification of the eIF4E genes. Amplification products were analyzed by electrophoresis of 2 µL (eIF4E genes) or 4 µL (PVY fragments) of the PCR mixture in a 1.2% agarose gel with the 1 kb DNA Ladder (Evrogen, Russia) as a size marker. All PCRs and analyses of the amplification products were performed at least in triplicate.

Table 2. Amplification conditions
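The NEB and ThermoFisher Scientific calculators, and the Encyclo formula, are not reproduced here; such tools rely on more elaborate thermodynamic models and on polymerase-specific rules for choosing the annealing temperature. For orientation only, the sketch below implements two classical rough Tm estimates, the Wallace rule and a basic GC-content formula. The function names and the example primer are hypothetical, and the returned values should not be expected to match the manufacturers' calculators.

```python
def _validated(primer: str) -> str:
    """Upper-case the primer and reject characters other than A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    return seq


def wallace_tm(primer: str) -> float:
    """Wallace rule, Tm = 2*(A+T) + 4*(G+C) in deg C; a crude estimate for short oligos."""
    seq = _validated(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def gc_content_tm(primer: str) -> float:
    """Basic GC-content formula, Tm = 64.9 + 41*(nGC - 16.4)/N in deg C."""
    seq = _validated(primer)
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer with 50% GC
    print(wallace_tm(primer))               # 60.0
    print(round(gc_content_tm(primer), 2))  # 51.78
```

The two simple formulas already disagree by about 8 °C for the same primer, which illustrates why the enzyme-specific calculators recommended by the manufacturers were used instead.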

RESULTS

Amplification of full-length sequences of potato eIF4E translation initiation factor genes. The cultivated potato S. tuberosum contains four genes encoding eIF4E translation initiation factors: the basic eIF4E1; eIF4E2, the variant most similar to eIF4E1, which is characteristic of various members of the Solanaceae family [26]; eIF(iso)4E (the eIF4E isoform); and the so-called novel cap-binding protein nCBP [27]. The sequences of the genes encoding all four potato eIF4E variants were identified in the three S. tuberosum genomes sequenced to date and in the wild species S. pinnatisectum. For these genes, primers were selected that anneal to highly conserved regions in the 5'-noncoding region upstream of the start codon and in the 3'-noncoding region downstream of the stop codon (Table 1). The potato eIF4E genes were amplified using six DNA polymerases: Q5, Phusion, and LongAmp (NEB, United States); Platinum SuperFi II and Phusion (ThermoFisher Scientific, United States); and Encyclo (Evrogen, Russia). According to their manufacturers, all of these DNA polymerases are suitable for the synthesis of long amplicons from complex templates, including genomic DNA. For each of the four genes, at least some of the polymerase variants yielded PCR products (Fig. 1) whose length approximately corresponded to the expected length of the gene. The exception was eIF4E1: its PCR product was considerably longer (7000 bp) than expected from the genome assembly data (4000–5000 bp). Amplification specificity was confirmed by Sanger sequencing of the 5'- and 3'-terminal regions of the PCR products.

Fig. 1.

Amplification of the potato translation initiation factor genes eIF4E1 (E1), eIF4E2 (E2), eIF(iso)4E (Eiso), and nCBP with six DNA polymerase variants. Phusion(N) is Phusion polymerase from NEB; Phusion(T) is Phusion polymerase from ThermoFisher Scientific.

The eIF(iso)4E and nCBP genes were amplified most efficiently across polymerase variants. These genes were efficiently amplified by almost all DNA polymerases studied; the exception was Encyclo polymerase, which amplified the eIF(iso)4E gene with low efficiency (Fig. 1). Notably, for eIF(iso)4E, the polymerases amplified two fragments of slightly different length; sequencing of their terminal regions showed that both were eIF(iso)4E gene sequences (data not shown). The two PCR products probably reflect polymorphism of the eIF(iso)4E gene in the Zhukovsky early variety, presumably extended deletions and/or insertions in the noncoding regions of this gene.

The eIF4E1 gene in the Zhukovsky early variety was considerably longer than eIF(iso)4E and nCBP. Efficient amplification of eIF4E1 was observed with four of the six DNA polymerase variants: Q5, both Phusion variants, and LongAmp, with LongAmp showing the highest efficiency. In contrast, Encyclo and Platinum SuperFi II polymerases did not yield detectable amounts of the eIF4E1 PCR product.

The lowest amplification efficiency was observed for the longest gene, eIF4E2, which exceeded 10 000 bp. LongAmp polymerase amplified this gene about as efficiently as the other genes (eIF4E1, eIF(iso)4E, and nCBP). Platinum SuperFi II, Phusion (NEB), and Phusion (ThermoFisher Scientific) also amplified eIF4E2, although clearly less efficiently than they amplified the other three potato translation initiation factor genes. Q5 polymerase yielded only a small amount of eIF4E2 product, and Encyclo polymerase failed to amplify eIF4E2.

Amplification of the full-length potato virus Y genome from cDNA of infected potato plants. Potato virus Y (PVY) is the most significant viral pathogen of this crop; its genome is a positive-sense single-stranded RNA [28]. Total RNA was isolated from PVY-infected potato plants of the Vector variety and used for cDNA synthesis. Primers annealing to conserved regions at the 5' and 3' ends of the PVY genome were used to amplify the full-length viral genome with the same six DNA polymerases that were used for the potato translation initiation factor genes. With Q5, Phusion (NEB), Phusion (ThermoFisher Scientific), and Encyclo polymerases, the full-length PVY amplicon did not accumulate in visually detectable amounts (Fig. 2a). Amplification was achieved with LongAmp and Platinum SuperFi II polymerases, but in both cases the yield of the PCR product was low. In addition, a shorter PCR product of about 7000 bp was formed when full-length PVY was amplified with LongAmp polymerase.

Fig. 2.

Amplification of the full-length potato virus Y (PVY) genome and four fragments of the viral genome (fragments 1, 2, 3, and 4) with six DNA polymerase variants (a), and schematic representation of the PVY genome, the four genome fragments, and the locations of the primers used for amplification (b). The names of the primers for amplification of PVY genome fragments correspond to those in Table 1.

Since none of the six DNA polymerases efficiently amplified the entire PVY genome, internal primers were designed for highly conserved regions of the PVY genome, making it possible to amplify the genome as overlapping fragments (Fig. 2b). Fragment 1, which covered almost half of the PVY genome from the 5' end and was about 4800 bp long, was, unlike the complete genome, amplified by most DNA polymerase variants, all except Phusion (ThermoFisher Scientific). However, the amount of PCR product was low in all cases. Fragment 2, which covered the 3' part of the PVY genome and was about 5200 bp long, was amplified by all DNA polymerases studied, with a high yield for all variants except Phusion (ThermoFisher Scientific).

Fragment 3 of the PVY genome, which included all of fragment 2 and about half of fragment 1, was approximately 7500 bp long. Extending the amplified 3' part of the PVY genome in this way somewhat reduced the amplification efficiency of Q5 and LongAmp polymerases, but not of Platinum SuperFi II polymerase. Encyclo, Phusion (NEB), and Phusion (ThermoFisher Scientific) polymerases failed to amplify fragment 3, although two of them, Encyclo and Phusion (NEB), efficiently amplified the shorter fragment 2. Despite its small size (approximately 2200 bp), fragment 4, which covered the 5' part of the PVY genome, was amplified in detectable amounts by only three of the six DNA polymerase variants: Platinum SuperFi II, LongAmp, and Encyclo.
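Splitting a genome into overlapping amplicons, as was done for PVY, requires that the fragments jointly cover the whole genome and overlap enough to be assembled afterwards. The sketch below checks these two conditions for a set of amplicon coordinates. The helper name, the minimum overlap, and all coordinates in the example are hypothetical; the actual primer locations are those shown in Fig. 2b.

```python
from typing import List, Tuple


def check_tiling(genome_len: int, fragments: List[Tuple[int, int]],
                 min_overlap: int) -> List[str]:
    """List problems in a set of overlapping amplicons meant to cover a genome.

    Coordinates are 1-based and inclusive. A fragment nested inside a longer one
    (as fragment 2 lies inside fragment 3) is handled by tracking the furthest
    position covered so far rather than comparing only neighbouring fragments.
    """
    if not fragments:
        return ["no fragments given"]
    problems: List[str] = []
    ordered = sorted(fragments)
    first_start, covered_end = ordered[0]
    if first_start > 1:
        problems.append(f"5' end uncovered: coverage starts at {first_start}")
    for start, end in ordered[1:]:
        overlap = covered_end - start + 1  # a negative value means a gap
        if overlap < min_overlap:
            problems.append(f"overlap of {overlap} nt before position {start}")
        covered_end = max(covered_end, end)
    if covered_end < genome_len:
        problems.append(f"3' end uncovered: coverage ends at {covered_end}")
    return problems


if __name__ == "__main__":
    # Hypothetical 10 000-nt genome split into two overlapping amplicons
    print(check_tiling(10_000, [(1, 4800), (4601, 10_000)], min_overlap=100))  # []
    # Hypothetical design with a gap between the amplicons
    print(check_tiling(10_000, [(1, 4000), (4601, 10_000)], min_overlap=100))
```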

DISCUSSION

In this work, we compared the efficiency of six DNA polymerase variants in amplifying long nucleotide sequences from potato genomic DNA and from cDNA of PVY-infected potato plants. As expected, amplification efficiency generally decreased as the length of the PCR product increased. Among the eIF4E family genes, the highest amplification efficiency with all DNA polymerase variants was observed for eIF(iso)4E and nCBP, whose amplicons were shorter than 4 kb, i.e., did not fall into the conventional category of “long amplicons.” Of all polymerase variants, only Encyclo performed unsatisfactorily on these relatively short amplicons, and only on eIF(iso)4E. In contrast, highly efficient amplification of the longest gene, eIF4E2, was achieved with a single DNA polymerase variant, LongAmp. Platinum SuperFi II polymerase also amplified eIF4E2, but with a markedly lower yield than LongAmp.

When cDNA was used as a template, none of the polymerase variants amplified the entire PVY genome (about 10 000 bp) with high efficiency, although Platinum SuperFi II and LongAmp synthesized visually detectable amounts of the full-length PVY amplicon. The low PCR yield may be associated not only with the large size of the amplicon but also with the low efficiency of amplification of the 5' part of the PVY genome, which was observed for all DNA polymerases. This was apparently caused by poorer cDNA synthesis from the 5' parts of long RNA molecules. Reverse transcriptase extends the cDNA in the 5'→3' direction [29], that is, it copies the RNA template from its 3' end toward its 5' end; when priming occurs near the 3' end of the template, the 5'-proximal regions are therefore represented only in the longest cDNA molecules. Indeed, fragments 2 and 3, located in the 3' part of the PVY genome, were amplified much more efficiently than fragments 1 and 4, located in the 5' part of the viral genome, even though fragments 2 and 3 were longer (Fig. 2).
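The 5'-bias argument can be made concrete with a deliberately simple model, which is an assumption and not part of this study: priming at the 3' end of the viral RNA (for example, with oligo(dT) at the poly(A) tail) and a constant, independent probability that reverse transcriptase stops at each nucleotide. Under these assumptions, the fraction of cDNA molecules that reach a given site decays exponentially with its distance from the priming site. The stop probability, function names, and fragment coordinates below are hypothetical.

```python
def fraction_cdna_reaching(distance_nt: int, stop_prob: float) -> float:
    """Fraction of first-strand cDNA molecules extending at least distance_nt
    nucleotides from the priming site, assuming reverse transcription stops
    independently at each nucleotide with constant probability stop_prob."""
    if not 0.0 <= stop_prob < 1.0:
        raise ValueError("stop_prob must be in [0, 1)")
    if distance_nt < 0:
        raise ValueError("distance_nt must be non-negative")
    return (1.0 - stop_prob) ** distance_nt


def usable_template_fraction(genome_len: int, frag_start: int, stop_prob: float) -> float:
    """Fraction of cDNA molecules, primed at the 3' end of the RNA, that reach the
    5'-most primer site of a fragment starting at frag_start (1-based)."""
    distance = genome_len - frag_start + 1  # nucleotides from the 3' end to the fragment start
    return fraction_cdna_reaching(distance, stop_prob)


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-nucleotide stop probability
    # Hypothetical fragments in a 10 000-nt genome
    print(round(usable_template_fraction(10_000, 4_801, p), 2))  # 3'-proximal fragment: 0.59
    print(round(usable_template_fraction(10_000, 1, p), 2))      # 5'-proximal fragment: 0.37
```

In this model, a 5'-proximal fragment is limited by its distance from the 3' priming site rather than by its own length, which is consistent with the poor amplification of the short fragment 4. With random primers, the bias would be weaker than this model suggests.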

All the DNA polymerases studied are claimed by their manufacturers to be suitable for amplifying long fragments. The maximum PCR product lengths specified by the manufacturers for efficient amplification are 20 kb for Encyclo, Platinum SuperFi II, and Phusion (ThermoFisher Scientific) and 30 kb for LongAmp; no such data are provided for Q5 and Phusion (NEB). Since even the longest amplicons studied here were considerably shorter than these values, their reduced amplification efficiency is probably associated not only with amplicon size but also with the use of complex templates. Contamination of potato genomic DNA with PCR inhibitors, as well as the reduced quality of synthesis of long cDNA molecules, lowered the overall amplification efficiency; combined with the inherent difficulty of synthesizing long amplicons, this probably explains the generally low efficiency observed for the longest fragments with both genomic DNA and cDNA templates.

LongAmp polymerase showed the highest efficiency in amplifying various long sequences, both eIF4E genes and PVY fragments, and it amplified all eIF4E genes and all PVY fragments. However, whereas LongAmp was considerably more efficient than the other DNA polymerases in amplifying eIF4E genes, its yield was low for three of the five PVY amplicons (the full-length genome and four fragments). Platinum SuperFi II polymerase, which also showed generally high efficiency with various sequences in this work, failed to amplify the eIF4E1 gene, which all other polymerase variants except Encyclo amplified. Both Phusion variants amplified eIF4E genes rather efficiently; however, Phusion (NEB) efficiently amplified only some PVY fragments, and Phusion (ThermoFisher Scientific) did not efficiently amplify any of them. These results indicate that each DNA polymerase had difficulty with particular amplicons and that these difficulties differed among polymerases. The choice of DNA polymerase is therefore of key importance for efficient amplification of a particular sequence. For example, Encyclo polymerase, which was generally the least efficient in this work, amplified PVY fragment 4 more efficiently than Q5, Platinum SuperFi II, and both Phusion polymerases.

According to the literature, synthesis of short nontarget sequences can be one of the major difficulties in PCR of long fragments, generally because short fragments are synthesized more efficiently than long ones [17]. However, in the present work, shorter nontarget PCR products accompanying the long target product were practically not observed. Small amounts of short nontarget products were formed only during amplification of the full-length PVY genome with LongAmp polymerase and of PVY fragment 2 with some polymerase variants, which indicates high amplification specificity.
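Why a small efficiency advantage of short products matters can be illustrated with the standard exponential model of PCR, N = N0(1 + E)^n, where E is the per-cycle efficiency and n the number of cycles. The sketch assumes ideal exponential amplification with no plateau phase; the function name and the efficiencies in the example are hypothetical.

```python
def product_ratio(eff_short: float, eff_long: float, cycles: int,
                  initial_ratio: float = 1.0) -> float:
    """Short-to-long product ratio after `cycles` cycles of ideal exponential PCR,
    where each product grows as N = N0 * (1 + E)**n with per-cycle efficiency E in [0, 1]."""
    for eff in (eff_short, eff_long):
        if not 0.0 <= eff <= 1.0:
            raise ValueError("efficiencies must be between 0 and 1")
    return initial_ratio * ((1.0 + eff_short) / (1.0 + eff_long)) ** cycles


if __name__ == "__main__":
    # Hypothetical efficiencies: 95% per cycle for a short product, 85% for a long one
    for n in (10, 20, 30):
        print(n, round(product_ratio(0.95, 0.85, n), 2))
```

Under these assumptions, a 10-percentage-point efficiency difference lets an initially equimolar short product outnumber the long target roughly fivefold after 30 cycles, so the near absence of such products here is a meaningful observation.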

CONCLUSIONS

The LongAmp polymerase, which showed high efficiency in this work, has relatively low fidelity: according to the manufacturer (NEB), its fidelity is only about twice that of Taq polymerase. This low fidelity limits its usefulness for applications such as cloning long sequences to create genetic constructs. Platinum SuperFi II polymerase, which has higher fidelity, efficiently synthesized various long amplicons in this work. This polymerase has been successfully used to amplify long sequences of various origins [30–36]. Its disadvantage is its high cost, which is much higher than that of most commercially available DNA polymerases, including all the other enzymes studied in this work.
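The practical impact of fidelity on long amplicons can be estimated with a common back-of-the-envelope model: the mean number of errors per molecule is approximately the error rate multiplied by the amplicon length and the number of template doublings, and the error-free fraction is approximated by a Poisson term. The Taq error rate and the number of doublings below are hypothetical placeholders, not measured values; only the twofold ratio reflects the manufacturer's statement quoted above.

```python
import math


def error_free_fraction(error_rate: float, length_nt: int, doublings: float) -> float:
    """Expected fraction of amplicons carrying no polymerase-introduced errors.

    Poisson approximation: mean errors per molecule = error_rate * length_nt * doublings,
    so P(no error) = exp(-mean). error_rate is in errors per nucleotide per duplication.
    """
    return math.exp(-error_rate * length_nt * doublings)


if __name__ == "__main__":
    taq_rate = 2e-5               # hypothetical placeholder, not a measured value
    two_fold_rate = taq_rate / 2  # "twice the fidelity of Taq" = half the error rate
    for length in (1_000, 5_000, 10_000):
        print(length,
              round(error_free_fraction(taq_rate, length, doublings=20), 3),
              round(error_free_fraction(two_fold_rate, length, doublings=20), 3))
```

Because the error-free fraction falls exponentially with amplicon length, a twofold fidelity gain over Taq still leaves a large share of 10-kb clones carrying errors under these assumptions, which is why higher-fidelity enzymes are preferred for cloning long sequences.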

None of the DNA polymerases studied in this work efficiently amplified all target DNA fragments. At the same time, almost every DNA or cDNA fragment examined in this study, including long ones (≈5000 bp or more), could be amplified efficiently with at least one DNA polymerase variant. Therefore, when long DNA fragments need to be amplified, especially from complex templates such as the genomic DNA of many plants, we recommend experimentally testing several DNA polymerases as an effective way to achieve satisfactory amplification of the target genome region.
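The recommended approach amounts to a small screening experiment: amplify each target with several polymerases, score the products on a gel, and keep the best enzyme for each target. A minimal bookkeeping sketch follows. The scoring scale, function name, target names, and polymerase labels are hypothetical and do not encode the results of this study.

```python
from typing import Dict, List


def best_polymerases(scores: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """For each target, return the polymerase(s) with the highest yield score.

    scores[target][polymerase] is an ordinal score read from the gel
    (hypothetical scale: 0 = no band, 1 = weak band, 2 = strong band).
    Targets with no band for any polymerase map to an empty list.
    """
    best: Dict[str, List[str]] = {}
    for target, by_pol in scores.items():
        top = max(by_pol.values(), default=0)
        best[target] = sorted(pol for pol, s in by_pol.items() if s == top and s > 0)
    return best


if __name__ == "__main__":
    # Hypothetical screening results for three targets and three enzymes
    screen = {
        "target_1": {"pol_A": 2, "pol_B": 1, "pol_C": 2},
        "target_2": {"pol_A": 0, "pol_B": 1, "pol_C": 0},
        "target_3": {"pol_A": 0, "pol_B": 0, "pol_C": 0},
    }
    for target, pols in best_polymerases(screen).items():
        print(target, pols or "no band with any enzyme; test further polymerases")
```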