Introduction

Porcine reproductive and respiratory syndrome (PRRS) is one of the most notorious diseases in the global swine industry [5]. It is characterized by reproductive failure in sows and respiratory illness in young pigs [21]. The causative agent, PRRS virus (PRRSV), is classified as a member of the order Nidovirales, family Arteriviridae, genus Arterivirus. PRRSV is an enveloped, single-stranded, positive-sense RNA virus [34]. The linear genome of about 15 kb consists of nine open reading frames (ORFs), two of which encode polymerases (ORF 1a and ORF 1b) and seven of which encode structural proteins (ORF 2a, 2b, 3, 4, 5, 6, and 7). Among them, ORF 5, 6, and 7 encode the three major structural proteins, glycoprotein 5 (GP5), membrane (M) protein, and nucleocapsid (N) protein, respectively [5]. GP5 is the major virus antigen associated with the development of neutralizing antibodies and protection [41], and it is often used as a phylogenetic marker. The M protein is thought to be associated with the development of strong cellular immunity [2]. Recently, Jiang et al. [16] reported that DNA vaccines co-expressing two major structural proteins (GP5 and M proteins) with two promoters displayed enhanced immunogenicity. Zheng et al. [45] also suggested co-expression of GP4, GP5, and M proteins in vaccinia virus Ankara. The remaining major structural protein, N protein, is the most abundant and most immunogenic protein in the virion [9] and is an important target for virological detection by PCR [15] and serological detection by ELISA [30], as well as being a phylogenetic marker [44].

PRRSV can be divided into two distinct genotypes, type 1 (European, EU) and type 2 (North American, NA), with Lelystad virus and VR-2332, respectively, as their prototype strains [42]. Although the disease symptoms caused by the two genotypes are similar, the viruses are antigenically and genetically very different [23].

Since the first identification of PRRSV in the United States in 1987 [17] and in Europe in 1990 [24], many global studies have been published on the basis of its gene [e.g., 25, 40, 46] and genome [e.g., 4, 22, 47] sequences. However, the evolutionary phylogenetics of PRRSV remains controversial [10, 11, 14, 26, 31–33].

To further explore the genetic history of the PRRS virus, we determined the complete ORF 5 to 7 sequences from four Korean NA-type isolates and analyzed them together with published sequences for 23 EU-type (one from Korea) and 123 NA-type (three from Korea) isolates from different parts of the world. The first objective of this study was to examine the key mechanisms that drive the evolution of this virus, such as recombination and selection pressure. Subsequently, the Bayesian inference (BI) and maximum-likelihood (ML) methods were used to investigate the phylogeny of PRRSVs. Finally, a Bayesian coalescent analysis was conducted to estimate substitution rates, divergence times, and changes in population size.

Materials and methods

Sample and data collection

Using MARC-145 cells, four type-2 PRRSV strains were isolated from three Korean prefectures (Gyeonggi, Gangwon, and Chungnam) in 2007, 2009, and 2010. The PRRSV isolates, CP07-401-9 (passage 9), CP07-626-2 (passage 13), e417-2 (passage 5), and A4699 (passage 9) were used for sequencing ORFs 5 to 7. The complete ORF 5 to 7 sequences (all NA type) were analyzed together with published sequences from four Korean isolates (one EU type and three NA type) and 142 non-Korean PRRSV isolates (22 EU type and 120 NA type). The PRRSVs utilized in the present study are listed in Table S1.

RNA extraction, RT-PCR, and sequencing

The PRRSV isolates in this study were cultured on MARC-145 cells in Dulbecco’s modified Eagle’s medium (5 % fetal bovine serum) for 5 days and harvested by two rounds of freezing and thawing followed by centrifugation (3,000 rpm for 20 min) when cytopathic changes were observed in about 80 % of the cells. The harvested viruses were used for RNA extraction. Total RNA was extracted from the virus stocks using TRIzol® LS (Invitrogen, Carlsbad, CA) according to the manufacturer’s protocol. The cDNA was synthesized with random hexamers using a SuperScript® III First-Strand Synthesis Kit (Invitrogen) according to the manufacturer’s instructions. For sequencing ORFs 5 to 7, the following primers were selected from a previous study (numbers correspond to the positions in the North American prototype PRRSV strain [VR-2332] [47]): (a) for the ORF 5 gene, O5P1 (13,671–13,693), 5′-TCCTACTGGCAATTTGAATG-3′, and O5P2 (14,369–14,390), 5′-CCTTTAGAGCATATATCATCAC-3′; (b) for the ORF 6 gene, O6P1 (14,271–14,292), 5′-GTTTCAGCGGAACAATGGGGTC-3′, and O6P2 (14,820–14,840), 5′-GCACAGCTGATTGACTGGCTG-3′; and (c) for the ORF 7 gene, O7P1 (14,751–14,770), 5′-TGGGTGGCAGAAAAGCTGTT-3′, and O7P2 (15,219–15,240), 5′-GTGTCAATCAGTGCCATTGACC-3′. PCR was performed separately with each primer set using a Maxime PCR Premix Kit (Intron, Seongnam, Korea). The PCR amplification conditions were 94 °C for 2 min followed by 40 cycles at 94 °C for 10 s, 58 °C for 30 s, and 72 °C for 30 s, followed by a final elongation step at 72 °C for 7 min. The amplified fragments were purified using a QIAquick Gel Extraction Kit (QIAGEN, Valencia, CA) and sequenced in both directions by Macrogen Company (Seoul, Korea) using the corresponding ORF 5, ORF 6, and ORF 7 PCR primers. The complete ORF 5 to 7 sequences from each isolate determined in this study were submitted to GenBank under accession numbers JN809804–JN809807.
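As a rough check on the primer design, the expected amplicon sizes and the overlaps between neighbouring fragments can be derived from the VR-2332 coordinates quoted above. The sketch below is illustrative only: it takes the outermost coordinate of each primer pair as the fragment boundary, and the function names are hypothetical rather than part of any published pipeline.

```python
# Outer primer coordinates (VR-2332 numbering) as quoted in the text.
PRIMER_PAIRS = {
    "ORF5": (13671, 14390),  # O5P1 start, O5P2 end
    "ORF6": (14271, 14840),  # O6P1 start, O6P2 end
    "ORF7": (14751, 15240),  # O7P1 start, O7P2 end
}


def amplicon_length(start, end):
    """Length in bp of a fragment spanning start..end, both inclusive."""
    return end - start + 1


def overlap(a, b):
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


lengths = {name: amplicon_length(*span) for name, span in PRIMER_PAIRS.items()}
overlap_5_6 = overlap(PRIMER_PAIRS["ORF5"], PRIMER_PAIRS["ORF6"])
overlap_6_7 = overlap(PRIMER_PAIRS["ORF6"], PRIMER_PAIRS["ORF7"])

print(lengths)                   # expected fragment sizes in bp
print(overlap_5_6, overlap_6_7)  # bp shared by adjacent fragments
```

Under these assumptions, each adjacent pair of fragments shares a stretch of sequence, which is what allows the three amplicons to be joined into one contiguous ORF 5 to 7 sequence.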

Sequence, recombination, and selection pressure analysis

After sequencing three fragments, each fragment was assembled with overlapping sequences to obtain the complete ORF 5 to 7 sequences. These sequences were initially aligned using CLUSTAL X 1.81 [39]. For each genetic region, as well as the entire ORF 5-through-7 region, the following calculations were made from the dataset using the programs PUZZLE 4.0.2 [36] and Modeltest 3.7 [27]: total sites, conserved sites, variable sites, base frequencies, and transition/transversion ratios. The nucleotide and amino acid sequence identities among the PRRSV isolates were also estimated using BIOEDIT 7.0.9 [13].
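The descriptive statistics listed above can be illustrated with a small sketch. It is not the procedure used by PUZZLE, Modeltest, or BIOEDIT: it simply counts conserved columns, pairwise identity, and observed transitions and transversions on a toy alignment. The sequences and function names are hypothetical, and gapped columns are skipped by assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pairwise_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def conserved_sites(seqs):
    """Count gap-free alignment columns in which all sequences agree."""
    return sum(len(set(col)) == 1 and "-" not in col for col in zip(*seqs))


def ts_tv_counts(a, b):
    """Count observed transitions and transversions between two sequences."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y or "-" in (x, y):
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


# Hypothetical toy alignment, for illustration only.
toy = ["ATGCCGTA", "ATGTCGTA", "ATGCCGAA"]

print(conserved_sites(toy))
print(pairwise_identity(toy[0], toy[1]))
print(ts_tv_counts(toy[0], toy[1]), ts_tv_counts(toy[0], toy[2]))
```

Raw counts of this kind differ from model-based estimates of the transition/transversion ratio, which correct for multiple substitutions at the same site.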

To detect putative recombinants, the identification of likely parental sequences and localization of possible recombination breakpoints were performed using the RDP 3.0b41 package [19] with the default parameters. This package consists of six different recombination detection programs: RDP, GENECONV, MaxChi, BOOTSCAN, Chimaera, and SiScan. To exclude the possibility of detecting false-positive recombinants, we considered only putative recombinant sites detected by at least three methods. In all analyses, p < 0.05 was considered to indicate statistical significance.
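The acceptance rule described above (support from at least three of the six programs at p < 0.05) can be sketched as a simple filter. The event names and p-values below are hypothetical and are not results from this study; the function is an illustration, not part of the RDP package.

```python
MIN_METHODS = 3  # threshold stated in the text
ALPHA = 0.05     # significance level stated in the text


def supported_events(events, min_methods=MIN_METHODS, alpha=ALPHA):
    """Return names of events with p < alpha in at least min_methods programs."""
    kept = []
    for name, pvalues in events.items():
        # None means the program reported no signal for this event.
        significant = [m for m, p in pvalues.items() if p is not None and p < alpha]
        if len(significant) >= min_methods:
            kept.append(name)
    return kept


# Hypothetical p-values per detection program, for illustration only.
example = {
    "event_A": {"RDP": 0.01, "GENECONV": 0.20, "MaxChi": 0.03,
                "BOOTSCAN": None, "Chimaera": 0.04, "SiScan": 0.30},
    "event_B": {"RDP": 0.04, "GENECONV": 0.50, "MaxChi": 0.60,
                "BOOTSCAN": None, "Chimaera": 0.70, "SiScan": None},
}

print(supported_events(example))
```

Requiring agreement among several programs trades some sensitivity for a lower false-positive rate, which is the rationale given in the text.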

To evaluate the selective pressure driving PRRSV evolution, nonsynonymous/synonymous substitution ratio (ω = dN/dS) values were estimated using ClustalX 1.81 [39], PAL2NAL [38], and the codeml program in the PAML 3.14.1 package [43].

Phylogeny inference

Phylogenetic analysis was carried out using two different analytical methods, BI and ML. For phylogenetic analysis based on the complete ORF 5 to 7 nucleotide sequences from the 150 PRRSVs, GTR+I+G was selected as the best-fit model using Akaike’s information criterion (AIC) in MODELTEST 3.7 [27]. The BI method was executed in MrBayes 3.1.2 [28] with the following options: nst, 6; rates, gamma; number of generations, 2,000,000; sample frequency, 100; number of chains, 4; burn-in generation, 20,000. To estimate the reliability of nodes, Bayesian posterior probability (BPP) values are shown on the BI tree. ML analysis was conducted using PHYML 3.0 [12]. The parameters used for tree construction were as follows: model of nucleotide substitution, GTR; initial tree, BIONJ; nonparametric bootstrap analysis, yes, 500 replicates; proportion of invariable sites, estimated; number of substitution rate categories, 6; gamma shape parameter, estimated by program; optimize tree topology, yes.
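To make the MCMC settings above concrete, the following sketch works out how many samples such a run would yield. It rests on two assumptions that the text does not confirm: that the burn-in of 20,000 is expressed in generations (as the wording suggests) rather than in samples, and that the starting state at generation 0 is also recorded. The function name is hypothetical.

```python
def mcmc_sample_counts(generations, sample_freq, burnin_generations):
    """Return (samples drawn, samples discarded, samples kept) for one run."""
    total = generations // sample_freq + 1  # +1 assumes generation 0 is recorded
    discarded = burnin_generations // sample_freq
    return total, discarded, total - discarded


# Settings quoted in the text: 2,000,000 generations, sampled every 100.
total, discarded, kept = mcmc_sample_counts(2_000_000, 100, 20_000)
print(total, discarded, kept)
```

If the burn-in were instead specified in samples, the number of discarded samples would be very different, so the unit is worth confirming against the program settings actually used.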

Substitution rates, divergence times, and population size changes

To investigate the evolutionary history of global PRRSVs, the rates of nucleotide substitutions, times of the most recent common ancestor (TMRCA), and changes in population size were co-estimated using the Bayesian Markov chain Monte Carlo (MCMC) approach as implemented in BEAST 1.5.3 [8]. The dataset for these analyses comprised the complete sequences of the ORF 5-to-7 region from 150 PRRSVs circulating worldwide during the last 22 years (1989–2010). Here, GTR+I+G was used as the best-fitting model, with nst = 6 and rates = gamma as the likelihood settings, as determined with MODELTEST 3.7 [27]. Subsequently, we employed both strict and relaxed (both uncorrelated exponential and uncorrelated lognormal) molecular clocks [7] with five different demographic models (constant size, exponential growth, expansion growth, logistic growth, and Bayesian skyline). Using the Bayes factor test (log10 Bayes factors >2 in all cases) based on the relative marginal likelihoods of the models [37], the relaxed uncorrelated exponential clock and expansion population size models were selected as showing the best fit for the PRRSV datasets. The changes in effective population size over time were examined using the Bayesian skyline plot (BSP). The datasets were each run for 200,000,000 generations to ensure convergence of all parameters (effective sample sizes [ESSs] > 200), with the first 10 % discarded as burn-in. The resulting convergence was analyzed using Tracer 1.5 (http://beast.bio.ed.ac.uk/Tracer), and the statistical uncertainties were summarized in the 95 % highest posterior density (HPD) intervals. Trees were summarized as maximum clade credibility trees using the TreeAnnotator program and visualized using FigTree 1.3.1 [29].
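The model comparison and burn-in described above reduce to simple arithmetic, sketched here. The marginal likelihood values are hypothetical placeholders, not values from this study, and the function names are illustrative; only the threshold (log10 Bayes factor > 2), the chain length, and the 10 % burn-in come from the text.

```python
import math


def log10_bayes_factor(ln_ml_model1, ln_ml_model0):
    """log10 Bayes factor of model 1 over model 0.

    Both arguments are natural-log marginal likelihoods.
    """
    return (ln_ml_model1 - ln_ml_model0) / math.log(10)


def burnin_states(chain_length, percent):
    """Number of initial states discarded for a burn-in given in percent."""
    return chain_length * percent // 100


# Hypothetical ln marginal likelihoods for two competing clock models.
ln_ml_relaxed = -25010.0
ln_ml_strict = -25020.0

bf = log10_bayes_factor(ln_ml_relaxed, ln_ml_strict)
print(round(bf, 2), bf > 2)                # favoured if log10 BF exceeds 2
print(burnin_states(200_000_000, 10))      # states discarded from each run
```

A difference of 10 log-likelihood units, as in this toy case, already exceeds the threshold of 2 on the log10 scale.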

Results

Sequence analysis, selection pressure analysis, and recombination detection

The complete ORF 5 to 7 sequences from four Korean PRRSV isolates were determined. They all belonged to the NA genotype and were the same length (1,494 nucleotides: 603 for ORF 5, 525 for ORF 6, and 372 for ORF 7).

For a total of 150 global PRRSVs (23 EU and 127 NA types), the characteristics of the complete ORF 5 to 7 region and the individual gene sequences are summarized in Table 1. The overall length of the alignment (including gaps) was 1,535 base pairs, and the deduced amino acid sequences were 502 residues in length. The complete sequences of the ORF 5 to 7 region of the PRRSVs revealed a very low degree of genetic similarity: 619 (40.3 %) of the nucleotides and 102 (20.3 %) of the amino acids were conserved. Pairwise comparisons demonstrated that the nucleotide sequence identity among those sequences ranged from 61.4 % to 100 %, corresponding to 59.6 %–100 % identity at the amino acid level. Of the three individual genes, ORF 5 was the most variable (average sequence identity of 85.4 % and 83.2 % for nucleotides and amino acids, respectively), whereas ORF 6 was the most conserved (average sequence identity of 89.8 % and 93.2 % for nucleotides and amino acids, respectively). The transition/transversion ratio estimated from sequences of the entire ORF 5 to 7 region was 3.71. Here, the transition/transversion ratio of the ORF 5 gene was the highest (3.80), whereas ORF 7 had the lowest ratio (3.39).

Table 1 Summary of the genetic regions of PRRSV for the Bayesian coalescent approach

The complete ORF 5 to 7 sequences of the EU-type viruses were 1,509 nucleotides and 502 amino acids (including gaps) in length and revealed considerable genetic variation: 1,016 (67.3 %) of the nucleotides and 347 (69.1 %) of the amino acids were conserved. The nucleotide sequence identities among isolates of this type ranged from 87.2 % to 99.9 %, corresponding to 87.4–100 % identity at the amino acid level. Of the three individual genes, ORF 5 was the most variable (average sequence identity, 88.8 % for nucleotides and 89.0 % for amino acids), whereas ORF 7 was the most conserved (average sequence identity, 93.3 % for nucleotides and 93.9 % for amino acids). The transition/transversion ratio estimated from the dataset was 6.20. The transition/transversion ratio of the ORF 7 gene was the highest (7.47), whereas that of the ORF 5 region was the lowest (6.15).

The total sites (including insertions) of the NA-type ORF 5 to 7 region had 1,494 base pairs and 500 deduced amino acids. In comparison with EU-type sequences, they had similar configurations: 914 (61.2 %) of the nucleotides and 308 (61.6 %) of the amino acids were conserved. Pairwise comparisons showed that the identities between NA-type isolates ranged from 87.5 % to 100 % for the nucleotide sequences and from 87.1 % to 100 % for the amino acid sequences. ORF 6, rather than ORF 7 in the EU type, was the most conserved (average sequence identity, 97.2 % for nucleotides and 98.2 % for amino acids). The transition/transversion ratio calculated from the entire NA-type virus dataset was 4.51. The transition/transversion ratio of the ORF 6 gene was the highest (5.61), and the ratio of ORF 5 was the lowest (4.17).

The nonsynonymous/synonymous substitution ratio (ω = dN/dS) value of the complete ORF 5 to 7 sequences of 150 global PRRSVs was 0.2229: 0.2372 for EU-type isolates and 0.2914 for NA-type viruses. Accordingly, our analysis showed that purifying selection acted on the ORF 5-to-7 sequences of both EU- and NA-type PRRSVs. Additionally, no recombination events were detected among the viruses.
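The inference drawn from these ω values follows the usual convention that ω < 1 indicates purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. A minimal sketch of that reading is given below; the function name and tolerance are hypothetical, and the values are the region-wide estimates reported above.

```python
def selection_regime(omega, tol=1e-9):
    """Classify a region-wide dN/dS ratio by the conventional thresholds."""
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


# Region-wide omega estimates reported in the text.
OMEGA = {"all": 0.2229, "EU": 0.2372, "NA": 0.2914}

regimes = {group: selection_regime(value) for group, value in OMEGA.items()}
print(regimes)
```

Because these are averages over the whole ORF 5 to 7 region, a ratio below 1 does not exclude positive selection at individual codons; site-specific models would be needed to test that.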

Phylogenetic analysis

Ambiguous positions were excluded from the nucleotide alignment, so that 1,514 nucleotide sites were used for the phylogenetic analysis. The BI and ML methods yielded consistent findings in terms of the phylogeny of PRRSV. The topology of the maximum clade credibility tree (Fig. 1) indicated that all of the global PRRSVs are divided into two different clades corresponding to the EU and NA genotypes (all posterior probabilities = 1.00 and bootstrap values = 100 %). Here, all of the EU-type viruses were divided into one of two groups (all posterior probabilities = 1.00) or were unclassified (4 isolates: JF276431, EU076704, JF304781, and GU047344). Group 1 contained a total of 14 isolates, with four from the United States, three from Spain, two from the Netherlands, and one each from Korea, China, Thailand, Germany, and Portugal during 1991–2007. All five isolates in Group 2 originated from China in 2009. Within the NA clade, all members belonged to one of three major groups (all posterior probabilities = 1.00) or were unclassified (5 isolates: AF325691, AF494042, AY545985, AY424271, and AY262352). Group 1 consisted of 90 Chinese viruses (70.9 % of all NA-type viruses) collected between 2002 and 2009. Group 2 was composed of six Chinese viruses from 1996 to 2008. Group 3 contained 26 isolates, with 12 from China, seven from Korea, four from the United States, and one each from Japan, Thailand, and Canada from 1989 to 2010.

Fig. 1

Bayesian maximum clade credibility phylogenetic tree obtained from complete ORF 5 to 7 region sequences of 150 PRRSVs from around the world. The dataset (1,514 sites) was also analyzed phylogenetically by the BI and ML methods, and an identical topology was produced. The robustness of the phylogenetic analysis is presented above the nodes: numbers at the left represent Bayesian posterior probabilities (≥0.80), and those at the right represent ML bootstrap values (≥70 %). Divergence times (in years) are positioned below the nodes; the 95 % HPD intervals are indicated in brackets. Groups are marked by a “G”

Substitution rates, divergence times, and population size changes

The sequences of the ORF 5 to 7 region of PRRSVs collected during the past 22 years (1989–2010) were also analyzed using a Bayesian coalescent approach. The evolutionary rate of PRRSV isolates was estimated to be 1.55 × 10^−3 (95 % HPD = 9.06 × 10^−4–2.23 × 10^−3) substitutions/site/year, projecting the TMRCA back to 491.2 years ago (95 % HPD = 199.6–864.4). The current diversity of the two genotypes arose at about the same time; the TMRCA estimated for the current diversity of EU-type viruses was 58.7 years ago (95 % HPD = 40.6–191.9), whereas that for the NA-type isolates was calculated at 62.6 years ago (95 % HPD = 41.5–153.7). In addition, BSP analysis on the basis of the ORF 5-to-7 sequences of the isolates (Fig. 2) showed that the viruses appear to have evolved at an almost constant population size until the late 1970s, when they experienced a population expansion lasting until the late 1980s. The population size then remained constant again until the early 2000s, when a rapid, sharp decline in the effective number of infections occurred.

Fig. 2

Bayesian skyline plot of global PRRSVs sampled between 1989 and 2010. The thicker black line represents the median estimate of the effective number of infections over time, and the thinner blue lines indicate the upper and lower bounds of the 95 % highest posterior density

Discussion

To date, the ORF 5-through-7 region has been regarded as the most important target for PRRSV studies involving PCR detection, immunology, and evolutionary phylogenetics. Here, to explore the genetic history of these viruses further, we determined the complete ORF 5-to-7 sequences from four Korean PRRSV isolates from 2007 to 2010 and then analyzed them together with 146 sequences from different parts of the world that were available in the GenBank database. Pairwise comparisons of the ORF 5 to 7 region confirmed that the virus is highly variable in sequence: between the two genotypes, maximum divergences of 38.6 % for nucleotides and 40.4 % for amino acids were observed. The levels of divergence within each genotype were similar. Maximum divergences among the EU-type isolates reached 12.8 % for the nucleotide sequences and 12.6 % for the amino acid sequences, and those among NA-type viruses were 12.5 % and 12.9 %, respectively. This observation is in agreement with previous studies, e.g., Stadejek et al. [35], An et al. [1], and Yoon et al. [44]. Domingo et al. [6] stated that RNA viruses have high mutation rates of 10^−3 to 10^−5 nucleotide substitutions per site per replication cycle due to their inaccurate RNA replication. In particular, of the three major structural genes, our analysis showed that ORF 5 was the most variable. This finding supports the results from previous studies [20, 35]. Because PCR detection sensitivity and vaccine development depend on sequence variability [18], continuous monitoring of the sequence variability of ORFs 5 to 7 should be taken into account when updating primers and in studies on immunology, epidemiology, and evolutionary phylogenetics.
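The maximum divergences quoted above follow directly from the minimum pairwise identities reported in the Results, since divergence is taken here as 100 % minus identity. The short sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative only.

```python
def max_divergence(min_identity_percent):
    """Maximum divergence (%) implied by a minimum pairwise identity (%)."""
    return round(100.0 - min_identity_percent, 1)


# Minimum pairwise identities from the Results: (nucleotide, amino acid).
MIN_IDENTITY = {
    "between genotypes": (61.4, 59.6),
    "EU type": (87.2, 87.4),
    "NA type": (87.5, 87.1),
}

divergence = {
    group: (max_divergence(nt), max_divergence(aa))
    for group, (nt, aa) in MIN_IDENTITY.items()
}
print(divergence)
```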

Regarding the selection pressure on PRRSVs, our results based on ORF 5 to 7 sequence data revealed that purifying selection acted on the viruses. That is, the nonsynonymous/synonymous substitution ratio (ω = dN/dS) value of the complete ORF 5 to 7 sequences from 150 global PRRSVs was 0.2229: 0.2372 for EU-type isolates and 0.2914 for NA-type isolates. This point confirms the views of Pesch et al. [25], while it contradicts the predictions of others such as Hanada et al. [14] and Song et al. [33], who demonstrated that the evolution of PRRSV was driven by positive selection. The exchange of different genetic sites between viruses by recombination, however, contributes to PRRSV genetic diversity and evolution, although in the present study, no pattern of recombination events was detected that was consistent across all PRRSVs.

Regarding the phylogenetic relationships among the PRRSVs, our results based on ORF 5 to 7 sequence data revealed two clearly defined clades corresponding to the EU and NA genotypes. Despite the striking genetic distances between two genotypes, their clinical symptoms are notably similar following infections with these viruses.

Next, this study focused on geographic and/or temporal influences on the evolution of PRRSVs. Recently, several investigators such as Cha et al. [3] and Zhu et al. [47] documented that geographic separation is a factor influencing PRRSV evolution, based on ORF 5 and genome sequences, respectively. However, the findings of the present study do not support their viewpoints; no apparent correlation was observed between country and/or time and the evolution of global PRRSVs. Within the EU-type clade, group 1 members were found in eight countries (the United States, Spain, the Netherlands, Korea, China, Thailand, Germany, and Portugal) during 1991–2007, although all isolates of group 2 originated from a single country (China) in 2009. Similar features were observed within the NA clade as well. Although all members of both group 1 and group 2 were collected in China, their sampling times were different: group 1 during 2002–2009 and group 2 during 1996–2008. Moreover, the group 3 viruses of the NA type occurred in six countries (China, Korea, the United States, Japan, Thailand, and Canada) during 1989–2010. These features are also consistent with the results of previous work [25, 44] and may be largely due to the rapid expansion and diversification of PRRSVs over a relatively short period of time and their rapid spread via the frequent international trade in livestock. The mixed population structure can make vaccine strategies more difficult. Thus, continuous screening for changes in the mixed population structure of this virus is needed.

The substitution rates and divergence times of PRRSV have been investigated in recent years from the molecular standpoint. Forsberg et al. [10], on the basis of the ORF 3 sequences of EU isolates, estimated mean evolutionary rates of 5.8 × 10^−3 (95 % HPD = 4.8–6.9 × 10^−3) and dated the most recent common ancestor to 1979, more than 10 years before the start of the European epidemic. Plagemann [26] analyzed ORF 5, ORF 7, and ORF 1b sequences and then postulated that the TMRCA was approximately 100 years ago. Plagemann suggested that a mutant of LDV (lactate dehydrogenase-elevating virus) infected wild boars in central Europe in 1912 and that this intermediate host spread the virus to North Carolina through imports; the virus then evolved independently on the European and American continents in the wild boar populations for about 70 years until independently entering the domestic swine population. Subsequently, the evolutionary rate and the divergence time of PRRSV were debated [11, 14]. Hanada et al. [14], using ORF 3-to-5 data from two genotypes, estimated these values to be 4.17–9.8 × 10^−2 and about 1980, respectively, whereas Forsberg [11], based on ORF 3 sequences of two genotypes, recalculated the divergence time to be around the year 1880. Most recently, Shi et al. [32] estimated a mean evolutionary rate of 1.46 × 10^−3 substitutions/site/year for ORF 5 data and noted that the common ancestors of genotype NA viruses appeared approximately in 1979 (1977–1982). Additionally, Song et al. [33] used ORF 5 sequences of two genotypes and demonstrated that the mean evolutionary rates of PRRSVs were 3.29 × 10^−3 and that the virus diverged in 1894. Our findings on this topic do not support the previous suggestions. Here, we used a Bayesian coalescent approach using the most extended ORF 5 to 7 region sequences from 150 EU and NA genotype PRRSVs collected during the past 22 years (1989–2010).
The relaxed uncorrelated exponential clock and expansion population size model were selected as the best fit for our PRRSV dataset. As a result, in contrast to previous work, our findings indicate that PRRSV might be an “ancient virus” in spite of its recent emergence. The average substitution rate was 1.55 × 10^−3 (95 % HPD = 9.06 × 10^−4–2.23 × 10^−3) substitutions/site/year, and the time of the most recent common ancestor was 491.2 years ago (95 % HPD = 199.6–864.4); that is, the lineages segregated approximately in the year 1519.8 (95 % HPD = 1146.6–1811.4). The current diversity of the two genotypes then arose at approximately the same time; the TMRCA estimated for the current diversity of EU-type viruses was 58.7 years ago (95 % HPD = 40.6–191.9), and that for the NA-type isolates was 62.6 years ago (95 % HPD = 41.5–153.7). This pattern is in concordance with the conclusions of several PRRSV researchers and indicates that the two lineages evolved separately from a very distant common ancestor prior to their emergence into the pig populations on the two continents [23].
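The figures 1519.8 and 1146.6–1811.4 given above coincide with the "years ago" estimates when these are counted back from 2011. That reference year is an inference from the arithmetic and is not stated in the text, so the sketch below treats it as an assumption; the function names are hypothetical.

```python
REFERENCE_YEAR = 2011  # assumption: inferred from the arithmetic, not stated


def to_calendar_year(years_ago, reference=REFERENCE_YEAR):
    """Convert a 'years before present' estimate to a calendar year."""
    return round(reference - years_ago, 1)


def hpd_to_calendar(lower_ago, upper_ago, reference=REFERENCE_YEAR):
    """Convert an HPD interval in 'years ago' to (earliest, latest) years.

    The larger 'years ago' bound maps to the earlier calendar year.
    """
    return (to_calendar_year(upper_ago, reference),
            to_calendar_year(lower_ago, reference))


root_year = to_calendar_year(491.2)
root_hpd = hpd_to_calendar(199.6, 864.4)
eu_year = to_calendar_year(58.7)
na_year = to_calendar_year(62.6)

print(root_year, root_hpd)
print(eu_year, na_year)
```

On this reading, the current diversity of both genotypes traces back to around the middle of the twentieth century, while the split between them is several centuries older.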

Regarding the effective population size changes of PRRSVs, our BSP analysis on the basis of ORF 5 to 7 sequences of the global PRRSV isolates (Fig. 2) revealed that the viruses appear to have evolved at an almost constant population size until the late 1970s, when they experienced a population expansion that continued until the late 1980s. Their population size then remained constant again until the early 2000s, when a rapid, sharp decline in the effective number of infections occurred. The sharp decrease of the PRRSV effective population size might have been due to vaccinations that effectively controlled PRRSV worldwide for many years.

Here, we further studied the genetic history of PRRSV on the basis of the most extensive ORF 5 to 7 sequence dataset of the EU and NA genotypes. We first described the evolutionary and past population dynamics of the global PRRSVs inferred from ORF 5 to 7 sequences using a Bayesian coalescent approach. We also examined evolutionary mechanisms such as recombination and selection pressure, as well as the phylogeny of the two PRRSV genotypes. Since its first outbreaks two decades ago, PRRSV has apparently radiated worldwide in an “explosive” fashion during a relatively short time. To date, despite intensive immunological efforts, porcine reproductive and respiratory syndrome is still the most significant swine disease worldwide. Accordingly, both national and global strategies are necessary to prevent and control this acute disease. The expanding database and genetic history information of PRRSV sequences obtained from the present study might be useful for the prevention and control of this virus as well as for advancing our knowledge about its outbreak and epidemiology.