Introduction

Recent years have been quite unusual, with the Coronavirus Disease of 2019 pandemic sweeping the globe and endangering human life and health. In the meantime, however, plant epidemics continue to spread silently, resulting in severe human food insecurity, especially in developing countries [30, 51]. Plant pathogens are a major threat to agricultural production [14], and viruses comprise nearly half of plant-disease-causing pathogens and are responsible for 30 billion USD in annual economic losses globally [49].

Soybean mosaic virus (SMV), a member of the genus Potyvirus within the family Potyviridae, is the most prevalent and devastating viral pathogen in all soybean (Glycine max (L.) Merr.) production regions worldwide, leading to substantial yield losses and deterioration of seed quality [26, 44, 61, 62]. SMV is seed-borne and aphid-transmitted and can also be transmitted via mechanical inoculation. The diseased soybean seedlings that originate from SMV-infected seeds are the primary inoculum sources. SMV is subsequently transmitted via at least 32 aphid species in a non-persistent manner, eventually resulting in secondary spread within and among soybean fields [26, 31]. SMV infection induces mosaic, chlorosis, rugosity, curl, and necrosis on soybean leaves and causes plant stunting and seed discoloration (seed coat mottling) [22]. Yield reductions are usually reported to range from 8% to 35% under natural field conditions [31]; however, losses of more than 50% and even total crop failure have occurred during severe outbreaks [18].

The SMV genome consists of a monopartite, single-stranded, positive-sense RNA molecule of ~9.6 kb in length, possessing a viral genome-linked protein (VPg) covalently bound to the 5′ terminus and a poly(A) tail at the 3′ end [26, 34]. The viral genome contains a single open reading frame (ORF) encoding a large precursor polyprotein, which is ultimately processed to yield at least 10 mature multifunctional proteins including protein 1 (P1), helper component-proteinase (HC-Pro), protein 3 (P3), 6-kilodalton protein 1 (6K1), cylindrical inclusion protein (CI), 6-kilodalton protein 2 (6K2), VPg, nuclear inclusion a-proteinase (NIa-Pro), nuclear inclusion b (NIb), and coat protein (CP) [26, 34]. Additionally, through a frameshift, the SMV genome also encodes a small ORF yielding an 11th protein, termed "pretty interesting Potyviridae ORF" (P3N-PIPO), which is produced as a consequence of transcriptional slippage in the P3 cistron [12, 31].

SMV has undergone changes in its pathogenicity during the long-term process of co-evolution with its hosts and the environment; hence, numerous SMV isolates with different levels of pathogenicity exist in nature [23, 66]. Based on their disease reactions on soybean differentials, a large number of SMV isolates have been categorized into seven strains (G1–G7) in the United States and South Korea [6,7,8, 37, 38, 52, 53, 55] and five strains (A–E) in Japan [60]. In China, SMV has been grouped into two types, namely 22 SC strains (SC1–SC22), which are found nationwide, and three N strains (N1–N3), which are prevalent in Northeast China, on different sets of soybean differentials [41, 46]. In the National Soybean Regional (Uniform) Tests (NSRUT) in China, new soybean cultivars are required to pass SMV resistance assessments before official approval of release; however, SMV SC strains are used for evaluating soybean cultivars from the Yellow-Huai-Hai River Valleys and Southern China, while SMV strains N1 and N3 are used for evaluating the cultivars from Northeast China [48]. The pathogenic relationship between SMV SC and N strains is unclear, and little genomic information about N strains is available, which seriously impedes the nationwide application of the available SMV-resistant soybean germplasm resources.

Since the first determination of full-length nucleotide sequences of SMV [34], complete and partial genome sequencing of SMV isolates has greatly helped us to understand its genomic structure, to identify determinants of virulence, resistance-breaking, and host range, and to study mutation, recombination, phylogenetic relationship, and evolutionary processes [4, 5, 9,10,11, 18, 19, 24, 27,28,29, 35, 50, 52,53,54,55, 63,64,65,66,67,68]. In the present study, the pathogenic relationships between SMV N1/N3 and SC strains were clarified via virus resistance assessments of N strains using the uniform soybean differential system, which was used previously for identifying SC strains [41]. Moreover, the virulence of SMV strains N1 and N3 was compared through monitoring of the pathogenic phenotype and viral accumulation on different susceptible soybean cultivars, and through analysis of the available data from SMV resistance assessments. Finally, the full-length genomic sequences of SMV strains N1 and N3 were determined and compared with other available complete SMV sequences. The results from this study will facilitate the nationwide use of SMV-resistant soybean germplasm, accelerate the progress of soybean resistance breeding in China, and provide useful insights into the molecular variability, geographical distribution, phylogenetic relationships, and evolutionary history of SMV around the world.

Materials and methods

Soybean materials and SMV strains

Ten soybean differentials, including Nannong 1138-2, Youbian 30, 8101, Tiefeng 25, Davis, Buffalo, Zaoshu 18, Kwanggyo, Qihuang No. 1, and Kefeng No. 1, which have been used for identifying SMV SC strains [41], were used for resistance assessments of N strains in this study. The soybean seeds were provided by the National Center for Soybean Improvement, Nanjing Agricultural University.

SMV strains N1 and N3, previously identified in Northeast China [46], were obtained from the Jilin Academy of Agricultural Sciences (Northeast Agricultural Research Center of China). Both strains were maintained separately on soybean cv. Nannong 1138-2 (a highly susceptible host) for further biological and molecular analysis.

Pathogenicity test

Seeds of the 10 soybean differentials were individually sown in plastic pots containing moistened nutrient soil mixed with vermiculite and grown at 23–25 °C with a photoperiod of 16 h in an insect-proof greenhouse. Seedlings were thinned to 15–20 healthy and uniform plants per pot and mechanically inoculated with SMV strains N1 and N3 as described previously [21]. The inocula were prepared from the symptomatic leaves collected from the corresponding infected Nannong 1138-2 plants, which were homogenized using a sterilized mortar and pestle in 0.01 M phosphate-buffered saline (a mixture of sodium hydrogen phosphate and monopotassium phosphate, pH 7.4) supplemented with a moderate amount of carborundum powder (600-mesh) as an abrasive. Inoculation was performed by gently rubbing the fully expanded unifoliolate leaves with the viral suspension using a paintbrush. Leaves were rinsed with tap water shortly after the inoculation, and plants were sprayed regularly with pesticides to prevent cross-infection via aphids. Disease symptoms (i.e., symptomless, mosaic, and necrosis) were monitored starting 7 days post-inoculation (dpi) and recorded at 1-week intervals until the R1 stage (beginning of flowering) [17].

Virulence comparison

Virus accumulation was detected by quantitative real-time polymerase chain reaction (qRT-PCR) analysis in soybean cvs. Nannong 1138-2 and 8101 that had been challenged with SMV strains N1 and N3. Gene-specific forward (5′-CAGATGGGCGTGGTTATGA-3′) and reverse primers (5′-ACAATGGGTTTCAGCGGATA-3′) were designed targeting the conserved region of SMV CP using Primer Premier 5.0. GmTubulin (accession no. AY907703), amplified with the forward (5′-GGAGTTCACAGAGGCAGAG-3′) and reverse primers (5′-CACTTACGCATCACATAGCA-3′), was used as the internal reference control. Samples were collected independently from the corresponding infected young leaves at 7, 14, 21, and 28 dpi, and total RNA extractions followed by first-strand cDNA syntheses were conducted using an RNA Simple Total RNA Kit (Tiangen, China) and PrimeScript® RT Master Mix (Takara, Japan), respectively. Subsequently, qRT-PCR was carried out in a reaction mixture with a 20-μL final volume, containing 2 μL of template cDNA (approximately 50 ng), 0.4 μL of each primer (10 μM), 10 μL of 2× SYBR® Premix Ex Taq™ (Takara, Japan), and 7.2 μL of sterilized ddH2O. Thermal conditions were set to 95 °C for 30 s, followed by 40 cycles at 95 °C for 5 s, 55 °C for 30 s, and 72 °C for 30 s. Samples were analyzed in triplicate on a LightCycler® 480 II instrument (Roche, Germany) according to the manufacturer’s manual. Transcript levels were quantified using the relative quantification (2^(−ΔΔCt)) method, and data were compared with the internal controls.
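The relative quantification step can be illustrated with a minimal sketch. It assumes the textbook form of the ΔΔCt calculation with roughly 100% amplification efficiency for both primer pairs. The function name, the choice of calibrator sample, and all Ct values are hypothetical and are not taken from this study.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative quantity of the target by the 2^(-ddCt) method.

    ct_target / ct_reference: Ct of the viral CP amplicon and of the
    internal reference (e.g. GmTubulin) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator sample (an assumption; the calibrator is not named here).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
calibrator_vs_itself = relative_expression(22.0, 18.0, 22.0, 18.0)
later_time_point = relative_expression(19.0, 18.0, 22.0, 18.0)
print(calibrator_vs_itself, later_time_point)
```

In practice, the Ct values passed in would be the means of the technical triplicates, and a sample three cycles earlier than the calibrator corresponds to an eightfold higher relative level.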

Statistical data from SMV resistance assessments deposited in the NSRUT (https://www.natesc.org.cn/) over a four-year period (2015–2018) were analyzed. The disease index (DI) of each evaluated soybean cultivar challenged with SMV strains N1 and N3 was calculated as described previously [23]. On the basis of the DI values, the response types of soybean cultivars in terms of virus resistance were classified as HR (highly resistant, DI = 0), R (resistant, 0 < DI ≤ 20), MR (moderately resistant, 20 < DI ≤ 35), MS (moderately susceptible, 35 < DI ≤ 50), S (susceptible, 50 < DI ≤ 70), and HS (highly susceptible, 70 < DI ≤ 100) [23]. The six classifications were further grouped as resistant (HR, R, and MR) or susceptible (MS, S, and HS).
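The DI thresholds above map directly onto a small lookup. The following sketch encodes only the class boundaries and the resistant/susceptible grouping stated in this section; the function names and the example DI values are hypothetical, and the calculation of DI itself (described in [23]) is not reproduced.

```python
def classify_di(di):
    """Map a disease index (0-100) to the six response types."""
    if not 0 <= di <= 100:
        raise ValueError("DI must lie between 0 and 100")
    if di == 0:
        return "HR"
    if di <= 20:
        return "R"
    if di <= 35:
        return "MR"
    if di <= 50:
        return "MS"
    if di <= 70:
        return "S"
    return "HS"


def resistance_group(response_type):
    """Collapse the six response types into resistant or susceptible."""
    return "resistant" if response_type in {"HR", "R", "MR"} else "susceptible"


# Hypothetical DI values, for illustration only.
for di in (0, 12.5, 35, 35.1, 64, 88):
    response = classify_di(di)
    print(di, response, resistance_group(response))
```

Because the upper bound of each class is inclusive, a DI of exactly 35 is scored as MR, whereas any value just above 35 is scored as MS.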

Whole-genome sequencing

RNA was isolated from soybean leaves infected with SMV strain N1 or N3 and then converted into cDNA, using the methodology described above. Three gene fragments of each strain, overlapping by at least 250 nt in the adjacent regions, were amplified by reverse transcription PCR using high-fidelity polymerase KOD FX (Toyobo, Japan) as described previously [68]. The 5′ fragment (∼3.3 kb) was amplified using the forward primer 5′-AAATTAAAACTMSTYATAAAGA-3′ and the reverse primer 5′-CCYTGCARYACACTAGTCATTTG-3′, the middle fragment (∼3.6 kb) was amplified using the forward primer 5′-CTCCACATACGGARAAATG-3′ and the reverse primer 5′-CCAACCATRCAAACMCGTTC-3′, and the 3′ fragment (∼3.2 kb) was amplified using the forward primer 5′-ATGTTTGGGGTYGGCTATGG-3′ and the reverse primer 5′-AGGACAACAAACATTGCCGYACCT-3′ [68]. The amplicons were separated by electrophoresis in a 0.8% agarose gel and visualized using a gel imaging system (Bio-Rad, USA). The bands with the expected sizes were excised and purified using an AxyPrep DNA Gel Extraction Kit (Axygen, USA) and cloned into the pMD19-T vector (Takara, Japan). To ensure accuracy, at least three clones of each fragment were sequenced bi-directionally by TSINGKE Biological Technology Co. Ltd., Beijing, China. The resulting contigs were trimmed and assembled using BioXM 2.6, and the overall pairwise sequence identity between N1 and N3 was calculated using DNAMAN 9.0 at both the nucleotide and amino acid levels.
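The principle of joining overlapping fragments into one contig can be sketched as follows. This is a simplified illustration that requires an exact suffix/prefix match; it is not the algorithm used by BioXM, which is not described here, and the function name and toy sequences are hypothetical.

```python
def merge_overlapping(left, right, min_overlap=250):
    """Join two fragments on their longest exact suffix/prefix overlap.

    Raises ValueError when no overlap of at least min_overlap bases exists.
    Exact matching is a simplification: real assemblers tolerate mismatches.
    """
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("no overlap of the required length")


# Toy fragments with a 4-base overlap (GGGT); the threshold is lowered
# from 250 to 4 so that the example stays readable.
contig = merge_overlapping("AAACCCGGGT", "GGGTTTTA", min_overlap=4)
print(contig)
```

Applied twice (5′ fragment with middle fragment, then the result with the 3′ fragment), the same step would yield a single full-length consensus, provided the clone sequences agree in the overlapping regions.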

Geographical distribution

A total of 104 complete SMV sequences retrieved from the National Center for Biotechnology Information database (http://www.ncbi.nlm.nih.gov/) were analyzed. Based on the nucleotide and amino acid sequence differences between N1 and N3, the analyzed SMV strains/isolates were divided into six pathotypes, including the N1 type, intermediate type I, intermediate type II, intermediate type III, intermediate type IV, and the N3 type. The locations where these pathotypes were isolated were marked on a world map to visualize their worldwide geographical distribution.

Genome-wide analysis

The newly obtained N1 and N3 sequences were compared with other available complete SMV genome sequences. Multiple SMV nucleotide and deduced amino acid sequences were aligned using BioEdit 7.0 and used for phylogenetic analysis. A phylogenetic tree was built by the neighbor-joining (NJ) method with 1000 bootstrap replicates in MEGA 5.0. To estimate the variations in evolutionary constraints on different regions of the genome, genomic diversity was calculated using DnaSP 5.0 with a sliding window of 100 bp and a step size of 25 bp. To explore the demographic history of SMV populations, Tajima’s D and Fu and Li’s D and F neutrality tests were applied to each SMV-encoded gene in DnaSP 5.0.
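The sliding-window diversity calculation can be illustrated with a minimal sketch that computes nucleotide diversity (Pi) as the average number of pairwise differences per site in each window. It assumes an ungapped alignment of equal-length sequences and does not reproduce how DnaSP handles gaps or ambiguous bases; the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return differences / (len(pairs) * length)


def sliding_pi(seqs, window=100, step=25):
    """Return (window midpoint, Pi) tuples along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window] for seq in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment; a 4-bp window with a 2-bp step keeps the example short.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAAAA"]
for midpoint, pi in sliding_pi(toy, window=4, step=2):
    print(midpoint, round(pi, 4))
```

With the parameters used in this study (window = 100, step = 25), each plotted point would summarize 100 aligned positions, and consecutive windows would share 75 positions.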

Results

SMV strains N1 and N3 are most closely related to strain SC18

To compare the pathogenic relationship between SMV N1/N3 and SC strains, 10 soybean differentials previously used for identifying SC strains were inoculated with N strains. Surprisingly, the disease reactions of N1 and N3 were the same, namely cvs. Nannong 1138-2 and 8101 both showed mosaic symptoms and were susceptible, whereas the other eight cultivars were all symptomless and resistant to these two strains (Table 1 and Supplementary Fig. S1). Furthermore, strains N1 and N3 were found to exhibit symptoms and pathogenicity identical to those of strain SC18 on the tested soybean cultivars (Table 1). Therefore, we concluded that strains N1 and N3 were most closely related to SC18 based on their performance on the 10 soybean differentials.

Table 1 Resistance assessments of SMV strains N1 and N3 on 10 soybean differentials

N3 is more virulent than N1

Although N1 and N3 could systemically infect Nannong 1138-2 and 8101 (Table 1 and Supplementary Fig. S1), the virulence of N1 and N3 differed on these two soybean cultivars. At first, similar mosaic symptoms appeared on Nannong 1138-2 and 8101 inoculated with N1 and N3 at 7 and 14 dpi (Fig. 1); however, symptoms induced by N3 (severe curl) became more prominent and severe than those induced by N1 (moderate crinkling), both on Nannong 1138-2 and 8101 at 21 and 28 dpi (Fig. 1). Subsequently, virus accumulation in Nannong 1138-2 and 8101 infected with N1 and N3 was measured by qRT-PCR at different time points. For Nannong 1138-2, the amount of virus present at 7, 14, and 21 dpi was similar for N1 and N3, whereas at 28 dpi, the amount of N3 was considerably higher than that of N1 (Fig. 1). For 8101, the amount of N3 present was clearly greater than that of N1 at most of the time points, particularly at 28 dpi (Fig. 1). Hence, the differences in viral titers supported the phenotypic observations for the relative virulence of N1 and N3.

Fig. 1 Symptom appearance and qRT-PCR detection of SMV strains N1 and N3 on soybean cvs. Nannong 1138-2 and 8101 at different time points. Data are expressed as the mean of three biological replicates with error bars indicating the standard deviation (SD). SMV, soybean mosaic virus; dpi, days post-inoculation

The available statistical data from resistance assessments of SMV strains N1 and N3 in the years 2015–2018 (Supplementary Table S1) were analyzed and are summarized in Table 2. A total of 352 soybean cultivars were assessed for virus resistance to N1 and N3, and no HR and HS types were found (Table 2). Among the 352 cultivars evaluated, 209 (59.4%) were identified as resistant to N1, including R (77, 21.9%) and MR types (132, 37.5%), whereas only 105 (29.8%) were identified as resistant to N3, including R (29, 8.2%) and MR types (76, 21.6%) (Table 2). On the other hand, 143 (40.6%) were identified as susceptible to N1, including MS (131, 37.2%) and S types (12, 3.4%), whereas 247 (70.2%) were identified as susceptible to N3, including MS (134, 38.1%) and S types (113, 32.1%) (Table 2). The average disease index (ADI) of N1 was 31.94 (ranging from 28.04 to 36.82), which was lower than that of N3 (ADI = 42.64, ranging from 40.62 to 45.18) (Table 2). In summary, the 352 soybean cultivars that were evaluated displayed an obviously higher overall level of resistance to N1 than to N3.
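The percentages quoted above follow directly from the class counts. The short sketch below recomputes them from the counts given in this paragraph (352 cultivars in total); the dictionary layout and function name are hypothetical.

```python
TOTAL_CULTIVARS = 352

# Counts per response type as reported in the text (no HR or HS types).
COUNTS = {
    "N1": {"R": 77, "MR": 132, "MS": 131, "S": 12},
    "N3": {"R": 29, "MR": 76, "MS": 134, "S": 113},
}


def summarize(counts, total):
    """Return resistant/susceptible counts and percentages (one decimal)."""
    resistant = counts["R"] + counts["MR"]
    susceptible = counts["MS"] + counts["S"]
    return {
        "resistant": resistant,
        "resistant_pct": round(100 * resistant / total, 1),
        "susceptible": susceptible,
        "susceptible_pct": round(100 * susceptible / total, 1),
    }


for strain, counts in COUNTS.items():
    print(strain, summarize(counts, TOTAL_CULTIVARS))
```

For each strain, the resistant and susceptible counts sum to 352, which confirms that every evaluated cultivar is accounted for in the four observed classes.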

Table 2 Resistance of soybean cultivars to SMV strains N1 and N3 in the National Soybean Regional (Uniform) Tests

Taken together, SMV strain N3 was found to be more virulent than N1 based on the pathogenic phenotype, viral accumulation on soybean cvs. Nannong 1138-2 and 8101 (Fig. 1), and the data of SMV resistance assessments (Table 2).

Sequence variations between N1 and N3

The genomes of SMV strains N1 and N3 were completely cloned and sequenced (Supplementary Fig. S2), and both had a genome of 9589 nucleotides encoding a 3067-amino-acid polyprotein (Supplementary Texts S1 and S2). The complete sequences of N1 and N3 were deposited in the GenBank database with accession numbers MN623289 and MN623290, respectively. Sequence alignments showed that N1 and N3 were almost identical, sharing 99.97% sequence identity at both the nucleotide and amino acid levels, with only three nucleotide differences and one amino acid difference (Table 3, Supplementary Figs. S3 and S4). Two nucleotide differences in HC-Pro (A or G at position 1439 and G or A at position 1914) and one nucleotide difference in CI (T or C at position 4377) resulted in a single amino acid difference in HC-Pro, namely Asn (N) or Ser (S) at position 436 (Table 3, Supplementary Figs. S3 and S4).
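As a quick arithmetic check, the reported identity values are consistent with the number of differences and the sequence lengths given above. The sketch assumes identity is computed as matching positions divided by alignment length over an ungapped alignment; the exact formula used by DNAMAN is not stated here, and the function name is hypothetical.

```python
def percent_identity(length, differences):
    """Percentage of identical positions in an ungapped pairwise alignment."""
    return 100.0 * (length - differences) / length


nucleotide_identity = percent_identity(9589, 3)  # 3 differences in 9589 nt
amino_acid_identity = percent_identity(3067, 1)  # 1 difference in 3067 aa
print(round(nucleotide_identity, 2), round(amino_acid_identity, 2))
```

Both values round to 99.97% at two decimal places, in agreement with the figure reported for the nucleotide and amino acid levels.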

Table 3 Nucleotide and amino acid sequence differences between SMV strains N1 and N3

The N3 type is the most frequent and widespread type worldwide

A total of 104 SMV strains/isolates were analyzed in this study (Supplementary Table S2). Based on the nucleotide and amino acid sequence differences between N1 and N3 (Tables 3 and 4), SMV strains/isolates were classified as six pathotypes consisting of N1 type (5), intermediate type I (13), intermediate type II (10), intermediate type III (12), intermediate type IV (1), and N3 type (63) (Table 5 and Supplementary Table S3), which were found in China (45), Korea (32), Japan (2), Iran (5), India (1), Canada (5), the USA (13), and Colombia (1) (Table 5, Supplementary Table S3, and Fig. 2). Among the SMV pathotypes, the N1 type was found infrequently, while the N3 type was the most frequent by far. The other types (except for intermediate type IV) were found at a similar frequency (Table 5 and Supplementary Table S3). It was observed that nearly all of the pathotypes were present in China, while there were only three pathotypes in Korea and the USA and only one pathotype in the other countries (Table 5, Supplementary Table S3, and Fig. 2). In particular, it is notable that the N3 type was found in most of the countries and is widely distributed globally (Table 5, Supplementary Table S3, and Fig. 2).

Table 4 Pathotype classification of SMV strains/isolates based on the nucleotide and amino acid differences between N1 and N3
Table 5 Distribution of SMV pathotypes worldwide
Fig. 2 Worldwide geographical distribution of SMV pathotypes. The number (n) of the analyzed SMV strains/isolates and the percentage of the pathotypes are indicated for each country, including China (n = 45), Korea (n = 32), Japan (n = 2), Iran (n = 5), India (n = 1), Canada (n = 5), USA (n = 13), and Colombia (n = 1). SMV, soybean mosaic virus

Phylogenetic analysis and the geographical distribution of SMV

Phylogenetic relationships among the available complete SMV sequences (Supplementary Table S2) were analyzed by aligning the nucleotide and amino acid sequences (Fig. 3) and constructing phylogenetic trees, which showed that N1 and N3 were the most closely related to each other at both the nucleotide (Fig. 3a) and amino acid levels (Fig. 3b). Moreover, phylogenetic analysis revealed a significant geographical association among SMV strains/isolates. All five SMV sequences from Canada, two sequences from Japan, and most of the sequences from China and Iran clustered together (Fig. 3). The four G7 strains/isolates (G7a, G7d, G7f, and G7x) from the USA were consistently classified as one subgroup (Fig. 3), and the other sequences from the USA did not cluster closely, due to their sequence diversity. SMV strains/isolates from Korea could be roughly divided into three subgroups including seven sequences (WS109, WS144, WS149, WS160, WS202, WS205, and WS209), nine sequences (CN18, G5, G5H, G5H-clone, G6H, WS32, WS101, WS117, and WS155), and six sequences (WS105, WS110, WS116, WS132, WS135, and WS151), at both the nucleotide and amino acid levels. No pronounced geographical correlation was found among the analyzed SMV strains/isolates when performing the phylogenetic analysis based on the nucleotide and amino acid sequences of each individual gene (Supplementary Figs. S5 and S6), which was probably attributable to the limited amount of genetic information.

Fig. 3 Phylogenetic analysis of SMV strains/isolates based on the full-length nucleotide (a) and amino acid (b) sequences. The phylogenetic trees were constructed using MEGA 5.0. The neighbor-joining method with 1000 bootstrap replicates was used to determine phylogenetic relationships. The newly sequenced N1 and N3 are indicated by black triangles, and strains/isolates marked with black circles were not from the indicated country. SMV, soybean mosaic virus

Selection constraints and virus population demography

Nucleotide variability evaluation and neutrality tests were carried out to investigate the evolution of SMV populations and variation under natural selection pressure. The results showed an overall Pi value less than 1 (Fig. 4), implying that the whole SMV genome is under negative selection, and some regions of the 5′ UTR, P1, and P3 exhibited higher polymorphism than the other regions (Fig. 4), suggesting that these regions are less evolutionarily constrained. Tajima’s D, Fu and Li’s D and F values for each SMV gene were all negative (Table 6), suggesting that all of the SMV genes are under negative selection pressure to varying degrees and that the SMV population is increasing. P1 may be under the greatest selection pressure, as it was found to have the lowest Tajima’s D value (Table 6). The Fu and Li’s D and F values for P1, NIa-Pro, NIb, and 3′ UTR were all significant, and the value for P1 was more highly significant, which is consistent with the result of Tajima’s D test (Table 6). Moreover, Fu and Li’s D and F values were all lower than Tajima’s D values (Table 6), indicating that the mutation rate in SMV may have increased more recently.
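For readers unfamiliar with the statistic, Tajima's D contrasts the mean number of pairwise differences with the number of segregating sites scaled by a harmonic sum. The sketch below follows the standard published formula and assumes an ungapped alignment; it is not the DnaSP implementation, it does not compute significance, and the function name and toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (needs n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are required")
    # S: number of alignment columns with more than one state.
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    if segregating == 0:
        raise ValueError("no segregating sites")
    # k: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * segregating + e2 * segregating * (segregating - 1)
    return (k - segregating / a1) / sqrt(variance)


# Toy alignments: an excess of singletons gives D < 0, whereas two
# equally frequent haplotypes give D > 0.
print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

A negative value reflects an excess of low-frequency variants relative to neutral expectation, which is the pattern interpreted in the text as purifying selection or population expansion.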

Fig. 4 Sliding-window analysis of Pi values for each gene of SMV strains/isolates. Pi values were calculated using DnaSP 5.0 and are shown in a sliding window of 100 bp with a step size of 25 bp. The SMV genome is shown to scale above the graph. SMV, soybean mosaic virus

Table 6 Neutrality tests for each SMV gene region of 104 strains and isolates

Discussion

The SMV SC and N strains were classified on different sets of soybean differentials [41, 46], resulting in an inability to compare their pathogenicity and posing severe limitations on exchanging and introducing SMV-resistant soybean cultivars across China. In the present study, SMV strains N1 and N3 were found to be most closely related to strain SC18 (Table 1), which was strongly supported by previous research showing that SC18 accounted for 90.5% of the total isolates collected from Heilongjiang province in northeastern China and was predominant and widespread in this region [40]. Thus, we believe that soybean cultivars resistant to either SC18 or N1/N3 could be used interchangeably, and the mapped R genes conferring resistance to SC18 [39] could also provide resistance to N1 and N3. This information will help to overcome the problems caused by the pathogenic differentiation and host specialization exhibited by SMV SC and N strains and to facilitate the nationwide application of outstanding soybean materials, ultimately accelerating the soybean-breeding progress for improving SMV resistance in China. Nevertheless, the same problems caused by non-uniform SMV strains still exist in other countries, making it difficult to understand the pathogenic divergence of SMV strains/isolates [23, 66, 68]. Consequently, we look forward to future international cooperation to establish a global standard set of soybean differentials in a system for unifying the SMV classification. This will be extremely helpful in exploring the physiological effects and geographical distribution of SMV strains around the world, as well as conveniently making use of the elite SMV-resistant soybean germplasm resources across borders.

HC-Pro has been shown to be strongly associated with symptomatology in several plant-potyvirus systems. Atreya et al. [3] provided evidence that amino acid substitutions within the N-terminus of HC-Pro of tobacco vein mottling virus (TVMV) affected helper component activity, virus accumulation, and symptom expression in infected tobacco plants, especially the mutation K307E, which not only completely abolished aphid transmissibility but also noticeably affected TVMV virulence. Replacement of two-thirds of the TVMV HC-Pro with that from another potyvirus attenuated symptoms and reduced virus accumulation on Nicotiana benthamiana, suggesting that HC-Pro may have universal importance in regulating potyvirus virulence [2]. Introduction of several mutations into HC-Pro of tobacco etch virus decreased virus accumulation and symptom development in tobacco plants [13]. Fragments were exchanged and site-directed mutagenesis was conducted in HC-Pro of zucchini yellow mosaic virus, and a unique mutation (R180I) in the highly conserved motif (from FRNK to FINK) dramatically decreased symptom severity in various cucurbit species, including squash, cucumber, melon, and watermelon, by directly influencing the levels and regulatory functions of microRNA populations [20, 58]. Genome sequence analysis of the progeny virus of SMV strain G7H implied that a single amino acid mutation (P359S) in HC-Pro led to changes in symptoms in soybean cv. Jinpumkong-2 [56]. Similar results were obtained in the current study, and based on the symptom severity, viral accumulation, and the available data in NSRUT, SMV strain N3 was shown to be more virulent than strain N1 (Fig. 1, Table 2, and Supplementary Table S1). Whole-genome sequencing showed that only three nucleotides (two in HC-Pro and one in CI) differed between N1 and N3, with only a single amino acid difference (N436S) in HC-Pro between N1 and N3. The other two nucleotide differences were synonymous substitutions (Table 3, Supplementary Figs. S3 and S4), indicating that the difference in virulence between N1 and N3 was very likely determined by the variation in HC-Pro. In combination, these results convince us that HC-Pro acts as a determinant of symptoms and virulence in the SMV-soybean pathosystem.

Soybean, which has a 5000-year history of cultivation, originated and was domesticated in ancient China and was disseminated early to North, East, and South Asia, and afterwards from Northeast China to Europe and the Americas after the year 1700 [32, 33, 45, 59]. The origin of SMV is thought to coincide with that of soybean, namely in South and East Asia, particularly in China [25, 26, 68]. In this study, China was found to have several different SMV pathotypes, displaying the highest level of genetic diversity, while the number of pathotypes was limited in the other countries (Table 5, Supplementary Table S3, and Fig. 2), implying that SMV might have originated in China. Moreover, we presume that the dissemination of soybean has facilitated the spread of SMV across continents via seed transmission, as soybean is the natural host of SMV. The present study showed that the N3 type is the most common and widespread worldwide (Table 5, Supplementary Table S3, and Fig. 2), suggesting that the enhanced virulence of N3 (Fig. 1 and Table 2) derived from the genetic variation in HC-Pro (Table 3, Supplementary Figs. S3 and S4) facilitated its spread and increased its adaptability in diverse geographical and ecological regions worldwide.

As no uniform system for SMV strain classification is available worldwide, genome sequencing is the most effective approach for exchanging information about the multitudinous SMV strains/isolates [66, 68]. Therefore, the complete sequences of N1 and N3 obtained in this study will complement the current sequence information of SMV. The results from the sequence analysis could broaden and enrich our knowledge about the molecular variability, geographical distribution, phylogenetic relationships, and evolutionary history of SMV. In addition, directional selection pressure created by the widespread application of SMV-resistant soybean cultivars has given rise to the frequent occurrence of resistance-breaking SMV strains/isolates [9, 18, 50]. Furthermore, interspecific genetic exchanges between SMV and other potyviruses have been continually detected. Chen et al. [4] isolated an SMV-like virus from Pinellia ternata, which was shown to result from recombination between SMV and dasheen mosaic virus. An SMV strain that probably originated from the recombination between SMV and bean common mosaic virus (BCMV) or a BCMV-like virus is prevalent in soybean-growing areas of China [5, 64, 65, 67, 68]. Jiang et al. [35] reported that an SMV isolate that was formed by the recombination between SMV and watermelon mosaic virus could cause different diseases in soybean and N. benthamiana plants. These observations emphasize the potential risk to soybean production and the vital role of genome sequencing in enabling us to discover resistance-breaking and recombinant SMV variants, providing an efficient strategy for monitoring and prevention of SMV.