Evolutionary genomics of white spot syndrome virus

White spot syndrome virus (WSSV) has been one of the most devastating pathogens affecting the global shrimp industry since its initial outbreaks in Asia in the early 1990s. In this study, we recovered 13 complete metagenome-assembled genomes (MAGs) of Japanese WSSV isolates and 30 draft WSSV MAGs recovered from publicly available sequencing data, to investigate the genomic evolution of WSSV. Phylogenetic analysis revealed two major phylotypes, designated phylotypes I and II. Bayesian divergence time estimates placed the divergence time of the two phylotypes between 1970 and the early 1980s, with an estimated substitution rate of 1.1 × 10–5 substitutions per site per year, implying the existence of pre-pandemic genetic diversity of WSSV in Asia. Based on this scenario, phylotype I was responsible for the 1990s pandemic and spread worldwide, whereas phylotype II was localized in Asia and infiltrated Australia. Two cross-phylotype recombinant lineages were identified, which demonstrate the role of genomic recombination in generating the genetic diversity of WSSV. These results provide important insights into the evolution of WSSV and may help uncover the ultimate origins of this devastating pathogen.


Introduction
Penaeid shrimp aquaculture has experienced rapid growth since the latter half of the twentieth century, but it has been constantly threatened by various infections by bacteria, fungi, parasites, and viruses (Lightner et al. 2012;Momoyama and Muroga 2005;Stentiford et al. 2012).One particularly lethal virus for penaeid shrimps is white spot syndrome virus (WSSV), which is a large, double-stranded DNA virus that infects a wide range of decapod crustaceans (H.-C.Wang et al. 2019).WSSV has a bacilliform, enveloped virion which contains more than 40 different structural proteins (H.-C.Wang et al. 2019) and replicates in the cell nuclei of tissues of ectodermal and mesodermal origins (Chou et al. 1995).The WSSV genome is a circular, double-stranded DNA ranging 280-315 kbp in length, encoding 150 to 180 predicted protein-coding genes (Li et al. 2017;Tsai et al. 2004;van Hulten et al. 2001;Yang et al. 2001).
The earliest known outbreak of WSSV occurred in Fujian Province, China in July 1992 (Cai and Su 1993;Lo et al. 2005;Su et al. 1995), and it has since spread rapidly to other regions through human activities, affecting all shrimpfarming regions in the world (Lightner et al. 2012;Oakey and Smith 2018;Onihary et al. 2021;Tang et al. 2013).In Japan, the first recorded incidence of WSSV occurred in Hiroshima Prefecture in 1993 via the introduction of infected kuruma shrimp Penaeus japonicus seeds (Inouye et al. 1994;Momoyama et al. 1994;Nakano et al. 1994).WSSV rapidly spread to other regions in Japan, causing massive damage to shrimp production.By the end of the 1990s, WSSV established itself in wild Japanese crustacean populations (Maeda et al. 1998;Okamoto and Suzuki 1999;Fukuzumi and Chikushi 2003;Izumikawa 2013) and still continues to affect the kuruma shrimp industry.
Understanding the origins and evolution of infectious diseases has various socioeconomic implications such as the development of biosecurity policies.Genomic information is essential for understanding the pathogenesis, epidemiology, and evolution of a virus.Dozens of WSSV genomes have been published from various parts of the world (Cruz-Flores et al. 2020;Han et al. 2017;Kooloth Valappil et al. 2021;Oakey and Smith 2018;Parrilla-Taylor et al. 2018;Rodriguez-Anaya et al. 2016;van Hulten et al. 2001;Yang et al. 2001), but there have been few attempts to decipher the evolution of WSSV at the genome level.In this study, we aimed to dissect the genomic evolution of WSSV by leveraging existing genomic assemblies and newly generated WSSV genome assemblies through de novo sequencing and exploration of publicly available datasets.

WSSV specimens
We sequenced a total of 12 specimens collected from seven Prefectures across Japan (Table 1).JP01 was derived from diseased P. japonicus in Miyazaki Prefecture between 1995 and 2000 and had been maintained as an infectious virus in the Laboratory of Genome Science, Tokyo University of Marine Science and Technology, Tokyo.JP02 was derived from diseased P. japonicus in Yamaguchi Prefecture between 1995 and 2000 and had been maintained as an infectious virus at the National Fisheries University, Yamaguchi Prefecture.JP03 and JP04 were identified from the shotgun sequencing data of Metapenaeopsis lamellata and Trachysalambria curvirostris samples, respectively, which were accidentally found to contain WSSV sequences.Both samples had been fixed with ethanol and therefore infectious viruses could not be recovered.S14, E1, E2, and 1-4 were recovered from diagnostic samples submitted to the Okinawa Prefectural Government and tested positive for WSSV by PCR.Sample 79 was collected during an epidemiological survey conducted by the Deepsea Water Research Center, Kumejima Island, Okinawa Prefecture.P. japonicus samples infected with isolates 0722-1 and Miyako2021 were provided by commercial farmers in Okinawa Island and Miyako Island, respectively.Pc2020 was derived from a naturally infected red swamp crayfish Procambarus clarkii originating from Chiba Prefecture.All Japanese WSSV genomes were regarded as MAGs because the sample preparation procedures did not involve the purification of virions.

DNA extraction
For JP01 and JP02 (Fig. 1), genomic DNA was extracted from shrimp homogenate prepared as a viral inoculum.We first concentrated the homogenate using Amicon Ultracells (Merck) and extracted DNA by phenol-chloroform-isoamyl alcohol extraction.The DNA was further concentrated using Amicon Ultracells.For JP03 and JP04, genomic DNA was extracted from the muscle tissue of the ethanol-preserved specimens by phenol-chloroform-isoamyl alcohol extraction.We used a swimming leg from specimen 79 for DNA extraction by phenol-chloroform-isoamyl alcohol extraction.For Pc2020, we extracted genomic DNA from the gills of a dead crayfish using a MagAttract HMW DNA Kit (Qiagen).

Multiple displacement amplification
Before Oxford Nanopore Technologies (ONT) library preparation, the genomic DNA of JP01, JP02, 79, S14, E1, E2, and 1-4 was amplified by multiple displacement amplification (MDA).Amplification was performed using the REPLIg Mini Kit (150023, Qiagen), and the amplicons were purified using Agencourt AMPure XP beads (Beckman Coulter) and quantified using a Qubit dsDNA BR Assay kit (Thermo Fischer Scientific).Approximately 1.5 µg of the amplicon was then digested with T7 Endonuclease I (M0302, New England Biolabs), purified again using Agencourt AMPure XP beads and quantified using the Qubit dsDNA BR Assay kit.Approximately 750 ng of the purified DNA was used for ONT library preparation.

Sequencing
Paired-end libraries for Illumina sequencing were prepared using the Nextera XT DNA Library Preparation Kit (Illumina) and were sequenced using the MiSeq Reagent kit v2 (2 × 150 cycle) or MiSeq Reagent Kit v3 (2 × 300 cycle).ONT libraries were prepared using the Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies) and NEBNext Companion Module for Oxford Nanopore Technologies Ligation Sequencing (E7180, New England Biolabs).The enzymatic reaction steps were extended to twice the suggested duration.The libraries were sequenced on R9.4.1 flow cells on a MinION or a GridION platform.The fast5 files were base-called using Guppy v. 5.0.11 in super-accuracy mode.

De novo assembly and assembly curation
The ONT reads were filtered by length using SeqKit (Shen et al. 2016) and aligned to the CN01 genome (RefSeq Accession no.NC_003225.3)with Minimap2 (Li 2018, p. 2).The mapped reads were extracted and de novo assembled using Flye v. 2.9 (Kolmogorov et al. 2019)

WSSV genome reconstruction from publicly available sequence data
The Sequence Read Archive (SRA) entries that contained signatures of WSSV were identified using BigQuery (last accessed December 2022), and the corresponding entries were downloaded from the DDBJ/NCBI/ENA database.The downloaded entries included the genomes of WSSV and host crustaceans, shotgun metagenomes, and transcriptomes.The sampling dates were determined based on the corresponding BioSample metadata and literature.The reads were trimmed using Fastp v. 0.20.1, and the trimmed reads were mapped to the CN01 genome with Minimap2.The mapped reads were extracted and depth-normalized using BBnorm (Bushnell et al. 2017).The normalized reads were assembled using SPAdes v. 3.15.5 (Nurk et al. 2013), and the contigs were scaffolded by Chromosomer v. 0.1.4a(Tamazian et al. 2016).The scaffolds were polished using Pilon v. 1.24 (Walker et al. 2014) and were manually curated.

Phylogenetic analysis and divergence time estimation
Publicly available WSSV genomes were downloaded from the NCBI database (last accessed February 2023).Proteincoding genes in some of the publicly available WSSV genomes were highly fragmented.The presence of extensive frameshift mutations was likely due to the use of particular sequencing platforms (IonTorrent PGM or ONT), which have been known to struggle with homopolymers.
The fragmentation of core genes such as the major capsid protein (wsv360) and DNA polymerase (wsv514) (Kawato et al. 2019) strongly suggested that the majority of mutations in these assemblies did not originate biologically, and therefore these assemblies were excluded from the analysis.We used Snippy (https:// github.com/ tseem ann/ snippy) to construct a core genome alignment of 61 WSSV genomes by mapping simulated short reads generated from each assembly to the CN01 genome (clean.full.aln in Online Resource 2, Online_Resource_2_whole_genome_alignments.zip).A maximum-likelihood phylogenetic tree was built (clean.full.aln.treefile in Online Resource 2, Online_Resource_2_ whole_genome_alignments.zip) using IQ-TREE 2.2.2.3 (Minh et al. 2020), which revealed two major phylotypes, I and II (Fig. 2).Possible recombinants (lineages 1601 and Qingdao; see "Identification of recombination breakpoints" Section) and genomes with unknown sampling dates were excluded from the downstream analyses.
We then built a maximum likelihood phylogenetic tree with IQ-TREE v. 2.2.2.3 from the recombinant-free, dated whole genome alignment containing 48 sequences (clean2.full.aln.treefileand clean2.full.aln in Online Resource 2, Online_Resource_2_whole_genome_alignments.zip) and inspected the tree with Tempest v. 1.5.3 (Rambaut et al. 2016).Rooting, based on heuristic residual mean squares, placed the root between phylotypes I and II (R 2 = 0.5056; function: heuristic residual mean squared).The root position was independently supported by the minimum variance method implemented in FastRoot v. 1.5 (Mai et al. 2017).
Bayesian phylogenetic analysis was performed with BEAST v. 2.7.4 (Bouckaert et al. 2014).Preliminary analysis using 48 genomes placed WSSV-TH or WSSV-TW as ancestral strains, but this was considered implausible since it drastically deviated from the most likely root position found in the Tempest and FastRoot analyses.Tempest analysis found WSSV-TH and WSSV-TW had large rootto-tip genetic distance residuals, which indicated that the two genomes contained disproportionately large numbers of unique nucleotide variations relative to their sampling dates.We suspect that the observed large residuals were sequencing artifacts arising from the Sanger-based shotgun sequencing strategy.Therefore, we excluded from the dataset WSSV-TW, WSSV-TH, and six other sequences with the absolute values of root-to-tip genetic distance residuals larger than 1.30 × 10 -4 (WSSV-LS, POMZ4, POMZ1, Pc2020, PG1, WSSV-Peru).
The final recombinant-free whole genome alignment contained 40 genomes (clean3.full.aln in Online Resource 2, Online_Resource_2_whole_genome_alignments.zip).The maximum-likelihood phylogenetic tree built with IQ-TREE v. 2.2.2.3 (clean3.full.aln.treefile in Online Resource 2, Online_Resource_2_whole_genome_alignments.zip) yielded a Tempest R 2 value of 0.8049 (function: heuristic residual mean squared).Bayesian divergence time estimation was performed using BEAST v. 2.7.4 (Bouckaert et al. 2014).To account for the ambiguity in the sampling dates of JP01A, JP01B, and JP02, we set a prior distribution of the three tips as a normal distribution (σ = 1.0) with the mean at 1998.Fifty million iterations were performed, which were sampled every 10,000 steps after a 10% burn-in.We used Tracer v. 1.7.2 (Rambaut et al. 2018) to monitor the progress of the run and to ensure that the effective sampling sizes of all parameters were larger than 200, except the posterior of run "coalescent constant, GTR," which was 175.3.Three population models (constant, exponential, and Bayesian skyline) and two substitution models (HKY and GTR) were used to assess the impact of model selection.A maximum clade credibility tree was generated for each run with TreeAnnotator (Drummond and Rambaut 2007), which was visualized with FigTree v. 1.4.4 (http:// tree.bio.ed.ac.uk/ softw are/ figtr ee/).We arbitrarily selected the estimate under the HKY model assuming a constant population size for presentation in Fig. 3, as all estimates converged on similar tree topologies and estimated divergence dates.BEAST XML files and resulting trees are available as Online Resource 3 (Online_Resource_3_BEAST_trees.zip).

Identification of recombination breakpoints
Possible recombinants (lineages 1601 and Qingdao) were detected by inspecting neighbor-net (Bryant and Moulton 2004) phylogenetic networks constructed from the whole genome alignment of 61 WSSV genomes using SplitsTree4 v. 4.19.0 (Huson 1998).We used RDP4 v. 4.101 (Martin et al. 2015) to analyze the recombination breakpoints of the crossphylotype lineages.Subsampling of WSSV genomes was necessary to complete RDP4 analysis.Recombination sites of lineage 1601 were identified by analyzing the following six genomes: phylotype I, CN01 and PC; phylotype II, CN03 and WSSV-AU; lineage 1601, 1601 and GCF7.Recombination sites of lineage Qingdao were identified by analyzing the following six genomes: phylotype I, CN01 and PC; phylotype II, CN03 and WSSV-AU; lineage Qingdao, Qingdao2019 and Qingdao2020.Recombination breakpoints were plotted against the CN01 genome, and corresponding regions of genome alignment were extracted with a custom script (https:// github.com/ satos hikaw ato/ bio_ small_ scrip ts/ blob/ main/ crop_ align ment.py).Maximum-likelihood phylogenetic trees based on the subgenomic alignments were constructed with IQ-TREE 2.2.2.3.Subgenomic alignments and maximum-likelihood phylogenetic trees are available as Online Resource 4 (Online_ Resource_4_subgenomic_alignments.zip).

Sequencing and assembly of Japanese WSSV genomes
We sequenced a total of 12 specimens from farmed and wild crustaceans collected in Japan (Table 1), resulting in 0.35 to 3.74 Gb of Illumina reads per specimen and 69 Mb to 14.9 Gb of ONT reads per specimen (Online Resource, Table S1).A total of 13 genomes were recovered, ranging in size from 288,190 bp (Miyako2021) to 311,562 bp (JP02), with the number of protein-coding genes ranging from 160-180 (Table 2).
JP01A (299,976 bp) and JP01B (293,923 bp) originate from diseased P. japonicus in Miyazaki Prefecture between 1995 and 2000.Haplotype phasing resolved two closely related genotypes present in the viral inoculum.Isolate JP02 (Fig. 1), originating from Yamaguchi Prefecture between 1995 and 2000, had a genome size of 311,562 bp -the largest complete WSSV genome sequenced to date and approaching the estimated genome size of TH-96-II (312 kbp) (Marks et al. 2005).The coding regions were overall very similar to those of CN01 (309,286 bp; NCBI RefSeq Accession no.NC_003225) (Li et al. 2017).The difference in genome size between JP02 and CN01 was mainly due to variations in repeat sequence lengths, including homologous repeats and variable number tandem repeats in ORFs.
Isolates JP03 and JP04, obtained from Mikawa Bay in Aichi Prefecture and Lake Hamana in Shizuoka Prefecture, respectively, shared (i) the translocation of wsv486 gene (Online Resource, Fig. S1), (ii) a 3381-bp deletion around the ORF14/15 region (Online Resource, Fig. S2), and (iii) a 7057-bp deletion around the ORF23/24 region (Online Resource, Fig. S3).The translocation of wsv486 has previously been reported in CN03 (GenBank Accession no.KT995471.1)(Li et al. 2017).The similarity of WSSV genomes from Mikawa Bay and Lake Hamana suggests that the two isolates share a common origin, which is consistent with the fact that wild P. japonicus populations in the two prefectures share a common spawning ground, the Sea of Enshu (Suitoh et al. 2014).Isolate 79 was identified during an epidemiological survey of WSSV in P. japonicus, while Pc2020 was identified in a wild Procambarus clarkii from Chiba Prefecture.Collectively, WSSV genomes derived from wild crustaceans provide further evidence that WSSV has become established in the wild crustacean populations in Japan.

Draft WSSV genomes recovered from publicly available sequencing data
A total of 30 WSSV draft genomes were recovered from publicly available sequencing datasets (Table 3).Nineteen MAGs were recovered from samples derived from China,

Two major phylotypes
The maximum-likelihood phylogenomic tree revealed two phylotypes (Fig. 2).Rooting between the two phylotypes was supported by heuristic residual mean squares, implemented in Tempest, and the minimum variance method implemented in FastRoot.We identified two cross-phylotype recombinant lineages (1601, shaded in orange in Fig. 2; Qingdao, light green), which will be discussed later (see "Emergence of recombinant strains").

Divergence time estimates
We next estimated the divergence time of the two phylotypes by Bayesian phylogenetic analysis.For the analysis, we removed from the dataset (i) sequences with unknown sampling dates, (ii) cross-lineage recombinants, and (iii) WSSV genomes that had exceptionally large or small rootto-tip divergences in the maximum-likelihood phylogenetic tree.The Bayesian phylogenetic analyses using 40 WSSV genomes yielded 95% confidence intervals for the divergence between phylotypes I and II occurring around 1973-1981(median in Fig. 3: 1977)), suggesting that the two phylotypes had diverged prior to the 1990s pandemic.The estimate was robust to the choice of priors including population dynamics (coalescent constant, coalescent exponential, and Bayesian skyline) and substitution models (HKY and GTR).
The Bayesian phylogenetic analyses converged to median estimated mutation rates of 1.11 × 10 -5 to 1.15 × 10 -5 substitutions/site/year (Online Resource, Table S2).While this may be higher than estimated mutation rates of other dsDNA viruses (Brennan Greg et al. 2022;Firth et al. 2010;Guellil et al. 2022;Morga et al. 2021), it can be reasonably derived from an experimentally derived mutation rate of a baculovirus (1 × 10 −7 mutations/site/replication) (Boezen et al. 2022) assuming 100 replications/year, both of which are reasonable assumptions considering the biology of WSSV.Although we should be cautious with interpreting this value since estimated virus mutation rates vary substantially between short and long terms (Duchêne et al. 2014;Ghafari et al. 2021), we believe that our estimate of the WSSV mutation rate is reasonable for analyzing the genomic evolution of WSSV in the past few decades.

Phylotype I was responsible for the 1990s pandemic
Phylotype I contained all strains isolated during the 1990s pandemic that originated in Asia.All Japanese WSSV genomes belonged to clade I.The lack of resolution among the phylotype I members in the tree is consistent with the rapid spread of the virus, which did not allow enough time for WSSV genomes to accumulate mutations to enable tracking its geographic spread.Marks et al. (2004) proposed the existence of an ancestral strain with the largest genome, which subsequently shed redundant segments to become smaller genomes (Marks et al. 2004).Isolate TH-96-II, with an estimated genome size of 312 kbp, is believed to be a close representation of the ancestral WSSV genome, although the whole genome of this isolate has not been published (Marks et al. 2005).JP02 and CN01, both belonging to phylotype I, retained the intact ORF13/14 and ORF24/25 regions and are therefore likely to closely resemble TH-96-II (Online Resource, Figs.S2 and S3).The short branch lengths of JP02 and CN01 also suggest that they are close representations of the common ancestor(s) of phylotype I (Fig. 1).
A small branch which forms a sister clade to all other phylotype I members was found (Figs. 2 and 3).The estimated divergence time between this clade and the other members, including CN01 and JP02, predates the 1990s, suggesting that the two lineages had diverged before the pandemic.This might represent another line of evidence that WSSV had prepandemic genetic diversity.Alternatively, it is possible that there have been small cross-phylotype recombination events that could not be detected by our analysis, but which contributed phylotype II-like phylogenetic signals to these sequences.
Taken together, these findings suggest that phylotype I was responsible for the 1990s pandemic and that JP02, CN01, and TH-96-II represent the ancestral genotype(s) of phylotype I that emerged in the pandemic epicenter in the 1990s in East Asia (Zwart et al. 2010).
Phylotype II is characterized by a 5949-bp deletion in the variable region ORF13/14 (relative to TH-96-II; Online Resource, Fig. S2).Based on this specific genomic deletion, it is likely that WSSV genotypes from Madagascar and Saudi Arabia belong to phylotype II (Onihary et al. 2021;Tang et al. 2013), although it is possible that other parts of their genomes have originated from other phylotype(s) due to recombination.ORF23/24 is more variable, with Indian isolates (CWG3, DBA1182, and WSSV-LS) sharing a 11,273-bp deletion (relative to CN01; Online Resource, Fig. S3), while Chinese (CN03 and CN04) and Bangladeshi (SS304) isolates share a 10,893-bp deletion (relative to CN01; Online Resource, Fig. S3).The ORF14/15 and ORF23/24 regions in WSSV-AU appear to have accumulated additional sequences.
Collectively, the distribution of phylotype II appears to be localized in Asia and have a distinct origin from that of phylotype I.This suggests that phylotype II represents a WSSV lineage already present in Asia prior to the 1990s pandemic.

Emergence of recombinant strains
Virus genomes can recombine, sometimes leading to a more complex evolutionary history with reticulate rather than bifurcating branches (Brennan Greg et al. 2022;Kolb et al. 2017).To investigate the possibility of genome recombination events in WSSV, we generated a neighbor-net phylogenetic network, which visualizes conflicting phylogenetic signals present in the whole genome alignment (Fig. 4a).By visually inspecting the phylogenetic network and comparing the topology with that of the maximum-likelihood phylogenetic tree in Fig. 2, we identified two possible cross-phylotype recombinant lineages, which we named 1601 and Qingdao.
Lineage 1601 consists of isolate 1601 (MH663976.1;referred to as "Procambarus clarkii virus" and "WSSV-Cc" by the authors of the original publication) (Ke et al. 2021) and four related MAGs (Laibin2019, GCF7, Jangsu2019, and Sichuan2020).This cluster was placed within phylotype I in the maximum-likelihood phylogenetic tree in Fig. 2, whereas it fell into phylotype II in the neighbor-net network Lineage Qingdao is represented by MAGs Qingdao2019 and Qingdao2020, which were identified from RNA-seq data of P. japonicus sampled in Qingdao, Shandong, China, in 2019 and 2020.Qingdao2019 and Qingdao2020 an early branching clade in phylotype II in the maximum-likelihood phylogenetic tree (Fig. 1).However, the phylogenetic network linked the Qingdao lineage to a phylotype I representative (Wei-fang2018) with a reticulation, suggesting a cross-phylotype recombination.
We hypothesized that these conflicting phylogenetic signals result from genomic recombination between two phylotypes, leading to distinct parts of the genome originating from different phylotypes.To test this, we used RDP4 to define potential recombination breakpoints within the two recombinant lineages (Fig. 4b).The analysis revealed two recombination events in the 1601 lineage and one in the Qingdao lineage (Online Resource, Table S3).
The predicted recombination breakpoints tell us which parts of the genome originate from which phylotype, and we expected that a phylogenetic analysis of selected genome segments should corroborate this.We selected three regions in the genome that showed no signs of recombination in either recombinant lineage (coordinates 23,734,111,858,and 205,825;Fig. 4b).The maximum-likelihood phylogenetic trees constructed from these regions supported the presence of two major phylotypes.As expected, lineages 1601 and Qingdao were classified under different phylotypes depending on the genome segment (Fig. 4c-e).These results demonstrate that crossphylotype recombination events gave rise to the 1601 and Qingdao lineages.
In conclusion, these observations indicate that WSSV genomes have experienced cross-phylotype recombination events, resulting in the emergence of at least two chimeric strains.

Discussion
In this study, we aimed to understand the evolution of WSSV using whole genome sequences.Phylogenetic analysis indicated the presence of two major phylotypes.The overall topology of the phylogenetic tree is consistent with the explosive spread of phylotype I during the 1990s pandemic.The common ancestor of phylotype I, the probable pandemic strain, likely had the largest genome of over 310 kbp in size and spread worldwide, and the genome shrank independently in various geographic regions (Zwart et al. 2010).Phylotype II, in contrast, seems to have a distinct origin in Asia and spread to Australia.A divergence time estimate pointed to the most recent common ancestor of the two phylotypes existing between 1970 and the early 1980s.This suggests that there was a preexisting diversity of WSSV genotypes in Asia before the 1990s pandemic.
WSSV classification based on partial genomic segments should be interpreted with caution, as it only reflects the origin of given specific genomic segments, rather than the entire genome.The traditional molecular markers are useful in analyzing epidemiology within a country or a province, as clearly stated by the original authors (Dieu et al. 2004).Our results also indicate that WSSV genomes do recombine, further complicating the evolutionary history of the whole viral genome.In this regard, it may be difficult to classify WSSV isolates based on a handful of markers and discuss the origins of WSSV.
We identified two cross-phylotype recombinant lineages, 1601 and Qingdao.It is possible that there are recombinant lineages that have been missed out in our analysis, as suggested by the inconsistent placement of some isolates in the phylogenetic trees constructed from subgenomes.We have found that detecting recombination in WSSV genomes is a complex task which requires careful attention to various factors, such as the construction of a reliable whole genome alignment, selection of confidently non-recombinant reference sequences, and consideration of structural variations in accessory genes and repetitive sequences.We also explored the possibility that recombinant MAGs were artifacts resulting from mixed infection, but this was considered unlikely because highly similar sequences were identified from multiple datasets sampled at different localities and timepoints.
The hidden genetic diversity of WSSV in Asia has been suggested by Oakey et al. (Oakey et al. 2019) and Zeng (Zeng 2021) who assessed the diversity of variable numbers of tandem repeats.It is possible that a thorough epidemiological survey in Asia could reveal a yet unknown genotypic diversity of WSSV.Direct sampling of WSSV in diverse geographic locations is difficult for various reasons.However, we may be able to discover ancestral WSSV strains through bioinformatic analyses (Kawasaki et al. 2021), as we have successfully recovered multiple WSSV genomes from publicly available sequence data, including datasets that do not necessarily target WSSV.

Fig. 1
Fig. 1 Genome diagram of white spot syndrome virus JP02.Outer track: protein-coding genes and their transcriptional orientations (blue).Middle track: GC skew of 100 bp sliding windows with 10-bp increments (positive: emerald, negative: purple).Inner track: deviation of GC contents from the average, 100 bp sliding windows with 10-bp increments

Fig. 2
Fig.2Maximum-likelihood phylogenetic tree of WSSV genomes.Japanese WSSV MAGs sequenced in this study are highlighted with a filled circle (•); WSSV MAGs recovered from publicly available high-throughput sequencing data are highlighted with a triangle (▲).Sampling years (if known) are indicated after an @ sign, followed by the NCBI accession numbers, if they exist.The sampling years for JP01A, JP01B, and JP02, which range from 1995 to 2000, due to uncertainty, were arbitrarily indicated as 1998.Numbers beside nodes indicate the SH-aLRT support (%)/ultrafast bootstrap support (%).Phylotypes I and II are shaded with light blue and pink, respectively.Cross-phylotype recombinant lineages 1601 and Qingdao are shaded with orange and light green, respectively

Fig. 4
Fig. 4 Cross-phylotype recombinant WSSV lineages.a Neighbor-net phylogenetic network of 61 WSSV genomes using uncorrected distances ("Uncorrected_P") ignoring constant sites.Phylotypes I and II are shaded with light blue and pink, respectively.Cross-phylotype recombinant lineages 1601 and Qingdao are shaded with orange and light green, respectively.Lineage 1601 clusters with phylotype II despite its phylogenetic position within phylotype I in the maximum-likelihood phylogenetic tree in Fig. 1.Lineage Qingdao is linked to Weifang2018 (phylotype I) with a reticulation, indicative of conflicting phylogenetic signals and suggesting their close relationships.b Coordinates of recombination breakpoints in lineages 1601 and Qingdao.The upper track shows the genome diagram of WSSV CN01.The middle and bottom tracks indicate the inferred origins of genome segments in lineages 1601 and Qingdao.The breakpoint coordinates are those of the CN01 genome.The gray boxes between the middle and bottom tracks denote the recombination-free segments used for the phylogenetic analyses in c to e. c Maximumlikelihood phylogenetic tree of recombination-free segment corresponding to nucleotides 23,315-111,734 in the CN01 genome.Phylotypes I and II are shaded with light blue and pink, respectively.Cross-phylotype recombinant lineages and Qingdao are shaded with orange and light green, respectively.Numbers beside nodes indicate the SH-aLRT support (%)/ultrafast bootstrap support (%). d Maximum-likelihood phylogenetic tree of recombinationfree segment corresponding to nucleotides 111,735-190,858 in the CN01 genome.e Maximumlikelihood phylogenetic tree of recombination-free segment corresponding to nucleotides 205,155-297,825 in the CN01 genome

Table 1
WSSV samples sequenced in this study (Patterson et al. 2015) al. 2017).Canu was particularly useful for removing chimeric reads arising from MDA, which improved the assembly quality.The resulting assemblies were polished with POLCA v. 3.4.2(ZiminandSalzberg2020)using Illumina reads trimmed using Fastp v. 0.20.1(Chenetal.2018).JP01 was found to be a mixture of two closely related haplotypes, and therefore two genotypes were resolved using FreeBayes v. 1.3.6 (Garrison and Marth 2012) and WhatsHap v. 1.6(Patterson et al. 2015).Haplotype-specific reads were reassembled using Canu v. 2.2 and polished with POLCA v. 3.4.2.

Table 2
Assembly statistics of Japanese WSSV MAGs