Sweet potato (Ipomoea batatas (L.) Lam.) is an important root crop, ranking seventh in global food crop production, with approximately 90 million tonnes (t) produced annually (FAOSTAT 2020). Sweet potato production in Australia has significantly increased in recent years, from approximately 6,000 t in 1990 to over 78,000 t in 2019 (FAOSTAT 2020). This increase in production has been facilitated by increased demand for sweet potato and improved agronomic practices including the introduction of new cultivars in the 1980s, mechanization of production systems, together with the introduction of a pathogen-testing scheme in the early 2000s, providing sweet potato growers with ready access to clean planting material (Bourke 2009; Dennien et al. 2013).

Viral infections are one of the major constraints to global sweet potato production. Due to the vegetative nature of sweet potato propagation, pathogen accumulation occurs over successive propagation cycles (Clark and Hoy 2006; Clark et al. 2012; Valverde et al. 2007). Globally, efforts have been directed towards the identification and characterisation of viruses infecting sweet potato, and possible phyto-sanitation processes to support the safe exchange of germplasm. The availability of pathogen-tested, clean planting material will limit virus spread to new areas and increase production. More than thirty different viruses have been reported from sweet potato globally (Buko et al. 2020; Clark et al. 2012; Jones 2021; Kreuze et al. 2021); however, only five of these, namely sweet potato feathery mottle virus (SPFMV), sweet potato virus 2 (SPV2), sweet potato chlorotic fleck virus (SPCFV), sweet potato leaf curl virus (SPLCV) and sweet potato collusive virus (SPCV), have been confirmed from Australia, with limited sequence information available (Barkley et al. 2011; Gibb and Padovan 1993; Jones and Dwyer 2007; Maina et al. 2016a, b; Maina et al. 2018a, b; Tairo et al. 2006).

Sweet potato collusive virus (SPCV) (genus Cavemovirus, family Caulimoviridae), previously known as sweet potato caulimo-like virus (SPCaLV), was first identified in 1987 (Atkey and Brunt 1987) following the observation of isometric particles of about 50 nm in diameter from sweet potato originating from Puerto Rico, which were graft transmissible to the indicator host I. setosa. SPCV has since been detected in sweet potato from several South Pacific countries (Papua New Guinea, Tonga, and the Solomon Islands), New Zealand, the Caribbean Islands, Central America, China, Egypt, Kenya, Portugal, and Uganda (Atkey and Brunt 1987; Cuellar et al. 2011; Davis and Ruabete 2010; De Souza and Cuellar 2011; Feng et al. 2000). However, only a single complete genome sequence of SPCV, originating from Portugal (GenBank accession NC_015328), together with nine SPCV partial replicase sequences (GenBank accessions HQ698912–HQ698920) (Cuellar et al. 2011) and one partial coat protein sequence (GenBank accession MK802082) (Wang et al. 2021), has been reported.

SPCV was the second member of the genus Cavemovirus to be characterised after the type species Cassava vein mosaic virus (CsVMV) (Teycheney et al. 2020), with two additional putative cavemovirus species recently described from cactus and chicory, tentatively named “epiphyllum virus 4” (EpV-4) (Zheng et al. 2020) and “chicory mosaic cavemovirus” (ChiMV) (Silva et al. 2021), respectively. SPCV usually occurs as a symptomless infection in sweet potato with no insect vector identified to date (Sastry et al. 2019). However, graft transmission of SPCV to the indicator plant I. setosa induces small chlorotic spots and flecks leading to necrosis (Dennien et al. 2013). SPCV has a dsDNA genome of around 7.7 kb with four open reading frames (ORFs) and a large intergenic region in which the pregenomic RNA promoter, the RNA polyadenylation signal and the negative-sense strand primer-binding site are located (Cuellar et al. 2011). ORF 1 encodes a multifunctional protein that is cleaved into two functional subunits, including the coat protein (CP) and a movement protein (MP), ORF 2 encodes a second polyprotein that is cleaved to give an aspartate proteinase (AP), ribonuclease H (RNase H) and reverse transcriptase (RT), which are typical of all Caulimoviridae members, while ORF 3 encodes a putative inclusion body protein (IBP) (Geering 2014; Cuellar et al. 2011). In addition, a small fourth ORF (ORFa) is also predicted but has no known function (Cuellar et al. 2011). Interestingly, SPCV, ChiMV and EpV-4 have a genome organisation that differs slightly from CsVMV where a small ORF of unknown function is located between the CP/MP-encoding ORF and the AP/RT/RNaseH-encoding ORF (Cuellar et al. 2011; Silva et al. 2021; Zheng et al. 2020).

In 2007 SPCV was first identified in Australia through routine pathogen testing of sweet potato germplasm by the Department of Agriculture and Fisheries (DAF), Gatton Research Facility (GRF) from two sweet potato cultivars, namely GRF0085/Alleys-Red (collected from a Cairns fresh food market) and GRF0069/Beni-Aka (imported from Japan and located in a field collection at the DAF Redlands Research Facility). Graft inoculation of the two sweet potato accessions onto I. setosa in glasshouse experiments induced leaf symptoms typical of infection with SPCV including circular interveinal chlorotic spots and necrotic flecks (Fig. 1a-b) (Atkey and Brunt 1987). Leaf tissue from the grafted I. setosa subsequently tested positive for SPCV using the nitrocellulose membrane ELISA (NCM-ELISA) kit developed by the International Potato Centre (CIP), Peru (Fig. 1c) (Gutiérrez et al. 2003). Here we describe further work to characterise the two SPCV isolates, including microscopy to determine particle morphology and cytopathological effects and molecular analysis of the complete genomes of the two isolates.

Fig. 1

Characterisation of sweet potato collusive virus (SPCV) isolates from Australia. a) Symptoms of SPCV infection in leaves of Ipomoea setosa grafted with scions from sweet potato infected with isolate Beni-Aka; b) Symptoms of SPCV infection in leaves of I. setosa grafted with scions from sweet potato infected with isolate Alleys-Red; c) Nitrocellulose membrane ELISA showing the detection of SPCV in duplicate leaf samples from healthy, Alleys-Red (AR)- or Beni-Aka (BA)-infected I. setosa plants – ‘old’ and ‘young’ indicate leaves from the lower or upper parts of the plants, respectively; d) Transmission electron micrograph of ultrathin tissue section from I. setosa leaf tissue infected with isolate Beni-Aka showing virions in the cytoplasm of an infected cell (Bar = 200 nm); e) Transmission electron micrograph of virus-like particles purified from I. setosa infected with SPCV isolate Beni-Aka negatively stained with 2% phosphotungstic acid (pH 7) (Bar = 50 nm); f) Maximum-likelihood phylogenetic tree (1000 bootstrap replicates) generated in MEGA7 using partial replicase sequences of SPCV-Aus1 and SPCV-Aus2 ORF 2 (Cassava vein mosaic virus (CsVMV) was used as an outgroup; isolate GenBank accession numbers are indicated in parentheses)

Leaf tissue from the grafted I. setosa plants was used for electron microscopy studies to identify viral particles in cells of infected plants. To detect the presence of virions, ultra-thin sections (approximately 50–60 nm) were cut from healthy and graft-inoculated I. setosa leaf tissue samples, stained with uranyl acetate/lead citrate (Glauert 1972), and examined by transmission electron microscopy (TEM) in a JEOL 1200EX transmission electron microscope at 80 kV. Large numbers of isometric particles, ~50 nm in diameter, were observed in sections from I. setosa plants grafted with scions from both Alleys-Red and Beni-Aka (Fig. 1d). Virions were subsequently purified from the leaves of sweet potato and I. setosa essentially as described by Hull et al. (1976) and examined by TEM after negative staining with 2% phosphotungstic acid (pH 7). Numerous regular icosahedral particles, averaging ~50 nm in diameter, were observed in preparations from both sweet potato and I. setosa (Fig. 1e).

Leaf tissue from the two infected sweet potato cultivars was then used for total nucleic acid extraction (TNA), rolling circle amplification (RCA) and next-generation sequencing (NGS). Briefly, TNA was isolated from 100 mg of infected sweet potato leaf tissue as described by Kleinow et al. (2009). RCA was subsequently performed using 2.5 μM of exonuclease-resistant random hexamers (ThermoFisher Scientific, Australia) as described by Sukal et al. (2019). The undigested RCA products were purified using the Illustra™ GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare) and sent to Macrogen (South Korea) for library preparation using the Nextera™ XT sample preparation kits (Illumina) and sequencing using the Illumina MiSeq platform.

A total of 2,585,444 and 2,279,642 paired-end reads were obtained for the Alleys-Red and Beni-Aka samples, respectively (Supp. Table 1). The quality of the MiSeq reads was assessed with FastQC v0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Residual adapter sequences were trimmed, and low-quality (Q < 30) and short (< 40 nt) reads were removed, using the BBDuk plugin in Geneious® v11.0.2 (http://www.geneious.com; Kearse et al. 2012). Quality-corrected reads were de novo assembled using the embedded Geneious® assembler. Subsequently, BLASTn was used to query the de novo assembled contigs larger than 1,000 bp against a local RefSeq virus database (downloaded from NCBI on 15 December 2020).
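The read-filtering thresholds described above (Q < 30, length < 40 nt) can be illustrated with a minimal Python sketch. This is not a reimplementation of BBDuk, whose trimming algorithm differs; the function names `trim_and_filter` and `phred33_to_scores` are hypothetical, and the simple 3′-end trimming rule is an assumption made for illustration only.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_and_filter(seq, quals, min_q=30, min_len=40):
    """Trim low-quality bases from the 3' end, then discard short reads.

    Illustrative rule only (assumption): bases are removed from the 3' end
    while their quality is below min_q; the read is dropped if fewer than
    min_len bases remain. Returns (seq, quals) or None if discarded.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quals[:end]
```

For example, a 50 nt read whose last five bases fall below Q30 is kept as a 45 nt read, whereas a read with only 30 high-quality bases is discarded.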

BLASTn of assembled contigs revealed a single contig of 10,857 bp from Beni-Aka and three contigs of 3,270, 4,020 and 4,050 bp from Alleys-Red having 92–95% nucleotide sequence identity with the complete genome sequence of SPCV isolate Mad1 (NC_015328). The complete genome of SPCV-Mad1 was then used to carry out a reference-guided assembly using Geneious mapper with five iterations. Based on these assemblies, a complete genome of 7,712 bp was generated from the Beni-Aka sample and the isolate was subsequently named SPCV-Aus1 (GenBank accession no. MZ208794). Similarly, for the Alleys-Red sample the assembly generated a putative partial sequence of 7,062 bp, with initial analysis indicating that the sequence was missing a portion of the large intergenic region. To determine whether the missing sequence in the large intergenic region was related to the genome assembly, an issue with the sequencing protocol used, or a true mutation, further analysis was carried out using PCR and Sanger sequencing. Briefly, PCR was carried out using sequence-specific primers flanking the region where the missing sequence was predicted. The PCR master mix consisted of 10 μl of 2X GoTaq Green Master Mix (Promega), 10 pmol of each sequence-specific primer (SPCV-IR_FP 5'-ATCGGACAGCGACTCAAAGG-3' and SPCV-IR_RP 5'-TCTTTCACGTTCTAATTGCTCCAT-3') and 1 μl of the original TNA extract (diluted to 50 ng/μl) in a final volume of 20 μl. PCR cycling conditions included an initial denaturation step at 94 °C for 2 min, followed by 35 cycles of 94 °C for 20 s, 50 °C for 20 s, and 72 °C for 2 min, with a final extension at 72 °C for 10 min. The amplified products were cloned into pGEM®-T Easy (Promega) and Sanger sequenced as described previously (Sukal et al. 2017). The partial sequence of 7,062 bp obtained from the reference-guided assembly and the PCR-amplified fragment of 919 bp were subsequently de novo assembled to obtain the complete circular genome of the Alleys-Red SPCV isolate. The complete genome sequence of the Alleys-Red isolate was 7,274 bp and was named SPCV-Aus2 (GenBank accession no. MZ208795).

The complete genome sequences of SPCV-Aus1 and SPCV-Aus2 have an A + T content of 74.3% and 74.1%, respectively, similar to the A + T composition of SPCV-Mad1 (74.3%) (Cuellar et al. 2011). Also consistent with SPCV-Mad1 and other Caulimoviridae, a plant cytoplasmic initiator methionine tRNA sequence comprising either 5′-TGGTATCAGAGCATAGTT-3′ (SPCV-Aus1) or 5′-TGGTATCAGAGCAAAGTT-3′ (SPCV-Aus2) was identified and designated as the start of the circular genome (Medberry et al. 1990; Teycheney et al. 2020). Three putative major ORFs were predicted for both isolates using Geneious ORF predictor.
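The A + T percentages quoted above follow from a simple base count. A minimal Python sketch is given below; the function name `at_content` is hypothetical, and ignoring ambiguous bases (anything other than A, C, G, T) is an assumption rather than a description of how the published values were obtained.

```python
def at_content(sequence):
    """Return the A + T content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted (assumption); other
    characters such as N or gaps are ignored.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)


# Usage on the 18 nt tRNA-related motif reported for SPCV-Aus1.
motif = "TGGTATCAGAGCATAGTT"
print(f"A + T content of motif: {at_content(motif):.1f}%")
```

Applied to a complete genome sequence (for example, a record downloaded from GenBank), the same function would give the genome-wide A + T composition.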

ORF 1 of SPCV-Aus1 is positioned from nt 54 to 3839, encodes a putative protein of 1,261 aa with a Mr of 149.8 kDa, and contains predicted CP and MP domains. ORF 2, positioned from nt 3832 to 5757, encodes a putative protein of 641 aa with a Mr of 75.6 kDa containing the AP, RT and RNase H domains, while ORF 3, positioned at nt 5687 to 6883 and encoding a putative protein of 398 aa with a Mr of 46.3 kDa, was predicted to contain an IBP domain. SPCV-Aus1 also contained a putative small ORF 4, positioned at nt 7446 to 7556 and encoding a putative protein of 36 aa with a Mr of 4.3 kDa; however, no conserved domains were identified. This small ORF is located in a position similar to a small ORF (known as ORFa) identified in the SPCV-Mad1 sequence, which is also predicted in the genomes of CsVMV and EpV-4.

The SPCV-Aus2 isolate also encodes three major ORFs and a small ORF 4. However, ORF 1 of SPCV-Aus2 was shorter than the corresponding ORF 1 of SPCV-Aus1 and SPCV-Mad1, while ORF 2, ORF 3 and the small ORF 4 (ORFa) were similar in length to their corresponding ORFs in the other SPCV isolates. ORF 1 of SPCV-Aus2 was 3345 bp (nt 55–3399), encoding a putative protein of 1114 aa with a Mr of 132.1 kDa. Sequence comparison between SPCV-Aus1 and SPCV-Aus2 showed that ORF 1 of SPCV-Aus2 does not contain a sequence equivalent to nt 102–548 (149 aa) of SPCV-Aus1; however, the predicted CP and MP domains are present. ORF 2 of SPCV-Aus2 was 1926 bp (positioned at nt 3392–5317), encoding a putative protein of 641 aa (Mr = 75.4 kDa), while ORF 3 was 1197 bp (positioned at nt 5247–6443), encoding a putative protein of 398 aa (Mr = 46.4 kDa), and ORF 4 was 111 bp (positioned at nt 7009–7119), encoding a putative protein of 36 aa (Mr = 4.3 kDa). Similar motifs were detected in the first three ORFs of SPCV-Aus2, while ORF 4 had no significant similarity in database searches.
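The ORF lengths and protein sizes reported for the two isolates can be cross-checked from the nucleotide coordinates alone. The sketch below assumes that the coordinates are 1-based and inclusive and that each ORF span includes its stop codon; the function name `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Return (length_nt, length_aa) for an ORF given its genome coordinates.

    Assumptions: coordinates are 1-based and inclusive, and the span includes
    the stop codon, so the protein is one residue shorter than the codon count.
    """
    length_nt = end - start + 1
    if length_nt <= 0 or length_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return length_nt, length_nt // 3 - 1


# Coordinates as reported in the text for the two Australian isolates.
reported_orfs = {
    "SPCV-Aus1 ORF 1": (54, 3839),
    "SPCV-Aus1 ORF 2": (3832, 5757),
    "SPCV-Aus1 ORF 3": (5687, 6883),
    "SPCV-Aus1 ORF 4": (7446, 7556),
    "SPCV-Aus2 ORF 1": (55, 3399),
    "SPCV-Aus2 ORF 2": (3392, 5317),
    "SPCV-Aus2 ORF 3": (5247, 6443),
    "SPCV-Aus2 ORF 4": (7009, 7119),
}

for name, (start, end) in reported_orfs.items():
    length_nt, length_aa = orf_summary(start, end)
    print(f"{name}: {length_nt} nt, {length_aa} aa")
```

Under these assumptions the computed values agree with those given in the text, for example 3,786 nt and 1,261 aa for ORF 1 of SPCV-Aus1 and 3,345 nt and 1,114 aa for ORF 1 of SPCV-Aus2.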

Phylogenetic analysis was carried out using partial replicase sequences (864 bp) of SPCV-Aus1 and SPCV-Aus2, together with other SPCV sequences available from GenBank and CsVMV (accession NC_001648). MEGA7 (Kumar et al. 2016) was used to align sequences by ClustalW with the default settings (Larkin et al. 2007), and a phylogenetic tree was generated using the maximum-likelihood (ML) method, based on the Kimura 2-parameter model with the Nearest-Neighbour-Interchange ML heuristic method and 1000 bootstrap replications. The phylogenetic analysis showed that SPCV-Aus1 clustered together with SPCV isolates from North and Central America and was most closely related to isolates Mex183 from Mexico and Cub44 from Cuba (Fig. 1f). In contrast, SPCV-Aus2 clustered together with SPCV-Mad1 and four isolates from Africa (Fig. 1f). Isolates from Guatemala (Gua138 and Gua154) and Panama (Pan128) formed a distinct subgroup which appears to be ancestral to the two groups comprising other SPCV isolates sequenced to date (Fig. 1f).
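The Kimura 2-parameter model treats transitions and transversions as occurring at different rates. In the ML analysis the model is used within the likelihood calculation, but its simplest expression is the closed-form pairwise distance d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below illustrates that distance only and is not a description of MEGA7's internals; the function name `k2p_distance` is hypothetical, and skipping gapped or ambiguous positions is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Positions where either sequence has a gap or ambiguous base are skipped
    (assumption). math.log raises ValueError if the sequences are too
    divergent for the formula to be defined.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; any other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two aligned sequences that differ by a single transition in ten compared sites, the function returns approximately 0.112, slightly more than the raw proportion of differences (0.1) because the model corrects for multiple substitutions at the same site.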

Analysis of pairwise sequence identities between the SPCV-Aus1 and SPCV-Aus2 sequences and the 10 previously published partial replicase sequences of SPCV isolates using SDT v1.2 (Muhire et al. 2014) revealed 92.8–99.9% sequence identity between isolates (Fig. 2). The two isolates from Australia share 94.6% identity within this region, while SPCV-Aus1 has 92.8–94.6% identity with published SPCV sequences and SPCV-Aus2 has 92.9–97.8% identity with published SPCV sequences (Fig. 2).
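A pairwise identity score of this kind is, in essence, the proportion of matching positions between two aligned sequences. The Python sketch below computes identity as 1 - M/N, with M the number of mismatches and N the number of alignment positions at which neither sequence has a gap; this mirrors our understanding of the SDT definition but is an illustrative assumption, not SDT's code, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Positions where either sequence carries a gap are excluded from both
    the mismatch count and the denominator (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * (1 - mismatches / compared)


# Toy example: one mismatch over four ungapped positions.
print(percent_identity("AC-GT", "ACTGA"))
```

As a rough guide, 94.6% identity over the 864 bp region would correspond to about 47 mismatched positions if every position were compared without gaps.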

Fig. 2

Pairwise nucleotide identity for partial replicase nucleotide sequences of available sweet potato collusive virus (SPCV) isolates from NCBI together with SPCV-Aus1 and -Aus2. Cassava vein mosaic virus (CsVMV) was used as an outgroup

This work reports the first full-length SPCV isolates characterised from Australia, complementing the single published full-length genome sequence of SPCV currently available. Analysis of these new complete sequences confirms the genome organisation of SPCV-Mad1, which differs slightly from that of CsVMV, the type member of the genus Cavemovirus, with no small ORF 2 identified in the three fully sequenced SPCV isolates now available (Supp. Table 2). Similarly, the recently reported ChiMV and EpV-4 have the same predicted genome organisation as the three fully characterised SPCV genome sequences (Supp. Table 2) (Silva et al. 2021; Zheng et al. 2020).

These results confirm the presence of SPCV isolates in sweet potato in Australia and increase the number of full-length sequences now available from one to three. This information will update the biosecurity status for the Australian sweet potato industry and enhance efforts currently in place to develop diagnostic protocols for the testing and production of clean planting material. Sweet potato originated in Central and South America and was later moved throughout the Pacific region and Asia by Polynesian and European voyagers (Mu and Li 2019). While the SPCV-Aus1 sequence is most similar to Central American isolates of SPCV, the SPCV-Aus2 sequence is more closely related to isolates from Africa and Portugal. As sweet potato viruses are prevalent in planting material, it is highly likely that the two isolates were introduced from different locations with planting material. While the Beni-Aka material was known to be imported from Asia, there is currently no sequence information available to confirm whether similar isolates are present in this part of the world. The origin of the Alleys-Red plant material is unknown, but it could be speculated that it arrived from Africa or Portugal, or otherwise originates from the same location as cultivars with related sequences. The sequence of movement into Australia can only be determined through further characterisation of global SPCV diversity.