Identification and characterisation of common glow-worm RNA viruses

The common glow-worms (Lampyris noctiluca) are best known for emission of green light by their larvae and sexually active adult females. However, both their DNA and RNA viruses remain unknown. Glow-worms are virologically interesting, as they are non-social and do not feed as adults, and hence their viral transmission may be limited. We identified viral sequences from 11 different virus taxa by the RNA-sequencing of two Finnish populations of adult glow-worms. The viruses represent nine different virus families and have negative, positive, or double-stranded RNA genomes. We also found a complete retroviral genome. Similar viral sequences were found from the sequencing data of common eastern firefly of North America, a species belonging to the same family (Lampyridae) as that of the common glow-worm. On average, an individual glow-worm had seven different RNA virus types and most of them appeared to establish a stable infection since they were found from glow-worms during two consecutive years. Here we present the characterization of load, prevalence, and interactions for each virus. Most of the glow-worm RNA viruses seem to be transmitted vertically, which may reflect the biology of glow-worms as non-social capital breeders, i.e., they invest stored resources in reproduction.


Introduction
The common glow-worm (Lampyris noctiluca) is a nocturnal beetle, whose larvae and sedentary females emit highly visible green light. Larva glow is a warning signal for distastefulness [1][2][3] and adult females produce green light to attract flying males [4,5]. Glow-worms are capital breeders: they feed only as larvae, rendering this stage critical for storing energy, and potentially, for acquiring commensal and pathogenic organisms. Larval development takes one to four years and adult lifespan is of few weeks [4]. Most adult females glow for only one or a few nights to attract males, which is followed by mating, and soon after, they lay their eggs and die. The common glow-worm is widely distributed across the northern hemisphere, from Portugal to China and from Spain to Finland [4,6].
Glow-worm populations seem to be isolated from each other due to local differences in their emergence time and females' inability to fly. This lack of conspecific contact makes glow-worms epidemiologically interesting. Larvae are predators feeding mainly on slugs and snails, and at early stages the larvae may forage together [4]. Individuals from different populations have limited contact with each other, and likewise, individuals within one population meet perhaps only at birth, during larval stage, and mating [4,7]. Thus, horizontal virus transmission, that is, transmission between individuals of the same generation, is presumably quite limited. So far the only viruses found in Lampyridae family, the two orthomyxo-like viruses found in the North American common eastern firefly (Photinus pyralis), were observed to be transmitted vertically, that is, from parent to offspring [8].
We identified viral sequences from two Finnish glowworm populations and studied their presence in different life stages and tissues in Finnish and English glow-worms. We identified 11 RNA viruses from adult glow-worms, four Edited by Lorena Passarelli.

Bioinformatics analysis
The raw RNA-seq reads from the 2017 samples were trimmed for adapters and low-quality nucleotides (Q < 20) and filtered for reads less than 36 nt in length. The trimmed reads were pooled and assembled using Trinity v2.3.2 with in silico normalization [9]. Contigs over 1000 nt were aligned to each other using CAP3 [10]. Contigs were pooled and searched using Blastx v2.3.0 [11] (an e-value < 10 -10 and culling limit 1) against invertebrate virus protein sequences using NCBI virus protein RefSeq (downloaded in May, 2018). Contigs from individual data were translated to protein sequences with Transdecoder (github.com/TransDecoder), and only the longest of protein coding sequence over 160 aa was selected for search with Blastp against the invertebrate virus proteins. Virus positive contigs were pooled and aligned with each other with CAP3 and searched using Blastx, as above. Virus contigs were manually checked for complete or nearly complete viral genomes using NCBI ORFfinder (www.ncbi.nlm.nih.gov/ orffi nder/). All the identified viral sequences were seeded into NOVOplasty [12] with the original RNA-sequencing data to make sure that the viral sequences were as complete as possible. The data are deposited into at NCBI (PRJNA577041). Coverage and alignment of the reads to the virus sequences were studied with IGV [13] so that at least two reads were covering the reference sequence at any site. Remapping of reads from each glow-worm separately to virus sequences was done with BWA [14].
NCBI Blastp (nr) was used to identify possible domains and identical protein sequences: identity describes, which percent of characters in the query sequence is identical with the target sequence and coverage describes how much of the query sequence is covered by the target sequence. HHpred [15] was used to search for remote homologous proteins based on protein structure prediction. RNAfold (rna.tbi. univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) was used to predict stable genomic RNA structures.
Similar virus sequences to glow-worm viruses were searched with Blastx from other insects (taxid:6960) from NCBI's transcriptome shotgun assembly sequence database (TSA). We set Blastx bit score, which takes into account the alignment length, mismatches and gaps, above 200 for positive hits.

Phylogenies
Phylogenies were reconstructed from the RdRP amino acid sequences of each virus and 19 most similar hits in Blastp, RSA TSA hits (Table 2), and virus sequences from Shi et al. [16]. Sequences were aligned using E-INS-I method in MAFFT v7.313 [17] and prior to phylogenetic analysis the alignments were trimmed using trimAl v1.2 [18]. Amino acid substitution model was selected using ProtTest 3 [19] and phylogeny reconstruction was performed using PhyML v.3.0 [20]. Phylogenies were processed with iTOL [21].

Quantitative-PCR (qPCR)
qPCR was performed with virus-specific primers (Online Resource 2) using EvaGreen (Solis BioDyne) kit, as per manufacturers' protocol. Standards for qPCR were created with virus-specific primers using Phusion enzyme (Ther-moFisher). The PCR products were isolated with GeneJET gel extraction kit (ThermoFisher). Only the results with over 10 viral sequences were analyzed further.

Statistical analysis
All statistical analyses were performed in R v3.5.0 (www.Rproje ct.orf/). Pairwise analysis of virus reads was done by Spearman's correlation using multiple testing p-value correction.

Virus identification
We identified 11 RNA viruses by whole-transcriptome sequencing of 29 adult glow-worms. The samples were from two Finnish populations collected in 2017, four males (heads) and eight females (lanterns) from Konnevesi Research Station and eight males (heads) and nine females (lanterns) from Hanko (Online Resource 1). The tissues were chosen because they are important in sexual signaling. Viral sequences were assembled from pooled sample with the Trinity assembler (Table 1) and virus contigs were identified by protein similarity search against National Center for Biotechnology Information (NCBI) virus protein database. The virus sequences were deposited into NCBI GenBank database ( Table 2). The viruses were named according to their genomic organization (Fig. 1)  The sequence data were strand-specific and allowed us to separate the negative-strand replicative intermediates of positive-strand viruses and positive-strand transcripts of negative-strand viruses (Online Resource 3). The strand-specific data indicated that all identified positive-strand RNA viruses were replicating in the glow-worms. However, as the production of the sequencing library involved poly-A enrichment that biases the ratios of the virus transcripts, the strand-specific data were not analyzed further.
We analyzed also virus sequence variation between the samples (Online Resource 3). However, only fraction of the glow-worms had complete virus sequences without    are in scale to that. LnoMLV1 has two ORFs in different reading frames presented by the two ORF boxes aligned above and below the sequence line. RdRP RNA-dependent RNA polymerase, Gag capsid, ENV envelope, and RT reverse transcriptase LnoFV1 (0.00097) and partitivirus LnoPV1 (0.0010). Frequency of variable sites for partitivirus LnoPLV2 was 0.0016 variable sites/genome size/sample number.
Lampyris noctiluca macula-like virus 1 (LnoMLV1) is 6744 nt long with two open reading frames (ORFs). The longer ORF codes for a protein of 1920 aa, which was similar to an RdRP and a hypothetical protein of Bombyx mori Macula-like virus (txid:288,456, 84% coverage with 56% identity) according to Blastp search. The best hit by HHpred was a replicase of Tomato mosaic virus (e-value = 3e-24, pdb:3VKW_A), and several shorter hits to viruses of the Tymoviridae family. Shorter ORF, coding for a protein of 231 aa, was similar (98% coverage and 45% identity) to a coat protein of Bee Macula-Like virus 2 (2,094,260), according to Blastp search. HHpred showed that the ORF was similar to plant tymovirus desmodium yellow mottle virus capsid protein (e-value = 6.8e-59, 1DDL_C). Species demarcation criteria for Maculavirus genus is by ICTV less than 90% identity in capsid protein. Based on phylogenetic analysis, LnoMLV1 was most similar to Bee Macula-Like virus 2 (2,094,260) (Online Resource 7).

Lampyris noctiluca rhabdo-like virus 1
Rhabdoviruses are monopartite negative-strand RNA viruses. L. noctiluca rhabdo-like virus 1 (LnoRLV1) was found in three parts. The longest transcript of 6218 nt codes for a protein of 1924 aa, which according to Blastp search, is similar to Hubei rhabdo-like virus 3 (1,923,187) RdRP from Coleoptera mix (99% coverage and 34% identity). According to HHpred, the protein was similar to vesicular stomatitis virus L polymerase (e-value = 3.8e-190, 5A22_A). The second transcript of 2571 nt encodes a protein of 603 aa. Blastp search indicated the similarity of the protein to Hubei rhabdo-like virus 3 hypothetical protein 1 (80% coverage and 23% identity), and HHpred analysis showed that the protein was somewhat similar to fusion glycoprotein of Hendra virus (e-value = 0.00002, 5EJB_C), which suggests that the protein codes for an envelope. The smallest transcript of 1743 nt codes for a protein of 426 aa, which Blastp search found most similar to putative glycoprotein of Hubei rhabdolike virus 3 (39% coverage and 28% identity), and HHpred showed a weak similarity to p40 nucleoprotein of Borna disease virus (e-value = 23, 1N93_X) from the Mononegavirales order, to which the rhabdoviruses belong. Thus, the smallest transcript most probably codes for a capsid protein.

Lampyris noctiluca chuvirus-like virus 1
Chuviridae is a negative-stranded RNA virus family within the order Mononegavirales, with variable genome structures i.e. segmented, non-segmented, and circular [28]. LnoCLV1 genome had two segments, the larger genome of 6829 nt and encodes a protein of 2206 aa and the smaller segment that was 4489 nt long, with three ORFs coding for 693, 407, and 115 aa long proteins. According to Blastp search, the protein from the larger genome segment was similar to RdRP of Hubei chuvirus-like virus 3 (100% coverage and 43% identity, 1,922,858), and HHpred showed similarity to vesicular stomatitis virus L polymerase (e-value = 5.2e-185, 5A22_A). According to Blastp search, the 693 amino acids protein was similar to putative glycoprotein of Hubei chuvirus-like virus 3 (91% coverage and 49% identity) and HHpred showed similarity to the envelope glycoprotein of human herpesvirus 1 (e-value = 1.6e-27), which suggests that the ORF codes for an envelope protein. The 407 aa protein was similar to a hypothetical protein of Hubei chuvirus-like virus 3 (Blastp: 95% coverage and 37% identity), and HHpred showed protein structural similarity to p40 nucleoprotein of Borna disease virus (e-value = 2.6, 1N93_X) suggesting that the ORF may code for a coat protein. For the shortest protein of 115 aa, neither Blastp nor HHpred found any significantly similar proteins. According to phylogenetic analysis, LnoCLV1 was most similar to Hubei chuvirus-like virus 3 (1,922,858) (Online Resource 9), which has been isolated from Odonata mix and has a monopartite genome [16].

Lampyris noctiluca partiti-like virus 1 and 2
Partitiviruses are small bipartite double-stranded RNA viruses coding for RdRP and capsid proteins. We could only find the RdRP coding segment for two partitivirus-like viruses identified from the glow-worms. Lampyris noctiluca partiti-like virus 1 (LnoPLV1) genome was 1462 nt long and codes for a protein of 377 aa. According to Blastp search, the protein is similar to RdRP of Hubei partiti-like virus 31 (1,923,038, 93% coverage and 72% identity). HHpred showed that the protein sequence was similar to RdRP of negativestrand RNA virus, Thosea asigna virus (e-value = 1.5e-36, 5CX6_B). In phylogenetic analysis LnoPLV1 was closest to Hubei partiti-like virus 31 (Online Resource 10) isolated from spider mix. Only the RdRP genome segment was identified in the Hubei partiti-like virus 31 [16].
Lampyris noctiluca partiti-like virus 2 (LnoPLV2) genome segment was 1461 nt long and encoded a 436 aa protein. Blastp search showed similarity to RdRP of Hubei partiti-like virus 51 (90% coverage and 42% identity, 1,923,060), which was closest in the phylogenetic analysis also (Online Resource 10). Hubei partiti-like virus 51 was identified from Chinese land snail mix, and only RdRP segment was found, as observed in LnoPLV2, HHpred showed that LnoPLV2 RdRP is most similar to Mengo virus RdRP (8.6e-37, 4NYZ_A). Mengo virus belongs to the Picornaviridae family. LnoPLV1 and LnoPLV2 are separate species as their had only 28.6% identity over RdRP residues (57.6% coverage) and were only distantly related by phylogenetic analysis (Online Resource 10).

Lampyris noctiluca totivirus-like virus 1
Totiviruses are double-stranded RNA viruses without an envelope. L. noctiluca totivirus-like virus 1 (LnoTLV1) genome is 4761 nt long and contains two ORFs coding for proteins of 834 and 693 aa. As seen in Blastp search, the longer ORF was similar to hypothetical protein 3 of Hubei toti-like virus 16 (90% coverage with 24% identity, 1,923,304). HHpred found the longer ORF similar to 1 3  ). In phylogenetic analysis, Lno-ErV1 was found to be similar to uncharacterized proteins from several insects, such as, kissing bug, the old world swallowtail butterfly, fruit fly, louse, termite, ant, and silkworm, and transposons from the cabbage looper moth and red flour beetle, indicating that these insects have similar endogenous retroviruses (Online Resource 11).

Glow-worm viruses in other insects
We used NCBI's transcriptome shotgun assembly sequence database (TSA) to search for similar virus sequences in other insects. We set Blastx bit score, which takes into account the alignment length, mismatches and gaps, above 200 for positive hits. We found no similar viruses from insect transcriptomes for LnoFV1, LnoPLV2, or LnoTLV1 (Table 1). However, for the other viruses, we found similar sequences from several insect hosts, ranging from fruit flies and mosquitoes to bugs and sharpshooters (Table 1 and Online Resource 5-12). Interestingly, we found that a firefly (Photinus pyralis) shared five similar virus sequences (LnoIV1, LnoIV2, LnoBLV1, LnoPLV1, and LnoErV1) with glow-worms. The western tarnished plant bug (Lygus hesperus) shared four similar virus sequences (LnoIV1, LnoIV2, LnoBLV1, and LnoErV1), and the mediterranean fruit fly (Ceratitis capitata) and sharpshooter (Cuerna arida) shared three similar viruses (LnoIV1, LnoIV2, and LnoRLV1). All the insect hosts that had LnoIV1-like virus sequences also had LnoIV2-like virus.

The glow-worm virus loads, prevalence, and interactions
Viral RNA read amounts for samples collected in 2017 were analyzed for each glow-worm individually. Before the analysis, we removed poly-A-tails from the virus sequences. Only reads with a properly paired mate read mapping to the virus sequence were analyzed and the results were standardized by virus length and all clean reads of the sample (Fig. 2a). The most common viruses were LnoIV2 and LnoErV1, which were found in all samples. A high prevalence of LnoErV1 is expected as it is probably integrated into the glow-worm genome. The least common virus, found only in one sample, was LnoCLV1. The two Finnish glow-worm populations, southern Hanko and northern Konnevesi, had significantly different amounts of LnoFV1 (two-tailed student's t test: t = − 3.14, p-value = 0.0068 for females, not significant in males), LnoPLV1 (t = − 2.73, p-value = 0.015 for females, not significant in males), and LnoPLV2 (t = − 2.94, p-value 0.01 for females, not significant in males). LnoFV1 load levels were higher in Hanko than those in Konnevesi population  FLV1 IV1 IV2 BLV1 RLV1 CLV1 PLV1 PLV2 TLV1 ERV1   FN1B  FN1E  FN1H  FN2B  FN2E  FN2H  FN3B  FN3E  FN3H  FN4B  FN4E  FN4H  FS1B  FS1E  FS1H  FS2B  FS2E  FS2H  FS3B  FS3E  FS3H  FS4B  FS4E  FS4H  FS5B  FS5E  FS5H  LF1B  LF1H  LF2B  LF2H  LF3B  LF3H  LE1B  LE1H  LE2B  LE2H  LE3B  LE3H  2wL1  2wL2   0  2  4  6 and partiti-like viruses were almost exclusively found in the Hanko population. Viral load variation was analyzed for each virus separately (Fig. 2b). The positive-strand RNA virus loads varied somewhat more than the other RNA viruses; for example, positive-strand RNA viruses had a larger interquartile value range and more outlier values. Especially, LnoMLV1 had five outlier values, of which the highest reached over 1000 times higher than the median value. Two of the highest LnoMLV1 amounts were found in the two southern population male heads, 48% and 23% of all reads were coding for LnoMLV1. The highest virus read amounts in female lanterns coding for LnoMLV1 were 2% and 1.4% of all reads. There were no differences in the total viral load between the Hanko and Konnevesi populations (t = 0.08, p-value = 0.94).
Glow-worms, irrespective of their gender or population, had seven different virus types, on average. A minimum of two and at the most nine virus types were observed (Fig. 2c). We studied whether the viruses affected each other's infection by pairwise correlations (Fig. 2d) and found two statistically significant interactions, read counts of LnoIV1 and TyLV1 correlated positively (Spearman's correlation coefficient = 0.6, Holm's method adjusted p-value = 0.036) and existence of the two partiti-like viruses correlated positively in the individuals (correlation coefficient = 0.8, adjusted p-value < 0.0001).

Glow-worm virus loads in different body sections
We studied the virus loads by qPCR of glow-worm samples collected in 2018. Four female samples from Äänekoski and five female samples from Tvärminne were separated into head (the first segment, H), body (B), and eggs (E). The females were most probably unmated since they were still glowing when they were collected. Egg samples were scraped from body cavity and may have contained other tissues such as fat [29].
Additionally, three laboratory-reared larvae from southern Finland (Hanko) and three from southern England (Princes Risborough) were divided into head (H) and body (B) sections. Twelve isolated and unfed 2-week-old larvae from a single southern Finnish mother were split into two groups with six larvae in each (2wL1 and 2wL2) to yield enough RNA for experiment. All the viruses characterized from the 2017 samples, except LnoMLV1, were found from the 2018 female samples and their eggs (Fig. 3a), indicating that the viruses may be vertically and maternally transmitted. In further support of vertical transmission, we found LnoIV1, LnoIV2, LnoPLV1, LnoTLV1, and LnoERV1 from the unfed larval samples. Furthermore, all the viruses except LnoPLV1 and LnoMLV1 were also in the laboratoryreared larvae. Similar to the RNA-seq data, the prevalence of LnoRLV1 (67%), LnoFV1 (73%), LnoIV1 (100%), LnoIV2 (100%), and LnoERV1 (100%) were high. Yet the prevalence of LnoPLV2, which was 24% in the 2017 data and was found to be 100% in the 2018 data. Further differences in LnoBLV1virus prevalence between the sample cohorts were 48% in the 2017 samples and 7% in 2018 samples, and for LnoCLV1, the prevalence was 3% in 2017 samples and 50% in 2018 samples.
In females, there was no significant difference in the distribution of viruses between head and body (N = 55, t = − 1.12, p-value = 0.27). However, the egg tissue had significantly lesser viruses than those in heads (t = − 3.94, p-value = 0.00014) or bodies (t = − 2.695, p-value = 0.0081, Fig. 3b). In larvae, the heads had a significantly higher percentage of viruses than those in the bodies (N = 36, t = 4.45 p-value = 0.00003, Fig. 3c).
There were no differences between populations in the virus load (mean + SE: Tvärminne 12,876 + 4581 and Äänekoski 9132.80 + 4874, t = 0.55, p-value = 0.60). We excluded FN3 sample (Fig. 3a) from the analyses, as it had very high virus load.

Discussion
We identified 11 novel common glow-worm RNA viruses. In insects, viruses transmit horizontally through feeding, mating, or a vector [30]. Vertical virus transmission may occur through genome integration, such as for retroviruses or wasp polydnaviruses [31], or infection of eggs and sperm. Strict vertical transmission by infection of eggs or sperm is documented only for Drosophila sigmavirus [32]. Several insect viruses studied in bees and mosquitoes can transmit both horizontally and vertically [33,34]. Vertical transmission has been hypothesized to be associated with low virulence and latent infection while under certain conditions, such as during host stress or presence of secondary hosts, virus activation enables horizontal transmission [33,34]. Since glow-worms are in contact with each other only at birth, when foraging during larval stage, and during mating, within Fig. 3 Virus loads in 2018 glow-worm samples: a the glow-worm RNA virus levels were studied by qPCR from head (H), body (B), and eggs (E) of nine Finnish glow-worm females from two populations (FN: Äänekoski and FS: Tvärminne) and from heads (H) and bodies (B) of three laboratory-reared larvae from Finland (LF) and three from England (LE). Additionally, 12 unfed 2-week-old larvae split into two pools (2wL1 and 2wL2) were also studied. Virus loads in 10-based logarithmic scale shown in a heat map from white (no virus) to dark red (loads over 1,000,000 virus sequences per sample). The bar blots show the virus distribution between the different body parts in b females (N = 7) and c Larvae (N = 6). The sum of virus reads in all of the body parts is 100%. Variation is shown by standard error and significant differences between groups by asterisk (*** = p-value < 0.001, and ** = p-value < 0.01) ◂ population transmission of viruses would be inefficient without vertical transmission.
However, horizontal transmission to and from mollusc is possible. Larvae had most of the viruses in their head segment, which may indicate possibility of horizontal transmission as the larvae are predators and salivary glands are known sites to carry viruses [35]. Adults do not feed and have no mouthparts and, consistently, had no significant differences in virus loads between head and body. Glow-worm adults and larvae are unlikely to be eaten by other insects as larvae and probably also adults are distasteful [1,3] and thus this kind of virus transmission pathway is not expected.
Most of the identified viruses were found from the glowworms collected both in 2017 and in 2018 suggesting that they form stable infections. As we found the same RNA virus sequences from the four spatially separate Finnish populations, we suspect that these viruses are a natural set of viruses infecting glow-worms and not an indication of a pathogenic state. However, LnoTyLVI is an exception. We could not find LnoMLV1 from the 2018 samples, and LnoMLV1 loads were extremely high in some of the 2017 samples: in the two male heads their amounts reached up to 48% and 23% of all the sequenced RNA reads. In comparison to negative-and double-stranded viruses, RNA levels of positive-strand RNA viruses, such as LnoMLV1, varied most between the glow-worm individuals. Similar phenomenon has also been seen in Argentine ant viruses [36]. However, such massive LnoMLV1 levels as found in the two males are clearly a result of pathogenic virus activation and might explain why we did not detect LnoMLV1 the next year. Viruses can be very reactive in a new host but then disappear suddenly if they fail to adjust and transmit [37].
Interestingly, we found five similar viruses from RNAsequencing data of another glowing insect: the common eastern firefly (Photinus pyralis). The common glow-worm and the eastern firefly are from the same Lampyridae family, which may explain why they share similar viruses. Yet, they do not occur in the same continent: eastern firefly is found in North America and common glow-worm only in Eurasia [4]. However, we did not find from our data the two orthomyxolike viruses, which were identified from the common eastern firefly and shown to transmit vertically [8].
As the light emitting ability of glow-worms is a major fitness factor, it will be interesting to study whether viruses affect the bioluminescence in larvae or females. According to our study, a whole range or viruses reside at lantern tissue so the light-regulating interaction could be possible.