Introduction

Cyanophages, viruses infecting cyanobacteria, are numerous biologically active entities in aquatic ecosystems and play an important role in determining host population diversity, dynamics, and evolution [1,2,3,4]. Most of the currently known cyanophages are members of Myoviridae (myocyanophages), Siphoviridae (siphocyanophages), and Podoviridae (podocyanophages) families [5,6,7]. Among these, the diversity of myocyanophages is probably the most well represented in public databases to date. However, some studies indicate that sipho- and podoviruses might exhibit higher actual diversity compared to members of Myoviridae [8,9,10]. Cyanophage distribution is often correlated with the distribution of their hosts, and their abundance changes in time and space [1, 11, 12]. Cyanoprokaryota (cyanobacteria), including scum-forming genera Microcystis, single-celled members of Cyanobium or Synechococcus, and filamentous species belonging to Lyngbya, Oscillatoria, Planktothrix, and Phormidium, are widely distributed photosynthetic organisms [13]. Among them, Microcystis and Synechococcus are two of the most described in the context of susceptibility to viral infections [14,15,16]. Moreover, the environmental studies indicated that the distribution and diversity of cyanophages might be directly or indirectly (through the host) affected by physicochemical agents [17,18,19,20]. According to Finke and Suttle [21], the diversity of the marine phage community depends on a promoted variety of environmental factors including salinity, temperature, and concentration of nutrients, followed by water column mixing. Solar radiation may damage viral particles and negatively influence infection efficiency as described by Wilhelm et al. [22] in the marine environment. The studies of freshwater cyanophages conducted by Cheng et al. [23] also showed that decay of their infectivity was correlated with UV intensity. The cyanophage composition was also found to be influenced by seasonal variations and water column depth as described by Hurwitz et al. [24] based on ocean metagenomics studies. Despite the growing number of researches on cyanophages, the information about their complex diversity and distribution in freshwater remains insufficient.

Recently, Finke and Suttle [21] showed that a specific individual gene (gp43), that is, used as a genetic marker to assess virus diversity, can highly reflect the variation observed by the whole genome and gene content comparisons. The diversity of cyanophages can be assessed using phage group/clade-specific molecular markers such as those encoding major capsid protein, portal protein, tail sheath protein, and DNA polymerase [25,26,27]. The host-derived cyanophage auxiliary metabolic genes (AMGs) are also widely used to assess cyanophage diversity and distribution. For example, genes psbA and nblA, which encode the D1 protein of photosystem II (PSII) and nonbleaching protein A, respectively [16, 28]. The psbA genes were reported as highly prevalent among some marine myo- and podocyanophages (clade A) which infected Prochlorococcus and Synechococcus [4, 29,30,31]. The psbA genes were also identified in some freshwater cyanophages (e.g., Synechococcus phage S-CRM01); however, their prevalence in this aquatic environment is less known [4, 32]. The nblA gene was also proposed as the genetic marker and was found in freshwater cyanophages infecting Microcystis and then Planktothrix [15, 16, 33]. However, some studies indicated that this gene is highly conserved and thus tends to underrepresent genetic diversity [15, 16, 33,34,35]. The structural gene g91 encoding tail sheath protein in cyanophages infecting Microcystis aeruginosa was used to assess their diversity. Based on the comparative analysis of this gene, three major genotypes were distinguished and their spatial and temporal distribution have been tracked [36, 37].

The occurrence and monitoring of freshwater cyanophages based on the abovementioned genes were conducted in situ in several different ecosystems in Japan [28, 36,37,38,39], China [16, 40], France [41], the USA [4], Poland [42], and Canada [43]. However, most of the studies conducted so far were limited to one or two water bodies and none of them referred to the occurrence and diversity of freshwater cyanophages, including a larger geographical area. Presuming that differences in cyanophage community compositions would increase with geographic distance (distance-decay hypothesis) and that spatial distribution patterns is a result of interplay between cyanophages, cyanobacteria, and environmental conditions; one could expect that observed cyanophage diversity would reflect the area surveyed. Therefore, the present study aimed to determine the diversity and spatial distribution of active cyanophages community, which were infecting cyanobacteria, from an extensive area spanned over two countries. Towards this aim, we analyzed sequence diversity of three different marker genes (psbA, nblA, and g91) in 21 lakes of the temperate-humid continental climate zone (Poland and Lithuania), in an area with a span of approx. over 200,000 km2 (Fig. 1). Besides, we assessed the relationship between the occurrence of marker genes, their sequence diversity, cyanobacterial communities composition, and environmental variables. Such information could be helpful to explore the potential linkage between cyanophages and their host—cyanobacteria, their spatial distribution between waterbodies, and sensitivity on environmental factors.

Fig. 1
figure 1

Study site of lakes in Poland and Lithuania

Materials and Methods

Source of Material

Samples were collected from 14 Polish and 7 Lithuanian lakes situated in the temperate-humid continental climate zone. They were the subject of research on cyanobacteria in our previous publication [44]. Samples were collected from the following: Lubosińskie (LUB), Bytyńskie (BYT), Buszewskie (BUS), Pniewskie (PNI), Myśliborskie (MYS), Pakoskie (PAL), Grylewskie (GRY), Mogileńskie (MOG), Niepruszewskie (NIE), Ilno (ILN), Gopło (GOP), Żabiniec (ZAB), Zbąszyńskie (ZBA), Miejskie (MIE), Jieznas (JIE), Širvys (SIR), Gineitiškės (GIN), Didžiulis (DID), Mastis (MAS), Lūkstas (LUK), and Simnas (SIM) (Fig. 1, Fig. S1). They represent fertile lakes from meso-eutrophic to hypertrophic with high phytoplankton diversity (Table S1 ).

Sampling

Samples were collected from the central part of the lake in August 2013 and July–August 2014 (Fig. 1, Fig S1). Integrated phytoplankton samples were collected from the epilimnion in stratified lakes or from the surface water layer in polymictic lakes from one sampling station using a water sampler. Approximately, 300 mL of water samples were collected to aseptic plastic bottles as integrated water probes from the water column (e.g., mixed water from samples taken every 1 m deep) during the early afternoon (Fig. 1, Fig. S1). The 1-L phytoplankton samples were preserved with acidified Lugol’s solution with a final concentration of 1% immediately after sampling. The samples were transferred to the laboratories and stored under cool and dark conditions until they were analyzed.

Measurements and Analyses of Physicochemical Parameters

Water temperature, pH, and conductivity were determined in situ using a multiparameter probe. Integrated water samples were collected for chemical analyses. The water samples were analyzed for total nitrogen (TN) and total phosphorus (TP) with a HACH spectrophotometer [45, 46].

Analysis of Cyanobacterial Composition

Phytoplankton samples were sedimented in 1-L glass cylinder for 48 h, gently decanted off, and the final sample volume of 20–30 mL was used for further analysis. Cyanobacterial species identification [47,48,49] and counts were conducted using light microscopes under × 400 magnification. The enumeration of specimens was carried out in 100–150 fields of Fuchs-Rosenthal chamber, which ensured that at least 400 specimens were counted to reduce the error to less than 10%. A single cell, a coenobium, or a filament represented one specimen in the analysis. The biovolume of each species was determined through a volumetric analysis of cells using geometric approximation and expressed as a wet weight following Wetzel and Likens [50].

Isolation and Amplification of Genes

Freshwater samples in the volume of 100 mL each were filtered onto 0.45 µm nitrocellulose membrane filters (Millipore, USA). Subsequently, filters containing cell fraction were inserted in the 2 mL of lysis buffer (400 mM NaCl, 40 mM EDTA, 0.75 M sucrose, and 50 mM TRIS–HCl; pH 8.3), then stored at − 20 °C before DNA extraction.

DNA was isolated from stored filters according to hot phenol-mediated extraction described by Giovannoni et al. [51] with minor changes described by Mankiewicz-Boczek et al. [52] including the modification of centrifugation speed (13,000 g) and the final concentration of proteinase K (275 µg mL1−). The extracted nucleic acid was used as a template for molecular analyses of the genes for general presence of cyanobacteria, 16S rRNA (258 bp), and the specific presence of Microcystis genus, 16S rRNA (250 bp), and for cyanophages—psbA (740 bp), nblA (200 bp), and the g91 (g91_S – 132 bp, and g91_L – 206 bp) which together cover all three genotype groups distinguish by Kimura-Sakai et al. [37]. All nucleotide primers and parameters of PCR are described in Table S2 and Table S3 (supplementary materials).

Sequencing of Cyanophage Genes

DNA samples for nucleotide Sanger sequence analyses (Table 1) of genes: psbA, nblA, and g91 were chosen based on the good quality PCR amplicons. In consequence, the psbA, nblA, g91_S, and g91_L were analyzed and deposited in Genbank database for six (LUB, BUS, PAL, PNI, GIN, and ILN (accession numbers: MW853986 to MW853991, respectively)), six (BUS, SIM, BYT, PAL, PNI, and GIN (accession numbers: MW853992 to MW853997, respectively)), nine (LUB, MIE, BUS, SIM, BYT, PAL, PNI, JIE, and GIN (accession numbers: MW853982 to MW853985, respectively)), and four (SIM, BYT, JIE, and GIN (accession numbers: MW853992 to MW853997, respectively)) lake samples, respectively. To prepare samples for sequencing, the selected DNA samples were amplified with the use of Pfu DNA polymerase (forming blunt-end; Thermo Scientific) according to producer procedure and PCR conditions showed in Table S3. The specific primer sequences, chemical concentration, and amplification program for PCR can be found in Table S2 and Table S3. Obtained PCR products were purified with the use of QIAGEX® II Gel extraction Kit (QIAGEN), cloned into pJET1.2/blunt vector (Thermo Scientific), and sequenced (Genomed S.A.). The obtained forward sequences were improved by reverse complementation and the primer sequences were clipped out with the use of a BioEdit Sequence Alignment Editor (version 7.2.5).

Table 1 Presence of cyanobacteria and cyanophage amplicons, and PCA group results of studied lakes

A phylogenetic tree was constructed for the psbA gene. A cluster was performed, separately, for cyanophage and cyanobacterial sequences with 90% similarity. The cyanophages and cyanobacteria sequences were taken from the NCBI non-redundant database (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Then, the sequences were aligned with the use of MAFFT-DASH and the tree was constructed with the use of RAxML NG.

In case of the search for similar sequences of gene fragments (nblA and g91) shorter than 200 bp, the online Local Alignment Search Tool (BLAST), based on data from the following databases: the NCBI non-redundant sequence database (https://blast.ncbi.nlm.nih.gov/Blast.cgi), JGI virus public database (https://img.jgi.doe.gov/), and viruSITE integrated database (www.virusite.org), were used.

Statistical Analysis

The principal component analysis (PCA) was used to evaluate the spatial distance between 21 lakes according to the total abundance of different cyanobacterial species, the occurrence of cyanophage genes (psbA, nblA, g91_S and, g91_L), and the environmental factors including nutrients (TP and TN) and physicochemical parameters (temperature, pH, and conductivity). All data were transformed to avoid skewed distributions with the subtraction of the mean and the division with the standard deviation ((x-mean)/Sd). Groups were defined according to the number of genes detected for each lake. The one-way ANOVA and Tukey’s tests were used to measure significant differences between the groups with the scores obtained for the PC1 and PC2 (Table S4). The PCA was performed with PAST 4.03 [53]. Levine’s test was used to check homogeneity of variance from the means. The proposed statistical analysis is used to condensate multivariate databases often obtain in environmental studies, allowing to identify the most important factors that explain the highest variance within a set of samples [54].

Results and Discussion

Cyanophages are specialized to infect cyanobacteria and could play an important role in modulating harmful blooms. As cyanophage distribution was found related to the occurrence of their hosts [1, 11, 12], it is needed to obtain knowledge of the cyanobacteria composition and factors influencing their growth in the study area. Analysis of cyanobacteria—potential virus host—indicated that their 16S rRNA gene was found in all studied lakes (Table 1). The total cyanobacteria biomass varied from 0.04 to 40.47 mg L−1 (Table S1). Filamentous cyanobacteria from the genera Aphanizomenon, Cuspidothrix, Dolichospermum, Limnothrix, Planktolyngbya, Pseudanabaena, Planktothrix, or Raphidopsis were among the dominants in most studied lakes. Additionally, Microcystis was among the dominant genera (0.57–1.35 mg L−1) in three lakes based on the microscopic analysis, and their overall presence was confirmed in 18 lakes according to the genetic analysis—16S rRNA (Tables 1 and 2, Table S1).

Table 2 Comparison of the variables among tree lake groups distinguished in PCA

The study area (Fig. 1) was represented by the temperate-humid continental climate zone characterized by hot summers [55] together with water parameters which are shown in the following ranges: water pH 7.4–9.01, water temperature 16.3–27.8 °C, and conductivity 251–729.1 µS cm−1. While total nitrogen and total phosphorus concentrations varied between 0.85–7.5 and 0.02–0.47 mg L−1, respectively (Table S1), such parameters, conducive to eutrophication, ensured background and favored the development of cyanobacteria [56, 57].

According to the authors’ knowledge, the presented studies are the first which refer to the relationship between the occurrence of all three cyanophage marker genes simultaneously (psbA, nblA, and g91), their sequence diversity, cyanobacterial communities composition, and environmental variables from the freshwater environment of an extensive area (approx. over 200,000 km2). Moreover, the results described below confirmed that the environmental factors, most likely local, may have an important role in shaping the genetic variation in phages.

Cyanophages Occurrence and Diversity

The cyanophage genes (psbA, nblA, or g91) presented in host cells were detected in 16 from the 21 studied lakes (Fig. 1, Table 1). The lack of amplification of selected marker genes for cyanophages in some lakes, despite the presence of their potential hosts, may have been related to the number of the genes below the detection limit or used genetic markers not targeting the different phage communities, present in the lakes studied. According to Schrader et al. [58], the PCR inhibitors should be also taken into consideration.

The psbA was found in 88% cyanophage-positive samples (Table 1). Its DNA sequences were found between 75 and 98% of similarity for five Polish lakes (LUB, PNI, BUS, PAL, and ILN) and one Lithuanian lake (GIN). The variants with the highest similarity level were observed between LUB-PNI (98%) and BUS-GIN (95%). The psbA sequence of ILN had the lowest level of similarity (75–78%) with the analyzed sequences. Although all psbA sequences observed in this study branched within the larger cluster of marine cyanophages, they also grouped more closely to each other than to their marine counterparts (Fig. 2). Most of the psbA sequences showed 95–100% similarity to each other (data were not shown), with the only exception of Lake ILN (Fig. 3). The psbA sequences were intermixed, indicating that there were no differences in the distribution of cyanophages between distant lakes. Therefore, the higher divergence of ILN from other lakes may suggest that other, most likely local, factors might be responsible for the diversity of the cyanophage community, whereas the psbA sequences from Polish lake ILN appeared to be the most similar with marine Synechococcus myocyanophage genome (S-CAM22) (Fig. 3). As it was described by Dreher et al. [26], the psbA similarity to the marine counterparts was also confirmed within Synechococcus-specific S-CRM01 cyanophage, isolated from freshwater Copco Reservoir (Northern California, USA). The high similarity of freshwater cyanophages psbA to marine cyanomyoviruses was also found in East lake (China) by Ge et al. [59]. Moreover, the psbA of novel freshwater Ma-LEP Microcystis podocyanophage, isolated from Erie lake (USA) by Jiang et al. [4], also presented high sequence similarity with marine S-CBP4 Synechococcus podocyanophage. The above results, of freshwater psbA sequence similarities to their marine counterparts, indicated that this genetic marker can be used to study the diversity among freshwater and marine phages as already described by Chenard and Suttle [32].

Fig. 2
figure 2

Phylogenetic tree of psbA sequence fragment alignment among different cyanophages and cyanobacteria

Fig. 3
figure 3

Phylogenetic tree and multiple amino acid sequence alignment of psbA gene fragments among different cyanophages

The nblA, g91_S, and g91_L Microcystis cyanophage genes were found in 50%, 56%, and 44% of cyanophage-positive samples, respectively (Table 1). All nlbA sequences observed in this study from the analyzed samples (BYT, BUS, PNI, PAL, GIN, and SIM) if compared to each other showed high similarity, ranging from 88 to 99%. The highest level of sequence similarity (96%) was found between two Polish lakes—BYT and BUS—and between two Lithuanian lakes—GIN and SIM. The nblA sequences from analyzed samples were highly similar (> 90%) with their corresponding gene fragments of uncultured Myoviridae phages (AB812972.1 and AB812972) and MaMV-DC (KF356199.1) The literature data indicated that the nblA gene is highly conserved and, hence, may underrepresent the existing diversity among cyanophages [34]. The g91_S sequences obtained from six Polish lakes (LUB, BYT, BUS, PNI, PAL, and MIE) and three Lithuanian lakes (JIE and SIM) were similar in the range of 90–97% between them. Only GIN showed the lowest similarity (80–85%) when compared to all other sequences. Whereas the g91_L from the one Polish (BYT) and three Lithuanian lakes (JIE, GIN, and SIM) were similar in the range of 95–96%, the g91_S and g91_L sequences from this study were convergent (> 91%) with their counterparts in culturable (MaMV-DC, KF356199.1; Ma-LMM01, AB231700.1) and unculturable (MH117957.1) Microcystis cyanophages. The above results might indicate the presence of Ma-LMM01-like phages within investigated lakes, as it was also showed in the Bay of Quinte (a Lake Ontario, Canada) by Rozon and Short [43] or Sulejowski Reservoir (Poland) by Mankiewicz et al. [42]. In the case of lakes where there was no positive detection of nblA and g91 genes represented Ma-LMM01-like phages, it is also possible that other Microcystis-specific phages occur, which genomes were not characterized yet.

Environmental Variables

Principal component analysis (PCA) showed the relationships between cyanophages, cyanobacteria, and the physicochemical parameters of water (Fig. 4). The PC1 and PC2 represented up to 36.7% of the total variance of the observations (19.46% and 17.24%, respectively; see also Fig. S3). The PCA scores and loadings (estimated with the Pearson correlation [r]) are described in the supplementary material Table S4 and S5, respectively. The PCA grouped lakes into three different clusters: groups A including only Polish lakes (LUB, PNI, BUS, BYT, and PAL), B including only Lithuanian lakes (SIM, GIN, and JIE), and group C (DID, SIR, ILN, MYS, MOG, ZAB, GRY, GOP, and ZBA) including both Polish and Lithuanian lakes. In groups A and B, two or three cyanophage genetic markers were detected while group C consisted of lakes with only one or none of the studied genes (Table 2, Fig. 4). Group A was significantly segregated from groups B and C (p = 4.76 × 10−4 and 2.2 × 10−4, respectively; see Table S6). The PC1 presented the highest positive correlations with the TP and conductivity (r = 0.71 and 0.70, respectively), followed by the occurrence of cyanophage genes—nblA and psbA (r = 0.56 and 0.52, respectively) (see Table S5). These results suggested that the abovementioned factors could be important variables contributing to the spatial distancing between the Polish and Lithuanian lakes and favored the development of particular cyanobacteria [56, 57], which can differ in A and B groups analyzed (Fig. 4). The modest relationship between the abundance of some viral genes and TP was indicated for the Bay of Quinte by Rozon and Short [43]. Moreover, TP as one of the most important parameters for the regulation of cyanobacterial occurrence could directly influence in their development and thus becoming available to phages for the genome replication process inside the host cell [60, 61].

Fig. 4
figure 4

Principal component analysis (PCA) based on environmental–physicochemical variables, diversity, and biomass of cyanobacterial species and cyanophage genes’ presence for Polish and Lithuanian lakes

Whereas the Lithuanian lakes in group B were significantly differentiated from group C by the vertical component—PC2 (p = 6.41 × 10−9; Table S7 and Fig. 4), which could be explained by the high positive correlations observed between the PC2 and the cyanophage genes—g91_S, g91_L, and nblA (r = 0.79, 0.69, and 0.61, respectively), followed by the pH (r = 0.47) (see Table S7), cyanophages have a wide range of pH tolerances; however, a decrease in pH below the host’s optimal requirements may directly affect the host’s cells homeostasis and thus negatively affect the intracellular cyanophage replication process [20]. Thus, group C was characterized not only with the lowest detection of cyanophage genes, but also the lowest values of environmental factors, and therefore, was found negatively scored in the PCA (Fig. 4, Table S4).

The psbA sequences presented in LUB, PNI, BUS, PAL, and GIN lakes were aligned close to each other within the phylogenetic tree, with exception of ILN lake (Fig. 3). While after comparing their presence with physicochemical factors as part of the PCA analysis (Fig. 4), the mentioned psbA sequences with high similarity were divided into two groups: A (LUB, PNI, BUS, and PAL) and B (GIN). The separateness of psbA ILN based on its higher sequence divergence was also reflected within PCA results, as the one which was subjected to group C (Fig. 4). This observation might confirm the important role that the environmental factors, most likely local, may have in shaping the genetic variation in phages.

As it was shown, different cyanobacterial species were subjected to different groups highlighted with the use of PCA analysis (Fig. 4, Table S1). For instance, Planktothrix agardhii (average biomass 15.6 mg L−1) was a characteristic-dominant species in group A, Planktolyngbya limnetica (3.2 mg L−1) in group B (Fig. 4), whereas Aphanizomenon gracile was the dominant species found in groups A and B with average biomass of 3.27 mg L−1 and 3.80 mg L−1, respectively, and had five times lower biomass in the group C (0.64 mg L−1) (Fig. 4, Table S1). Observed species differentiation might result from the influence of different physicochemical factors. For example, lakes from group A where P. agardhii was a dominant species were positively related to TP which is in line with previous studies that demonstrated domination of this cyanobacterium in hypertrophic lakes with high concentrations of phosphorus [62, 63]. Also, A. gracile is a common dominant species in temperate lakes adapted to various types of environmental and nutritional conditions [64,65,66]. However, the cyanobacteria composition represented by total biomass was found to rather enhance the cyanophage genes occurrence (Table 2, Table 1S) than single species highlighted within PCA results. It was observed that in the lakes where two–three cyanophage genes were determined, also cyanobacteria biomass was two–three times higher (Table 2, Table 1S).

Conclusions

The research of cyanophages based on the amplification of psbA, nblA, and g91 genes confirmed their occurrence in most of the studied lakes. The DNA sequences obtained for each gene showed a high similarity between them. Also, the similarity to their marine Synechococcus myocyanophage (psbA) and freshwater Microcystis myocyanophages (nblA and g91) counterparts was confirmed. Furthermore, the psbA revealed higher diversity, in comparison to the nblA and g91 genes. In consequence, no clear distribution pattern for cyanophages can be detected. The principal component analysis showed that TP and pH could be important environmental parameters differentiating the sampling sites between the lakes and might directly or indirectly (by cyanobacteria) influence the occurrence of cyanophages.