Background

Lichens are symbiotic associations between a fungus (mycobiont) and a photosynthetic partner (photobiont) that can be a eukaryotic alga (phycobiont), a cyanobacterium (cyanobiont), or both [1]. While the vast majority of lichen fungi (> 13,500 species), mainly from Ascomycota, associate with green algae (Chlorophyta), over 1500 species of lichen-forming fungi form so-called “cyanolichens” that have cyanobacteria as primary photobionts (forming “bipartite” lichens) or accessory photobionts (forming “tripartite” lichens) [2]. Cyanobacterial symbioses have evolved repeatedly in different lineages of lichen-forming fungi [3–5], resulting in convergently similar thallus morphology in distantly related cyanolichens [2].

In lichen symbioses cyanobacteria provide mycobionts with photosynthate and/or fixed nitrogen. At the same time, the fungal partners provide the cyanobacteria with moisture, carbon dioxide and inorganic ions, as well as a relatively stable habitat, protected from environmental extremes and predation [1].

The association between the mycobiont and the photobiont (e.g. Nostoc) can arise either by codispersal, e.g. in the lecanoromycete lichen Lobaria pulmonaria [6], or de novo by the pairing of a germinating spore and a free-living photobiont, as generally found in the lecanoromycete genus Peltigera [7, 8].

Most lichen symbioses are thought to be obligate, as the majority of mycobionts are refractory to propagation in vitro and do not survive without their photosynthetic partners [9, 10]. Although many cyanobacterial symbionts can be readily isolated and maintained in pure culture [11], they often appear unable to establish aposymbiotic populations outside lichen thalli in nature [12].

Nostoc is common in cyanolichens, especially in the temperate and cold regions of the world. All Nostoc species are filamentous and have complex life cycles involving cellular differentiation. Their non-branching filaments consist of cylindrical or spherical vegetative cells with intercalary heterocysts, large specialized nitrogen-fixing cells developing in mature trichomes [13]. The filaments of Nostoc strains are usually covered with a sheath of mucilage and many free-living Nostoc can form gelatinous macroscopic colonies in nature. The ability to produce mucilage and to form hormogonia, slender motile filaments, is generally used to distinguish Nostoc strains from the closely related genus Anabaena [13, 14], but they can be more reliably differentiated by akinete size and shape, together with other morphological characters [15]. However, some strains of Nostoc only produce hormogonia erratically or do not produce them at all [14, 16, 17].

The taxonomy of the family Nostocaceae is still rather poorly resolved, as exemplified by the placement of Calothrix and Tolypothrix spp. in several phylogenetic clades and the separation of Nostoc spp. into different clades [18]. Symbiotic Nostoc strains of many cyanolichens, including both bi- and tripartite species of Peltigera, have traditionally been called Nostoc punctiforme, but cyanobacterial strains resembling Nostoc muscorum, Nostoc sphaericum and Nostoc linckia have also been cultured from Peltigera species [19, 20].

In the lichen symbiosis cyanobacteria tend to undergo several morphological and structural changes [19–23], confounding phenotypic identification. Thus, molecular techniques offer a powerful addition for studying the diversity of these organisms, and for comparing lichen symbiotic strains as well as free-living cyanobacteria. During the past fifteen years the cyanobacterial symbionts of lichens have been the subject of many molecular investigations which have greatly increased our understanding of symbiont diversity. However, most of these studies have been based on a limited number of marker genes (e.g. 16S rDNA, rbcLX, trnL) and have mainly been focused on phylogenetic relationships of different strains [8, 24–30]. So far little is known about what defines symbiotic competence of cyanobacteria on the genome level.

Genome sequencing of N. punctiforme PCC 73102 [31], a model strain for cyanobacterial symbiosis with plants, together with transposon mutagenesis [32, 33] and insertion of antibiotic resistance cassettes [34], has identified a number of genes involved in the symbiosis [35]. Here we present the complete sequence and analysis of genomes from two lichen-symbiotic Nostoc strains – one from the bipartite lichen Peltigera membranacea and one from the tripartite lichen Lobaria pulmonaria – together with a discussion of genes which appear distinctive for symbiotic Nostoc. In addition we make use of draft genome data from three more Nostoc strains derived from P. membranacea and metagenome data from the lichens P. membranacea and P. malacea, as well as currently available whole genome data from members of the Nostocales.

Results and discussion

Genome properties

We have shotgun-sequenced DNA from five lichen-associated Nostoc strains and two lichens (Table 1). Draft genome assemblies were generated for three of the strains. The genomes of two strains, namely of the nosperin producer Nostoc sp. N6 [36] and the cyanobiont of L. pulmonaria, were completely assembled and annotated. The genome of Nostoc sp. N6 (8.9 Mb) is similar in size to that of symbiotic N. punctiforme PCC 73102 but it is larger than genomes of free-living Nostoc and Anabaena strains (Table 1 and Additional file 1). It consists of one circular chromosome (8.21 Mb) (Fig. 1) and 10 extrachromosomal replicons – 7 circular (pNPM1, 213,966 bp; pNPM2, 167,441 bp; pNPM3, 44,778 bp; pNPM4, 44,777 bp; pNPM5, 41,255 bp; pNPM6, 30,992 bp; pNPM7, 29,551 bp) (Fig. 1) and 3 linear (pNPM8, 66,996 bp; pNPM9, 22,270 bp; pNPM10, 21,916 bp) (Fig. 2). Based on the sequence coverage obtained, the linear replicons are present in higher copy numbers than the circular ones. pNPM9 and pNPM10 are characterized by a lower GC content (36.6% and 37.7%, respectively) than the rest of the genome (Table 2). The ends of pNPM8, pNPM9 and pNPM10 are in each case composed of identical or nearly identical inverted repeats 2.73, 0.16 and 1.06 kb long, respectively, with a conserved AAATTAACRGAC sequence at each end (Additional file 2: Figure S1). The ends of the linear plasmids form covalently closed hairpins, in which one DNA strand loops around and becomes the complementary strand. This was inferred from two observations: (i) presence of reads with palindrome sequences in the Nextera XT library and (ii) drop in coverage close to the ends of the plasmids and absence of read pairs spanning putative palindromes in the Nextera mate pair library. These methods use different DNA inputs for adapter addition: Nextera XT uses PCR to add adapters to denatured DNA whereas Nextera Mate Pair ligates adapters to blunt ends of double-stranded DNA.
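The terminal inverted repeats described above can be illustrated with a minimal sketch, assuming a linear replicon is available as a plain string. The function names and the toy sequence are hypothetical and are not part of the assembly or annotation workflow used in this study; the toy ends reuse the conserved AAATTAACRGAC motif with R arbitrarily set to G.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def terminal_inverted_repeat_length(seq: str, max_mismatches: int = 0) -> int:
    """Length of the terminal inverted repeat of a linear sequence.

    The left end is compared base by base with the reverse complement of the
    right end. Scanning stops once more than `max_mismatches` mismatches have
    been seen; the returned length ends at the last matching position.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    length = 0
    mismatches = 0
    for i in range(len(seq) // 2):
        if seq[i] == rc[i]:
            length = i + 1
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return length


# Hypothetical toy replicon: a 12 bp end, a spacer, and the inverted copy.
left_end = "AAATTAACGGAC"
toy_replicon = left_end + "TTTTTTTTTT" + reverse_complement(left_end)
print(terminal_inverted_repeat_length(toy_replicon))
```

Allowing a small `max_mismatches` value reflects the "identical or nearly identical" repeats reported for pNPM8, pNPM9 and pNPM10; the appropriate tolerance for real data is an open choice.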

Fig. 1

Chromosome and seven circular plasmids of the Nostoc sp. N6 genome. The outermost and second circles indicate genes in forward and reverse orientation color-coded by their COG categories. The third circles show pseudogenes. The fourth circle of the chromosome shows the rRNA genes (brown) and tRNA genes (green). The two innermost circles show GC content in gray and black and the GC skew in green (+) and purple (–)

Fig. 2

Linear replicons of Nostoc sp. N6. The lowermost and second lines indicate genes in forward and reverse orientation color-coded by their COG categories (see Figure 1). The third lines show pseudogenes. The two uppermost lines show GC content in gray and black and the GC skew in green (+) and purple (–). Blue arrows represent terminal inverted repeats (IR)

Table 1 List of lichen-associated Nostoc strains sequenced in this study
Table 2 Summary of the Nostoc sp. N6 and Nostoc sp. ‘Lobaria pulmonaria cyanobiont’ genomes

Terminal inverted repeats and hairpins are common in linear DNA molecules enabling replication of genome ends [37]. Linear replicons are rarely found in Cyanobacteria, the only known examples being a 429.7 kb linear chromosome of Cyanothece sp. 51142 (accession NC_010547) [38] and a 37.15 kb incision element of Anabaena variabilis ATCC 29413 (accession NC_014000) [39]. An interesting feature of the largest linear plasmid pNPM8 is that it carries 24 tRNA genes out of the minimum of 32 tRNAs required for translation according to Crick’s wobble hypothesis [40]. Genes for tRNAs carrying isoleucine and histidine are not present, while there are 3 different tRNA genes for arginine and 2 each for lysine, serine, glutamine and glutamic acid. tRNA genes have frequently been found in phages where they facilitate expression of phage genes with codons that are rare in the host genome [41, 42]. However, codon frequencies in the Nostoc sp. N6 chromosome vs. pNPM8 did not provide support for this (Additional file 2: Figure S2). The Nostoc sp. N6 linear elements might be phage remnants that have lost their structural proteins but retain the ability for self-replication in the host cells. Linear plasmid prophages are uncommon in nature; known examples include N15 in E. coli [43], PY54 in Yersinia enterocolitica [44] and φKO2 in Klebsiella oxytoca [45]. Genes involved in chromosome partitioning and segregation, such as parAB [46] and parM [47], typical of many low copy number plasmids and bacterial chromosomes, were not found on the linear replicons, except for pNPM8 which carries a presumptive parA gene (NPM_80015). Interestingly, a gene encoding a typical phage protein, terminase, involved in DNA packaging into empty phage capsids, was found in pNPM9 (NPM_90012) disrupted by an insertion sequence.
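The codon-frequency comparison between the chromosome and pNPM8 mentioned above could be approached along the following lines. This is only a sketch under stated assumptions: coding sequences are taken to be in-frame strings, the function name is hypothetical, and the toy sequences are invented; it does not reproduce the analysis behind Figure S2.

```python
from collections import Counter


def codon_frequencies(cds_list):
    """Relative codon frequencies pooled over a list of in-frame coding sequences.

    Trailing bases that do not fill a complete codon are ignored.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy input standing in for the genes of one replicon.
toy_freqs = codon_frequencies(["ATGAAAAAATAA", "ATGAAATA"])
print(toy_freqs)
```

Computing one such table for chromosomal genes and one for pNPM8 genes would allow the codons decoded by the plasmid-borne tRNAs to be checked for rarity in the chromosome.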

The genome of the L. pulmonaria cyanobiont is about 1.5 Mb smaller than that of Nostoc sp. N6 and has a total size of 7.34 Mb. It consists of one circular chromosome (7.06 Mb) and 4 circular extrachromosomal replicons – pNLP1 (121,770 bp), pNLP2 (63,064 bp), pNLP3 (58,727 bp) and pNLP4 (34,881 bp) (Fig. 3). Compared to Nostoc sp. N6, the L. pulmonaria cyanobiont genome contains a smaller number of coding regions, a larger number of pseudogenes and 3 ribosomal DNA (rDNA) operons instead of the 4 copies generally found in Nostocales [48]. These features, along with much slower growth observed in pure culture (3–4 times slower than Nostoc sp. N6), suggest genome shrinkage, gene loss and a possible semi-obligate nature of the cyanobiont. Interestingly, cbiM transporter genes, involved in the uptake of cobalt for cobalamin (vitamin B12) biosynthesis (locus tags NLP_0266 and NLP_2774), were found to be pseudogenes in the L. pulmonaria cyanobiont. Several other essential genes had disabling mutations but had intact functional homologues.
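As a quick consistency check, the replicon sizes reported for the two finished genomes can be summed and compared with the stated totals. The sketch below uses only figures given in the text; the variable and function names are illustrative.

```python
# Replicon sizes as reported in the text (chromosomes in Mb, plasmids in bp).
N6_CHROMOSOME_MB = 8.21
N6_PLASMIDS_BP = {
    "pNPM1": 213_966, "pNPM2": 167_441, "pNPM3": 44_778, "pNPM4": 44_777,
    "pNPM5": 41_255, "pNPM6": 30_992, "pNPM7": 29_551,   # circular
    "pNPM8": 66_996, "pNPM9": 22_270, "pNPM10": 21_916,  # linear
}

LP_CHROMOSOME_MB = 7.06
LP_PLASMIDS_BP = {
    "pNLP1": 121_770, "pNLP2": 63_064, "pNLP3": 58_727, "pNLP4": 34_881,
}


def total_genome_mb(chromosome_mb, plasmids_bp):
    """Total genome size in Mb: chromosome plus all extrachromosomal replicons."""
    return chromosome_mb + sum(plasmids_bp.values()) / 1e6


n6_total = total_genome_mb(N6_CHROMOSOME_MB, N6_PLASMIDS_BP)
lp_total = total_genome_mb(LP_CHROMOSOME_MB, LP_PLASMIDS_BP)
print(f"Nostoc sp. N6: {n6_total:.2f} Mb")
print(f"L. pulmonaria cyanobiont: {lp_total:.2f} Mb")
print(f"Difference: {n6_total - lp_total:.2f} Mb")
```

The sums agree with the reported totals of 8.9 Mb and 7.34 Mb after rounding; because the chromosome sizes are themselves rounded to 0.01 Mb, the difference between the genomes is only approximate.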

Fig. 3

Chromosome and four plasmids of the Nostoc sp. ‘L. pulmonaria cyanobiont’ genome. The outermost and second circles indicate genes in forward and reverse orientation color-coded by their COG categories. The third circles show pseudogenes. The fourth circle of the chromosome shows the rRNA genes (brown) and tRNA genes (green). The two innermost circles show GC content in gray and black and the GC skew in green (+) and purple (–)

Inversion of the GC skew ((G − C)/(G + C)) from positive to negative, typically seen at the replication origin of bacterial chromosomes, cannot be applied to predict the location of oriC in Cyanobacteria since their DNA asymmetry is greatly disturbed by mutational pressure [49] and extensive chromosome rearrangements (see below). A putative oriC for the chromosomes was identified in both strains downstream of dnaA, encoding a chromosomal replication initiation protein (locus tags NPM_0001 and NLP_0001) (Additional file 2: Figure S3). Both oriC regions contain 6 DnaA boxes, most with TTTTCCACA, the DnaA box motif specific for Cyanobacteria [50]. Location of oriC adjacent to the dnaN gene encoding the β subunit of DNA polymerase III has been claimed to be universal among Cyanobacteria [50, 51]. In free-living Nostoc strains and in Anabaena variabilis, oriC is located in the intergenic region between the dnaA and dnaN genes. However, in lichen-associated Nostoc strains and in N. punctiforme, dnaA and dnaN genes are not adjacent (∼52 kb apart in N. punctiforme). Interestingly, no apparent DnaA boxes were found adjacent to either the dnaA gene (Npun_F0001) or the dnaN gene (Npun_F0034) in N. punctiforme whereas a putative oriC with a cluster of DnaA boxes lies within the Npun_F0036–Npun_F0037 intergenic region.
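The two sequence features discussed above, GC skew and DnaA boxes, can be sketched as follows. This is an illustration rather than the procedure used in this study: the function names and window parameters are hypothetical, and only exact matches to the TTTTCCACA consensus are reported, so boxes that deviate from it would be missed.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def gc_skew(window: str) -> float:
    """GC skew (G - C)/(G + C) of a sequence window; 0.0 if it has no G or C."""
    window = window.upper()
    g, c = window.count("G"), window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int, step: int):
    """List of (start, skew) pairs for full-length windows along the sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def find_dnaa_boxes(seq: str, motif: str = "TTTTCCACA"):
    """Sorted (position, strand) pairs for exact motif matches on both strands."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


# Hypothetical toy sequences, not taken from either genome.
print(windowed_gc_skew("GGGGCCCC", window=4, step=4))
print(find_dnaa_boxes("AATTTTCCACAGG"))
```

In practice, a cluster of several boxes within a short intergenic region would be the signal of interest, since isolated matches to a 9 bp motif are expected by chance in a genome of this size.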

Genome and proteome comparison

Phylogenetic analysis of 31 conserved single copy protein genes from Nostocales strains available in GenBank and JGI-IMG showed that lichen-associated Nostoc strains and N. punctiforme group together in a clade (Nostoc II; Fig. 4) suggesting a monophyletic origin of these symbiotic Nostoc strains. The recently sequenced symbiotic strains Nostoc sp. KVJ20 [52] and Nostoc sp. Moss 2 [30] also associate with this clade, whereas two other moss-derived isolates together with terrestrial soil isolates of Nostoc calcicola and Nostoc linckia form a subclade within the Nostoc II clade. The free-living aquatic Nostoc strains group together with some Anabaena strains (clade Nostoc I; Fig. 4) while other Anabaena strains group with members of the genera Aphanizomenon and Dolichospermum (clade Anabaena/Aphanizomenon). These major phylogenetic relationships are in accord with what has been found by O’Brien and coworkers [29] and by Warshan [30]. Although O’Brien’s clade Nostoc II contains three free-living terrestrial isolates – N. punctiforme SAG 71.79, N. commune 02011101 and N. muscorum SAG 57.79 (currently known as Desmonostoc muscorum) – none of them have been tested for symbiotic competence and two of the strains have P. membranacea cyanobionts as their closest phylogenetic relatives. Nostoc strains with specificity for symbiosis with Gunnera also fall within clade II [53] but their genome sequences are currently not available.

Fig. 4

Maximum likelihood phylogenomic tree of Nostocales strains based on 31 single-copy core bacterial phylogenetic markers [135]. Arthrospira platensis NIES-39, Lyngbya sp. PCC 8106 and Planktothrix agardhii NIVA-CYA 126/8 from the order Oscillatoriales were used as the outgroup. Numbers at branch nodes are bootstrap percentages based on 100 replicates (only values >50 are shown). Scale bar indicates 5% sequence divergence. Selected clades are named according to [29]. The predominantly symbiotic clade is highlighted in green, the paraphyletic group in blue. Lichen-associated strains are shown in bold

Large scale genome comparisons of lichen-associated strains with N. punctiforme PCC 73102 reveal a low level of synteny between them (Additional file 2: Figure S4) indicating high genome plasticity and genome shuffling in these strains (Additional file 2: Figure S5). The ten most prominent regions of synteny include the whole set of genes involved in nitrogen fixation (the nif gene cluster; locally collinear block 1), some photosynthetic genes (locally collinear block 3), and genes encoding the majority of ribosomal proteins (Clusters of Orthologous Groups (COG) category J; locally collinear block 7) (Additional file 2: Figure S5, Additional file 3), which are known to be syntenic across species [54, 55], as well as many genes involved in carbohydrate transport and metabolism (G) and cell wall/membrane/envelope biogenesis (M).

Although symbiotic Nostoc strains N6 and N. punctiforme show a higher number of encoded proteins in most COG categories than the free-living strains (mean total 4344 vs. 3587; Additional file 2: Table S5), the fraction assigned to COG categories was similar (63–68%) and the distribution among categories was similar for all analyzed Nostoc and Anabaena strains (Fig. 5). On average, clade II of symbiotic Nostoc strains has a higher proportion of genes devoted to carbohydrate transport and metabolism (G), lipid transport and metabolism (I) and secondary metabolite biosynthesis, transport and catabolism (Q) compared to clade I Nostoc and Anabaena strains. Interestingly, clade I, comprised of free-living Nostoc and Anabaena strains (Additional file 1), exhibits the highest number of genes for inorganic ion transport and metabolism (P) (Fig. 5). In contrast, the Nostoc strains in symbiosis may benefit from host (plant or fungus) provision of inorganic ions, e.g. by the action of mycobiont siderophores [56].

Fig. 5

COG category distribution of the proteins encoded in the genomes of selected Nostoc and Anabaena strains. The ordinate axes indicate the percentage of genes in each COG functional category relative to the genes of all COG categories (left) and percentage COG category distribution among different clades (right)

The genome of Nostoc sp. N6 was found to encode the highest number of COG category L proteins (DNA replication, recombination and repair) (Additional file 2: Table S3). One possible explanation for this is that terrestrial cyanobacteria are generally subject to ultraviolet (UV) irradiation, and therefore are expected to possess efficient mechanisms for repair of UV-induced DNA damage [57]. Nevertheless, one of the genes involved in biosynthesis of the cyanobacterial sunscreen scytonemin (tyrosinase, tyrP) [58–60] is missing in the Nostoc sp. N6 genome, and another one (DSBA oxidoreductase, frnE) was found to be a pseudogene due to an in-frame stop codon (Additional file 2: Table S6). Both genes are thought to participate in oxidative dimerization of precursors to form scytonemin [61]. The cyanobionts of L. pulmonaria and P. malacea appear to have all genes necessary for scytonemin biosynthesis (Additional file 2: Table S6). It is possible that Nostoc sp. N6 compensates for the lack of scytonemin with a larger repertoire of enzymes for DNA repair.

Nostoc sp. N6 has a high number of transposable elements and inteins (Additional file 2: Tables S7-S9). The best studied case of inteins in Cyanobacteria is in DnaE (the α subunit of DNA polymerase III) encoded by two different ORFs and assembled by trans-splicing [62]. More information on transposons and inteins can be found in Supporting Information.

The majority of lichen-associated Nostoc strains studied appear to have an alternative vanadium-based nitrogenase in addition to the standard molybdenum-based nitrogenase. This includes three of the strains studied here, Nostoc spp. 210A, 213 and 232, as well as the P. malacea lichen cyanobiont [63]. The reason for the common occurrence of this alternative nitrogenase in lichen-associated cyanobacteria is not clear, but may relate to low availability of molybdenum in cyanolichens and/or a functional advantage at relatively low growth temperatures [64]. A novel finding is that these lichen Nostoc strains carry a near complete duplication of VnfD, with a cyanobacterial aminoacyl-tRNA synthetase domain (CAAD; pfam14159) inserted at the carboxy end. This resembles the peptide insertions found in the GluRS, ValRS, LeuRS and IleRS aminoacyl-tRNA synthetases of a variety of cyanobacteria, where the domain is thought to direct the proteins to thylakoid membranes, a key source of reducing power and ATP [65]. Further information on the molybdenum-based (nif) and the vanadium-based gene clusters (vnf) is provided in Supporting Information (Additional file 2: Figures S9 and S10).

Comparison to minimal bacterial and cyanobacterial gene sets

In order to see what pathways might differ, be incomplete or deteriorating in lichen cyanobionts, we performed comparative analyses with the minimal bacterial [66] and the cyanobacterial “core” and “shell” [67] gene sets, represented by 206 and 682 genes respectively (Additional files 4 and 5). The most prominent differences were observed for pyrimidine metabolism, in split ribonucleotide reductase enzymes, carbohydrate catabolism and potassium transport, as described in the Supporting Information.

Identification of genes specific to symbiotic Nostoc strains

To identify functions enriched in symbiotic Nostoc genomes (present in over 80% of group), we performed an all-by-all BLASTP search of all the proteomes from the Nostoc I and II clades plus the sister clade (Fig. 4) and assigned identified hits into orthologous groups. For the lichen-associated Nostoc strains 152 protein orthologs satisfied the criteria set (see Methods), 189 orthologs for the predominantly symbiotic clade, 399 for the Nostoc II clade and 385 for the combined Nostoc II clade and sister clade (see Additional file 6 for listing). A few of the most prominent gene collectives associated with symbiotic Nostoc are discussed below.
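The group-level filter described above can be sketched as follows, assuming the BLASTP hits have already been clustered into orthologous groups. The data layout, function name and the background criterion (`max_background`) are assumptions made for illustration; the actual criteria are those given in Methods and are not reproduced here.

```python
def enriched_groups(presence, target, background, threshold=0.8, max_background=0.0):
    """Orthologous groups present in more than `threshold` of the target genomes.

    presence: dict mapping orthologous group ID -> set of genome names with a member
    target, background: sets of genome names
    max_background: largest tolerated fraction of background genomes carrying the
        group (an assumed criterion, set to 0.0 here for a strict filter)
    """
    selected = []
    for group, genomes in presence.items():
        target_fraction = len(genomes & target) / len(target)
        background_fraction = (
            len(genomes & background) / len(background) if background else 0.0
        )
        if target_fraction > threshold and background_fraction <= max_background:
            selected.append(group)
    return sorted(selected)


# Hypothetical toy data: five symbiotic genomes (A-E) and two free-living (X, Y).
toy_presence = {
    "OG1": {"A", "B", "C", "D", "E"},       # in all targets, no background
    "OG2": {"A", "B", "X"},                 # too rare among targets
    "OG3": {"A", "B", "C", "D", "E", "X"},  # also found in background
    "OG4": {"A", "B", "C", "D"},            # exactly 80%, not "over 80%"
}
toy_result = enriched_groups(toy_presence, set("ABCDE"), set("XY"))
print(toy_result)
```

Relaxing `max_background` would turn the strict presence/absence filter into an enrichment filter, which may be closer to what is needed when the background contains occasional homologues.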

Hormogonium regulating locus.

Hormogonia are relatively short motile filaments, lacking heterocysts, formed by cyanobacteria from the orders Nostocales and Stigonematales. A hormogonium-inducing factor (HIF) secreted by plant hosts induces symbiotic cyanobacteria to differentiate hormogonia and they then dedifferentiate back into nitrogen-fixing filaments after about 48 h [68]. The capacity of Nostoc strains to form hormogonia has been found to be necessary, but not singularly sufficient, for symbiotic competence [69, 70]. An aqueous extract of the hosting hornwort Anthoceros punctatus appears to contain a hormogonium repressing factor (HRF) because it suppresses HIF-induced hormogonia formation. Analysis of N. punctiforme mutants led to the proposal of the following model of HRF-dependent modulation of HrmR transcriptional regulation [71]: HRF enters the Nostoc cell and it, or a derivative similar to galacturonate, binds to the repressor protein HrmR, decreasing its affinity for the hrmR and hrmE promoter regions. This derepresses transcription of these genes, somehow leading to inhibition of hormogonia formation and return to the vegetative state [72].

In N. punctiforme the hormogonium regulating locus is linked to genes involved in sugar transport (Fig. 6) [73]. It has been hypothesized that these genes are involved in HRF-induced synthesis of a metabolite inhibitor of hormogonium differentiation, rather than a carbon catabolic function [72]. This metabolite, probably similar to galacturonate [72], binding to the HrmR protein, may act in a positive feedback loop alleviating repression of the hrm locus, leading to increased production of the metabolite and at the same time facilitating increased import of sugars such as glucose, fructose and sucrose. Since PfkA (6-phosphofructokinase) appears to be nonessential in symbiotic Nostoc, these sugars must be channeled through the oxidative pentose phosphate (OPP) pathway or the Entner-Doudoroff (ED) pathway, both producing NADPH reducing equivalents facilitating biosynthesis and decreasing dependence on the non-oxidative pentose phosphate reactions (Calvin cycle). This catabolic shift may simultaneously induce development from hormogonia to vegetative cells and heterocysts. The shift from vegetative cells to heterocysts is accompanied by an increase of the OPP-specific Gnd and an even greater increase in Zwf [74], indicating increased carbon flow via the ED pathway. The hrm locus is restricted to the Nostoc II clade and its sister clade.

Fig. 6

Hormogonium regulating and sugar transporter loci in symbiotic Nostoc strains. Pseudogenes are denoted with an asterisk. orpB, carbohydrate-selective porin; mviM, inositol-2-dehydrogenase; glpC, glucose permease; frtA1A2BC, ABC-type fructose transporters; hrmE, inositol oxygenase; hrmK, gluconate kinase; hrmR, LacI family transcriptional regulator; hrmI, glucuronate isomerase; hrmU, D-mannonate oxidoreductase; hrmA and unk, unknown. A broken genome line indicates 2 separate loci

D-alanine-D-alanine ligase operon. In addition to a conventional cell-wall specific D-Ala-D-Ala ligase (DdlA), the lichen-associated Nostoc strains uniquely harbour another D-Ala-D-Ala ligase, of type 3, thought to be involved in modification of peptide moieties in peptidoglycans as described in Supporting Information (Additional file 2: Figures S14 and S15).

Phosphonate biosynthetic genes.

Phosphonates are organophosphorus compounds containing direct carbon-phosphorus bonds, e.g. in phosphonolipids, where they cannot be cleaved by regular phospholipases. The biochemical pathways and gene clusters for phosphonolipid synthesis are well studied [75], facilitating recognition in new settings as in the case of the lichen-associated Nostoc strains in this study (Fig. 7). This cluster is characteristic of the Nostoc II clade. Extended information is provided in the Supporting Information.

Fig. 7

Phosphonate biosynthetic gene clusters of lichen cyanobionts (a) and proposed encoded biosynthetic pathway (b) (adapted from [75]). A homologous gene cluster from Burkholderia is shown for comparison. CTP-APT, CDP-alcohol phosphatidyltransferase; OG-Fe(II), 2-oxoglutarate non-heme Fe(II) dependent oxidase; unk, conserved hypothetical proteins; NTPT, NTP transferase; pepM, phosphoenolpyruvate phosphomutase; ppd, phosphonopyruvate decarboxylase; AEPT, 2-aminoethylphosphonate aminotransferase; hpnL, putative membrane protein; higBA, toxin-antitoxin module. A broken genome line indicates separate loci

The additional peptidoglycan and phosphonate lipid functions may lead to cell wall modifications that are well tailored to the intrathalline environment, as well as being recognized as compatible by a mycobiont during establishment of symbiosis. Despite being sheltered by a mycobiont, lichen cyanobionts are subjected to extracellular enzymes and metabolites produced by both the mycobiont and intrathalline bacteria. Therefore, a specific ability to withstand some unfavorable aspects of this cohabitation is expected from lichen associated Nostoc strains.

Chloramphenicol phosphotransferase. Chloramphenicol is an antibiotic produced by Streptomyces venezuelae ATCC 10712 and several other actinomycetes [76]. The bacteriostatic activity of chloramphenicol results from its binding to the 50S subunit of the bacterial ribosome blocking peptidyl transferase [77]. S. venezuelae escapes the toxicity of its own lethal secondary metabolite by expressing a chloramphenicol phosphotransferase (CPT) that phosphorylates the primary (C-3) hydroxyl of chloramphenicol (Additional file 2: Figure S16) [78]. Genes encoding CPT were found almost exclusively in the Nostoc II clade.

Gas vesicles, sulfur metabolism. Genes encoding gas vesicle proteins have been shown to be involved in hormogonium function and establishment of the N. punctiforme symbiosis [79, 80] as well as in the symbiosis of Nostoc with feathermoss [81]. Gas vesicle proteins GvpC, GvpV and GvpW appear to be characteristic for the Nostoc II clade and its sister clade. Several genes associated with assimilation of alkane sulfonates in the moss-Nostoc association [82] were also found to be enriched in the Nostoc II clade.

Sensory mechanisms. All the comparison groups were found to have differences related to sensory mechanisms and motility, including signal transduction histidine kinases, methyl-accepting proteins as well as diguanylate cyclases, thought to be involved in regulating motility in cyanobacteria [83]. The diversity and rapid divergence of sensory mechanisms underlines the great variety of ecotypes found in the genus Nostoc, especially in strains with symbiotic capacity [84]. Differences in genes involved in sensory mechanisms were also found in the comparison made by Warshan et al. [82].

Secondary metabolites

Cyanobacteria produce a multitude of secondary metabolites, many of them toxic [85, 86]. In a recent study, Liaimer et al. [52] found that Nostoc symbionts of the liverwort Blasia pusilla more frequently produce nodularin and microcystin type compounds antagonistic to other Nostoc strains than free-living Nostoc from the same locality. Most types of secondary compounds were detected in only 1 to 4 out of the 20 strains examined. The occurrence of the main secondary metabolite pathways in Nostoc punctiforme, in the Nostoc strains from the Blasia habitats [52] and in the lichen-derived strains of the present study shows little overlap. One of the secondary compounds detected by Liaimer et al. [52] is the polyketide synthase plus non-ribosomal peptide synthase (PKS-NRPS) product nosperin [36]. We previously suggested that nosperin might have cytotoxic properties analogous to cyanobiont microcystins [87, 88] which can serve as protective compounds in cyanolichens, e.g. against grazers. Interestingly, Nostoc sp. 232 was found to be devoid of nsp genes encoding nosperin, but it has a putative microcystin gene cluster not found in the nsp-containing Nostoc sp. N6 strain. Similarly, the single Blasia-habitat Nostoc strain showing nosperin does not exhibit any of the other metabolites under study [52]. Remnants of the nsp gene cluster were found on the chromosome of the L. pulmonaria cyanobiont (Additional file 2: Figure S19), where almost the entire cluster has been deleted, probably due to the absence of selective pressure.

Whole genome sequencing of the nosperin producer Nostoc sp. N6 revealed that the nsp gene cluster is located on the chromosome. The abundance of insertion sequences surrounding the cluster and the apparent mixed gene origin suggests that it has been acquired as a genomic island through horizontal transfer and undergone several intragenomic recombination events [36]. The genome of Nostoc sp. N6 was also found to encode pathways for the biosynthesis of nostopeptolide- [89] and banyaside/suomilide-like [52] compounds as well as nostocyclopeptide [90] (Additional file 2: Table S14). Nostopeptolide in Nostoc punctiforme has been found to be a major hormogonium-repressing factor and is therefore considered responsible for cellular differentiation of Nostoc [91].

Nostocyclopeptides are cyclic heptapeptides with a unique imino linkage in the macrocyclic ring, isolated from the lichen cyanobiont Nostoc sp. ATCC 53789 [92]. Two homologous NRPS functions (locus tags NPM_1843 and NPM_1844) were found in the genome of Nostoc sp. N6. A nostoclide-like compound with a very similar structure, cyanobacterin, produced by the cyanobacterium Tolypothrix sp. PCC 9009 (Scytonema hofmanni UTEX 2349) [93, 94], was found to inhibit the growth of many cyanobacteria, as well as green algae and angiosperms [95, 96]. Based on the homology with Tolypothrix sp. PCC 9009, we identified putative gene clusters for biosynthesis of nostoclide-like compounds in the genomes of Nostoc spp. 210A and 232 (Additional file 2: Figure S20b). More extensive information on secondary products can be found in the Supporting Information.

Conclusions

The complete genome sequences and comparative genomic analyses of two lichen-associated Nostoc strains are presented here. The finished genomes, manually curated, are appropriate for all types of detailed analyses and act as high-quality references for comparative purposes [97]. Comparative genome analysis of symbiotic and free-living cyanobacteria allowed the identification of several pathways that may contribute to symbiotic competence of Nostoc strains. One pathway, encoded by the hormogonium regulating (hrm) locus, was previously identified in symbiotically competent N. punctiforme and plays a central role in abrogating hormogonia formation. This pathway is similar to pathways of sugar uronate metabolism in heterotrophic non-cyanobacterial prokaryotes [71, 72]. Although the hrm locus has been shown to be important in the Nostoc-plant symbiosis, its presence in all of the lichen-associated Nostoc strains from this study suggests it is also relevant to establishing Nostoc-mycobiont symbioses. Pathways that may be involved in cell wall biogenesis of lichen cyanobionts were also identified, including novel gene clusters encoding synthesis of phosphonate lipids and an MXAN_4097-like amidoligase (D-Ala-D-Ala ligase).

It is apparent that the ability to form and maintain symbiosis is a complex trait governed by many factors and different combinations of these factors may result in different symbiotic associations – from loose to the most intimate. The study presented here is the first attempt to determine, on a whole genome level, what genes and features may contribute to symbiotic competence of Nostoc cyanobionts in lichens. Although we have pinpointed candidate symbiotic genes in the lichen-associated Nostoc genomes, a more thorough analysis, e.g. with targeted mutations and resynthesis of symbiosis, is required to verify the importance and involvement of individual genes and pathways. Some progress has been achieved in studying plant-cyanobacterial symbioses using the readily cultured hornwort Anthoceros and the liverwort Blasia as model organisms. However, there are substantial differences between plant- and mycobiont-cyanobacterial symbioses, e.g. due to the heterotrophic nature of fungi. In contrast to many lichens with green algal photobionts, the bionts of cyanolichens are difficult to culture and synthesize in the laboratory. Problems include slow growth or unculturability of most mycobionts, difficulties in obtaining axenic cultures of photobionts, and in maintaining resynthesized biont cultures for long periods of time. Few attempts at cyanolichen resynthesis under laboratory conditions have been documented [98–103] and currently there are no available models to study mycobiont-Nostoc symbiosis. The use of the glomeromycete Geosiphon pyriforme, which is easily culturable and capable of forming symbiosis with Nostoc strains, can help to overcome some of these limitations [104].

Recent studies comparing ten genomes and proteomes from moss-associated Nostoc strains with the non-symbiotic Nostoc sp. CALU 996 identified a number of gene families present in the symbiotic strains but not in the comparison strain [81, 82]. Several of these, including the hrm locus, genes encoding gas vesicle proteins, genes connected with sulfur metabolism and genes linked to sensory mechanisms, were identical or similar to symbiotic-specific gene clusters identified in the lichen-associated Nostoc.

In addition to Nostoc, several other nostocean cyanobacteria have been reported as cyanobionts in lichen symbioses, including members of the genera Scytonema, Calothrix, Dichothrix, and Tolypothrix [20, 105, 106]. Isolating these lichen-associated strains and sequencing their genomes would provide further evidence on what determines symbiotic competence in Nostoc and other cyanobacteria.

Methods

Isolation and culture of Nostoc strains

Peltigera membranacea thalli for cyanobiont isolation were collected from a moss carpet (Hylocomium splendens and Pleurozium schreberi) at Keldur, Reykjavik, Iceland, and a Lobaria pulmonaria thallus was collected from a maple tree trunk (Acer macrophyllum) at Cedar Road, Vancouver Island, British Columbia, Canada. Nostoc strains were isolated on BG-11₀ agar medium as previously described [107], purified by repeated streaking on the same medium and maintained at room temperature.

DNA extraction, library construction and sequencing

Genomic DNA was prepared from Nostoc cultures grown in liquid BG-11₀ medium at an illumination of 50 μmol photons m⁻² s⁻¹ as described in [108]. Sequencing libraries were prepared using Nextera XT and, for some strains, Nextera Mate Pair Sample Preparation Kits (Illumina) according to the manufacturer’s protocols and sequenced using MiSeq Reagent Kits v2 with 2 × 250 and 2 × 150 cycles, respectively (Additional file 2: Table S1). Roche 454 reads of P. membranacea and P. malacea metagenomes generated previously [109] were also used in this study to increase the number of lichen-associated Nostoc strains.

Genome assembly

Draft assemblies of Nostoc spp. N6 and ‘Lobaria pulmonaria cyanobiont’ genomes were constructed using MIRA v3.2.1 (www.chevreux.org/projects_mira.html) and further processed and verified using GAP5 (Staden package) [110] (Additional file 2: Table S1). Remaining gaps were closed by PCR and Sanger sequencing. Draft genomes of Nostoc spp. 210A, 213, 232 and the P. malacea metagenome were assembled using SPAdes v3.10.1 [111] with default parameters. Prior to assembly, Illumina reads were trimmed with Trimmomatic v0.36 [112] using the parameters “LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:20”. SPAdes contigs > 1 kb were binned using MaxBin v2.2.4 [113] and those belonging to Cyanobacteria were scaffolded using BESST v2.2.6 [114, 115]. The resulting assemblies were improved with FinishM v0.0.9 (https://github.com/wwood/finishm) and Pilon v2.11.6 [116]. Scaffolds were taxonomically classified using the Kaiju (http://kaiju.binf.ku.dk/) [117] and PhyloPythiaS+ (http://phylopythias.bifo.helmholtz-hzi.de/) [118] web servers. Those not assigned to Cyanobacteria were manually checked using a BLAST search [119], and contaminating scaffolds were removed. Completeness and contamination of the assemblies were assessed with CheckM v1.0.7 [120] (Additional file 2: Table S2).
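The read-trimming parameters quoted above can be illustrated with a minimal Python sketch. It is a simplified approximation of what the four steps do to a single read's Phred quality scores, applied in the order given; it is not Trimmomatic's own code, and the function name trim_read and the exact cut position chosen for the sliding window are assumptions made for illustration.

```python
def trim_read(quals, leading=20, trailing=20, window=4, min_avg=15, minlen=20):
    """Return (start, end) of the retained slice of a read, or None if dropped.

    quals is a list of per-base Phred scores. Simplified illustration only:
    the real tool handles window edges differently.
    """
    start, end = 0, len(quals)

    # LEADING:20 - remove 5' bases while their quality is below the threshold
    while start < end and quals[start] < leading:
        start += 1

    # TRAILING:20 - remove 3' bases while their quality is below the threshold
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW:4:15 - scan 5'->3' and cut at the first 4-base window
    # whose mean quality falls below 15
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN:20 - discard reads shorter than 20 bases after trimming
    if end - start < minlen:
        return None
    return start, end


# Hypothetical quality profile: poor 5' end, a low-quality stretch near the 3' end
example = [10, 10] + [30] * 25 + [5] * 4 + [30] * 3 + [10]
kept = trim_read(example)
```

In this simplified version the read is cut at the start of the first failing window, so a few acceptable bases just before a low-quality stretch can be lost.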

Genome annotation

Draft genome assemblies were annotated using the NCBI Prokaryotic Genome Annotation Pipeline [121]. For complete genomes, ORFs were predicted with Prodigal [122], followed by manual correction in Artemis [123] using the gene prediction improvement pipeline GenePRIMP [124]. All encoded proteins were assigned functions by combining results from InterProScan [125], CDD [126] and BLAST searches [119] against the NCBI nonredundant (nr) database. Transfer RNA genes were identified with tRNAscan-SE v1.23 [127] and ribosomal RNA genes (5S, 16S, 23S) were predicted using RNAmmer [128]. Other non-coding RNAs were identified with Infernal (v.1.1) [129] using Rfam covariance models (http://ftp.ebi.ac.uk/pub/databases/Rfam). Identification of CRISPR elements was performed using CRISPRfinder [130] and PILER-CR [131]. Pseudogenes were annotated using the GenePRIMP pipeline and rechecked manually in Artemis. Single in-frame stop codons and frameshifts were confirmed in the original assemblies. Ribosomal slippage was annotated according to standard operating procedures (SOP) at the GenePRIMP website (http://studylib.net/doc/7260119). Finally, short ORFs (encoding < 100 aa) without any significant homology (E-value > 10⁻²) to the nr database, and ORFs represented solely by low-complexity sequences (e.g. spanning micro- and minisatellite regions), were removed from the annotation. Intein-containing proteins were identified by the presence of an intein/homing endonuclease domain (COG1372). Excision of the nifD and fdxN elements in Nostoc sp. N6 was confirmed by mapping previously generated RNA-Seq data [36] with Bowtie 2 [132]. Origins of replication (oriC) were identified by locating DnaA boxes (TT(A/T)TNCACA) [133]. A cluster of DnaA boxes, especially one adjacent to the dnaA and/or dnaN genes, was considered an indicator of the location of oriC. Transposases were classified into IS families using ISfinder (https://www-is.biotoul.fr/; [134]).
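The DnaA box search described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the study: it assumes the consensus reads TT(A/T)TNCACA (IUPAC: TTWTNCACA), and the helper names revcomp and find_dnaa_boxes are hypothetical. Both strands are scanned, and all coordinates are reported as 0-based start positions on the forward strand.

```python
import re

# Assumed DnaA box consensus in IUPAC notation: W = A or T, N = any base
DNAA_BOX = "TTWTNCACA"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_dnaa_boxes(seq, motif=DNAA_BOX):
    """Return sorted (start, strand) tuples for every motif match."""
    seq = seq.upper()
    # A lookahead lets the regex report overlapping matches as well
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Matches on the reverse strand, converted back to forward coordinates
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)


# Hypothetical test sequence with one box on each strand
toy = "GGG" + "TTATCCACA" + "GGG" + "TGTGGATAA" + "CCC"
boxes = find_dnaa_boxes(toy)
```

On a real chromosome, the hit list would then be scanned for dense clusters of boxes, and clusters lying next to dnaA or dnaN would be the strongest oriC candidates.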

Phylogenomic analysis

Available genomes of Nostocales strains along with Arthrospira platensis NIES-39, Lyngbya sp. PCC 8106 and Planktothrix agardhii NIVA-CYA 126/8 (order Oscillatoriales) as an outgroup were retrieved from GenBank and the Joint Genome Institute’s Integrated Microbial Genomes database (JGI-IMG) in January 2018. Thirty-one marker proteins that are universally conserved across the bacterial domain (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB and tsf) were extracted from the genomes using the AMPHORA2 pipeline [135] and aligned with MUSCLE [136]. An alignment mask was generated using Zorro [137]. The marker alignments were further concatenated into a single partitioned alignment and the best protein substitution model for each of the markers was predicted using the concat_align.pl script of phylogenomics-tools (https://github.com/kbseah/phylogenomics-tools; https://doi.org/10.5281/zenodo.46122). A maximum-likelihood phylogeny was derived using the PROTCATWAG model for tree search in RAxML v8.2.4 [138] automated by the tree_calculations.pl script of phylogenomics-tools. Branch support was assessed using the approximate likelihood ratio test for branches (SH-like aLRT) [139] with 100 replicates.
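The concatenation step can be illustrated with a short Python sketch. It is not the concat_align.pl script itself; the function name concatenate, the gap-filling of taxa missing a marker and the default "WAG" model label are assumptions made for illustration. The partition lines follow the "MODEL, name = start-end" layout used in RAxML partition files.

```python
def concatenate(alignments, models=None):
    """Join per-marker alignments into one supermatrix.

    alignments: {marker: {taxon: aligned_sequence}}
    models:     optional {marker: substitution_model}; "WAG" is a placeholder
    Returns (supermatrix, partition_lines).
    """
    models = models or {}
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    pos = 1  # partition coordinates are 1-based and inclusive

    for marker, aln in alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{marker}: sequences differ in aligned length")
        length = lengths.pop()
        for t in taxa:
            # Taxa lacking this marker are padded with gaps
            pieces[t].append(aln.get(t, "-" * length))
        model = models.get(marker, "WAG")
        partitions.append(f"{model}, {marker} = {pos}-{pos + length - 1}")
        pos += length

    return {t: "".join(p) for t, p in pieces.items()}, partitions


# Hypothetical two-marker example with one taxon missing rpsB
toy = {
    "rplA": {"strainA": "MK-L", "strainB": "MKAL"},
    "rpsB": {"strainA": "AV"},
}
matrix, parts = concatenate(toy)
```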

Genome and proteome comparison

Whole genome comparisons were performed using PROmer (MUMmer 3.0 package; [140]) and Mauve [141]. To identify orthologous groups specific to different clades (Fig. 4), an all-by-all BLASTP search was performed on proteomes of 56 strains belonging to a) Nostoc I clade (16 strains), b) Nostoc II clade (27 strains), c) sister clade to Nostoc II clade (13 strains) with soft masking and the following thresholds: E-value < 10⁻¹⁰, percentage identity ≥ 50% and percentage match ≥ 50%. The resulting hits were clustered into orthologous groups using OrthoMCL [142, 143]. Orthologous groups specific to different clades were extracted as shown in Additional file 2: Figure S21. For COG category distribution comparison, proteins encoded in the genomes of selected Nostoc and Anabaena strains were classified into COG functional categories [144] using RPS-BLAST against PSSMs (Position-Specific Scoring Matrices) from the updated COG database [145] with an E-value < 10⁻² and the top hit retained.
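The hit filtering and clade-specific extraction can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the study: "percentage match" is assumed here to be the aligned length relative to the shorter of the two proteins, a group is assumed to be clade-specific when all of its members come from strains of a single clade, and the names passes_thresholds and clade_specific_groups are hypothetical.

```python
def passes_thresholds(evalue, pident, aln_len, qlen, slen,
                      max_evalue=1e-10, min_ident=50.0, min_match=50.0):
    """Apply the E-value, identity and match-length cut-offs to one BLASTP hit."""
    # Assumption: percentage match = aligned length / shorter sequence length
    pmatch = 100.0 * aln_len / min(qlen, slen)
    return evalue < max_evalue and pident >= min_ident and pmatch >= min_match


def clade_specific_groups(groups, clade_of, target):
    """Return IDs of orthologous groups found only in strains of the target clade.

    groups:   {group_id: [(strain, protein_id), ...]}
    clade_of: {strain: clade_name}
    """
    specific = []
    for gid, members in groups.items():
        clades = {clade_of[strain] for strain, _ in members}
        if clades == {target}:
            specific.append(gid)
    return sorted(specific)


# Hypothetical strains, clades and groups for illustration
clade_of = {"s1": "Nostoc I", "s2": "Nostoc II", "s3": "Nostoc II"}
groups = {
    "OG1": [("s2", "p1"), ("s3", "p7")],   # only Nostoc II strains
    "OG2": [("s1", "p2"), ("s2", "p3")],   # shared between clades
    "OG3": [("s1", "p4")],                 # only Nostoc I
}
nostoc2_only = clade_specific_groups(groups, clade_of, "Nostoc II")
```

A stricter definition, e.g. requiring the group to be present in a minimum fraction of the clade's strains, could be added as a further condition inside the loop.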