1 Introduction

Lectins are single- or multi-domain glycoproteins capable of binding sugar moieties through specific interactions with carbohydrate recognition domains (Brown et al. 2007). Most lectins are multimeric, consisting of non-covalently associated subunits that give lectins their ability to agglutinate cells or form aggregates with glycoconjugates (Mohr and Pommerening 1985). First discovered in plants, lectins have been found in nearly all eukaryotes. Specific functions of lectins vary substantially but lectins are most often categorized based on their ligand binding specificity. For instance, galectins are able to specifically recognize β-galactosides. Lectins have long been thought to play a role in protection and in symbiosis through recognition of sugar moieties characterizing the surfaces of specific organisms (Sharon and Lis 2004). The necessity to either recognize partners (in order to trigger defense, infection or mutualism) or to escape recognition (to avoid infection or defense mechanisms) is a selective force shaping the evolutionary history of genes in interacting species (De Mita et al. 2006).

Coevolution, the process of reciprocal changes between interacting species during evolution, molds the organization of communities that exist over periods long enough to allow selection of species traits (Thompson 2001). Rapid change driven by positive selection is often detected in genes involved in the coevolutionary process, such as recognition mechanisms in host-pathogen relationships, immune responses, and reproduction processes (Baum et al. 2002; Biswas and Akey 2006; Hughes 2007; Iguchi et al. 2011). Lectins have frequently been implicated in studies of coevolution, and high amino acid diversity reflecting positive selection has been reported in a wide range of organisms including prawns (Ren et al. 2012), amoeboids (Weedall et al. 2011) and oysters (Wu et al. 2011). In corals, it has been shown that lectins change rapidly, possibly due to selective pressure from symbionts (Hayes et al. 2010; Iguchi et al. 2011). Under positive selection, the rate of nonsynonymous (amino acid-changing) substitutions per site is higher than that of synonymous (silent) substitutions (Hughes and Nei 1988; Yang and Bielawski 2000). In contrast, genes involved in housekeeping and in major morphological and developmental adaptations of individual organisms generally exhibit relatively low levels of amino acid change (Ford 2002).

The lichen Peltigera membranacea is an association of two symbionts: a heterotrophic ascomycete (mycobiont), and a photosynthetic cyanobacterium (photobiont) from the genus Nostoc (Miao et al. 1997). Lichen mycobionts belonging to the genera Peltigera and Nephroma appear to show some selectivity in their choice of Nostoc strains, but the diversity varies with lichen species (Paulsrud et al. 2000). In the gelatinous Collemataceae lichens, reciprocal high co-specificity has been reported as certain species were found to be associated with a single lineage of Nostoc (Otálora et al. 2010). Peltigera species have served as models in several studies of lectins and some of the extracellular mycobiont lectins distinguish between symbiotic and other cultured Nostoc sp., suggesting their involvement in recognition of potential symbiotic partners (Lockhart et al. 1978; Petit et al. 1983; Galun and Kardish 1995; Diaz et al. 2011). Recently lec-1, a mycobiont gene encoding a galectin-like protein in P. membranacea, was characterized and shown to be differentially expressed among thalli, rhizines and apothecia of this foliose lichen (Miao et al. 2012). The current study describes the characterization of a second lectin gene, lec-2, in P. membranacea. We examined whether the corresponding protein LEC-2 is under positive selection, and whether variants of LEC-2 are associated with specific photobiont strains.

2 Materials and methods

2.1 Lichen collection, DNA extraction, amplification and sequencing

The lec-2 gene was identified in a database of contigs (contiguous sequence blocks) from the ongoing P. membranacea whole genome sequence (Pmb-WGS) project (unpublished; Xavier et al. 2012). In brief, the Pmb-WGS comprises P. membranacea metagenomic DNAs extracted from lichen samples collected at Keldnagil, Iceland. DNAs were processed for sequencing at commercial facilities via the Roche 454 and the Illumina/Solexa methodologies, resulting in 1.76 Gb of 454 data and 1.4 Gb of Illumina data that were assembled using automated procedures. For the present study, additional specimens were collected in 2010–2011 from the same locality, as well as from other sites in Iceland (Table 1). DNA was extracted from one lobe of each specimen using methods described previously (Xavier et al. 2012). The lobes from the new Keldnagil specimens were each processed as three to five fragments of ~1 cm² in order to examine intrathalline variability. Vouchers of each specimen were saved in the Andrésson lab herbarium.

Table 1 Peltigera membranacea samples used in the study

Homologs of lec-2 were amplified by PCR using the primer pair lec2-F92 (5′-GTCGTGTCAAATCACTCAAGGTCGG-3′) and lec2-R766 (5′-CCGTAGTCGCCTATATCATCGCA-3′); rbcLX (from end of ribulose bisphosphate carboxylase gene L to start of gene S, ~0.8 kb) was amplified with CW (5′-CGTAGCTTCCGGTGGTATCCACGT-3′) and CX (5′-GGGGCAGGTAAGAAAGGGTTTCGTA-3′) (Rudi et al. 1998). Amplification reactions were performed in a final volume of 25 μl containing 1x PCR buffer (Fermentas) (20 mM Tris-HCl (pH 8.8 at 25 °C), 10 mM (NH4)2SO4, 10 mM KCl, 1 % (v/v) Triton X-100, 0.1 mg/ml BSA), 2.5 mM MgSO4, 0.2 mM each dNTP, 0.24 μM primers and 0.2 U Taq DNA polymerase. PCR was performed with an initial denaturation step of 2 min at 94 °C, followed by 33 cycles of 20 s at 94 °C, 20 s at 55 °C and 50 s at 72 °C, and a final extension step of 10 min at 72 °C. All lec-2 amplicons from Keldnagil were digested with RsaI and those representing the same lobe were confirmed to have the same restriction pattern (i.e. there was no intrathalline variation) prior to processing one amplicon from each thallus for sequencing. Amplicons were then treated with exonuclease I and alkaline phosphatase (New England BioLabs). Sanger sequencing of both strands (with the above primers) was conducted using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), and products were analysed on an ABI3100 automated sequencer (Applied Biosystems). DNA sequences were deposited in GenBank under the accessions presented in Table 1.

2.2 Identification and characterization of lec-2

The lec-2 gene was identified from Pmb-WGS contigs using the sequence of LEC-1 (accession number JQ899138; Miao et al. 2012) as a query in tblastn (Altschul et al. 1990). The gene was annotated by comparison with known lectin genes and inspection for conserved dinucleotides (GT/AG) at prospective splice sites associated with appropriate frame shifts, and deduced features were confirmed by mapping Illumina RNA-Seq reads from the P. membranacea transcriptome sequencing project (Xavier et al. 2012) using the CLC Genomics Workbench version 5.2 (CLC bio, Denmark). A 3D model of LEC-2 was constructed using the SWISS-MODEL Protein Modelling Server (http://swissmodel.expasy.org/) (Schwede et al. 2003; Arnold et al. 2006; Kiefer et al. 2009).
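The splice-site inspection described above can be illustrated with a minimal sketch. It assumes a candidate intron is accepted only if it starts with GT and ends with AG, and if removing it leaves a coding sequence whose length is a multiple of three with a single, terminal stop codon. The function names and the toy sequences are hypothetical and are not taken from the lec-2 locus.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_canonical_splice_sites(intron):
    """True if the candidate intron begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def keeps_reading_frame(spliced_cds):
    """True if the spliced sequence is in frame and its only stop codon is the last codon."""
    cds = spliced_cds.upper()
    if len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


# Toy exon/intron/exon sequences, invented purely for illustration.
exon1, intron, exon2 = "ATGGCTGAAG", "GTAAGTATCTAACTTACAG", "CTTGA"

print(has_canonical_splice_sites(intron))   # GT...AG boundaries present?
print(keeps_reading_frame(exon1 + exon2))   # frame restored after splicing?
```

In practice such a check only narrows down candidates; as in the text, the junctions still need to be confirmed against mapped RNA-Seq reads.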

2.3 Sequence alignment and phylogenetic analyses

Single nucleotide polymorphisms (SNPs) were resolved into haplotypes by determining overlapping Sanger or 454 sequence reads containing identical SNPs (Table 1, Fig. 1). Sequences were aligned using CLUSTALW as implemented in the CLC Genomics Workbench version 5.2 (CLC bio, Denmark). Phylogenetic trees were constructed using maximum likelihood (PhyML 3.0) (Guindon and Gascuel 2003; Guindon et al. 2010) as implemented in Seaview (4.2.12) (Gouy et al. 2010). A model generator was used to identify the best-fit model (Keane et al. 2006) and models TN93 and K80 were used for lec-2 and rbcLX sequences, respectively. Maximum likelihood branch support was determined by statistical analysis using the approximate likelihood-ratio test (Anisimova and Gascuel 2006).

Fig. 1

Sampling sites in Iceland and LEC-2 haplotypes. Approximate location of collection site is indicated by position of circled number presenting LEC-2 haplotype. Geographical coordinates are given in Table 1. Pie chart in the lower left represents sample site at Keldnagil where three haplotypes were found in different proportions in 2010–2011

2.4 Test for selection

The number of synonymous substitutions per synonymous site ($d_S$) and the number of nonsynonymous substitutions per nonsynonymous site ($d_N$), together with their variances, $\mathrm{Var}(d_S)$ and $\mathrm{Var}(d_N)$, were calculated. The null hypothesis of no selection ($H_0$: $d_N = d_S$) was tested against the positive selection hypothesis ($H_1$: $d_N > d_S$) using the Z test: $Z = (d_N - d_S)/\sqrt{\mathrm{Var}(d_S) + \mathrm{Var}(d_N)}$. This test determines whether $d_N$ is significantly greater than $d_S$, i.e. whether the $d_N/d_S$ ratio significantly exceeds 1 (Nei and Gojobori 1986). Calculations were performed using MEGA version 5 (http://www.megasoftware.net) (Tamura et al. 2011).
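As a worked illustration of the Z test above, the following minimal sketch computes the statistic and a one-sided p-value from the standard normal distribution. It assumes the substitution rates and their variances have already been estimated by other means; the function name and the input values are hypothetical and are not results from this study, and the sketch is not a description of how MEGA performs the calculation internally.

```python
import math


def z_test_positive_selection(d_n, d_s, var_d_n, var_d_s):
    """One-sided Z test of H0: dN = dS against H1: dN > dS.

    Returns the Z statistic and the upper-tail p-value under a
    standard normal approximation.
    """
    z = (d_n - d_s) / math.sqrt(var_d_s + var_d_n)
    # Upper-tail probability P(Z' >= z) for a standard normal variable Z'.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Hypothetical inputs chosen only for illustration (not values from this study).
z, p = z_test_positive_selection(d_n=0.020, d_s=0.004, var_d_n=2.5e-5, var_d_s=1.1e-5)
print(f"Z = {z:.2f}, one-sided p = {p:.4f}")
```

The test is one-sided because the alternative hypothesis is directional: only an excess of nonsynonymous over synonymous change counts as evidence for positive selection.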

3 Results

3.1 Identification and characterization of lec-2

The sequence of LEC-1 was used to query the P. membranacea whole genome sequence (Pmb-WGS) for contigs with additional lectin-like genes. A 3.5 kb contig was identified with a 600 nt open reading frame similar to lec-1 (Miao et al. 2012) and the gene was designated as lec-2. The lec-2 gene comprises a 241 nucleotide (nt) 5′ untranslated region (UTR) with one intron (54 nt), two coding regions (472 nt, 8 nt) separated by an intron (64 nt), and a 401 nt 3′ UTR. Illumina RNA-Seq data (unpublished) were used to verify the intron junctions. The predicted 17.4 kDa protein of 159 amino acids, LEC-2, was most similar to LEC-1 (GenBank entry JQ899138; 38 % identity) and to galectins from the saprotrophic basidiomycete Coprinopsis cinerea (GenBank entries XP_001836010.2, 36 % identity, and XP_001836012, 31 % identity). A search against proteins with structural models gave the best hit to a galectin from another saprotrophic, mushroom-forming basidiomycete, Agrocybe aegerita (PDB accession 2ZGP_A; 32 % sequence identity).
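The reported coding-region lengths can be checked against the predicted protein length with a few lines of arithmetic. The sketch below assumes that the two coding regions include the stop codon, so that the number of residues is one less than the number of codons; that assumption is ours and is not stated explicitly in the text. The average residue mass is derived only from the reported 17.4 kDa and 159 residues.

```python
# Lengths of the two coding regions reported in the text (nt).
CODING_SEGMENTS_NT = (472, 8)

coding_nt = sum(CODING_SEGMENTS_NT)
codons, remainder = divmod(coding_nt, 3)

# Assumption: the final codon is the stop codon and is not translated.
residues = codons - 1

# Average mass per residue implied by the reported 17.4 kDa protein.
avg_residue_mass_da = 17.4e3 / residues

print(f"coding length: {coding_nt} nt ({codons} codons, remainder {remainder})")
print(f"predicted residues: {residues}")
print(f"average residue mass: {avg_residue_mass_da:.1f} Da")
```

Under this assumption the coding length is an exact multiple of three and yields the 159 residues reported for LEC-2.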

3.2 Allelic variation in lec-2 and positive selection

The DNA used for the Pmb-WGS was derived from a mixture of thalli, and when the individual 454 reads were inspected, a high level of variation was found. Four alleles, generating four variants of the deduced protein, LEC-2, were observed and the most common amino acid sequence was designated as haplotype 1 (Table 1, Fig. 2). For comparison, in the same material, the housekeeping genes gpd-1 (encoding glyceraldehyde-3-phosphate dehydrogenase, accession JQ837250) and tub-2 (encoding β-tubulin, accession JQ837249) showed no polymorphisms, although the combined length of the coding regions was 3938 nt (Miao et al. 2012). Two of the four LEC-2 haplotypes were again observed when samples of single thalli collected later from the same site were examined individually (Table 1), thus validating the results from whole genome sequencing.

Fig. 2

LEC-2 allelic variation in P. membranacea. XBB13 (haplotype 1) is reference; galectin signature residues of the carbohydrate recognition domain are identified at the top (human galectin-1 numbering; Hirabayashi and Kasai 1988). Amino acid substitutions are indicated by shading. LEC-2 haplotypes are indicated within square brackets and * indicates haplotypes found in the Pmb-WGS project

The lec-2 genes of single lobes collected from 13 additional sites across Iceland were investigated by Sanger sequencing (Table 1). From 21 thalli, a total of 11 protein haplotypes were distinguished (Fig. 2) based on a total of 14 nonsynonymous nucleotide substitutions and one synonymous nucleotide substitution in the coding region of lec-2. According to the hypothesis of selective neutrality, the ratio $d_N/d_S$ (the number of nonsynonymous substitutions per nonsynonymous site divided by the number of synonymous substitutions per synonymous site) is equal to 1 (Nei and Gojobori 1986). The $d_N/d_S$ ratio in the lec-2 coding region was found to be 5.1, and application of the Z test (Tamura et al. 2011) showed that this deviation from neutrality was highly significant (p = 0.003).

3.3 Three dimensional (3D) model of LEC-2

A 3D model for P. membranacea LEC-2 was generated based on the crystal structure of a galectin from A. aegerita (PDB accession 2ZGP_A). Of seven canonical positions associated with ligand binding (human galectin-1 numbering; Hirabayashi and Kasai 1988), LEC-2 showed conservation in five positions and substitutions at two: H44S and R73L/S. In this model, amino acid replacements were found mainly on the well-exposed peripheral surface of the protein, rather than in the predominant β-sheet structures (Figs. 2 and 3). About half of the polymorphic sites were on the periphery of the carbohydrate-binding pocket and one-third were in a second, fairly localized area.

Fig. 3

(a, b) Two views of a 3D model of LEC-2. A ball and stick structure representing the residue of the reference sequence is shown at each polymorphic position; substitutions are indicated by a one-letter amino acid code and galectin signature residues of the carbohydrate recognition domain are represented as cylinders. View (b) is rotated 90° clockwise relative to view (a) and tilted slightly

3.4 Correlation of LEC-2 haplotypes and photobiont rbcLX genotypes

The association of particular mycobiont and photobiont strains was investigated by examining the correlation between the LEC-2 haplotype and the genotype of the corresponding photobiont. Since the genome of the Nostoc is comprised of a single chromosome, i.e. one linkage group for all loci, the sequence of one locus such as rbcLX can be used as a proxy for its strain type, and by inference, its possible lectin ligand. Thus, pairs of sequences covering the coding region of lec-2 from the mycobiont and the rbcLX region of the photobiont from each of the 21 individually collected P. membranacea thalli were obtained and the sequences for each marker were aligned and used for constructing phylogenetic trees. The two trees showed different topologies (Fig. 4). Different Nostoc strains were found to be associated with a single LEC-2 haplotype; e.g. Nostoc strains associated with the most common LEC-2 haplotypes, 1 and 9, were distributed throughout the rbcLX phylogenetic tree. This suggested a low specificity for the photobiont, with a broad range of photobionts being compatible with some mycobiont LEC-2 haplotypes.

Fig. 4

Phylogenetic trees of LEC-2 haplotype sequences (left) and rbcLX sequences (right). Bar indicates substitutions per site. Numbers at the branch points show approximate likelihood ratios. Haplotype numbers (square brackets) are shown for each clade. Dotted lines connect sequences from the same sample

4 Discussion

Lectin-like genes have been identified in various organisms including slime molds (Huh et al. 1998) and fungi (Nowrousian and Cebula 2005; Singh et al. 2010, 2011; Miao et al. 2012). In this study, we describe the lec-2 gene from the mycobiont of the lichen P. membranacea. The high sequencing coverage in the Pmb-WGS and typical eukaryotic sequence features, including the presence of an intron and fungal splicing signals, support the conclusion that lec-2 is a mycobiont gene. Alignment of LEC-2 with LEC-1 (Miao et al. 2012) revealed 38 % identity and the primary structure of the LEC-2 sequence showed similarity to fungal galectins CGL1 and CGL2 from C. cinerea, both involved in fruiting body development (Walser et al. 2005). The structural model of LEC-2, based on a galectin from A. aegerita, showed galectin-type ligand coordinating residues around a presumptive carbohydrate-binding pocket (Fig. 3).

In contrast to the total lack of SNPs in the previously described lec-1 (Miao et al. 2012), considerable sequence diversity was found in the coding region of lec-2 when data from the Pmb-WGS project were examined. While intra-lobe samples consistently yielded a single restriction pattern, a total of 14 amino acid substitutions was found in 21 samples from 14 geographical locations. This number is high compared with what has been described in other taxa of lichen-forming fungi (Werth 2010), given that we examined only 21 thalli, suggesting that there may be strong selection for changes in the functional properties of the LEC-2 protein. Modeling of the LEC-2 structure showed that the substitutions occurred primarily at well-exposed sites on the peripheral surface of the protein (Fig. 3). Amino acids characteristic of the galectin carbohydrate recognition domain (R48, N61, W68 and E71; Fig. 2) were found near the proposed sugar-binding pocket, but substitutions in the carbohydrate recognition domain motif were identified at positions H44S, V59I and R73L/S. Replacements at H44 and R73 have been observed in a galectin found in the mycorrhizal basidiomycete Laccaria amethystina (Lyimo et al. 2011), while the conservative replacement V59I has been reported in various galectins, including that of A. aegerita (Yagi et al. 2001). In the galectin from C. cinerea, a W68R substitution in the carbohydrate recognition domain is known to be critical in determining chitooligosaccharide versus lactose binding (Wälti et al. 2008), while in a galectin from Agrocybe cylindracea, residues upstream of the carbohydrate recognition domain motif were influential in determining sialic acid binding (Ban et al. 2005). In the galectin of A. aegerita, substitutions H59A and R63H affected the lactose binding ability (Yang et al. 2009). Thus, substitutions at or close to the galectin carbohydrate recognition domain in LEC-2 are likely to alter the specificity of ligand binding, possibly enabling discrimination of surface sugars on an interacting organism. Additionally, amino acid substitutions at the surface of the protein may alter multimerization, which is often involved in effective ligand binding and pattern recognition (Sharon and Lis 2004).

Comparison of the relative frequencies of synonymous and nonsynonymous nucleotide substitutions indicated significant positive selection in LEC-2. The selection is most likely due to an important interaction with a factor that shows clear variation, ultimately expressed in differences in ligand coordinating residues in the carbohydrate recognition domain. This driving force for selection on lec-2 is unlikely to be ascribed to physical factors such as climate or niche, as multiple haplotypes exist within a topographically homogeneous sampling site (Keldnagil), and the same haplotype can be found at geographically distant sites representing both coastal and highland habitats. It appears more likely that the selective response is due to biotic factors associated with symbiont partners, pathogens or parasites.

The hypothesis that evolution of LEC-2 is driven by interaction with different strains of Nostoc photobionts was tested by comparing the mycobiont LEC-2 haplotypes and the photobiont-derived rbcLX sequences. For genotyping of Nostoc symbionts, we chose the rbcLX sequence marker, since it has been shown to be more suitable for Nostoc typing than 16S rDNA and tRNALeu (intron) sequences (Otálora et al. 2010). We found no significant correspondence between the Nostoc genotypes and the mycobiont LEC-2 haplotypes. Thus, LEC-2 does not seem to be a determinant of photobiont partner choice in P. membranacea. As the gene(s) coding for the LEC-2 ligand are unknown, there is uncertainty about their linkage to rbcLX, but we assumed linkage within each Nostoc strain, as a low degree of recombination is generally observed in clonal bacterial populations (Didelot and Maiden 2010).

Future experimental determination of the nature of the LEC-2 ligand(s) and various LEC-2 haplotypes may help shed light on the interaction of LEC-2 and its ligand, and the role it plays in the biology of P. membranacea. Further research should also focus on the distribution of lec-2 homologs in the genus Peltigera as well as the origin and divergence of lec-2 in P. membranacea.