Background

Potato (Solanum tuberosum L.) is one of the most widely consumed carbohydrate-rich staple foods in large parts of the world; it is the fourth largest food crop in production [1]. Potato is mainly used as a staple food, but it also has a number of medicinal values. Moderate consumption of the juice from the tubers is used in the treatment of peptic ulcers, bringing relief from pain and acidity [2].

Pathogenesis-related proteins, often called PR proteins, are a structurally diverse group of plant proteins that are toxic to invading fungal pathogens. They are widely distributed in plants in trace amounts, but are produced in much greater concentrations following pathogen attack or stress. PR proteins exist in plant cells intracellularly and also in the intercellular spaces, particularly in the cell walls of different tissues. Varying types of PR proteins have been isolated from each of several crop plants. Different plant organs, e.g., leaves, seeds, and roots, may produce different sets of PR proteins. Different PR proteins appear to be expressed differentially in their hosts in the field when temperatures become stressful, low or high, for extended periods [3].

The several groups of PR proteins have been classified according to their function, serological relationship, amino acid sequence, molecular weight, and certain other properties. PR proteins are either extremely acidic or extremely basic and therefore are highly soluble and reactive. At least 14 families of PR proteins are recognized. Among these, glucan endo-1,3-beta-glucosidases (β-1,3-glucanases) are an important group of hydrolytic enzymes that become abundant in many plant species after infection by different types of pathogens. Their abundance increases significantly after infection, and they play a major role in the defense reaction against fungal pathogens by degrading the fungal cell wall, because β-1,3-glucan is a structural component of the cell walls of many pathogenic fungi. Glucan endo-1,3-beta-glucosidases appear to be coordinately expressed with chitinases after fungal infection. This co-induction of the two hydrolytic enzymes has been described in many plant species, including pea, bean, tomato, tobacco, maize, soybean, potato, and wheat [4,5,6,7,8,9,10,11]. In addition to their roles in pathogen defense, glucan endo-1,3-beta-glucosidases have been implicated in cell division, pollen development, pollen tube growth, regulation of plasmodesmata signaling, cold response, seed germination, and maturation [12].

Glucan-1,3-beta-glucosidases form highly complex and diverse gene families in plants, and a single plant species may have multiple copies of glucan-1,3-beta-glucosidase genes [12]. The glucan-1,3-beta-glucosidases are enzymes that cleave the beta-glycosidic linkages of glucans. They can be divided into two groups, exo- and endo-hydrolases. The exo-hydrolases catalyze the hydrolysis of the beta-glucan chain by sequentially cleaving glucose residues from the non-reducing end, releasing glucose as the sole hydrolysis product. The endo-hydrolases cleave β-linkages at apparently random sites along the polysaccharide chain, releasing smaller oligosaccharides [13]. Glucan-1,3-beta-glucosidase is important for delaying the growth of pathogenic fungi and for decreasing the damage caused by disease in fruits. This application is possible because the cell walls of certain microorganisms contain β-glucans [14].

Many studies have shown that the synthesis of glucan endo-1,3-beta-glucosidase is stimulated when plants are infected by fungal, bacterial, or viral pathogens, and that its concentration increases dramatically. For instance, glucan endo-1,3-beta-glucosidase mRNA accumulated to higher levels in tomato leaves infected with the fungal pathogen Cladosporium fulvum [15], and similar induction has been reported in barley infected with powdery mildew [16], maize infected with Aspergillus flavus [17], pepper infected with Phytophthora capsici, wheat infected with Fusarium graminearum [11], chickpea infected with Ascochyta rabiei (Pass.) Labr. [18], and peach infected with Monilinia fructicola [19]. Scientists throughout the world have tried to analyze or predict the regulatory elements of pathogenesis-related genes in higher plants whose expression products have an inhibitory effect on microorganisms such as fungi. However, only a small percentage of PR genes have been investigated.

To the best of our knowledge, no report has evaluated the regulatory elements of glucan endo-1,3-beta-glucosidase genes in potato (Solanum tuberosum L.). Moreover, owing to the crucial roles of these genes in the plant defense system, it is imperative to analyze their promoter regions and regulatory elements in Solanum tuberosum. This knowledge will contribute to our understanding of the expression profiles and regulatory mechanisms of glucan endo-1,3-beta-glucosidase genes. It also provides a promising target for genetic engineering to improve glucan endo-1,3-beta-glucosidase expression in potato, raise the level of the defense response against fungal pathogens, and develop disease-resistant transgenic potato, which is an environmentally friendly approach to disease control.

Methods

A total of 27 whole genome shotgun gene sequences of glucan endo-1,3-beta-glucosidase for Solanum tuberosum cultivar DM 1-3 516 R44 were retrieved from the NCBI database available at https://www.nlm.nih.gov/gene. Of these, 19 were selected for analysis. The remaining eight gene sequences were excluded because, when checked with CLC Genomics Workbench ver. 3.6.1 (http://clcbio.com, CLC bio, Aarhus, Denmark), they did not have a functional gene structure: many stop codons appeared in the middle of the sequence and the reading frame was highly fragmented (Table 1).

Table 1 List of the glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44 selected for analysis

Finding of transcription start sites and determination of promoter sequence

Glucan endo-1,3-beta-glucosidase gene sequences of Solanum tuberosum cultivar DM 1-3 516 R44 were downloaded in FASTA format from the NCBI Genome Browser, and the 1-kb DNA sequences upstream of the ATG were used as input for determining the transcription start sites (TSSs) of the retrieved genes. TSSs were predicted with the Neural Network Promoter Prediction tool (NNPP version 2.2), available at https://www.fruitfly.org/seq_tools/promoter.html [20], using the minimum standard predictive score (scores range between 0 and 1). For regions containing more than one predicted TSS, the one with the highest prediction score was retained.
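The rule of retaining the highest-scoring TSS per gene can be illustrated with a minimal Python sketch. The gene labels, positions, and scores below are invented for illustration and are not taken from Table 2; the function name best_tss is hypothetical, and the 0.80 cutoff is the value reported in the Results.

```python
from typing import Dict, List, Tuple

# (TSS position relative to ATG, prediction score); hypothetical data structure
Prediction = Tuple[int, float]


def best_tss(predictions: Dict[str, List[Prediction]],
             cutoff: float = 0.80) -> Dict[str, Prediction]:
    """For each gene, keep the prediction with the highest score at or above the cutoff."""
    best = {}
    for gene, hits in predictions.items():
        passing = [hit for hit in hits if hit[1] >= cutoff]
        if passing:
            best[gene] = max(passing, key=lambda hit: hit[1])
    return best


# Invented example values, not the study's results
example = {
    "gene_A": [(-120, 0.91), (-640, 0.97), (-830, 0.85)],
    "gene_B": [(-310, 0.88)],
    "gene_C": [(-450, 0.42)],  # falls below the cutoff, so no TSS is retained
}
selected = best_tss(example)
print(selected)
```

Genes with no prediction at or above the cutoff are simply omitted from the result in this sketch; how such cases were handled in the study is not stated.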

Motif discovery and comparison of the discovered motif against a database of known motifs

Motif discovery was performed with MEME (Multiple Em for Motif Elicitation) from the MEME Suite, software version 3.5.4, available at http://meme-suite.org/tools/meme, using minimum and maximum motif widths of 6 and 50 bp, respectively, and a maximum of 3 motifs; the rest of the parameters were kept at their defaults. The MEME output was produced in HTML, as well as in several other formats. The motif with the lowest E-value was compared against a database of known motifs using TOMTOM, which ranks the motifs in the database and produces an alignment for each significant match [21]. For each query, TOMTOM reports a list of target motifs ranked by the p-value and q-value of each match [22]. TOMTOM also displayed putative transcription factors (TFs) whose binding motifs resemble the motif discovered in the glucan endo-1,3-beta-glucosidase genes. Finally, after identification of the putative TFs interacting with the DNA motif, the roles of the TFs were described.

CpG island analysis

Sequences of 2000 bp upstream of the ATG for each glucan endo-1,3-beta-glucosidase gene of Solanum tuberosum cultivar DM 1-3 516 R44 were downloaded in FASTA format from NCBI (https://www.ncbi.nlm.nih.gov/), and CpG islands (CGIs) were predicted using CLC Genomics Workbench ver. 3.6.1 (available at http://clcbio.com, CLC bio, Aarhus, Denmark). Searching for MspI cutting sites (fragment sizes between 40 and 220 bp) is relevant for the detection of CGIs: studies using whole-genome CpG island libraries prepared for different species revealed that CpG islands are not randomly distributed but are concentrated in particular regions, and CpG-rich regions can be isolated as short fragments after digestion with MspI, which recognizes CCGG sites [23]. The parameter settings were as follows: guanine and cytosine (GC) content ≥ 55%, observed-to-expected CpG ratio (Obs CpG/Exp CpG) ≥ 0.65, and length ≥ 500 bp [24].
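The three thresholds can be made concrete with a short Python sketch. It assumes the commonly used definition Obs/Exp CpG = (number of CpG × sequence length) / (number of C × number of G) and, as a simplification, tests a whole sequence rather than scanning it with a sliding window. It does not reproduce the CLC Genomics Workbench procedure, and the function names are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    # Assumed definition: Obs/Exp = (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def is_cpg_island(seq: str, min_gc: float = 0.55,
                  min_obs_exp: float = 0.65, min_len: int = 500) -> bool:
    """Apply the three thresholds stated in the Methods to one candidate region."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp


# Artificial test sequences, not promoter data from the study
print(is_cpg_island("CG" * 250))  # long and CpG-rich
print(is_cpg_island("AT" * 250))  # long but contains no C or G
```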

Mining glucan endo-1,3-beta-glucosidase genes for simple sequence repeats

The 19 query sequences of glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44 were screened for di-, tri-, tetra-, penta-, and hexanucleotide simple sequence repeat (SSR) motifs using the SSRIT tool available at the Gramene database (http://www.gramene.org/db/searches/ssrtool). The output reported the repeat motif, number of repeat units, repeat length, and SSR start and end positions [25].
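A search for perfect SSRs of this kind can be sketched with a regular expression in Python. This is an illustrative approximation and not the SSRIT implementation: the minimum repeat count (min_repeats) is a hypothetical parameter, and motifs that are themselves repeats of a shorter unit (for example ATAT, or a run of a single base) are skipped so that each repeat is reported once under its shortest motif.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6,
              min_repeats: int = 3):
    """Return (motif, repeat count, start, end) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base unit followed by at least (min_repeats - 1) exact copies of itself
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if not is_primitive(motif):
                continue
            repeats = len(match.group(1)) // k
            hits.append((motif, repeats, match.start() + 1, match.end()))
    return hits


# Artificial sequence containing (AT)5 and (CAG)4
demo = "GGC" + "AT" * 5 + "CCGTA" + "CAG" * 4 + "TT"
for hit in find_ssrs(demo):
    print(hit)
```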

Phylogenetic relationship analysis

The phylogeny was inferred using the UPGMA method [26]. The analysis involved 40 glucan endo-1,3-beta-glucosidase gene sequences selected from Solanum tuberosum, Nicotiana tabacum, Solanum lycopersicum, and Arabidopsis thaliana [26]. The genetic distances were computed using the p-distance method [27]. Codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The phylogenetic analysis and the computation of genetic distances, conserved sites, variable sites, and base composition of the gene sequences were conducted using Molecular Evolutionary Genetics Analysis X32 (MEGA X32), available at https://www.megasoftware.net/ [28].
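The two computations named above can be illustrated with a minimal Python sketch: the p-distance is the proportion of compared sites at which two aligned sequences differ, with sites carrying a gap or ambiguous base in either sequence dropped for that pair (pairwise deletion), and UPGMA repeatedly merges the two closest clusters using size-weighted average distances. This is a teaching sketch on toy sequences, not a reproduction of the MEGA implementation; branch lengths are omitted and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among sites where both sequences have A, C, G, or T."""
    valid = set("ACGT")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in valid and y in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Cluster by UPGMA; dist maps frozenset({i, j}) to a distance. Returns a nested tuple."""
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        for c in clusters:
            # Size-weighted average of the distances from the merged members to c
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy aligned sequences, not the study's data
seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "AAAAATTTTT"}
dist = {frozenset((x, y)): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)}
tree = upgma(list(seqs), dist)
print(tree)
```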

Results

Finding of transcription start sites and determination of promoter sequence

Transcription start sites (TSSs) predicted for each of the 19 genes studied are presented in Table 2. Each glucan endo-1,3-beta-glucosidase gene of Solanum tuberosum cultivar DM 1-3 516 R44 had between one and three predicted TSSs. The predictive score was 0.90 or above for the majority, 16 (84.2%), of the promoter regions. The highest promoter prediction score (1.0) was obtained for only two gene sequences (Pro-102604922 and Pro-102581946), while none of the sequences had the lowest score (0.8) (Table 2). In addition, at a cutoff value of 0.80, the majority, 12 (63.2%), of the gene sequences showed only one TSS, while 7 (36.8%) revealed multiple TSSs.

Table 2 Number and predictive score of TSSs for glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44

In general, the TSSs were located between −79 and −2900 bp relative to the translation start codon (ATG). They occurred most often in the region beyond −1000 bp (5 sequences), followed by the −201 to −400 bp and −601 to −800 bp regions (4 sequences each), −1 to −200 bp (3 sequences), and −401 to −600 bp (2 sequences); the lowest occurrence was at −801 to −1000 bp (1 sequence).
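The binning used for this summary (200-bp windows upstream of the ATG, with everything beyond −1000 bp pooled) can be sketched in Python as follows. The positions in the example are invented, since the individual TSS coordinates are reported in Table 2 and not reproduced here, and the function name bin_label is hypothetical.

```python
from collections import Counter


def bin_label(pos: int, width: int = 200, cap: int = 1000) -> str:
    """Assign an upstream position (negative, relative to ATG) to a distance window."""
    dist = abs(pos)
    if dist > cap:
        return f"beyond -{cap}"
    lower = ((dist - 1) // width) * width + 1
    return f"-{lower} to -{lower + width - 1}"


# Invented positions for illustration only
positions = [-79, -150, -230, -399, -450, -640, -990, -1200, -2900]
counts = Counter(bin_label(p) for p in positions)
for window, n in counts.items():
    print(window, n)
```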

Discovery of common motifs and associated TFs in the promoter regions

In the current study, five candidate motifs shared by the glucan endo-1,3-beta-glucosidase gene promoter sequences of Solanum tuberosum cultivar DM 1-3 516 R44 were discovered (Table 3). The majority of the discovered common motifs were concentrated between +1 and −500 bp relative to the TSSs. MEME generated common candidate motifs for 18 of the 19 gene promoter sequences. The discovered motifs were distributed on both strands, with 30 on the positive strand and 25 on the negative strand (Fig. 1).

Table 3 Identified common candidate motifs in Solanum tuberosum DM 1-3 516 R44 glucan endo-1,3-beta-glucosidase gene promoter regions
Fig. 1 The discovered motifs in glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44

To identify a functionally important candidate common promoter motif, the motif shared by the majority of the promoter regions of Solanum tuberosum glucan endo-1,3-beta-glucosidase genes was selected. Among the five motifs, MβGII was identified as the common promoter motif, shared by 94.4% of Solanum tuberosum glucan endo-1,3-beta-glucosidase promoters. A common promoter motif serves as a binding site for transcription factors involved in the expression and regulation of these genes. A sequence logo for MβGII generated by MEME is presented in Fig. 2. To obtain more information on the MβGII motif of the potato (Solanum tuberosum DM 1-3 516 R44) glucan endo-1,3-beta-glucosidase genes, MβGII was compared with registered motifs in publicly available databases to determine whether it is similar to known regulatory motifs.

Fig. 2 Sequence logo for the identified common motif MβGII for glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44

Discovery of matches to the query motif

Among the five common candidate motifs discovered, MβGII, with an E-value of 3.5e−001, was used as the query motif for comparison against the JASPAR2018_CORE_vertebrates non-redundant and uniprobe_mouse databases of known motifs using the TOMTOM web application [21]. The analysis showed that the query motif MβGII serves as a putative binding site for 8 transcription factors, namely MA0016.1 (usp), MA0359.1 (RAP1), MA0159.1 (RARA::RXRA), MA1149.1 (RARA::RXRG), MA0258.2 (ESR2), UP00070_2 (Gcm1_secondary), MA0450.1 (hkb), and MA0801.1 (MGA). According to the UniProt protein database, the identified TFs act as receptors for their target ligands, regulate gene expression in various biological and developmental processes, are involved in cell adhesion and cell junction formation, and act as repressors or activators (Table 4).

Table 4 List of matches to the query motif from the JASPAR2018_CORE_vertebrates non-redundant and uniprobe_mouse databases

CpG island analysis

In the present study, CpG islands in the promoter regions were investigated using an in silico digestion method (with the restriction enzyme MspI), and the result showed low CpG density in the investigated regions. Fragments were observed only in gene ID: 102593331 and 102595860 (Table 5). The low density of CpG islands might be associated with selective gene expression in specific tissues.
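An in silico MspI digestion of this kind can be sketched in Python. The sketch assumes that MspI cuts within its CCGG recognition site after the first cytosine (C^CGG) and, as a design choice, reports only fragments bounded by two MspI sites, keeping those within the 40 to 220 bp window given in the Methods. It is not the CLC Genomics Workbench procedure, and the function name is hypothetical.

```python
import re


def mspi_fragment_sizes(seq: str, min_len: int = 40, max_len: int = 220):
    """Sizes of fragments lying between consecutive MspI cuts, filtered by length."""
    seq = seq.upper()
    # Cut position falls after the first C of each CCGG site (assumed C^CGG)
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    return [s for s in sizes if min_len <= s <= max_len]


# Artificial sequence with two MspI sites separated by 50 bases
demo = "A" * 10 + "CCGG" + "A" * 50 + "CCGG" + "A" * 10
print(mspi_fragment_sizes(demo))
```

A sequence with fewer than two CCGG sites yields an empty list in this sketch, which corresponds to a promoter in which no fragment is observed.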

Table 5 MspI cutting sites and fragment sizes in the promoter regions of glucan endo-1,3-beta-glucosidase genes

SSR motif occurrence in sequences

In the present study, 265 different SSR motifs, ranging in size from 2 to 6 nucleotides (dimer to hexamer) and in number from 2 to 9 per gene, were detected in the examined gene sequences of Solanum tuberosum cultivar DM 1-3 516 R44 (supplementary table 1). Dimer motifs such as ac, at, ag, ca, ct, ga, gt, ta, and tc were found in the majority (95%) of the gene sequences. Given the large number of tandem repeats present, they are likely to affect the glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44. Gene sequences with the highest number of dimer repeats are shown in Table 6.

Table 6 Gene sequences with the highest number of dimer repeats

Genetic divergence among gene sequences from different plant species

The genetic distance was assessed using 40 gene sequences (supplementary table 2). A total of 5812 positions were present in the final dataset. The genetic distance among the gene sequences ranged from 0.685 to 0.770. Gene ID:102605428 and ID:102578810, both from Solanum tuberosum, showed the smallest genetic distance (0.685). The highest genetic distance (0.770) was estimated both between ID:102581946 in Solanum tuberosum and ID:832156 in Arabidopsis thaliana and between ID:107820469 in Nicotiana tabacum and ID:834215 in Arabidopsis thaliana. The overall mean genetic distance was 0.73, which indicates a narrow range of genetic diversity among the sequences. The distance matrix is shown in supplementary table 3.

Phylogenetic relationships of glucan endo-1,3-beta-glucosidase gene sequences

The phylogenetic tree resolved seven clusters: cluster I comprised 9 gene sequences, 3 from Nicotiana tabacum, 2 from Arabidopsis thaliana, 3 from Solanum tuberosum, and 1 from Solanum lycopersicum; cluster II comprised 8 gene sequences, 5 from Nicotiana tabacum, 2 from Solanum tuberosum, and 1 from Solanum lycopersicum; cluster III comprised 7 gene sequences, 5 from Solanum tuberosum, 1 from Nicotiana tabacum, and 1 from Arabidopsis thaliana; cluster IV comprised 4 gene sequences, 2 from Arabidopsis thaliana, 1 from Nicotiana tabacum, and 1 from Solanum tuberosum; cluster V comprised 3 gene sequences, all from Solanum tuberosum; cluster VI comprised 4 gene sequences, 2 from Nicotiana tabacum, 1 from Solanum lycopersicum, and 1 from Solanum tuberosum; and cluster VII comprised 2 gene sequences, both from Solanum tuberosum. In addition, two gene sequences from Solanum tuberosum and one from Arabidopsis thaliana did not fall into any cluster (Fig. 3).

Fig. 3 UPGMA phenogram illustrating the relationships among the glucan endo-1,3-beta-glucosidase gene sequences grouped by gene ID and scientific name

Multiple sequence alignment of the gene sequences

The multiple sequence alignment was conducted using the Clustal Omega algorithm available online at https://www.ebi.ac.uk/Tools/msa/. Pairwise values ranged from 24.4% (between ID107820469 and ID102605428) to 95.2% (between ID107803828 and ID107824944), as shown in supplementary table 4. The number of conserved sites, the number of variable sites, and the frequency of each nucleotide base are given in Table 7. Gene ID102601178 in Solanum tuberosum had the lowest proportions of both conserved sites (7.5%) and variable sites (20.7%), whereas gene ID102589208 in Solanum tuberosum had the highest proportion of conserved sites (28.8%) and gene ID832156 in Arabidopsis thaliana had the highest proportion of variable sites (76.1%).
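The distinction between conserved and variable sites can be illustrated with a short Python sketch that classifies the columns of an alignment. It assumes that a column is conserved when all sequences carry the same base and variable when at least two different bases occur, and it skips columns containing a gap or an ambiguous base; how MEGA treats such columns in the reported counts is not stated, so this handling is an assumption, and the function name is hypothetical.

```python
def classify_sites(alignment):
    """Count (conserved, variable) columns among columns made up only of A, C, G, T."""
    valid = set("ACGT")
    conserved = 0
    variable = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column}
        if not bases <= valid:
            continue  # assumed: columns with gaps or ambiguity codes are ignored
        if len(bases) == 1:
            conserved += 1
        else:
            variable += 1
    return conserved, variable


# Toy alignment of three sequences, not the study's data
toy = ["ACGT-A", "ACGA-A", "ACTTAA"]
print(classify_sites(toy))
```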

Table 7 Number of conserved sites, variable sites, and frequency of each nucleotide

Discussion

Identifying the transcription start site (TSS) enables prediction of the promoter region and thus simplifies the subsequent analysis of gene expression. In the present in silico analysis, the number of TSSs per gene sequence was 1 to 3, and the majority, 12 (63.2%), of the gene sequences had a single transcription start site, consistent with a previous report [29] in which 62.1% of the gene sequences contained a single TSS. However, most in silico studies have reported that most genes have more than one TSS [30,31,32,33,34]. The present study also revealed that 42% of the TSSs were located below −500 bp relative to the ATG, whereas several authors reported that the TSSs of the majority (>50%) of the gene sequences studied were located below −500 bp relative to the ATG [35,36,37,38].

Patterns of gene expression (conditional or temporal) have been linked to transcriptional regulation [39]. Common promoter motifs are short DNA segments that serve as binding sites for TFs involved in the regulation of gene expression [31]. In the present study, the common promoter motif was found in 18 (94.4%) of the promoter sequences investigated. Some studies reported that a common promoter motif was shared by all of the promoter sequences (100%) [29, 32]. Comparison of the query motif against known motifs showed that it serves as a binding site for 8 transcription factors, which are involved in the regulation of gene expression as receptors, transcription factors, or repressors in various biological processes (Table 4).

Several studies reported that CpG islands (CGIs) play an important role in the regulation of gene expression [40]. DNA of plant species has been shown to contain more CpG dinucleotides than human DNA [41]. Methylation of cytosine at CpG islands has been shown to restrict the access of transcription factors to the promoter regions of genes, hence preventing their expression [42]. Consistent with the present analysis, low CpG content was reported in the promoter region of rice PR2 (beta-1,3-glucanase) genes, and none was identified in the promoter regions of any of the Arabidopsis thaliana PR gene families [43]. The absence of CpG islands in the glucan endo-1,3-beta-glucosidase (PR2) genes might be indicative of tissue-specific gene expression. Ferguson and Jiang [44] also showed that the genomes of dicots such as potato have a lower CpG density than those of monocots. Conversely, Gardiner-Garden and Frommer [45] reported that, in plants, high-density CpG islands tended to lie near the 5′-ends (towards the promoter region) of housekeeping genes, which is associated with the broad expression of these genes.

In the current study, the cluster analysis showed that gene sequences from different plant species clustered together. In our results, the proportion of conserved sites ranged between 7.5 and 28.8%, while the proportion of variable sites ranged between 20.7 and 76.1%. Although the range of variable sites was wider than that of the conserved sites, the phylogeny showed the opposite relationship.

In the present study, the SSR motifs ranged in size from 2 to 6 nucleotides (dimer to hexamer), and the number of SSR motifs per gene ranged from 2 to 9. The SSR motif analysis also revealed no significant variation in the repeat number of the SSR motifs between gene sequences of the different plant species and no differences in the repeated SSR motifs between gene sequences within species. The presence of SSRs within genes is known to (i) lead to a gain or loss of gene function, (ii) affect transcription and translation, (iii) alter mRNA splicing, or (iv) affect mRNA export to the cytoplasm. All these effects eventually lead to phenotypic changes [42]. Most often, the SSR motif does not exceed nine nucleotides in length, and such repeats are referred to as short tandem repeats (STRs), SSRs, or microsatellites. Short tandem repeats are associated with a higher frequency of mutation, affecting DNA sequence composition and length [46].

CGIs are known to concentrate near the transcription start sites (TSSs) of genes. Genes that possess CGIs are often highly expressed in multiple tissues. In the current study, CpG island analysis of the promoter region showed a low density of CpG islands. Low CpG island density could be one reason for the lack of divergence between gene sequences. According to Prendergast et al. [47], CpG island-poor regions are not subjected to evolutionary divergence. Moreover, because the number of repetitions of SSR motifs did not differ significantly between gene sequences of the different plant species, and the repeated SSR motifs did not differ between gene sequences within species, the phylogenetic analysis did not show a clearly defined phylogenetic relationship. Therefore, further analysis of CpG islands, their concentration near the TSSs of genes, and their involvement in evolutionary divergence will pave the way for a greater understanding of their roles in gene expression and gene evolution.

Conclusion

The major aim of this work was to explore regulatory elements that can determine the expression of glucan endo-1,3-beta-glucosidase genes of Solanum tuberosum cultivar DM 1-3 516 R44. The study identified transcription factors that serve as receptors, activators, and/or repressors of glucan endo-1,3-beta-glucosidase genes. In addition, transcription start sites, promoter regions, SSR motifs, and CpG islands that play a role in the regulation of glucan endo-1,3-beta-glucosidase gene expression were identified. The phylogenetic analysis revealed that the clustering patterns of the gene sequences were not entirely based on taxa. In general, this in silico analysis contributes to the understanding of the regulatory mechanisms involved in glucan endo-1,3-beta-glucosidase gene expression and helps to identify gene regulatory elements in the promoter regions.