Identification and evolution of C4 photosynthetic pathway genes in plants

Shi, Weiping; Yue, Linqi; Guo, Jiahui; Wang, Jianming; Yuan, Xiangyang; Dong, Shuqi; Guo, Jie; Guo, Pingyi

doi:10.1186/s12870-020-02339-x

Research article
Open access
Published: 30 March 2020

BMC Plant Biology, Volume 20, article number 132 (2020)

Abstract

Background

NADP-malic enzyme (NADP-ME) and pyruvate orthophosphate dikinase (PPDK) are important enzymes that participate in C₄ photosynthesis. However, the evolutionary history of these genes in C₄ plants, and the forces driving it, are not completely understood.

Results

We identified 162 NADP-ME and 35 PPDK genes in 25 species and constructed a phylogenetic tree for each gene family. NADP-ME genes were classified into four branches, A1, A2, B1 and B2, whereas PPDK genes were classified into two branches, with monocots in branch I and dicots in branch II. Analyses of selective pressure on the NADP-ME and PPDK gene families identified four positively selected sites at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK. The positively selected sites were located in helix and sheet regions. Quantitative RT-PCR (qRT-PCR) analyses revealed that expression of 6 NADP-ME and 2 PPDK genes from foxtail millet was up-regulated after exposure to light.

Conclusion

This study identified positively selected sites in NADP-ME and PPDK during their evolution in C₄ plants. It provides information on the classification and positive selection of plant NADP-ME and PPDK genes, and the results should be useful for further research on the evolutionary history of C₄ plants.

Background

Photosynthesis is the process by which plants convert solar energy into chemical energy, enabling them to produce their own food for development [1]. Based on how carbon is fixed, which determines the initial products of photosynthesis, photosynthesis in higher plants is classified into C₃, C₄ and Crassulacean acid metabolism (CAM) types. Most land plants use the C₃ pathway, and both C₄ and CAM plants evolved from C₃ ancestors [2, 3]. C₄ plants use CO₂ more efficiently than C₃ plants, which makes them better adapted to subtropical and tropical environments, low CO₂ concentrations, and stressful conditions [4]. Numerous studies have focused on understanding the efficiency and mechanism of carbon fixation in C₄ plants [5, 6].

Among the many enzymes involved in the C₄ photosynthetic pathway, PPDK and NADP-ME are considered to be the most important [7, 8].

PPDK is a critical enzyme controlling the photosynthetic rate in C₄ plants [9]. Many PPDK genes have been cloned from C₄ and CAM plants, for example from maize and Mesembryanthemum crystallinum [10, 11]. A phylogenetic study suggested that the PPDK genes of sorghum and rice are homologous [12]. Detailed comparison of PPDK isoform sequences between the Poaceae and Arabidopsis showed that they share about 20 amino acids of the chloroplast transit peptide (cTP), indicating that the PPDK genes evolved before the divergence of monocots and dicots [12].

NADP-ME genes can be classified into photosynthetic and non-photosynthetic types. Photosynthetic NADP-MEs function mostly in chloroplasts [13]. They improve photosynthetic efficiency by releasing CO₂ through the decarboxylation of malate in bundle-sheath cells and, in C₄ plants, by supplying this CO₂ to Rubisco for carbon fixation [14, 15]. Genomic and phylogenetic analyses showed that the NADP-ME gene family in the Poaceae has four branches, one of which (NADP-ME IV) is expressed in plastids. Some codons of the C₄-specific NADP-ME have evolved under positive selection, and this gene is independent of the NADP-ME IV family [16, 17].

Natural selection, a key factor in biological evolution, includes positive selection, purifying selection, and neutral evolution [18]. The ratio of non-synonymous to synonymous substitution rates (ω = dN/dS) is an index of selection pressure and is typically used to infer the direction and strength of selection on a coding sequence. A value of ω > 1 suggests that a gene may be under positive selection, that is, new amino acid variants confer a fitness advantage; ω = 1 indicates neutral evolution; and ω < 1 indicates purifying selection [19]. As an important basis of adaptive evolution, positive selection acts by increasing the frequency of an advantageous mutant allele in a population [18].
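
As a worked illustration of ω = dN/dS, the sketch below uses a simple counting approach (in the spirit of Nei and Gojobori, with a Jukes-Cantor correction for multiple hits). It is not the codon-based maximum-likelihood method of PAML used in this study, and the site and difference counts are hypothetical; in practice they come from classifying each codon position of an aligned pair as synonymous or non-synonymous.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits: d = -3/4 ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """Return dN/dS from counts of non-synonymous (N) and synonymous (S) differences and sites."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0.0:
        raise ZeroDivisionError("no synonymous divergence; omega is undefined")
    return d_n / d_s


def classify(w: float, tol: float = 1e-9) -> str:
    """Map an omega value onto the three selection regimes described in the text."""
    if w > 1.0 + tol:
        return "positive selection"
    if w < 1.0 - tol:
        return "purifying selection"
    return "neutral"


# Hypothetical counts for one pair of coding sequences
w = omega(n_diffs=12, n_sites=900, s_diffs=30, s_sites=300)
print(f"omega = {w:.3f} -> {classify(w)}")
```

Here synonymous differences are proportionally far more frequent than non-synonymous ones, so ω falls well below 1, the pattern the text associates with purifying selection.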

Positive selection often implies the emergence of a new function [19, 20]. In the transition from the C₃ to the C₄ pathway, positive selection mainly acted on key enzymes of C₄ photosynthesis, such as Rubisco, phosphoenolpyruvate carboxylase (PEPC), NADP-ME, and PPDK [12, 21,22,23,24,25,26]. For example, two positively selected large subunit (LSu) amino acid substitutions, M309I and D149A, distinguish C₄ Rubiscos from those of the ancestral C₃ species [21]. With the switch to C₄, 21 amino acids evolved under positive selection and converged to similar or identical amino acids in most grass C₄ PEPC lineages [22]. Acquisitions of C₄ PEPC in sedges (Cyperaceae) were driven by positive selection on at least 16 codons [23]. Previous studies used amino acid variation to study rates of evolution in the C₄ NADP-ME pathway, and a number of residues were found to be under significant positive selection [24]. During the independent evolution of NADP-ME in C₄ plants, strong positive selection led to sequence convergence [25]. For example, among 29 residues examined in C₄ and non-C₄ NADP-MEs, residues 284, 450 and 539 were identified as having been under positive selection during the evolution of C₄ NADP-ME in grasses, suggesting that they are important in explaining kinetic and structural differences between the C₄ and non-C₄ groups [26]. Phylogenetic analysis also suggested that the maize PPDK gene and its sorghum ortholog were under significant positive selection, implying possible functional changes [12].

The underlying molecular mechanisms of C₄ photosynthesis are poorly understood and few studies have been directed to understanding whether positive selection was associated with evolution of NADP-ME and PPDK in C₄ plants. Completion of the whole genome sequences of C₄ plants such as sorghum and maize [27, 28], and improved knowledge of photosynthetic pathways and evolution, have set a solid foundation for study of the evolution and expression of key C₄ enzyme genes. A comparison of the PPDK and NADP-ME gene families in C₄ plants could advance knowledge of the evolutionary, functional and metabolic roles of these genes during photosynthesis. This study investigated the evolutionary processes in NADP-ME and PPDK in algal, moss, Lycopodiophyta, monocotyledon and dicotyledon species, providing new information regarding C₄ photosynthesis.

Results

Numbers of NADP-ME and PPDK genes in plants

A total of 162 NADP-ME and 35 PPDK sequences were identified in 25 species, comprising one algal, one moss, one Lycopodiophyta, 10 monocot (6 of them C₄) and 12 dicot (1 of them C₄) species (Additional file 1: Table S1; Additional file 2: Table S2). Soybean had 14 NADP-ME genes; carrot, cotton and poplar each had 9, and Selaginella moellendorffii had 3 (Additional file 1: Table S1). PPDK genes were far fewer: the largest number was 3, in the banana species Musa acuminata, and most other species had only 1 or 2 (Additional file 2: Table S2).

Analysis of conserved amino acid sequences in NADP-ME and PPDK proteins

The MEME program used to analyze conserved sequences in NADP-ME and PPDK proteins identified 20 motifs (Additional file 3: Table S3; Additional file 4: Table S4). Among NADP-ME genes, those from algae (Cre14.g629700.t1.1, Cre14.g628650.t1.2, Cre14.g629750.t2.1, Cre01.g022500.t1.2) did not contain motifs 13, 14, 15, 17 and 19. Subfamily A had two unique motifs, 17 and 19, whereas subfamily B had three unique motifs, 13, 14 and 15 (Additional file 5: Figure S1). The PPDK gene in green algae lacked motif 15, whereas all other PPDK genes had all 20 candidate motifs (Additional file 6: Figure S2).
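
The presence/absence comparison above reduces to set operations. In this minimal sketch the motif sets are hypothetical stand-ins built to mirror the reported pattern (motifs 17 and 19 only in subfamily A, motifs 13, 14 and 15 only in subfamily B, all five absent from the algal genes); in practice the sets would be parsed from MEME output.

```python
from functools import reduce

ALL_MOTIFS = set(range(1, 21))  # MEME was run with 20 motifs


def core_motifs(members):
    """Motifs shared by every member of a group (set intersection)."""
    return reduce(set.intersection, members)


def unique_motifs(group, others):
    """Motifs found in every member of `group` but in no member of `others`."""
    return core_motifs(group) - set().union(*others)


# Hypothetical per-gene motif sets
subfamily_a = [ALL_MOTIFS - {13, 14, 15}, ALL_MOTIFS - {13, 14, 15}]
subfamily_b = [ALL_MOTIFS - {17, 19}, ALL_MOTIFS - {17, 19}]
algal_gene = ALL_MOTIFS - {13, 14, 15, 17, 19}

print("unique to A:", sorted(unique_motifs(subfamily_a, subfamily_b)))
print("unique to B:", sorted(unique_motifs(subfamily_b, subfamily_a)))
print("missing from algal gene:", sorted(ALL_MOTIFS - algal_gene))
```

Requiring a motif in every member of a subfamily (rather than in any member) keeps sporadic motif hits from being counted as subfamily-specific.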

Phylogeny of NADP-ME and PPDK

We constructed a phylogenetic tree of all 162 NADP-ME genes from the 25 species and found that they shared a common ancestor. The algal NADP-ME was the most ancient gene and diverged from the rest of the clade. Subfamilies A and B separated after whole genome duplication (Additional file 7: Figure S3). In subfamily A, the algal NADP-ME gene branched off first, and the remaining genes were classified into subfamilies A1 and A2, each showing clear clustering of monocot and dicot genes. Within the monocot parts of A1 and A2, NADP-ME genes of Musa acuminata and Ananas comosus branched off before those of the Poaceae. Within the Poaceae, NADP-ME genes of C₄ plants were more closely related to each other (Fig. 1). In subfamily B, the algal NADP-ME genes again branched off first, followed by those of the land plants Physcomitrella patens and Selaginella moellendorffii. Among angiosperms, NADP-ME genes from dicots (subfamily B2) branched off first, whereas those of monocots diverged after gene duplication and formed subfamily B1, which underwent three whole genome duplication events. As in subfamily A, NADP-ME genes of Musa acuminata and Ananas comosus branched off earlier than their counterparts in the Poaceae, which formed four branches: NADP-ME-B-M1, NADP-ME-B-M2, NADP-ME-B-M3 and NADP-ME-B-M4 (Fig. 2). Overall, NADP-ME genes clustered closely within the C₃ species group and within the C₄ species group.

All 35 PPDK genes from the 25 species were used to construct a phylogenetic tree. The green algal PPDK gene branched off first, and the remaining genes diverged into subfamilies I and II, consisting of monocots and dicots, respectively. In subfamily I, PPDK genes of Musa acuminata and Ananas comosus branched off first, followed by those of the Poaceae. A whole genome duplication then occurred, forming two main branches; in one of them, barley, maize and Brachypodium distachyon showed loss of the PPDK gene or lack of a conserved PPDK structure. PPDK genes of C₄ plants were also closely related (Fig. 3).

Analysis of selection pressure on NADP-ME and PPDK genes

Selection pressures within each of the A and B subfamilies of NADP-ME genes were investigated. For subfamily A, the site models M0 and M3 were compared. Under the M0 model ω was 0.091, indicating purifying selection. The likelihood ratio test (LRT) comparing M0 and M3 gave P < 0.001, suggesting that ω values were not constant across sites (Table 1). For the branch model, seven branches, a1-a7, were designated as foreground branches, and ω values for all of them were < 1. LRTs showed that branches a1, a2 and a3 differed significantly from the other branches, with all ω values < 1, suggesting purifying selection (Fig. 1; Table 1). The branch-site model revealed that the proportions of positively selected sites on branches a1-a5 were 5, 0.2, 5.8, 11 and 1.5%, respectively, whereas those on a6 and a7 were close to 0. The numbers of positively selected sites on a1-a7 were 8, 5, 14, 5, 4, 2 and 2 at a posterior probability of 0.6. The LRT indicated that branches a1, a3 and a5 differed significantly from the M1 model (P < 0.05). Interestingly, the a1 branch, ancestral to subfamilies A1 and A2, was stabilized after positive selection at both the a1 and a3 branches. In contrast, a4 and a5 still had positively selected sites following positive selection on the a2 branch. This suggests that subfamily A1 underwent different levels of positive selection on different branches. The a5 and a7 branches comprised mostly monocots and C₄ plants (Fig. 1; Table 1). For subfamily B, the site-model ω values were similar to those of subfamily A, and the LRT indicated that subfamily B was also under purifying selection, with ω values varying among sites (Table 2).
Branches b1-b5 were under strong purifying selection, with ω values much smaller than 1 under the branch and branch-site models; the proportions of positively selected sites were 5.5, 0, 0.6, 8.9 and 1.8%, respectively (Table 2). The numbers of positively selected sites on b1-b5 were 8, 0, 2, 0 and 5 at a posterior probability of 0.6 (Fig. 2; Table 2). We concluded that b1, with 8 positively selected sites, is the most ancient branch of NADP-ME genes in subfamily B. The b3 and b5 branches had 2 and 5 positively selected sites, respectively, whereas the b2 and b4 branches, both comprising dicots, had none at a posterior probability of 0.6. This indicates that the b2 and b4 branches were more conserved than the b3 and b5 branches, and that the evolutionary steps from b1 to b3 and b5 in subfamily B were rather complex (Fig. 2; Table 2).
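
The site counts above depend on the posterior-probability cut-off (0.6 here, 0.95 for the sites examined in detail later). This sketch shows that filtering step on per-site posterior probabilities of the kind reported by the Bayes empirical Bayes (BEB) step of codeml's branch-site analysis; the positions, residues and probabilities are invented for illustration and do not come from Tables 1-3.

```python
from typing import List, Tuple

# (alignment position, residue in the reference sequence, posterior probability of omega > 1)
Site = Tuple[int, str, float]

hypothetical_sites: List[Site] = [
    (12, "A", 0.97),
    (40, "K", 0.71),
    (88, "E", 0.96),
    (150, "L", 0.62),
    (203, "S", 0.55),
]


def positively_selected(sites: List[Site], threshold: float) -> List[str]:
    """Return site labels such as '12A' whose posterior probability meets the threshold."""
    return [f"{pos}{aa}" for pos, aa, pp in sites if pp >= threshold]


for cutoff in (0.6, 0.95):
    selected = positively_selected(hypothetical_sites, cutoff)
    print(f"P >= {cutoff}: {len(selected)} sites {selected}")
```

Raising the cut-off trades sensitivity for confidence, which is why the number of sites drops sharply between the 0.6 and 0.95 thresholds.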

Table 1 Parameters in the site analysis, branch and branch-site analyses of NADP-ME-A

Table 2 Parameters in the site analysis, branch and branch-site analyses of NADP-ME-B

For the PPDK gene family, the LRT comparing the site models M0 and M3 gave P < 0.001, indicating that ω values were not constant across sites, as in the NADP-ME gene family (Table 3). In the branch model, branches a-e were designated as foreground branches; their ω values were much smaller than 1, suggesting purifying selection. Interestingly, ω increased gradually from a to e: a (0.0006) < b (0.024) < c (0.026) < d (0.078) < e (0.284). This trend suggests that PPDK genes were under strong purifying selection in lower plants before the divergence of monocots and dicots. PPDK genes were duplicated after monocots and dicots diverged from lower plants. The ancestral branches of dicots and monocots (b, c and d) were still under strong purifying selection, whereas purifying selection on branch e, which contains C₄ plants, was weaker (Fig. 3; Table 3). The branch-site model showed that the proportion of positively selected sites was close to 0 for branches a-d but 4.4% for branch e. The numbers of positively selected sites on branches a-e were 1, 1, 0, 0 and 8 at a posterior probability of 0.6, and branch e had significantly more positively selected sites than the other four branches (P < 0.0001) (Fig. 3; Table 3).
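
The model comparisons reported here are likelihood ratio tests: twice the log-likelihood difference between nested models is compared with a chi-squared distribution. This sketch assumes integer degrees of freedom (for example, df = 4 for M0 versus M3 when M3 uses three ω categories, and df = 1 for the usual branch-site test); the log-likelihood values are hypothetical and not taken from Tables 1-3. The chi-squared tail probability is computed with the standard library only, so no statistics package is assumed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df.

    Uses the regularized upper incomplete gamma function Q(df/2, x/2), built up
    from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df) with
    the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, P) for nested models, e.g. M0 (null) vs M3 (alternative)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods; df = 4 assumes M3 with three omega categories
stat, p = likelihood_ratio_test(lnl_null=-15234.7, lnl_alt=-15012.3, df=4)
print(f"2*dlnL = {stat:.1f}, P = {p:.3g}")
```

A P-value printed as 0.000 in a table simply means it fell below the reported precision; reporting it as P < 0.001 is equivalent and less ambiguous.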

Table 3 Parameters in the site analysis, branch and branch-site analyses of PPDK

Protein structural characteristics of NADP-ME and PPDK

Based on the above phylogenetic relationships and positive selection analysis, we conducted detailed structural and functional studies using the protein sequence alignment of NADP-ME at the a5 branch and PPDK at the e branch, which contain monocots and C₄ plants, respectively. Cre06.g268750.t1.2 in the a5 branch and Cre10.g424750.t1.2 in the e branch were used as reference sequences for further analyses. Sites 94H and 196H in the a5 branch (Fig. 4) and 95A and 559E in the e branch (Fig. 5) were significantly positively selected at a posterior probability threshold of 95%. Conserved and highly conserved regions were distinguished.

Distribution of positively selected sites on three-dimensional structures of NADP-ME and PPDK

Using the three-dimensional (3D) models of Seita.9G200600.1 (NADP-ME) and Seita.9G354600.1 (PPDK) as examples, we analyzed the positively selected sites. The positively selected sites 94H and 196H in the a5 branch of NADP-ME-A mapped to sites 148S and 370W of Seita.9G200600.1 (Fig. 6a), and the positively selected sites 95A and 559E in the e branch of PPDK mapped to sites 147R and 663H of Seita.9G354600.1 (Fig. 6b). In the 3D models, yellow indicates helix regions, red indicates sheet regions, and blue marks specific amino acids. Sites 148S, 147R and 663H were located in helix regions, and 370W was located in a sheet region (Fig. 6).
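
Transferring a site from the alignment reference to a residue number in another sequence (as with 94H and 148S) comes down to counting non-gap characters in a shared alignment. This minimal sketch assumes the site numbers refer to residues of the reference sequence within one gapped alignment; the numbering convention behind the figures is not stated here, and the toy sequences are hypothetical.

```python
from typing import Optional, Tuple


def column_of_residue(aligned: str, residue_number: int) -> int:
    """1-based alignment column holding the given 1-based residue of a gapped sequence."""
    count = 0
    for column, char in enumerate(aligned, start=1):
        if char != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError("residue number exceeds sequence length")


def residue_at_column(aligned: str, column: int) -> Optional[Tuple[int, str]]:
    """(residue number, amino acid) at a 1-based column, or None if the column is a gap."""
    char = aligned[column - 1]
    if char == "-":
        return None
    number = sum(1 for c in aligned[:column] if c != "-")
    return number, char


def map_site(reference: str, target: str, residue_number: int) -> Optional[Tuple[int, str]]:
    """Map a residue of the reference sequence onto the target sequence via the alignment."""
    return residue_at_column(target, column_of_residue(reference, residue_number))


# Toy alignment (hypothetical sequences)
reference = "MK-LH-AV"
target = "MKSLQRAW"
print(map_site(reference, target, 4))  # reference H4 sits in column 5, which is Q5 in the target
```

Returning None for a gap makes it explicit when a positively selected site has no counterpart in the structure being examined.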

Expression analysis of foxtail millet NADP-ME and PPDK genes determined by qRT-PCR

Based on the phylogenetic relationships (Figs. 1, 2 and 3), we selected 6 NADP-ME and 2 PPDK genes of foxtail millet for qRT-PCR after light treatment. Expression of all these genes was up-regulated after 1 h of light exposure. Except for the NADP-ME genes Seita.5G314300.1 and Seita.9G200600.1, the remaining genes had higher expression levels after 6 h of light treatment (Fig. 7; Additional file 8: Table S5).

Discussion

Evolution of the NADP-ME and PPDK gene families

C₄ photosynthesis evolved approximately 30 million years ago [29] and has since arisen through 62 independent evolutionary events in angiosperms [30]. Most C₄ plants are monocots, including 4600 grass and 1600 sedge species, whereas only 1600 C₄ species from 16 families are dicots, 75% of which belong to Chenopodiaceae, Amaranthaceae, Euphorbiaceae and Asteraceae [31]. Previous research concluded that, in addition to the specialized cell structure of C₄ plants, the enzymes PEPC, NADP-ME and PPDK are essential for C₄ photosynthesis [32, 33]. Interestingly, the numbers of NADP-ME and PPDK genes increased later in evolution. Various studies have suggested that multiple duplication events occurred during plant evolution, including the γ event that separated monocots and dicots [34], the ρ event that occurred after the divergence of grasses and pineapple but before the divergence of wheat, maize and rice [35], and the τ and σ events in the Poaceae [36].

In this study, 14 and 7 NADP-ME genes were identified in soybean and maize, respectively (Additional file 1: Table S1). Although the maize genome (2300 Mb) is more than twice the size of the soybean genome (1100 Mb) [28, 37], maize has fewer NADP-ME genes than soybean, suggesting that the NADP-ME gene family did not expand through genome duplication but through different expansion patterns after the divergence of monocots and dicots [38, 39]. Among the 35 PPDK genes identified in the 25 species, most species had only one or two members (Additional file 2: Table S2). Compared with NADP-ME, PPDK genes were fewer in number but more stable during evolution.

NADP-ME and PPDK genes are widely present in photosynthetic species such as algae, mosses, ferns, gymnosperms and angiosperms [40, 41]. Our phylogenetic trees divided the NADP-ME genes into subfamilies A and B. The B2 branch, containing all dicot species, evolved earlier than the B1 branch, containing all monocot species, suggesting that subfamily B evolved independently after the divergence of monocots and dicots, a step known as the γ event (Fig. 2) [34]. The PPDK phylogeny showed that monocots branched off and formed subfamily I before dicots formed subfamily II, indicating that the PPDK gene family also evolved independently after the divergence of monocots and dicots [34]. Genes also clustered clearly within the Poaceae; for example, the NADP-ME and PPDK genes of C₄ plants were more closely related to each other than to those of C₃ plants (Figs. 1, 2 and 3). We inferred that both the NADP-ME and PPDK gene families in the Poaceae underwent independent evolution after the ρ event in monocots [36]. This closer clustering in C₄ species may be related to their higher photosynthetic efficiency.

Identification of positively selected sites and their functional significance

This study used site, branch and branch-site models to investigate selection pressure on the NADP-ME and PPDK gene families. Neither the site nor the branch models detected positive selection, possibly because it was masked by purifying selection and neutral drift [42, 43]. The branch-site model is better suited to detecting positive selection that affects only a few sites on specific branches [44]. It detected a total of 55 sites in the NADP-ME gene family that had undergone positive selection at a posterior probability of 0.6 (Tables 1 and 2): 8, 5, 14, 5, 4, 2 and 2 sites on branches a1-a7 of subfamily A (Fig. 1; Table 1), and 8, 0, 2, 0 and 5 sites on branches b1-b5 of subfamily B (Fig. 2; Table 2). For the PPDK gene family, the branch model gave ω values much smaller than 1 for all five foreground branches, indicating strong purifying selection, and the branch-site model detected 1, 1, 0, 0 and 8 positively selected sites on branches a-e (Table 3).

Both site and branch models suggested that the NADP-ME and PPDK gene families have undergone mostly purifying selection, maintaining normal gene function. The detection of a few positively selected sites by the branch-site model suggests that only a few beneficial mutations were retained during evolution, possibly as adaptations to changing environments [45]. C₄ plants can fix carbon efficiently at lower CO₂ concentrations than their C₃ counterparts, and this ability might be related to the positively selected sites found in both the NADP-ME and PPDK families of C₄ plants.

Positive selection is the retention and spread of advantageous mutations throughout a population and has long been considered synonymous with shifts in protein function [45]. Determining the amount of positive selection has wide-ranging implications for understanding genome function and the maintenance of genetic variation [46]. In this study, four positively selected sites were identified at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK (Figs. 4 and 5). Previous studies showed that minimal changes in primary structure were responsible for the different kinetic behavior of individual NADP-ME and PPDK isoforms [47, 48]. To clarify the roles of positively selected sites in C₄ plant evolution and explore their relationship with the high photosynthetic rates of C₄ plants, we built 3D models of Seita.9G200600.1 and Seita.9G354600.1. As shown in Fig. 6, the positively selected sites mapped to 148S, 147R and 663H were located in helix regions, whereas 370W was located in a sheet region. These positively selected amino acid sites might reflect functional divergence between C₄ and C₃ plants that contributed to the higher photosynthetic capacity of C₄ plants. The results also indicate that amino acid sites of NADP-ME and PPDK family members changed during plant evolution, at different rates, and they provide a basis for prioritizing further functional analyses of NADP-ME and PPDK.

Further analysis of genes in the a5 branch of NADP-ME and the e branch of PPDK showed that the a5 branch includes the C₄ plant genes GRMZM2G085747_P05, Sobic.001G201700.1, Sevir.9G198800.1, Pahal.9G197100.1, Aco007622.1 and Seita.9G200600.1 (Fig. 1). A previous study showed that GRMZM2G085747 of maize (a C₄ species) was involved in carbon fixation through the Calvin cycle in the bundle-sheath cells of leaf vascular bundles during photosynthesis [49]. The sorghum NADP-ME gene Sobic.001G201700 showed high transcript abundance in the C₄ pathway [50]. Furthermore, a comparison of one C₃ and 11 C₄ grass species (Poaceae) showed that the transcript abundance of Sobic.001G201700 was consistently elevated in C₄ species [24]. All members of the e branch of PPDK belonged to C₄ plants, including Seita.9G354600.1, Sevir.9G360400.1, Pahal.9G416400.1 and Sobic.001G326900.1 (Fig. 3). A previous study reported that Sobic.001G326900 showed high transcript abundance in the C₄ pathway [50]. In this study, sites 94H and 196H in the a5 branch of NADP-ME and 95A and 559E in the e branch of PPDK were identified as positively selected at a posterior probability threshold of 95% (Figs. 4 and 5), and GRMZM2G085747 and Sobic.001G201700 (a5 branch of NADP-ME) and Sobic.001G326900 (e branch of PPDK) are all involved in C₄ photosynthesis [24, 49, 50]. Our results therefore suggest that these sites were positively selected in association with high photosynthetic rates during C₄ evolution.

Conclusions

The 162 NADP-ME and 35 PPDK genes characterized in 25 species had highly similar motif compositions within subfamilies. Phylogenetic analysis placed the NADP-ME and PPDK genes in four and two branches, respectively. The NADP-ME and PPDK genes of C₄ species were more closely related to each other than those of C₃ species. Analyses of selective pressure on the NADP-ME and PPDK gene families identified four positively selected sites at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK. These sites were located in helix and sheet regions. We inferred that positive selection drove the evolution of NADP-ME and PPDK in C₄ species. This study contributes to a better understanding of the roles of NADP-ME and PPDK in C₃ and C₄ species and provides insights into the evolutionary biology of C₄ plants.

Methods

Dataset

Conserved NADP-ME and PPDK protein sequences of Arabidopsis and rice were obtained from the public databases UniProt (https://www.uniprot.org/) and TAIR (https://www.arabidopsis.org/). All NADP-ME and PPDK protein sequences and coding sequences (CDS) of 25 species, representing algal, moss, Lycopodiophyta, monocotyledon and dicotyledon species, were obtained from Phytozome V12 (https://phytozome.jgi.doe.gov/pz/portal.html) and incorporated into a local database. Each sequence was compared with the NADP-ME and PPDK protein sequences of other species and with those of Arabidopsis and rice using blastp, with an E-value threshold of 1e-5. CDD and Pfam were used to check whether the sequences contained conserved NADP-ME and PPDK protein domains, and sequences with incomplete domain structures were removed.
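
The E-value cut-off can be applied directly to BLAST tabular output. This sketch assumes blastp was run with -outfmt 6 (the standard 12-column tabular format) and that the sequences being screened were the queries; the identifiers are hypothetical, and the CDD/Pfam domain check is a separate step not shown.

```python
import csv
import io
from typing import Set

# Column order of BLAST+ tabular output (-outfmt 6) with default fields
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def candidate_ids(tabular_text: str, max_evalue: float = 1e-5) -> Set[str]:
    """Query IDs with at least one hit whose E-value is below the cut-off."""
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        record = dict(zip(OUTFMT6_FIELDS, row))
        if float(record["evalue"]) < max_evalue:
            hits.add(record["qseqid"])
    return hits


# Hypothetical hits against reference NADP-ME and PPDK sequences
example = (
    "geneA\trefNADPME\t78.1\t580\t120\t3\t1\t580\t1\t578\t0.0\t950\n"
    "geneB\trefNADPME\t30.2\t100\t65\t4\t10\t110\t200\t300\t2e-03\t40.1\n"
    "geneC\trefPPDK\t65.0\t500\t170\t5\t1\t500\t1\t495\t1e-150\t520\n"
)
print(sorted(candidate_ids(example)))
```

Here geneB is dropped because its best hit (E = 2e-3) does not pass the 1e-5 threshold.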

Molecular weights and isoelectric points of the NADP-ME and PPDK protein sequences were calculated using ExPASy (https://web.expasy.org/compute_pi/).

Construction of phylogenetic trees and analysis of conserved protein sequences

Multiple sequence alignments of the candidate NADP-ME and PPDK protein sequences were generated with MUSCLE 3.8.31 [51]. Neighbor-joining (NJ) trees were constructed in MEGA 7.0 using the Poisson model with 1000 bootstrap replicates and pairwise deletion of gaps; other parameters were left at default values [52]. Maximum likelihood (ML) trees for NADP-ME and PPDK were constructed with IQ-TREE 1.6.5 using the Bayesian Information Criterion (BIC) and 1000 bootstrap replicates [53]. The optimal substitution model for the ML trees was selected with the option -m TESTONLY. The phylogenetic trees were visualized with FigTree.
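
To make the NJ step concrete, this is a compact sketch of neighbor joining on Poisson-corrected amino-acid distances (d = -ln(1 - p)) with pairwise gap deletion. It is a simplified illustration, not MEGA's implementation, and it omits bootstrapping; the example matrix is a textbook additive case rather than data from this study.

```python
import math
from itertools import combinations
from typing import Dict, List


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected p-distance between two aligned protein sequences.

    Columns with a gap in either sequence are ignored (pairwise deletion).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def neighbor_joining(labels: List[str], matrix: List[List[float]]) -> str:
    """Build an unrooted NJ tree from a distance matrix and return it as Newick."""
    dist: Dict[str, Dict[str, float]] = {
        a: {b: matrix[i][j] for j, b in enumerate(labels) if j != i}
        for i, a in enumerate(labels)
    }
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda pr: (n - 2) * dist[pr[0]][pr[1]] - r[pr[0]] - r[pr[1]])
        li = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        dist[u] = {}
        for k in nodes:
            if k not in (i, j):
                dist[u][k] = dist[k][u] = 0.5 * (dist[i][k] + dist[j][k] - dist[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Resolve the last three nodes as an unrooted trifurcation
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(labels, matrix))
```

For real data, the matrix would be filled by calling poisson_distance on every pair of aligned sequences.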

Analysis of conserved protein sequences used the software MEME 2.12.0 with -nmotifs 20, -minw 10 and -maxw 50 [54]. Other parameters were left at default values. Results were visualized using TBtools software.

Analysis of natural selection pressure

The NADP-ME and PPDK protein sequences were aligned with MUSCLE 3.8.31, and the CDS and aligned protein sequences were submitted to the online tool PAL2NAL (http://www.bork.embl.de/pal2nal/) for codon alignment. Selection pressure was calculated with PAML 4.9e, with ω < 1 indicating purifying selection, ω = 1 neutrality, and ω > 1 positive selection [55]. Three approaches were applied: (1) site models, comparing models M0 and M3; (2) branch models, comparing foreground branches with background branches to test for positive selection; and (3) branch-site models (Model A), testing for positively selected sites on foreground branches. Nested models were compared with likelihood ratio tests, using chi-squared tests to assess significance.
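
The codon-alignment step can be sketched as threading each CDS onto its aligned protein, three nucleotides per residue and "---" per gap. This shows only the core idea behind PAL2NAL, not its code; the real tool performs additional consistency checks, and the sequences below are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_align(aligned_protein: str, cds: str) -> str:
    """Back-translate a gapped protein alignment row into a codon alignment row.

    Each residue is replaced by its codon from the CDS and each gap by '---'.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    # Many CDS records end with a stop codon that has no residue in the protein
    if len(codons) == n_residues + 1 and codons[-1] in STOP_CODONS:
        codons.pop()
    if len(codons) != n_residues:
        raise ValueError("CDS and protein lengths are inconsistent")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


# Hypothetical two-sequence example
print(codon_align("M-KL", "ATGAAACTGTAA"))  # ATG---AAACTG
print(codon_align("MGKL", "ATGGGAAAACTG"))  # ATGGGAAAACTG
```

Keeping codons intact in the alignment is what lets codeml classify each substitution as synonymous or non-synonymous.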

Positive selection in protein sequences and structure analysis

The aligned CDS and amino acid sequences were entered into PAL2NAL (http://www.bork.embl.de/pal2nal/), a web tool for converting protein alignments into codon alignments, and the resulting alignments were visualized with ESPript v3 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi).

The full-length NADP-ME and PPDK protein sequences of foxtail millet (Setaria italica) were submitted to the I-TASSER server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) for 3D structure prediction. Sites identified as positively selected by the branch-site model at a posterior probability threshold of 95% were mapped onto the surfaces of the 3D structures with PyMOL v2.3 (http://PyMOLwiki.org).

Plant growth and harvesting

Foxtail millet cultivar Yugu 1, used for qRT-PCR, was provided by the Anyang Institute of Agricultural Sciences, Henan. Seeds were surface-sterilized in 0.5% NaClO for 1 min, rinsed three times with sterile distilled water, plated on GM agar medium and stratified in darkness at 4 °C for 3 days. After germination, seedlings were grown in darkness at 27 °C for 3 days and then transferred to a growth chamber at 27 °C under light (600 μmol m⁻² s⁻¹). After 0, 1 and 6 h of light treatment, leaves were collected, immediately frozen in liquid nitrogen and stored at −80 °C for RNA isolation. All samples had three biological replicates.

qRT-PCR

Primers were designed with Primer3 using cDNA sequences from Setaria italica v2.2 (phytozome.jgi.doe.gov) and are listed in Additional file 9: Table S6. qRT-PCRs were performed in triplicate using the SYBR® Green PCR Master Mix Kit (Applied Biosystems, GA, USA). Data acquisition and analysis were performed on an ABI7900 system (Applied Biosystems). Relative expression levels were determined using the 2^-ΔΔCT method.
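
The 2^-ΔΔCT calculation can be written out in a few lines. In this sketch the Ct values are hypothetical, the reference (housekeeping) gene is a placeholder because it is not named in this section, and the method's usual assumption of near-100% amplification efficiency for target and reference genes is taken for granted. The 0 h sample serves as the calibrator, matching the light-treatment design.

```python
from statistics import mean
from typing import Sequence


def fold_change(target_treated: Sequence[float], ref_treated: Sequence[float],
                target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    d_ct_treated = mean(target_treated) - mean(ref_treated)  # normalize to reference gene
    d_ct_control = mean(target_control) - mean(ref_control)
    dd_ct = d_ct_treated - d_ct_control                      # compare with calibrator (0 h)
    return 2.0 ** (-dd_ct)


# Hypothetical technical-replicate Ct values for one gene and a placeholder reference gene
ct_target_0h, ct_ref_0h = [26.0, 26.1, 25.9], [18.0, 18.1, 17.9]
ct_target_1h, ct_ref_1h = [24.0, 24.1, 23.9], [18.0, 18.0, 18.0]
print(f"1 h vs 0 h: {fold_change(ct_target_1h, ct_ref_1h, ct_target_0h, ct_ref_0h):.2f}-fold")
```

A target Ct that drops by two cycles relative to the reference gene corresponds to roughly four-fold up-regulation under the efficiency assumption.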

Availability of data and materials

All data generated or analyzed during this study are included in the manuscript and its supplementary information files.

Abbreviations

3D: Three-dimensional
BIC: Bayesian Information Criterion
CAM: Crassulacean acid metabolism
CDS: Coding sequences
LRT: Likelihood ratio test
LSu: Large subunit
ML: Maximum likelihood
NADP-ME: NADP-malic enzyme
NJ: Neighbor joining
PEPC: Phosphoenolpyruvate carboxylase
PPDK: Pyruvate orthophosphate dikinase
qRT-PCR: Quantitative RT-PCR

References

Wohlfahrt G, Gu L. The many meanings of gross photosynthesis and their implication for photosynthesis research from leaf to globe. Plant Cell Environ. 2015;38:2500–7.
Article CAS PubMed PubMed Central Google Scholar
Aldous SH, Weise SE, Sharkey TD, Waldera-Lupa DM, Stuhler K, Mallmann J, et al. Evolution of the phosphoenolpyruvate carboxylase protein kinase family in C₃ and C₄Flaveria spp. Plant Physiol. 2014;165:1076–91.
Article CAS PubMed PubMed Central Google Scholar
Yin H, Guo HB, Weston DJ, Borland AM, Ranjan P, Abraham PE, et al. Diel rewiring and positive selection of ancient plant proteins enabled evolution of CAM photosynthesis in Agave. BMC Genomics. 2018;19:588.
von Caemmerer S, Ghannoum O, Furbank RT. C₄ photosynthesis: 50 years of discovery and innovation. J Exp Bot. 2017;68:97–102.
Wang S, Tholen D, Zhu XG. C₄ photosynthesis in C₃ rice: a theoretical analysis of biochemical and anatomical factors. Plant Cell Environ. 2017;40:80–94.
Yin X, Struik PC. Can increased leaf photosynthesis be converted into higher crop mass production? A simulation study for rice using the crop model GECROS. J Exp Bot. 2017;68:2345–60.
Covshoff S, Szecowka M, Hughes TE, Smith-Unna R, Kelly S, Bailey KJ, et al. C₄ photosynthesis in the rice paddy: insights from the noxious weed Echinochloa glabrescens. Plant Physiol. 2016;170:57–73.
Chen YB, Lu TC, Wang HX, Shen J, Bu TT, Chao Q, et al. Posttranslational modification of maize chloroplast pyruvate orthophosphate dikinase reveals the precise regulatory mechanism of its enzymatic activity. Plant Physiol. 2014;165:534–49.
Chastain CJ, Baird LM, Walker MT, Bergman CC, Novbatova GT, Mamani-Quispe CS, et al. Maize leaf PPDK regulatory protein isoform-2 is specific to bundle sheath chloroplasts and paradoxically lacks a Pi-dependent PPDK activation activity. J Exp Bot. 2018;69:1171–81.
Fukayama H, Tsuchida H, Agarie S, Nomura M, Onodera H, Ono K, et al. Significant accumulation of C₄-specific pyruvate, orthophosphate dikinase in a C₃ plant, rice. Plant Physiol. 2001;127:1136–46.
Matsuoka M, Furbank RT, Fukayama H, Miyao M. Molecular engineering of C₄ photosynthesis. Annu Rev Plant Biol. 2001;52:297–314.
Wang X, Gowik U, Tang H, Bowers JE, Westhoff P, Paterson AH. Comparative genomic analysis of C₄ photosynthetic pathway evolution in grasses. Genome Biol. 2009;10:R68.
Arias CL, Pavlovic T, Torcolese G, Badia MB, Gismondi M, Maurino VG, et al. NADP-dependent malic enzyme 1 participates in the abscisic acid response in Arabidopsis thaliana. Front Plant Sci. 2018;9:1637.
Sonawane BV, Sharwood RE, Whitney S, Ghannoum O. Shade compromises the photosynthetic efficiency of NADP-ME less than that of PEP-CK and NAD-ME C₄ grasses. J Exp Bot. 2018;69:3053–68.
Yin X, Struik PC. The energy budget in C₄ photosynthesis: insights from a cell-type-specific electron transport model. New Phytol. 2018;218:986–98.
Gerrard Wheeler MC, Tronconi MA, Drincovich MF, Andreo CS, Flügge UI, Maurino VG. A comprehensive analysis of the NADP-malic enzyme gene family of Arabidopsis thaliana. Plant Physiol. 2005;139:39–51.
Christin PA, Samaritani E, Petitpierre B, Salamin N, Besnard G. Evolutionary insights on C₄ photosynthetic subtypes in grasses from genomics and phylogenetics. Genome Biol Evol. 2009;1:221–30.
Barreiro LB, Quintana-Murci L. From evolutionary genetics to human immunology: how selection shapes host defence genes. Nat Rev Genet. 2010;11:17–30.
Rosnow JJ, Edwards GE, Roalson EH. Positive selection of Kranz and non-Kranz C₄ phosphoenolpyruvate carboxylase amino acids in Suaedoideae (Chenopodiaceae). J Exp Bot. 2014;65:3595–607.
Kapralov MV, Smith JAC, Filatov DA. Rubisco evolution in C₄ eudicots: an analysis of Amaranthaceae sensu lato. PLoS One. 2012;7:e52974.
Kapralov MV, Kubien DS, Andersson I, Filatov DA. Changes in Rubisco kinetics during the evolution of C₄ photosynthesis in Flaveria (Asteraceae) are associated with positive selection on genes encoding the enzyme. Mol Biol Evol. 2011;28:1491–503.
Christin PA, Salamin N, Savolainen V, Duvall MR, Besnard G. C₄ photosynthesis evolved in grasses via parallel adaptive genetic changes. Curr Biol. 2007;17:1241–7.
Besnard G, Muasya AM, Russier F, Roalson EH, Salamin N, Christin PA. Phylogenomics of C₄ photosynthesis in sedges (Cyperaceae): multiple appearances and genetic convergence. Mol Biol Evol. 2009;26:1909–19.
Watson-Lazowski A, Papanicolaou A, Sharwood R, Ghannoum O. Investigating the NAD-ME biochemical pathway within C₄ grasses using transcript and amino acid variation in C₄ photosynthetic genes. Photosynth Res. 2018;138:233–48.
Wang L, Peterson RB, Brutnell TP. Regulatory mechanisms underlying C₄ photosynthesis. New Phytol. 2011;190:9–20.
Saigo M, Alvarez CE, Andreo CS, Drincovich MF. Plastidial NADP-malic enzymes from grasses: unraveling the way to the C₄ specific isoforms. Plant Physiol Biochem. 2013;63:39–48.
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–6.
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326:1112–5.
Aubry S, Kelly S, Kümpers BM, Smith-Unna RD, Hibberd JM. Deep evolutionary comparison of gene expression identifies parallel recruitment of trans-factors in two independent origins of C₄ photosynthesis. PLoS Genet. 2014;10:e1004365.
Sage RF, Christin PA, Edwards EJ. The C₄ plant lineages of planet Earth. J Exp Bot. 2011;62:3155–69.
Gowik U, Westhoff P. The path from C₃ to C₄ photosynthesis. Plant Physiol. 2011;155:56–63.
Wang Y, Bräutigam A, Weber AP, Zhu XG. Three distinct biochemical subtypes of C₄ photosynthesis? A modelling analysis. J Exp Bot. 2014;65:3567–78.
Jiang L, Chen YB, Zheng J, Chen Z, Liu Y, Tao Y, et al. Structural basis of reversible phosphorylation by maize pyruvate orthophosphate dikinase regulatory protein. Plant Physiol. 2016;170:732–41.
Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–8.
Ming R, Van Buren R, Wai CM, Tang H, Schatz MC, Bowers JE, et al. The pineapple genome and the evolution of CAM photosynthesis. Nat Genet. 2015;47:1435–42.
McKain MR, Tang H, McNeal JR, Ayyampalayam S, Davis JI, de Pamphilis CW, et al. A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biol Evol. 2016;8:1150–64.
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–83.
Qin Z, Wang Y, Wang Q, Li A, Hou F, Zhang L. Evolution analysis of simple sequence repeats in plant genome. PLoS One. 2015;10:e0144108.
D'Hont A, Denoeud F, Aury JM, Baurens FC, Carreel F, Garsmeur O, et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–7.
Motti CA, Bourne DG, Burnell JN, Doyle JR, Haines DS, Liptrot CH, et al. Screening marine fungi for inhibitors of the C₄ plant enzyme pyruvate phosphate dikinase: unguinol as a potential novel herbicide candidate. Appl Environ Microbiol. 2007;73:1921–7.
Tronconi MA, Andreo CS, Drincovich MF. Chimeric structure of plant malic enzyme family: different evolutionary scenarios for NAD- and NADP-dependent isoforms. Front Plant Sci. 2018;9:565.
Lawrie DS, Messer PW, Hershberg R, Petrov DA. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 2013;9:e1003527.
Smith MD, Wertheim JO, Weaver S, Murrell B, Scheffler K, Kosakovsky Pond SL. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol Biol Evol. 2015;32:1342–53.
Gharib WH, Robinson-Rechavi M. The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC. Mol Biol Evol. 2013;30:1675–86.
Qian J, Liu Y, Chao N, Ma C, Chen Q, Sun J, et al. Positive selection and functional divergence of farnesyl pyrophosphate synthase genes in plants. BMC Mol Biol. 2017;18:3.
Williamson RJ, Josephs EB, Platts AE, Hazzouri KM, Haudry A, Blanchette M, Wright SI. Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora. PLoS Genet. 2014;10:e1004622.
Wei M, Li Z, Ye D, Herzberg O, Dunaway-Mariano D. Identification of domain-domain docking sites within Clostridium symbiosum pyruvate phosphate dikinase by amino acid replacement. J Biol Chem. 2000;275(52):41156–65.
Gerrard Wheeler MC, Arias CL, Tronconi MA, Maurino VG, Andreo CS, Drincovich MF. Arabidopsis thaliana NADP-malic enzyme isoforms: high degree of identity but clearly distinct properties. Plant Mol Biol. 2008;67(3):231–42.
Li C, Huang Y, Huang R, Wu Y, Wang W. The genetic architecture of amylose biosynthesis in maize kernel. Plant Biotechnol J. 2018;16(2):688–95.
Döring F, Streubel M, Bräutigam A, Gowik U. Most photorespiratory genes are preferentially expressed in the bundle sheath cells of the C₄ grass Sorghum bicolor. J Exp Bot. 2016;67(10):3053–64.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74.
Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–8.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.

Acknowledgements

We gratefully acknowledge Robert A. McIntosh (University of Sydney) for help with English editing.

Funding

This work was supported by grants from the National Key R&D Program of Shanxi Province (201803D221019–5, 2015-TN-09) and the Science & Technology Innovation Foundation of Shanxi Agricultural University (2016YJ05). The funding agencies played no role in the design of the study; in the collection, analysis or interpretation of data; or in writing the manuscript.

Author information

Authors and Affiliations

College of Agronomy, Shanxi Agricultural University, Taigu, 030801, China
Weiping Shi, Linqi Yue, Jiahui Guo, Jianming Wang, Xiangyang Yuan, Shuqi Dong, Jie Guo & Pingyi Guo


Contributions

JG and PYG contributed to the overall design of the experiments, provided advice on data analysis and helped write the manuscript. WPS carried out the experiments, analyzed the data and wrote the manuscript. LQY and JHG helped perform the phylogenetic, selection-pressure and protein-modelling analyses. JMW, XYY and SQD participated in the design of the experiments. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Jie Guo or Pingyi Guo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1:

Table S1. Characterization of NADP-MEs in 25 plant species.

Additional file 2:

Table S2. Characterization of PPDKs in 25 plant species.

Additional file 3:

Table S3. Consensus sequences of motifs 1–20 in NADP-MEs.

Additional file 4:

Table S4. Consensus sequences of motifs 1–20 in PPDKs.

Additional file 5:

Figure S1. Conserved protein motifs in NADP-ME genes of 25 plant species. Motif numbers 1–20 are displayed as different colored boxes. Sequence information for each motif is provided in Additional file 3: Table S3.

Additional file 6:

Figure S2. Conserved protein motifs in PPDK genes of 25 plant species. Motif numbers 1–20 are displayed as different colored boxes. Sequence information for each motif is provided in Additional file 4: Table S4.

Additional file 7:

Figure S3. Phylogenetic tree constructed for 162 NADP-ME genes from 25 species. Branches A1, A2, B1 and B2 are shown in red, pink, blue and green, respectively.

Additional file 8:

Table S5. Raw qRT-PCR data.

Additional file 9:

Table S6. Primer sequences of 8 genes used for qRT-PCR validation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Shi, W., Yue, L., Guo, J. et al. Identification and evolution of C₄ photosynthetic pathway genes in plants. BMC Plant Biol 20, 132 (2020). https://doi.org/10.1186/s12870-020-02339-x


Received: 21 November 2019
Accepted: 11 March 2020
Published: 30 March 2020
DOI: https://doi.org/10.1186/s12870-020-02339-x
