The origin of pigment-binding differences in CP29 and LHCII: the role of protein structure and dynamics

The first step of photosynthesis in plants is performed by the light-harvesting complexes (LHC), a large family of pigment-binding proteins embedded in the photosynthetic membranes. These complexes are conserved across species, suggesting that each has a distinct role. However, they display a high degree of sequence homology and their static structures are almost identical. What are then the structural features that determine their different properties? In this work, we compared the two best-characterized LHCs of plants: LHCII and CP29. Using molecular dynamics simulations, we could rationalize the difference between them in terms of pigment-binding properties. The data also show that while the loops between the helices are very flexible, the structure of the transmembrane regions remains very similar in the crystal and the membranes. However, the small structural differences significantly affect the excitonic coupling between some pigment pairs. Finally, we analyzed in detail the structure of the long N-terminus of CP29, showing that it is structurally stable and it remains on top of the membrane even in the absence of other proteins. Although the structural changes upon phosphorylation are minor, they can explain the differences in the absorption properties of the pigments observed experimentally.


Introduction
In plants, light capturing, the primary step of photosynthesis, is performed by Photosystems I and II, which are modular assemblies composed of a core, containing the reaction center (RC), and an antenna consisting of several light-harvesting complexes (LHCs). The LHCs are pigmentbinding membrane proteins that, in light-limiting conditions, are mainly responsible for harvesting light and efficiently transferring excitation energy to the RC, where it is used to promote charge separation [1]. In contrast, under environmental stresses, they are involved in protective mechanisms that dissipate a large part of the harvested energy as heat [2].
All plant LHCs are members of the same nuclear-encoded membrane protein superfamily [3]. They have a high sequence and structural homology [4], and have common features that are at the basis of their evolutionary success [5]. They consist of three membrane-spanning α-helices and two short amphipathic helices exposed to the thylakoid lumen. Plant LHCs bind 11-16 chlorophylls (Chls) of the types a and b and 3-4 carotenoid molecules (Cars) [4]. The LHC protein matrix controls the inter-pigment distances and orientations and provides a distinct environment to each pigment. In this way, the LHC protein scaffold governs the energy landscape of its pigments to ensure fast and directional excited-state energy transfer (EET) while maintaining a long excited-state lifetime, which is imperative for efficient light harvesting.
In vascular plants, there are 12 main LHC complexes. They are highly conserved across species but differ in their spectroscopic properties, pigment-binding characteristics, response to external stimuli, and location within the photosystem assemblies, suggesting that each LHC is tailored for a specific role. Plant LHCs are, however, all structurally highly homologous and the question arises of which structural differences contribute to their differing properties. It should be noted that minor differences between proteins can have drastic effects on the spectroscopic properties of the complexes since they can alter the pigment energy landscape and e.g., open or close quenching channels [6][7][8][9][10][11]. Moreover, it has been shown that to understand the structure-function relationships of the LHCs we need to go beyond the static picture since the dynamics of LHCs are essential to accurately describe many of their properties [10,[12][13][14].
Although several LHCs have been studied in detail, the direct comparison between the structural dynamics of LHCs under the same MD framework is lacking. In this work, we compare LHCII and CP29, the major and a minor antenna complexes of Photosystem II (PSII). Their structures and location in the PSII supercomplex are shown in Fig. 1. LHCII is located at the periphery of the supercomplex, while CP29 is between LHCII and the core. Their different role is reflected in their energetic landscape: in LHCII, the lowest energy state is associated with a cluster of Chls facing the neighbouring complex to which the energy needs to be transferred [16]. CP29, on the other hand, possesses two isoenergetic lowest energy-sites in agreement with its role as a wire between LHCII and the core [30]. They also differ in their pigment binding properties, spectral properties, oligomerization state and their role in balancing the excitation of the two photosystems (state transitions) [31]. However, their static structures are very similar (see Fig. 1C). In this work, we systematically compare their structural dynamics to elucidate the determinants of their different properties. We furthermore specifically focus on the most evident structural difference between them: the N-terminus.

Nomenclature
The Chls and Cars in this article are designated by the nomenclature from Liu et al. 2004. Unless otherwise stated the amino acids are designated according to the structures used for the MD simulations (CP29 from the cryo-electron microscopy (cryo-EM) structure from Pisum sativum, PDB: 5XNL, Chain R [16] and LHCII from the crystal structure from Spinacia oleracea, PDB: 1RWT, Chain A [15]). For simplicity, to refer to the abovementioned resolved structures we use the term static structures.

Preparation of starting structures and topologies
We performed atomistic MD simulations of the CP29 complex embedded in a solvated model lipid membrane. For the simulations the software Gromacs 4.6.3 [32] was employed. Two separate starting structures were used, differing in their phosphorylation state (T68 and T70 phosphorylated and non-phosphorylated) [33]. The starting CP29 structure was extracted from a Cryo-EM study on a C 2 S 2 M 2 -type PSII particle from Pisum sativum (PDB: 5XNL, Chain R [16]). In the Cryo-EM structure not all of the heavy atoms of some of the phytol groups of the Chls and the DPPG lipid hydrophobic tail were resolved, most likely because of their high flexibility. The structures of these partially resolved molecules were completed through the manual addition of atoms using the Chimera software [34]. Topologies for Chl a [35], Chl b [8], Lut [8], Vio, Neo, 1,2-dipalmitoyl-sn-glycero-3-phosphoglycerol (DPPG) [8] and the 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) [36] lipids were based on the GROMOS 53a6 united-atom forcefield repository [37]. The topology for the CP29 protein was generated through the pdb2gmx program [32] using the Gromos 54a7 united-atom forcefield [38]. The protein protonation pattern was assigned as established by Müh et al. [17]: i.e. E131, E165 and the C-terminus deprotonated; E128, E151 and H236 protonated; and all other titratable residues in their standard protonation states at neutral pH. For the phosphorylated complex the residues T68 and T70 were changed to phosphothreonine (PO 3 2− ) and the topologies were altered accordingly with Gromos54a7 parameters using the Vienna-PTM web server [39]. Phosphothreonines were simulated with the charge -2e since their pKA is less than 7.0 [40].

Membrane insertion
The CP29 complex was embedded in a solvated POPC lipid membrane. First, a pre-equilibrated square POPC lipid membrane [36] consisting of 128 POPC molecules solvated in water was used as the basis to form a larger bilayer by stacking 4 replicates of the same bilayer next to each other using the Gromacs gmx genconf program [32]. The resulting square bilayer consisted of 512 POPC lipids. More water was added along the z-axis via the Gromacs gmx solvate program [32]. After that, the principal axis of the protein central helices was aligned to the z-axis of the membrane and CP29 was positioned in the centre of the solvated membrane. Lipid or water molecules (except for the interstitial water molecules already present in the CP29 structure) within a 1.2 Å distance from CP29 were removed. The final system consisted of CP29, 428 POPC molecules and > 20,000 water molecules. To mimic neutral physiological conditions 10 mM of Na + Cl − [41] was added to the bulk water. A total of 2 different simulation boxes were created following this protocol, one for the phosphorylated and one for the unphosphorylated form of CP29. Fig. 1 Comparison of CP29 and LHCII structure. A CP29 structure [16]. B LHCII structure [15]. C Overlap of CP29 (brown) and LHCII (blue) structures. The arrows indicate the major differences between them. D C 2 S 2 M 2 structure [16]. The dashed line indicates the symmetry axis. The amino acid-and pigment numbering schemes are adopted from [16] for CP29 and from [15] for LHCII

Energy minimization, NVT and NPT equilibration and production runs
The procedure used to equilibrate the simulation boxes follows the one used in [8]. The two simulation boxes (with phosphorylated or unphosphorylated CP29) were subjected to energy minimization through a steepest descent algorithm. During this minimization strong isotropic position constraints of 10,000 kJ mol −1 nm −2 were applied to the α-carbons of the protein, to the atoms of the chlorophyll tetra-pyrroles and to all carotenoid and DPPG atoms to perturb only minimally the high-resolution architecture of the complex [29]. To properly relax the temperature of the system, the minimized simulation boxes were first subjected to an isothermal-isochoric (NVT) equilibration simulation of 10 ps employing the same position restraints. Then a series of (isothermal-isobaric) NPT equilibration simulations (for a total of 40 ns) were run, in which every 10 ns the position restraints were reduced in the order 10,000 1000 500 200 kJ mol −1 nm −2 . For each system (phosphorylated, unphosphorylated), 3 independent replicas were created with different initial random velocities. Each of the 6 replicas was simulated for ~ 1 μs simulated time of unconstrained NPT. In the unconstrained NPT simulations the integration timestep used was 2 fs. The pressure was kept constant at 1 bar through semi-isotropic coupling using a Parinello-Rahman barostat [42] with compressibility of 4.5 × 10 −5 bar −1 and relaxation time constant of 5 ps. The temperature was maintained at 300 K via the Nosé-Hoover thermostat [43] that was coupled to the solvent, the membrane and the CP29 complex independently with a time constant of 0.5 ps. The LINCS algorithm [44] was used to constrain the molecular bonds. For the long-range interactions a Particle Mesh Ewald scheme was employed. For the Coulombic interactions a cut-off of 1.0 nm and for Van der Waals interactions a cut-off of 1.4 nm was used. Periodic boundary conditions were used.

LHCII simulations
We also analyze simulations of a monomer of LHCII embedded in a model POPC membrane of 344 total lipids solvated with 15,215 water molecules and the same concentration of Na + CL − ions here used for CP29. These simulations are described in [8]. The simulations conducted in this work were constructed using the same topologies, simulation parameters and equivalent equilibration protocol used for the simulations of LHCII [8]. Some of the analyses reported here are described in [8], and are used here for comparison with CP29.

Trajectory preparation
The MD trajectories were reduced in size and prepared for the analysis by keeping only one frame every 2 ns. Then, the frames were fitted back onto the initial coordinates of the backbone atoms of the LHC central helices. The molecules that were split across the periodic boundary conditions were reconnected using the Gromacs trjconv tool [32]. Unless otherwise stated, the analyses were performed on the trajectories from 600 ns onwards since at this point the systems were considered equilibrated, as assessed from the root-mean-square deviation (RMSD) of the C α -atoms of the amino acids of the central helices of the complexes (see Fig.  S1).

DSSP (Figs. 2A, B, D, E, 7A)
The secondary protein structure calculations were based on the DSSP model [45]. The calculations were performed on the equilibrated trajectories (discarding the first 600 ns) and on the resolved structures with the Gromacs do_dssp tool [32] that is based on the CMBI DSSP software [45,46]. With this tool, the hydrogen bond network of the protein residues in the structure was computed and the secondary structure was then defined for each protein domain.

RMSD (Fig. S2AC)
The average RMSD per residue was calculated using a custom python script, that makes use of the MDAnalysis library [47,48], taking the initial structure of the system as reference. The calculations were performed for the concatenated equilibrated trajectories. For the amino acid residues the RMSD was calculated by taking only the C α -atoms. For the Chls all atoms except those appertaining to the phytol-tail were considered. For the Cars and the DPPG molecule, all of the atoms were taken into account. The RMSD value at a specific timepoint for a residue was computed according to the following equation: is the position of atom i of the considered residue at time t.
vs. LHCII structural comparison. DSSP secondary structure characterization per protein domain of CP29 based on the static (cryo-EM) structure [16] A and LHCII in the static (crystal) structure [15] D and the average population of DSSP elements in the MD simulations for CP29 B and LHCII E. Domain composition of CP29 C and LHCII F. G Comparison between the CP29 complex static structure (brown) and the middle CP29 structure from the MD simulations (cyan). H Comparison between LHCII complex static structure (brown) and the middle LHCII structure from the MD simulations (cyan). The analyses in this figure were performed on the concatenated equilibrated trajectories ◂ 1 3 The average RMSD between the Cryo-EM structure of CP29 [16] and the crystal structure of LHCII [15] was calculated using the Chimera program [34] using the Needleman-Wunsch alignment algorithm.

RMSF (Fig. 8B & S2BD)
The mean root-mean-square fluctuation (RMSF) and the B-factor per residue were calculated using the Gromacs rmsf tool on the concatenated equilibrated trajectories considering all the atoms of the amino acid residues [32]. The RMSF value of a residue at a certain timepoint is computed according to: r i (t) is the position of atom i of the considered residue at time t.
The B-factor is computed as: < RMSF > is the time-averaged RMSF of the residue in question.

Carotenoid dihedrals (Fig. 4, S3)
The distribution of dihedral angles of the terminal rings of the Cars bound to the L1 and L2 sites were computed using the Gromacs g_angle tool [32]. The atoms involved in the dihedrals are shown in Fig. 4C. The dihedral angles were computed for the concatenated equilibrated trajectories.

N-terminus clusters (Fig. 7C) and middle structures (Fig. 2G, H, S6 & S7)
Representative structures of the N-terminus were extracted via cluster analyses based on the single-linkage clustering algorithm. This algorithm employs an agglomerative scheme that iteratively subdivides the structures sampled along the MD trajectories into clusters. First, the two closest structures of the ensemble, according to a defined distance function, are paired to a cluster. The cluster is then expanded by iteratively adding the next closest structure of the ensemble by taking into account all the structures in the cluster, until no structures can be found anymore within a specified cut-off distance. When no other structures within a certain cut-off distance can be added, the process is repeated for the remaining structures forming new clusters. To use this algorithm we employed the cluster program of Gromacs [32]. The average RMSD was used as the distance function (see above). For the N-terminus clusters (Fig. 7C) all of the atoms of the CP29 N-terminus, which includes the residues 11R-C95, were used for calculating the RMSD. The RMSD cut-off distance was 0.25 nm. The computation was performed on the concatenated equilibrated trajectories. 5 clusters could be identified of which only 3 were significantly populated (68%, 24% and 8%, respectively) in the MD simulations. Middle structures of the CP29 and LHCII protein complexes ( Fig. 2G, H) were calculated using the Gromacs cluster tool applied to the concatenated equilibrated trajectories taking into account all of the atoms of the amino acid residues [32]. The middle structure is defined as the structure that is the closest (based on the RMSD distance function) to all the other structures in the ensemble. For the middle structures of the Chls 601 and 609 ( Fig. S6 & S7) within the LHC complexes a similar approach was followed, but instead the structures were arranged based on the atoms of the Chl in question excluding the phytol tail.

Chl-Chl excitonic couplings (Figs. 3, 7D, Table S1)
Excitonic Chl-Chl couplings were computed following the ideal dipole approximation (IDA): x = normalized transition dipole moment vector of Chl x in debye. R 12 = distance between the central magnesium atoms of Chl 1 and Chl 2 in nm. r 12 = normalized distance vector between the central magnesium atoms of Chl 1 and Chl 2 in nm.
The excitonic couplings were calculated for the Chl Q y transitions and the Chl transition dipole moment vectors were taken along there ND-NB axis. As in [8], the lefthand term in the above equation (containing the local field correction factor, the magnitudes of the transition dipole moments and the relative dielectric constant) was assigned to be 90 for Chl a -Chl a coupling, 75 for Chl a -Chl b coupling and 63 for Chl b -Chl b coupling. These parameters were shown to be adequate for modelling some of LHCII's spectral properties [49,50]. The Chl-Chl excitonic couplings were calculated in this work using a homemade python script based on the MDAnalysis library [47,48]. The mean excitonic coupling values were determined based on the concatenated equilibrated trajectories. More quantitative methods than the IDA to calculate Chl-Chl coupling such as TrESP exist [51]. However, the IDA accurately captures the trends observed in the Chl-Chl couplings. Indeed, it was shown that the computation of Chl-Chl excitonic couplings following the IDA provides results comparable to more accurate methods based on Time-Dependent Density Functional Theory [18].

Interaction energy (Figs. 5, 6, S4 & S5AC)
The linear interaction energies with the surroundings of the Car L2 and the Chls 601, 609 and 614 for both LHCII and CP29 were calculated using the g_mmpbsa program [52]. The interaction energies consist of electrostatic and Van der Waals (VdW) contributions. For the calculations the ligand atoms consisted of all the atoms of the particular pigment of interest and for the complex all the other atoms in the simulation box were included. Using the g_mmpbsa program [52] the interaction energies were decomposed on a per residue basis. The calculations were performed on the concatenated full trajectories. angle in the CP29 structure [16] and the red line that of the LHCII structure [15]. The analyses in this figure were performed on the concatenated equilibrated trajectories 1 3 2.14 Closest distances (Fig. S5BD, 8A) The closest distance between two groups of residues was determined using the Gromacs mindist tool [32].

Hydrogen bonds (mentioned in text)
The presence of a hydrogen bond between two groups of residues was determined using the hbond plugin of the  with LHCII residues/pigments and of C Chl a 601 and D Vio L2 with CP29 residues/pigments. E Barplot comparing the strongest interacting partners for the Chl 601 in CP29 and LHCII and F for the Car L2 in CP29 and LHCII VMD software [53]. The cut-off angle was set at 30° and the cut-off distance was set at 0.4 nm. The analysis was performed on the concatenated trajectories.

Image rendering
All the molecular visualizations were rendered using the computer program Chimera [34].

Static versus dynamic structure
The static structures of CP29 and LHCII show a high degree of homology (average RMSD of 0.1 nm, see Fig. 1C). In our previous work, we showed that in a model lipid membrane, the structure of several LHCII domains deviates from the static (crystal) structure [8] and the question arises if the same changes occur for CP29. More in detail, in LHCII the peripheral solvent-exposed regions such as N-terminus, lumenal loop, stromal loop, helix D and C-terminus are flexible and assume different conformations in the crystal structure than in the membrane. On the other hand, the membrane-spanning helices (especially helix A and B and, to a somewhat lesser extent, helix C, see Fig. S2A & C) are stable and relatively rigid, in agreement with experimental results [54]. This heterogeneity in flexibility across the protein domains also applies to CP29, as can be appreciated by comparing the average RMSD and the B-factor of the simulated CP29 (Fig. S2A & B) and LHCII (Fig. S2C & D). The patterns of RMSD also agree with another computational study on CP29 [10].
The stability of the A, B, and C helices is confirmed by comparing the secondary structural elements of the two complexes between the static structure and in the membranes ( Fig. 2A, B, D & E). Other domains instead change when the complexes are embedded in the membrane. For instance, a large part of the turning motif of the stromal loop of CP29 turns into a bending motif ( Fig. 2A & B) and in LHCII the 3 10 -helix of the stromal loop resolved in the crystal structure is absent in the membrane (Fig. 2D & E). The lumenal loops of the two complexes differ in the static structures ( Fig. 2A  & D) but become more similar in the membrane since the 3 10 -helical motif of LHCII evolves into an α-helix and the turn into a coil, as in CP29. Helix D and the C-terminus have a similar composition of secondary structural elements in the two complexes, both in the static structures and in the membrane, which is surprising as these regions are relatively flexible (Fig. S2B & D). A direct comparison between the static structures of CP29 and LHCII and the simulated membrane ones is shown in (Fig. 2G & H).

Chl-Chl interactions
Although the structures of CP29 and LHCII are also largely similar in the membrane, the relatively minor structural differences observed can in principle have a drastic effect on their spectroscopic properties. In particular Chl-Chl interactions, which dominate the spectral features of the LHCs, are very sensitive to conformational changes in the protein. To assess the effect of the dynamics on the Chl-Chl interactions, we computed the excitonic couplings between Chl pairs in the two complexes along the MD simulations. The values are reported in Fig. 3A & B. Figure 3C shows the time evolution of the excitonic couplings for some of the stronger interacting Chl-Chl pairs.
In our previous work on LHCII, we observed significant fluctuations in the excitonic coupling strengths between Chl 611 and 612, which represent the lowest energy site of the complex [49,55,56]. These fluctuations correlated with conformational changes in the N-terminus, suggesting that this domain is involved in light harvesting regulation [8]. In CP29 there are two isoenergetic lowest energy sites: Chl 611-612 and Chl 603-609 [30]. None of these pairs shows large fluctuations in excitonic couplings (Fig. 3C), which remain consistently high throughout the simulations, suggesting that their energy does not change. The same is true for the electronic coupling between Chls 608 and 610, and 613 and 614 in both CP29 and LHCII (see Fig. 3C and Tab. S1). Instead, the coupling of Chl 606-607 and 604-606 in both complexes deviates significantly from the initial values along the trajectories. In LHCII, these reconfigurations encompass breaking the hydrogen bond between Q131 and the formyl group of Chl 607, and the subsequent coordination of Chl 606 by Q131. These events lead to significant changes in the Chl 606-607 and 604-606 excitonic interactions. Reconfigurations at this site also occur in CP29, but they are different: in this complex, the connection between the Chl 607 and helix C is lost, leading to a substantial decrease in coupling strength between Chls 606-607 in two of the three replicas. A more detailed description of the reconfigurations at this site is presented in the binding affinity section.

Carotenoids
The local conformations of the Cars are an important factor in tuning the spectral properties of the LHCs. It has been shown that the torsional conformation of the rings of xanthophylls significantly affects their energetics [57] and it is therefore possible that this feature is exploited in the photoprotective switch of the LHCs. The Lut in the L1 site is proposed to be the major putative quenching site in both LHCII and CP29 in vitro [6,58] and we have therefore explored the conformational landscape of its rings in the simulations.
As a measure for the torsional conformation of Lut L1, we computed the distribution of a characteristic dihedral angle (see Fig. 4C) along the simulations of CP29 and LHCII. We observe that the rings of Lut L1 in CP29 and LHCII possess the same rotational freedom (Fig. 4A & B). This pattern was also observed in previous MD studies [7,10,11]. If the rotations of the rings of Lut L1 can tune the energetic state of LHCII from a light-harvesting state to a quenched one and vice-versa, then this mechanism could also be present in CP29.

Pigment-binding affinity
Next, we address the difference in pigment binding affinity between CP29 and LHCII. We specifically focus on the pigment binding sites that accommodate a different pigment type in LHCII and CP29, i.e. Chl 601, 609 and 614 and Car L2.
Chl 609: in both LHCII and CP29 the axial ligand of Chl 609 is a glutamic acid (E139 and E159, respectively), however LHCII binds Chl b in this site, while CP29 binds Chl a. The discriminating factor is the presence in LHCII of a glutamine (Q131) and of a glutamic acid in CP29 (E166 in maize, homologous to E151 in this research) as shown by mutational analysis where the mutation Q131E in LHCII led to an increase in Chl a affinity, whereas the reverse mutation in CP29 (E166Q) increased Chl b affinity [59]. Q131 in LHCII and E151 in CP29 serve as axial ligands to Chl b 606 [56,59] (through a water molecule in the crystal structures [15,60] but directly in the molecular dynamics simulations of LHCII [8]).
The interaction energies between Chl 609 and its surrounding residues/pigments in LHCII and CP29 are shown in Fig. 5A, C & E. The binding mode of Chl 609 in the two complexes is largely similar and its interactions with residues that occupy the same position in the two proteins are mostly of similar magnitude. There are, however, some notable differences: 1) Chls b 606 and 607 interact substantially more strongly with Chl 609 in LHCII than in CP29, 2) R91 in CP29 has a significant contribution to the binding of Chl 609, while the corresponding residue in LHCII (K60) only interacts weakly, and 3) the binding of Chl 609 to CP29 is stabilized by Chl 616, which is absent in LHCII. Despite these differences, Chl 609 holds the same position and orientation in CP29 and LHCII (see Fig. S6).
Notably, the simulations show that Q131 in LHCII only weakly interacts with Chl 609 and the hydrogen bond between Q131 and the Chl b 609 formyl group observed in the crystal [15] is lost (it is present in less than 2% of the frames). Visual inspection of the LHCII trajectories shows that Q131 is indirectly involved in the binding of Chl 609: Q131, together with W128, anchors the Chl 606-607 pair to helix C, such that Chl 607 can coordinate the formyl group of Chl b 609 (Fig. S5A & B). At variance with this, the corresponding residues in CP29, i e., E151 and I148, do not strongly interact with Chl 607 (Fig. S5C) and the distance between these residues and the formyl group of Chl b 607 increases substantially during the simulations (Fig. S5D). The simulations also highlight the importance of neighbouring Chls in stabilizing Chl binding and provide a rationale for why, upon a single point mutation directed at the ligands of Chl 606 or 609, more than one Chl is lost [56,59,61,62]. Furthermore, the role of Q131 in binding the Chl 606-607 cluster and its indirect effect on the binding of Chl 609 is rationalized. This also explains why the Q131E mutation in LHCII and the reverse mutation, E166Q (in maize, E151Q in pea), in CP29 influence the affinity for Chl a or b in the 609 site [59].
Chl 614: According to the structures, site 614 accommodates a Chl b in CP29 and a Chl a in LHCII. The axial ligand for this Chl is histidine in both complexes, i.e. H212 in LHCII and H230 in CP29. This is surprising since, based on the difference in Lewis acidity between Chl a and b, histidine is believed to have an intrinsic higher affinity for Chl a [63][64][65]. In vitro reconstitution studies indicate a mixed affinity for this site in both CP29 and LHCII [56,59].
The results of the interaction energy calculations for Chl 614 are shown in Fig. 5B, D & F and demonstrate that the binding pocket of this Chl is very similar in CP29 and LHCII. The major players are the axial ligand, Chl 613 and several residues in the D helix & C-terminus region (see Figs. 2C & F and 5B, D & F). The main difference is the presence of W227 in CP29, which provides an additional −8.6 kJ mol −1 to the interaction energy with Chl 614 with respect to the homologous L209 in LHCII, consistent with the formation of an N-H:O hydrogen bond. The hypothesis that W227 could be a discriminating factor for Chl b binding in CP29 was formulated in a structural study [60] and is now confirmed by the dynamics.
Chl 601: Site 601 is occupied by Chl b in LHCII and Chl a in CP29. The axial ligand for this Chl is a carbonyl oxygen belonging to W14 in CP29 and Y24 in LHCII [15,16]. In both complexes a proline in the vicinity of these residues (P12 in CP29 and P27 in LHCII) prevents them  in (B). Visualization of the CP29 (white) N-terminus (blue). The amino acid residues that were labeled in [72] are shown in red. The black lines and Roman numerals designate the different domains of the N-terminus as established by [74]. C The CP29 N-terminus clusters: Cluster 1 (black, 68%), Cluster 2 (red, 24%) and Cluster 3 (blue, 8%). The Cryo-EM structure of the N-terminus is shown in white. D Differences in the average Chl-Chl excitonic coupling values in CP29 and CP29-Phos simulation. Differences smaller than 15 cm −1 are not reported ◂ from forming intra-peptide hydrogen bonds, allowing Chl coordination [65].
The binding pocket of Chl 601 is largely comparable in the two complexes. Coordination of this Chl occurs mainly through residues of the N-terminus and the DPPG lipid (see Fig. 6A, C & E). Throughout the simulations, Chl 601 remains coordinated to the carbonyl, which is reflected in the high interaction energy between Chl 601 and Y24 and W14 in LHCII and CP29, respectively (Fig. 6E). The main differences are related to the interactions with DPPG and the F34/W46 (CP29/LHCII) residues. W46 in LHCII is closer to Chl 601 than its counterpart (F34) in CP29, allowing it to form a hydrogen bond with the 17 3 oxygen (IUPAC numbering) of Chl 601. Consequently, the orientation of Chl b 601 is such that its formyl group is facing away from the protein, whereas in CP29, the corresponding methyl group faces more toward the protein (see Fig.  S7). However, during the simulations, Chl 601 does not Fig. 8 CP29 N-terminus. A Distance between the C α atom of the designated amino acid residue and the magnesium atom of the closest recombinant Chl (see methods).The red lines indicate the corresponding distances as determined by [72]. The striped rectangle boxes indicate the uncertainty in the configuration of the BODIPY label that was used in the [72] research and which is 0.7 nm in length. B Comparison of the average RMSF per N-terminus amino acid residue the CP29 simulations to experimentally determined amino acid mobility parameters: ΔB −1 [74], < ΔB 2 > −1 [74] & Ω (73) form hydrogen bonds with the LHCII protein nor with the solvated membrane (POPC lipids and water), making it impossible to identify the discriminating factor for the binding of different types of Chls in this site. Considering that this Chl binding pocket is at the periphery of the complex, it is possible that the binding is stabilized by neighbouring complexes. This hypothesis is strengthened by the observation that Chl 601 is absent in the recombinant LHCs [66], suggesting that additional stabilizing elements in its vicinity must be present in vivo.
Car L2: The L2 carotenoid binding site is occupied by Vio in CP29 and Lut in LHCII. For the L2 site in LHCII QM/MM calculations [67] have shown that the overall binding pattern of Vio and Lut in this site was very similar and that the total interaction energy for both Cars was of comparable magnitude. It was therefore concluded that the difference in L2 selectivity must stem from non-enthalpic factors, such as entropy or folding dynamics. The difference in L2 affinity in CP29 was studied via mutation analysis [68]. It was shown that neither the aspartic acids D35 and D123 (D50 and D138 in maize), nor the N-terminus influence the site selectivity. Moreover, substitution in LHCII of the protein domain surrounding the L2 lumenal ring (lumenal part of helix B, the luminal loop, helix C and a part of the stromal loop) with the homologous domain of CP29 did not affect the site selectivity, suggesting that it originates from interactions with the stromal part of helix B. Figure 6B, D & F show the interaction energies of Lut in L2 in LHCII and Vio in CP29. In the text S1 a comparison is made between the results of this study to previously performed calculations. The overall motif of Car binding in the L2 pocket is similar in LHCII and CP29. The principal interactions occur with the central helices and the surrounding Chls and are of similar amplitude. In LHCII the Lut L2 stromal ring is anchored by intermittent hydrogen bonds to the N-terminal residues W46 and T48, and by VdW interactions with Chl 602. Conversely, the Vio L2 stromal ring does not form significant interactions with residues in the N-terminal domain and the VdW interactions with Chl 602 are small. The former observation is consistent with the experimental evidence that removing the CP29 N-terminus does not lead to a significant loss of xanthophylls [68]. Due to the reduced interactions in the stroma, the dihedral of the Vio L2 stromal ring in CP29 has larger conformational freedom than its LHCII counterpart (see Fig. S3). At the lumenal end, both Vio in CP29 and Lut in LHCII form intermittent hydrogen bonds via their hydroxyl groups with a tryptophan in the lumenal loop (W121 and W97, respectively). The differences in the Chl 606-607 cluster (see above) lead to increased interaction between Chl 607 and Vio in CP29 as compared to the Chl 607 and Lut in LHCII. Based on this observation, one might expect that the organization of this cluster is important for the L2 selectivity, but experimental evidence indicates this is not the case [68]. Moreover, neither N61 in LHCII, nor F87 in CP29, which were suggested to play a role in the binding [68], interact with the Car in the L2 site. Based on these data, we conclude that just as for LHCII [67], Car selectivity in the L2 site in CP29 is not regulated through enthalpic factors.
It has been proposed that upon acidification of the lumen, several residues of the lumenal loop and helix C of CP29 get protonated, leading to a conformational change upon which the Vio in CP29 can diffuse out of the L2 binding site [60,69]. The binding affinity calculations clearly demonstrate that the binding of the Cars in the L2 site is significantly stabilized by residues within the two rigid central helices and by the surrounding Chls. The conformational change required for the release of the L2 Car in both proteins therefore should entail severe disruptions in the central helical region and the Chl manifold, which seems unlikely given the stability of the central helices.

CP29 N-terminus
The most significant structural difference between CP29 and LHCII is the N-terminus. This domain is significantly longer in CP29 than in LHCII and the other LHCs (~ 100 amino acid residues in CP29 versus e.g. ~ 50 in LHCII, see Fig. 1C). This domain has been suggested to be important for the anchoring of CP29 to the core in the PSII supercomplex [70], but also for the stacking of the grana, by linking two CP29 proteins belonging to PSII complexes on opposite membranes [71].
To get information about the structural flexibility of this domain, we have calculated the distribution of secondary structural elements (regions as defined in (Shabestari et al. [74])) throughout the dynamics, according to the DSSP model (see Fig. 7A) (note that region I was not resolved in the cryo-EM structure and is therefore absent in our model). While all domains are mostly disordered in the dynamics (Fig. 7A, a bend and coil motif are regarded as unstructured in the DSSP model [45]), region III consistently retains a structured β-sheet and a 'hairpin' turn motif. The turn in region II as well as the α-helical pattern in region V, are also maintained during the dynamics. We also note that the N-terminus extended hairpin motif does not move towards the water phase nor towards the hydrophobic core of the lipid bilayer, but remains flat on top of the membrane. To complement these data, we have retrieved the conformations of CP29 throughout the simulations using a cluster analysis, the results of which are shown in Fig. 7C. Three major conformations could be extracted from the trajectories. Still, they are all quite similar, indicating that the structure of the N-terminus observed in the C 2 S 2 M 2 complex remains stable even in the absence of the neighbouring complexes.
Several studies have aimed at resolving the structure and flexibility of the CP29 N-terminus in monomeric complexes embedded in detergent micelles [72][73][74][75]. Berghuis et al. labelled several amino acids in the CP29 N-terminus with the fluorescent BODIPY dye and determined the distance between the dye and the closest Chl using time-resolved fluorescence spectroscopy. A visualization of the labelled residues is provided in Fig. 7B. To compare experiments and simulations, we determined the same distances along the trajectories of the CP29 MD simulations. Chl 616 was excluded from this analysis since it is absent in the monomeric CP29 [76]. The results are reported in Fig. 8A. S20, S27, A43, S77 and T84 (S97 in [72]) exhibit similar distances to the closest Chl in the experiments and simulations. Based on the location of these residues within the N-terminus, we conclude that the regions II, IV and V of the N-terminus (see Fig. 7B for the regions) adopt a similar conformation in detergent and in the membrane. S52 and L61, which are located in region III, are instead ~ 2 nm closer to the CP29 Chls in the experiments than in the simulation, suggesting that in detergent the N-terminal hairpin of region III is folded down towards the CP29 helical core, whereas in the model lipid membrane it is in its extended conformation oriented along the surface of the membrane. It is probable that the detergent micelle imposes sterical hindrances on this part, forcing the N-terminus to adopt a folded conformation. The flexibility of the N-terminus in terms of the average RMSF per residue could be compared to experimentally determined flexibility parameters arising from ESR (Ω, [73])-and EPR labelling studies (ΔB −1 , < B 2 > −1 , [74]). While the values are very similar for region II, V and a large part of region III (comparison of RMSF to Ω, ΔB −1 and < B 2 > −1 in Fig. 8B), the region near the tip of the hairpin is instead more rigid in detergent, possibly again due to the constraints that the detergent micelle imposes on this area. These data therefore show the distortions that detergent micelles can induce in LHCs and prompt the use of more native environments when studying them, such as e.g. liposomes [77] or nanodics [78][79][80].

Phosphorylation of the CP29 N-terminus
The N-terminus of plant CP29 was shown to be phosphorylated at sites T68 and T70 in response to light and other environmental stresses [33,[81][82][83][84]. It was proposed that the phosphorylation mediates the disassociation of LHCII from the PSII supercomplex [33] and it was shown that it induces conformational changes in CP29, which result in differences in absorption around 638, 653-663 and 683 nm [85]. To mimic the phosphorylation's effects, we ran three independent simulations with residues T68 and T70 phosphorylated. We observed that phosphorylation induces an extra β-bridge motif in region III (Fig. 7A), while no changes are observed in the other regions. Possibly, conformational changes become more apparent once the interactions with neighbouring complexes are included in the model system.
Next, we analyzed the effect of phosphorylation on the organization of the pigments since the observed differences in absorption 93 could result from changes in the excitonic couplings between Chls. Excitonic coupling differences between Chls in the phosphorylated and non-phosphorylated CP29 complex on the order of 16-22 cm −1 are observed for the Chl 604-606-607 cluster and the 611-612, 603-616 and 609-616 pairs (Fig. 7D). Exciton modelling of CP29 revealed significant participation of the Chl 606-607 pair to the 638 nm and 653-663 nm absorption bands and a considerable contribution of the Chl 611-612 pair to the peak at 683 nm [30]. Differences in excitonic couplings in these clusters can therefore explain the observed differences in absorption.

Conclusion
In this work, we have compared the dynamic structural properties of two LHCs, LHCII and CP29. The MD data indicate that their apoproteins possess very similar flexibility patterns, and the main differences concentrate in the lumenal and stromal loop regions. Overall, both proteins undergo relatively small conformational changes upon equilibration in the membrane. Inspection of their Chl-Chl excitonic coupling dynamics revealed more significant differences at their lowest energy sites. In LHCII, the electronic couplings between Chl 611 and 612 vary widely. In CP29, which contains two isoenergetic lowest energy sites located at the 603-609 and at the 611-612 pair, the couplings remain consistently high along the dynamics. This points to a photoregulatory function for the lowest energy site in LHCII, but not in CP29.
The MD trajectories also allow for detailed binding affinity calculations and have yielded insights into the factors that drive the differences in pigment binding properties between the two complexes. The data show that the Q131 residue in LHCII is indirectly involved in the selective binding of Chl 609. It anchors the C helix to the Chl 606-607 cluster, which in turn coordinates the formyl group of Chl b 609. The homologous residue in CP29, E151, does not connect Chl 607 to helix C, resulting in the loss of connection between the Chl 606-607 pair and Chl 609. In addition, the W227 residue in CP29 (L209 in LHCII) is a constitutive hydrogen bond partner for the Chl b 614 formyl group, and it is thus responsible for the selectivity of this binding site for Chl b. No such distinctive residue could be observed for the L2 carotenoid binding site in CP29 and LHCII, which could explain the difference in Car binding properties, 1 3 suggesting that non-enthalpic factors are at the origin of this phenomenon.
We finally turned our focus to the unusually long N-terminus of CP29, which is proposed to have a role in stabilizing the PSII supercomplex and contains phosphorylatable residues with a regulatory function. The CP29 N-terminus is largely unstructured and flexible but contains also some consistently structured regions. Its conformation as resolved in the cryo-EM structure is stable also in the absence of the neighboring complexes. By comparing the dynamic structure of CP29 in the membrane to experiments performed on CP29 in detergent, we suggest that the detergent micelle imposes sterical constraints on the extended hairpin region of the CP29 N-terminus. Finally, simulations of CP29 with its N-terminus phosphorylated revealed that an additional β-bridge motif is formed and that long-range allosteric effects induce changes in the excitonic couplings in the Chl manifold, explaining the experimentally observed spectral shifts.
Acknowledgements This work is supported by the Dutch Organization for Scientific research (NWO) via a TOP grant to RC and a Veni grant to NL Data availability All data needed to evaluate the conclusions in this paper are present in the paper and/or Supplementary Information. All original data are available from the authors upon reasonable request.

Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.