Insights on the supramolecular polymorphism of poly(γ-benzyl-L-glutamate) rod-like peptides from atomistic molecular dynamics simulations

This work reports an all-atom molecular dynamics study of the first stages of aggregation of poly(γ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma$$\end{document}-benzyl-L-glutamate)—PBLG—polymers end-capped with C60. PBLG self-assembles in water and shows polymorphism when specific changes in the molecular structure are made. Three variants of PBLG are compared, which differ for the location of the C60 moiety: N-terminus, C-terminus, or both. The aim of the computational experiments was to rationalize the key molecular properties that are relevant to the supramolecular polymorphism. Single-peptide simulations in tetrahydrofuran and in water allowed to quantify the strength of the self-assembly driving force in terms of the overall order parameter of the phenyl rings that are “coating” the peptides. Two-peptide simulations for the singly capped peptides showed that two kinds of aggregates can be formed: one “slow” thermodynamically more stable, and one “fast” kinetically favoured. These first-stage aggregates are interpreted as the seeds leading to different self-assemblies.


GRAPHICAL ABSTRACT Introduction
Self-assembly, one of the most amazing phenomena in nature [1][2][3], is also a central field of study in materials science [4][5][6][7], nanochemistry [8,9], and biomimetic chemistry [2,[10][11][12][13][14][15][16]. Of particular interest is the non-covalent self-assembly in systems where the single building block includes all necessary ''information'' to control, on the one hand, the size of the self-assembled structure (e.g. by modifying one or more dimensions of the building block), and on the other hands, the supramolecular shape by applying small, directed chemical changes to the building blocks. The latter characteristic is referred to as supramolecular polymorphism [17].
A rational way to design new polymorphic selfassembling systems is desirable, but it cannot be left only to a vague chemical insight. The complexity of the building blocks and the complexity of the interaction among hundreds or thousands of such molecules require support from theoretical and computational chemistry. On the one hand, the computational study of the self-assembling process demands large simulations with the order of 10 3 building blocks (often macromolecules) included into a solvent box. A plethora of approaches to accelerate these calculations are available: (1) Monte Carlo methods [18], (2) coarse-grained [19] and mesoscopic [20] deterministic molecular dynamics (MD) simulations; (3) stochastic approaches such as dissipative particle dynamics [16,[21][22][23][24], and Brownian dynamics [25] simulations; (4) mean-field approaches, like the self-consistent field theory [26], which is purely thermodynamic, based on the minimization of a phenomenological expression for the overall free energy of the system, and (5) multiscale methods such as external potential dynamics [26] and Meso-Dyn [27], in which the building blocks are described at some level of detail, while the medium is treated at a mean-field level. On the other hands, rationalizing how the geometrical properties of the self-assembled structures are related to the chemical details of the building blocks is a second important issue, which is concerned also with the parametrization of the force fields of the computational approaches mentioned above.
This work focuses on the second problem, with the aim of defining a proper computational protocol tailored to discriminate, among the degrees of freedom of the system, those molecular properties relevant to the control of the supramolecular polymorphism.
This study has been inspired from a previous work [28] in which a family of molecules based on the poly(c-benzyl-L-glutamate)-PBLG-polymer was studied. Polymorphism was observed by changing the capping pattern of the peptides. If two C 60 moieties were capping the C-and N-termini of the peptide, self-assembled vesicles were observed. When the N-terminus C 60 moiety was removed, leaving the NH 3 ? group, the peptides self-assembled into tori. Moreover, acetylating the N-terminus led to self-assemblies shaped as spherocylinders (not shown in ref. 28). The peptides were 84 amino acids long. Repeating the self-assembly study on a 25 amino acid-long, doubly C 60 -capped PBLG, vesicles were still obtained, but of reduced dimension. (The radius was one order of magnitude smaller than the radius of the vesicles obtained with the 84 amino acid-long PBLG.) As a conclusion, it was observed that the shape and dimensions of the self-assemblies could be modulated with just small, localized changes.
In this work, all-atom molecular dynamics simulations (AA-MD) have been carried out for two similar molecules: NH 3 ? -PBLG 25 -C 60 (P C from now on) and C 60 -PBLG 25 -COO -(P N ) peptides in both THF (where PBLG can be dissolved) and water (where PBLG auto-assembles). P N is expected to behave as the acetylated peptide that self-assembled in spherocylinders. The data obtained as output from the MD simulations are complex, containing all details of the system without any hierarchical distinction. The purpose of this work was to interpret these computational experiments to extract the relevant details that bridge the structure of the peptide to that of the nanostructure. In particular, the objective was to quantify how and to which extent site-directed chemical modifications of the peptide are reflected in different peptide-peptide and peptide-solvent interactions. The balance between these two kinds of interactions is at the basis of the supramolecular polymorphism.
The case study systems considered in this work are similar to those studied in ref. 28: in water, the N-terminus of P C has an unprotected NH 3 ? group, which was considered the molecular detail responsible for the formation of the tori. P N , similarly to P C , bears only one C 60 moiety, but the latter is capping the amino group, and thus, it is expected that P N should assemble into spherocylinders. The two peptides were 25 amino acids long to compare with previous C 60 -PBLG 25 -C 60 (P NC ) AA-MD simulations carried out in the past [28]. Despite the simplification of the polymers studied here, simulations of the single PBLG polymers in the two solvents allowed a thermodynamic interpretation of the formation of different shapes. Moreover, the simulations of couples of peptides brought to light different paths of aggregation that are compatible with the different shapes of the self-assembled nanostructures observed experimentally, providing kinetic insight on the shape selection. The molecular interpretation provided here of the thermodynamics and kinetics behind the supramolecular polymorphism, as well as the analysis protocol, is transferable to similar polymers.

Methods
MD simulations have been run with GROMACS 5.1.4 [29] using an all-atom representation of the peptides and of the solvent. For the neutralizing ions and for the ''natural'' part of the BLG amino acid, the CHARMM27 force field has been employed (with the CMAP correction), while the benzyl and C 60 moieties, and THF solvent have been parametrized with the GAFF force field. Finally, water has been modelled with the TIP4P force field.
While P NC peptide atomistic MD simulations were available from a past work (for details, the reader is referred to the supporting information of ref. 28), 12 new simulations have been carried out for the P C and P N peptides. A representation of the two molecules is provided in Fig. 1. For each of the peptides, two simulations of the single molecule in water or THF have been carried out. Then, two-molecule trajectories have been run placing 2 P C or 2 P N peptides inside a water or a THF solvent box. Two different initial configurations have been considered, i.e. headhead (hh) or head-tail (ht) initial relative orientation Figure 1 Representation of the initial configuration of a P N , and b P C peptides. The amino acids are folded in a-helix, as depicted by the red cartoon. c Schematic showing how the helix (r helix ) and i-th phenyl (r i ) axes have been defined to calculate order parameters. Blue spheres represent the 2nd and 24th backbone N atoms used to define the helix axis (green arrow), while the yellow spheres indicate the couple of C atoms used to calculate the local axis on the phenyl ring (red arrow). of the peptides, where the ''head'' is the N-terminal side, and ''tail'' the C-terminal. Details of the modelling are reported in Table 1.
For each of the systems, the simulation protocol consisted of an initial energy minimization followed by a 3 ns-long heating protocol from 0 to 298.15 K. Finally, one production run of 50 ns was carried out. Simulations have been run under periodic boundary conditions and in NPT canonical ensemble. The simulation parameters are summarized in Table 2.

Single-peptide simulations
The analysis of the MD trajectories of the three peptides has been conducted with the intent to investigate which molecular properties are likely to be responsible for the different shapes of the selfassemblies. To this purpose, the first observation concerns the driving force that is pushing the selfassembly process. As it was shown with the support of coarse-grained calculations in the previous work on P NC peptides [28], the assembly process is driven by the hydrophobicity of the peptides. Seen in a different way, the driving force should be considered as the balance between the peptide-peptide and peptide-solvent interactions. The hypothesis is that in THF the peptide-solvent interactions are stronger than in water, destabilizing the aggregation of the peptides. In water, the opposite is happening.
By looking at the snapshots of the single-peptide trajectories, it emerges that the phenyl rings behave in a very different way in the two solvents. Figures S1-S3 in the electronic supporting material (ESM) analyze quantitatively the observation that while in THF, the phenyl rings are randomly oriented with respect to the axis of the helix, in water, they tend to be aligned with the helix, resulting in a lower exposure to the solvent. Such an observation can explain (at the molecular scale) the mechanisms that  control the balance between the peptide-peptide and peptide-solvent interactions and thus the strength of the driving force of self-assembling. Our interpretation is that a better affinity of the phenyl rings with the solvent decreases the degree of order, providing in turn a thermal disturbance to the phenyl-phenyl stacking interactions, the latter being at the origin of the peptide-peptide attractive potential.
To quantify the observations, the order parameters of the phenyl rings can be used. In particular, the idea is to interpret the axis of the helix as a director generating an orienting potential on each of the rings. As a first approximation, a phenyl ring is supposed to have the possibility to explore a hemisphere with its centre fixed on the C a carbon. Figure 1c schematically depicts the model. The vector r helix is parallel to the helix axis, defined by the line passing through the N atoms of the 2nd and 24th BLG amino acids (see Fig. 1c), and oriented in the N-C direction. For the ith phenyl ring, the vector r i is defined as the difference between the positions of the two C atoms shown in Fig. 1c (the C connected to the rest of the side chain, and the C in para with respect to the former). The cosine of the polar angle h i is calculated simply as It is assumed that the orienting potential is only affecting the h i polar angle. This limits the choice of the order parameters to the family of averages of Legendre polynomials. Moreover, by examining the trajectories, it emerged that there is a preferential C-N ''polarization'' of the ordering (see, e.g. Fig. 1a, b). This may be due to the chirality of the helix. With such an observation, the subset of odd order Legendre polynomials should be used to describe ordering. Here, the simplest choice has been made, i.e. the firstorder Legendre polynomial. The order parameters for the phenyl rings have been computed according to the average Þ¼cos h i ð Þ, and . . . h i means averaging over the trajectory snapshots.
The order parameter defined in Eq. 2 equals 1 if the phenyl ring is perfectly ordered along the helix axis in the N-C direction, -1 if it is aligned in the opposite versus, and 0 if the phenyl is uniformly distributed. Figure 2a, b shows how the order parameters of the phenyl rings change along the peptide chains. Overall comparison shows that the phenyl rings are characterized by a sensitive higher degree of order in water with respect to THF. The average order parameters for P C , P N , and P NC in water are, respectively, 0.50, 0.64, and 0.70, while in THF are 0.20, 0.29, and 0.30. In water, the degree of order decreases as P NC [ P N [ P C .
From this observation, it should be expected that the overall stacking interaction would be stronger in the P NC peptide and weaker for the P C molecule. Moreover, Fig. 2a shows that in water, the three phenyl rings in the head of P N (where the C 60 moiety is bonded) show higher order than the three phenyl rings in the tail. The opposite is seen for P C . Finally, P NC appears quite disordered in the N-terminus, and low order is also observed at the C-terminus. However, the middle of the P NC peptide is uniformly highly ordered, differently from the other two molecules.
Comparing P N and P C , the presence of the fullerene moiety seems to lead the ordering pattern. In fact, P N in water is highly ordered in the N-terminus, while at the other side (ending with COO -), the order parameter is lower. Analogously, for P C, the phenyl rings are highly ordered at the C-terminus (same side where the fullerene is located), while ordering is lower close to the NH 3 ? moiety. However, the degree of disorder is higher in the head of P C with respect to the tail of P N , suggesting higher affinity to water of NH 3 ? than COO -. This is reflected in a more penetrating effect of water molecules that, in turn, gives a stronger local disturbance of the ordering of the phenyl rings.
To inspect such a hypothesis, the solvent structure around the peptides has been studied by means of pair correlation functions, g r ð Þ. Given two atoms, the pair correlation function quantifies the probability of finding the second atom at a certain distance r from the first one, relative to the probability calculated as the density was isotropic. Peaks of the g r ð Þ thus tell how and to which extent, averagely, the two atoms interact. As it was done with the order parameter, a first global information on the structure of the solvent around the polymers has been extracted, followed by a more detailed residue-by-residue analysis. Figure 3 reports the g r ð Þ functions calculated between any of the backbone Ca atoms of the PBLG peptides with any of the oxygen atoms of the solvent molecules (water or THF). The plots are similar for the three peptides in the two solvents. The pair distribution functions in THF in the proximity of the peptide are above those in water, providing the information that peptide-water interactions are weaker than peptide-THF interactions. Such an observation is compatible with the experimental observation that the peptides remain dissolved in the organic solvent, while they precipitate in water. Figure 4 reports a more local description of the peptide-solvent interactions. In particular, g r ð Þ functions have been computed for each of the single Ca atoms with any of the oxygen atoms of either water or THF. Figure 4a-c shows the pair distribution functions in water for, respectively, P N , P C , and P NC . For P N , residue 25 (i.e. the C-terminal) shows the highest affinity to water. On the contrary, for P C, this is observed for residue 1 (i.e. the N-terminal). This is due both to the fact that the terminal residues without the C 60 moiety are less sterically hindered with respect to the C 60 end-capped termini, and because the free terminal residues are charged. In the case of P NC , the terminal residues show again a peak on their g r ð Þ, differently from the behaviour of the other residues.
For the P NC peptide, a lower value between 0.5 and 0.75 at the termini is observed. The order of Figure 2 Colour map representation of the order parameters of the phenyl rings along the three peptides P N , P C , and P NC in a water and b THF. The colour bar at the bottom provides the ranges of S 1 j j, covered by the colour. The arrows indicate the direction of ordering of the phenyl rings. Figure 3 Pair radial distribution functions of the overall probability of finding the O atom of any solvent molecule with respect to any of the backbone C a atoms of the PBLG chain of P N in water (black, dash-dash-dotted line), P C in water (red, dotted line), P NC in water (violet, dotted line), P N in THF (green, dashdotted line), P C in THF (blue, dash-dot-dotted line), and P NC in THF (orange, solid line).
structuring of water around the termini of the peptides is in accord with the order of affinity with water as suggested by the analysis of order parameters, i.e. NH 3 On the other hands, the central residues show pair distribution functions very similar to each other and resemble the trend in Fig. 3. Figure 4d-f in THF shows again that the solvent can be more easily structured around the terminal residues. Trends of residues 1 and 25 are opposite to those observed in water. A difference, however, is observed for residue 1, which is always showing the largest affinity to THF. This is especially true when it is end-capped with C 60 . The order of affinity (height of the peak of the first residue) is P NC [ P N [ P C . The C-terminus shows a peak that is nearly independent upon the end-capping pattern. Finally, as it happens for water, the central residues show no peaks. However, their pair radial distribution functions are more ''penetrating'', i.e. they go to zero around 3 Å , while in water, they reach zero around 5 Å .
Similar conclusions on the terminal parts of the peptides can be drawn selecting the N or C backbone atoms in, respectively, the N and C termini. A short discussion is reported in Figure S4 of the ESM.
To summarize, analyses of the trajectories in terms of order parameters and solvent structure suggest that P NC molecules in water will tend to be more ''sticky'' with respect to P N and the latter more than P C . Moreover, the overall affinity to water increases in the same order, i.e. P NC \ P N \ P C . The P NC peptides will thus be expected to be pushed more by water to decrease the exposed surface, and such a process will be favoured by a large stacking interaction. On the opposite, P C molecules will feel less pushing force and will tend to interact less. P N sits in the middle. Therefore, it is expected that the surface exposed to water of the three self-assemblies should follow the order P NC \ P N \ P C .
From the experiments, it is known that P NC , P N , and P C self-assemble, respectively, into spherical vesicles, spherocylinders, and tori. In Sect. 3 of the ESM, it is discussed that for these solids, the surface per unit volume follows the order sphere [ spherocylinder [ torus, which is the same expected trend deducted from the analyses of the atomistic MD trajectories.

Two-peptide simulations
MD simulations of 2 P C or 2 P N peptides have been computed both in THF and water to inspect the aggregation process with the aim of investigating whether relevant differences are observed at the atomistic scale that can be related to the differences in the final different shapes of the self-assembled structures. To this purpose, eight different simulations have been carried out with the polymers placed in the two solvents in two different initial relative configurations, i.e. head-head (hh) and head-tail (ht)-see Figure S6 in the ESM. Figure 5 shows on a coarse-grained temporal scale what is happening in time from the minimization stage to the end of the production trajectory in water. Important differences are seen in the four trajectories. Firstly, by examining the shape of the aggregates, it is observed that both peptides give origin to two different types of aggregates. In particular, P N -ht and P C -hh evolve to an aggregate that we shall call ''longitudinal'', i.e. the two main molecular axes are parallel. On the other hands, the P N -hh and P C -ht simulations end up with ''transversal'' aggregates, where the axes of the two a-helices show an angle between 90 and 120 deg.
Longitudinal aggregates in Fig. 5a, d of, respectively, P N -ht and P C -hh show the important difference that the relative orientation of the two peptides is kept, respectively, head-tail and head-head. These aggregates show the lowest solvent accessible surface area (SASA), with respect to the transversal aggregates. The longitudinal is thus expected to be the thermodynamically more stable structure. The headhead P C configuration can be interpreted from the highest affinity of the N-terminus (NH 3 ? ) to water with respect to the other peptides. This is compatible, as shown in Fig. 2a, with the lowest order of the phenyl rings at the N-terminus with respect to the order observed at the C-terminus. This is different in the case of the head-tail longitudinal configuration of the 2 P N aggregate. The transversal aggregates also show an important difference. The P C -ht initial configuration leads to a transversal aggregate keeping the head-tail configuration (Fig. 5c). In particular, the N-terminus of a peptide is stacked to the C-terminus of the other one. This can be explained by a ''solvation'' effect of the C 60 moiety by the phenyl rings close to the N-terminus (see Figure S7 of the ESM). Again, this is possible because of the lower order of the phenyl rings at the N-terminus with respect to the other ends. The time evolution of the P N -hh configuration ends with a transversal aggregate in which the two peptides are placed at 90 deg, one with respect to the other, giving a ''T'' shape. Both transversal aggregates seem to be stable, i.e. they do not evolve to the longitudinal ones even increasing from 50 to 100 ns over the length of the trajectory. Moreover, MD simulations were repeated on larger boxes to ensure that no artefacts due to the minimum image approximation of the periodic boundary conditions algorithm are at play. The same aggregation scheme was found.
To quantify the observations, the SASA time series have been extracted from the trajectories. They are reported in Fig. 6a-d. The relevant information that can be obtained from the simple observation of the time series is that the transversal aggregates are formed with faster kinetics with respect to the longitudinal ones, which are formed just in the initial stages of the equilibration (within the first 2 ns of simulation), the formation of the longitudinal aggregates is a slower process that takes 10 ns. The deduction from such an observation is that the shape of the self-assemblies shall be determined by the transversal aggregates, which will behave as templates for the growing nanoscopic architectures. Finally, the histograms have been built from the SASA trajectories. In particular, for each of the simulations, two histograms have been extracted: the first using the first few ns of the trajectory to plot the SASA distributions of the single, separated peptides. The second, using the tails of the trajectories to access the SASA distributions of the aggregates.
The histograms are reported in Fig. 6e. The probability density P S ð Þ is normalized over the integration of the SASA, S, from 0 to infinity. The rightmost part of the plot reports the probability densities of the SASA for the separated peptides. They appear Gaussian, with an average value of S ¼ 74 nm 2 , and a standard deviation of 4 nm 2 . It has to be noticed that the P C -ht distribution appears very large and unresolved since the aggregation process is so fast that there is not a sufficient number of points to build a statistical meaningful histogram.
The leftmost part of the plot reports the four histograms for the aggregates. The longitudinal ones show smaller SASA's with respect to the transversal ones. In particular, the P N -ht and P C -hh are centred to similar average values of S, respectively, 59 nm 2 , and 57 nm 2 . Moreover, the distributions show standard deviations of 2 and 3 nm 2 . Transversal aggregates of P N -hh and P C -ht are, instead, centred on, respectively, 65 and 63 nm 2 , with a larger standard deviation of 2.5 and 3.5 nm 2 . Larger standard deviation (i.e. larger fluctuations of the SASA) may be caused by: (1) a lower order of the phenyl rings, which in turn means a less attractive force of the two peptides, or (2) a larger affinity to water, meaning that the driving force to aggregation is weaker. However, taking into consideration the order parameters in Fig. 2, P N shows an average order larger than P C . For this reason, it would be expected that P N should give the longitudinal aggregate with smaller SASA. However, the contrary is observed, leading to the conclusion that a relevant role played in the aggregation is the affinity to water. The strength of aggregation led by peptide-peptide interactions (that depends on the order parameters) is, instead, reflected in a slightly smaller standard deviation. The same conclusion can be drawn for the transversal aggregates.

Conclusions
In this paper, all-atom molecular dynamics simulations have been carried out and analysed to rationalize the polymorphism of PBLG self-assemblies originated from small, localized changes in the molecular structure of the building blocks.
To encompass such an objective, a small set of parameters has been defined and used. Results obtained from the application of the protocol to the analysis of the P N and P C MD simulations, which have been carried out ex novo in this work, are confirmed by the application of such a protocol to the P NC molecule. In a previous work, [28] numerical experiments have shown that P NC self-assembles in vesicles, the same structures observed experimentally. The fact that the MD simulations analyses agree with the expected (and proved) system behaviour confirms that the protocol is catching the relevant ingredients needed to understand, at the molecular level, the properties of van der Waals interactiondriven self-assembling polymers.
What has been learned is that in the early stages of aggregation, the kinetically favoured dimers are the transversal ones, while the longitudinal (probably more stable thermodynamically) are formed within a slower time scale. Transversal aggregates are stable at least for 100 ns. Thus, it is reasonable to conjecture that the different transversal geometries are the templates of the self-assembly process. P C peptides may be able to form circles that can grow and close to form tori. On the other hands, P N molecules can be expected to stack into long wires that can grow laterally, giving origin to spherocylinders. Moreover, since these aggregates showed more rigidity (lower fluctuations in SASA), it is not possible to bend the long-range aggregate. It is worthy to note that such an interpretation is also compatible with the fact that changing the size of the molecules, the size of the self-assemblies is also affected. We interpret the self-assembly mechanism as the balance between peptide-peptide and peptidewater interaction forces. Thermodynamic equilibrium will be reached when the force that is pushing the peptides to reduce the total SASA for the reason of hydrophobic interactions, which is opposed by the force that describes the short-range interpeptide interactions. Work is in progress to build a coarsegrained description of the peptides based on the MD analysis protocol here presented. Such a modelling would allow to simulate a box with hundreds of polymers and thus to simulate the self-assembling process with the force field parametrized by these allatom trajectories.