Sequence-specific response of collagen-mimetic peptides to osmotic pressure

Native collagen molecules usually contract upon dehydration, but the details of their interaction with water are poorly understood. Previous molecular modeling studies indicated a spatially inhomogeneous response, with a combination of local axial expansion and contraction. Such sequence-dependent effects are difficult to study with native collagen. In this article, we use collagen-mimetic peptides (CMPs) to investigate the effect of osmotic pressure on several collagen-mimetic sequences. Synchrotron x-ray diffraction combined with molecular dynamics simulations shows that CMPs pack differently depending on osmotic pressure and exhibit changes in the helical rise per residue of individual molecules. Infrared spectroscopy reveals that osmotic pressure affects the stability of the triple helix through changes in triple helix-stabilizing hydrogen bonds. Surprisingly, CMPs with the canonical collagen sequence glycine–proline–hydroxyproline are found to elongate upon dehydration, while sequence modifications are able to reverse this tendency. This strongly suggests that the overall contraction of native collagen molecules is not programmed into the canonical sequence but is specific to local amino acids that substitute for proline or hydroxyproline along the protein chain. Collagen is an essential protein in mammalian extracellular tissues and a better understanding of its mechanical function is important both from a materials science and from a biomedical viewpoint. Recently, collagen has been shown to contract along the fibre direction when subjected to osmotic stress, a process that could play important roles in strengthening bone and in developing tissue tension during extracellular matrix development. The present work uses collagen-like short peptides to show that the canonical collagen sequence is not responsible for this contraction. 
The conclusion is that the collagen amino acid sequence must have evolved to include guest sequences within the canonical glycine-proline-hydroxyproline repeat that provide the observed contractility.


Introduction
Collagen, the most abundant protein in mammals, plays a key structural and biochemical role in the extracellular matrix. 1,2 All 28 different collagen types known to date contain at least one right-handed helical domain, made up of three polypeptide chains with the repetitive sequence (Xaa-Yaa-Gly) n . In mammalian collagens, Pro is frequently found in the Xaa position, while the Yaa position is often occupied by the post-translationally modified Pro derivative 4R-hydroxy-2S-proline (Hyp). The triple helical structure, which is stabilized by one interchain hydrogen bond per Xaa-Yaa-Gly tripeptide unit, 3,4 is highly abundant in fibril-forming collagens. These include collagen I, which is an important building block of connective tissues, such as tendons, ligaments, dermis, cornea, and bone. 1,2 Upon fibrillation, triple helices self-assemble into a highly regular, periodic structure where molecules are axially staggered with a so-called D-period of 67 nm. [5][6][7] This staggered alignment results in tightly packed overlap regions and less dense gap regions. In the overlap regions, the triple helices are laterally packed in a distorted hexagonal fashion, while the molecular arrangement in the gap regions is more variable. [6][7][8][9] Many studies of fibrillar collagens and triple helix-forming collagen-mimetic peptides (CMPs) have shown that water content (i.e., osmotic pressure) plays a crucial role in determining collagen structure and function. 4,[10][11][12][13][14][15][16][17][18][19] Hydration and protein-induced water structuring are fundamental processes that determine the biochemical properties of the triple helical collagen molecule and its higher-order assemblies as well as its tightly connected multiscale mechanical performance. 
For example, small-angle x-ray scattering (SAXS) and Raman experiments on collagen-based tissues have shown that small changes in osmotic pressure significantly affect the lateral packing density of collagen triple helices and their interaction with water. 20,21 Atomic force microscopy of collagen fibrils has further shown higher degrees of osmotic pressure-induced swelling in the gap regions when compared to the overlap regions. 22 Osmotic pressure-dependent, dynamic structural features thus critically determine solute diffusion, exposure of integrin binding and protease cleavage sites, 19,23 accessibility of nucleation sites for mineralization 9 as well as the tensile and compressive stiffness along and perpendicular to the fibril axis. 22,[24][25][26] For fibers of collagen I, it has been shown that osmotic pressure-dependent mechanical properties are directly related to molecular conformational changes that affect the axial length. 27 Within the physiological osmotic pressure range, this length change can generate axial forces larger than those produced by muscles. In mineralized collagen-based tissues, these forces further induce enormous compressive stresses (estimated to reach 1 GPa) that act on the apatite crystals associated with the collagen fibrils. 25 This purely passive physicochemical mechanism is fundamentally different from other biological processes where force generation requires energy in the form of ATP (e.g., as in the case of molecular motors). Molecular dynamics (MD) simulations suggested that these conformational changes are not homogeneously distributed along the collagen molecule. 27 Instead, some regions elongated, while others contracted so that the net molecular contraction observed upon dehydration (and vice versa in the case of rehydration) was predicted to be the sum of highly localized processes. The axial response of collagen to osmotic pressure is thus expected to be finely controlled by the local amino acid sequence. 
Sequence-level control over the macroscopic hydration-induced strain also exists in silk; 28,29 however, silk is based on β-sheets rather than helical conformations and contracts upon hydration. Other, fundamentally different osmotic pressure responses in biochemical systems include water-based actuation in plants 30 and bacillus spores. 31 To understand the origin and functional consequences of local hydration patterns in collagen I, it is essential to investigate the structure of individual triple helices and their higher-order assemblies in a sequence-controlled manner. Each triple helix of full-length, fibrillar collagen I consists of a large number of amino acids (~3000), forming more than 200 unique Xaa-Yaa-Gly tripeptide units. 32 It is thus difficult to obtain sequence-controlled and thermodynamically stable fragments of native collagen and to investigate their properties as a function of environmental changes. To overcome this limitation and to obtain precise structural and chemical information at the molecular level, we utilized CMPs and investigated their response to osmotic pressure changes. CMPs are chemically well-defined, synthetic peptides with the sequence (Xaa-Yaa-Gly) n that fold into the characteristic triple helical structure. [33][34][35][36] The first CMPs consisted primarily of the canonical sequences (Pro-Pro-Gly) n and (Pro-Hyp-Gly) n . 37,38 These CMPs have critically contributed to understanding the fine molecular differences determining collagen structure and hydration patterns, 16,[39][40][41][42][43][44] folding, 45-47 thermodynamic stability, 37,38,46,[48][49][50] and higher-order assembly 51 as well as the specific role of Hyp in determining the above-mentioned properties. 16,38,40,[49][50][51][52] Later, so-called host-guest CMPs were introduced, where either individual amino acids or tripeptide units were modified in the central part of the peptide sequence. 
13,[53][54][55] With this strategy, disease-related mutations 13 have also been investigated, as have segments of biochemically active sequences, such as protease cleavage and integrin binding sites. 17,36,56,57 In this work, we utilized CMPs to investigate the response of different sequences to changes in osmotic pressure. While precisely controlling the relative humidity (RH) of microcrystalline CMP powder samples, we determined three essential structural parameters that are critical for understanding triple helix conformational changes as well as higher-order assembly. Specifically, we performed in situ infrared (IR) spectroscopy to obtain information about the effect of CMP-water interactions on the strength of the interchain hydrogen bond. Synchrotron-based x-ray diffraction (XRD) was used in combination with MD simulations to investigate the lateral packing arrangement, specifically focusing on the center-to-center distance between CMP triple helices. The XRD data were further used to determine the helical rise per residue, which reports on the axial molecular contraction/elongation of individual triple helices. We show that decreasing the osmotic pressure causes stronger interchain hydrogen bonding. Furthermore, the chemical properties of the triple helix surface influence the swelling behavior (lateral packing) of higher-order triple helix assemblies, causing a sequence-specific rearrangement of the helices in the range of physiologically relevant osmotic pressures. The peptide sequence also critically determines the osmotic pressure-dependent helical rise per residue and even small changes in the sequence can reverse the direction of hydration-induced axial deformation. These results provide support for the hypothesis that the response of full-length collagen to osmotic pressure is indeed controlled at the level of the local amino acid sequence.

Sequence design
With the goal of investigating the sequence-specific response of CMPs to changes in osmotic pressure, we utilized four different CMPs, each consisting of 10 tripeptide units. This CMP length ensures sufficient thermodynamic stability, 38,46,49,50,58,59 is known to yield well-aligned triple helical structures and has frequently been used for crystal structure analysis. 11,13,16,[39][40][41][42][43]59 Using the canonical sequences (Pro-Pro-Gly) 10 and (Pro-Hyp-Gly) 10 , abbreviated as (PPG) 10 and (POG) 10 in the following, we first focused on the ability of the imino acids to participate in hydrogen bonds. The tripeptide units PPG and POG occur in human collagen I with frequencies of 1% and 9.6%, respectively (Figure S1). These two CMPs were complemented with the peptide (Hyp-Hyp-Gly) 10 , abbreviated as (OOG) 10 . The OOG tripeptide unit is not a natural building block of mammalian collagens and has frequently been used for investigating the role of Hyp in determining collagen structure and stability. [59][60][61][62][63] The structures of these three CMPs at low osmotic pressure have previously been determined with x-ray crystallography. 40,[42][43][44]59,61 They all form triple helices described by the so-called 7/2 model, with 7 amino acids per 2 turns of the helix (Figure 1). 64 Their crystal structures revealed that water decorates the triple helix in a highly ordered manner, forming well-defined first and second hydration shells. 40,43,44,59 A large number of water molecules interact directly with the main chain carbonyl (C=O) groups of Gly and Yaa and occupy identical positions in (PPG) 10 , (POG) 10 and (OOG) 10 ; however, clear variations in the hydration patterns have also been established for different CMPs. Specifically, hydroxyl groups of Hyp serve as anchoring points for the water network surrounding the triple helix while clathrate-like water structures assemble around hydrophobic Pro residues. 
14,62 Several crystal structures have further revealed direct hydrogen bonds between neighboring triple helices, with their number and location being dependent on the specific structure of the CMP and the surrounding water network. 43,44,59 Interestingly, localized changes in hydration were found for host-guest CMPs, where specific Pro and Hyp residues were substituted with other amino acids. 17,19 The fourth CMP studied here contains the guest sequence Ala-Arg-Gly-Ser-Asp-Gly inserted between two (Pro-Pro-Gly) 4 host sequences, so that the full sequence is (Pro-Pro-Gly) 4 -Ala-Arg-Gly-Ser-Asp-Gly-(Pro-Pro-Gly) 4 . This specific guest sequence is found in the α2 chain of collagen I (Figure S1) and is highly conserved among mammalian species. Using MD simulations, this sequence was predicted to be highly sensitive to osmotic pressure, displaying significant contraction upon dehydration. 27
Figure 1. Scattering profile S(q) of (PPG) 10 in dry conditions. Shown is an experimentally obtained S(q) profile with the assignment of peaks 1-7 (RH ≈ 10%; Π ≈ 320 MPa). Peak 1 with a Bragg distance of d 1 = 1.0 nm is related to the lateral center-to-center distance a (top left inset). Peak 7 represents the helical rise per residue z p (molecular axial model on the right). Several of the peaks 2-6 originate from scattering units located in lattice planes that combine lateral and axial components. For the molecular dynamics (MD) simulation, an array of 8 × 8 (PPG) 7 triple helices was used. Considering the 7/2 model for CMPs, (PPG) 7 was repeated periodically in the axial direction to obtain infinitely long triple helices. From the simulation containing 0.4 water molecules per amino acid, discrete scattering line intensities were computed for each accessible set of Miller indices (h, k, l), as described in the section on "Materials and Methods." To account for differences in the hydration level, the q-axis associated with the simulated data was rescaled by a factor 1.041 such that the position of peak 1 matches the experimental value. For better comparison, the intensities were normalized with respect to peak 1. Calculated lines with the highest relative intensities can be assigned to the experimentally detected peaks (Table S1).

Peak assignment of XRD scattering patterns
To quantify osmotic pressure-induced structural changes, we performed in situ XRD experiments. We typically started with a powder of microcrystalline CMPs, obtained by freeze drying, that was equilibrated at low relative humidities (RH < 5%). Subsequently, RH was slowly increased in a controlled fashion and the scattered x-ray intensity was collected on a 2D detector ( Figure S2A).
Typically, the scattering profiles consisted of concentric rings of azimuthally isotropic intensities. This confirms that the CMP powder samples consist of small, randomly oriented domains of coherently aligned triple helical structures. For (PPG) 10 measured in dry conditions (RH ≈ 10%; Π ≈ 320 MPa), the azimuthally integrated 2D scattering profile (I versus q plot; S(q)) exhibits several peaks (named peak 1 to 7, located at q 1 -q 7 ) ( Figure 1). Each peak contains information about the characteristic real-space distance correlations within the system that can be calculated as d n = 2π/q n . The measured distances indicate correlations over a larger range than only the very first neighbors. The integrated profile is not straightforward to interpret as the measured S(q) profile is not compatible with the diffraction pattern of highly hydrated crystal structures and no reference crystal structure is available for high osmotic pressures.
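The pairing of relative humidity and osmotic pressure quoted for the dry state (RH ≈ 10%; Π ≈ 320 MPa) is consistent with the standard thermodynamic relation Π = −(RT/V w) ln(RH). The text does not state which conversion was used, so the following is only a sketch under assumptions: a temperature of 298.15 K, a molar volume of liquid water of 18 cm³/mol, and an illustrative function name.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
V_W = 1.8e-5      # molar volume of liquid water, m^3 mol^-1 (assumed value)


def osmotic_pressure_mpa(rh, temperature_k=298.15):
    """Osmotic pressure in MPa for a relative humidity given as a fraction.

    Uses Pi = -(R T / V_w) ln(RH); temperature and V_w are assumptions,
    not values taken from the article.
    """
    if not 0.0 < rh <= 1.0:
        raise ValueError("rh must be a fraction in (0, 1]")
    return -(R * temperature_k / V_W) * math.log(rh) / 1e6


if __name__ == "__main__":
    for rh in (0.10, 0.50, 0.90):
        print(f"RH = {rh:.0%}: Pi = {osmotic_pressure_mpa(rh):.0f} MPa")
```

Under these assumptions, RH = 10% corresponds to roughly 317 MPa, close to the value of about 320 MPa given for the dry sample.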
Based on earlier experimental work on dehydrated collagen fibers, 21,65 we initially consider a hexagonal arrangement of triple helices (Figure 1). In this case, the S(q) profile carries information about the Bragg distance between triple helices d 1 (peak located at q 1 ≈ 6 nm −1 , corresponding to a distance d 1 ≈ 1.0 nm), which in turn yields the center-to-center distance a = d 1 /(√3/2). The S(q) profile further shows the helical rise per residue z p that describes the spacing between amino acids in the axial direction (peak located at q 7 ≈ 22 nm −1 , corresponding to a distance d 7 ≈ 0.285 nm; Figure 1). The positions of peaks 1 and 7 closely resemble those of natural collagen. 25,66 Most of the other characteristic peaks, located between q = 7 nm −1 and q = 15 nm −1 , cannot be directly related to a 2D hexagonal arrangement of objects. This becomes evident when following the relative peak shifts as a function of osmotic pressure (Figure 2). Upon hydration, all peaks shift to smaller values of q (except peak 7, which is related to z p ); however, the extent of this change in the relative Bragg distance d rel n differs between peaks (Figure 2b). Peaks related to the in-plane arrangement of the triple helices are expected to fall on the same normalized curve (i.e., the curve of d rel 1 ). As this is only the case for some peaks, we propose that the other diffraction peaks originate from lattice planes having components in both axial and lateral directions.
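The two conversions used here, d n = 2π/q n and a = d 1 /(√3/2), can be sketched in a few lines. The function names are illustrative and the inputs are the rounded peak positions quoted in the text, so the results are approximate.

```python
import math


def bragg_distance(q):
    """Real-space distance d = 2*pi/q (q in nm^-1, d in nm)."""
    return 2.0 * math.pi / q


def center_to_center(d1):
    """Center-to-center distance a of a hexagonal lattice from the
    first-order Bragg distance d1, using a = d1 / (sqrt(3)/2)."""
    return d1 / (math.sqrt(3.0) / 2.0)


if __name__ == "__main__":
    d7 = bragg_distance(22.0)    # peak 7, helical rise per residue
    a = center_to_center(1.0)    # peak 1 with d1 = 1.0 nm
    print(f"d7 = {d7:.3f} nm, a = {a:.2f} nm")
```

With d 1 = 1.0 nm this gives a ≈ 1.15 nm, in line with the dry-state value of about 1.16 nm reported for (PPG) 10 , and q 7 ≈ 22 nm −1 gives a rise of about 0.286 nm.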
To validate this interpretation, complementary MD simulations were performed. The initial unit cell of the simulation contained 8 × 8 CMPs, each consisting of triple helix chains with 7 PPG tripeptide units (Figure S3). The 64 triple helices were then placed into a triclinic simulation box with equal spacing in the x, y directions. The triple helices were periodically repeated in the z direction and the system was hydrated using 0.4 water molecules per amino acid (N W /N AA ), corresponding to a low RH in the experiment (see also supporting information for an estimate of N W /N AA in the XRD experiment). In the experiment, the coherent scattering domains are randomly oriented (powder averaging). In contrast, all triple helices are aligned in the simulations so that all three spatial directions can be distinguished. When comparing the experimental S(q) profiles with those obtained numerically from the MD simulations, it is thus possible to identify the relevant crystallographic planes associated with each experimental peak in terms of the three Miller indices (Figure 1). For this assignment, the magnitude q of the scattering vector q associated with the strongest lines in the simulation-based structure factor was compared to the experimental peak positions (Table S1). The result clearly confirms that the S(q) profile of (PPG) 10 indeed corresponds to a distorted hexagonal packing of triple helices (Figures S3-S4).
Figure 2. Structural parameters of (PPG) 10 as a function of osmotic pressure. (a) Evolution of the S(q) profile of (PPG) 10 upon increasing RH from 10 to 90% (shown are steps of ~ 5% RH). With decreasing osmotic pressure, peak 1 shifts toward smaller values of q and peak 7 to larger q values, respectively. Peaks 2-6 also shift toward lower values of q while at the same time their intensity decreases. (b) Peak shifts as a function of RH. Shown is the relative change in Bragg distance. For peaks 3-6, the intensity loss does not permit the detection of all peak positions over the entire RH range.
The comparison between XRD and MD results further shows that the distances d 1 , d 3 and d 4 are associated with planes parallel to the triple helix axis (i.e., l = 0). For these peaks, the increase of d rel n with hydration is thus expected to be identical, as is clearly observed for peaks 1 and 3 (Figure 2b). Peak 4 does not meet this expectation. This may be explained by the overlap of peaks 4 and 5, which complicates the fitting of both peaks with increasing hydration. Distances d 2 , d 5 , and d 6 correspond to planes with Miller indices (0,8,6), (8,12,6) and (16,8,6). These planes are neither parallel nor perpendicular to the axial direction. As a result, the observed increase in d rel n is smaller than for the purely lateral peaks (Figure 2b). The experimental S(q) profiles of (POG) 10 and (OOG) 10 (Figure S5) also exhibit peaks 2-6 although their intensities are lower. This strongly indicates that these CMPs also assume a similar distorted hexagonal arrangement at high osmotic pressures. The lower intensities of the peaks with high Miller indices indicate that (POG) 10 and (OOG) 10 are less ordered than (PPG) 10 .
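Why mixed planes swell less than purely lateral ones can be illustrated with the textbook d-spacing formula of an ideal hexagonal lattice, 1/d² = (4/3)(h² + hk + k²)/a² + l²/c². This is a simplified sketch, not the assignment procedure of the study: it ignores the distortion of the packing, uses a primitive cell rather than the 8 × 8 simulation supercell (so the indices below are not those of Table S1), and the axial repeat c and the hydrated value of a are hypothetical.

```python
import math


def d_hex(h, k, l, a, c):
    """d-spacing of plane (h, k, l) in an ideal hexagonal lattice with
    lateral parameter a and axial parameter c (same length unit)."""
    inv_d2 = (4.0 / 3.0) * (h * h + h * k + k * k) / a**2 + (l / c) ** 2
    if inv_d2 == 0.0:
        raise ValueError("(0, 0, 0) is not a lattice plane")
    return 1.0 / math.sqrt(inv_d2)


def relative_change(h, k, l, a_dry, a_wet, c):
    """Ratio d(wet)/d(dry) when only the lateral parameter a swells."""
    return d_hex(h, k, l, a_wet, c) / d_hex(h, k, l, a_dry, c)


if __name__ == "__main__":
    a_dry, a_wet, c = 1.16, 1.26, 2.0   # nm; a_wet and c are hypothetical
    for hkl in [(1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 0, 2)]:
        ratio = relative_change(*hkl, a_dry, a_wet, c)
        print(hkl, f"d_rel = {ratio:.3f}")
```

Planes with l = 0 all scale by the same factor a(wet)/a(dry), whereas planes with l ≠ 0 contain an axial term that does not swell and therefore show a smaller relative increase.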

Crystalline and partially crystalline molecular packing as a function of hydration
The osmotic pressure-dependent evolution of the lateral center-to-center distance was subsequently compared for the three different CMPs (PPG) 10 , (POG) 10 and (OOG) 10 (Figure 3a). The peptides exhibit different center-to-center distances in dry conditions. The values of a are 1.16 nm for (PPG) 10 , 1.21 nm for (POG) 10 and 1.26 nm for (OOG) 10 , respectively. As hydroxylation increases the side chain volume, it directly affects the effective triple helix diameter and, as a result, the center-to-center distance between triple helices. The effective packing depends on the optimal hydrogen bond arrangements between the helices, in which the side chains may play an important role. Interestingly, (PPG) 10 exhibits sharper Bragg peaks than (POG) 10 and (OOG) 10 (Figure S5). This indicates higher structural order in (PPG) 10 crystals as compared to the other two CMPs.
When increasing RH, the center-to-center distance increases almost monotonically for all CMPs (Figure 3a). This is consistent with the incorporation of water between the triple helices; however, the increment Δa (Δa = a >95% − a 10% ) merely reaches ~ 0.1-0.2 nm at the highest value of RH that could be reached experimentally. This is less than the diameter of a water molecule (0.3 nm). This behavior clearly illustrates that the triple helices cannot be simplified as hard cylinders. If the CMPs were hard cylinders, discrete swelling steps of the size of water molecules would be expected. Instead, a smooth and gradual increase in the center-to-center distance is observed upon water uptake. This suggests that the triple helices are soft, deformable and chemically heterogeneous structures that feature niches where water molecules can insert locally. The localized binding of water causes only small changes in the molecular arrangement and gradually expands the average lateral distance between CMPs. This interpretation is clearly corroborated when comparing the experimental data to MD simulations performed at two different hydration levels (0.4 and 0.8 N W /N AA ). Figure 4a shows the radial distribution function of water molecules around (PPG) n in cylindrical coordinates around the central axis of the triple helix (RDF axis ). In this representation, where the water distribution is averaged along the axial direction, it is seen that water molecules assume a rather continuous distribution between each triple helix and its direct neighbors, in line with the deformability and softness emphasized before. Upon increasing the hydration level, the positions of the maxima of the water distributions change only slightly, while the absolute RDF axis values increase proportionally with the amount of water. This reflects that the water layer expands only slightly in the equatorial direction. 
In fact, when increasing the hydration level from 0.4 to 0.8 N W /N AA , the lateral center-to-center distance increases only by δa = 0.09 nm.
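A rough volume balance makes the small lateral expansion plausible. The sketch below is a back-of-the-envelope estimate under several assumptions that are not taken from the article: ideal hexagonal packing with area (√3/2)a² per helix, a constant helical rise of 0.285 nm, three residues (one per chain) per axial rise, and the bulk molecular volume of water (about 0.030 nm³). It also combines the experimental dry-state a of (PPG) 10 with the simulated δa, purely for illustration.

```python
import math

WATER_VOLUME_NM3 = 0.030  # approx. volume per molecule in bulk water (assumed)


def area_per_helix(a):
    """Cross-sectional area per triple helix in ideal hexagonal packing."""
    return math.sqrt(3.0) / 2.0 * a**2


def added_volume_per_residue(a_low, a_high, rise_nm, residues_per_rise=3):
    """Volume gained per amino acid when a grows from a_low to a_high,
    assuming the axial rise stays constant."""
    delta_area = area_per_helix(a_high) - area_per_helix(a_low)
    return delta_area * rise_nm / residues_per_rise


if __name__ == "__main__":
    gained = added_volume_per_residue(1.16, 1.16 + 0.09, 0.285)
    water = (0.8 - 0.4) * WATER_VOLUME_NM3   # extra water per amino acid
    print(f"volume gained per residue: {gained:.4f} nm^3")
    print(f"bulk volume of added water: {water:.4f} nm^3")
```

Both numbers come out at the order of 0.01 nm³ per amino acid, so an increase of only 0.09 nm in a is geometrically sufficient to accommodate the additional water without invoking a full water layer between helices.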
On the other hand, the spherical radial distribution function of water around the oxygen atom of the carbonyl groups (RDF C=O ) shows sharp, distinct maxima (Figure 4b). This demonstrates that water interacts preferentially with specific collagen sites (i.e., the exposed C=O groups), which are the only polar groups on the outer surface of (PPG) n . In fact, water forms a specific structure characterized by long-range order up to ~ 1 nm. When comparing RDF C=O at the two different hydration levels, an increase of the first peak at around 0.28 nm is observed. The increase of this peak, which corresponds to water molecules that are directly bound to the C=O groups, is smaller than the average increase seen at larger distances. This indicates that water preferentially binds within collagen niches at low hydration. With increasing hydration, more and more water molecules insert between triple helices. This localized binding can also be observed in the simulation snapshot in Figure 4b.
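For readers unfamiliar with axially averaged distributions such as RDF axis , the following sketch shows one simple way to histogram water positions in cylindrical shells around a helix axis. It is not the analysis code of the study: the function name, the input format, and the choice to return an unnormalized number density (molecules per nm³ in each shell, without division by a bulk reference density) are all assumptions made for illustration.

```python
import math


def cylindrical_density(points, axis_xy, length, r_max, n_bins):
    """Number density of points in cylindrical shells around an axis
    parallel to z.

    points  : iterable of (x, y, z) positions in nm
    axis_xy : (x, y) position of the helix axis
    length  : axial length over which the points were collected, in nm
    Returns a list with one density value per radial bin.
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for x, y, _z in points:
        r = math.hypot(x - axis_xy[0], y - axis_xy[1])
        if r < r_max:
            # min() guards against floating-point rounding at the outer edge
            counts[min(int(r / dr), n_bins - 1)] += 1
    densities = []
    for i, count in enumerate(counts):
        # volume of the shell between radii i*dr and (i+1)*dr
        shell_volume = math.pi * ((i + 1) ** 2 - i**2) * dr**2 * length
        densities.append(count / shell_volume)
    return densities


if __name__ == "__main__":
    waters = [(0.55, 0.0, 0.0), (0.0, 0.55, 1.0), (-0.55, 0.0, 2.0)]
    profile = cylindrical_density(waters, (0.0, 0.0), 3.0, 1.0, 10)
    print([round(v, 2) for v in profile])
```

Dividing by the shell volume rather than plotting raw counts is what removes the trivial growth of the count with radius, so that maxima in the profile reflect preferred water positions.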
XRD experiments were also performed using CMP samples fully immersed in water, that is, at full hydration ( Figure 5; Figure S6). When comparing the S(q) profiles of (PPG) 10 obtained at full hydration with the S(q) profile of the sample with controlled RH, it is observed that the inner peak (peak 1, related to the center-to-center distance a) splits into four peaks when the CMPs have reached their highest possible level of hydration. This peak splitting originates from the rearrangement of the hexagonal packing into a new configuration with its own characteristic lateral distances. The new, hydrated configuration is nearly identical to previously reported crystal structures for (PPG) 10 . 40,42 Excellent overlap of scattering peaks is obtained when comparing the S(q) profile of our XRD results with the S(q) profile obtained when rotationally averaging single crystal data of the PDB entry 1a3j 40 ( Figure 5).
The configuration found for the crystal structure corresponds to a Penrose tiling (i.e., a combination of squares and triangles). For this configuration, the lateral center-to-center distance is ~ 1.4 nm between first neighbors and ~ 1.95 nm between second neighbors (Figure 5). 40 For our experiments, this suggests that the triple helices are arranged in a distorted hexagonal configuration at low to intermediate hydration levels (at least up to N W /N AA = 1), while a new configuration with fivefold in-plane coordination is obtained at higher hydration levels. The triple helices thus do not reach infinite dilution when immersed in bulk water. Instead, they organize into a crystalline assembly with localized interactions between neighboring triple helices and a specific water network with N W /N AA ≈ 2 (Figure 5). 40 The CMP (POG) 10 shows a similar behavior as (PPG) 10 . When immersed in bulk water, the scattering profile shows a reorganization into a configuration with a fourfold in-plane coordination of triple helices, as observed in the published crystal structure of (POG) 11 (PDB 1v6q; 43 Figure S6). This again confirms that the triple helices are not infinitely diluted in bulk water but reach a maximum hydration of N W /N AA ≈ 2. While the coordination changed from six to five nearest neighbors for (PPG) 10 , only four nearest neighbors are observed for (POG) 10 . For (OOG) 10 , the intensity profile in bulk water was not experimentally accessible, as the CMPs completely dissolved under these conditions. For the closely related CMPs (OOG) 10 and (GOO) 9 , also no consistent crystal structures have been obtained, suggesting that CMPs containing OOG tripeptide units interact only weakly. 59,61 The reduced overall strength of interhelix interactions is most likely caused by the higher number of hydrogen bond donors and acceptors even though we cannot fully exclude that it may be caused by the presence of 4R-hydroxy-2S-proline in the Xaa position of (OOG) 10 (instead of the naturally occurring 3R-hydroxy-2S-proline).
[Figure caption, beginning missing; legend labels include (PPG) 10 , (POG) 10 and (PPG) 4 -ARGSDG-(PPG) 4 ] Upon increasing RH from 10 to 90%, each of the three CMPs shows its own distinct response of the axial length to hydration. The data points in the blue shaded area correspond to the CMPs measured in bulk water (5 mg ml −1 ), that is, conditions where the peptides are fully hydrated. (c) Evolution of the lateral center-to-center distance a when exposing the host-guest CMP to decreasing osmotic pressure, that is, increasing RH from 5 to 64%. The data for (PPG) 10 are shown for comparison. When decreasing the osmotic pressure, both peptides show an increase in the lateral center-to-center distance. (d) Evolution of the helical rise per residue z p when exposing the host-guest CMP to decreasing osmotic pressure. In contrast to (PPG) 10 , the helical rise per residue increases for the host-guest CMP.
Upon increasing the hydration level to 2 N W /N AA , the CMPs (PPG) 10 and (POG) 10 appear to display common features as well as clear sequence-specific differences. In both cases, a transition from a hexagonal packing to more complex structures is observed; however, these structures are characterized by varying numbers of nearest neighbors. This indicates that the presence and number of hydroxyl groups in the side chains dictates the nature of interhelical interactions or allows the CMP to dissolve, as we observed for (OOG) 10 . For (PPG) 10 and (POG) 10 , it appears more energetically favorable to break only a few interhelix bonds so that large volumes open up for the local accommodation of water molecules. This is particularly relevant for understanding previously observed differences between the gap and overlap regions and within parts of the gap regions in natural collagen I. 9 Specifically, this observation may help to explain why the triple helices seem to show a hexagonal arrangement in the overlap regions while they lose this arrangement in some parts of the gap regions and assume a highly distorted configuration, where specific interhelical interactions are possibly maintained.

Effect of hydration on intramolecular hydrogen bonding
In addition to alterations in the lateral packing of triple helices, osmotic pressure-dependent changes in molecular bonding and structure are also expected. Vibrational spectroscopies allow for observing such changes in molecular conformation and interactions. In the case of CMPs, FTIR spectroscopy has been shown to provide information about the triple helix-stabilizing interchain hydrogen bond between the amine group (N-H) of Gly and the carbonyl (C=O) in the Xaa position (Figure 6) as well as about the interaction of the C=O groups of Gly and Yaa with solvent. 67,68 Here, FTIR spectroscopy was used to follow the molecular properties of the respective C=O and N-H bonds for (PPG) 10 , (POG) 10 and (OOG) 10 while gradually increasing RH in situ (Figure S2B). The FTIR spectra of these CMPs exhibit a characteristic band located between 1700 and 1600 cm −1 (Figure 6), which corresponds to the stretching vibration of C=O (ν C=O ; amide I). As reported earlier, 67,68 the components at 1670 cm −1 , 1645 cm −1 and 1629 cm −1 can be assigned to the different amino acids Xaa, Gly and Yaa, respectively. The band located between 1500 and 1580 cm −1 corresponds to the bending of the Gly N-H bond (δ N-H_Gly ; amide II) (Figure 6). When increasing RH, all peaks related to ν C=O gradually shift to lower wavenumbers. At the same time, δ N-H_Gly shifts in the opposite direction.
Figure 6. Osmotic pressure effect on the backbone structure of the collagen-mimetic peptide (CMP) (PPG) 10 . A thin film of the CMP was measured with Fourier transform infrared spectroscopy while RH was increased in situ from 5 to 80%. Spectra with steps of ~ 5% RH are shown, zooming into amide I (ν C=O of Xaa, Gly and Yaa) and amide II (δ N-H of Gly) regions. A larger range of wavenumbers is shown in Figure S7. With decreasing osmotic pressure, the spectra show a shift of the amide I and amide II bands to lower and higher wavenumbers, respectively. For comparison, a spectrum of (PPG) 10 in ultrapure water (5 mg ml −1 ) is also shown. 67,68 For better visualization, the intensity values of this spectrum were multiplied by 250 and an offset of 1.5 was added.
[Figure caption, beginning missing] The 2D scattering pattern and the S(q) profile highlight peak 1 that is associated with the lateral center-to-center distance a between triple helices. This peak is consistent with a distorted hexagonal configuration where each triple helix has six nearest neighbors. (b) Lateral packing arrangement of triple helices immersed in bulk water at a concentration of 5 mg ml −1 . The 2D scattering pattern and the S(q) profile show the appearance of four new peaks. These peaks are consistent with the diffraction pattern calculated for the rotationally averaged crystal structure PDB 1a3j, 40 which displays a Penrose packing configuration. In this configuration, the number of nearest neighbors is five. The distances to the first (nearest) neighbors are shown as black lines, while the second neighbors are highlighted in red.
The shifts of ν C=O_Gly and ν C=O_Yaa probably arise from progressively stronger hydrogen bonding of the C=O groups with the surrounding solvent. The C=O groups of Gly and Yaa are located on the triple helix surface and are thus exposed to an increasing number of water molecules as the hydration level rises (as also indicated by the MD results). Interestingly, the engagement of these C=O groups in hydrogen bonding also affects the C=O group of Xaa. Being involved in the interchain hydrogen bond, the C=O group of Xaa does not directly interact with solvent. The observed shift of ν C=O_Xaa to lower wavenumbers thus indicates that the change in the chemical environment around the Gly and Yaa carbonyls affects the entire backbone of the molecule, causing the interchain hydrogen bond to become stronger with increasing hydration. This interpretation is supported by the shift of δ N-H_Gly toward higher wavenumbers. As the hydrogen bond becomes stronger, the proton of the N-H bond is more strongly attracted to the hydrogen bond acceptor. This makes the N-H bond harder to bend, and δ N-H shifts to higher wavenumbers. The FTIR spectra for the CMPs (POG) 10 and (OOG) 10 show similar trends even though the ν C=O peaks are less well resolved (Figure S7). Altogether, these results indicate that the interaction between the triple helix and water increases the strength of the interchain hydrogen bond between the C=O group of Xaa and the N-H bond of Gly for all three CMPs. As this hydrogen bond is oriented perpendicular to the triple helix axis, we conclude that the helical structure becomes more stable at higher hydration. The degree of this stabilization appears to differ between the three CMPs (PPG) 10 , (POG) 10 and (OOG) 10 (Figure S7). In particular, a qualitative comparison of the extent of these spectral changes suggests that the stabilizing effect is larger for (OOG) 10 . 
This may be related to a possible hydrogen bonding interaction between the Hyp residues in the Xaa and Yaa positions. 58 These observations, again, confirm a sequence-specific effect of the imposed osmotic pressure changes.

Effect of hydration on the axial length
The sequence-specific molecular response to osmotic pressure is not restricted to the change in the interchain hydrogen bond strength. In the following, we show that the characteristic triple helical structure also undergoes significant changes, as evidenced by changes in the helical rise per residue z p upon hydration. For (PPG) 10 , a small, gradual decrease in z p is observed (Figure 3b). In contrast, for (POG) 10 , z p first increases slightly until ~ 50% RH (Π ≈ 96 MPa) and then decreases strongly at higher humidities. Finally, (OOG) 10 exhibits a small, gradual increase in z p until ~ 50% RH, after which z p remains constant up to the highest RH tested. When measured at full hydration (i.e., when immersed in bulk aqueous medium), the CMPs (PPG) 10 and (POG) 10 exhibit a further decrease in z p relative to the 90% RH value (Figure 3b). This indicates a considerable contraction of the triple helix (up to 2.5%) in the low osmotic pressure range and results in an overall contraction of (PPG) 10 and (POG) 10 when moving from the dry to the fully hydrated state. For (OOG) 10 , the helical rise per residue could not be determined in bulk water, as this CMP dissolved under these conditions. Overall, the helical rise per residue also shows that each sequence exhibits its own characteristic response to hydration.
When extrapolating to longer CMPs and natural collagen, a similar overall axial response is expected, although the contribution of the peptide termini needs to be considered. On the one hand, the chains are axially staggered by one residue. As a result, at most 9 out of 10 tripeptide units can fold into a triple helical structure while the peptide termini are disordered. On the other hand, close packing may partially stabilize and align the termini. Based on the available data, no clear conclusion can be drawn about the relative contribution of disordered termini to the obtained XRD pattern. Comparing our results to natural collagen I, the observation that the canonical sequences (PPG) 10 and (POG) 10 undergo an overall contraction upon hydration (Figure 3b) is unexpected, considering that natural collagen I displays the opposite response and elongates upon hydration. As the canonical collagen tripeptide units PPG and POG are abundant in natural collagen I, this result strongly suggests that other sequences need to elongate in response to hydration to produce the net elongation of full-length collagen I that has been observed in earlier studies of rat tail collagen and turkey leg tendon. 25,27

Effect of hydration on the properties of a host-guest peptide
To address this discrepancy, we consider one example of a host-guest peptide with the guest sequence Ala-Arg-Gly-Ser-Asp-Gly (ARGSDG). This motif, located in the gap region of collagen I (Figure S1), has been predicted to be highly responsive to osmotic pressure. 27 To guarantee a stable triple helix configuration, the guest motif was inserted into a PPG host sequence to yield the host-guest CMP (PPG) 4 -ARGSDG-(PPG) 4 . For this host-guest CMP, the center-to-center distance a is larger than for (PPG) 10 (Figure 3c; Figure S8). This larger spacing between triple helices most likely originates from the large size of the Arg side chain, which prevents closer packing of the triple helices. Upon hydration, the spacing between triple helices increases in a continuous fashion and with a larger slope than for the PPG host motif (Figure 3c). Importantly, the host-guest CMP elongates with increasing hydration (Figure 3d) and, therefore, shows the same osmotic pressure response as natural collagen I and as predicted in MD simulations. 27 Hence, the guest sequence seems to fundamentally alter the axial behavior of the entire peptide. This is remarkable, as (PPG) 10 and the host-guest CMP differ in only four out of thirty amino acids. In fact, considering the one-residue stagger of the individual chains in the triple helix, we assume that the guest sequence affects 3 out of 9 tripeptide units (i.e., ~ 30% of the overall sequence). Under this assumption, and considering that the host sequence behaves similarly to (PPG) 10 , the net elongation of the host-guest CMP is the result of the superposition of the previously established small contraction of the PPG host sequence and a much larger elongation of the guest sequence. In a crude approximation, we estimate that, in the range from 5 to 64% RH, the guest sequence contributes an elongation of 0.14 Å to the overall length change of the host-guest CMP. This means that the elongation of the guest sequence is 10 times larger than the contraction found for the host sequence.
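The superposition argument can be written as a short calculation. The sketch below assumes that the net length change is a unit-weighted average of a host and a guest contribution (6 and 3 of the 9 folded tripeptide units, respectively). The function name and the input values are hypothetical: the inputs are chosen only so that the guest term comes out about ten times larger in magnitude than the host term, and they are not measured values.

```python
def guest_length_change(net_change, host_change, guest_units=3, total_units=9):
    """Solve net = f_host * host + f_guest * guest for the guest term.

    All length changes are in angstrom. The linear, unit-weighted
    superposition is an assumption of this sketch.
    """
    if not 0 < guest_units < total_units:
        raise ValueError("guest_units must lie between 0 and total_units")
    f_guest = guest_units / total_units   # fraction of units affected by the guest
    f_host = 1.0 - f_guest                # remaining host fraction
    return (net_change - f_host * host_change) / f_guest


# Illustrative inputs only (hypothetical, not taken from the measurements).
host_change = -0.014   # small contraction of the PPG host units
net_change = 0.037     # net elongation of the host-guest peptide
guest_change = guest_length_change(net_change, host_change)
print(f"guest contribution: {guest_change:.3f} A")
```

With these inputs the guest term is close to 0.14 Å, which shows how a small net elongation can hide a much larger local elongation when only one third of the units respond.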
Within the tested RH range, the computed elongation of 0.14 Å corresponds to a relative axial length change of ~ 5%, a value that roughly compares with the average elongation predicted for the gap region of collagen I using MD simulations (i.e., 10.7%). 27 Even though the trend points in the same direction, it needs to be noted that the absolute values of the length change should not be compared, because the local structural environment of the guest sequence is different in a PPG host and in natural collagen I. Nevertheless, this result seems to corroborate the hypothesis that specific sequences respond differently to osmotic pressure. It may further indicate that the structural and molecular properties of sequences located in the gap and overlap regions respond differently and are tailored to control the local response of full-length collagen I to hydration. To test the specific role of sequences in the gap and overlap regions, experiments with a more diverse range of sequences will be needed, possibly considering the fact that collagen I is a heterotrimer.

Discussion and conclusion
This study reveals that osmotic pressure affects CMPs in a sequence-specific manner and that hydration-induced structural changes occur both at the level of individual triple helices and within higher-order assemblies. At the molecular level, increasing hydration causes a strengthening of interchain hydrogen bonding. Individual triple helices further show an osmotic pressure-dependent axial response. CMPs consisting of the abundant tripeptide units PPG and POG contract at high levels of hydration, while elongation was observed for the host-guest peptide. Our results thus further support earlier findings that water is an essential part of the collagen molecule. 13,14,16–19,27,43,62,69 The amino acid sequence determines the local interaction of the triple helix with water and, as a result, the axial response to hydration. Most interestingly, significant axial changes are observed in the near-physiological osmotic pressure range (< 1 MPa 70–72 ). Extrapolating to collagen fibers and collagen-based tissues, we conclude that the previously observed hydration-induced response 25,27 is indeed the sum of local, sequence-specific contraction and elongation.
The sequence-specific osmotic pressure response extends to higher-order assemblies. At low hydration levels, the CMPs (PPG) 10 , (POG) 10 and (OOG) 10 all show a distorted hexagonal configuration with sequence-specific center-to-center distances between triple helices. Triple helix packing changes into a new, CMP-characteristic configuration when the triple helical CMPs are immersed in bulk water. This indicates that the amino acid sequence also determines direct and water-mediated interactions between triple helices. The different packing configurations, each with a characteristic number of nearest neighbors, probably reflect a subtle balance between the strength of direct interhelical bonds and the geometry and strength of water interactions. These findings bear interesting implications for natural systems. They suggest that the local structure and composition of collagen side chains determine the local packing within collagen fibrils. As a consequence, the sequence-specific reorganization of nearest neighbors may play an important regulatory role, as the exposure of functional binding sites and channels for ions or mineralization precursors is likely to be altered. 9,23,26 Axial responses and the reconfiguration of triple helix packing also affect the mechanical properties of individual triple helices and higher-order assemblies. Hydration-induced contraction or elongation of molecules, fibrils and fibers primarily aids the maintenance of tissue pretension. At the same time, force propagation pathways through collagen assemblies may be altered, possibly even affecting localized bond rupture processes at high strains. 73 From a mechanical point of view, collagen has evolved to utilize different structural levels to gather energy from water chemical potential gradients and to efficiently convert it into mechanical energy. Unveiling the molecular basis of this behavior is of utmost importance for understanding the mechanical structure-function relationships of collagen. Molecular-level insights will ultimately aid in establishing the principles of tissue formation and homeostasis and provide new concepts for the design of bioinspired actuation systems.
Toward such applications, it will be highly interesting to investigate CMPs with a mixed content of PPG, POG and OOG tripeptide units. In particular, the analysis of CMPs with a controlled 3D arrangement and spacing of hydroxyl groups will shed light on possible synergistic effects between neighboring hydroxyl groups. It appears likely that other amino acids with hydrogen bond donating or accepting side chains can substitute for Hyp as an anchoring point for hydrogen bonds. In combination with amino acid size, these properties may eventually be used to control the water structure and thus the osmotic pressure response of CMPs. Experimental and/or simulation studies where individual or several neighboring amino acids are systematically varied will ultimately allow for establishing the relationship between amino acid sequence and lateral and axial osmotic pressure responses.

X-ray diffraction
XRD experiments were performed using lyophilized peptides as obtained from the suppliers. The lyophilized CMP powder (~ 0.1 mg) was placed into an open borosilicate glass capillary. The capillary was connected to a humidity generator (WET-SYS, Caluire, France) to provide air with controlled water content (Figure S2A). During XRD measurements, a flow rate of 70 ml min −1 was used. The temperature and the relative humidity (RH) were monitored by a sensor (Sensirion, Staefa, Switzerland) that was placed close to the sample. RH was converted into Π according to Π = −(RT/V w ) · ln(RH), where R is the gas constant, T the temperature, V w the molar volume of water (18 cm 3 mol −1 ), and RH is expressed as a fraction between 0 and 1, so that Π is positive below saturation.
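The conversion from RH to osmotic pressure can be sketched in a few lines. The example assumes the relation Π = −(RT/V w ) · ln(RH) with RH given in percent and converted to a fraction; the function name and the default temperature of 298.15 K are assumptions made for illustration, since the measurement temperature is not stated here.

```python
import math

R_GAS = 8.314462618   # gas constant, J mol^-1 K^-1
V_W = 18e-6           # molar volume of water, m^3 mol^-1 (18 cm^3 mol^-1)


def osmotic_pressure_mpa(rh_percent, temperature_k=298.15):
    """Osmotic pressure in MPa for a relative humidity given in percent.

    The default temperature is an assumed value, not one from the text.
    """
    if not 0 < rh_percent <= 100:
        raise ValueError("rh_percent must be in the interval (0, 100]")
    rh = rh_percent / 100.0
    pressure_pa = -(R_GAS * temperature_k / V_W) * math.log(rh)
    return pressure_pa / 1e6


for rh in (5, 50, 90):
    print(f"RH = {rh:3d} %  ->  Pi = {osmotic_pressure_mpa(rh):6.1f} MPa")
```

At 50% RH and the assumed temperature this gives roughly 95 MPa, in line with the value of about 96 MPa quoted for ~ 50% RH.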
Experiments were performed at the BESSY II synchrotron (Helmholtz-Zentrum Berlin, μ-Spot beamline). An x-ray beam with a diameter of 100 μm and an energy of 15 keV (wavelength: 0.826 Å) was used. The capillary, connected to the humidity generator, was mounted horizontally on a motorized x, y, z sample stage. To reduce possible beam damage, the sample was moved 100 μm horizontally between measurements (1 cm overall displacement perpendicular to the primary beam direction). The scattered intensity was recorded on a 2CCD (MarMosaic 225, Rayonix, Evanston, IL, USA) or an EIGERX (Dectris, Baden, Switzerland) x-ray detector (3072 × 3072 pixels and 3110 × 3269 pixels, respectively). The precise beam center position, the sample-to-detector distance (around 310 mm), and the detector tilt were determined using quartz as a standard. Azimuthal integration of the obtained 2D intensity patterns was performed using the DPDAK (Directly Programmable Data Analysis Kit) software. 74 The program provides the 1D intensity profiles, S(q), where q denotes the magnitude of the scattering vector. Peaks 1-7 were identified and their q-positions were determined using a home-written Python script. Peak 1 was fitted with a Lorentzian function, while Gaussian fits were used for peaks 2-7. Baseline correction was applied before fitting the respective peaks.
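Once a peak position q has been fitted, it can be converted into real-space quantities. The sketch below uses the general relations d = 2π/q and q = 4π sin(θ)/λ. The conversion to a center-to-center distance assumes an ideal hexagonal lattice, which is only an approximation for the distorted hexagonal packing described here. All function names are hypothetical and are not taken from the home-written script mentioned above.

```python
import math


def d_spacing(q):
    """Real-space repeat distance (angstrom) for a peak at q (1/angstrom)."""
    if q <= 0:
        raise ValueError("q must be positive")
    return 2.0 * math.pi / q


def hexagonal_center_distance(q10):
    """Center-to-center distance a, assuming an ideal hexagonal lattice.

    For such a lattice the (10) spacing is d10 = a * sqrt(3) / 2.
    """
    return 2.0 * d_spacing(q10) / math.sqrt(3.0)


def scattering_angle_deg(q, wavelength=0.826):
    """Scattering angle 2*theta in degrees from q = 4*pi*sin(theta)/lambda."""
    s = q * wavelength / (4.0 * math.pi)
    if not 0 < s <= 1:
        raise ValueError("q is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


q_example = 0.55  # 1/angstrom, an arbitrary illustrative peak position
print(d_spacing(q_example), hexagonal_center_distance(q_example))
```

The default wavelength of 0.826 Å corresponds to the 15 keV beam energy used for the measurements.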

Attenuated total reflection (ATR)-Fourier transform infrared spectroscopy (FTIR)
To perform ATR-FTIR spectroscopy, a film of each CMP was prepared on the ATR crystal. The CMP powder was dissolved in ultrapure water to a concentration of 5 mg ml −1 . A drop of this solution (20 μl) was then spread on the ATR crystal and the water was slowly evaporated under gentle nitrogen flow. The sample was connected to a humidity generator, using a setup similar to that used for the XRD experiments (Figure S2B). FTIR spectra were collected on a VERTEX 70 FTIR spectrometer (Bruker, Billerica, MA, USA) with 64 scans for each spectrum. The resolution was set to 4 cm −1 , and a Blackman-Harris 3-Term apodization function and a zero-filling factor of 4 were used. After obtaining a stable spectrum for the film equilibrated in dry conditions (< 10% RH), RH was gradually increased up to 80%.

Molecular dynamics simulations
The simulation box ( Figure S3) was built based on the crystal structure of (OOG) 10 (PDB 1wzb). 59 Starting from this structure, three triplets and all OH groups were removed to obtain (PPG) 7 . The CMP triple helices were periodically connected and a total number of 8 × 8 triple helices were assembled in the simulation box in an antiparallel fashion ( Figure S3). An antiparallel triple helix orientation is common in crystal structures of CMPs with free terminal NH 2 and COOH groups. 42 It allows for neutralizing the local charges in assemblies of tightly packed triple helices. During the simulations, the x, y box angle was fixed at 60° so that the triple helices were forced to maintain a hexagonal lattice structure ( Figure S3).
The GROMACS 2016 simulation package 75 was used. The peptide force field was AMBER 99, 76 and the water model was TIP4P/2005. Simulations were run in the NpT ensemble, with a semi-isotropic Parrinello-Rahman barostat at 1 bar and a temperature of 300 K maintained by the velocity rescaling thermostat. 77 Periodic boundary conditions were applied to avoid surface effects of triple helices and water. The Smooth Particle-Mesh Ewald method was used for calculating long-range electrostatics. The total simulation time was 50 ns with a 2 fs time step. The trajectories were analyzed using our own Python library (located at https://gitlab.com/netzlab/maicos) based on MDAnalysis. 78 The intensities S(q hkl ) of the discrete scattering lines for each accessible set of Miller indices (h, k, l) were determined by first "rectangularizing" the triclinic simulation box (Figure S3) and subsequently evaluating the corresponding sum over all N atoms for every frame in the trajectory. Here, N is the number of atoms, r n the position of the n th atom, q =|q hkl | is the magnitude of the scattering vector, and f n (q) the atomic form factor based on numerical Hartree-Fock wave functions. 79,80 The line intensities represent the Fourier coefficients of the electron density distribution.
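To illustrate how line intensities of this kind can be obtained from atomic positions, the sketch below evaluates the squared modulus of a form-factor-weighted sum, |Σ n f n exp(i q hkl · r n )|², for an orthorhombic box and averages it over frames. This is a generic textbook form written under stated assumptions and not the implementation in the library cited above: the normalization, the use of q-independent form factors, and all function names are assumptions.

```python
import cmath
import math


def line_intensity(positions, form_factors, box, hkl):
    """Squared modulus of sum_n f_n * exp(i q_hkl . r_n) for one frame.

    positions: list of (x, y, z); box: orthorhombic edge lengths (Lx, Ly, Lz);
    hkl: integer Miller indices. Form factors are taken as constants here,
    whereas in general f_n depends on |q|.
    """
    q = [2.0 * math.pi * m / length for m, length in zip(hkl, box)]
    amplitude = 0j
    for r, f in zip(positions, form_factors):
        phase = sum(qc * rc for qc, rc in zip(q, r))
        amplitude += f * cmath.exp(1j * phase)
    return abs(amplitude) ** 2


def trajectory_average(frames, form_factors, box, hkl):
    """Average the line intensity over all frames of a trajectory."""
    total = sum(line_intensity(fr, form_factors, box, hkl) for fr in frames)
    return total / len(frames)


# Two identical atoms half a box length apart along x (toy example).
box = (10.0, 10.0, 10.0)
frame = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(line_intensity(frame, [1.0, 1.0], box, (1, 0, 0)))
print(line_intensity(frame, [1.0, 1.0], box, (2, 0, 0)))
```

In the toy example the two atoms scatter out of phase for (1, 0, 0) and in phase for (2, 0, 0), so the first intensity vanishes up to rounding while the second is the square of the summed form factors.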