Choosing the right density for a concentrated protein system like gluten in a coarse-grained model

Large coarse-grained simulations are often conducted with an implicit solvent, which makes it hard to assess the water content of the sample and the effective concentration of the system. Here the number and the size of cavities and entanglements in the system, together with density profiles, are used to assess the homogeneity and interconnectedness of gluten. This is a continuation of an earlier article, "Viscoelastic properties of wheat gluten in a molecular dynamics study" (Mioduszewski and Cieplak 2021b). It turns out there is a wide range of densities (between 1 residue/nm³ and 3 residues/nm³) where the system is interconnected, but not homogeneous: there are still large empty spaces, surrounded by an entangled protein network. These findings should be of importance to any coarse-grained simulation of large protein systems.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00249-023-01667-8.


Polyglutamine simulations
The main manuscript describes simulations of gluten. In the Supplementary Information a different system is presented: polyglutamine chains with chain lengths of 20, 40 and 60 residues were simulated in order to study polyglutamine aggregation and droplet formation [1]. The simulation box had periodic boundary conditions in all three dimensions, and after the initial squeezing the size of the box was constant (no deformation occurred). All the data presented here were gathered in a 500 000 τ period after squeezing and equilibration of the system, which took longer (1 000 000 τ). The number of chains is denoted as N. The total size of the system was 1800 residues (N = 90 chains of Q20, N = 45 chains of Q40 or N = 30 chains of Q60).
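To give a feeling for the length scales involved, the sketch below converts a residue number density into a box edge length for the 1800-residue system. It assumes a cubic box, which the text does not state explicitly; the function name `box_edge_nm` is hypothetical, and the two example densities are the regime thresholds quoted later for Q20.

```python
# Minimal sketch: box edge length for a given residue number density.
# ASSUMPTION: the periodic box is cubic (not stated in the text).

N_RESIDUES = 1800  # total system size given in the text

# (number of chains, chain length) for each system described in the text
SYSTEMS = {"Q20": (90, 20), "Q40": (45, 40), "Q60": (30, 60)}


def box_edge_nm(n_residues: int, density_per_nm3: float) -> float:
    """Edge (in nm) of a cubic box holding n_residues at the given density."""
    if density_per_nm3 <= 0:
        raise ValueError("density must be positive")
    return (n_residues / density_per_nm3) ** (1.0 / 3.0)


# Every system contains the same number of residues.
for name, (n_chains, chain_length) in SYSTEMS.items():
    assert n_chains * chain_length == N_RESIDUES, name

# Example densities in residues/nm^3 (thresholds quoted for Q20 in the text).
for rho in (0.75, 1.4):
    print(f"rho = {rho} nm^-3 -> box edge = {box_edge_nm(N_RESIDUES, rho):.2f} nm")
```

Because all three systems contain the same number of residues, a given density corresponds to the same box volume regardless of the chain length.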

Clusterization
In the DSB model [2] the sidechain-sidechain interaction between glutamine residues is modelled by a Lennard-Jones potential with the minimum at r₀ = 8.63 Å. If two residues are closer than r₀, they are considered connected. A pair of protein chains is connected if at least one pair of their residues is connected. Consider a set of connected proteins: every chain from the set is connected with at least one other chain from the set and with no chains from outside of the set.
If such a set cannot be divided into smaller sets with those properties, it is called a cluster. The number of chains in a cluster is called the cluster size.
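The cluster definition above amounts to finding connected components of a chain-chain contact graph. The following is a minimal sketch of that idea, not the code used in the study. It assumes one bead per residue, a cubic periodic box and the minimum-image convention; the function name `cluster_sizes` and the array layout are hypothetical.

```python
import numpy as np
from collections import Counter

R0 = 8.63  # Angstrom, contact cutoff taken from the text


def cluster_sizes(coords, box_length, r0=R0):
    """Return cluster sizes (numbers of chains), largest first.

    coords: array of shape (n_chains, chain_len, 3), positions in Angstrom.
    ASSUMPTIONS: one bead per residue, cubic periodic box, minimum image.
    """
    coords = np.asarray(coords, dtype=float)
    n_chains, chain_len, _ = coords.shape
    flat = coords.reshape(-1, 3)
    chain_of = np.repeat(np.arange(n_chains), chain_len)

    # All residue-residue displacements under the minimum-image convention.
    # Memory grows as O(n_residues^2); fine for ~1800 residues.
    delta = flat[:, None, :] - flat[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    close = (delta ** 2).sum(axis=-1) < r0 ** 2

    # Keep only contacts between different chains, one entry per chain pair.
    rows, cols = np.nonzero(close)
    a, b = chain_of[rows], chain_of[cols]
    pairs = {(int(i), int(j)) for i, j in zip(a[a < b], b[a < b])}

    # Union-find over chains: connected chains end up with the same root.
    parent = list(range(n_chains))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    sizes = Counter(find(i) for i in range(n_chains))
    return sorted(sizes.values(), reverse=True)
```

In this sketch an isolated chain is returned as a cluster of size 1 (a monomer), and a result containing the single value N means that all chains are connected.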
We can plot probability distributions of cluster sizes. A cluster size of 1 means a monomer; a cluster size equal to N means that all chains are connected. An example of such a distribution is shown in Fig. 1 of the article about polyglutamine simulations [1]. A unimodal distribution with a single maximum around 1 means that most of the chains are monomers and the system is in the dilute regime. A bimodal distribution means that some chains are monomeric and some are in bigger clusters (the intermediate regime). A unimodal distribution with a single maximum around N means that the system is connected and is in the dense regime (but we get no information about its homogeneity). In practice, determining whether the probability distribution is bimodal or unimodal requires making a histogram of cluster sizes (taken from simulation snapshots saved at intervals of 5000 τ). Fig. 1 shows the results of classifying such histograms made for the Q20 system for different values of temperature T and density ρ. For higher temperatures, the dilute-intermediate and intermediate-dense thresholds are 0.75 nm⁻³ and 1.4 nm⁻³, respectively. When the temperature is lowered, the system undergoes a phase transition into an amyloid glass phase [1], and the intermediate regime covers a larger range of densities (which means that proteins form several big amyloid-like clusters). A few red dots for high densities indicate systems that formed two or three big clusters with different sizes.

* E-mail: l.mioduszewski@uksw.edu.pl

Percolation
For polyglutamine simulations percolation can be defined as a state where all protein chains are in one cluster. In one snapshot the system may be in that state, but in another snapshot taken at a different time it may not be. Thus we can calculate the probability P of the system being in the percolation state (P equal to 1 means that it is always the case). Fig. 2 shows plots of P(T, ρ). The threshold between P ≈ 0 and P ≈ 1 defines the border between the dilute and dense regime. For Q20 it occurs for ρ ≈ 0.75 nm⁻³, which corresponds to the dilute-intermediate threshold from the cluster size analysis (see Fig. 1). For Q40 and Q60 systems the border is not well defined: a few large clusters bounce off each other, randomly and temporarily reaching the percolation state.
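As a rough illustration of the two quantities used in this analysis, the sketch below builds a normalized cluster size histogram and estimates the percolation probability P from per-snapshot lists of cluster sizes. The function names and the input layout are hypothetical, and counting each cluster once (rather than weighting by the number of chains) is an assumption, since the text does not specify the normalization. With snapshots every 5000 τ over a 500 000 τ window, roughly 100 snapshots would enter each estimate.

```python
from collections import Counter


def cluster_size_histogram(snapshots):
    """Fraction of clusters of each size, pooled over all snapshots.

    snapshots: iterable of lists of cluster sizes, one list per snapshot.
    ASSUMPTION: each cluster counts once (no weighting by chain number).
    """
    counts = Counter(size for sizes in snapshots for size in sizes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clusters found")
    return {size: n / total for size, n in sorted(counts.items())}


def percolation_probability(snapshots, n_chains):
    """Fraction of snapshots in which all n_chains form a single cluster."""
    hits = [len(sizes) == 1 and sizes[0] == n_chains for sizes in snapshots]
    if not hits:
        raise ValueError("no snapshots given")
    return sum(hits) / len(hits)


# Toy example with N = 90 chains and four snapshots (illustrative values only).
example = [[90], [89, 1], [90], [45, 45]]
print(cluster_size_histogram(example))
print(percolation_probability(example, n_chains=90))
```

Deciding whether a histogram is unimodal or bimodal still requires a criterion that the text does not give, so that step is left out of the sketch.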

Figure 1: Classification of cluster size histograms into bimodal and unimodal for the Q20 system for different values of temperature T and density ρ.

Figure 2: Probability of percolation (whether all chains are connected, see the text) for Q20, Q40 and Q60 systems, as a function of the temperature T and the system density ρ.