## Abstract

Ion mobility mass spectrometry (IM/MS) can provide structural information on intact protein complexes. Such data, including the connectivity and collision cross sections (CCS) of an assembly’s subunits, can in turn be used as a guide to produce representative super coarse-grained models. These models consist of ensembles of overlapping spheres, each representing a protein subunit. A model is considered plausible if the CCS and sphere-overlap levels of its subunits fall within predetermined confidence intervals. While the former is determined by experimental error, the latter is based on a statistical analysis of a range of protein dimers. Here, we first propose a new expression to describe the overlap between two spheres. Then we analyze the effect of specific overlap cutoff choices on the precision and accuracy of super coarse-grained models. Finally, we propose a method to determine overlap cutoff levels on a case-by-case basis, based on collected CCS data, and show that it can be applied to the characterization of the assembly topology of symmetrical homo-multimers.

## Introduction

Most proteins assemble into complexes to achieve a specific biological function [1]. Atomic-level information about these complexes can provide valuable insights into their mode of action. However, obtaining such high-resolution information is often technically challenging. In this context, integrative modeling approaches can be used to combine low-resolution experimental data on the complex with high-resolution structural information on its subunits, to build models that rationalize all observables [2].

Native ion mobility mass spectrometry (IM/MS) reports on the connectivity between protein subunits and allows the collision cross section (CCS) of these subunits, as well as of their sub-complexes, to be derived [3]. In recent years, efforts have been dedicated to exploiting these data within integrative modeling protocols [4, 5]. Unfortunately, atomic models are sometimes not available for all the subunits of a complex. In this case, super coarse-grained models may be adopted, whereby every molecular subunit is represented by one (or a few) large spheres [6, 7].

The orientation-averaged projected area of an object can be taken as an approximation of its CCS [8]. This approximation includes a hard-sphere contribution given by the radius of the buffer gas used as probe, while ignoring long-range interactions and multiple collisions with it. In the case of folded proteins, it has been shown that, upon scaling, this yields values in good agreement (3% error) with experimental CCS data [9, 10]. When the object under study is convex, its average projected area is equal to a quarter of its surface area [11]. As such, the radius *r* of a sphere having a CCS equal to that of the protein it represents, when probed in a drift cell filled with an inert gas having radius *r*_{gas}, can be calculated analytically:

$$r = \sqrt{\frac{CCS}{\pi}} - r_{gas} \qquad (1)$$
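As a minimal sketch of this conversion: a sphere of radius *r* probed by a gas particle of radius *r*_{gas} behaves as a convex body of radius *r* + *r*_{gas}, so a quarter of its surface area is π(*r* + *r*_{gas})², and inverting this gives the sphere radius. The function names and the choice of Å and Å² as units are assumptions made for this illustration.

```python
import math


def sphere_ccs(radius: float, r_gas: float = 1.0) -> float:
    """CCS (A^2) of a sphere of given radius (A) probed by a gas of radius r_gas (A).

    Uses the convex-body result: average projected area = surface area / 4,
    applied to the sphere inflated by the probe radius.
    """
    return math.pi * (radius + r_gas) ** 2


def sphere_radius_from_ccs(ccs: float, r_gas: float = 1.0) -> float:
    """Radius (A) of the sphere whose CCS equals `ccs` (A^2); inverse of sphere_ccs."""
    if ccs <= 0:
        raise ValueError("ccs must be positive")
    radius = math.sqrt(ccs / math.pi) - r_gas
    if radius <= 0:
        raise ValueError("ccs is too small for the chosen probe radius")
    return radius
```

For example, `sphere_radius_from_ccs(sphere_ccs(20.0))` recovers a radius of 20 Å, up to floating-point rounding.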

The simplest scenario is that of modeling a protein dimer as two spheres, using as a guide the CCS of the subunits and that of the resulting complex. Having defined the radii of the two representative spheres as per Eq. 1, the objective is to identify how much they should overlap (or co-penetrate) so that the CCS of the resulting complex has a minimal discrepancy from the experimental value. The overlap has typically been defined in terms of the spheres’ center-to-center distance [6, 7]. However, two spheres are effectively fully overlapping as soon as the smaller is fully embedded in the larger (Fig. 1a). In this extreme case, the CCS of the complex is equal to that of the larger sphere. If the definition of sphere-overlap does not capture this feature, the same complex CCS will be associated with a range of overlap levels, the size of which will be proportional to the difference in radius between the two interacting spheres. This complicates the definition of an overlap cutoff criterion applicable to any pair of interacting spheres. Given a center-to-center distance *d* between two spheres with radii *r*_{1} and *r*_{2}, we suggest the following as a more suitable metric to define their overlap *O*:
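As a hedged reconstruction, one linear form consistent with the two limits described above (no overlap when the spheres are just in contact, full overlap as soon as the smaller sphere is entirely embedded in the larger one) is given below. It is an assumption made here for illustration and should be checked against the published Eq. 2:

```latex
O = \frac{r_1 + r_2 - d}{2\,\min(r_1, r_2)}
% d = r_1 + r_2    gives O = 0  (spheres just in contact)
% d = |r_1 - r_2|  gives O = 1  (smaller sphere fully embedded in the larger)
```

With this form, any further approach of the centers below *d* = |*r*_{1} − *r*_{2}| leaves the complex CCS unchanged, so *O* can be capped at 100% without losing information.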

It should be noted that, in the absence of substantial conformational changes upon binding, it will always be possible to find an overlapping arrangement of two spheres so that their combined CCS matches that of the complex they form.
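The sketch below illustrates how an overlap level and a center-to-center distance can be converted into one another. It assumes a linear overlap definition, *O* = (*r*_{1} + *r*_{2} − *d*) / (2 min(*r*_{1}, *r*_{2})), chosen because it is 0 for spheres in contact and 1 when the smaller sphere is fully embedded. This form and the function names are assumptions for illustration, not a confirmed transcription of Eq. 2.

```python
def overlap(r1: float, r2: float, d: float) -> float:
    """Fractional overlap of two spheres (assumed linear form), clamped to [0, 1]."""
    if min(r1, r2) <= 0:
        raise ValueError("radii must be positive")
    o = (r1 + r2 - d) / (2.0 * min(r1, r2))
    # Below d = |r1 - r2| the smaller sphere is already fully embedded;
    # above d = r1 + r2 the spheres no longer touch.
    return max(0.0, min(1.0, o))


def distance_for_overlap(r1: float, r2: float, o: float) -> float:
    """Center-to-center distance giving fractional overlap `o` (0 <= o <= 1)."""
    if not 0.0 <= o <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    return r1 + r2 - 2.0 * o * min(r1, r2)
```

Because the mapping is linear and monotonic between contact and full embedding, scanning *O* from 0 to 1 visits every distinct complex CCS exactly once, regardless of how different the two radii are.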

To assess the relationship between the overlap of spheres and their associated CCS, we selected an ensemble of 1988 protein pairs from the PiQSi database [12]. Of these, 241 were crystallized as dimers, whereas the rest were pairs of proteins in contact within 526 crystal structures of larger assemblies. Using IMPACT [9], software that numerically estimates the CCS of molecular structures with the projection approximation method, we calculated the CCS of each dimer, as well as that of its constituent subunits. Then, for each pair, we placed a sphere with radius as per Eq. 1 (with *r*_{gas} = 1 Å, representing helium) on the center of mass of each protein subunit and calculated the resulting overlap, hereafter called *O*_{struct}. Such a test has already been performed, on smaller datasets, to identify an overlap interval representative of most protein pairs [6]. This led to the proposal of a confidence interval between 15 and 45% for sphere-overlap, usable to guide super coarse-grained integrative modeling protocols exploiting CCS data. Analyzing the average value of *O*_{struct} may, however, not be perfectly suited to this context. Indeed, integrative modeling protocols typically exploit an optimization engine to find an arrangement of protein subunits minimizing a scoring function, which usually includes terms for the physics of molecular interactions (e.g., van der Waals, electrostatics) and assessments of the models’ match against available experimental data. As such, optimizers will be naturally guided to the overlap level *O*_{best} associated with the arrangement of spheres having the smallest deviation from the target dimer CCS. Therefore, for each protein pair, we also tested a range of overlap levels (from 0 to 100%, in steps of 1%), assessing their error with respect to the known dimer CCS, and identifying the optimal overlap *O*_{best} for each of them. For this test, the CCS of each sphere dimer was calculated with IMPACT.
The collected *O*_{struct} and *O*_{best} values were both Gaussian distributed, with means and standard deviations of 25.4 ± 16.2% and 22.6 ± 15.6%, respectively (Fig. 1b). Analyzing separately the protein pairs crystallized as dimers and the pairs extracted from larger complexes yielded similar results.
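The scan over overlap levels can be sketched as follows. This is not IMPACT: for two spheres the projection approximation reduces to the orientation average of the area covered by two discs, which is integrated here with a simple midpoint rule. The function names, the integration scheme, and the linear overlap-to-distance conversion (*d* = *r*_{1} + *r*_{2} − 2 *O* min(*r*_{1}, *r*_{2})) are assumptions made for this illustration.

```python
import math


def _clamp(x: float) -> float:
    """Keep acos arguments inside [-1, 1] despite rounding."""
    return max(-1.0, min(1.0, x))


def lens_area(a: float, b: float, s: float) -> float:
    """Area shared by two coplanar discs of radii a and b with centers s apart."""
    if s >= a + b:
        return 0.0                        # discs do not intersect
    if s <= abs(a - b):
        return math.pi * min(a, b) ** 2   # smaller disc lies inside the larger
    alpha = math.acos(_clamp((s * s + a * a - b * b) / (2.0 * s * a)))
    beta = math.acos(_clamp((s * s + b * b - a * a) / (2.0 * s * b)))
    kite = 0.5 * math.sqrt((-s + a + b) * (s + a - b) * (s - a + b) * (s + a + b))
    return a * a * alpha + b * b * beta - kite


def pa_ccs_two_spheres(r1, r2, d, r_gas=1.0, n=2000):
    """Orientation-averaged projected area of two spheres, each inflated by r_gas."""
    a, b = r1 + r_gas, r2 + r_gas
    discs = math.pi * (a * a + b * b)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n                 # u = cos(theta), uniform on [0, 1]
        s = d * math.sqrt(1.0 - u * u)    # projected center-to-center distance
        total += discs - lens_area(a, b, s)
    return total / n


def best_overlap(r1, r2, ccs_target, r_gas=1.0):
    """Scan overlap from 0 to 100% in 1% steps; return (best %, relative CCS error)."""
    best = None
    for pct in range(101):
        # Assumed linear overlap definition, see lead-in.
        d = r1 + r2 - 2.0 * (pct / 100.0) * min(r1, r2)
        err = abs(pa_ccs_two_spheres(r1, r2, d, r_gas) - ccs_target) / ccs_target
        if best is None or err < best[1]:
            best = (pct, err)
    return best
```

In practice `ccs_target` would be the measured or computed CCS of the dimer, and `r1`, `r2` the radii obtained from the subunit CCS values via Eq. 1.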

Any overlap confidence interval used to determine whether a sphere arrangement is suitable will be associated with a CCS error: the larger the interval, the broader the range of accepted CCS values. On the other hand, the wider this interval, the higher the likelihood that it includes the most suitable overlap level. For instance, defining the acceptable overlap interval as being within one standard deviation of the mean *O*_{best} value, i.e., anything between 7.0 and 38.2%, is associated with a CCS error of ± 7.4% and a likelihood of 73.7% of including *O*_{best} in this interval (Fig. 1c). Taken in the context of a modeling framework, this observation indicates a non-negligible likelihood that a restraint based on CCS and one based on the statistical distribution of overlaps will be inconsistent. It is therefore not advisable to use such an overlap restraint where CCS data are available.

Marklund has noted that the CCS of a complex can be derived from the CCS of the individual binding partners and their associated orientation-averaged occluded area [13]. In the context of intersecting spheres, since the occluded area depends on sphere-overlap, sphere-overlap and CCS values are connected. Therefore, a suitable overlap confidence interval should be predictable on the basis of given CCS measurements. We observed that the ideal overlap percentage of two spheres is correlated with the ratio between the sum of the subunits’ CCS and the complex CCS. Let *CCS*_{M1} and *CCS*_{M2} be the CCS of two molecules, *M1* and *M2*, and *CCS*_{M1+M2} their CCS when in a complex. We define *CCS*_{ratio} as:

$$CCS_{ratio} = \frac{CCS_{M1} + CCS_{M2}}{CCS_{M1+M2}}$$

The relationship between *CCS*_{ratio} and best overlap *O*_{best} can be fitted with the following non-linear model (Fig. 1d):

Here, *CCS*_{ratio} is always greater than 1.09, i.e., the (numerically estimated) minimal possible value, associated with spheres being just in contact. We note that this relationship is expected to hold only for the overlap of two convex objects. *CCS*_{M1}, *CCS*_{M2}, and *CCS*_{M1+M2} will each be subject to a specific experimental error. Using error propagation, the error associated with *CCS*_{ratio} is:
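As a worked sketch, assuming the three CCS measurements carry independent errors (written here as err(·), a notation introduced for illustration), standard first-order propagation through the ratio of the summed subunit CCS to the complex CCS gives the expression below. The published expression may be arranged differently:

```latex
\mathrm{err}(CCS_{ratio}) = CCS_{ratio}
  \sqrt{
    \frac{\mathrm{err}(CCS_{M1})^{2} + \mathrm{err}(CCS_{M2})^{2}}
         {\left(CCS_{M1} + CCS_{M2}\right)^{2}}
    + \left(\frac{\mathrm{err}(CCS_{M1+M2})}{CCS_{M1+M2}}\right)^{2}
  }
```

The first term is the squared relative error of the numerator (a sum, so absolute errors add in quadrature) and the second is the squared relative error of the denominator.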

We calculated *CCS*_{ratio} and *err*(*CCS*_{ratio}) for each protein pair in our benchmark dataset, assuming a generous experimental error of 3% on each CCS measurement (larger than the typical experimental error [9, 14]). These values allowed us to define, for each protein pair, a custom overlap confidence interval, i.e., an overlap region consistent with the data derived from ion mobility spectrometry. On average, the obtained intervals had a size (distance from minimum to maximum acceptable overlap) of 13.1%, i.e., less than half of what is typically considered when adopting the same, statistically determined, interval for all protein dimers. Furthermore, for all pairs, the predicted intervals included their specific *O*_{best} value. Within these intervals, CCS values had an average standard deviation of 3.5%. In summary, our data-driven method to define overlap restraints, hereafter called “adaptive cutoff,” is both more precise and more accurate than the traditionally used constant cutoff (i.e., the same for each case) based on a statistical analysis of an ensemble of protein pairs.
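A minimal sketch of how such a per-pair interval could be computed is given below. It assumes independent errors with first-order propagation, and it assumes the interval is obtained by mapping *CCS*_{ratio} ± *err*(*CCS*_{ratio}) through a ratio-to-overlap model. That construction, the function names, and the `model` callable are hypothetical; the fitted non-linear model itself is not reproduced here and must be supplied by the user.

```python
import math
from typing import Callable, Tuple


def ccs_ratio(ccs_m1: float, ccs_m2: float, ccs_complex: float) -> float:
    """Sum of the subunits' CCS divided by the CCS of their complex."""
    return (ccs_m1 + ccs_m2) / ccs_complex


def ccs_ratio_error(ccs_m1: float, ccs_m2: float, ccs_complex: float,
                    rel_err: float = 0.03) -> float:
    """First-order propagated error on the ratio, assuming independent errors.

    rel_err is the relative experimental error applied to each CCS value
    (3% by default, as in the text).
    """
    e1, e2, ec = rel_err * ccs_m1, rel_err * ccs_m2, rel_err * ccs_complex
    numerator_term = (e1 ** 2 + e2 ** 2) / (ccs_m1 + ccs_m2) ** 2
    denominator_term = (ec / ccs_complex) ** 2
    ratio = ccs_ratio(ccs_m1, ccs_m2, ccs_complex)
    return ratio * math.sqrt(numerator_term + denominator_term)


def adaptive_interval(ratio: float, err: float,
                      model: Callable[[float], float]) -> Tuple[float, float]:
    """Overlap interval obtained by mapping ratio - err and ratio + err through
    a user-supplied, monotonic ratio-to-overlap model (hypothetical construction)."""
    lo, hi = model(ratio - err), model(ratio + err)
    return (min(lo, hi), max(lo, hi))
```

For two subunits of 1000 Å² each forming a 1600 Å² complex, `ccs_ratio` returns 1.25 and `ccs_ratio_error` returns about 0.046 at 3% error per measurement.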

We next tested the performance of these two alternative overlap restraints for the determination of the topology of specific macromolecular assemblies. For this test, we selected three simple cases from the PiQSi database: two forming homo-hexameric circles and one forming a homo-dodecameric octahedron (i.e., assemblies where all protein-protein interfaces are identical). For each of them, we assessed whether the correct assembly topology could be identified from a range of candidate symmetries (Fig. 2). For each candidate topology, we generated a range of assemblies with varying overlap levels. An assembly model was considered valid (i.e., a specific topology was taken to explain the data) if it had a CCS error < 3% and the overlap of its constituent spheres was within a designated confidence interval. When using the constant cutoff method, the octahedral topology was correctly identified, whereas for one of the two hexamers a false positive was obtained (both tetrahedron and circle were considered plausible) and for the other a false negative was produced (tetrahedron instead of circle). With our adaptive cutoff method, all three cases were instead unambiguously assigned to the correct topology. Using a CCS cutoff smaller than 3% would have increased the errors in the case of the statistics-based overlap, but not in the case of our adaptive method.
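The acceptance rule described above (CCS error below 3% and all sphere overlaps inside the chosen interval) can be sketched as follows. The data layout, the function names, and the numbers in the usage note are hypothetical and serve only to illustrate the filtering logic.

```python
from typing import Dict, List, Sequence, Tuple

# One model = (CCS of the sphere assembly, overlaps of its contacting sphere pairs in %)
Model = Tuple[float, Sequence[float]]


def is_valid_model(ccs_model: float, ccs_exp: float,
                   overlaps: Sequence[float],
                   overlap_interval: Tuple[float, float],
                   ccs_tol: float = 0.03) -> bool:
    """True if the relative CCS error is below ccs_tol and every overlap
    lies inside overlap_interval (bounds inclusive)."""
    ccs_error = abs(ccs_model - ccs_exp) / ccs_exp
    lo, hi = overlap_interval
    return ccs_error < ccs_tol and all(lo <= o <= hi for o in overlaps)


def plausible_topologies(candidates: Dict[str, List[Model]],
                         ccs_exp: float,
                         overlap_interval: Tuple[float, float],
                         ccs_tol: float = 0.03) -> List[str]:
    """Names of the candidate topologies having at least one valid model."""
    return sorted(
        name for name, models in candidates.items()
        if any(is_valid_model(ccs, ccs_exp, ovl, overlap_interval, ccs_tol)
               for ccs, ovl in models)
    )
```

With an experimental CCS of 5000 Å² and the constant interval (7.0, 38.2), a circle model at 5100 Å² with 20% overlaps is accepted, whereas a tetrahedron model at 4980 Å² with 55% overlaps is rejected on the overlap criterion. A topology assignment is unambiguous when `plausible_topologies` returns a single name.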

In conclusion, we suggest Eq. 2 as a more suitable metric to define the overlap between two spheres representing super coarse-grained models of proteins. When the CCS of both spheres and of their complex is available, our adaptive cutoff method should be used to define a suitable confidence interval for the overlap between two spheres, with the overlap defined as per Eq. 2. We note that, if binding leads to conformational changes altering the CCS of the individual binding partners, the adaptive cutoff will impose a tighter or looser sphere-overlap level. When this CCS information is not available, the confidence interval should instead be defined on the basis of the constant cutoff criterion we determined by analyzing a large protein pair dataset. The optimal overlap values we determined here are Gaussian distributed, with a mean of 22.6% and a standard deviation of 15.6%. We have, however, observed that identifying a protein assembly topology by applying such a cutoff on sphere overlap is prone to both false negatives and false positives. Still, we should stress that our tests were simple cases based on symmetrical homo-multimers. It cannot be excluded that better performance may be observed when modeling larger hetero-multimers with no symmetry. Our data-driven adaptive cutoff led to accurate topology prediction in all test cases. This method suffers from two limitations: (1) it currently applies only to symmetrical homo-multimers, and (2) besides the CCS of the whole complex, it also requires the CCS of both a monomer and a dimer. Nevertheless, we believe that our observations indicate that exploiting experiment-based overlap restraints for the characterization of protein assembly topologies is a promising route for substantially increasing the accuracy of super coarse-grained models.

## Change history

### 11 February 2019

In this issue, the citation information on the opening page of each article PDF is incorrect. It should read “Journal of the American Society of Mass Spectrometry (2019)”, not “Journal of the American Society of Mass Spectrometry (2018)...”

## References

1. Gavin, A.-C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.-M., Cruciat, C.-M., Remor, M., Höfert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau, V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.-A., Copley, R.R., Edelmann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P., Seraphin, B., Kuster, B., Neubauer, G., Superti-Furga, G.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. **415**, 141–147 (2002)
2. Joseph, A.P., Polles, G., Alber, F., Topf, M.: Integrative modelling of cellular assemblies. Curr. Opin. Struct. Biol. **46**, 102–109 (2017)
3. Ruotolo, B.T., Benesch, J.L.P., Sandercock, A.M., Hyung, S.-J., Robinson, C.V.: Ion mobility–mass spectrometry analysis of large protein complexes. Nat. Protoc. **3**, 1139–1152 (2008)
4. Baldwin, A.J., Lioe, H., Hilton, G.R., Baker, L.A., Rubinstein, J.L., Kay, L.E., Benesch, J.L.P.: The polydispersity of αB-crystallin is rationalized by an interconverting polyhedral architecture. Structure. **19**, 1855–1863 (2011)
5. Politis, A., Stengel, F., Hall, Z., Hernández, H., Leitner, A., Walzthoeni, T., Robinson, C.V., Aebersold, R.: A mass spectrometry-based hybrid method for structural modeling of protein complexes. Nat. Methods. **11**, 403–406 (2014)
6. Hall, Z., Politis, A., Robinson, C.V.: Structural modeling of heteromeric protein complexes from disassembly pathways and ion mobility-mass spectrometry. Structure. **20**, 1596–1609 (2012)
7. Eschweiler, J.D., Frank, A.T., Ruotolo, B.T.: Coming to grips with ambiguity: ion mobility-mass spectrometry for protein quaternary structure assignment. J. Am. Soc. Mass Spectrom. **28**, 1991–2000 (2017)
8. Mack, E.: Average cross-sectional areas of molecules by gaseous diffusion methods. J. Am. Chem. Soc. **47**, 2468–2482 (1925)
9. Marklund, E.G., Degiacomi, M.T., Robinson, C.V., Baldwin, A.J., Benesch, J.L.P.: Collision cross sections for structural proteomics. Structure. **23**, 791–799 (2015)
10. Benesch, J.L.P., Ruotolo, B.T.: Mass spectrometry: come of age for structural and dynamical biology. Curr. Opin. Struct. Biol. **21**, 641–649 (2011)
11. Vouk, V.: Projected area of convex bodies. Nature. **162**, 330–331 (1948)
12. Levy, E.D.: PiQSi: protein quaternary structure investigation. Structure. **15**, 1364–1367 (2007)
13. Marklund, E.G.: Molecular self-occlusion as a means for accelerating collision cross-section calculations. Int. J. Mass Spectrom. **386**, 54–55 (2015)
14. Zhong, Y., Hyung, S.-J., Ruotolo, B.T.: Characterizing the resolution and accuracy of a second-generation traveling-wave ion mobility separator for biomolecular ions. Analyst. **136**, 3534 (2011)

## Acknowledgements

We thank Lucas Rudden, Justin Benesch, and Valentina Erastova for critically reviewing this manuscript.

## Funding

This work was supported by the Engineering and Physical Sciences Research Council (grant EP/P016499/1).

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Degiacomi, M.T. On the Effect of Sphere-Overlap on Super Coarse-Grained Models of Protein Assemblies. *J. Am. Soc. Mass Spectrom.* **30**, 113–117 (2019). https://doi.org/10.1007/s13361-018-1974-2

### Keywords

- Molecular modeling
- Protein assembly
- Native mass spectrometry
- Ion mobility
- Super coarse-grain