Introduction

Intrinsically disordered proteins (IDPs) are an important and widespread class of proteins [1–3]. They lack an ordered three-dimensional structure, yet they are biologically active. They populate dynamic ensembles of coexisting and interconverting conformations, rich in random coil but also containing regions or nuclei of ordered secondary structure and variable degrees of global compactness [4]. IDPs can fold partially or completely into ordered structures upon interactions with ligands, such as metal ions, small molecules, partner proteins, membranes, or nucleic acids [5]. They typically have unrestrained interactions and can acquire different conformations depending on the bound partner. Large disordered regions can persist in the bound state, giving rise to structurally ‘fuzzy’ functional complexes [6]. Natural evolution has selected for the extreme plasticity of these macromolecules, altering the balance of order- and disorder-promoting amino acids relative to globular proteins. As a result, IDP sequences are depleted of hydrophobic residues and enriched in polar and charged residues, and cannot organize a proper hydrophobic core [7]. These peculiar structural properties serve crucial regulatory roles, such as signal transduction and transcriptional regulation, resulting in a high frequency of IDPs among tumor-related proteins. IDPs are also over-represented among amyloid proteins, which cause significant diseases such as Parkinson’s and Alzheimer’s. Thus, IDPs represent relevant potential targets for the treatment of cancer and neurodegeneration [8]. It is predicted that 25% to 30% of eukaryotic proteins are disordered and more than 50% contain at least one disordered region longer than 30 amino acids [9, 10].

Understanding molecular recognition by IDPs and developing potential inhibitors to modify their function require structural characterization of the ensembles populated by these systems in solution. In spite of the development of structural proteomics in the last decade, structural characterization of IDPs still faces major technical challenges that are slowing down progress in the field. “Native” mass spectrometry (MS) represents a very powerful tool for structural proteomics, combining the unique analytical power and high-throughput potential of MS with structural investigation [11–14]. The charge state distributions (CSDs) observed by electrospray ionization (ESI) are strongly affected by global protein compactness at the moment of transfer from solution to gas phase. Conformationally heterogeneous samples give rise to multimodal distributions, allowing detection of individual, coexisting components (Figure 1). This kind of analysis offers a very valuable tool for low-resolution structural characterization of molecular ensembles. Therefore, applicability of such an approach to the characterization of structural disorder is a very attractive prospect.

Figure 1

Comparison of CSDs for globular and disordered proteins. Nano-ESI-MS spectra of 10 μM globular proteins (a)-(c) or IDPs (d)-(f) under non-denaturing conditions (10 mM ammonium acetate pH 7). (a) chicken-egg lysozyme; (b) bovine β-lactoglobulin; (c) human transferrin; (d) Sic1-KID from Saccharomyces cerevisiae (residues 215-284); (e) human stathmin-4; (f) murine ataxin-3 (residues 1-291)
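To make the notion of a multimodal CSD concrete, the following minimal sketch computes the intensity-weighted average charge state of each component of a bimodal peak envelope. The charge states, intensities, and the fixed charge threshold used to split the envelope are invented for illustration; they are not taken from Figure 1, and a simple threshold is assumed here in place of a proper deconvolution of overlapping envelopes.

```python
def average_charge(csd):
    """Intensity-weighted mean charge state of a {charge: intensity} mapping."""
    total = sum(csd.values())
    if total <= 0:
        raise ValueError("CSD must have a positive total intensity")
    return sum(z * i for z, i in csd.items()) / total


def split_components(csd, threshold):
    """Split a CSD into low-charge (z <= threshold) and high-charge parts.

    A hard threshold is an illustrative simplification; it only makes
    sense when the two envelopes do not overlap.
    """
    low = {z: i for z, i in csd.items() if z <= threshold}
    high = {z: i for z, i in csd.items() if z > threshold}
    return low, high


# Hypothetical bimodal CSD (charge state -> relative intensity).
csd = {5: 10, 6: 30, 7: 10, 10: 20, 11: 60, 12: 20}

low, high = split_components(csd, threshold=8)
z_low = average_charge(low)      # compact-like component
z_high = average_charge(high)    # extended-like component
fraction_low = sum(low.values()) / sum(csd.values())

print(f"Z_av low = {z_low:.2f}, Z_av high = {z_high:.2f}, "
      f"low-charge fraction = {fraction_low:.2f}")
```

Relative intensities obtained this way are only a first approximation of solution populations, since ionization and transmission efficiencies may differ among conformers.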

It has been shown that ESI-MS data can deliver quantitative information on the solvent accessible surface area (SASA) of molecular structures [15–17]. Several studies have pointed out a power-law correlation between average charge state (Z av) and SASA from experimental structures of folded globular proteins, which leads to linear log(Z av) versus log(SASA) plots. With y = log(Z av) and x = log(SASA), the original equation [18]

$$ y = 0.69x - 4.08 $$
(1)

has been updated to [17]

$$ y = 0.59x - 3.12 $$
(2)

and [16]

$$ y = 0.60x - 3.28 $$
(3)

on increasingly large datasets. A very similar relation has been shown to hold for unfolded proteins, using estimates of the average SASA from simulated conformational ensembles [15, 16, 19]. The equation [16]

$$ y = 0.91x - 6.01 $$
(4)

fits the data obtained with the high-charge components of chemically denatured proteins or IDPs under nondenaturing conditions [16]. If, instead, Z av is plotted as a function of mass, very different curves are obtained for folded or unfolded proteins [15]. IDPs lie in between, with the lowest-charge component approaching the reference line of globular proteins and the highest-charge component approaching the line of fully denatured proteins. Therefore, the ionization pattern under nondenaturing conditions offers a fingerprint useful for IDP identification [15].
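The four linear relations above can be applied in either direction: to predict Z av from a known SASA, or to infer SASA from a measured Z av. The sketch below is one way to do so, under two assumptions that the text does not state explicitly: that "log" denotes the natural logarithm and that SASA is expressed in Å². The dictionary keys and function names are hypothetical labels introduced only for this example.

```python
import math

# (slope, intercept) of y = slope * x + intercept,
# with y = log(Z_av) and x = log(SASA).
# ASSUMPTION: natural logarithm, SASA in square angstroms.
RELATIONS = {
    "eq1_folded": (0.69, -4.08),
    "eq2_folded": (0.59, -3.12),
    "eq3_folded": (0.60, -3.28),
    "eq4_unfolded": (0.91, -6.01),
}


def predicted_charge(sasa, relation):
    """Average charge state predicted for a given SASA."""
    slope, intercept = RELATIONS[relation]
    return math.exp(slope * math.log(sasa) + intercept)


def inferred_sasa(z_av, relation):
    """SASA inferred from a measured average charge state (inverse relation)."""
    slope, intercept = RELATIONS[relation]
    return math.exp((math.log(z_av) - intercept) / slope)


# Hypothetical SASA value, chosen only to exercise the functions.
sasa = 6500.0
for name in RELATIONS:
    z = predicted_charge(sasa, name)
    print(f"{name}: Z_av = {z:.2f}, back-computed SASA = {inferred_sasa(z, name):.0f}")
```

Because the relations for folded and unfolded proteins differ in both slope and intercept, the SASA inferred from a given peak envelope depends on which relation is considered appropriate for that component.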

Alternative ESI mechanisms have been put forward for folded and unfolded proteins. The similarity between experimental protein charge and Rayleigh-limit charge of a corresponding water droplet, for compact protein structures, is generally taken as evidence in favor of the charged-residue model (CRM) [20, 21]. Extended protein structures, instead, are thought to follow a chain-ejection process, based on a lower propensity to form salt adducts [22]. Thus, IDP ensembles would reflect both mechanisms, similar to a globular protein under partially destabilizing conditions with coexisting folded and unfolded components. However, many factors affect protein behavior during electrospray, and discriminating alternative ESI mechanisms is not straightforward [23, 24]. Furthermore, protein ionization should not be interpreted merely on the basis of droplet charge. For instance, the solvent surface tension does not have the effect predicted by the Rayleigh equation [25], whereas gas-phase basicity and its dependence on protein conformation have been shown to play an important role [16, 26–29].
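For reference, the Rayleigh limit mentioned above can be evaluated with the standard expression z_R = 8π(ε0 γ R³)^(1/2) / e, where γ is the surface tension and R the droplet radius. The sketch below is a minimal illustration; the droplet radius and the surface-tension values are assumptions chosen for the example (approximately those of water and methanol near room temperature) and do not come from the studies cited here.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19     # elementary charge, C
GAMMA_WATER = 0.072            # N/m, approximate value for water near 25 C (assumed)
GAMMA_METHANOL = 0.022         # N/m, approximate value for methanol (assumed)


def rayleigh_charge(radius_m, surface_tension=GAMMA_WATER):
    """Number of elementary charges at the Rayleigh limit of a spherical droplet."""
    q = 8.0 * math.pi * math.sqrt(EPSILON_0 * surface_tension * radius_m ** 3)
    return q / E_CHARGE


# Hypothetical droplet radius of 2 nm.
radius = 2e-9
z_water = rayleigh_charge(radius)
z_methanol = rayleigh_charge(radius, GAMMA_METHANOL)
print(f"Rayleigh charge, water: {z_water:.1f}; methanol: {z_methanol:.1f}")
```

The expression predicts that the limiting charge scales with the square root of the surface tension, which is the dependence that, according to the experiments cited above, is not observed for proteins.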

The vanishing solvent conditions of electrospray definitely create a puzzling environment for structural investigation. Globular proteins seem to rearrange slightly upon desolvation by folding side chains against the backbone scaffold and increasing the number of intramolecular hydrogen bonds and salt bridges [27, 28, 30]. The same driving force maximizing self-solvation (i.e., intramolecular interactions) in the gas phase could affect the flexible polypeptide chains of IDPs more dramatically, potentially altering conformational properties under electrospray conditions. Furthermore, electrostatic unfolding due to Coulomb repulsions could lead to extended conformations for protein molecules with high net charge [31]. However, it is important to distinguish between the gas-phase and ESI-droplet environments. Gas-phase conformation affects ion mobility (IM) results, whereas protein conformation inside the ESI droplets (the so-called intermediate regime [32]) affects CSDs in ESI-MS spectra. At each of these two levels, there is an urgent need to assess reliability of MS data for IDP conformational studies [33–35]. The present discussion is limited to CSD analysis and protein conformation in the intermediate regime, and does not deal with comparison between solution and gas-phase structures as depicted by IM-MS. An attempt is made to summarize the available literature evidence concerned with characterization of IDP conformational ensembles by ESI-MS, solution methods, and computational simulations.

Evidence of Consistency

Several papers report ESI-MS data in agreement with solution or computational results in describing IDP conformational ensembles or transitions triggered by environmental conditions. One of the most intensively investigated IDPs is α-synuclein (AS), the amyloid protein involved in Parkinson’s disease and other neurodegenerative conditions. Solution spectroscopy [36–41], single-molecule approaches [13, 42–44], and small angle X-ray scattering (SAXS) or small angle neutron scattering (SANS) analyses with ensemble optimization modeling (EOM) [45, 46] indicate that the protein in solution, in the absence of lipids, is fully disordered but also visits compact conformational states. These features seem to be reflected by CSDs, which are consistent with the presence of a prevalent component corresponding to an extended protein conformation with minor components representing compact or partially collapsed states of AS [33, 42, 47, 48]. These ensembles can be modulated by alcohols and ligands, with good agreement between ESI-MS and solution methods [42, 47, 49–51]. Furthermore, AS is known to collapse to a compact state at acidic pH [52] and, again, a consistent transition is observed by ESI-MS as a function of the pH of the original solution [33, 47].

In the case of the regulatory protein prothymosin α, NMR data show that Zn2+ binding increases helical propensity and induces a transition to compact conformations [53]. Consistently, ESI-MS shows a sharply bimodal CSD and preferential binding of Zn2+ to the low-charge component. The relative intensities of the two peak envelopes do not appear to change significantly. However, if the signals of the free and metal-bound protein for each charge state were added together, it would be evident that the compact component at least doubles upon metal addition. Moreover, the protein–metal complexes in this specific case could be affected, not only by the usual in-source dissociation but also by the dialysis step performed before electrospray. Most importantly, comparison with Ca2+ and Mg2+ adducts indicates strong specificity of Zn2+ binding by prothymosin α [53]. The enzymatic IDP UreG from Bacillus pasteurii displays strong predominance of a compact conformation over minor components of partially collapsed and fully disordered conformations, as assessed by ESI-MS [54], in agreement with predominance of a molten globule over pre-molten globule and random coil states identified by solution spectroscopy methods and differential scanning calorimetry [55]. UreG offers another example of metal-specific conformational change, as detectable by ESI-MS, with Zn2+ promoting protein compaction in agreement with NMR results, whereas Ni2+ leaves the CSD almost unchanged [54]. Analogously, Cd2+ shifts the metallothionein-2A CSD towards lower charge states, although within a unimodal peak envelope [56]. Binding of 7 Cd2+ ions leads to a reduction in average charge state from 4.8 to 4.3, which implies a ~10% decrease in SASA [16]. Such an effect could be compared with the results of molecular dynamics simulations reported in the same study. Although the collisional cross-section (CCS) profiles of apo- and Cd7-protein in water seem to converge to quite similar values, it would be interesting to compute the average SASA of the simulated structures in solution for comparison with ESI-MS results. A structural rearrangement of metallothionein-2A induced by Cd2+ is consistent with successful crystallization of the metalated form, as opposed to the apo-protein [56].
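The SASA decrease quoted above for metallothionein-2A follows from the power-law form of the charge-to-SASA relation: since log(Z av) is linear in log(SASA), the ratio of two SASA values depends only on the ratio of the charges and on the slope, not on the intercept or on the base of the logarithm. The short sketch below works this out for the slopes of Equations 3 and 4. Which relation was used to obtain the ~10% figure is not stated here, so both are shown; the function name is a hypothetical label.

```python
def sasa_ratio(z_before, z_after, slope):
    """SASA_after / SASA_before implied by log(Z_av) = slope * log(SASA) + const."""
    return (z_after / z_before) ** (1.0 / slope)


z_apo, z_cd7 = 4.8, 4.3   # average charge states quoted in the text

for label, slope in (("Equation 3 (folded)", 0.60), ("Equation 4 (unfolded)", 0.91)):
    decrease = 1.0 - sasa_ratio(z_apo, z_cd7, slope)
    print(f"{label}: SASA decrease = {100 * decrease:.1f}%")
```

With the slope of Equation 4 the computed decrease is about 11%, close to the quoted ~10%, whereas the shallower slope of Equation 3 gives about 17%. The inferred change is therefore quite sensitive to the relation chosen.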

Unimodal CSDs have also been observed for other IDPs, not necessarily of low molecular weight. Examples are bovine milk αs-casein (data not shown) and the cyclin-dependent protein kinase inhibitor Sic1 from Saccharomyces cerevisiae [57]. The isolated N- and C-terminal fragments of Sic1, instead, show bimodal distributions [58–60]. The C-terminal fragment (kinase inhibitor domain, KID) shows a more prominent contribution of the low-charge component, compared with the N-terminal fragment, consistent with partial-proteolysis and CD results [57, 59, 60]. Molecular-dynamics simulations and energy-landscape analysis identify compact conformations for the C-terminal fragment, which likely represent metastable states in solution and whose SASA is in good agreement with the value inferred by ESI-MS (Figure 2) [61]. Furthermore, the low-charge component of the Sic1 C-terminal fragment is selectively lost upon acidification, with a steep transition consistent with destabilization of compact conformations [58]. This is in agreement with predominance of polar interactions in the intramolecular networks revealed by computational models [61]. Interestingly, the pH dependences of Sic1 and AS are opposite to each other, with acidification leading to average charge reduction for AS [42, 47] and increase for Sic1 [58, 61]. Analogously, Sic1 and other IDPs display very different effects of organic solvents on CSDs. For instance, methanol does not affect the CSD of Sic1-KID [61], while promoting lower charge states for AS [42]. Again, these differences are consistent with CD results [42] and with the different nature of intramolecular interactions that are thought to stabilize the compact conformations of these proteins [61].

Figure 2

Conformational analysis of yeast Sic1. (a) Nano-ESI-MS spectra of 10 μM Sic1-KID under nondenaturing conditions (50 mM ammonium acetate pH 6.5). (b) Free-energy landscape representation calculated using the first two principal components (PC) as reaction coordinates. The major basins are labeled by upper case letters (A to L). The free energy is given in kJ/mol and indicated by the color code shown on the figure. The average structure identified for each conformational basin by structural clustering is represented as a cartoon. The structural models are highlighted by a color gradient, from N-terminus (cyan) to C-terminus (yellow). The range of SASA values is reported, as inferred by three independent ESI-MS measurements and by the empiric charge-to-SASA relation for globular proteins (Equation 3) (a), or as calculated on the compact structural models identified in the simulated molecular ensemble (b). Modified from reference [61]

The disordered, monomeric, N-terminal domain of p53 (Np53) displays a broad CSD that shifts towards lower charge states upon addition of the ligand RITA (2,5-bis(5-hydroxymethyl-2-thienyl) furan). Although the complex could not be detected, the shift in the CSD has been interpreted as an induced conformational change, in agreement with the slight decrease in disordered secondary structure observed by CD spectroscopy [62]. Such a mechanism would imply a memory effect of CSDs in the presence of in-source ligand dissociation, in agreement with other reports [63, 64].

Analogously, the disordered N-terminal domain of the human prion protein shows broad CSDs with either unimodal or bimodal profiles, depending on the sequence fragment and without apparent correlation with fragment length [65]. Simulation results on fragments of the homologous mouse protein are consistent with a large ensemble of conformations containing transient secondary-structure elements and adjacent regions of different disorder propensities [66]. For all four tested fragments of the human protein, copper binding induces a shift of the CSD towards lower charge states and loss of the bimodal profiles [65].

Evidence of Discrepancy

The examples discussed above suggest that IDP CSDs obtained by nano-ESI are related to the conformational properties of the protein in solution. On the other hand, discrepancies between ESI-MS and solution methods have been reported for several systems. The acyl carrier protein (ACP) from Vibrio harveyi, for instance, displays ESI-MS spectra mostly insensitive to its solution conformation [67], although a minor effect of myristoylation is detectable by nano-ESI-MS in negative-ion mode, as expected, based on NMR experiments [67]. Interestingly, dramatic effects on the CSD are induced by cyclization of the L46W mutant, with narrowing and shifting towards lower charge states. Altogether these results are consistent with a highly labile intrinsic structure of apo-ACP, which readily unfolds during electrospray but can be stabilized by cyclization, as also indicated by CD [68].

A somewhat opposite behavior is displayed by apolipoprotein C‐II (ApoC‐II), which shows a compact conformation by ESI-MS regardless of solvent conditions, including those in which hydrogen exchange experiments suggest an extended conformation in solution [33]. The authors speculate that this anomalous behavior might be due to a peculiar mechanism of ion production, intermediate between the CRM and the chain-ejection model (CEM), followed by proteins containing highly segregated compact and extended regions [33].

Modeling the fuzzy complex between the viral disordered domain Ntail and the three-helix bundle PXD yields SASA values consistent with the high-charge component but fails to provide structural models for the low-charge component. It should be pointed out that calculation of the free-energy landscape of a disordered assembly is very demanding and that no exhaustive representation of the conformational ensemble was reached by the computational simulations performed on this system [69]. Nonetheless, SAXS and NMR also failed to detect compact conformations that might help rationalize the presence of a low-charge component [70–72].

Analogously, p53 and its complexes with a small DNA molecule reveal significantly lower charge states than predicted by structural models based on SAXS and NMR data [73] (Figure 3). The full-length protein, the isolated globular domains, and their combinations with disordered regions all display systematically lower Z av than predicted by the experimental structures, with measured values ~75% of the calculated ones for globular domains and ~68% in the presence of disordered regions (Table 1). The best agreement is found for the tetrameric DNA-binding domain in complex with DNA (~82%), whereas the largest discrepancy is observed for the tetramer of the full-length protein (~64%). Ion mobility, instead, reveals good agreement between predicted and experimental CCSs in the case of folded domains, whereas dramatic discrepancies are found for tetramers including disordered regions. Simulating structural rearrangements in vacuo by molecular dynamics yields Z av and CCS values in good agreement with the experimental ones for the tetrameric construct containing the DNA-binding and the tetramerization domains linked by a disordered region, with and without DNA [73]. These findings suggest that the propensity of IDPs to undergo conformational collapse in the gas phase affects not only IM data but also the conformational ensemble in the intermediate regime [32]. In contrast, however, solution crosslinking experiments can be explained only by considering a structure of the p53 tetramer more compact than suggested by the SAXS data, showing good agreement with CCS values derived by IM-MS [74].

Figure 3

Schematic representation of the p53 protein and mass spectra of individual domains and constructs. (a) Tetramerization domain (T); (b) DNA-binding core domain (C); (c) C in the presence of DNA (26–nucleotide p53 response element); (d) DNA-binding and tetramerization domains linked by the disordered region (CT); (e) CT in the presence of DNA; (f) full-length protein (FL). The structural models of the constructs are reported in each panel. Reprinted from reference [73] with permission of John Wiley and Sons

Table 1 Structural Parameters for p53 Constructs and their Complexes with DNA. The SASA and Z av Values Predicted by the Structural Models and the Experimental Z av Values are Reported. Reprinted from Reference [73] with Permission of John Wiley and Sons

A set of seven IDPs has been analyzed in parallel by ESI-MS and a combined SAXS-EOM strategy [75]. Although the CSDs are multimodal, the optimized ensembles derived by the simulations display unimodal size distributions, consistent with the intermediate components of the CSDs. Such a discrepancy raises the issue of whether the SAXS-EOM approach fails to resolve some components of the ensemble (e.g., because of too low abundance, or spatial and temporal averaging), or the multimodal CSDs are artifacts of the intermediate regime of the ESI process. It is remarkable that the compact states detected by CSDs can represent a very small fraction of the total population as, for instance, for the ERD10 protein, which exhibits ~1% compact and ~10% intermediate components [75].

Further studies will be needed to elucidate this point. In particular, it would be important to show whether in-silico ensembles built according to the subpopulations of the ESI-MS spectra (i.e., with the relative abundances and SASA values taken from the CSDs) could be discriminated by simulated SAXS-EOM procedures from more homogeneous ensembles containing only the predominant component. A control of this kind is performed by the authors [75] using, however, a higher percentage and a higher compactness of the collapsed component than indicated by the ESI-MS data. Furthermore, the kinetics of interconversion among different conformations relative to the time scale of the experiment should be considered. These issues should be explored in detail for each considered system, since they can vary significantly from protein to protein. It is interesting to note that the studies published so far, allowing comparison between SAXS-EOM and native-MS data, would support opposite conclusions [42, 45, 47, 75].

In the latter study, the Z av of the SAXS-based simulated ensembles approximates the Z av of the intermediate component of the experimental CSDs. Based on this finding, the authors suggest that the higher- and lower-charge subpopulations in the ESI-MS spectra reflect an expansion of the conformational space experienced by IDPs under electrospray conditions. It should be pointed out that, in order to affect protein ionization, such changes should take place in the intermediate regime, and it is not enough to invoke structural rearrangements in the gas phase.
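Returning to the question of whether a low-abundance compact component could escape detection by scattering methods, a rough back-of-the-envelope estimate may help frame the problem. In the Guinier regime, a mixture of conformers of equal mass has an apparent squared radius of gyration equal to the population-weighted mean of the individual squared values. The sketch below applies this to populations resembling those quoted for ERD10 (~1% compact, ~10% intermediate). The radii of gyration are invented placeholders, the CSD abundances are assumed to equal solution populations, and the calculation is not a substitute for the SAXS-EOM procedure itself.

```python
import math


def apparent_rg(populations, rg_values):
    """Apparent Guinier Rg of a mixture of equal-mass conformers.

    Uses Rg_app^2 = sum_i f_i * Rg_i^2, with fractions f_i normalized to 1.
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    mean_sq = sum(f / total * rg ** 2 for f, rg in zip(populations, rg_values))
    return math.sqrt(mean_sq)


# Populations loosely based on the ERD10 abundances quoted in the text.
populations = [0.01, 0.10, 0.89]       # compact, intermediate, extended
# HYPOTHETICAL radii of gyration in angstroms, for illustration only.
rg_values = [20.0, 35.0, 50.0]

rg_mixture = apparent_rg(populations, rg_values)
rg_extended_only = apparent_rg([1.0], [50.0])
print(f"Mixture: {rg_mixture:.1f} A; extended only: {rg_extended_only:.1f} A")
```

Under these assumed values the two ensembles differ by only a few percent in apparent size, which illustrates why minor compact subpopulations may be difficult to resolve; a quantitative answer would require the simulated SAXS-EOM control discussed above.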

Conclusions

Whether MS-derived species distributions are bona-fide descriptors of the solution conformational ensemble remains an object of debate [12, 76]. It should be pointed out that growing evidence from solution methods indicates that equilibrium between compact and extended states in IDP conformational ensembles seems to be more the rule than the exception. Thus, a possible parallelism between such structural heterogeneity and the widespread, but still highly protein-specific, multimodal profiles of CSDs in ESI-MS is worth further investigation. It would be interesting, for instance, to analyze by ESI-MS those IDPs whose heterogeneous solution ensembles have been characterized in detail by other techniques, such as osteopontin [77, 78] and the N-terminal region of vesicular stomatitis virus phosphoprotein [79]. Analogously, documentation of ligand-specific effects would be highly relevant for discriminating real conformational changes from ESI artifacts [80]. Finally, advances in structural characterization and computational modeling of IDPs in solution will generate data useful for testing the SASA values inferred from ESI-MS [81, 82].