Background

A wide variety of signaling pathways form a vast communication network that regulates basic cell activities and coordinates biological functions within an organism. Some signaling pathways even allow chemical communication between different organisms or species. Cells communicate with each other to coordinate a variety of functions between cells, tissues, and organs, and all cells must respond to environmental cues. Many mechanisms have evolved to transmit these signals. Extracellular signaling involves releasing or displaying any of a variety of chemicals, which are specifically recognized by a receiving cell that then activates an intracellular signal transduction pathway to respond to the signaling event. These events typically rely upon a series of protein–ligand and protein–protein interactions. Such signals must effectively propagate across long distances and even across barriers, such as the cell membrane. Each step in the signaling pathway must be highly specific, both to preserve the pathway’s ability to generate a reliable response and to ensure that the pathway is not erroneously activated. Because a cell’s response to a signal must be transitory, each of these molecular interactions must be reversible. Ligand recognition must both augment and stabilize the response to the signal. These requirements of cell signaling impose unique and often conflicting demands on the proteins that constitute the signaling pathways, such as signal integrity versus crosstalk. The conflicts extend to the individual interactions that propagate the signal, most notably the apparent conflict between specificity (a large interaction interface) and reversibility (a low free energy of interaction).

Over the course of evolution, these conflicting needs have been resolved to a large extent by incorporating intrinsically disordered proteins into cell signaling pathways. Intrinsically disordered proteins (IDPs) or intrinsically disordered regions (IDRs) of proteins fail to fold into stable, defined structures as free monomers. Numerous functional advantages of intrinsic disorder are outlined in dedicated studies (e.g., see [1,2,3,4,5,6,7]). Many IDPs/IDRs are capable of at least partial ordering upon interaction with specific partners [7,8,9,10,11,12,13,14,15,16,17,18]. Upon binding of an IDP/IDR to a signaling partner, the free energy required to bring about the disorder-to-order transition subtracts from the interfacial, contact-free energy, resulting in a highly specific interaction that can be combined with a low net free energy of association [3, 18]. Thus binding-induced folding decouples binding affinity from specificity, enabling cell signaling to be reversible. Some IDPs/IDRs may remain unfolded and dynamic even in the bound state [19,20,21,22,23], creating fuzzy complexes (Fig. 1) [24, 25]. An extreme example of this behavior is provided by two highly disordered human proteins, histone H1 and its nuclear chaperone prothymosin-α, which form a picomolar-affinity complex in which both proteins fully retain their structural disorder, their long-range flexibility, and their highly dynamic character [19]. Whether or not stable structure results from the interaction, ligand or protein interactions shift the conformational ensemble of the IDP/IDR, linking protein function to binding. Because some signaling IDPs/IDRs lack stable (secondary) structure, the energetic barriers between the bound and free states are low, allowing disordered regions to act as reversible, extremely sensitive sensors. In addition to chemical signals, environmental conditions may also instigate signaling pathways that detect mechanical stress, light, pH, or redox potential [26].
Once the signal is received, the response can be propagated over hundreds of nanometers from the cell membrane to the nucleus, a feat that could dilute the signal. The low energetic barriers that characterize the transition between active and inactive states in intrinsically disordered proteins help shift the equilibrium toward the active state [27, 28]. When protein interaction sites are located within intrinsically disordered regions, the protein associations required to propagate cell signaling pathways are significantly accelerated [29]. Furthermore, the protein–protein interactions that propagate the intracellular signal often allosterically trigger post-translational modifications (PTMs) [30]. The combination of allosteric regulation with a catalytic output (e.g. kinase activity) can also amplify the response to the signal, ensuring it successfully reaches the nucleus [31]. Indeed, the presence of intrinsically disordered regions increases the potential for allosteric regulation [32, 33]. Finally, disordered proteins provide many avenues for integrating multiple signaling pathways [27], including providing a scaffold that binds proteins from multiple pathways [34], regulating multiple disordered substrates through PTMs, and varying pathway components through alternative mRNA splicing [35,36,37]. This combination of regulatory and environmental factors sometimes modulates protein behavior in a rheostat-like manner [30].
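The thermodynamic argument for binding-induced folding made above can be illustrated with a short numerical sketch. The free energy values used here are assumptions chosen only for illustration, not measurements from the cited studies, and the function names are hypothetical. The sketch uses the standard relation Kd = exp(ΔG/RT) (1 M standard state) to show how a folding penalty can leave a large, specific interface with only a modest net affinity.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Return Kd in mol/L for a binding free energy in kcal/mol.

    Negative values are favorable; a 1 M standard state is assumed.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def net_binding_free_energy(interface_dg, folding_penalty):
    """Net free energy when part of the interface energy pays for folding."""
    return interface_dg + folding_penalty


# Illustrative values only (assumed, not taken from the literature).
interface_dg = -15.0    # kcal/mol, large and specific contact interface
folding_penalty = 7.0   # kcal/mol, cost of the disorder-to-order transition

net_dg = net_binding_free_energy(interface_dg, folding_penalty)
kd_rigid = dissociation_constant(interface_dg)  # same interface, no folding cost
kd_idr = dissociation_constant(net_dg)          # interface plus folding cost

print(f"net binding free energy: {net_dg:.1f} kcal/mol")
print(f"Kd without folding penalty: {kd_rigid:.2e} M")
print(f"Kd with folding penalty:    {kd_idr:.2e} M")
```

With these assumed numbers, the same interface yields a much weaker (more readily reversible) complex once the folding cost is included, which is the decoupling of specificity from affinity described in the text.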

Fig. 1

When binding a partner protein, intrinsically disordered regions can adopt multiple unstructured or structurally ambiguous topologies to form fuzzy complexes (A), fold to create stable secondary/tertiary structure (B) or adopt an unstructured yet static conformation (C)

The combined use of intrinsic disorder, alternative splicing and PTM widely enables the context-dependent orchestration of signaling in developmental biology and other complex processes [38, 39]. The mRNA involved in alternative splicing shows a strong preference to code for disorder rather than for structure (with structured-domain-encoding exon skipping being the main exception to this preference); adding and deleting protein segments is much less disruptive in IDRs than in structured regions [40]. Likewise, phosphorylation [41] and many but not all other PTMs [42] show a strong preference to be localized in IDRs, likely because flexibility enhances the ability of a motif, destined for PTM addition, to enter and bind to the active site of the corresponding enzyme. The concentration of both PTMs and segments encoded by alternative splicing within IDRs suggests these two sequence modifications may be co-localized, providing an opportunity to integrate two different regulatory inputs. Indeed, the signaling functions carried out by G protein-coupled receptors (GPCRs), which are transmembrane proteins; the nuclear factors of activated T cells (NFATs), which are transcription factors; and the Src family kinases (SFKs), which are signaling enzymes, are all modulated by the collaboration of PTMs and alternative splicing that map to the same IDRs [39]. Computer searches suggest that PTMs and alternative splicing are commonly studied individually for their effects on cell signaling, but they are rarely considered together. The supplemental data in Zhou et al. show the results of such computer searches for 11 signaling pathways and 28 well characterized signaling proteins of various kinds [39]. The strong preferences of PTMs and protein segments encoded by alternative splicing to be co-localized in IDRs mean that alternative splicing and PTMs will commonly work together to orchestrate signaling.
From these considerations, this system has been called the IDP-AS-PTM toolkit [39].

Different combinations of PTMs can integrate information from multiple signaling pathways to create diverse outcomes. Indeed, multiple signaling pathways reversibly add different PTMs to the disordered tails of histone proteins [43]. The resulting collection of PTMs constitutes a “histone code” that elicits unique gene regulatory responses [44]. These differences can carry over to following generations, indicating that these multiple PTMs provide the basis for an epigenetic mechanism. Different signaling outcomes based on different combinations of PTMs have been observed for additional proteins [45,46,47,48,49,50,51]. These various multiple PTM-based signaling systems have been given different names, including PTM code [45], which will be used herein. Whether any of these additional applications of the PTM code result in epigenetic effects is unknown to these authors. Experiments and predictions indicate that for all of these proteins the multiple PTMs are located in IDRs [52]. Thus, IDRs are an important feature of both the PTM code and the epigenetic effects observed for the histone code. Furthermore, the supplemental data in Zhou et al., 2018 show that all of these proteins undergo alternative splicing, suggesting the possibility that alternative splicing could be a previously unsuspected regulator of the epigenetic effects resulting from the histone code [39]. Finally, histone tails may be accessible or may adhere to the nucleosome and be inaccessible for protein interactions, demonstrating that the context provided by the nucleosome is also a key contributor [53].

Undoubtedly, understanding and manipulating cell signaling pathways requires clearly defining the roles of IDPs and IDRs. The problem is further complicated by the nesting doll (Matryoshka)-like organization of cellular signaling, which represents a complex network of networks, with even a single node in a protein–protein interaction network representing a multileveled network itself [54]. Here, at the lowest level, different segments of the polypeptide chain form secondary structure elements that represent local networks of hydrogen bonds and residue-residue interactions. The next level of the protein intramolecular network is formed by interactions between the secondary structure elements, which are local networks themselves. Next, protein domains are higher level networks that are generated via interactions between these second-tier networks. A functional monomeric protein represents the seemingly highest-level network, which includes inter-domain interactions and interactions between domains and second-tier networks. Formation of an oligomeric protein or engagement in temporary protein–protein interactions within the signaling network requires a new level of inter-subunit interactions, where the inter-protein interaction network might include interactions between the networks of various lower levels [54]. This manuscript is part of a collection, the goal of which is to highlight the crucial role of intrinsic disorder in cell signaling and to introduce scientists to the basic concepts, common experimental approaches, and diverse molecular mechanisms that characterize each field, in the hopes of increasing understanding and promoting further interdisciplinary studies. This review will (1) summarize best practices for identifying intrinsically disordered regions, (2) highlight the variety of cell signaling pathways that employ disordered proteins, and (3) identify examples of currently known molecular mechanisms implemented by these proteins and regions.

Approaches to examining intrinsic disorder

Under physiological conditions, ordered proteins are known to possess unique three-dimensional (3D) structures, in which each atom of the polypeptide chain has a unique position in space. These structures are determined by a vast number of intrachain non-covalent side chain-side chain, side chain-backbone, and backbone-backbone interactions. Such structure-determining conformational interactions include hydrophobic interactions, hydrogen bonding, Van der Waals forces, and ionic/electrostatic interactions. These interactions have different physico-chemical natures and their strengths depend on the amino acids involved and on the peculiarities of the interactions of a polypeptide chain with solvent. Kinetically, the self-organization of a protein molecule from its unfolded state to a unique 3D structure represents a complex sequential process accompanied by the formation of several partially folded intermediates [55,56,57,58,59,60,61,62]. Furthermore, for many ordered proteins, various partially folded intermediates, such as more extended pre-molten globules or compact molten globules [59, 62,63,64,65,66,67,68,69] can be stabilized by changes in the environment even under the equilibrium conditions. On the other hand, intrinsically disordered proteins (IDPs) or intrinsically disordered protein regions (IDRs) do not have stable 3D-structures under physiologic conditions, existing instead as highly dynamic conformational ensembles, members of which interconvert on a number of timescales. It seems that due to the peculiarities of their amino acid sequences, folding of IDPs/IDRs under physiological conditions is halted at different stages, creating extended-disordered conformations (native coils or native pre-molten globules) or collapsed-disordered conformations (native molten globules) [3, 11, 69].

The situation is further complicated by the fact that not only the entire protein, but also its different parts, can be disordered to different degrees. As a result, IDPs are not homogeneous, but represent a very complex mixture of a broad variety of potentially foldable, partially foldable, differently foldable, or not foldable segments [70,71,72]. More globally, a typical protein represents a highly heterogeneous entity with a mosaic spatiotemporal structural organization containing foldons (independent foldable units of a protein), inducible foldons (disordered regions that can fold, at least in part, due to the interaction with binding partners), inducible morphing foldons (disordered regions that can differently fold upon interaction with different binding partners), non-foldons (non-foldable protein regions), semi-foldons (regions that are always in a semi-folded form), and unfoldons (ordered regions that have to undergo an order-to-disorder transition to become functional) [70, 73,74,75].

Again, this structural heterogeneity is determined by specific features of the amino acid sequences of IDPs/IDRs. In fact, in comparison with ordered proteins and domains, most IDPs/IDRs are systematically depleted in order-promoting residues (Cys, Trp, Tyr, Phe, Ile, Leu, Val, and Asn), while being simultaneously enriched in disorder-promoting residues (Pro, Arg, Gly, Gln, Ser, Glu, Lys, and Ala) [4, 6, 76,77,78,79,80,81,82], and they commonly contain repeats [83,84,85,86]. In other words, the amino acid alphabet of IDPs/IDRs is reduced in comparison with that of ordered proteins and domains, and such disordered proteins/regions are characterized by the reduced informational content of their amino acid sequences [71]. These traits make it possible to predict the location of IDPs/IDRs within an amino acid sequence with good accuracy [79, 87,88,89,90,91]. In a recent experiment, structure/disorder prediction algorithms were applied to a set of 646 proteins whose regions of structure and disorder were unknown beforehand to the researchers who carried out the predictions. The top three predictors exhibited balanced accuracies on this dataset ranging from 76 to 80% [92]. Since various attributes and computational approaches can be utilized for the development of computational tools for predicting intrinsic disorder from protein sequence, it is not surprising that there are multiple computational tools that one can use to (a) evaluate the overall disorder status of a query protein and (b) analyze the peculiarities of distribution of the intrinsic disorder predisposition within its amino acid sequence [93]. Detailed description of these tools is outside the scope of this article. Interested readers can find related information in a number of dedicated reviews (e.g., [94,95,96,97,98,99,100,101,102]).
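The compositional bias described above can be illustrated with a minimal sketch that counts order-promoting and disorder-promoting residues and computes the Shannon entropy of a sequence as a rough proxy for its informational content. This is not one of the published predictors discussed in this section; the function name and the example sequences are hypothetical, and the residue classes are simply those listed in the text (residues in neither list, such as Asp, His, Met, and Thr, are counted in neither class).

```python
import math
from collections import Counter

# Residue classes as listed in the text (one-letter codes).
ORDER_PROMOTING = set("CWYFILVN")
DISORDER_PROMOTING = set("PRGQSEKA")


def composition_profile(sequence):
    """Summarize the compositional bias of a protein sequence.

    Returns the fractions of order- and disorder-promoting residues and
    the Shannon entropy (bits per residue) of the amino acid usage.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    counts = Counter(seq)
    order = sum(c for r, c in counts.items() if r in ORDER_PROMOTING) / n
    disorder = sum(c for r, c in counts.items() if r in DISORDER_PROMOTING) / n
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"order": order, "disorder": disorder, "entropy": entropy}


# Hypothetical toy sequences, for illustration only.
disordered_like = "PPGSSEEKKAPSGEKRQ"
ordered_like = "CWYFILVNLIVFWYCIL"

for name, seq in [("disordered-like", disordered_like),
                  ("ordered-like", ordered_like)]:
    profile = composition_profile(seq)
    print(name, {k: round(v, 2) for k, v in profile.items()})
```

Real predictors combine many more attributes than composition alone, so a sketch like this should be read only as an illustration of the sequence bias, not as a usable disorder predictor.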

Application of these algorithms to various proteomes indicates that IDPs and IDRs are abundantly present in any given proteome, although eukaryotes have much more disorder than prokaryotes [3, 103,104,105,106,107]. In one such study, about 15–30% of the encoded residues in the proteomes of a collection of archaea and eubacteria were predicted to be intrinsically disordered, while, in a collection of eukaryotic proteomes, 30–50% of the encoded residues were predicted to be contained in IDPs plus IDRs [107].
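Proteome-level estimates of this kind reduce to a simple calculation once per-residue disorder scores are available. The following sketch assumes that each protein has already been scored by some predictor on a 0 to 1 scale and that residues scoring at or above 0.5 are called disordered; the cutoff, the function name, and the toy scores are assumptions for illustration and do not reproduce the procedure of the cited study.

```python
def disordered_residue_fraction(proteome_scores, threshold=0.5):
    """Fraction of all residues in a proteome predicted to be disordered.

    proteome_scores maps a protein identifier to its list of per-residue
    disorder scores; threshold is the (assumed) cutoff for calling a
    residue disordered.
    """
    total = 0
    disordered = 0
    for scores in proteome_scores.values():
        total += len(scores)
        disordered += sum(1 for s in scores if s >= threshold)
    if total == 0:
        raise ValueError("no residues to evaluate")
    return disordered / total


# Hypothetical toy "proteome" with made-up scores.
toy_proteome = {
    "protA": [0.1, 0.2, 0.8, 0.9],
    "protB": [0.6, 0.7, 0.3, 0.4, 0.2, 0.1],
}
fraction = disordered_residue_fraction(toy_proteome)
print(f"{fraction:.0%} of residues predicted disordered")
```

Note that the fraction is computed over residues rather than over proteins, which matches the way the percentages in the text are expressed.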

In contrast to the mostly catalytic, transport, and protein interaction domain functions of ordered proteins and domains [78, 108, 109], IDPs/IDRs are typically involved in signaling, regulation, recognition, and control of various cellular pathways [10, 11, 14, 109,110,111]. In addition, by changing its shape, a single disordered protein or region can readily bind, one at a time, to many divergent partners [4, 112,113,114,115,116,117]. Furthermore, sites of various catalytically driven PTMs, such as hydroxylation, acetylation, glycosylation, ubiquitination, SUMOylation, methylation, phosphorylation, etc., and sites of proteolytic attack are often associated with regions of intrinsic disorder [3, 52]. One should also remember that in addition to their structure-based catalytic activities, many enzymes contain functionally important IDRs [118].

Often, disorder-to-order-transition-based recognition is mediated by the specific functional elements known as molecular recognition features (MoRFs) [119,120,121,122], which are implicated in important biological processes, including regulation and signaling [123]. Importantly, such disorder-based binding sites can be predicted. For example, α-MoRF-Pred and α-MoRF-PredII algorithms identify disordered regions that have a propensity to become α-helical upon partner binding [119,120,121]. Another computational tool, MoRFPred, can identify all MoRF types (α, β, irregular, and complex) [123]. Yet another tool, ANCHOR, predicts disordered binding regions (DBRs) located in IDPs or IDRs by identifying segments in a generally disordered region that cannot form enough favorable intrachain interactions, but still have the capability to energetically gain by interacting with a globular partner protein [124, 125]. Some other tools for finding disorder-based binding sites include MFSPSSMpred [126], MoRFCHiBi [127,128,129], Retro-MoRFs [130], MoRFPred-plus [131, 132], OPAL [133], length-specific MoRF predictor OPAL+ [134], ensemble deep convolutional neural network-based MoRF predictor en_DCNNMoRF [134], SPOT-MoRF [136], MoRFMPM [137], and MoRFPred_en [138].

An alternative approach has been to identify DBRs from their sequence patterns or motifs. Examples of this approach include eukaryotic linear motifs (ELMs) [139, 140], linear motifs (LMs) [141], and short linear motifs (SLiMs) [142,143,144]. One advantage of linear motifs compared to MoRFs is that example binding partners are often known for the former but typically not for the latter. DBRs identified by sequence patterns and MoRFs identified by localized features within regions predicted to be disordered are essentially equivalent, differing mainly in how they are identified.
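Pattern-based identification of candidate binding sites can be sketched with a regular expression scan. The example below uses a simplified proline-rich "PxxP"-style pattern purely as an illustration; curated motif definitions are considerably more specific, and a pattern match alone does not establish that a site is functional or located in a disordered region. The function name and the example sequence are hypothetical.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, match) pairs for every occurrence of a motif.

    Positions are 1-based. A lookahead is used so that overlapping
    occurrences are also reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Simplified illustrative pattern: proline, any two residues, proline.
PXXP = "P..P"

# Hypothetical sequence fragment.
fragment = "MSAPPLPPRAA"
for start, match in find_motif(fragment, PXXP):
    print(start, match)
```

In practice such a scan would typically be restricted to regions already predicted to be disordered, which is where the sequence-pattern and MoRF-based views of disordered binding regions converge.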

Protein–protein interactions have been further studied by high throughput methods such as the yeast two-hybrid assay [145]. Such high throughput methods have taken us from function-specific pathways developed using one-by-one research methods to the more comprehensive proteome-wide protein–protein interaction networks. In these networks, most proteins bind to very few or even just one protein partner, while very few proteins, called hubs, bind to many partners [146]. Hub protein deletion is typically much more deleterious than the deletion of non-hubs [147]. These hub proteins were suggested to have special features enabling them to readily form new connections over evolutionary time and, indeed, to form alternative connections with different partners [148], thus leading to the question: what special feature gives hub proteins the ability to bind to multiple protein partners and to readily evolve so as to bind to new partners [149]? IDRs were proposed to be this special feature [10, 117].
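The notion of a hub can be made concrete by counting distinct partners in a list of pairwise interactions. The sketch below is a minimal illustration; the protein names are hypothetical, and the cutoff of five partners is an arbitrary assumption, since the text does not specify how many partners qualify a protein as a hub.

```python
from collections import defaultdict


def partner_counts(interactions):
    """Count distinct binding partners for each protein in an edge list."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this simple sketch
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


def find_hubs(interactions, min_partners=5):
    """Return proteins with at least min_partners distinct partners."""
    counts = partner_counts(interactions)
    return sorted(p for p, k in counts.items() if k >= min_partners)


# Hypothetical toy network.
edges = [
    ("HUB1", "A"), ("HUB1", "B"), ("HUB1", "C"), ("HUB1", "D"), ("HUB1", "E"),
    ("A", "B"), ("F", "G"),
]
print(partner_counts(edges))
print(find_hubs(edges))
```

In this toy network most proteins have one or two partners and a single protein has many, mirroring the degree distribution described for proteome-wide interaction networks.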

Hub proteins have been shown to employ IDRs for multiple partner binding using two completely different mechanisms [10]: namely, (1) one DBR in one IDR associates individually over time with many different structured partners (one-to-many binding); and (2) many different DBRs in one or more IDRs associate individually with one structured partner (many-to-one binding). The p53 transcription factor is an exemplar of the former [117], while the 14-3-3 protein interaction domain and the Src Homology 2 (SH2) interaction domain are examples of the latter [117]. Collections of interaction pairs of both types have been studied, revealing the general importance of IDR conformational flexibility for enabling one IDR to bind individually to multiple partners (one-to-many binding) [114] or for enabling many different IDRs to bind individually to a single partner (many-to-one binding) [112].

There are also computational tools for predicting disorder-based sites responsible for interaction with RNA and DNA (e.g., DisoRDPbind [150,151,152]) and regions associated with multiple PTM sites [52]. Advantages and disadvantages of many of these tools were systematically analyzed in several recent studies [101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153], and another comprehensive review shed some light on “a new page in protein science, where molten keys operate on melted locks and where conformational flexibility and intrinsic disorder, structural plasticity and extreme malleability, multifunctionality and binding promiscuity represent a new-fangled reality” [154].

Associated with a multitude of computational tools for finding intrinsic disorder in proteins and predicting various aspects of disorder-based functionality is a huge arsenal of experimental approaches that allow focused investigations of the structures and conformational dynamics of IDPs/IDRs (reviewed in [98, 155,156,157,158,159]) and for the analysis of their functions [154]. These tools are too numerous to be even briefly considered here. This is not surprising, since a protein molecule is a complex entity with multi-levelled structural organization, and since multiple experimental approaches are elaborated for the analysis of protein structure (and lack thereof) in general and for specifically examining the different levels of protein structural hierarchy.

IDPs/IDRs pervade signaling pathways in all kingdoms of life

Cell signaling requires transient yet highly specific protein interactions, signal sensitivity, signal integration and amplification, and mechanisms to activate/inactivate the entire process in response to changes in the chemical or physical environment. Intrinsic disorder provides the functional diversity, interaction specificity, and regulatory mechanisms that cell signaling processes require. Not every protein in every cell signaling cascade includes intrinsic disorder, and disorder is more prevalent in some cell signaling pathways than others [160]. Nevertheless, intrinsically disordered proteins are present in diverse cell signaling cascades in all kingdoms of life. Increased complexity in eukaryotes creates an increased need for cell signaling and regulation [120].

Aside from the well-studied mammalian cell signaling pathways, disorder is also present in signaling pathways in bacteria [161], algae (see CP12 discussion under redox signaling, below) [26], fungi [34], and plants (see UVR8 discussion under light signaling, below) [162,163,164]. In bacteria, changes in environment are often detected through protein activity sensing, in which sensing is mediated by post-translational modification of intrinsically disordered regions or unfolding of signaling proteins [165]. A variety of proteins can serve as activity sensors, including enzymes and membrane channel proteins. For example, aconitase serves as an enzyme in the Krebs/citric acid cycle [165]. However, in a variety of bacterial species aconitase can also undergo an environmentally triggered conformational change that switches its activity from energy generation to post-transcriptional regulation of metabolism and motility. When oxidation or iron depletion destroys the iron-sulfur clusters in aconitase, this enzyme partially unfolds and binds to specific mRNA sequences. The nature of the mRNA determines whether aconitase binding increases or decreases mRNA stability, and thus increases or decreases translation of the respective proteins. The changes in concentration of the target proteins, for example, FlgR and urease, regulate metabolism and bacterial motility.

Although yeast contain many disordered proteins involved in signaling, a particularly interesting example is the hub protein Killer Nine Resistant 4 (Knr4), which links cell wall synthesis and cell wall integrity with morphogenesis and cell cycle progression [34]. Both the cell wall integrity pathway and the calcineurin pathway are needed to regulate cell wall synthesis and maintenance in response to stress. Knr4 binds the Slt2 MAP kinase in the cell wall synthesis pathway and can repress all of the chitin synthase genes. Knr4 also binds calcineurin in the calcium-calcineurin pathway, and loss of the knr4 gene makes cells hypersensitive to calcium. Knr4-calcineurin participate in multiple cell cycle checkpoints, coupling cell division, bud growth, and daughter cell size. While Knr4 phosphorylation is required for binding to at least some of its protein partners, including Slt2 MAP kinase, phosphorylation also appears to facilitate Knr4 degradation. The network of protein interactions formed by Knr4 is conserved among fungi.

IDPs/IDRs pervade pathways that respond to a wide variety of signals

IDPs/IDRs are found in pathways initiated by a variety of molecular signals, ranging in size from single-atom ions, through small molecules such as steroid hormones, to biomacromolecules like nucleic acids and proteins [166, 167]. The examples of intrinsically disordered proteins described below highlight many of the mechanisms by which IDPs/IDRs fulfill the needs of cell signaling pathways.

Ions Multiple proteins in the calcium signaling pathway are intrinsically disordered. First, calcium channels permit the passive transport of Ca2+ into a cell by voltage-gated and/or ligand-gated mechanisms. In the spine, the N-methyl-D-aspartate (NMDA) receptor is a tetrameric Ca2+ ion channel which induces different cellular responses (long-term potentiation or long-term depression) based on the intracellular concentration of Ca2+ and the frequency of stimulation by which it is activated [168]. NMDA receptor activation requires membrane depolarization, which prevents Mg2+ from blocking NMDA receptor activity [169], and binding by both glutamate and either glycine or serine. Thus, the NMDA receptor is sensitive to both voltage and ligands. Upon entry into a nerve cell, Ca2+ binds calmodulin, and either increases synapse response (long-term potentiation) or decreases synapse response (long-term depression). Protein complexes formed by the intrinsically disordered intracellular tail of the NMDA receptor modulate the cellular response to NMDA activity. High concentrations of Ca2+ activate calmodulin-dependent kinases, such as calmodulin-dependent kinase II, and thus long-term potentiation, whereas low concentrations of calcium activate the phosphatase calcineurin, and thus stimulate long-term depression [169,170,171,172,173].

The intrinsically disordered long C-terminal tail of the NMDA receptor also regulates calcium signaling by altering the properties of the channel and Ca2+ trafficking through the channel [169]. This tail also serves as a scaffold to assemble the downstream signaling proteins, including calmodulin, kinases, and calcineurin. The close proximity of these factors boosts signaling throughput, and the length of the tail defines the search radius for interacting proteins. This reach is varied by alternative splicing [170] and calpain digestion [169], while PTMs regulate nearly every aspect of protein function, including stability, trafficking, recycling, protein interactions, and calpain digestion [169, 171]. The presence of intrinsic disorder in proteins enables regulation by a combination of protein interactions, alternative splicing, and PTMs, which in turn allows multiple signals to fine-tune cell protein function and regulate signal strength, which in turn determines the cellular response [172]. This protein appears to be a candidate for using the IDP-AS-PTM Toolkit for regulating its signaling complexity as described earlier for three other proteins [39].

Calcineurin provides a second example of the role of intrinsic disorder in ion signaling, which is described in detail by Trevor Creamer in this collection [173]. Together, calmodulin and calcineurin act as intracellular Ca2+ sensors and responders [173,174,175,176,177,178,179,180]. Both proteins bind calcium, and then each other, to form an active phosphatase. Calcineurin activity is also linked to its structure, which can occupy three unique states [173]. In the inactive state, a regulatory domain within calcineurin is protected from proteases, and thus presumably folded. In the presence of Ca2+ but the absence of calmodulin, the regulatory domain is unfolded and solvent-exposed. Calmodulin binding to the regulatory region induces folding to a helix and release of an auto-inhibitory domain from the active site of calcineurin. Such coupling of protein (or ligand) binding with protein folding is another common feature of intrinsically disordered proteins [12, 174, 175].

Hormones Coupling ligand binding to the folding of an intrinsically disordered region expands the range of binding free energies at which allosteric regulation of protein function can occur [176]. One prominent example of this regulatory mechanism is the glucocorticoid receptor, a representative member of the steroid hormone receptor family [176, 177]. The glucocorticoid receptor (GR) consists of a C-terminal ligand binding domain, a central DNA binding domain, and an intrinsically disordered N-terminal domain which is required for the protein to activate transcription (Fig. 2). The ligand binding domain also contains an activation domain, termed AF2. The N-terminal disordered domain is further subdivided into the R and AF1 regions. These regions have different functions, despite the fact that they are disordered, contiguous in the protein sequence, and thermodynamically coupled [176]. AF1 is a transcription activation domain, whereas R is an allosteric repressor of the AF1 domain (reviewed in [178]). Interestingly, the activity of the AF2 transcription activation domain is dependent on ligand binding, whereas AF1 can activate transcription in truncation mutants in which the ligand binding domain is removed, suggesting that the unbound ligand binding domain also inhibits AF1 function in the full-length protein [178].

Fig. 2

Intrinsic disorder predisposition of human glucocorticoid receptor (UniProt ID: P04150) evaluated by PONDR® VSL2 [179], PONDR® VL3 [180], PONDR® VLXT [6], PONDR® FIT [181], IUPred2A_long and IUPred2A_short [181, 182]. Mean disorder score is shown as well. Positions of the N-terminal domain (residues 1–420), DNA binding domain (residues 420–485), a hinge region (residues 486–527), and ligand binding domain (residues 528–777) are shown as red, cyan, blue and lime shaded areas. Clearly, the N-terminal domain and a hinge region are mostly disordered
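A per-residue mean disorder profile and per-domain averages of the kind summarized in this figure can be computed as in the following sketch. The domain boundaries are those given in the caption (including the shared residue 420), but the predictor names and all scores below are synthetic placeholders chosen for illustration; they are not output from PONDR or IUPred, and the function names are hypothetical.

```python
# Domain boundaries (1-based, inclusive) as given in the caption.
GR_DOMAINS = {
    "N-terminal domain": (1, 420),
    "DNA binding domain": (420, 485),
    "hinge region": (486, 527),
    "ligand binding domain": (528, 777),
}


def mean_profile(predictor_scores):
    """Average per-residue scores across several predictors."""
    profiles = list(predictor_scores.values())
    if not profiles:
        raise ValueError("at least one predictor profile is required")
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all profiles must have the same length")
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def domain_means(profile, domains):
    """Mean disorder score within each domain."""
    means = {}
    for name, (start, end) in domains.items():
        segment = profile[start - 1:end]
        means[name] = sum(segment) / len(segment)
    return means


# Synthetic placeholder scores for a 777-residue protein (not real data).
synthetic_scores = {
    "predictor_a": [0.8] * 420 + [0.2] * 357,
    "predictor_b": [0.6] * 420 + [0.4] * 357,
}
profile = mean_profile(synthetic_scores)
for name, value in domain_means(profile, GR_DOMAINS).items():
    print(f"{name}: {value:.2f}")
```

To reproduce the analysis for a real protein, the placeholder lists would be replaced with the per-residue outputs of the chosen predictors.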

The presence of intrinsically disordered regions in a protein allows the allosteric response to be tuned [176]. Prior to activation, the glucocorticoid receptor (GR) resides in the cytosol. In this unliganded state, the N-terminus of GR is intrinsically disordered, and interactions with multiple chaperones in the cytosol help GR remain intact and primed for ligand binding [178]. Steroid hormones are able to pass through the membrane to bind their cytosolic receptors, inducing profound structural changes in GR, including folding of the N-terminal domain and release of interactions with cytosolic proteins. GR then translocates to the nucleus where it forms large complexes with co-activator proteins, its target DNA binding sites, and the general transcription apparatus [178] (Fig. 3).

Fig. 3

The function of the glucocorticoid receptor is regulated in part by its intrinsically disordered N-terminal tail. A The GR tail interacts with chaperones in the cytoplasm in the unliganded state. B Alternative splicing and post-translational modifications impact these interactions

The disorder-to-order transition in the N-terminal domain is regulated by ligand binding and by many other factors, including interactions with other proteins, DNA binding, and phosphorylation of the AF1 domain [178, 183]. The multiplicity of activating regulatory mechanisms facilitates AF1’s engagement with co-regulatory proteins and stabilizes the final complex. Conversely, disruption of any of these interactions promotes dissolution of the complex, allowing GR to dynamically regulate multiple target DNAs.

Intrinsic disorder in GR not only enables multiple allosteric regulatory interactions to impact function, but also allows deployment of different surfaces of the protein to enable binding to many different sets of macromolecules, and regulation of these interactions via mRNA splicing and phosphorylation [178]. In addition, combinations of alternative translation initiation and alternative mRNA splicing result in the production of multiple glucocorticoid receptor isoforms from one gene [184]. These isoforms are able to regulate different genes [177]. Furthermore, the use of alternative translation start sites truncates the repressive R region in the disordered N-terminal domain. In our view, alternative splicing and alternative start sites similarly enable one gene to yield multiple transcripts and so are equivalent in this regard [38]. Many additional splicing isoforms and PTM variants have been recently discovered and an associated database has been constructed [185]. The various isoforms exhibit distinctive tissue distribution patterns and altered transcriptional regulatory profiles. Phosphorylation and the binding of additional proteins are discussed above as significant regulators of GR, but the receptor is also regulated by other types of PTMs [186], including ubiquitination and sumoylation. These variations likely contribute to the complexity of glucocorticoid signaling and help to determine cell-specific responses to glucocorticoids [184, 186]. Thus, GR is a candidate user of the IDP-AS-PTM toolkit for complex context-dependent (e.g., tissue- or cell-specific) regulation, as discussed previously and above for GPCRs, NFATs, and Src family kinases [39].

Lipids In addition to forming membranes and storing energy, lipids can also activate or regulate cell signaling. Lipid-activated cell signaling also relies on IDPs/IDRs. For example, the Phosphatase and Tensin homolog (PTEN) protein inhibits signaling via the PI3K/AKT/mTOR proliferative pathway, in which phospholipids act as second messengers. PI3 Kinase bound to an activated receptor phosphorylates PI(4,5)P2 to create PI(3,4,5)P3, which in turn activates cell proliferation and survival via AKT and promotes cytoskeletal changes via Rac/Rho/cdc42. PTEN dephosphorylates PI(3,4,5)P3 to inhibit signaling and thus proliferation. Consequently, PTEN is not only a powerful tumor suppressor, but also a prognostic marker that predicts response in many human cancers [187]. Inactive PTEN exists in a cytoplasmic pool and lacks substrate access [188]. Recruitment and the extent of activation of membrane-associated PTEN depend on the composition of the membrane, and thus on its surface characteristics, such as charge (reviewed in [187]). Interestingly, surface charge can also be modified by signaling via phospholipase C [188].

The PTEN protein contains an N-terminal PI(4,5)P2 binding site and a structured catalytic domain, followed by an intrinsically disordered auto-inhibitory C-terminal tail which culminates in a PDZ binding domain [188, 189]. The tail contains two groups of serine/threonine residues which can be phosphorylated by kinases such as CK2 and GSK3. Phosphorylation of most of these amino acids forces PTEN into a closed, more stable, inactive conformation, in which membrane association, PDZ binding, ubiquitination, and degradation are all suppressed [188, 190]. Phosphorylation of T366 appears to counter the impact of phosphorylation of the other residues [191]. The disordered tail is also modified via ubiquitination as part of protein degradation, and by acetylation, sumoylation, and S-nitrosylation [192]. Interactions with other proteins via the PDZ binding domain or other regions of the protein both enhance PTEN stability and diversify its function [188, 192].

Multiple alternative translation start sites and splice sites in the PTEN gene create many versions of the protein, most notably PTEN-L, which contains an extra 173 amino acids on the N-terminus [187, 192, 193]. This region is also primarily disordered and post-translationally modified [192]. Signal peptides within this region facilitate passage into and out of cells and organelles [192,193,194]. Thus, it is not surprising that PTEN and PTEN-L exhibit different substrate specificities and mechanisms of membrane binding [195]. Again, this protein appears to take advantage of the previously described IDP-AS-PTM toolkit [39].

Proteins Signaling pathways activated by proteins are often regulated by IDPs/IDRs at multiple steps of the pathway. In canonical cell signaling, an extracellular ligand is recognized by a membrane protein which transmits the signal, typically via phosphorylation, through a series of cytoplasmic/nuclear proteins, culminating in the regulation of a transcription factor which alters transcription of specific genes. The use of proteins as the signal provides extra regulatory opportunities through modulating signal production (transcription and translation), signal transport and availability via binding to extracellular matrix proteins, and signal activity via PTMs.

There are many categories of proteins that serve as cell signals. For example, cytokines are small secreted immunomodulatory protein signals. Osteopontin is a multifunctional cytokine with key roles in inflammation, cell viability, and tissue repair, which also functions as a bone matrix protein that mediates osteoclast adhesion [196, 197]. Through these functions, osteopontin is also involved in cardiovascular diseases, cancer, diabetes, and formation of kidney stones [196, 197]. Unliganded osteopontin interconverts between extended, random coil-like conformations and a collapsed, cooperatively folded state capable of generating sigmoidal structural denaturation curves [198]. These results suggest that interactions of other proteins with osteopontin generally occur via conformational selection [198].

While cytokines regulate the immune system, growth factors are protein signals that primarily target other types of cells. Vascular Endothelial Growth Factor (VEGF) regulates angiogenesis, and thus plays significant roles in animal development, wound healing, and carcinogenesis. The vegf mRNA is alternatively spliced to produce a family of protein isoforms with varying affinities for different VEGF receptors [199]. Active VEGF-A forms a structured dimer with disordered N-terminal and C-terminal tails. Many splice variants of vegf-a alter the lengths of these tails (e.g., VEGF165 versus VEGF143). The VEGF “B” splice variants are generated by use of an alternate 3’ acceptor site for exon 8, lengthening the intrinsically disordered C-terminal tail by an additional 6 amino acids (e.g., VEGF165 versus VEGF165b) [36, 200]. This addition reverses the function of VEGF: while VEGF165 is a potent stimulator of angiogenesis, VEGF165b binds VEGF receptors but fails to activate them in a robust, sustained manner. By ineffectively occupying a binding site on the receptor, the VEGFXXXb isoforms inhibit the function of the VEGFXXX variants [36, 200]. Inclusion of these extra amino acids also alters the ability of the protein to stimulate proliferation and invasion of non-small cell lung carcinoma cells. Indeed, the ratio of VEGF165b/VEGF165 correlates with lymph node metastases [201].

Multiple chemical stimuli Some signaling pathways are capable of responding to a variety of chemical stimuli. For example, the GPCR-G protein signaling system is a complex machine that recognizes a wide variety of extracellular signals and controls various cellular responses to these signals by triggering numerous intracellular signaling cascades. The complexity of this machinery stems from the multitude of members of the GPCR family (in humans, there are more than 850 different GPCRs [202,203,204,205]), which can be recognized and activated by more than one thousand natural and artificial extracellular ligands, ranging from photons to amines, lipids, nucleotides, organic odorants, peptides, and proteins [30, 204]. At the next step, a cytoplasmic domain of an activated GPCR interacts with one of the intracellularly located guanine nucleotide-binding proteins (G proteins), which are heterotrimers composed of α, β, and γ subunits that can control different cellular pathways [206,207,208,209,210]. In humans, there are 23 Gα, 6 Gβ, and 12 Gγ subunits that can be assembled into numerous different heterotrimers [211]. Furthermore, cells contain ~40 regulator of G protein signaling (RGS) proteins (which are G protein effectors, modulators, and scaffold proteins) that are capable of interacting with various Gαβγ heterotrimers or their dissociated subunits [212]. Together, these numbers indicate that the combinatorial space of the GPCR-G protein system is enormous, which is one basis for the multitude of corresponding signaling pathways. Recently, based on a comprehensive bioinformatics analysis of human GPCRs and G proteins supported, at least in part, by experimental evidence, it was concluded that intrinsic disorder and associated structural plasticity are crucial for this signaling system [213].
In fact, human GPCRs and G proteins represent dynamic conformational ensembles containing multiple IDRs and numerous PTMs and molecular recognition features (MoRFs), and the entire mode of action of these proteins is based on the recognition of a signal followed by the conformational change needed for recognition of another partner that is crucial for the downstream transmission of the signal [213]. Therefore, the multifunctionality of GPCRs and G proteins, which is required for recognizing a wide variety of extracellular signals and for transmitting this extracellular information to trigger a multitude of intracellular pathways, is determined by the presence of intrinsic disorder. In other words, this intrinsic disorder-based multifunctionality of the GPCR-G protein signaling system represents an important illustration of the structure–function continuum concept applied to cellular signaling [213]. Finally, as mentioned above, the GPCR molecule is another candidate for taking advantage of the IDP-AS-PTM toolkit [39].
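
The scale of the combinatorics mentioned above can be made concrete with a short calculation. The sketch below uses only the subunit and receptor counts quoted in the text and assumes, purely for illustration, that every Gα, Gβ, and Gγ subunit can combine with every other and that any receptor can pair with any heterotrimer. The text does not claim this, so the results are upper bounds rather than counts of complexes that actually form. The function names are hypothetical.

```python
from math import prod

# Human G protein subunit counts as quoted in the text [211].
SUBUNIT_COUNTS = {"G_alpha": 23, "G_beta": 6, "G_gamma": 12}


def max_heterotrimers(counts):
    """Upper bound on distinct alpha/beta/gamma trimers.

    Assumes free assortment of subunits (an illustrative assumption,
    not a statement about which trimers assemble in cells).
    """
    return prod(counts.values())


def max_receptor_trimer_pairings(n_receptors, counts):
    """Upper bound on receptor-heterotrimer pairings under the same assumption."""
    return n_receptors * max_heterotrimers(counts)


if __name__ == "__main__":
    print(max_heterotrimers(SUBUNIT_COUNTS))
    # "More than 850" GPCRs in the text; 850 is used here as a lower limit.
    print(max_receptor_trimer_pairings(850, SUBUNIT_COUNTS))
```

Even before RGS proteins, PTMs, and splice variants are considered, this naive bound already exceeds a thousand heterotrimers and a million receptor-heterotrimer pairings.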

Environmental conditions as signals Unlike many structured protein domains, the function of IDPs/IDRs can persist in extreme environmental conditions. This trait allows IDPs to reliably sense extreme conditions and instigate responsive signaling pathways [26]. The prominent roles that IDPs/IDRs play in responding to light, mechanical forces, pH, redox potential, and drought/salt concentration are discussed below.

Light Plants must sense and adapt to light in order to optimize energy production, to limit photodamage, and to set/maintain their circadian clock. The UVR8 photoreceptor in plants is crucial for generating photomorphogenic and protective responses to UV light [162]. The UVR8 protein includes intrinsically disordered N- and C-terminal tails which regulate protein activity [162]. UVR8 is partially inactivated by dimerization, producing an equilibrium between an inactive compact dimer and an active extended monomer. Photoexcitation of UVR8 triggers dimer dissociation and enables the extended C-terminal tail to bind COP1, which regulates light signaling in plants, and propagates the signal [162]. Conversely, active monomers also bind RUP proteins, which inhibit UVR8 signaling.

To avoid harmful light exposure, free-swimming bacteria use blue light to modulate both the length and directionality of their runs [214]. To this end, several specific proteins are used as blue-light photoreceptors. An illustrative example of the action of such photoreceptors is given by photoactive yellow protein (PYP) from the motile, alkalophilic, and halophilic bacterium Ectothiorhodospira halophila. This water-soluble ~14 kDa protein contains a thioester-linked p-coumaric acid cofactor and acts as the photosensor [215,216,217,218]. Upon light excitation, trans/cis isomerization of a double bond in the chromophore triggers a cycle of structural events yielding a long-lived, blue-shifted intermediate (known as pB) with a lifetime on the order of 1 s [216, 219]. High-resolution solution NMR spectroscopy demonstrated that this long-lived pB intermediate is characterized by a noticeable level of disorder and exists as an ensemble of multiple conformers interconverting on a millisecond time scale [220]. Although these light-induced structural perturbations affect almost the entire molecule, the ordered structure of PYP is restored once pB converts back to its ground state (pG). This cycle of light-induced unfolding and dark-promoted refolding has been proposed to regulate protein function, with the disordered pB state being able to bind partner molecules, allowing the swimming bacterium to operate the directional switch that protects it from harmful light exposure [220].

Redox potential The conditionally disordered chloroplast protein of 12 kDa (CP12) is found in the chloroplasts of photosynthetic organisms such as plants, cyanobacteria, algae, and cyanophages. CP12 regulates the Calvin-Benson-Bassham cycle, which is a series of redox reactions that converts carbon dioxide into glucose [26]. The extent of disorder, and thus the activity, of CP12 is determined by redox conditions, although CP12 remains highly mobile in both the oxidized and reduced states. In dark or oxidizing conditions, CP12 forms limited, marginally stable structure and two disulfide bonds, which are required to bind and inactivate two enzymes that participate in the Calvin-Benson-Bassham cycle (glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and phosphoribulokinase (PRK)). In light/reducing conditions, the disulfide bonds break and the CP12-GAPDH-PRK ternary complex dissociates, re-activating the enzymes and thus carbon fixation.

Mechanical forces Many cellular processes that are regulated by chemical stimuli, such as proliferation, differentiation, motility, and survival, are also influenced by the mechanical properties of the substrate supporting the cells [221]. Mechanosensing/mechanotransduction induces cellular responses to compression, tensile stress, shear stress, and hydrostatic pressure. Alterations in tissue stiffness are associated with many diseases, including cardiovascular disease, muscular dystrophy, and cancer [222]. Mechanical stress is transmitted between cells via cell–cell adhesion adherens junctions composed of the ABE complex (alpha-catenin, beta-catenin, and epithelial cadherin cytoplasmic domain) [223]. The ABE complex is flexible and pliable, and thus adopts a wide variety of structures [223]. This structural versatility arises from protein-domain motions in α and β catenin, and is thought to provide reversibility and sensitivity to stress sensing [223].

In a second example, the mouse protein CasSD includes an intracellular, proline-rich disordered domain. In the absence of mechanical stretching forces, this region forms polyproline II (PPII) helices hypothesized to bind LIM domain proteins, thus protecting CasSD from phosphorylation. Application of mechanical stretch has been proposed to unfold the PPII conformation, precluding LIM protein binding and thus enabling CasSD phosphorylation and signal propagation [224]. Mechanical stretching similarly unfolds talin and other mechanosensitive proteins, thus exposing cryptic binding sites [225, 226].

pH The low pH of the mammalian stomach is one of the natural barriers to protect the organism from microbial infection. Enteric bacteria, such as E. coli, have adapted to sense, respond to, and survive in acidic environments. Sudden immersion in acid requires an immediate adaptive response. Thus, the extended signaling pathways used in non-life-threatening situations in other organisms are reduced to single protein sense-and-respond units to detect acid in bacteria. For instance, the CadC protein is a pH-responsive one-component signaling system composed of an N-terminal winged helix-turn-helix DNA binding domain, followed by a 50 amino acid intrinsically disordered region linking to a single transmembrane helix and ending in a C-terminal pH sensing domain which extends into the periplasm [161]. Intriguingly, the disordered linker is required to transduce the pH-dependent status of the periplasmic sensing domain to the DNA binding domain via dimerization. Likewise, the E. coli periplasmic protein HdeA behaves as an acid-inducible chaperone [227]. At neutral pH, HdeA is folded and inactive. Exposure to acidic conditions unfolds HdeA, allowing it to bind its substrate proteins.

Hydration Dehydrins are a family of intrinsically disordered proteins that act as effectors whose expression is induced by signaling pathways that sense abiotic stresses, such as cold or drought. Dehydrins protect plants from dehydration and from cold [228]. Although the impact of cold in plants is amplified by dehydration, dehydrins can protect both protein activity and membrane structure [228]. Hydrophobic amino acids are necessary for these activities [229]. Dehydrin proteins are categorized by the presence of three conserved motifs, the K-, Y-, and S-segments, which are enriched in lysine, tyrosine, and serine, respectively. Not all proteins contain all three motifs, although more than one copy of any motif may be present. Specific dehydrins may also contain additional motifs that impart additional functions (e.g., a poly-histidine region confers metal binding and self-dimerization) [228]. In vitro, interactions with metal ions, other proteins, and sodium dodecyl sulfate induce the formation of limited structure for some dehydrins [228, 229], which can be transiently stable in vivo [230].

Heat The dehydrin protein family discussed above may also protect from heat. The Arabidopsis thaliana dehydrin Early Response to Dehydration 14 (ERD14) can increase the viability of E. coli subjected to 15 min at 50 °C [230]. This protection relies on K- and H-segments, which form stable helices upon binding to partner proteins in vivo. Multivalent binding of one ERD14 to different surfaces of a single partner protein holds different regions of the same protein in close proximity. Conversely, multiple ERD14 proteins that each bind multiple partners both sequester exposed hydrophobic residues and prevent partner proteins from approaching one another, which would otherwise lead to aggregation.

IDRs/IDPs are found in every category of cell signaling pathways

Based on the source of the signal and the relative location of the responding cell, cell signaling is divided into autocrine, juxtacrine, intracrine, paracrine, and endocrine pathways. Despite the large variety of signal transmission mechanisms used, IDRs/IDPs act as crucial components in each of these different categories.

In intracrine signaling, a cell regulates itself by producing hormones that bind intracellular receptors. The cell thus stimulates itself, because the signal, and hence the signaling cascade, never crosses the cell membrane. The nature of intracrines can vary: structurally diverse hormones (e.g., steroid hormones, which are mostly synthesized from cholesterol), growth factors, DNA-binding proteins, and enzymes can all have intracrine activity [213]. Furthermore, several protein/peptide hormones might act as intracrines as well, with the most notable example (in light of the ongoing COVID-19 pandemic) being peptides of the renin–angiotensin system, such as angiotensin II and angiotensin, conversion between which is controlled by the angiotensin-converting enzyme 2 (ACE2), which also serves as the main entry point into cells for some coronaviruses including SARS-CoV-2 [231]. In intracrine signaling pathways, the corresponding intracrines are recognized by and act through specific intracellular receptors, which are often nuclear or cytoplasmic. In the case of steroid hormones, the most studied intracellular receptors are the members of the nuclear receptor subfamily 3 (NR3), which include receptors for estrogen (group NR3A) [232] and receptors for 3-ketosteroids (group NR3C) [233]. These receptors first form a complex with the hormone and then activate transcriptional processes and/or signaling events that result in the control of gene expression. There are two estrogen receptors (ERs) in humans, ERα (595 residues) and ERβ (530 residues), which can exert their regulatory functions via genomic [234] and non-genomic estrogen-mediated signaling events [235]. In direct genomic signaling, ERα and ERβ act as ligand-activated transcription factors.
Here, hormone binding triggers conformational changes and dimerization of the receptor, leading to the translocation of the complex to the nucleus, where it binds to the chromatin at specific DNA sequences known as estrogen response elements (EREs), which are present in many gene promoters [236]. However, ~35% of genes targeted by estrogen lack ERE-like sequences [237, 238], and the corresponding genes are expressed via “indirect genomic signaling” or “transcriptional cross-talk”, where the ER complexes operate via interactions with other transcription factors [239]. Furthermore, ERs can be activated in the absence of estrogen either by phosphorylation at specific residues [240] or by interaction with co-regulators (co-activators and co-repressors, which enhance or decrease the transcriptional activity of ERs, respectively [241, 242]) that are themselves regulated by various PTMs [243]. Therefore, ERs act as multifunctional proteins capable of (a) interaction with small-molecule ligands; (b) undergoing conformational changes triggered by hormone binding; (c) oligomerization; (d) translocation to the nucleus; (e) interaction with DNA; (f) interaction with other transcription factors; (g) interaction with various co-activators; and (h) undergoing various PTMs. Collectively, these abilities strongly suggest that the activity of ERs relies on intrinsic disorder [17]. In line with this hypothesis, several studies show that ERα contains an intrinsically disordered transactivation domain (AF1) in its N terminus [244], the activity of which is regulated by phosphorylation and associated phosphorylation-coupled proline isomerization [245, 246].

Autocrine signaling occurs when a cell is stimulated by a signal that was produced and secreted by that same cell. Autocrine brain-derived neurotrophic factor (BDNF) has been implicated in the structural and functional plasticity of dendritic spines [247]. BDNF also regulates neuronal plasticity, including structural long-term potentiation, an aspect of learning. The BDNF proprotein must be cleaved to form an intrinsically disordered N-terminal prodomain and a structured C-terminal protein which dimerizes to form mature BDNF (mBDNF), both of which are secreted and have activity [248]. The autocrine mBDNF promotes neuronal survival, growth, and development. The cleaved prodomain, proBDNF, binds other proteins to promote cell death, dendritic remodeling, and long-term depression. Interestingly, a mutation in the prodomain, V66M, is present in 20% of humans; it increases the occurrence of anxiety disorders, depression, and memory deficits, and impairs recovery after traumatic brain injury (reviewed in [248]). The wild type and mutant prodomain exhibit few differences in structure or protein interactions. However, binding by Zn2+, which is present outside neurons, creates distinct conformations and dramatically alters prodomain oligomerization.

Juxtacrine signaling requires contact between cells, allowing a ligand on one cell surface to bind a receptor on an adjacent cell. In metazoans, Notch signaling plays key roles in early stages of embryonic development, as well as development of the cardiovascular system, the central nervous system, and the respiratory system, among others, and has well-defined roles in cancer progression [249, 250]. Notch and its ligands are all transmembrane proteins, in which the extracellular domains mediate the inter-protein interactions that activate the pathway. Upon ligand binding, the Notch intracellular domain, which includes a 111-amino acid intrinsically disordered region, binds the transcription factor CSL (an acronym of human and mouse CBF1/RBPJ-κ, Drosophila Suppressor of Hairless, and C. elegans Lag-1), the coactivator Mastermind, and Notch target DNA [251]. Within the Notch intracellular domain, both the N-terminus of the disordered region and the ankyrin repeat region bind distant sites on CSL, with the remainder of the disordered region linking the two interaction regions. The pattern of positive and negative charges within this linker region is thought to mediate additional interactions to stabilize the complex [252].

A specialized type of juxtacrine signaling is enabled by gap junctions. Gap junctions allow ions and small metabolites to exchange between adjacent cells, and are formed by two connexin proteins, one provided by each cell [253]. The intrinsically disordered C-terminal domain of connexins (148 amino acids) determines whether the channel is open or closed. Phosphorylation in this region regulates protein interaction, channel assembly, channel degradation, and metabolic and electrical coupling, and thus controls trafficking through the channel [253]. Different connexin proteins are expressed in different tissues, and respond differently to phosphorylation. For instance, Connexin 32 is expressed in the liver and brain, whereas Connexin 43 is produced in cardiac muscle [253, 254]. Phosphorylation of the C-terminal domain of Connexin 43 inhibits channel function, whereas phosphorylation of Connexin 32 stimulates channel function [254].

Paracrine signaling involves the release of diffusible chemical signals to communicate between nearby cells; cell–cell contact is not required. One prominent example is neurotransmission. Glutamate is the primary neurotransmitter for excitatory stimulation. Signaling via glutamate is a critical component of long-term potentiation and long-term depression, which alter the strength of downstream signaling in response to glutamate binding to its receptor. These adaptations are crucial events in learning and the formation of memory [169]. Glutamate binds both ionotropic glutamate receptors and metabotropic glutamate receptors. Both have long cytoplasmic C-terminal intrinsically disordered domains (CTDs) [169, 255]. The CTDs of both receptors are alternatively spliced and post-translationally modified (phosphorylation and SUMOylation for metabotropic receptors, phosphorylation and palmitoylation for ionotropic receptors) [169, 255]. These modifications diversify the intracellular sites available for protein interactions, allowing different complexes to be formed and altering the transmitted signal [169, 255].

In endocrine signaling, endocrine cells produce signals that target distant cells in the body. Most of the intrinsically disordered proteins with well-studied roles in signaling operate in endocrine signaling pathways. The Wnt signaling pathway exemplifies how intrinsic disorder can play multiple roles in a single pathway (Fig. 4). A mouse oncoprotein signal (Int-1) and a Drosophila body-plan-controlling developmental protein (Wingless) were identified as homologues, leading to the portmanteau Wnt as the family name for these proteins [256]. The Wnt family signaling proteins are both glycosylated and palmitoylated. They are universal across multicellular members of the animal kingdom but absent in single-celled organisms, with a few paralogues in sponges and 19 paralogues in humans [257, 258, and The Wnt Homepage (stanford.edu)]. Humans also have 10 members of the Frizzled (Fz) protein family, which serve as Wnt receptors [258]. While some biological processes may integrate signals propagated by multiple different Wnt-Fz complexes, some Wnt-Fz complexes drive opposing biological responses, indicating that Wnt-Fz interactions must have the capacity for selectivity. This selectivity can be achieved by formation of larger Wnt-receptor complexes, in which Fz interaction is mediated by conserved Wnt residues, and divergent, intrinsically disordered regions of the same Wnt bind additional receptors, such as Reck [258].

Prior to Wnt signaling, the transcription factor β-catenin is maintained at low levels by the β-catenin destruction complex, which is an assembly of the 852-residue, mostly disordered scaffold protein Axin, the 2,843-residue, massively disordered protein adenomatous polyposis coli (APC), and the three mostly structured proteins β-catenin, casein kinase Iα (CKI-α), and glycogen synthase kinase 3β (GSK3β). The two kinases phosphorylate residues within a long IDR at β-catenin’s amino terminus [259, 260]. Indeed, most protein phosphorylation occurs in IDRs [41]. These phosphorylation modifications signal β-catenin’s subsequent ubiquitination, which also occurs primarily within IDRs [261]. Next, the β-catenin proteins undergo proteasomal destruction [259, 260], which is greatly accelerated by the presence of IDRs in addition to the bound ubiquitin signal [262].

The two kinases and β-catenin bind to Axin’s long intrinsically disordered region, thereby connecting structured proteins with flexible linkers. Random movements of the bound proteins enabled by the flexible linkers bring about frequent kinase-substrate collisions, which, in turn, lead to efficient phosphorylation. Thus, the destruction complex works by random movements of a “stochastic machine,” not by cooperative conformational changes [259].

Laboratory experiments show that a 55-residue segment of Axin containing binding sites for both β-catenin and GSK3β has the biophysical properties of an unstructured protein [263], which is in agreement with the predictions of disorder for this protein. Furthermore, addition of this disordered segment to solutions of β-catenin and GSK3β greatly accelerates the rate of phosphorylation of β-catenin by GSK3β. If too much of this disordered peptide is added, the elevated phosphorylation rate decreases, likely because, in the presence of too much peptide, many peptides bind either GSK3β or β-catenin but not both proteins. Finally, if shorter fragments containing either one of the two binding sites are added to solutions of β-catenin and GSK3β, no rate acceleration is observed. Overall, these data show that the phosphorylation rate enhancement occurs by the binding of both the kinase and the substrate to a single flexible tether and not by activation resulting from the binding of Axin to the enzyme or the substrate [263]. These experiments provide strong experimental validation of the stochastic machine model [259].
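
The rise and fall of the phosphorylation rate with increasing tether concentration can be illustrated with a minimal equilibrium sketch. It is not the analysis used in [263]. It assumes that the tether carries two independent binding sites, one for the kinase and one for the substrate, each with simple 1:1 binding. The concentrations, the dissociation constants, and the function names are hypothetical and chosen only for illustration. Under these assumptions, the concentration of tether molecules holding both partners at once first grows with added tether and then declines, because excess tether spreads kinase and substrate over separate molecules.

```python
import math


def bound(total_sites, total_ligand, kd):
    """Occupied-site concentration for 1:1 binding (exact quadratic solution)."""
    s = total_sites + total_ligand + kd
    return (s - math.sqrt(s * s - 4.0 * total_sites * total_ligand)) / 2.0


def ternary_complex(tether, kinase, substrate, kd_kinase, kd_substrate):
    """Concentration of tether carrying BOTH kinase and substrate.

    Assumes the two sites on the tether bind independently, so the
    probability that one tether holds both partners is the product of
    the two site occupancies. All inputs share the same concentration unit.
    """
    if tether <= 0.0:
        return 0.0
    f_kinase = bound(tether, kinase, kd_kinase) / tether
    f_substrate = bound(tether, substrate, kd_substrate) / tether
    return tether * f_kinase * f_substrate


if __name__ == "__main__":
    # Hypothetical values: 1 unit each of kinase and substrate, Kd = 0.01 units.
    for tether in (0.01, 0.1, 1.0, 10.0, 100.0):
        level = ternary_complex(tether, 1.0, 1.0, 0.01, 0.01)
        print(f"tether={tether:>6}: both-bound tether={level:.4f}")
```

In this sketch, a fragment with only one of the two sites can never yield a tether carrying both partners, which is consistent with the absence of rate acceleration reported for the single-site fragments.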

Upon encountering the target cell, Wnt binds to two co-receptors, the seven-transmembrane-helix protein Frizzled (Fzd) and the single-pass low-density lipoprotein receptor-related protein 5/6 (LRP5/6). This binding promotes recruitment of the scaffolding protein Dishevelled (Dvl), which results in the phosphorylation of the cytoplasmic domain of LRP5/6. This domain is disordered [264, 265], like the cytoplasmic domains of other single-pass membrane proteins [266] and like most other segments that undergo phosphorylation [41, 265]. The resultant molecular complex Wnt-Fzd-LRP5/6-Dvl forms a structural region for Axin interaction that disrupts Axin-mediated phosphorylation/degradation of the transcriptional co-activator β-catenin, thereby allowing it to stabilize and accumulate in the nucleus, where it activates the expression of multiple Wnt-dependent genes.

Owing to its prominent physiological function, Wnt/β-catenin signaling must be strictly controlled. Its dysregulation, which can be caused by different stimuli and by many different mutations, leads to alterations in cell proliferation, apoptosis, and inflammation-associated cancer, as well as to alterations in the proliferation or self-renewal of both embryonic and various types of adult stem cells [257].

IDRs/IDPs are found in every step of cell signaling pathways

The sections above highlight the different structures of cell signaling pathways. Intrinsic disorder may be present, and provide regulatory opportunities, at any of the following steps: ligand production, ligand activity, ligand bioavailability, receptor structure, intracellular transmission, termination/intracellular trafficking, and effector proteins (Fig. 4). Indeed, in addition to Wnt signaling, ten other pathways associated with the development of multicellular metazoans, including pathways also associated with cancer or with stem cell proliferation, were tested for their utilization of IDRs. Like Wnt, all ten additional developmental pathways extensively used proteins containing IDRs [267].

Fig. 4

Disorder can occur at any step of the Wnt cell signaling pathway. (A) A schematic of signaling components in the core canonical Wnt signaling pathway, showing the inactive state on the left and the active state on the right. The cell membrane is indicated by an arc and the nucleus by a light blue oval. Wnt signaling is able to control many processes by employing different variants of many of the proteins involved, which exist due to gene duplication, alternative splicing, and PTMs [258]. Consequently, a protein was marked as disordered (using a starburst) if the sequence of any variant, not post-translationally modified, was identified as intrinsically disordered in the literature. Wnt [258], Fz [268], LRP [268], Dsh/Dvl [264, 265], APC, CK1, Axin, GSK3 [269], β-catenin [269, 270], TCF/LEF [271], and Groucho [272] can all include intrinsic disorder. (B) Analysis of 117 proteins involved in Wnt signaling based on disorder score and percent of disordered residues. Larger values of each parameter indicate greater disorder. Color blocks indicate regions in which proteins are mostly ordered (blue and light blue), moderately disordered (pink), or mostly disordered (red). If the two parameters agree, the corresponding part of the background is dark (blue or pink), whereas light blue and light pink reflect areas in which only one of these criteria applies. It is noteworthy that no Wnt pathway proteins are very structured (dark blue) and only two proteins can be considered mostly ordered. The remaining 115 proteins are either moderately disordered or highly disordered.
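The two-parameter classification described for panel B can be sketched as follows. This is only an illustration: the per-residue cutoff of 0.5 and the class boundaries (0.15 and 0.5 for the mean disorder score, 10% and 30% for the percent of disordered residues) are assumptions introduced here and are not stated in this caption, and the function names are hypothetical.

```python
def disorder_summary(scores, residue_cutoff=0.5):
    """Return (mean disorder score, percent of residues at or above the cutoff).

    `scores` is a list of per-residue disorder scores between 0 and 1.
    The cutoff of 0.5 is an assumed convention, not a value from the caption.
    """
    if not scores:
        raise ValueError("scores must contain at least one value")
    mean_score = sum(scores) / len(scores)
    percent = 100.0 * sum(s >= residue_cutoff for s in scores) / len(scores)
    return mean_score, percent


def bin_value(value, low, high):
    """Assign one parameter to one of three classes using two boundaries."""
    if value < low:
        return "mostly ordered"
    if value < high:
        return "moderately disordered"
    return "mostly disordered"


def classify(scores, score_bounds=(0.15, 0.5), percent_bounds=(10.0, 30.0)):
    """Classify a protein by both parameters and report whether they agree.

    The default boundaries are hypothetical placeholders.
    """
    mean_score, percent = disorder_summary(scores)
    by_score = bin_value(mean_score, *score_bounds)
    by_percent = bin_value(percent, *percent_bounds)
    return {
        "mean_score": mean_score,
        "percent_disordered": percent,
        "by_score": by_score,
        "by_percent": by_percent,
        "agree": by_score == by_percent,  # agreement corresponds to the dark regions
    }
```

Reporting the two classes separately, together with an agreement flag, mirrors the distinction the caption draws between dark regions (both parameters agree) and light regions (only one criterion applies).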

Ligand production The production of many signaling molecules is highly regulated at the level of gene transcription. Furthermore, the transcription factors involved are often regulated by other signaling pathways (Fig. 4). Since intrinsically disordered regions are highly prevalent in transcription factors [273,274,275,276,277], intrinsic disorder is a major factor in regulating the production of cell signals.

Ligand activity/bioavailability The bioavailability of protein ligands is determined by highly regulated interactions with proteoglycans, which are ubiquitous components of the extracellular matrix. Heparin is a glycosaminoglycan in which disaccharide units may be sulfated [278]. Heparan sulfate proteoglycans (HSPGs) consist of a protein core with covalently bound chains of heparan sulfate. Most cells express at least one HSPG. Heparin binds more than 400 proteins, including many involved in cell signaling [279]. Examples include growth factors such as FGF, VEGF, HGF, and EGF, and pro-inflammatory cytokines such as IL-8 [278, 280]. Growth factors bound to HSPGs are sequestered and thus not active [280]. Cleavage of heparan sulfate by heparanase releases these signaling proteins [280]. Heparanase levels are regulated to control signaling and are elevated in tumorigenesis, metastasis, and angiogenesis [280]. Likewise, the affinity of cell signals for heparin is a major determinant of signaling strength. Proteins bind heparin via intrinsically disordered sequences rich in lysine and arginine [259]. The affinity of growth factors/cytokines for heparin correlates with the percentage of disordered residues in heparin-binding sites [259].
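As a simple illustration of what "rich in lysine and arginine" means in practice, the sketch below scans a sequence with a sliding window and reports the fraction of K and R residues in each window. The window length, the function name, and the toy sequence are hypothetical and do not correspond to any protein or method discussed in [259].

```python
def basic_fraction(sequence, window=10):
    """Fraction of lysine (K) and arginine (R) residues in each sliding window.

    Returns one value per window position along a one-letter-code sequence.
    """
    sequence = sequence.upper()
    if window <= 0 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(residue in "KR" for residue in sequence[start:start + window]) / window
        for start in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    toy_sequence = "GSGSAGSGKRKKRGKRSGSG"  # hypothetical sequence for illustration only
    for start, fraction in enumerate(basic_fraction(toy_sequence, window=8)):
        print(start, f"{fraction:.2f}")
```

Windows with a high K/R fraction mark candidate basic patches; whether such a patch actually binds heparin would still need to be established experimentally.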

Receptor structure Transmembrane receptors transduce the signal generated by ligand binding across the membrane. Many receptors require intrinsically disordered cytoplasmic tails to function properly [169, 281,282,283]. In a common strategy, conformational changes in the receptor triggered by ligand binding promote release of the cytoplasmic tail from association with the membrane. Once free, disordered tails engage in the protein–protein interactions required to propagate the signal. For the Epidermal Growth Factor Receptor (EGFR), this phenomenon is observed in the juxtamembrane region, which links the transmembrane α-helix with the tyrosine kinase domain. Prior to ligand binding, both the monomeric and inactive dimeric conformations of EGFR allow basic residues in the juxtamembrane region to bind the membrane. Upon ligand binding, the transmembrane helix re-arranges and EGFR forms active dimers [284, 285]. In the active dimer, the lipid bilayer releases the two juxtamembrane regions, enabling them to form antiparallel helices. This conformational change promotes autophosphorylation, and hence activation, of the two tyrosine kinase domains [281]. This arrangement can be regulated by altering the affinity of the juxtamembrane region for the membrane: PIP2 binds the juxtamembrane region to facilitate dimerization, whereas T654 phosphorylation decreases membrane affinity and thus activation [281, 286]. Furthermore, oncogenic mutations that stabilize the juxtamembrane region cause EGFR to be constitutively active [287].

IDPs/IDRs are particularly enriched in signaling proteins associated with membranes. Because the presence of intrinsic disorder provides unique opportunities for interactions with membranes (reviewed in detail by Cornish et al. [281]), it is perhaps not surprising that 15% of all disordered proteins bind lipid [288, 289]. The enrichment of positively charged amino acids within disordered regions enables electrostatic interactions with lipid head groups, which can induce membrane curvature [281]. Conversely, membrane curvature can reduce the motion, and hence conformational entropy, of disordered regions, allowing these proteins to act as curvature sensors. Disorder would expose any hydrophobic side chains, allowing their insertion into the membrane [281]. When receptors, scaffolds, and intracellular mediators of cell signaling pathways serve as protein interaction hubs, the membrane increases their effective concentration and restricts diffusion to two dimensions, thus increasing the probability of protein interactions. The presence of the membrane as a physical barrier can sterically prevent non-productive interactions from forming. Furthermore, the orientation of one protein to the membrane can expose or hide protein binding sites and thus regulate signal progression through the pathway [290].

Integrins not only mediate two-way communication between the cell interior and the extracellular matrix, but they also regulate ion channels, growth factor receptors, and the activity of cytoplasmic kinases [291]. These regulatory interactions allow integrins to coordinate cytoskeletal structure with growth factor-mediated processes such as cell adhesion, migration, and invasion of the extracellular matrix. The affinity of integrins for their ligands and the extracellular matrix is regulated by their intrinsically disordered cytoplasmic tails. These tails also act as a hub to form and regulate intracellular protein complexes [292,293,294]. The ability of integrins to bind extracellular ligands is regulated by talin, a cytoplasmic cytoskeletal protein [295,296,297,298]. The α-helical propensity, dynamics, and affinity of the integrin β tails strongly suggest that conformational entropy plays an important role in talin binding, with a preformed helix binding more readily than a disordered one [299].

Similar regulatory mechanisms have been established for G protein-coupled receptors (Fig. 5), which were recently reviewed by Zhou et al. [39].

Fig. 5

Adapted from Zhou et al. [39]

Alternative splicing and PTMs, localized in intrinsically disordered regions, direct differential CXCR4 signaling. Disorder predicted by PONDR-FIT is depicted as a heat map (lower left), with red and blue indicating predicted disorder and order, respectively. A crystal structure of the structured region (residues 28–303, PDB ID: 3OE9) is shown as a blue ribbon. Alternative splicing regulates receptor function by generating three tissue-specific isoforms in which the first five residues of the disordered N-terminus are replaced with other sequences of varying length. Multiple PTMs regulate different aspects of CXCR4 function: sulfation of Y7, Y12, and Y21 modulates receptor-ligand binding and dimerization [300], and glycosylation of N11 plays a role in masking the coreceptor functional activity [301]. Likewise, phosphorylation of Y157 is required for activation of the Gi-independent JAK2/STAT3 pathway [302]. Furthermore, combinations of C-terminal PTMs are associated with three different biological processes. Phosphorylation of S339 by G protein-coupled receptor kinase 6 (GRK6), and possibly phosphorylation by GRK2 (two residues from S346-S348 and S351-S352), leads to receptor-arrestin3 binding, G protein uncoupling, and subsequent receptor desensitization. In contrast, phosphorylation by GRK3 (in the same regions as GRK2, but probably at different residues) and by GRK6 (S330 and S339) results in arrestin2 recruitment and subsequent ERK1/2 activation [303]. In addition, phosphorylation by protein kinase C (PKC) and GRK6 (S324 or S325, and S330, respectively) initiates degradation modulated by ubiquitination of K327, K331, and K333 [303, 304].

Large multi-site docking proteins (LMDs) leverage the protein binding capacity of intrinsically disordered tails. Many cell signaling pathways require large multi-site docking proteins to transduce the signal from the activated receptor to downstream intracellular effectors [305]. Signaling hubs bind many proteins, but are limited to a few interactions at a time. This arrangement can allow the response to a single signal to evolve with time or enable one protein to transmit multiple different signals based on the protein interactions formed [281]. Scaffold proteins spatially and temporally regulate cell signaling pathways by binding and sequestering signaling proteins [306]. Thus, LMDs both integrate signals from multiple pathways and coordinate the downstream response [27, 307, 308]. Formation of these higher-order complexes allows amplification and integration of multiple signaling pathways instigated by cytokines, growth factors, and antigen receptors [27, 119, 309]. For instance, disordered hub regions can facilitate engagement of kinases with target proteins [310]. Gab2 is an LMD protein that operates as part of many signaling pathways [308, 311]. It transmits signals from integrins, receptor tyrosine kinases, cytokine receptors, multi-chain immune recognition receptors, and G protein-coupled receptors, and is required to activate Akt, Ras/Raf, Rac, c-fos, Jak/Stat, and a host of other intracellular transducers [308, 311, 312]. Upon phosphorylation by protein tyrosine kinases, Gab2 binds both the intracellular domains of receptors and many signaling proteins, activating multiple pathways by bringing the necessary factors into proximity [305, 308, 313].

Although most of the 74 kDa Gab2 protein is intrinsically disordered, it does contain a folded PH domain at its N-terminus, which anchors Gab2 to the membrane via interactions with the lipid PIP3 [308]. Gab2 function is critically dependent on binding to Grb2, which physically links Gab2 to the activated receptors [27, 314]. The Gab2-Grb2 interaction illustrates how complexes are organized by the long, disordered tails in the LMD class of proteins. Even though the disordered region of Gab2 is ~ 550 amino acids long, only two short regions (~ 20 amino acids) interact with Grb2, both binding the C-terminal SH3 domain of Grb2 [313, 315]. In isolation and in solution, the interacting regions of Gab2 are mostly disordered, with some residual signatures of extended β and polyproline II conformations [27]. Studies on the similar LMD protein Gab1 demonstrate that allosteric interactions and binding-induced folding are critical for the correct formation of these multiprotein complexes [307].

Proteins that bind to Gab2 often contain an SH2 protein interaction domain [305, 308, 313]. SH2 domains, which were discovered by Tony Pawson and colleagues, are non-catalytic structured domains that bind target sequences containing a phosphorylated tyrosine, and are found in several different multidomain proteins [316]. The many protein interaction domains are each wide-spread and found in multiple proteins, and their associated binding specificities have revolutionized our views of cell signaling [317]. The 14-3-3 proteins and proteins with phospho-tyrosine binding domains also use structure to bind to DBRs located in IDRs. We wondered whether the use of DBRs in IDRs for associating with protein interaction domains is rare or common. A convenient source containing more than 80 well characterized protein interaction domains is the Pawson Lab website (search “The Pawson Lab—Home”, click “domains—map”). So far more than 30 of these protein interaction domains have been shown by published experiments and/or by prediction to bind to DBRs in IDRs, with none so far binding to structured domains (work in progress). We suggest that developing a comprehensive list of protein interaction domains from a given eukaryotic model organism, then determining which ones bind to DBRs located in IDRs and which ones, if any, bind to structured proteins would be a very useful exercise.

Intracellular transmission of signals relies on a series of protein interactions. Many kinases include IDRs, which facilitate the intermolecular interactions critical for the function and specificity of the signaling cascade [318]. For interactions mediated by disordered tails, the disordered character of these regions provides multiple opportunities for regulation that can be applied simultaneously to diversify potential outcomes and refine the cell’s response. For instance, Ras, a p21 GTPase, is activated (1) by receptor tyrosine kinases, (2) when bound to GTP, and (3) when anchored in the membrane. Once activated, Ras binds its effector proteins, activating signaling cascades that control cell proliferation and survival until GTP hydrolysis switches the signaling off [319]. Although the catalytic domain is highly conserved among Ras family proteins (90–100% identical), the disordered C-terminal hypervariable regions exhibit substantial sequence diversity. Membrane anchoring positions the catalytic domain relative to the membrane and effector binding sites in the tail [320]. The disorder in the tail permits PTMs that add lipids and other groups to the tail, regulating membrane anchoring, domain positioning, auto-inhibition, effector protein binding, and, ultimately, Ras-mediated signaling [320, 321].

The function of K-Ras, a member of the Ras family with a lysine-rich tail, can be affected by alternative mRNA splicing, which alters the amino acid sequence of the tail (e.g., generating K-Ras4A and K-Ras4B). K-Ras4B is an oncogenic isoform of Ras, in which GTP hydrolysis, aided by the GAP protein, is sterically obstructed [320]. Consequently, the duration of K-Ras4B activation is longer than that of the other Ras isoforms. In addition, the effector binding site, which is inaccessible in inactive Ras, is more exposed in this oncogenic Ras isoform. Thus, the disordered tail controls many of the functions of this critical protein, and sequence variations in the tails create a family of Ras proteins that recognize different effectors and have unique activity profiles [319, 322]. Many of the regulatory mechanisms available to IDPs/IDRs are employed by Ras. Ras and its various isoforms and functions are extensively reviewed by Cornish et al. [281].

Transcription factors More than 90% of transcription factors either contain IDRs or are entirely intrinsically disordered; thus, it is not surprising that transcription factors regulated by cell signaling are also likely to include intrinsic disorder [275, 323]. Many transcription factors contain structured DNA binding domains, whereas the domain(s) that regulate transcription activation and repression are unstructured. One example of a cell signaling-regulated transcription factor with this organization is Gli3, a transcription factor regulated by Sonic Hedgehog signaling, which has an intrinsically disordered repression domain embedded with protein interaction sites [324]. Conversely, an example of a signaling-regulated transcription factor with a disordered DNA binding domain is the LEF/TCF protein Lymphoid enhancer-binding factor-1 (LEF-1) [271]. In response to Wnt signaling, LEF-1 bound to both DNA and β-catenin activates Wnt-responsive genes. LEF-1 contains a single High Mobility Group (HMG) domain, which binds, bends, and distorts the minor groove of its cognate DNA. In the absence of DNA and β-catenin, helix I and the C-terminal end of helix III of the LEF-1 HMG domain are unstable, fluctuating on the millisecond to microsecond timescale. This region cooperatively folds upon DNA binding. This disorder appears to be a hallmark of HMG domains that bind specific DNA sequences, as opposed to proteins containing multiple HMG domains that recognize DNA structure rather than DNA sequence [271]. A final type of cell signaling-regulated transcription factor is represented by the Smad proteins, which are regulated by TGFβ/BMP signaling. These transcription factors are composed of two structured domains separated by an intrinsically disordered linker, allowing the protein to range between compact and extended structures [325]. Smad dynamics are hypothesized to be important for modulating Smad function and thus signal transduction.

Most gene-specific transcription factors regulate transcription by recruiting components of general transcription activation or repression complexes. These components also include IDPs/IDRs. To continue the example of LEF-1, in the absence of Wnt signaling LEF-1 binds the corepressor TLE (termed Groucho in Drosophila). Groucho is composed of structured domains near both termini, and a central disordered domain that prevents promiscuous binding and unrestrained repression of transcription [272].

Termination/intracellular trafficking Many cell signaling pathways rely on vesicle trafficking to terminate cell signaling and/or recycle the receptor proteins [326]. In neurotransmission, signaling molecules are also released from the upstream neuron by vesicles fusing with the axon terminus. IDPs/IDRs participate in vesicle release and recycling at nerve terminals (reviewed in Snead 2019). Long disordered regions mediate protein–protein interactions and are often located adjacent to catalytic domains [327, 328]. As discussed above, many disordered regions also act as membrane curvature sensors. For example, membrane curvature is detected by the intrinsically disordered amphipathic region of the GTPase-activating protein ArfGAP1. This region acts as an amphipathic lipid-packing sensor, forming α-helices upon binding highly curved membranes [327].

Conclusions

Intrinsically disordered proteins play many diverse, yet critical roles in cell signaling pathways. Signaling imposes many logistical demands on a cell, requiring mechanisms to amplify, integrate, differentiate, and propagate signals, as well as to generate unique responses to similar signals with overlapping gene expression patterns. IDPs/IDRs are uniquely suited to solving these problems, as demonstrated by several examples detailed in this review (Table 1). The many advantages conferred by disorder to cell signaling cascades mean that (1) understanding signaling requires defining the roles disorder plays in each pathway, (2) many more examples of disordered proteins in cell signaling pathways are likely to be discovered, and (3) more mechanisms by which disorder functions remain to be elucidated.

Table 1 Examples of regulatory mechanisms, enabled by intrinsic disorder, that contribute to cell signaling

The importance of disorder is highlighted by its presence in cell signaling proteins from all kingdoms of life (animals, plants, bacteria, fungi), in every category of cell signaling pathways (autocrine, juxtacrine, intracrine, paracrine, and endocrine) and at each stage (ligand, receptor, transducer, effector, terminator) in the cell signaling process. Clearly, any particular cell signaling pathway cannot be fully described without understanding the mechanisms by which intrinsically disordered protein regions contribute to that pathway. Understanding these mechanisms requires not only understanding the statistics of the conformational ensembles generated by intrinsically disordered protein regions [320], but also defining how alternative splicing, PTMs, mutation, ligand binding, effector protein binding, and changes in sub-cellular location can dynamically alter these ensembles.