Introduction

Proteins are the workhorse molecules of the cell, driving virtually every function and developmental program in biology. Amazingly, many of these critical molecules readily aggregate and assemble inside living cells through interactions among unfolded and folded domains. This can occur aberrantly and lead to disease, but there is also accumulating evidence that aggregation phenomena can be regulated by the cell and used to carry out important and beneficial biological functions ranging from molecular scaffolding to memory [1,2,3,4]. Moreover, when designing synthetic cellular systems, we argue that protein aggregation may be viewed as a “feature” rather than a “bug”, and that self-assembling elements possess unique properties that can be exploited to engineer new biological functions [5]. In this Review, we provide a brief introduction to protein assembly and the spectrum of aggregation phenomena found in nature, survey the diverse and rapidly expanding set of biological functions driven by supramolecular assemblies, and finally offer a prospective discussion of the methods and benefits of their purposeful manipulation in cells and organisms.

Biological parts

Protein components can self-assemble into higher-order complexes or assemblies within the cell. A common feature of many of these proteins is the presence of intrinsically disordered regions (IDRs). IDRs are protein sequences that do not adopt a single three-dimensional structure, but instead endow proteins with flexibility to adopt a range of states, from unstructured to partially structured [6]. Due to this flexibility, IDRs can enable proteins to engage multiple partners and participate in the different types of interactions that facilitate initiation of protein assembly, e.g., (1) specific interactions among or between folded domains and unfolded sequences [7,8,9] and (2) non-specific weak interactions among IDRs [10, 11]. Depending on the relative strength and avidity of these interactions, as well as other factors such as the physical-chemical state of the cellular environment, a broad spectrum of assembly phenomena can arise (Fig. 1). On one end of the spectrum, proteins can be recruited and maintained in highly dynamic, metastable assemblies that are characterized by liquid-like properties [12, 13]; at the other end of the spectrum, these initial interactions can give rise to more ordered interactions that produce stable higher-order aggregates, like amyloid fibers. Below, we provide a brief overview of these different classes of supramolecular assemblies, discussing their key properties and hallmark examples.

Fig. 1

Overview of higher-order assemblies. a Protein assemblies display a spectrum of material properties, from solid-like amyloid fibers to highly dynamic liquid droplets. Examples of assemblies are shown below the spectrum. Highly stable assemblies include MAVS (mitochondrial antiviral signaling) protein fibers and Aβ (amyloid β) peptide amyloid fibrils. Highly dynamic assemblies include nucleoli, membraneless organelles with a liquid-like shell around a more organized rigid core. The yeast prion protein Sup35 can convert between different structures: it constructs stable amyloid fibrils in its prion conformation and undergoes reversible gel formation under pH stress. Stress granules and P-bodies can also exist in different states, depending on the physiology of the cell. b Prions are self-propagating protein conformations. The prion conformation (purple) serves as a template to convert the soluble (gray) conformation into the prion conformation, which usually results in the growth of amyloid aggregates. The aggregates are fragmented by chaperone proteins, producing seeds that can nucleate the conformational conversion.

Dynamic assemblies

At the dynamic end of the spectrum are supramolecular assemblies based on phase separation. Biological phase separation has generated tremendous excitement as recent discoveries point to its critical role in many cellular processes, such as the formation of organelles and regulatory complexes, as well as in human diseases [14]. Our goal is to provide the reader a very brief introduction and primer to this phenomenon; for more details on the properties, formation, regulation, and function of this class of assembly, we refer the reader to several recent reviews [12, 14,15,16]. For simplicity, we will broadly refer to these assemblies as dynamic condensates or dynamic assemblies [17].

Phase separation is a physical phenomenon that occurs when a solution of molecular components spontaneously separates or demixes into co-existing (lower free energy) phases, such as when oil is mixed with water [1]. In cells, molecular assembly via phase separation provides a general strategy for organizing and compartmentalizing biological matter. Specifically, protein-based phase separation can drive the formation of cellular compartments and organelles that lack membrane boundaries. A canonical example is the nucleolus, a large and dynamic complex of proteins and RNA found within eukaryotic nuclei that is the site of ribosome biogenesis. Nucleoli can display a broad range of sizes, coalesce into larger droplets, and exhibit a range of viscosities, all of which are classic properties of liquids [18]. Nucleoli are just one example of a larger family of membraneless organelles. The first membraneless organelle characterized as liquid-like was the P-granule [19]. Since then, P-bodies, stress granules, Cajal bodies, nuclear speckles, PML bodies, and germ granules have all been demonstrated to form through phase separation and possess liquid-like physical properties [20, 21].
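
The idea that demixing lowers the free energy can be illustrated with a minimal sketch. It uses the Flory-Huggins lattice model, a standard textbook description of polymer solutions that is not drawn from the studies cited here. The interaction parameter `chi`, the chain length `n`, and the example compositions are all illustrative assumptions rather than measured values.

```python
import math

def fh_free_energy(phi, chi, n=1):
    """Flory-Huggins mixing free energy per lattice site (units of kT).

    phi: volume fraction of the protein-like component (0 < phi < 1)
    chi: effective interaction parameter (assumed, illustrative)
    n:   chain length in lattice sites (assumed, illustrative)
    """
    return (phi / n) * math.log(phi) + (1 - phi) * math.log(1 - phi) + chi * phi * (1 - phi)

def fh_curvature(phi, chi, n=1):
    """Second derivative of the free energy; negative values mean the mixture is unstable."""
    return 1.0 / (n * phi) + 1.0 / (1.0 - phi) - 2.0 * chi

def critical_chi(n=1):
    """Smallest chi at which an unstable region appears in this model."""
    return 0.5 * (1.0 + 1.0 / math.sqrt(n)) ** 2

def unstable_range(chi, n=1, points=999):
    """Return (lowest, highest) grid composition with negative curvature, or None."""
    grid = [(i + 1) / (points + 1) for i in range(points)]
    unstable = [phi for phi in grid if fh_curvature(phi, chi, n) < 0]
    return (unstable[0], unstable[-1]) if unstable else None

# Compare a uniform 50:50 mixture with an equal split into a dilute and a dense phase.
# The split conserves overall composition: 0.5 * 0.15 + 0.5 * 0.85 = 0.5.
mixed = fh_free_energy(0.5, chi=2.5)
demixed = 0.5 * (fh_free_energy(0.15, chi=2.5) + fh_free_energy(0.85, chi=2.5))
```

In this toy model, raising `chi` (stronger effective self-attraction) or `n` (longer, more multivalent chains) makes demixing easier, which is qualitatively in line with the role of multivalent interactions described in this section.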

As discussed more below, these dynamic assemblies can help to compartmentalize biological reactions and in some cases can even be composed of distinct, non-miscible sub-compartments with different protein compositions [22]. In general, weak multivalent interactions among components of the assembly drive the formation of condensates and regulate their dynamic properties. The interactions can be summarized into two main types: (1) specific interactions among folded domains [23] or between folded domains and unfolded sequences [7,8,9, 24]; and (2) non-specific weak interactions among IDRs [10, 11]. In the first case, the strength and valency of interactions and the relative component concentrations determine their phase behavior, which can span reversible droplet formation to gelation into a meshwork of non-covalently crosslinked hydrogels [8]. Proteins that undergo the second type of interaction often have disordered regions with specific sequence compositions, called low complexity regions (LCRs). LCRs feature highly repetitive sequences, such as polyglutamine (polyQ) repeats or glutamine/asparagine (Q/N)-rich motifs. Other key elements of these regions are clusters of hydrophobic and aromatic residues [25] and patches of charged side chains [26]. Interestingly, the dynamics of phase transition and the physical properties of the assembly, such as the viscosity, are intimately related to the sequence composition of the constituent proteins [22, 27], presenting unique opportunities to elucidate a “molecular grammar” [27] for assembly formation and to create designer assemblies based on sequence alone.

Stable assemblies: amyloid fibers

At the other end of the spectrum are solid structures, such as amyloid fibers. Amyloids are highly stable assemblies composed of protein molecules organized in a cross-β-sheet lattice of indefinite length [28]. Amyloid fibrils form through an unfolded-to-folded transition, wherein partially unfolded protein sequences lock into β-sheet conformations and self-assemble by aligning their β-sheets and subsequently growing in a linear fashion [29]. This process is generally described as nucleation-dependent polymerization, where the protein needs to access a metastable conformation in order to trigger aggregation. Once this state is achieved, the reaction proceeds rapidly to completion as a first-order reaction [30,31,32,33].
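
The lag-then-growth behavior implied by nucleation-dependent polymerization can be sketched with a toy nucleation-elongation model. This is a simplified illustration, not the kinetic scheme of any cited study: the rate constants `k_nuc` and `k_elong`, the nucleus size `n_c`, and the dimensionless concentrations are arbitrary assumptions, and fragmentation is ignored.

```python
def simulate_aggregation(m_total=1.0, seed_mass=0.0, seed_number=0.0,
                         k_nuc=1e-4, k_elong=5.0, n_c=2,
                         dt=0.01, t_end=200.0):
    """Euler integration of a toy nucleation-elongation model.

    m: free monomer concentration
    P: number concentration of fibrils (each with two growing ends)
    M: monomer mass incorporated into fibrils
    All parameter values are hypothetical and dimensionless.
    """
    m = m_total - seed_mass
    P, M = seed_number, seed_mass
    times, fibril_mass = [0.0], [M]
    for i in range(1, round(t_end / dt) + 1):
        nucleation = k_nuc * m ** n_c        # slow formation of new fibrils
        elongation = 2.0 * k_elong * m * P   # fast monomer addition at fibril ends
        dM = min((n_c * nucleation + elongation) * dt, m)  # never exceed free monomer
        P += nucleation * dt
        M += dM
        m -= dM
        times.append(i * dt)
        fibril_mass.append(M)
    return times, fibril_mass

def half_time(times, fibril_mass, m_total=1.0):
    """First time point at which half of the protein is in fibrils (None if never)."""
    for t, M in zip(times, fibril_mass):
        if M >= 0.5 * m_total:
            return t
    return None

# Unseeded reaction versus one started with a small amount of preformed fibrils.
unseeded = simulate_aggregation()
seeded = simulate_aggregation(seed_mass=0.01, seed_number=0.001)
```

Under these assumptions the unseeded reaction shows an initial lag while nuclei accumulate, and adding preformed seeds shortens the time to half-conversion, which is the qualitative signature of a nucleation-limited process.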

Proteins with high propensity to form amyloids contain disordered regions enriched in hydrophobic and polar amino acid residues. However, amino acid composition alone is insufficient to predict amyloid-forming capacity, as the position within the protein sequence also matters. Collectively, these insights have been used to predict amyloid-forming sequences in the proteomes of organisms [34, 35].

Amyloid fibrils are extraordinarily robust biomaterials. They are protease-, heat-, and detergent-resistant and can have stiffness comparable to that of spider silk and collagen fibers [36]. Both protein sequence and the physical/chemical conditions of the solvent [37, 38] contribute to the biophysical properties of the resulting fibril, such that the same underlying protein can form fibrils with a variety of morphologies and degrees of rigidity [38].

Amyloids have been linked to many human diseases, ranging from diabetes to systemic amyloidoses [39, 40]. Perhaps most notably, amyloid deposits are hallmarks of neurodegenerative disorders and their formation has been linked to the etiology of diseases, such as Alzheimer’s disease (AD) and Parkinson’s disease (PD) [41]. However, as described below, examples of non-disease, functional amyloids are being increasingly uncovered and characterized. For a more detailed treatment of these structures and their roles in disease and physiology, we refer the reader to other reviews [29, 42,43,44].

Heritable assemblies: prions

Prions are a unique class of aggregating proteins. Prion proteins, first discovered as the causative agents of transmissible neurodegenerative disorders in mammals (e.g., scrapie, bovine spongiform encephalopathy, and Creutzfeldt-Jakob disease [45, 46]), have several unusual properties. First, they can exist in functionally distinct conformational states. Conversion between a soluble conformation (associated with “normal” protein activity) and a prion conformation results in a dramatic change in protein activity that can lead to new cellular phenotypes, including disease. Second, the prion conformation is self-replicating. That is, it serves as a template to convert the soluble conformation into the prion conformation; once induced, this prion state can self-replicate on long biological timescales [47, 48]. Third, the prion conformation is infectious: it can be transmitted from one cell to another and, in the case of bona fide prions such as the disease-causing human prion protein (PrP), from one organism to another. Because of these properties, intracellular prions and their associated phenotypes are heritable [42]. Proteins that fulfill all of these criteria except for inter-organismal transmission are often referred to as prionoids, and there is accumulating evidence that proteins associated with human neurodegenerative diseases, such as Tau in AD, have prion-like seeding and spreading properties [49].

Most known prions form amyloid aggregates, templating the amyloid conformation onto newly synthesized proteins. Conversion between the prion and soluble conformation is a reversible process [50]. However, once insoluble amyloid fibers are formed, they are generally considered irreversible: amyloids are not cleared by the protein quality control system and are instead stored as inclusions in cellular compartments [51, 52]. Not all prions are known to form amyloids. Certain prion proteins can also form dynamic condensates [32, 53]; others have as yet undetermined conformations [54], and it is likely these elements can take on a broad spectrum of biophysical and conformational properties.

Much of our molecular understanding of prions originates from studies in yeast, where over ten bona fide prion proteins have been characterized to date (Table 1) [60, 73, 82]. Unlike mammalian PrP, certain prions in yeast have been proposed to play useful regulatory functions (see below). These studies have revealed two common molecular properties. First, canonical prions contain prion-forming domains (PrDs), which are modular intrinsically disordered domains (often Q/N-rich) that confer their prionogenic behavior. In silico screens trained on known PrDs have been able to predict new prion sequences [68, 83,84,85,86]. However, it should be noted that not all known prions have PrDs and canonical sequence properties, and thus identification based exclusively on these criteria is insufficient [66]. Second, prions interact with and depend on the activity of chaperone proteins for propagation. In yeast, transmission and maintenance of the prion state are highly dependent on the disaggregase Hsp104, which serves to fragment amyloid fibers and create infectious, low molecular weight prion “seeds” that infect daughter cells [87]. Nevertheless, recent studies suggest that Saccharomyces cerevisiae may harbor many more uncharacterized prionoid elements that appear to be independent of Hsp104 and instead depend on other chaperones [81, 88]. Overall, new experimental tools will be needed to discover and characterize these elements.

Table 1 Prion proteins: bona fide prions, prion candidates, and prionoids

Finally, some prions can assemble into multiple distinct self-templating structures. These structural variants—or conformational alleles—give rise to distinct and stable prion activities, called strains [89,90,91,92]. The concept of the “prion strain” was first suggested based on studies of PrP, in which different structures of the prion protein were identified and linked to distinct disease pathologies [93]. Now, many prions are known to give rise to strains, and it has been shown that strain variation derives from different amyloid fiber structures (with different physical properties) that correlate with heritable phenotypes [37, 94, 95]. For example, a number of strains have been identified for the well-studied yeast prion [PSI+], which is formed by self-templated aggregation of the translation termination factor Sup35 [96]. Two of the most common [PSI+] strains are referred to as weak and strong [91]. Strains carrying weak [PSI+] have lower efficiency of translational read-through relative to strong [PSI+], and thus a weaker phenotype. However, weak [PSI+] fibers are thermodynamically more stable, with larger and more rigid amyloid cores compared to strong [PSI+]. This structural difference results in less efficient fragmentation by the chaperone machinery, and consequently the generation of fewer seeds, slower templating of new protein to the prion form, and less stable inheritance [97]. Understanding these sequence–structure–phenotype relationships will provide important blueprints to guide the purposeful manipulation and engineering of these powerful elements of inheritance.

Biological functions

Protein aggregation has been classically viewed as an aberrant process with pathological consequences. Indeed, cellular aggregates are associated with a large number of human diseases, including neurodegeneration [98], type 2 diabetes [39], and aging [99], and extensive work has been dedicated to elucidating their role in these and other diseases (reviewed elsewhere [42, 43]). However, in part owing to the development of new techniques to study protein aggregates (Table 2), we are gaining a new appreciation and understanding of their non-pathological roles. Protein aggregates serve beneficial functions in a variety of cellular processes, including gene regulation [126, 127], signaling [76, 128, 129], memory storage [56, 130, 131], DNA repair [132], cell fate decisions [130, 133,134,135], and even evolution [2]. These examples should serve as inspiration for synthetic biologists aiming to purposefully manipulate information flow in living systems (Fig. 2).

Table 2 Methods to evaluate and quantify protein assemblies
Fig. 2

Protein assemblies play important roles in a variety of critical cellular processes. a In eukaryotic transcription, co-activators and transcription (txn.) factors form highly dynamic protein condensates that recruit RNA polymerase II (RNA pol II) and drive robust gene activation. b RNA-binding proteins (RBPs) and RNAs coalesce to form RNP granules, which serve different RNA processing functions, such as mRNA storage and degradation, ribosome biogenesis, and localized translation. In one intriguing example, prion-like aggregation of CPEB3 promotes translation in activated synapses to potentiate long-term memory. c Higher-order assemblies play key roles in innate immunity. For example, prion-like polymerization of the MAVS adaptor protein in response to viral infection leads to amplification and stabilization of the antiviral response. d In yeast, stochastic switching between [prion−] and [PRION+] states in a population of cells enables phenotypic diversification and may promote survival in uncertain environments. Figure adapted from Fig. 1B in [136]. In prion nomenclature, brackets denote non-Mendelian inheritance and capital letters denote dominance in crosses.

Gene regulation

Many components controlling aspects of gene expression form dynamic protein assemblies that contribute to their regulatory mechanism. Strikingly, different steps of eukaryotic gene transcription appear to utilize regulated phase separation mechanisms [137]. A first step in transcription is the binding of transcription factors (TFs) to enhancer regions. Phase separation was found to be important in this process at super-enhancers, which are clusters of enhancers driving robust transcription of cell identity genes. In particular, certain TFs were shown to phase separate via their IDRs into liquid-like condensates that help to compartmentalize the transcriptional apparatus [138]. A following step in transcription involves Mediator, a complex that connects signals from TFs to RNA polymerase (Pol) II. Mediator has been shown to form phase-separated clusters both with TFs [127] and with Pol II [126] at active sites of transcription. Finally, the process of transcription elongation relies on phosphorylation of the C-terminal domain (CTD) of Pol II. This is accomplished in part by the enzyme complex positive transcription elongation factor b (P-TEFb). To ensure hyper-phosphorylation of the CTD and efficient elongation, P-TEFb undergoes phase separation into nuclear speckles capable of recruiting Pol II [139]. Interestingly, protein phase separation has also been implicated in gene silencing through recent work demonstrating that HP1α proteins, key factors involved in the formation of heterochromatin domains, have the ability to form liquid droplets in a regulated fashion [140,141,142].

Proteins regulating the RNA life cycle, downstream of transcription, are among the most prominent examples of molecules that undergo phase separation. RNA-binding proteins (RBPs) are particularly rich in disordered, low-complexity sequences. Many RBPs possess IDRs that have been shown to undergo liquid–liquid phase transition in cells [20, 22, 23], thus driving the formation of membraneless organelles important in RNA metabolism (these include nucleoli, stress granules, P-bodies, and Cajal bodies) [143]. As one example, condensation of components of the human miRISC complex facilitates recruitment of deadenylation factors that promote degradation and silencing of mRNAs [144]. RBPs have gained recent attention because, on the one hand, their aggregation can drive the formation of these functional membraneless RNP bodies, yet on the other hand, mutations in their low-complexity sequences are causal factors in neurodegenerative diseases, including amyotrophic lateral sclerosis (ALS) and multisystem proteinopathy (MSP) [145]. Interestingly, nucleotide repeat expansions, a class of mutation associated with neurotoxicity that can cause gain or loss of function of genes encoding RBPs, have also been shown to alter the properties of RNAs. Specifically, repeat-containing RNAs have been shown to form gels in vitro by creating opportunities for multivalent base-pairing, and in cells they can accumulate in aberrant and potentially toxic nuclear foci that sequester RBPs [146].

Immunity

The innate immune system is an ancient and rapid first-line defense that higher organisms deploy to defend against invading pathogens. This system consists of interconnected signaling pathways that activate inflammatory responses in an effort to eliminate the pathogen, as well as regulate different types of cell death, such as apoptosis and necroptosis (programmed necrosis). These pathways must be able to be rapidly deployed, but also tightly controlled and balanced in order to prevent excessive inflammatory responses or cell death. Many signaling components involved in these pathways are capable of oligomerization to form higher-order assemblies, sometimes generically referred to as signalosomes [147]. This capacity for oligomerization is an important mechanism for increasing specificity of and amplifying signal transduction [76].

One prominent example is the RIP1/RIP3 necrosome mediating programmed necrosis, a form of cell death (distinct from apoptosis) that represents an important host defense mechanism [128]. Here, the RIP1 and RIP3 kinases form a functional amyloid-based signaling complex to trigger programmed necrosis. Importantly, RIP1 and RIP3 kinase activation is required for this amyloid complex formation, which in turn can further enhance kinase activation through phosphorylation, thereby amplifying/propagating the pronecrotic signal.

Another key host defense mechanism is activation of inflammatory responses. This is carried out by the inflammasome complex, which translates pathogen and cellular danger signals recognized by sensors, such as NLRP3, into inflammatory responses through the adaptor protein ASC. Intriguingly, ASC was shown to be a bona fide prion in yeast [76, 77, 129]. Thus, in response to upstream sensors, initial oligomerization of ASC can result in prion-like nucleation; this in turn enables the templating of other ASC molecules to form large polymers capable of robustly recruiting caspase-1 molecules to induce their activation and propagate the inflammatory signal. Similarly, viral infection triggers the prion-like aggregation of the mitochondrial antiviral-signaling (MAVS) adaptor protein into fibrillar structures, which in turn recruit other soluble MAVS proteins, amplifying and stabilizing the antiviral response [76, 148]. These examples highlight how prion-like polymerization may provide an evolutionarily conserved mechanism for highly sensitive and robust responses to cellular signals.

Memory

One of the most fundamental and remarkable aspects of organismal behavior is the ability to make memories of past events, and to subsequently modify behavior by learning. Cells have multiple mechanisms for making molecular memories that outlast the half-life of proteins. One mechanism is prion-like aggregation [130, 149]. In animals, the cytoplasmic polyadenylation element binding protein 3 is a highly conserved RBP (CPEB in Aplysia, Orb2 in Drosophila, and CPEB3 in mice) that plays a role in the formation of new memories [56, 150, 151]. Specifically, prion-like aggregation of CPEB3 in the synapses of stimulated neurons leads to the formation of RNA granules that bind and drive translation of mRNAs involved in synaptic plasticity and growth [57]. CPEB3, and possibly also other RBPs, represents a fascinating example of how conformational changes at the molecular scale can produce macroscopic changes in animal behavior, linking molecular self-replication, cellular memory, and neuronal memory.

Evolution

Since their initial discovery, we have come to understand prions not only as causative agents of disease, but also as sources of new and sometimes adaptive cellular functions [47, 66, 88, 121, 136, 152,153,154,155,156,157]. This has been most apparent in yeast, where several central regulators of information flow and metabolism have been determined to be prion proteins (Table 1). A canonical example is the S. cerevisiae prion [PSI+], formed by the translation termination factor Sup35 [48]. At a low frequency, Sup35 converts from a soluble, functional conformation to a self-templating prion. This allows ribosomes to read through stop codons, uncovering previously silent genetic variation on a genome-wide level, and thus producing diverse and heritable phenotypes that are often disadvantageous but that can provide advantages in particular environments [158, 159]. This has led to the provocative hypothesis that yeast prions may serve as adaptive ‘bet-hedging’ elements to promote cellular survival in stressful environments [160]. In support of this was the discovery that hundreds of wild yeast strains contain heritable prion states, which frequently confer beneficial phenotypes under selective conditions [88].

Other notable examples of yeast prions that enable epigenetic switching to endow cells with beneficial phenotypes in specific metabolic and environmental conditions include [URE3] [47], [MOT3+] [88], and [GAR+] [161]. In particular, the [SWI+] prion formed by the yeast Swi1 protein represents an intriguing potential example of bet-hedging. As the main subunit of the SWI/SNF chromatin remodeling complex, Swi1 serves as a global transcriptional regulator. When Swi1 adopts its prion form, a variety of phenotypes can be induced, such as growth phenotypes on alternative carbon sources, sensitivity to antifungal agents, and, importantly, abolished adhesion to other cells or substrates [162]. By maintaining a small population of [SWI+] cells, an isogenic population can effectively safeguard against unpredictable environments via these diverse and potentially beneficial phenotypes, for example, by providing an opportunity for non-adherent [SWI+] cells to disperse to new locations to ensure re-population and survival [160].
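
The bet-hedging argument can be made concrete with a small deterministic population model. This is a hedged sketch rather than a model from the cited work: the growth factors in `FITNESS`, the symmetric per-generation `switch_rate`, and the environment schedules are all hypothetical values chosen only to show the trade-off.

```python
import math

# Hypothetical per-generation growth factors for each state in each environment.
FITNESS = {
    "benign": {"minus": 2.0, "plus": 1.8},   # prion state carries a small cost
    "stress": {"minus": 0.01, "plus": 1.0},  # prion state survives the stress
}

def log_growth(switch_rate, environments, fitness=FITNESS):
    """Cumulative log growth of a population that switches between two states.

    minus / plus track the fractions of [prion-] and [PRION+] cells.
    Switching is assumed symmetric and to occur once per generation.
    """
    minus, plus = 1.0, 0.0  # start as a pure [prion-] population
    total_log = 0.0
    for env in environments:
        # Stochastic switching, treated deterministically at the population level.
        switched_minus = minus * (1 - switch_rate) + plus * switch_rate
        switched_plus = plus * (1 - switch_rate) + minus * switch_rate
        # Environment-dependent growth of each subpopulation.
        grown_minus = switched_minus * fitness[env]["minus"]
        grown_plus = switched_plus * fitness[env]["plus"]
        total = grown_minus + grown_plus
        total_log += math.log(total)
        minus, plus = grown_minus / total, grown_plus / total
    return total_log

# Nine benign generations followed by one stress generation, repeated 20 times.
fluctuating = (["benign"] * 9 + ["stress"]) * 20
constant = ["benign"] * 200
```

With these assumed numbers, a switching lineage (for example `switch_rate=0.01`) grows slightly more slowly than a non-switching one in the `constant` schedule but outgrows it in the `fluctuating` schedule, mirroring the idea that prion states are often disadvantageous yet can pay off in unpredictable environments.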

Recently, the first bacterial protein capable of prion formation (the Clostridium botulinum global transcriptional terminator Rho [59]) and the first viral protein exhibiting prion-like self-propagating activity (formed by the baculovirus LEF-10 protein [78]) were discovered, suggesting that prion-based mechanisms for phenotypic diversification may be more pervasive than originally thought. We expect that more of these elements will be discovered with the development of new genetic tools to characterize and validate putative prions from other organisms [117, 163].

Synthetic biology of protein assembly

Synthetic biology aims to synthesize complex biological function using basic molecular parts, a bottom-up approach that has been productively used to study design principles of cellular systems and purposefully manipulate them for useful applications [164,165,166,167,168]. While the field has undergone tremendous growth [169], there are still many challenges and limitations for engineering biological systems with predictive and ambitious functions [170]. Relatedly, considerable biology remains largely untapped from an engineering perspective. One primary example is protein self-assembly and aggregation. Nature has ingeniously exploited this seemingly simple and ancient form of establishing molecular interactions to create emergent systems that accomplish many complex cellular tasks. Synthetic biologists would be well-served to bring these powerful elements into the engineering toolbox and to develop methods for manipulating protein assembly systems for study and application (Fig. 3).

Fig. 3

Application of higher-order protein assembly in synthetic biology. a Synthetic membraneless organelles, formed using proteins that undergo phase separation, can be used to enforce orthogonality of regulatory connections and biochemical reactions. This principle was recently used to create synthetic orthogonally translating (OT) organelles as sites for producing proteins that incorporate unnatural amino acids. b Exacting control over the formation of intracellular protein assemblies using optoDroplets. In this scheme, IDRs fused to light-inducible oligomerization domains enable the induction of phase separation by illumination. c Protein assembly systems as the basis of sensing and signal processing devices. Left: Protein assemblies can undergo dramatic changes in structure in response to small variations in environmental conditions, enabling exquisite sensing capabilities. Right: Changes in aggregation can be used to control downstream cellular processes. In the yTRAP system, the solubility state of an assembly domain is coupled to the activity of a synthetic TF and consequent activation of genes of interest (GOIs). d Prion proteins can exist stably in distinct conformational states, offering the potential to create synthetic memory devices based on prion switching.

From sequence to behavior

One vision of synthetic biology is programming cellular behavior entirely at the level of DNA sequence. This presents an immediate challenge for engineering protein self-assembly systems, as these require an understanding of protein structure and stability and often rely on disordered and highly flexible structures. Indeed, while powerful software suites like Rosetta, I-TASSER, and QUARK can predict the 3D structure of folded proteins based exclusively on amino acid sequence [171, 172], disordered proteins remain a challenge. To address this, computational approaches using hidden Markov models [173] and machine learning [174] have been developed to score propensity for properties such as disorder [175], secondary structure [174], aggregation [176], prion behavior [68, 84,85,86, 177], and phase separation [178, 179]. Genome-wide searches for PrDs utilizing such computational approaches have enabled the successful identification of new yeast prions, such as [MOT3+] [68] and [RNQ+] [180].
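
To give a flavor of sequence-based scoring, the sketch below flags Q/N-rich stretches with a simple sliding window. It is a deliberately crude composition heuristic and not an implementation of any of the cited predictors (which use hidden Markov models or machine learning); the `window` size, the `threshold`, and the example sequence are hypothetical.

```python
def qn_fraction(segment: str) -> float:
    """Fraction of residues in the segment that are glutamine (Q) or asparagine (N)."""
    return sum(residue in "QN" for residue in segment) / len(segment)

def scan_qn_rich(seq: str, window: int = 20, threshold: float = 0.5):
    """Return merged half-open (start, end) regions covered by Q/N-rich windows.

    A window counts as Q/N-rich when its Q/N fraction is >= threshold.
    Overlapping or adjacent hits are merged into a single region.
    """
    regions = []
    for start in range(len(seq) - window + 1):
        if qn_fraction(seq[start:start + window]) >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions

# Made-up sequence: a Q/N-rich tract flanked by alanine stretches.
toy_sequence = "A" * 30 + "QN" * 15 + "A" * 30
toy_regions = scan_qn_rich(toy_sequence)
```

A composition scan like this will miss prions that lack canonical Q/N-rich PrDs, which is one reason the more sophisticated, experimentally trained approaches cited above are needed.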

One approach often used by synthetic biologists to engineer new synthetic protein systems is to leverage the functional modularity of certain protein domains. Fortunately, many PrDs have been found to be modular, which allows transfer of prion-forming capability onto new proteins by simply fusing them to PrDs [181]. These fusions can allow the activity of a protein to be regulated in a manner that depends on the prion conformation [68]. Moreover, because of the modular nature of PrDs, synthetic proteins harboring multiple PrDs could be generated, enabling artificial cross-seeding of aggregates and the study of higher-order prion interactions [117].

Another approach for engineering self-assembling protein systems based on sequence is to take advantage of studies using de novo peptides [182,183,184]. These studies have shown that tunable and reversible phase behaviors can, to some degree, be encoded into de novo synthetic proteins by combining low complexity domains in particular arrangements [183]. Using this knowledge, researchers have generated synthetic peptides with predictable phase behaviors [8, 182, 183, 185] and synthetic prion domains [186], and have redesigned bacterial amyloid proteins for application as structural biomaterials [187, 188], adhesives [189], and nanowires [190]. For example, researchers fused the Escherichia coli CsgA protein, responsible for curli amyloid fiber formation, to mussel foot proteins (Mfps) in order to create a synthetic protein capable of generating a fibrillar structure of adhesive proteins that outperforms mussel foot in adhesion to underwater surfaces [189].

Making circuit and pathway connections

One focus of synthetic biology is the construction of genetic circuits, networks of interacting regulatory molecules that can manipulate information flow in living cells. Fundamental to genetic circuit design is the ability to make user-defined molecular interactions. Owing to the relative ease of engineering specific and orthogonal protein–DNA interactions, synthetic biologists have made great strides in building and exploring applications of transcriptional circuits [170, 191,192,193,194,195,196,197]. Building blocks for these circuits include naturally occurring microbial transcription factors (TFs) with well-defined DNA binding motifs as well as synthetic TFs that use programmable DNA-binding domains, such as zinc fingers, TALEs, or CRISPR/dCas9 [198,199,200]. Similarly, synthetic RNA-based circuits have been designed to great effect by exploiting simple Watson–Crick base-pairing interactions [201,202,203,204]. On the other hand, engineering protein-based systems, such as synthetic signaling circuits, has been more limited in part because of the inherent difficulty in making programmable protein–protein interactions [205, 206]. To date, synthetic biologists have relied on a relatively limited set of well-characterized, folded interaction domains to accomplish this task, for example, using leucine zippers and PDZ domains to enable recruitment of signaling proteins and pathway modulators [207,208,209]. More recent work has begun to expand this protein toolkit to viral proteases [210, 211] and even synthetic phospho-regulon motifs for building phosphorylation circuits [212].

Protein self-assembly could be used to enhance many of these efforts. For example, in eukaryotic transcriptional circuits, it is well-appreciated that genes are naturally regulated by large, multivalent TF complexes, rather than through the one-to-one interactions commonly used in synthetic circuits. The regulation of genes via multivalent assemblies provides opportunities to integrate many signals at a promoter and to achieve highly cooperative (‘digital’) transcriptional responses. In an exciting and extreme case of this, condensate formation at genomic loci has recently been shown to be associated with eukaryotic transcription initiation, and has been implicated in enabling highly cooperative and robust gene regulation [126, 137, 139]. Our group recently pioneered a method for engineering synthetic, multivalent TF complexes that utilize cooperative assembly to regulate genes, and showed that cooperative TF assemblies enable the construction of genetic circuits with complex signal processing behavior in yeast [192]. This framework could be extended to incorporate post-translationally regulated IDRs [213] in order to explore whether transcriptional condensates can be synthetically engineered and used to enable new forms of transcriptional control [214].
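To make the notion of a cooperative ('digital') transcriptional response concrete, the sketch below uses a Hill function, a standard phenomenological description of cooperative binding. It is an illustrative assumption and not the model used in [192]. The function names and the parameters `k` (TF concentration giving half-maximal activity) and `n` (Hill coefficient, used here as a proxy for the degree of cooperativity) are hypothetical.

```python
def hill_activation(tf, k=1.0, n=1.0):
    """Fractional promoter activity at TF concentration `tf` (arbitrary units).

    Assumption: promoter activity follows a Hill function with coefficient `n`.
    """
    return tf**n / (k**n + tf**n)


def tf_for_activity(y, k=1.0, n=1.0):
    """Invert the Hill function: TF concentration giving fractional activity `y`."""
    return k * (y / (1.0 - y)) ** (1.0 / n)


def fold_input_for_switch(n, low=0.1, high=0.9):
    """Fold change in TF needed to move activity from `low` to `high`."""
    return tf_for_activity(high, n=n) / tf_for_activity(low, n=n)


if __name__ == "__main__":
    for n in (1, 2, 4):
        fold = fold_input_for_switch(n)
        print(f"n={n}: {fold:.1f}-fold change in TF for 10% -> 90% activity")
```

Under this assumed model, a non-cooperative promoter (n = 1) needs an 81-fold change in TF concentration to switch from 10% to 90% activity, whereas n = 4 needs only a 3-fold change. This is the sense in which higher cooperativity yields a more switch-like response.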

Protein self-assembly also represents an alternative mode of programming regulatory connections for synthetic protein circuits, one that can efficiently facilitate the creation of many multivalent interactions. For example, by creating fusions to modular IDRs, signaling proteins can be directed to scaffold or phase separate, thereby increasing the specificity and efficiency of a signaling task. Moreover, the formation of these assemblies can be controlled in a variety of ways. One interesting way is through post-translational modifications, which are known to dramatically alter the biophysical properties of an IDR. For example, phosphorylation can dismantle RNA granules [215], while methylation [216], acetylation [213], and SUMOylation [217] can promote the dissolution of several types of condensates. Exploiting these post-translational modifications, either via synthetic or endogenous mechanisms, could provide a means to control the formation and reversal of protein assemblies that bring together regulatory proteins of interest.

Constructing synthetic organelles

The ability to control the formation of assemblies offers the intriguing possibility of compartmentalizing biochemical reactions into spatially separated “synthetic organelles”. These could be programmed to serve as factories for the production of complex chemicals [218], as signal transduction hubs in synthetic signaling circuits, or as spatially separate synthetic processing units inside the cell. This strategy offers a number of unique advantages for synthetic biology. First, constraining reaction components into a small compartment creates a “reaction crucible” that can increase the efficiency of enzymatic reactions [219]. Second, biochemical specificity is inherently enforced by spatially separated compartments, which effectively insulates synthetic circuits and addresses component cross-talk, a key challenge in synthetic biology. Finally, multiple co-existing assemblies could in principle be encoded in a single cell, each performing new and different reactions. In a very recent and striking example of this concept, researchers designed an artificial membraneless organelle capable of sequestering and supporting orthogonal protein translation machinery (mRNA, suppressor tRNA, unnatural amino acid tRNA synthetase, and ribosomes) to efficiently produce proteins that incorporate unnatural amino acids [220] (Fig. 3).

The formation of intracellular organelles can be synthetically and spatiotemporally controlled by new methods, such as optoDroplets [219]. OptoDroplets are composed of a protein fusion between the IDRs of RBPs, such as FUS and DDX4, and the blue-light inducible oligomerization domain CRY2. Light-stimulated oligomerization of CRY2 serves to increase local concentration and nucleate the formation of assemblies. Critically, the assembly properties of optoDroplets can be adjusted based on the protein fusion and the light stimulation. Low intensity light and short exposures lead to reversible droplets, whereas high intensity or increased exposure induces formation of more stable, amyloid-like aggregates.

Synthetic organelles that carry out desired reactions can also be engineered with other classes of self-assembling proteins, including modular folded proteins. One notable example involves the encapsulin family of proteins [221]. These bacterial proteins assemble into large, hollow nanocompartments, which can be loaded with cargo proteins that have been equipped with an encapsulation tag [222]. By tagging enzymes of a biochemical pathway, a desired reaction can be physically constrained and efficiently performed within the nanocompartment [223].

Cellular sensing and signal processing

Cells are exquisitely sensitive to a diverse array of chemical and biological stimuli and are able to actuate appropriate responses to these signals. Synthetic biologists aim to co-opt these systems in order to create engineered cellular sensors and signal processing devices that are responsive to desired ligands and stimuli. Protein assembly systems have unique features that can potentiate these efforts. For example, protein assemblies can undergo dramatic changes in structure in response to small variations in environmental conditions. A striking example of this sensitivity involves the yeast stress granule polyA-binding protein 1 (Pab1), which undergoes phase separation and hydrogel formation in response to increases in temperature [23]. Specifically, in cells, this protein is soluble in the cytoplasm at 30 °C and readily forms droplets at 46 °C. In vitro studies showed that, within this range of temperatures, Pab1 forms droplets of increasing size as temperature increases.

In addition to considering assemblies as direct sensors of physiological changes, significant technological efforts are underway to enable sensitive detection and processing of aggregation states inside living cells. One recently developed technology, termed distributed amphifluoric FRET (DamFRET), uses fusions to a photoconvertible fluorophore to quantify protein aggregation states based on FRET signal [32]. DamFRET can provide information about the proximity and conformation of protein monomers as a function of their concentration. As such, it is useful for quantifying the kinetics of nucleation of a protein aggregate in cells. Nucleation is a rare, kinetically slow, and unfavorable step that precedes a highly favorable elongation step, leading to a stable aggregate. Since most proteins exist in solution at concentrations near the nucleation barrier, stochastic nucleation events may occur. By increasing protein concentration, the probability of nucleation events increases in a way that is intimately related to the sequence properties of the biomolecule. By evaluating how mutations in protein sequence affect the critical concentration for nucleation, DamFRET experiments are able to elucidate the sequence properties that affect nucleation kinetics and therefore favor or disfavor protein aggregation.
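The concentration dependence of nucleation described above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that nucleation is a Poisson process whose rate scales as a power of monomer concentration; this is not the analysis used in DamFRET [32]. The function name, the rate constant `k`, and the exponent `order` are hypothetical, with `order` standing in for sequence-dependent properties.

```python
import math


def nucleation_probability(conc, time, k=1e-3, order=3.0):
    """Probability that at least one nucleation event has occurred by `time`.

    Assumption: nucleation events arrive as a Poisson process with rate
    k * conc**order (arbitrary units), so the waiting time is exponential.
    """
    rate = k * conc**order
    return 1.0 - math.exp(-rate * time)


if __name__ == "__main__":
    # Compare a weakly and a strongly concentration-dependent hypothetical sequence.
    for order in (1.0, 3.0):
        probs = [nucleation_probability(c, time=1.0, order=order) for c in (1, 5, 10, 20)]
        print(f"order={order}: " + ", ".join(f"{p:.3f}" for p in probs))
```

In this toy model, a larger `order` makes the transition from "almost never nucleates" to "almost always nucleates" occur over a narrower concentration range. This is one simple way to picture how sequence properties could shape the concentration dependence of nucleation.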

Changes in aggregation can also be coupled to, and used to control, downstream cellular processes. One example of this capability was demonstrated in the design of the yTRAP (yeast Transcriptional Reporting of Aggregating Protein) system, a genetic tool we developed for high-throughput sensing and control of protein aggregation states in yeast cells [117] (Fig. 3). Specifically, by fusing an aggregation-prone protein domain of interest to a synthetic TF, the activity of the synthetic TF is coupled to the solubility state of the protein domain of interest. Therefore, in the “soluble state”, the yTRAP module is free to regulate its cognate synthetic reporter locus, whereas in the “aggregated state”, the module participates in cellular aggregation, which quantitatively alters transcriptional activation of the reporter. This framework enables high-throughput genetic and chemical screens to discover aggregation-prone domains and modulators of their aggregation. Additionally, this mechanism could be adapted to create more elaborate synthetic protein assembly systems that control other cellular processes.

Engineering memory and inheritance

Memory is fundamental to computation by man-made devices. Similarly, the ability to store memory of past events is a universal feature of living systems, and is a requirement for a number of fundamental biological processes, such as environmental adaptation, cellular differentiation, and multicellular development. Biological memory in this case is defined as the conversion of a transient signal into a sustained response. Implementing synthetic systems that achieve cellular memory has been a long-standing goal of synthetic biologists, dating back to the origins of the field. Among the first artificial genetic circuits reported was the genetic toggle switch, in which two bacterial transcriptional repressors were arranged in a mutually inhibitory network to give rise to bistability, i.e., a system that can switch between two stable states [224]. Since then, many other molecular mechanisms for encoding cellular memory, naturally inspired or otherwise, have been implemented and explored [225, 226]. These can be broadly divided into epigenetic mechanisms (transcriptional feedback loops, heritable chromatin changes, etc.) [227,228,229] and inducible DNA mutations/alterations [230,231,232,233,234]. Taken together, this work has yielded foundational synthetic elements for building more sophisticated biological systems that enable cell state changes, memory of gene expression states, and cellular devices that record lineage and environmental information.
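The definition of memory as the conversion of a transient signal into a sustained response can be illustrated with a minimal simulation of mutual repression. The sketch below uses a commonly used dimensionless two-repressor model with simple Euler integration. The parameter values (`alpha`, `n`), the function name, and the treatment of the transient signal as a temporary increase in the decay of one repressor are illustrative assumptions, not details taken from [224].

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000, pulse=None):
    """Euler-integrate a symmetric mutual-repression (toggle switch) model.

    u and v are dimensionless levels of two repressors that inhibit each other:
        du/dt = alpha / (1 + v**n) - u
        dv/dt = alpha / (1 + u**n) - v
    `pulse` is an optional (start_step, end_step, extra_decay) tuple that
    transiently speeds up the decay of u (assumed stand-in for an inducer).
    """
    u, v = u0, v0
    for step in range(steps):
        extra_decay = 0.0
        if pulse is not None and pulse[0] <= step < pulse[1]:
            extra_decay = pulse[2]
        du = alpha / (1.0 + v**n) - (1.0 + extra_decay) * u
        dv = alpha / (1.0 + u**n) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


if __name__ == "__main__":
    # Without a signal, the system stays in the state where u is high.
    print("no pulse:   u=%.2f v=%.2f" % simulate_toggle(9.9, 0.1))
    # A transient pulse flips the switch, and the new state persists afterwards.
    print("with pulse: u=%.2f v=%.2f" % simulate_toggle(9.9, 0.1, pulse=(1000, 2000, 20.0)))
```

With these assumed parameters the system has two stable states (u high and v low, or the reverse). The pulse lasts only a fraction of the simulated time, yet the system remains in the flipped state long after the pulse ends, which is the defining behavior of a bistable memory element.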

Yeast prions have several properties that, in principle, make them excellent candidates for building stable synthetic memory [225]. First, they exhibit bistability, meaning that a given cell can stably exist in either a [prion−] or [PRION+] phenotypic state. Second, cells can reversibly transition between the two stable states in all-or-none fashion [235]. Third, the states propagate for long biological time scales: because the aggregated prion conformation is transmitted through the cytoplasm, these conformers and their associated phenotypes are robustly inherited by progeny. Taken together, these properties form the basis of bistable switches that can set, reset, and store long-lived biological memory [236]. These capabilities were recently demonstrated in the construction of a synthetic memory device that recorded a transient environmental event into a population of yeast cells using prion switching [117]. Specifically, by placing expression of a novel [PSI+]-inducing factor under the control of a temperature-sensitive promoter, cells could be programmed to remember a short exposure to elevated temperature more than ten generations later. These types of prion-based memory elements could be deployed in populations to record and report on environmental variables experienced in natural or industrial contexts, such as in industrial bioreactors.

Prion conformations and their associated phenotypes are inherited in a dominant non-Mendelian fashion [48]. As such, they effectively act as the epigenetic analogs of gene drives, genetic systems that bias the standard Mendelian inheritance of a specific allele to increase its prevalence in a population [237]. Manipulating these elements could thus open up the possibility for driving or reshaping the inheritance of an epigenetic trait in a population. As a first step toward this goal, prion alleles with the propensity to cure prions were identified and used to construct anti-prion drives (APDs), systems that can reverse the dominant inheritance of prions (and in some cases eliminate them) [117]. In the future, engineered strains carrying prion and APD elements could be deployed in wild-type populations to compete and perform population-level control of desired epigenetic traits. Building out this toolkit of manipulable prion-like elements should provide synthetic biologists with new strategies for engineering computational and evolutionary functions into cells and populations.

Concluding remarks

Protein self-assembly remains a little-explored and underexploited mechanism for synthetic biology and cellular engineering, but it could offer many advantages, such as the ability to program emergent nonlinear behaviors and catalyze drastic cellular changes with a relatively small set of constituent parts. However, the unique properties of these systems can also make them difficult to design and manipulate. Challenges associated with designing and manipulating disordered and self-assembling proteins include: (1) The high false positive rate of structure prediction algorithms. This necessitates experimental validation of assembly formation for each predicted protein or domain. (2) Predicting and designing the stoichiometry of disordered protein assemblies. This will require an increased understanding of the basic biophysics of assembly formation; alternatively, the problem can be sidestepped by focusing on applications that are insensitive to stoichiometry or by using structured protein domains to introduce well-defined molecular interactions. (3) Lack of methods for precisely controlling formation and dissociation of assemblies. Here, the use of engineered post-translational modifications known to modulate assembly formation could be highly advantageous, but will require the development of synthetic tools for exacting control over these signaling events. Overall, through the combination of synthetic biology manipulation, quantitative studies, and an increased understanding of the underlying biophysics, we can make possible an era of creating designer protein assemblies for application and study.