Gene organization and evolutionary history

Proteolysis within the membrane was discovered in seemingly rare contexts nearly 15 years ago [1–3]. It is now widely appreciated that this fascinating regulatory paradigm permeates most areas of modern cell biology [4–7]. Of the three protease families that catalyze intramembrane proteolysis, rhomboid enzymes are the only family that was not discovered through the direct study of human disease. The name 'rhomboid' has its origin deep in the rich folklore of Drosophila genetics. Rhomboid emerged from the historic quest to identify all genes required to organize the construction of a free-living organism from a single cell [8, 9]. Because genes were named after the altered appearance of the mutant larval cuticle, the misshapen, rhombus-like head skeleton of the mutant embryo earned rhomboid its name. Mutations in the gene encoding the growth factor that rhomboid activates yielded indistinguishable head-skeleton defects, and this gene was named spitz ('pointed' in German).

The rhomboid gene was cloned and sequenced by Bier and colleagues in 1990, revealing a seven-transmembrane (7TM) protein with no homology to any sequence known at the time [10]. The spitz sequence was more informative, encoding a clear epidermal growth factor (EGF)-like protein [11]. The fact that rhomboid mirrored spitz phenotypically [9] and encoded a 7TM protein led to the proposal that it might be a serpentine receptor for Spitz signaling [11]. But as genome sequencing of diverse organisms began to reveal rhomboid homologs in every form of cellular life [12], it became apparent that rhomboid proteins might lie at the core of very diverse biological regulation.

Sequence analysis, however, yielded no clues about the underlying biochemical function of rhomboid proteins, and no other homolog was as well studied as Drosophila rhomboid. Instead, a decade of Drosophila genetics set the stage for a biochemical approach: rhomboid was definitively implicated as an upstream activator of Spitz in the signal-sending cell, providing a framework for analyzing its molecular function [12–15]. A focused analysis of Spitz activation eventually yielded four key pieces of the puzzle [16]: (i) substoichiometric levels of Rhomboid triggered Spitz proteolysis, implying that Rhomboid acts enzymatically; (ii) proteolysis depends absolutely on only four Rhomboid residues, whose identity is consistent with serine protease catalysis; (iii) Spitz proteolysis is blocked only by serine protease inhibitors; and (iv) Spitz is cleaved within its TM segment at a depth similar to that of the putative rhomboid catalytic serine. These pieces fit together into a model in which rhomboid acts as an intramembrane serine protease for Spitz [16], a model confirmed 4 years later by reconstituting cleavage with pure proteins [17–19].
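The inference in point (i) rests on simple arithmetic: if more substrate is cleaved than there are enzyme molecules present, each enzyme molecule must have acted more than once, which a single-use (stoichiometric) factor cannot do. A minimal sketch of that reasoning follows; the function name and the 1:50 enzyme-to-substrate ratio with 80% cleavage are hypothetical values chosen for illustration, not data from the Spitz experiments.

```python
def minimum_turnovers(enzyme: float, substrate: float, fraction_cleaved: float) -> float:
    """Lower bound on cleavage events per enzyme molecule.

    Amounts can be in any consistent unit (e.g. molar). A purely
    stoichiometric, single-use factor could never exceed a value of 1.
    """
    if enzyme <= 0:
        raise ValueError("enzyme amount must be positive")
    return fraction_cleaved * substrate / enzyme


# Hypothetical example: 1 part enzyme to 50 parts substrate, 80% of substrate cleaved.
n = minimum_turnovers(enzyme=1.0, substrate=50.0, fraction_cleaved=0.8)
print(n, "-> catalytic" if n > 1 else "-> consistent with single-use action")
# prints: 40.0 -> catalytic
```

Any value above 1 implies repeated turnover; the real experiments only need to show that cleavage outstrips the available enzyme, not the exact number of turnovers.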

The ability to study Spitz proteolysis as a direct test of rhomboid activity was used to show that even distant bacterial homologs are functional intramembrane serine proteases [20]. Most bacterial species are now known to encode one rhomboid protease, some encode two, and very few encode three [21]. Rhomboid proteases are also present in many, if not most, Archaea, but the greatest expansion occurred in multicellular organisms and some parasitic protozoa. Although the human, mouse and Drosophila genomes each encode at least seven rhomboid genes, the largest numbers are found in plants (13 in Arabidopsis), which lack EGF signaling [21–23]. In many of these diverse organisms, at least one rhomboid has been directly shown to have proteolytic activity (Table 1) [16, 20, 23–26].

Table 1 Known rhomboid protease substrates and functions across evolution

Rhomboid proteases are found in all branches of life, yet sequence identity across the family is strikingly low, only around 6% [12, 21, 22]. We suggest that this is not despite rhomboid proteases being so widespread, but because of it. The divergence is exacerbated by their sequences being predominantly transmembrane and thus subject to different evolutionary pressures [27]. As a result, phylogenetic analyses have been noisy, yielding few incontrovertible conclusions and inevitably fueling debate. Their evolutionary origin is of particular interest: rhomboid proteins have been argued to be perhaps the most widely distributed membrane proteins in nature [21] (Figure 1). This near ubiquity is instinctively viewed as evidence of an ancient enzyme family that evolved early [12]. Although this is likely if the last universal common ancestor already encoded several different rhomboid proteins, phylogenetic analysis has also raised the possibility of a different history, in which rhomboid proteins were a later invention of bacteria that spread rapidly to most other organisms [21]. This scenario requires a controversial amount of horizontal gene transfer to populate all kingdoms of life. The true phylogenetic history of rhomboid proteins remains a matter of unresolved debate, but three observations serve as valuable guiding principles.
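Identity figures such as the ~6% quoted above are derived from sequence alignments, and the number obtained depends on how gaps are treated. A minimal sketch, assuming two sequences that are already aligned (with '-' marking gaps) and using one common convention, identical positions divided by gap-free aligned positions. The function name and the toy fragments are hypothetical, and this is not necessarily how the cited family-wide estimates were computed.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    This is only one of several conventions; dividing by the full
    alignment length instead gives a lower value for gappy alignments.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy fragments, not real rhomboid sequences.
seq1 = "GASG-AVYAL"
seq2 = "GLSGNAAFAL"
print(round(percent_identity(seq1, seq2), 1))  # 6 identical of 9 compared -> 66.7
```

Across a whole family, pairwise values like this are typically summarized over many sequence pairs, which is why a widely scattered family can have a very low overall figure even when some subgroups are closely related.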

Figure 1

Phylogenetic tree of rhomboid proteins. Rhomboid protein sequences are widely scattered throughout all branches of cellular life. A subset of 109 Rhomboid and Derlin family protein reference sequences, retrieved from the NCBI RefSeq database, was chosen to illustrate their diversity. MEGA 5.05 was used to align sequences by MUSCLE and construct an unrooted neighbor-joining phylogenetic tree. Branches are labeled according to their common characteristics and are shaded or outlined to denote active or inactive protease sequences, respectively. Individual sequence names are colored black, blue, or red to indicate a 6TM, 6+1TM, or 1+6TM arrangement, respectively, with each RefSeq accession number included within parentheses. Despite the tremendous number and diversity of rhomboid proteins, structures of only two 6TM rhomboid proteases have been solved (yellow stars).
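The caption describes aligning sequences with MUSCLE in MEGA and then building an unrooted neighbor-joining tree. As an illustration of the tree-building step only, the sketch below implements the standard neighbor-joining algorithm on a precomputed distance matrix and returns a Newick string. It is not MEGA's implementation; the matrix is a textbook five-taxon toy example rather than rhomboid data, and in a real analysis the distances would first be estimated from the alignment under some substitution model (an assumption, since the caption does not specify one).

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor-joining and return it in Newick format.

    names: taxon labels; dist: symmetric distance matrix (list of lists)
    in the same order as names. Requires at least three taxa.
    """
    if len(names) < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    # Dict-of-dicts so that new internal nodes can be added by label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Total distance from each node to all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing the Q-criterion (n-2)*d(i,j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distance from u to every other remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: attach them to a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, matrix))
# prints: (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Joining the pair that minimizes the Q-criterion, rather than simply the closest pair, avoids assuming equal evolutionary rates across lineages (the assumption made by UPGMA). With highly divergent families, neighbor-joining can also return negative branch lengths, which this sketch does not clamp.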

Characteristic structural features

The first organizing principle emerging from sequence analysis is that rhomboid proteases come in three distinct topological flavors (Figure 2) [21]. The simplest consists of the 6TM core, which is itself the smallest catalytically active unit. This form predominates in bacteria, but is also represented, albeit more rarely, in eukaryotes, including animals. To this basic unit eukaryotes add a seventh TM segment following the 6TM core (6+1TM form). Five of the seven Drosophila, human and mouse rhomboid proteins are of this form. Analogous 7TM forms also occur in bacteria, but are rare. Lastly, a distinct 7TM form exists in endosymbiotic organelles, with the seventh TM preceding the 6TM core (1+6TM form). The best studied are those imported into mitochondria [28–30], although interest in plastid-resident rhomboid proteins has recently grown [31]. Although sequence analysis clearly supports these three topological classes, their functional relevance is unclear. They are expected to confer different biochemical properties, yet current evidence, albeit limited, indicates that many bacterial 6TM forms and eukaryotic 6+1TM forms show similar activity against surrogate substrates, including Spitz [17, 32, 33].

Figure 2

Rhomboid proteins exist in three topological forms. The smallest, catalytically active form of a rhomboid protease consists of a 6TM core, with variable amino termini (dashed lines). Most eukaryotic and mitochondrial rhomboid proteases have an additional TM segment, added either carboxy-terminally (eukaryotes, blue) or amino-terminally (mitochondria, red) to the 6TM core, as depicted. Catalytic residues are in yellow for nucleophilic chemistry (hydrolysis) and white for electrophilic residues (oxyanion transition state stabilization). Cytoplasm is down in each diagram.

Although protease activity has been reconstituted in vitro with both 6TM and 6+1TM rhomboid forms, only the 6TM GlpG proteins from Escherichia coli and Haemophilus influenzae have proven amenable to structural analysis [34–37]. This major breakthrough, the first atomic-resolution structure of any intramembrane protease, not only confirmed that proteolysis is intramembrane and catalyzed by a serine protease apparatus, but also revealed an unanticipated and complex architecture. Although a thorough description is beyond the scope of the current discussion (see [38] for a comprehensive review), two features are characteristic (Figure 3). First, although most TM helices are long and run roughly perpendicular to the membrane, the fourth TM segment is slanted relative to the others and enters the center of the protein as an extended loop, converting to an α helix at the catalytic serine. More unexpected was the orientation of the long L1 loop connecting TMs 1 and 2, which forms a lateral hairpin that lies half submerged in the membrane. This feature, not encountered before or since, has major structural implications and makes the protein highly asymmetric. The structures of the other rhomboid forms are assumed to be analogous, and recent modeling of the mitochondrial 1+6TM form on E. coli GlpG hints at an unanticipated level of similarity [39].

Figure 3

Structural features of the rhomboid 6TM core. The crystal structure of the 6TM core of the E. coli rhomboid protease GlpG (PDB 2NRF molecule A) is shown from three vantage points ('top view' is looking at the cell from the outside with the membrane in the plane of the page). The protein forms a compact helical bundle, with two characteristic features. A short and slanted TM4 (black) forms a helix below the catalytic serine (circled in the 'back' view), but an extended loop (L3) above it. This slanted trajectory and extended loop create a cavity above the serine. The L1 loop (purple) forms a hairpin structure that nestles between TMs 1 and 3 and protrudes laterally into the outer leaflet of the membrane (red dashed lines representing the membrane interface are provided only for reference). Catalytic dyad residues serine and histidine are in cyan; putative oxyanion-stabilizing electrophilic asparagine and histidine residues are in red.

In addition to the number of TMs, two further variations provide potential for additional rhomboid diversity. First, in all three forms, the cytosolic amino termini are highly variable, ranging from large domains to none at all. Their implications, however, remain unclear, at least partly because obtaining well-diffracting crystals required removal of this domain, leaving its relationship to the catalytic core speculative. At the simplest level, these domains may house sorting signals [40].

Second, rhomboid proteins that clearly lack catalytic residues are often encountered. These should be considered rhomboid proteins but not rhomboid proteases. Two predominant clusters are a distinct 6+1TM form in animals, called iRhom proteins [22, 41], and a 6TM form represented widely in eukaryotes by the Derlin proteins [42–45]. Both have been implicated in endoplasmic reticulum-associated degradation (ERAD). Derlins have clear sequence homology near the membrane-submerged L1 loop, but also, less conspicuously, along their entire length, and are thus likely to adopt a GlpG-like 6TM structure. Although Derlins are clearly not proteolytic, their potential similarity to other aspects of the rhomboid protease mechanism should not be discounted at this early stage (a topic beyond the scope of this review).

Localization and function

The second guiding principle stems from the tremendous diversity of organisms that encode rhomboid enzymes. Since these include organisms that do not encode any known forms of cell-to-cell communication, sequence information implies that rhomboid proteins perform an ancient and fundamental role in cell biology. This function is not essential for cell survival, however, because several lineages are missing rhomboid genes entirely, presumably by gene loss [21]. Although defining the cellular functions of rhomboid proteases has proven a persistent challenge, focused investigations have succeeded in documenting the function of at least one rhomboid in nearly a dozen organisms (Table 1). These functions are usually regulated by substrate trafficking, and fall into four broad categories (Figure 4).

Figure 4

The cellular roles of rhomboid proteases fall into four categories. Top left: Rhomboid proteases initiate EGF signaling during Drosophila development. Rhomboid-1 is localized in the Golgi apparatus, and cleaves Spitz (green) after it is transported from the endoplasmic reticulum by Star (purple). Cleaved Spitz is secreted to activate EGF signaling in neighboring cells. Top right: The mitochondrial rhomboid PARL cleaves PINK1 to reduce Parkin recruitment to mitochondria and downregulate mitophagy. Cleavage may depend on changes in PINK1 topogenesis in response to mitochondrial potential. Bottom right: Malaria parasite-encoded rhomboid proteases cleave adhesins to disassemble the junction formed between parasite and host erythrocyte at the end of invasion. Note that adhesins (in black), initially held in internal organelles, encounter rhomboid only when they are secreted onto the surface and motored to the posterior of the parasite. Bottom left: The Providencia rhomboid protease AarA activates TatA by removing a small amino-terminal extension. This allows TatA to assemble into the machinery required for protein (and presumably quorum-sensing signal) export. In the left two roles rhomboid cleavage activates a latent factor whereas in the right two roles cleavage inactivates the target protein.

First, rhomboid proteases initiate animal cell signaling by releasing growth factors from the membrane. This function emerged from detailed genetic study of Drosophila development: rhomboid proteases are localized in the Golgi apparatus and act as the signal-generating component, cleaving Spitz to initiate the pathway in neighboring cells [16, 46]. A rhomboid also regulates EGF signaling during Caenorhabditis elegans vulval development, but there CeROM-1 has a surprisingly minor role as a target of EGF signaling that sets up a paracrine loop to amplify and spread the signal [47]. Even less is clear in mammals: recent investigations have localized rhomboid proteins to the secretory pathway and cell surface, and have begun to uncover increased rhomboid expression in cancer cells, with potential links to growth factor signaling [24, 48, 49]. This is not limited to active rhomboid proteases, however; expression of the iRhom RHBDF1, which localizes to the endoplasmic reticulum in human epithelial cancer cells, increased secretion of the EGF ligand transforming growth factor-α [50]. Accordingly, RHBDF1 silencing reduced pathway activation, as measured by phosphorylation of the EGF receptor (EGFR), ERK and AKT, and limited tumor growth in mice [51]. Yet the Drosophila homolog was recently found to have the opposite effect, decreasing EGFR signaling by promoting the ERAD-mediated degradation of EGF ligands [41]. The basis of this remarkable discrepancy is currently unclear; knockout mouse studies are expected to clarify the physiological roles of rhomboid proteins.

Recent studies have also placed the mitochondrial rhomboid protease at the nexus of key pathways that govern mitochondrial fusion, mitophagy and apoptosis. All mitochondrial rhomboid proteins are encoded in the nuclear genome and imported into mitochondria. The main function of the yeast mitochondrial rhomboid Pcp1 is to release the dynamin-like GTPase Mgm1 from the membrane [28, 52, 53]. Because Mgm1 is essential for mitochondrial fusion, and Mgm1 cleavage occurs only in healthy mitochondria, fusion is restricted to healthy organelles [54]. A similar function was described in Drosophila [30], but genetic interactions soon revealed further complexity in metazoans: the mitochondrial rhomboid DmRho-7 also participates in the Parkin/PINK1 pathway that malfunctions in Parkinson's disease [55]. It has recently become clear that the human mitochondrial rhomboid PARL cleaves PINK1 to suppress its ability to recruit the Parkin ubiquitin ligase onto mitochondria [56–58]. Without PARL cleavage, PINK1 accumulates in mitochondria and fails to be recruited properly to damaged mitochondria. A PARL knockout mouse suffers severe atrophy several months after birth, resulting from malformed mitochondria and elevated apoptosis, although without mitochondrial fusion defects [29]. PARL has also been implicated in suppressing apoptosis in lymphocytes, potentially through a different substrate, high temperature requirement A2 (HtrA2, also called Omi) [59]. Intriguingly, mutations in PARL have recently been found in patients with Parkinson's disease [58] and diabetes [60], although the significance of these mutations for disease remains speculative.

The third category of rhomboid function was revealed in Providencia stuartii, a Gram-negative bacterial pathogen. Genetic screens identified its rhomboid homolog, AarA, as required for production of an unidentified quorum-sensing signal [61, 62]. Once the similarity to rhomboid was noted [63], AarA was shown to cleave Spitz [20] and to partially rescue tissue development in Drosophila rhomboid mutants [64]. Historically, the intriguing parallel between activating Drosophila EGF signaling and producing an autoinducer for bacterial quorum sensing, both by a rhomboid, received much attention [63, 65]. But the similarity proved superficial when the substrate was identified as TatA, a component of the twin-arginine translocation machinery [66]. AarA thus removes a short amino-terminal extension, presumably to activate the machinery for signal secretion rather than activating the signal itself. TatA from other bacteria, including E. coli, lacks this short extension and is immediately active, so the AarA function is an exception. Nevertheless, this is the only known function for a rhomboid protease in any prokaryote, and it dramatically highlights the apparent diversity of rhomboid function even within similar bacteria.

Finally, rhomboid proteases help to dismantle adhesive junctions in unicellular eukaryotic parasites. This is the only role that was discovered by searching for rhomboid targets using substrate specificity determinants [33]. The adhesins of Plasmodium and Toxoplasma are necessary for host-cell invasion, making them essential for the survival of these obligate intracellular parasites [67]. These parasites encode six or more rhomboid proteases, two of which in each organism are known to process these adhesins at the end of the invasion program [25, 26, 68–70]. The precise need for this dismantling is not entirely clear, but it has been thought to free the parasite from being tethered to the host plasma membrane. Recent knockdown experiments indicate that this processing is important for efficient invasion [71], although its full role is incompletely understood and may extend to later functions during parasite replication within the host cell [72]. Even the non-cell-invasive Entamoeba histolytica encodes a highly active rhomboid protease, which localizes to the parasite surface but relocalizes to phagosomes during feeding and to the bud neck during immune evasion, perhaps to shed surface proteins, including lectins [73, 74]. The functions of Plasmodium and Toxoplasma rhomboid proteases not involved in invasion are not yet understood [75], and many other parasites encode rhomboid enzymes whose functions have never been explored.

Mechanism

Perhaps the most powerful, yet subtle, guiding principle that can be deduced from the near ubiquity of rhomboid proteases is that they possess a biochemical property that is both very rare and highly useful. But what is it? Solving this riddle requires understanding the enzymatic features of rhomboid proteases, and remarkable progress has been made towards this goal (reviewed in [38]).

There is now proof beyond doubt that rhomboid enzymes are serine proteases. This includes reconstitution of proteolysis with pure proteins [17, 19], protease inhibitor profiling [16, 17, 76], extensive analysis of residues essential for activity [16, 18, 19, 77], and structural visualization of the catalytic residues, both alone and with a covalently bound inhibitor [34–37, 78]. Moreover, the initial paradox of how water reaches the membrane-immersed active site for hydrolysis was largely resolved by structural analyses [34–37]: the active site lies about 10 Å below the presumed membrane surface, but an open cavity above it allows water access (Figure 3).

Structure-function analyses of rhomboid proteases have also revealed several proteolytic properties that set them apart from most serine proteases. These differences are clear evidence of convergent evolution to a serine protease mechanism along an independent path. First, structural analysis indicates that nucleophilic catalysis is achieved by a histidine-serine catalytic dyad rather than the more common aspartate-histidine-serine catalytic triad [34–37]. Catalytic dyads have been noted in a minority of exceptional serine proteases [79]. Second, the identity of the residues that stabilize the oxyanion transition state is uncertain, but this stabilization is most likely mediated by asparagine and/or histidine side chains [36, 78] (Figure 3). Use of an asparagine for oxyanion stabilization is uncommon, but strikingly analogous to the mechanism of the conventional serine protease subtilisin [80].

The third unusual catalytic property of rhomboid proteases relates to the direction in which substrates lie across the active-site cleft relative to the catalytic residues. Substrate orientation was initially thought to resemble that of nearly all other serine proteases [34, 37], but identification of the substrate gate on the opposite side of GlpG from where it was expected mandated that substrates approach the catalytic residues from the so-called 'si' face [35, 77, 81]. This stereochemical arrangement is very uncommon and had previously been encountered only in α/β-hydrolases [82]. Consistent with this stereochemistry, rhomboid proteases resist most canonical serine protease inhibitors and show a weak but specific sensitivity to monocyclic β-lactams [16, 17, 76]. It should be stressed that definitive evidence for substrate orientation, the identity of the oxyanion hole, and the nature of substrate stabilization awaits a co-structure with a peptide substrate.

Rhomboid proteases have largely been interpreted within the framework of established serine protease precedents, which is instructive but does not explain how they differ. Although deciphering the specifics of the catalytic chemistry is essential for designing effective inhibitors, the key functional properties of rhomboid enzymes that matter to the cell are unlikely to be determined by their catalytic mechanism. These defining features most likely result from membrane immersion of the enzyme, and recent investigations have begun to study rhomboid proteases directly as integral membrane proteins.

The greatest impact of membrane immersion is on how substrates and rhomboid proteases behave (reviewed in [38]). The closed ring of TM segments observed in the first crystal structure suggested that something must move to clear a path for lateral substrate entry [34–37]. Only mutations that weaken the packing of TM5 against TM2 were found to enhance protease activity, by up to tenfold, thereby functionally identifying the gate [77, 81, 83]. This dramatic enhancement also revealed that gate opening is the rate-limiting step for intramembrane proteolysis. Molecular dynamics simulations and structural analysis in a bicelle also suggest membrane thinning around GlpG, but its mechanistic implications remain unclear [84, 85]. Investigating the role of the membrane in greater detail promises to reveal the defining features of the rhomboid proteolysis system.
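Why a tenfold activity gain from gate-weakening mutations points to gate opening as the rate-limiting step can be illustrated with a deliberately simplified model: each turnover is treated as two sequential, irreversible first-order steps, gate opening (k_gate) followed by cleavage chemistry (k_chem), so the mean cycle time is the sum of the two mean step times. This ignores substrate binding, lateral diffusion in the membrane and reversibility, and all rate constants are hypothetical; it is a sketch of the logic, not a published kinetic model of GlpG.

```python
def effective_rate(k_gate: float, k_chem: float) -> float:
    """Steady-state turnover rate for two sequential irreversible steps.

    The mean cycle time is 1/k_gate + 1/k_chem, so the overall rate
    constant is its reciprocal.
    """
    return 1.0 / (1.0 / k_gate + 1.0 / k_chem)


def fold_change(k_gate: float, k_chem: float, gate_boost: float = 10.0) -> float:
    """Fold increase in overall rate when only the gate step is accelerated."""
    return effective_rate(k_gate * gate_boost, k_chem) / effective_rate(k_gate, k_chem)


# Hypothetical rate constants in arbitrary units, chosen only for illustration.
print(round(fold_change(k_gate=0.01, k_chem=1.0), 2))  # gate is the slow step: 9.18
print(round(fold_change(k_gate=1.0, k_chem=0.01), 2))  # chemistry is the slow step: 1.01
```

Under these assumptions, accelerating the gate tenfold raises overall activity almost tenfold only when gate opening is the slow step; if chemistry were limiting, the same mutation would barely change activity.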

Frontiers

The rhomboid gene was identified in the Drosophila screens of the late 1970s and early 1980s [8], and it was cloned and sequenced about a decade later [10]. It took another decade, until 2001, for its biochemical function as an intramembrane serine protease to be revealed [16]. A decade has now passed since that turning point, and advances in the intervening period have culminated in rhomboid proteases becoming widely regarded as the best understood of all intramembrane proteases [38]. Biochemical insights and defined roles in parasitic protozoa (reviewed in [6]) place rhomboid study on the cusp of therapeutic application. A major lingering obstacle is our rudimentary understanding of their unusual enzymatic mechanism, but these questions are being pursued intensively, and momentum towards a sophisticated understanding is building [38].

By contrast, defining the cellular roles of rhomboid proteases has been a slow process [86]. Although even early biochemical insights led to the identification of substrates that can be cleaved, whether these candidates are indeed physiological targets, and if so, whether they represent a major rhomboid function, remains unknown. For example, although the study of human RHBDL2 over the past 7 years has uncovered at least three well-cleaved substrates (thrombomodulin [24], B-type ephrins [87] and EGF [48]), it is still unclear which, if any, are physiological targets, and whether their cleavage makes a bona fide contribution to cellular function. Perhaps the most humbling example is E. coli GlpG, whose atomic details have been revealed in over a dozen structures and countless mutants, yet whose cellular function remains a complete mystery [88]. In practice, the bottleneck in these studies has been not the identification of substrate candidates but their validation. Refining search algorithms is therefore unlikely to contribute much towards solving this problem. The urgent need is for approaches that allow enzymes to be studied under physiological settings at higher throughput. These, in turn, will focus biochemical investigations by providing physiological targets and new functional contexts.