Background

The design and construction of microbial cell factories is often focused on the engineering of intracellular enzymes and pathways, and the role of membrane-embedded proteins is often overlooked. It is nevertheless becoming increasingly clear that integral membrane proteins, in particular transporters, are critical for the performance and stability of microbial production strains [1–4]. Membrane-embedded transport proteins mediate the cellular uptake and extrusion of a wide diversity of solutes. As a consequence, adding or engineering a suitable uptake system into a production strain may greatly enhance substrate utilization and flux towards product [5–7]. Likewise, secretion systems increase flux, resolve toxicity-caused limitations, and facilitate product purification by secretion of the desired product to the extracellular environment [8–10].

Besides introducing and engineering known transport proteins for biotechnological applications, there is a critical need to identify novel transporters from the wealth of sequence data that we are currently amassing from nature [11]. Fortunately, integral membrane proteins are typically easy to predict from primary sequence data, as they contain one or more stretches of ~20 hydrophobic amino acid residues (transmembrane segments). Similarly, secreted proteins that contain a hydrophobic, amino-terminal cleavable signal peptide can also be identified using this approach [12–15]. Using sequence analyses and homology, putative membrane proteins may be sorted into functional classes such as receptor proteins or solute transporters, to serve as guidelines for downstream experimental targeting and characterization [16, 17].

To increase the known repertoire of transporters that are of particular interest for cell factory engineering, it is inviting to look at the membrane proteins from organisms that nature has adapted for food sources that are out of reach for conventional microbes. One such example is the anaerobic gut fungi that inhabit the intestines of herbivores such as horses and sheep, where they secrete powerful cellulases and other saccharolytic enzymes that break down recalcitrant plant biomass into digestible sugars [18–21]. A variety of hydrolyzed sugars and low molecular weight cellodextrins are both sensed and transported across fungal bilayers. Although their tremendous biotechnological potential is unquestionable, isolation and cultivation of the gut fungi under laboratory conditions has proven challenging and only a relatively small number of strains have been isolated to date. Moreover, very little is known about the membranes and membrane proteins of these early-branching fungi, and the extreme AT-richness of their genomes has precluded high-quality genomic data from being obtained [22, 23].

Here, we mined transcriptomic data collected from three recently isolated strains of anaerobic fungi, Neocallimastix californiae, Anaeromyces robustus and Piromyces finnis, for integral membrane proteins that would shed light on the physiology of these unusual microorganisms. In particular, we sought to characterize the membrane-bound machinery that underlies their remarkable ability to survive and persist in the competitive, biomass-rich environment of the herbivore gut. We hypothesized that apart from the secreted biomass-degrading enzymes, the fungi possess membrane-embedded transporters and receptors that support the lignocellulolytic lifestyle of the fungi and confer an ecological or evolutionary advantage [24, 25]. Importantly, the transporters and receptors that we identify have the potential to advance metabolic engineering efforts for biomass utilization and conversion in model microbes [19]. Overall, this serves as the first comprehensive study of the membrane protein components within anaerobic gut fungi, providing deeper insight into the physiology of these understudied organisms, and a wealth of transporters and receptors that can be further adopted for strain engineering.

Results and discussion

Integral membrane proteins in anaerobic gut fungi: a bird's-eye view

Recently, three novel strains of anaerobic gut fungi were isolated from animal feces: Neocallimastix californiae (N. californiae) from goat, Anaeromyces robustus (A. robustus) from sheep, and Piromyces finnis (P. finnis) from horse [26]. To assemble a complete transcriptome, RNA was collected from each strain grown on a number of different representative substrates ranging from insoluble plant material and cellulose to soluble carbon sources such as cellobiose and glucose [19]. Here, we identified secreted and integral membrane proteins from over 60,000 transcripts using complementary bioinformatics approaches. As is shown in Fig. 1, about 20% of assembled transcripts in each fungal strain encode proteins that have a predicted signal peptide and/or transmembrane segments [15, 27]. About a third of the trafficked proteins are predicted to be completely secreted to the extracellular environment; among these are cellulases, glucosidases and proteases that allow the fungi to degrade plant material extracellularly into soluble sugars [19, 28]. The remaining two thirds of the trafficked proteins have at least one predicted non-cleaved transmembrane segment, and as such they are likely integral membrane proteins: there are 4353 transcripts encoding putative membrane proteins in N. californiae, 2627 membrane protein transcripts in A. robustus, and 2383 membrane protein transcripts in P. finnis (Fig. 1). Almost half of these proteins are predicted to have only one transmembrane segment, i.e. they are bitopic, and these are displayed separately as these proteins may be cleaved and released to the extracellular environment [15].
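The sorting logic described above (secreted versus bitopic versus multi-spanning membrane proteins) can be sketched in a few lines. This is only an illustrative sketch, not the pipeline used in the study: the `Prediction` record, its field names, and the category labels are hypothetical, and it assumes that the transmembrane count already excludes a cleaved signal peptide.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Per-protein topology prediction (hypothetical record layout)."""
    has_signal_peptide: bool  # e.g. from a SignalP-style predictor
    n_tm_segments: int        # non-cleaved transmembrane segments, e.g. from TMHMM


def classify(p: Prediction) -> str:
    """Sort a predicted protein into one of the trafficking categories."""
    if p.n_tm_segments >= 2:
        return "polytopic membrane protein"
    if p.n_tm_segments == 1:
        # Bitopic proteins are kept separate because they may be cleaved and released.
        return "bitopic membrane protein"
    if p.has_signal_peptide:
        # Signal peptide but no membrane anchor: predicted to be fully secreted.
        return "secreted"
    return "soluble/other"
```

Under these assumptions, a protein with a signal peptide and no transmembrane segment is counted as secreted, while any protein with at least one non-cleaved segment is counted as an integral membrane protein.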

Fig. 1

Integral membrane proteins are identified from gut fungal transcriptomes using bioinformatics filtering. a Displays a quantitative ‘funneling process’, where the total transcriptome is reduced to the membrane protein component by filtering the predicted soluble proteins, antisense transcripts, and extracellularly secreted proteins. b Demonstrates the pipeline used for protein annotation. All possible ORFs are extracted from the assembled transcripts, and protein annotations, gene ontology (GO) terms, and enzyme commission (EC) numbers are obtained by aligning the ORFs to the NCBI database (E ≤ 10⁻³) with BLASTx and comparing the ORFs to the EMBL database with the InterProScan tool. InterProScan utilizes SignalP and TMHMM to predict ER targeting signal peptides and transmembrane domains. Finally, the ORFs are aligned to the TCDB database to identify possible transporters and predict transporter substrates.

As shown in Fig. 2, gene ontology (GO) annotations suggest that at least a third of the membrane proteins in each fungal strain are involved in transport, sensing, signaling or catalysis [29]. Within these groups are pumps and channels for diverse solutes, peptides and proteins; GPCRs and associated factors; and proteins with catalytic activity such as cellulases, chitinases, glucosidases and glycosyl transferases. Many proteins have more than one GO-term and thus more than one putative function. Here each transcript was counted only once, and as a consequence the assignment of these classes is not exhaustive. For example, manual inspection reveals that a number of the ‘Catalysis’ proteins (proteins that have a GO-term ending with ‘-ase’) are transporters that hydrolyze ATP as part of the transport process, and similarly Receptor Tyrosine Kinases are known to have major functions in cellular signaling and sensing [30, 31]. Likewise, although the ‘Other’ category contains proteins that have ‘other functions’ such as adhesion proteins and chaperones, this class also contains a number of transporters and receptors. Overall, initial bioinformatics funneling and sorting of the membrane proteome reveals the expected machinery for a microbe that deconstructs biomass and catabolizes hydrolyzed byproducts. Notably, around half of the predicted membrane proteins do not have a GO-annotation. This is likely because the relatively low natural abundance and amphiphilic nature of membrane proteins render their characterization and classification challenging, and thus they are poorly represented in sequence databases. In particular, small membrane proteins have received much less attention than their larger counterparts, and consequently many of the bitopic membrane proteins fall into the ‘Unknown’ category [32–34]. In addition to these limitations, it is important to note that no high-quality genomic sequences exist to describe the early-branching fungi, and only roughly 30% of each transcriptome can be annotated through comparison to the NCBI databank [19, 35].

Fig. 2

Putative functions of integral membrane proteins in three strains of anaerobic gut fungi as classified by gene ontology (GO). The strains in this study represent three of seven currently acknowledged genera: Neocallimastix californiae, Anaeromyces robustus and Piromyces finnis. Integral membrane protein candidates were binned into one of five functional categories as described in the methods section. The percentages show how many of the predicted integral membrane proteins in each strain fall within a given category.

Transporters in the anaerobic gut fungi

To gain a deeper understanding of the underlying systems that permit the gut fungi to mediate transport of sugars and other metabolites, we aligned assembled transcripts to the transporter classification database (TCDB) using BLASTx [17, 36, 37]. TCDB is a manually curated database that organizes proteins according to function and phylogeny. In TCDB, each transport system receives a five-tiered identity tag to describe its familial relationship and function, and this gives us the opportunity to sort the transporter proteins at finer resolution. As many transporters contain subunits that are only peripherally associated with the membrane, we included all transcripts in this analysis, regardless of whether the proteins were predicted to have transmembrane segments or not. This inclusive approach also allowed us to identify putative beta-barrel membrane proteins that are present in the outer membranes of mitochondria and plastids, and that TMHMM fails to identify since they lack the canonical alpha-helical stretches of hydrophobic amino acid residues [38, 39]. To increase confidence in transporter predictions, we applied a stringent 70% coverage criterion, where 70% of the query sequence must match the subject sequence, and vice versa, with an E-value less than 10⁻³.
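One way to read the reciprocal coverage criterion is as a filter on individual alignments. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the `Hit` record and its field names are hypothetical, coverage is taken as the aligned span divided by the sequence length, and all coordinates and lengths are assumed to be in amino acid residues (for a BLASTx search, nucleotide query coordinates would first have to be converted).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment between a translated query and a database protein.

    Field names are hypothetical; coordinates are 1-based and inclusive,
    and all positions and lengths are in amino acid residues.
    """
    query_id: str
    subject_id: str
    evalue: float
    q_start: int
    q_end: int
    q_len: int
    s_start: int
    s_end: int
    s_len: int


def span_fraction(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by the alignment."""
    return (abs(end - start) + 1) / length


def passes_filter(hit: Hit, min_cov: float = 0.70, max_evalue: float = 1e-3) -> bool:
    """Keep a hit only if both query and subject are covered and the E-value is low."""
    q_cov = span_fraction(hit.q_start, hit.q_end, hit.q_len)
    s_cov = span_fraction(hit.s_start, hit.s_end, hit.s_len)
    return hit.evalue < max_evalue and q_cov >= min_cov and s_cov >= min_cov
```

Requiring coverage in both directions discards hits in which only a single shared domain aligns, which is why isolated subunits or domain fragments can fall below the threshold even when they are genuinely related.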

As shown in Fig. 3, using these stringent criteria, we identified 826 transcripts in Neocallimastix; 554 transcripts in Anaeromyces; and 488 transcripts in Piromyces that encode putative transporter system components. For engineering purposes, it is worth noting that the minimal functional unit of many solute carriers is a single polypeptide, whereas other transporter systems are multi-subunit complexes such as the large nuclear pore complex (multiple copies of ~30 different subunits) [40], meaning that the actual number of complete transport systems is somewhat smaller than that shown here. Also, it is important to take into account the energy requirements of the transporters, that is, whether they are passive channels or use e.g. ATP hydrolysis or an ion gradient to pump solutes across the membrane (Additional file 1: Figure S1). Notably, the placement of a protein in a certain category is not always unequivocal; e.g. here we have placed nucleotide-sugar transporters in the solute transporter category, although most of these are likely localized to the ER and Golgi membranes and their function is in protein biogenesis, as many proteins are expected to be glycosylated while they progress through the secretory pathway [35]. Nevertheless, it is clear that all three strains have a number of conserved transport systems that are involved in protein biogenesis and intracellular sorting, and that approximately half of all transport systems in all three strains are involved in transmembrane translocation of a range of small solutes. These systems are described in more detail below.
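The statement that roughly half of the transporter components handle small solutes can be checked with simple arithmetic. The short sketch below uses only counts reported in this article (the per-genus totals given above and the per-genus solute transporter counts from the Fig. 4 caption); the variable names are illustrative.

```python
# Transcripts encoding putative transporter components (Fig. 3).
components = {"Neocallimastix": 826, "Anaeromyces": 554, "Piromyces": 488}
# Transcripts encoding putative small-solute transporters (Fig. 4 caption).
solute = {"Neocallimastix": 435, "Anaeromyces": 312, "Piromyces": 236}

total = sum(components.values())      # 1868 components across the three strains
solute_total = sum(solute.values())   # 983 solute transporter components

# Per-genus share of solute transporters: about 0.53, 0.56 and 0.48.
share = {genus: solute[genus] / components[genus] for genus in components}
overall_share = solute_total / total  # about 0.53
```

The totals reproduce the 1868 and 983 components quoted in the Fig. 3 and Fig. 4 captions, and each per-genus share lies close to one half.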

Fig. 3

Putative functions of fungal transporters based on transporter classification database (TCDB) analysis. a 1868 fungal transporter components from three gut fungal strains were sorted based on TCDB homology using a stringent 70% coverage criterion. The major functional transporter categories are: solute transport, protein biogenesis/general secretory pathway, nuclear import/export, peroxisomal import machinery, and import into plastids (hydrogenosomes). The “Other” category contains accessory factors and incompletely characterized transport systems. b Shows the distribution of the functional categories in the three gut fungal strains. Total number of transcripts encoding transporter components in Neocallimastix: 826 transcripts; Anaeromyces: 554 transcripts; Piromyces: 488 transcripts. The number of transcripts in the different categories is shown in brackets.

Proteins involved in intracellular sorting, secretion and quality control

In eukaryotic systems, many components are targeted to different intracellular organelles, and the ability to alter localization and secretion is a valuable path for cell engineering [41]. For example, most of the proteins that are destined to the plasma membrane or the extracellular environment are first targeted to the endoplasmic reticulum (ER): there, the proteins are either inserted into the ER membrane or translocated into the ER lumen via the universally conserved Sec translocon, and packed into vesicles and trafficked to the plasma membrane via the Golgi network [14]. Although it is known that the gut fungi secrete a large number of biomass degrading enzymes, very little is known about the molecular details underlying protein trafficking in these understudied primitive eukaryotes.

As is shown in Fig. 3, we find that many gut fungal transcripts encode proteins that function in the biogenesis and intracellular trafficking of proteins; some of these components have previously been identified in Orpinomyces sp. [35]. For example, parts of the general secretory pathway (TCDB 3.A.5) are easily identified by homology, including four signal recognition particle (SRP) proteins, (SRP14, SRP54, SRP68 and SRP72); both SRP receptor subunits and a heterotrimeric Sec61 translocon as well as Sec62/63 [14]. Further, we find some 30 proteins that are homologous to heat shock proteins (TCDB 1.A.33), and proteins that belong to the endoplasmic reticular retrotranslocon family (TCDB 3.A.16) and that are implicated in protein folding and quality control [42]. We also find evidence for vesicular trafficking and membrane remodeling, in several Synaptosomal Vesicle Fusion Pore proteins (a.k.a. SNAREs) (TCDB 1.F.1); Synaptic Vesicle Associated Calcium Channels (1.A.55); and Annexin-like Proteins (TCDB 1.A.31) that are involved in the trafficking of vesicles and modulation of cell shape [43, 44].

Anaerobic gut fungi have intracellular hydrogenosomes that are related to the mitochondria of aerobic eukaryotes, which generate ATP by substrate-level phosphorylation [45, 46]. Apart from the above-mentioned heat shock proteins, of which a subset may be located to the hydrogenosomes, we find evidence for components that are homologous to the mitochondrial and chloroplast import machinery (TCDB 1.B.33, 1.B.8, 3.A.8, 3.A.9), such as the central mitochondrial import receptor TOM40, the inner membrane translocases TIM22 and TIM23, and accessory factors TIM9 and TIM10 [47, 48]. Further, and although it is not entirely clear whether gut fungi have peroxisomes as such, we find evidence for the peroxisomal import machinery (TCDB 3.A.20); as well as many subunits of the large Nuclear Pore Complex and proteins that are implicated in the maturation and nuclear export of RNA (TCDB 1.I.1, 3.A.18, 3.A.22, 9.A.50) [49, 50]. Finally, the ‘Other’ category captures proteins that are involved in energy conversion (TCDB 3.D.1 and 3.D.10), fatty acid translocators (TCDB 4.C.1), accessory factors (TCDB 8), and incompletely characterized transport systems (TCDB 9).

Potential transporters for biotechnology and strain engineering

Virtually any solute in cells has to pass through a membrane-embedded transporter; this is true for ions and large molecules as well as for small molecules like water and glycerol [51, 52]. Given the ability of anaerobic fungi to persist in a competitive, lignocellulose rich environment, we hypothesize that their membrane proteome must therefore be well stocked with components that sense sugars and metabolites, selectively transport them, and extrude waste products or secondary metabolites. As shown in Fig. 4, in all three fungal strains we find a number of putative transporters for sugars and metabolites such as amino acids, organic ions, and nucleotides; putative drug transporters and lipid flippases; and channels and pumps for ions and trace metals.

Fig. 4

Substrates of 983 solute transporter components identified in three gut fungal strains, based on hits in TCDB. The proteins were sorted into these categories based on TCDB homology using a stringent 70% coverage criterion of both subject and query, with an E-value cutoff of 10⁻³. In the case of multiple matches, the match with the lowest E-value was taken. Total number of transcripts encoding putative small-solute transporters in Neocallimastix: 435 transcripts; Anaeromyces: 312 transcripts; Piromyces: 236 transcripts.
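The rule that the match with the lowest E-value is taken when a transcript has several TCDB matches amounts to a best-hit selection per query. A minimal sketch follows, assuming hits are available as (query, TCDB entry, E-value) tuples; the function name and the identifiers in the test data are placeholders, not results from this study.

```python
from typing import Dict, Iterable, Tuple


def best_hits(hits: Iterable[Tuple[str, str, float]]) -> Dict[str, Tuple[str, float]]:
    """Return, for each query, the (TCDB entry, E-value) pair with the lowest E-value.

    Ties keep the hit that was seen first.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query_id, tcdb_id, evalue in hits:
        if query_id not in best or evalue < best[query_id][1]:
            best[query_id] = (tcdb_id, evalue)
    return best
```

Because each transcript contributes exactly one best hit, every transcript is counted in a single substrate category.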

Transporters for sugars, organic ions and other metabolites

417 transcripts in the three gut fungal strains encode transport components that are involved in the uptake or extrusion of sugars and other organic metabolites, which are the end products of biomass breakdown (Fig. 4). Sugar transporters are attractive targets for microbial engineering, and several efforts have been made to identify and engineer transporters that enhance the uptake of underutilized sugars. For example, transporters that mediate flux of five-carbon sugars derived from hemicellulose could open the way for pentose sugar metabolism in yeasts [5, 53–56]. Eukaryotic sugar uptake systems typically belong to the major facilitator superfamily (MFS) (TCDB 2.A.1); the solute sodium symporter family (SSS) (TCDB 2.A.2); and the recently characterized Sugars Will Eventually be Exported Transporter family (SWEET) (TCDB 2.A.123) [57]. These proteins are mostly secondary carriers, and although some function as uniporters, most couple the transport of the solute to the downhill transport of ions such as protons or sodium. As shown in Table 1, all these families are represented in the three fungi: in total we find 24 MFS transporters; 7 SSS transporters and 10 SWEET transporters. Using the fifth digit of the TCDB system we can tentatively assign substrates to a few of the proteins: mannose, fructose, xylose, sucrose, cellobiose and myo-inositol. However, without experimental characterization, these homology-based assignments remain putative [58].
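Grouping hits by family and reading the fifth tier both depend on splitting the five-tiered TCDB tag into its components. The sketch below is illustrative only: the `TCID` type, the function names, and the example tags in the usage note are hypothetical, and the tag is assumed to have the form class.subclass.family.subfamily.system (for example, a tag beginning with 2.A.1 belongs to the MFS as described above).

```python
from typing import NamedTuple


class TCID(NamedTuple):
    """A five-tiered TCDB tag split into its components (hypothetical type)."""
    tc_class: int
    subclass: str
    family: int
    subfamily: int
    system: int


def parse_tcid(tag: str) -> TCID:
    """Parse a tag such as '2.A.1.1.5' into its five tiers."""
    parts = tag.strip().split(".")
    if len(parts) != 5:
        raise ValueError(f"expected five dot-separated tiers, got {tag!r}")
    tc_class, subclass, family, subfamily, system = parts
    return TCID(int(tc_class), subclass, int(family), int(subfamily), int(system))


def family_key(tag: str) -> str:
    """Return the first three tiers, which identify the family (e.g. '2.A.1')."""
    t = parse_tcid(tag)
    return f"{t.tc_class}.{t.subclass}.{t.family}"
```

For instance, `family_key("2.A.123.1.1")` yields `"2.A.123"`, the SWEET family, while the `system` field of the parsed tag is the fifth tier used here for tentative substrate assignment.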

Table 1 Putative sugar uptake systems identified in three gut fungal strains

Unexpectedly, 60% of the predicted sugar transporter components that we have identified in the three fungi are homologous to the substrate binding protein (SBP) of prokaryotic solute uptake systems that belong to the ATP binding cassette (ABC) transport superfamily (TCDB 3.A.1). Although ABC transporters as such are abundant in all kingdoms of life, SBP-coupled ABC uptake systems have to date only been found in prokaryotes [59, 60]. Typically, these modular transport systems consist of two cytoplasmic nucleotide-binding domains, two transmembrane domains, and an extracellular SBP encoded on up to four different polypeptides [30, 60] (Fig. 5a). The extracytoplasmic SBP delivers the substrate to the membrane-embedded domain that utilizes ATP to pump the substrate across the membrane [61]. Based on structural details, SBPs and SBP-domains can be divided into three classes: Type I (SCOP superfamily SSF53822), Type II (SCOP superfamily SSF53850), and Type III (SCOP superfamily SSF53807) [62–65]. While SBP-coupled ABC uptake systems seem to be exclusively prokaryotic, SBP-domains are found in eukaryotic membrane proteins such as guanylyl cyclase-linked natriuretic peptide receptors, ligand-gated ion channels and class C GPCRs [62, 63, 66, 67]. The eukaryotic membrane-bound SBP-domains are typically Type I, with the exception of ligand-gated ion channels that have a Type II domain encoded by two non-consecutive parts of the polypeptide chain [68]. Strikingly however, the SBP proteins that we find in the gut fungi are invariably similar to Type II proteins, and while some of them are predicted to have transmembrane helices, there is nothing in the sequence that immediately suggests that they form e.g. a ligand-gated ion channel (Fig. 5b). Based on similarity to proteins in the TCDB, the gut fungal SBPs are related to palatinose, trehalose/maltose/sucrose and xylobiose-binding proteins from the bacteria Agrobacterium tumefaciens, Erwinia rhapontici, Sinorhizobium meliloti, Streptomyces coelicolor, Streptomyces thermoviolaceus, Thermus thermophilus, Thermotogae, Rhodobacter sphaeroides, and the archaeon Thermococcus litoralis. Most of these microorganisms are associated with soil and plants and it is not unlikely that the fungi have acquired the genes by horizontal gene transfer [19].

Fig. 5

Prokaryotic SBPs and gut fungal SBP-homologs. a Shows a cartoon of a prokaryotic ABC-importer. The SBP delivers the substrate to the membrane embedded component that utilizes ATP to translocate the substrate across the membrane. b Shows a cartoon of gut fungal SBP-homolog identified from transcriptomics with currently unknown function. The identified gut fungal SBPs are homologous to Type II SBPs and have one or more predicted amino- or carboxy-terminal transmembrane helix with no known homology to other proteins

Although we failed to identify any other putative ABC-importer components among the fungal transcripts, i.e. the membrane-embedded and cytoplasmic nucleotide binding domains, it is possible that these remain to be identified in the genomes. Alternatively, the sequence similarity to other transporters may be so low that our stringent 70% criterion fails to identify the other ABC transporter components. In any case, the isolated SBP proteins are not likely to function as transmembrane carriers on their own; however, it is possible that some of these have functions that we cannot easily discern from primary sequence alone. It is tempting to speculate as to their function in the fungi: do these SBP proteins communicate with fungal transporters, or do they act as sugar sequesters that grasp onto the sugars that the extracellular cellulolytic machinery produces? This could conceivably increase the local sugar concentration around the fungus and lead to increased sugar uptake. Further, SBP proteins in prokaryotes are known to communicate with chemotaxis proteins, and it is possible that the gut fungal SBPs play a role in directing the fungal zoospores to nutrient sources by a yet unknown mechanism [69].

In addition to sugar transporters, we find a diverse repertoire of transporters for several classes of organic ions and amino acids (TCDB 2.A.1.19, 2.A.18, 2.A.22, 2.A.79, 2.A.85), ammonia (TCDB 1.A.11) and sugar alcohols such as glycerol (TCDB 2.A.50). There are also putative channels for formate and nitrate (TCDB 1.A.16), and transporters for nucleotides and nucleosides (TCDB 2.A.7.11, 2.A.7.12). It is worth noting that each fungal strain has more than ten transcripts encoding proteins that are homologous to mitochondrial carriers (TCDB 2.A.29). These proteins are typically involved in the compartmental exchange of solutes such as ATP/ADP, and are likely localized to the hydrogenosomes [70–72].

Promiscuous drug extruders and lipid flippases

227 transcripts in the three gut fungal strains encode putative promiscuous drug extruders and lipid flippases (Fig. 4), which could enhance the tolerance and yields of metabolically engineered chemical production strains [4, 8, 73, 74]. In all three fungal strains, we find a number of Drug:Proton antiporter proteins (DHA) (TCDB 2.A.1.2, 2.A.1.3; 55 proteins in total). DHA proteins belong to the MFS, are abundant in the fungal kingdom, and are believed to be involved in the extrusion of various mycotoxins such as polyketides [75, 76]. Although implicated in chemical stress tolerance, drug resistance and pathogenicity, DHA transporters are also abundant in non-pathogenic fungi, and thus their role is not entirely clear; however, it has been speculated that some of the extruded compounds are used to restrain microbial competition [77, 78]. In addition, we find evidence for a number of transporters from the multi antimicrobial extrusion (MATE) family (TCDB 2.A.66, 19 proteins in total) and a number of ABC exporters that are involved in broad specificity drug resistance (TCDB 3.A.1.201, 24 proteins in total).

Lipid flippases are involved in the organization of lipids within cellular membranes, the modulation of the fluidity of cell membranes, and the formation of extracellular glycoconjugates and polysaccharides [79, 80]. In each fungal strain, we find evidence for lipid flippases, primarily from the ABC transporter family ABCA (3.A.1.211), and the P-type ATPase superfamily (TCDB 3.A.3.8). As the substrates of lipid flippases are hydrophobic and oil-like, they could conceivably be engineered for biofuel tolerance or the production of e.g. high-value terpenoid compounds [81, 82].

Transporters for inorganic ions and trace metals

339 transcripts in the three gut fungal strains encode channels and pumps for inorganic ions and trace metals (Fig. 4). Inorganic ion transporters are typically involved in the maintenance of cellular pH homeostasis, signal transduction, and the buildup of ion gradients that the cell uses for downstream applications [83]. Alkali and transition metals are important enzymatic and structural cofactors in a wide range of enzymes. These transporters may thus enhance the stability and enzymatic performance of microbial production strains; in addition, metal transporters can be used for the detection and bioremediation of heavy-metal contaminations [84–86]. Apart from voltage-gated potassium channels and chloride channels (TCDB 1.A.1, 2.A.40), we find several subunits of V-type ATPases and P-type ATPases that are typically involved in the pumping of protons and other cations across cellular membranes, although some P-type ATPases have also been implicated in lipid transport (TCDB 3.A.2, TCDB 3.A.3) [79, 80]. There are a handful of proteins that are similar to bacterial arsenite transporters (TCDB 2.A.59), as well as putative transporters for zinc, iron and magnesium (TCDB 1.A.26, 2.A.5, 2.A.89).

Anaerobic gut fungi possess novel GPCRs

Next, we sought to investigate unique receptors identified from sequencing all three strains of gut fungi, which may have a role in sugar sensing. Across genera, we identified a wealth of GPCRs, which form the largest receptor class in eukaryotes [87]. Using the InterProScan tool and BLAST annotations, we identified 53 putative GPCRs in N. californiae, 25 GPCRs in A. robustus and 34 GPCRs in P. finnis (Table 2). The heptahelical GPCRs typically display an amino-terminal ligand-binding domain at the surface of the cell, recognize a wide range of ligands, and are involved in numerous sensory processes, cellular growth and development. Based on sequence analyses and phylogeny, GPCRs can be sorted into at least five (Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin) or six (A-F) classes [88, 89]. Using the InterProScan tool, we determined that a small number of the GPCRs in these gut fungi are rhodopsin-like or possibly related to the cAMP receptors that were first identified in the slime mold Dictyostelium discoideum (Dicty-CAR; IPR017452; IPR017981) (Table 2) [90]. Interestingly, the Dicty-CAR receptors are implicated in cell differentiation in D. discoideum, and it is possible that these GPCRs are involved in the complex gut fungal life cycle, which involves a motile zoospore state and a sessile state that burrows into plant material [90].

Table 2 Putative GPCRs in the three gut fungal strains

The vast majority of the gut fungal GPCRs that we identified in this study have the highest similarity to Glutamate, or class C GPCRs (a.k.a. class 3; IPR017978), a class that until recently was believed to be absent from the fungal kingdom (Table 2) [91]. Class C GPCRs comprise metabotropic glutamate receptors, calcium sensing receptors, sweet taste receptors and gamma aminobutyric acid receptors type B (GABAB) [92]. These receptors typically have a long (>400 amino acids) ligand-binding domain called the Atrial Natriuretic Factor receptor (ANF) domain (IPR001828), which is related to prokaryotic amino acid binding proteins that belong to the structural SBP Type I superfamily (SCOP superfamily SSF53822) [66, 67]. With the exception of GABAB receptors, all known class C GPCRs also have a pattern of 9 conserved Cysteine residues between the amino-terminal domain and the seven transmembrane helices (IPR011500) [93].

Gut fungal class C GPCRs have a non-canonical architecture with putative carbohydrate-binding domains

As shown in Fig. 6, all gut fungal class C GPCRs identified in this study are predicted to have the characteristic large extracellular domain, sometimes reaching well over 1000 amino acid residues (Fig. 6; Additional file 2: Figure S2). However, instead of an ANF domain, around 30% of the GPCRs display a pectin lyase fold/virulence factor (IPR011050; IPR012334), sometimes accompanied by several parallel beta-helix repeats (IPR006626) and in a few cases by an EGF-like domain (IPR000742) (Fig. 6; Additional file 2: Figure S2). Pectin is a major component of plant cell walls, and pectin and pectate lyases are virulence factors that are secreted by plant pathogens [94, 95]. Both enzymes display beta strand repeats, a common motif among enzymes that recognize carbohydrate substrates [96]. EGF-domains are typically around 40 amino acid residues long and found in many different proteins in one or multiple copies [97]. EGF domains contain a motif of six cysteines, and some EGF domains are known to bind calcium. Notably, EGF-domains are found at the amino-terminus of so-called Adhesion GPCRs (Class 2/B), which are characterized by very long extracellular domains with multiple functional domains; however, nothing in the sequences identified here suggests that the gut fungal GPCRs belong to class B [98].

Fig. 6

Domain architecture of gut fungal class C GPCRs identified from transcriptome data. All class C GPCRs are predicted to have a long amino-terminal domain and seven carboxy-terminal transmembrane helices. The amino-terminal domain ranges from 200 to 1600 amino acid residues with the average length being 600 residues. Around 30% of the putative GPCRs are predicted to have an extracellular pectin lyase fold (IPR011050; IPR012334), parallel beta-helix repeats (IPR006626), and/or an EGF-like domain (IPR000742). Around 50% of the GPCRs are predicted to have a domain that is homologous to SBP Type II (a.k.a. Periplasmic binding protein-like II, SCOP superfamily SSF53850). Several putative GPCRs do not have any apparent homology to known InterPro domains. In approximately 30% of the cases we can identify a canonical ER targeting signal peptide at the very amino terminus (not shown). N amino-terminus. For more details, see Additional file 2: Figure S2.

Interestingly, almost half of the gut fungal GPCRs have an amino-terminal SBP Type II-domain (SCOP superfamily SSF53850). As mentioned previously, this domain is related to, yet structurally different from, the ANF domain that is found in metazoan class C GPCRs. In the gut fungal GPCRs, the Type II domain is invariably similar to prokaryotic substrate binding proteins that are associated with sugar uptake systems (Fig. 5). In agreement with our findings, it was recently shown that fungal class C GPCRs display an unprecedented variety of amino-terminal domains, among them SBP Type II domains that resemble the domains that are identified in this study [91]. Strikingly however, we failed to find a single example of a class C GPCR with the ANF domain, which is the dominating amino-terminal domain in all characterized class C GPCRs. Also, although several of the gut fungal GPCRs have up to 10 cysteines in their amino-terminal domain, the sequences do not align to the conserved nine-cysteine domain. It has been suggested that class C GPCRs evolved through the fusion of a prokaryotic SBP and a bacteriorhodopsin [67, 92, 99–101]. The diversity of amino-terminal domains in our gut fungal GPCRs corroborates that these fusions may have happened more than once and between different genes.

Conclusions

Integral membrane proteins are a vital component of all living cells, and it is becoming increasingly clear that membrane-embedded transporters and receptors are essential for the engineering and stability of microbial production strains. Here, we searched for integral membrane proteins in transcriptomic data collected from three different genera of lignocellulolytic anaerobic gut fungi that are highly relevant for applications that convert renewable biomass into value-added compounds. We hypothesized that these extraordinarily persistent microorganisms possess a wide variety of solute transporters and receptors that are involved in the uptake and recognition of carbohydrates.

A relatively simple strategy that integrates transcriptomics with sequence similarity-based comparisons revealed a treasure trove of novel membrane proteins from anaerobic fungi that are of broad biotechnological interest. In the absence of high-quality genomic information, the resolution of the transcriptome is indeed remarkable, capturing the “active” part of the genome most critical to the lifestyle of these fungi. Here, we identified hundreds of novel sugar transporters and solute extruders from these unexplored fungi, which can be used to bolster substrate acquisition and tolerance in model microbes like Escherichia coli, Saccharomyces cerevisiae, and even more evolved fungi. Additionally, we find transcripts that encode universally conserved proteins, e.g. all three subunits of the heterotrimeric Sec61 translocon as well as other conserved components of the general secretory pathway that provide a path forward for understanding and engineering protein secretion in these early-branching fungi.

Of particular interest for future characterization are the unique and seemingly prokaryotic transporters and receptors identified here that bear unexpected N-terminal domains with putative sugar binding and transport functionalities. Along with transcripts that encode membrane-anchored carbohydrate-binding domains, we speculate that these domains may be involved in carbohydrate sensing and sequestration that convey a competitive edge to these slow growing fungi in microbial communities. Overall, this study reveals entirely new subsets of membrane protein transporters and receptors from nature to enhance biomass breakdown and substrate utilization.

Methods

Fungal strains and RNA isolation

Three novel gut fungal species from distinct genera of Neocallimastigomycota (Piromyces finnis, Anaeromyces robustus, and Neocallimastix californiae) were isolated from environmental samples [26] for study and analysis. We grew these cultures in 10 mL batch cultures of anaerobic Medium C, on a range of diverse fibrous and soluble carbon substrates (e.g. reed canary grass, glucose, cellobiose) before extracting their total RNA content with the RNeasy® Mini Kit (Qiagen, Valencia, CA) as previously described [19]. To maximize the number of transcripts observed, we pooled RNA preps from different substrates in equimolar proportions, as measured by a NanoDrop 2000 (ThermoScientific, Wilmington, DE), before sequencing.

Transcriptome acquisition and annotation

Fungal transcriptomes were previously acquired for all fungal strains, and these serve as the base dataset for this study [19]. Briefly, RNA sample integrity was validated with a 2200 TapeStation (Agilent Technologies, Santa Clara, CA). Intact samples were used to generate strand-specific cDNA libraries, which were sequenced on an Illumina HiSeq (Illumina, San Diego, CA) and assembled and annotated as described previously to obtain de novo transcriptomes [19]. We annotated the transcriptomes using the automated BLAST2GO pipeline [16], which analyzes sequences for similarity (blastx) and for protein domains via hidden Markov model signatures (InterProScan). Significant hits had an E-value of ≤10⁻³. Annotations from this pipeline included a protein description, delineation of internal domains, functions described by Gene Ontology (GO) terms associated with these domains, and assignment of Enzyme Commission (EC) numbers, if available. We were also able to identify non-coding antisense transcripts within the strand-specific transcriptome on the basis of the annotation reading frame (−1, −2, −3).
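The strand call described above can be illustrated with a minimal sketch. It assumes that each transcript has a reading frame and an E-value taken from its best blastx hit; the function name and return labels are hypothetical and are not part of the BLAST2GO pipeline.

```python
# Hypothetical helper: the cutoff and the negative-frame rule come from the text,
# everything else (names, labels) is illustrative only.
E_VALUE_CUTOFF = 1e-3  # significance threshold stated in the Methods


def strand_call(frame: int, evalue: float) -> str:
    """Label a transcript from the frame and E-value of its best blastx hit."""
    if evalue > E_VALUE_CUTOFF:
        return "no significant hit"
    # Frames -1, -2 and -3 lie on the reverse strand of a strand-specific read set.
    return "antisense" if frame < 0 else "sense"


print(strand_call(-2, 1e-10))
print(strand_call(1, 1e-5))
```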

Identification of integral membrane and other secreted proteins

We identified secreted proteins within the transcriptomes by parsing the annotation files provided by BLAST2GO for InterPro domain hits. Transmembrane domains were predicted by Phobius [102] and TMHMM [13], and signal peptides were predicted by Phobius and SignalP [15].

Filtering and classifying the transcriptome

Membrane protein candidates were classified into one of five roles on the basis of their associated GO terms, in the precedence order ‘Transport’, ‘Sensing and Signaling’, ‘Catalysis’, ‘Other’, and ‘Unknown’. Each GO annotation was parsed and searched for functional keywords as follows: Transport encompasses all membrane proteins with a stated “transport”, “symport”, or “V-type ATPase” role, such as ABC transporters, P-type ATPase ion pumps, solute symporters, antiporters, and uniporters; Sensing and Signaling includes proteins annotated with a “receptor”, “signal”, or “sensor” function; Catalysis includes proteins whose annotated roles terminate in ‘-ase’; Unknown includes proteins that could not be assigned a GO term; and Other includes the remaining proteins that have GO terms but match none of the preceding roles. To better represent the total protein count encoded in the transcriptome, proteins with multiple functions are assigned only to the role of highest precedence. For example, ABC transporters with both transport and catalytic ATPase functions are binned only once under Transport.
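The precedence rule can be sketched as follows. This is an illustration under stated assumptions, not the script used in the study: the keyword lists follow the description above, GO annotations are assumed to be available as a list of term names per protein, and the regular expression used for the ‘-ase’ rule is a hypothetical simplification that would also match ordinary words such as “phase”.

```python
import re

# Roles are checked in precedence order; keywords are taken from the text.
ROLE_KEYWORDS = [
    ("Transport", ("transport", "symport", "v-type atpase")),
    ("Sensing and Signaling", ("receptor", "signal", "sensor")),
]

# Assumed implementation of "roles that terminate in '-ase'".
ASE_WORD = re.compile(r"\b\w+ase\b", re.IGNORECASE)


def classify_role(go_terms):
    """Return the single highest-precedence role for one protein."""
    if not go_terms:
        return "Unknown"  # no GO term could be assigned
    text = " ".join(go_terms).lower()
    for role, keywords in ROLE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return role
    if ASE_WORD.search(text):
        return "Catalysis"
    return "Other"  # has GO terms, but none match a keyword


# An ABC transporter carries both transport and ATPase terms but is binned once.
print(classify_role(["ATPase activity", "transmembrane transport"]))
```

Because the function returns at the first matching role, each protein is counted exactly once, which is what keeps the per-role totals equal to the total number of candidates.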

Transporter analysis

The translated amino acid sequence for each transcript was aligned to the Transporter Classification Database (TCDB) [17] using a local installation of NCBI BLAST (blastp). The TCDB was downloaded on January 15, 2015. To increase the confidence in our predictions, we filtered the results to include only hits that covered at least 70% of the amino acid sequences of both the query and the subject. After filtering by coverage, the hit with the smallest E-value was selected, with a maximum cutoff of 10⁻³.
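The coverage and E-value filter can be sketched as below. The sketch assumes blastp results in tab-separated form with the columns query ID, subject ID, query length, subject length, query start, query end, subject start, subject end, and E-value, and it computes coverage from a single alignment per hit. The column layout, the `Hit` type, and the function names are assumptions made for illustration; the original analysis may have computed coverage differently.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    qlen: int
    slen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated hit lines in the assumed nine-column layout."""
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], *(int(x) for x in f[2:8]), float(f[8]))


def best_hits(hits: Iterable[Hit], min_cov: float = 0.70,
              max_evalue: float = 1e-3) -> Dict[str, Hit]:
    """Keep, per query, the lowest-E-value hit that passes both coverage filters."""
    best: Dict[str, Hit] = {}
    for h in hits:
        qcov = (h.qend - h.qstart + 1) / h.qlen  # fraction of query aligned
        scov = (h.send - h.sstart + 1) / h.slen  # fraction of subject aligned
        if qcov < min_cov or scov < min_cov or h.evalue > max_evalue:
            continue
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best
```

Requiring coverage on both sequences guards against assigning a transporter family on the basis of a single shared domain, which a query-only or subject-only threshold would allow.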

Identification of putative GPCRs

Transcripts with putative GPCR function were identified by searching the functional annotations provided by NCBI BLAST and InterPro databases for the keywords ‘GPCR’ and ‘G-protein coupled receptor’. From this subset, we retained only sequences that contained between 7 and 9 transmembrane domains as identified by transmembrane hidden Markov models (TMHMM). This ensured that the transcripts identified were full-length GPCRs with 7 transmembrane domains, while allowing for the presence of hydrophobic signal sequences that may also be identified as transmembrane domains. Predicted N-terminal domains were identified from the InterPro-based annotations present in the extracellular N-terminal region. These were identified by selecting all domains from the GPCRs that were present before the first of the seven transmembrane sequences typical of GPCRs, restricting the search to only the N-terminal extracellular region.
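A minimal sketch of this selection is given below. It assumes that each transcript has an annotation string, a list of predicted transmembrane helices as (start, end) residue positions, and a list of InterPro domains as (accession, start, end). Where more than seven helices are predicted, the sketch treats the last seven as the GPCR helix bundle, on the assumption that the extra amino-terminal helices are signal sequences. That rule, the data layout, and the function names are illustrative assumptions.

```python
KEYWORDS = ("gpcr", "g-protein coupled receptor")


def is_candidate_gpcr(annotation, tm_helices):
    """Keyword match plus 7 to 9 predicted transmembrane helices."""
    text = annotation.lower()
    return any(k in text for k in KEYWORDS) and 7 <= len(tm_helices) <= 9


def n_terminal_domains(domains, tm_helices):
    """Return domains that end before the first helix of the 7TM bundle."""
    # Assumption: the last seven predicted helices form the bundle, so any
    # additional helices nearer the amino-terminus are ignored here.
    bundle_start = sorted(tm_helices)[-7][0]
    return [d for d in domains if d[2] < bundle_start]


# Hypothetical transcript: one signal-like helix followed by seven bundle helices.
helices = [(5, 25)] + [(700 + 40 * i, 720 + 40 * i) for i in range(7)]
domains = [("IPR011050", 100, 400), ("IPR017978", 690, 950)]

if is_candidate_gpcr("G-protein coupled receptor, family 3", helices):
    print(n_terminal_domains(domains, helices))
```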