Background

Genomes of a phylogenetic lineage hold information on its functional potential and ecological adaptations, which can provide hints for its enrichment and isolation [1, 2]. Unraveling such information requires a proper tool. For genetic analysis, the pan-genome is widely used to characterize the key features of a population [3, 4]. However, pan-genome analysis was originally designed for complete genomes [5], while most genomes of not-yet-cultured populations are incomplete. To apply pan-genome analysis to these not-yet-cultured populations, we need a novel approach for incomplete genomes.

Genomic information alone may not be sufficient to reveal such traits. Many important populations remain not-yet-cultured even though their genomes are available and their optimal niches are known [6]. Take Candidatus Accumulibacter (Accumulibacter), the primary functional population in enhanced biological phosphorus removal (EBPR), as an example: 13 Accumulibacter genomes have been retrieved and analyzed [7,8,9,10,11,12], and its optimal enrichment conditions have been studied for more than two decades [13,14,15]. However, the lack of pure cultures still limits our understanding. One critical problem is that different lineages of Accumulibacter display diverse niche adaptations, e.g., to salinity, carbon sources, and electron acceptors [11, 16,17,18]. Thus, even though we can use the same optimal conditions from previous studies to enrich Accumulibacter, they may not be favorable for the specific strains present.

Combining genetic information with transcriptional analysis can provide clues for enriching the strains in our systems. Transcriptional analysis has been conducted on Accumulibacter [19,20,21], but it has mainly focused on the separate pathways of carbon (C), nitrogen (N), phosphorus (P), and sulfur (S). If we instead look at the whole picture of the coordination among pathways in response to environmental conditions [22], we could differentiate the optimal and suboptimal status of a population. This highlights the need for a tool to mine the whole picture from transcriptional patterns and link it to environmental conditions.

In this study, we present a method to decode the hidden traits from genetic and transcriptional datasets, for guiding the enrichment of not-yet-cultured bacterial populations. We use Accumulibacter as a demonstration for both anaerobic (AN) and aerobic (AE) organotrophs and chemoheterotrophs. Its biochemical complexity and significance to environmental engineering make it an example of both interest and importance.

Materials and methods

Genomic analysis

The pan-genome analysis designed in this study classifies the key features of a phylogenetic lineage using both complete and incomplete genomes. The pan-genome is defined as the whole set of non-orthologous genes in all available genomes (total number N) of a phylogenetic lineage [23]. All non-orthologous genes are subdivided into core-, dispensable-, and strain-specific genomes based on the frequency of their occurrence in the N genomes. Previously, the core-genome was defined as the genes shared by all genomes [4, 5]. However, when incomplete (draft) genomes are included, such a strict definition results in low coverage of the core-genome. Thus, we propose an approach for pan-genome subdivision that evaluates the false-negative (FN) and false-positive (FP) rates. The core-genome is defined as the genes shared by at least n genomes (n ≤ N), where n is chosen so that all FN and FP rates are less than 1% (Additional file 1: Supplementary information and Table S1). Based on this cutoff n, the pan-genome is subdivided into core-, dispensable-, and strain-specific genomes as the collection of common genes shared by at least n genomes, accessory genes shared by a subset (2 to n-1 genomes), and unique genes of one genome, respectively. The coverage (100% − FN) and the accuracy (100% − FP) of the core-, dispensable-, and strain-specific genomes are thereby kept above 99%.
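The subdivision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the PAPP implementation: the input layout (a dictionary from genome name to a set of ortholog-group identifiers), the function name, and the toy data are all assumptions made for the example, and the cutoff n is taken as given rather than derived from FN and FP rates.

```python
from collections import Counter


def subdivide_pan_genome(genomes, n):
    """Split ortholog groups into core, dispensable and strain-specific sets.

    genomes: dict mapping a genome name to the set of ortholog-group IDs it
             contains (hypothetical input format chosen for this sketch).
    n:       cutoff; a group found in at least n genomes is counted as core.
    """
    if not 2 <= n <= len(genomes):
        raise ValueError("cutoff n must satisfy 2 <= n <= N")
    # Occurrence frequency: in how many genomes does each group appear?
    occurrence = Counter(g for groups in genomes.values() for g in groups)
    core = {g for g, k in occurrence.items() if k >= n}
    strain_specific = {g for g, k in occurrence.items() if k == 1}
    # Everything in between (2 to n-1 genomes) is dispensable.
    dispensable = set(occurrence) - core - strain_specific
    return core, dispensable, strain_specific


# Toy data (invented): four draft genomes, cutoff n = 3.
toy = {
    "G1": {"og1", "og2", "og3"},
    "G2": {"og1", "og2", "og4"},
    "G3": {"og1", "og2", "og5"},
    "G4": {"og1", "og3", "og6"},
}
core, dispensable, strain_specific = subdivide_pan_genome(toy, n=3)
```

In the toy data, og2 is missing from one draft genome but is still classified as core because it reaches the cutoff, which is the behavior the relaxed definition is meant to provide.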

The representativeness of the Accumulibacter genomes is illustrated by the gene occurrence distribution and pan-genome sampling curves. The occurrence frequency of a gene is calculated by counting its orthologous genes in all N genomes (Additional file 1: Table S2) [5, 24]. This occurrence frequency is used to subdivide the pan-genome and to generate the pan-genome sampling curves. The sizes of the core-, strain-specific, and pan-genomes are predicted by fitting exponentially decaying functions to the counts obtained with each addition of a new genome [23, 25].
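As a rough illustration of how sampling curves are built, the sketch below averages the shared and total gene counts over every combination of k genomes. It is a simplification under stated assumptions: it uses the strict intersection (genes present in all k sampled genomes) rather than the relaxed cutoff n, it does not perform the curve fitting, and the function name and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def sampling_curves(genomes):
    """Average shared and total gene counts over all combinations of k genomes.

    genomes: dict mapping genome name -> set of ortholog-group IDs
             (hypothetical input format).
    Returns a dict k -> (mean shared-gene count, mean pan-genome size).
    """
    names = sorted(genomes)
    curves = {}
    for k in range(1, len(names) + 1):
        shared_sizes, pan_sizes = [], []
        for combo in combinations(names, k):
            sets = [genomes[name] for name in combo]
            # Strict intersection: genes present in every sampled genome.
            shared_sizes.append(len(set.intersection(*sets)))
            # Union: all distinct genes seen in the sampled genomes.
            pan_sizes.append(len(set.union(*sets)))
        curves[k] = (mean(shared_sizes), mean(pan_sizes))
    return curves


# Toy data (invented): three genomes sharing two ortholog groups.
toy = {
    "G1": {"og1", "og2", "og3"},
    "G2": {"og1", "og2", "og4"},
    "G3": {"og1", "og2", "og5"},
}
curves = sampling_curves(toy)
```

With real data the number of combinations grows quickly, so the averages for each k are the natural input to the exponential fits.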

A new metabolism framework: construct a whole picture

First, the metabolic pathways of the pan-genome are annotated. We define the pan-pathway as all the non-redundant functions encoded by the pan-genome. The pan-genome represents all genotypes (sequences), whereas the pan-pathway represents all phenotypes (functions) of a population. The Accumulibacter pan-genome is annotated with the KEGG (21st November 2016) [26] and eggNOG 4.5 databases [27, 28] to construct the Accumulibacter pan-pathway. The pan-pathway is also subdivided into core-, dispensable-, and strain-specific pathways as the collections of common functions shared by at least n genomes, accessory functions shared by a subset (2 to n-1 genomes), and unique functions of one genome, respectively. The split pathways are summarized into functional modules, such as glycolysis and the tricarboxylic acid cycle (TCA cycle).

We construct a novel metabolism framework to assess the main role of a module in the carbon, electron, and energy flows. Here, the carbon flow refers to the fundamental organic and inorganic carbon metabolism, the electron flow refers to the redox reactions between electron donors and electron acceptors, and the energy flow refers to the generation and consumption of ATP. The framework assigns the main role to a module by evaluating its contribution to the carbon, electron, and energy flows as a source or a consumer (Table 1). Then, it distinguishes the primary sources and consumers from the secondary sources and consumers. For example, it first assigns the main role of glycolysis as carbon-providing and electron-providing. Then, it identifies glycolysis as a primary electron-providing module for Accumulibacter in AN phase (Table 2).
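One way to hold such role assignments in code is a small lookup table keyed by module. The sketch below is only an illustration: the `Role` class, the dictionary layout, and the helper function are hypothetical, and the few AN-phase entries are limited to roles stated in this paper's description of Accumulibacter (acetate uptake, Gly, PHA, PolyP).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    flow: str  # "carbon", "electron" or "energy"
    side: str  # "source" or "consumer"
    rank: str  # "primary" or "secondary"


# Partial AN-phase table; layout and names are assumptions for this sketch.
AN_ROLES = {
    "acetate uptake": [Role("carbon", "source", "primary")],
    "Gly": [
        Role("carbon", "source", "secondary"),
        Role("electron", "source", "primary"),
    ],
    "PHA": [Role("carbon", "consumer", "primary")],
    "PolyP": [Role("energy", "source", "primary")],
}


def modules_with_role(roles, flow, side, rank):
    """Return the sorted names of modules that carry the requested role."""
    wanted = Role(flow, side, rank)
    return sorted(name for name, assigned in roles.items() if wanted in assigned)


primary_electron_sources = modules_with_role(AN_ROLES, "electron", "source", "primary")
primary_carbon_consumers = modules_with_role(AN_ROLES, "carbon", "consumer", "primary")
```

Keeping the rank (primary or secondary) separate from the flow and side makes it straightforward to ask which modules should dominate the transcriptional pattern under balanced conditions.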

Table 1 The main roles (shades) and contribution (nodes) of functional modules to carbon, electron, and energy flows
Table 2 The main roles (shades) and contribution (nodes) of Accumulibacter pan-pathway to carbon, electron, and energy flows in AN phase

A new transcriptomic analysis: diagnose the status

The balance among the primary carbon, electron, and energy sources is the key feature to differentiate the optimal and suboptimal conditions. These three sources are linked by the electron flow as two balance pairs (carbon flow versus electron flow and electron flow versus energy flow). The carbon flow is driven by redox reactions from primary sources to primary consumers. The electron flow is driven by the cellular redox and the electron transport phosphorylation (ETP) to recycle electron carriers [29], balance the redox condition, and convert energy. Within the energy flow, the energy-providing and energy-consuming rates [30] are well balanced [31] to keep ATP/ADP in equilibrium. Overall, the primary electron sources are competed for by the primary carbon sources, the primary energy consumers, and the redox balance.

The varying status of the two balance pairs (carbon flow versus electron flow and electron flow versus energy flow) represents the status of a population. Each balance pair has three states (e.g., the primary carbon source is excessive, balanced, or insufficient relative to the primary electron source), which results in nine scenarios in total to represent all statuses. We use transcriptional data to evaluate these two balance pairs, to diagnose the status, and to optimize the enrichment conditions. When the two balance pairs are balanced (the optimal status), the transcriptional pattern mainly involves the fundamental pathways (primary sources and consumers) for the most effective growth. However, the disruption of either balance pair may trigger the coordination of secondary sources and consumers to help rebalance the pair for more effective growth.

Pipeline demonstration

We summarize the above methods into a bioinformatic pipeline called the Pan-genome and Pan-pathway Pipeline (PAPP) (Additional file 1: Supplementary information and Figure S1). PAPP is demonstrated with 13 Accumulibacter genomes (Additional file 1: Table S2) and two available metatranscriptomic datasets of EBPR studies (IMG/M-3300002341-3300002346, NCBI-SRP038016). It constructs the Accumulibacter pan-genome and pan-pathway. It transforms the metatranscriptomic datasets to cellular relative transcriptional activity (CRPKM) (Additional file 2: Table S3). Then, it visualizes the Accumulibacter pan-pathway and the transcriptional dynamics using Cytoscape 3.3.0 [32]. Based on the transcriptional patterns, it diagnoses the status of Accumulibacter in two enrichment conditions. The PAPP pipeline is available at https://github.com/caozhichongchong/PAPP.

Results and discussion

Accumulibacter pan-genome: complete representativeness

The Accumulibacter genomes can completely represent the core-genome of Accumulibacter. The cutoff of the Accumulibacter core-genome is 9 of 13 (Additional file 1: Tables S1 and S4). Based on the occurrence frequency of all genes (Additional file 1: Figure S2), the Accumulibacter pan-genome is subdivided into 21% core genes, 40% dispensable genes, and 39% strain-specific genes. The composition of the pan-genome in this study is shifted toward the core-genome compared to a previous study [33], which used a strict criterion for orthologs. The size of the core-genome drops and gradually reaches a plateau of 1761 (Ω) genes (Fig. 1a), and it would not decrease further even with new genomes in the future. However, new genomes will continuously provide new strain-specific genes (Fig. 1b), supplementing 258 genes (tg(θ)) per genome (2.5% of the Accumulibacter pan-genome). This indicates that the Accumulibacter pan-genome would keep increasing with new genomes, although not significantly. It also suggests that even with a larger genome collection in the future, it would be impossible to cover all the diversity within the Accumulibacter population.

Fig. 1

Reconstructed sampling curves of 13 Accumulibacter genomes by the exhaustive subsampling method. The number of genes was calculated as a function of adding an nth genome to the (n-1) genomes. The total number (≤ TN) of effective permutations for each n was represented by the number of circles, which were obtained by different genome combinations. a Core-genome sampling curve. The average number of core genes at each n was plotted as squares, and the continuous curve represented the least-squares fit of the function \( \mathrm{Fc}=\mathrm{Kc}\ \exp \left[-\frac{n}{\uptau \mathrm{c}}\right]+\Omega \). The best-fit values for Kc, τc, and Ω were 3000, 3.04, and 1761, with correlation r² = 0.98. b Strain-specific genome sampling curve. The average number of strain-specific genes at each n was plotted as squares, and the continuous curve represented the least-squares fit of the function \( \mathrm{Fs}=\mathrm{Ks}\ \exp \left[-\frac{n}{\uptau \mathrm{s}}\right]+\mathrm{tg}\left(\uptheta \right) \). The best-fit values for Ks, τs, and tg(θ) were 1234, 4.05, and 258, with correlation r² = 1.00. c Pan-genome sampling curve. The average number of all genes (size of pan-genome) at each n was plotted as squares, and the continuous curve represented the least-squares fit of the function \( P(n)=D+\mathrm{tg}\left(\uptheta \right)\left[n-1\right]+\mathrm{Ks}\ \exp \left[-\frac{2}{\uptau \mathrm{s}}\right]\frac{1-\exp \left[-\frac{n-1}{\uptau \mathrm{s}}\right]}{1-\exp \left[-\frac{1}{\uptau \mathrm{s}}\right]} \). With the best-fit values 1234, 4.05, and 258 for Ks, τs, and tg(θ) from the strain-specific genome fitting and D as 3634, the correlation r² of the pan-genome fitting is 1.00
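The three fitted functions in the caption can be evaluated directly with the reported parameter values. The sketch below does only that; it does not redo the least-squares fitting, and the function and constant names are chosen here for illustration. It also makes one property of the formulas easy to check: the pan-genome function P(n) equals D plus the sum of Fs(j) for j from 2 to n, so each added genome increases P by exactly Fs(n).

```python
import math

# Best-fit parameters as reported in the caption of Fig. 1.
KC, TAU_C, OMEGA = 3000.0, 3.04, 1761.0      # core-genome curve
KS, TAU_S, TG_THETA = 1234.0, 4.05, 258.0    # strain-specific curve
D = 3634.0                                   # offset of the pan-genome curve


def core_size(n):
    """Fc(n): expected number of core genes with n genomes."""
    return KC * math.exp(-n / TAU_C) + OMEGA


def new_genes(n):
    """Fs(n): expected number of strain-specific genes added by the nth genome."""
    return KS * math.exp(-n / TAU_S) + TG_THETA


def pan_size(n):
    """P(n): expected pan-genome size with n genomes."""
    geometric = (1 - math.exp(-(n - 1) / TAU_S)) / (1 - math.exp(-1 / TAU_S))
    return D + TG_THETA * (n - 1) + KS * math.exp(-2 / TAU_S) * geometric


# Increment of the pan-genome when going from 12 to 13 genomes.
increment_13 = pan_size(13) - pan_size(12)
```

For large n, `core_size(n)` approaches Ω and `new_genes(n)` approaches tg(θ), which is the sense in which the core-genome plateaus while the pan-genome keeps growing.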

Accumulibacter pan-pathway: the whole picture of its functions

All lineages of Accumulibacter have one consistent core-pathway that covers all key functions involved in EBPR. The Accumulibacter pan-pathway contains 1676 functions, including 78% core functions, 18% dispensable functions, and 4% strain-specific functions. We present a whole picture of the Accumulibacter pan-pathway (Fig. 2) in both AN and AE phases. The main role of each module, including the three modes of the TCA cycle [34], is assessed (Tables 2 and 3; Additional file 3: Table S5 and Additional file 4: Table S6) based on the metabolisms [7] and kinetic analysis [33]. We find that all the primary pathways in the carbon, electron, and energy flows (solid edges in Fig. 3) are completely accomplished by one consistent core-pathway (red edges in Fig. 2). This indicates that the Accumulibacter core-genome has complete coverage of the EBPR functions. For Accumulibacter, the primary carbon, electron, and energy sources in AN phase are acetate, glycogen, and polyphosphate (PolyP), respectively; in AE phase they are polyhydroxyalkanoates (PHA), PHA, and oxygen. Different sources in the two phases result in different primary flows, and the continuous switching between the two phases has a cumulative effect on the status of Accumulibacter. A brief summary of the Accumulibacter pan-pathway is given below, with a complete description in Additional file 1: Supplementary information.

Fig. 2

PAO pan-genome metabolic and non-metabolic model of core genes, dispensable genes, and unique genes. The core- (red), type I dispensable- (purple), Type II dispensable- (blue), cross-type dispensable- (yellow), and strain-specific (others) pathways were specifically highlighted by different colors. The main metabolic modules in Accumulibacter had different patterns and directions during AN and AE phases in the EBPR process. The carbon, electron, and energy flow of Accumulibacter pan-pathway were specifically demonstrated and discussed in two phases. The providers/consumers and the carrier form of each flow were distinguished by shapes and colors. The abbreviations were listed in the list of abbreviations and Additional file 1: Supplementary information

Table 3 The main roles (shades) and contribution (nodes) of Accumulibacter pan-pathway to carbon, electron, and energy flows in AE phase
Fig. 3

Simplified Accumulibacter pan-pathway separated into carbon, electron, and energy flows of EBPR anaerobic (AN) and aerobic (AE) phases. The variations of behavior represented by each scenario were specifically labeled for each flow, referring to the expression and activity (high and low) of related modules. a1 Carbon flow in AN phase. a2 Carbon flow in AE phase. b1 Electron flow in AN phase. b2 Electron flow in AE phase. c1 Energy flow in AN phase. c2 Energy flow in AE phase. The abbreviations of modules and chemical components are listed in the list of abbreviations and Additional file 1: Supplementary information

Carbon flow

Carbon flows from short-chain fatty acids (SCFAs) to PHA in AN phase. Accumulibacter can route and converge multiple carbon sources to acetyl-CoA (Acyl-CoA) as the hub of carbon flow for further allocation. In AN phase, it uses SCFAs such as acetate or propionate (Pro) as the primary carbon source; these are transported by actP and activated to Acyl-CoA and propionyl-CoA (Pro-CoA) (Fig. 3a1 and Table 2). It can also use the modules of glycolysis/gluconeogenesis (Gly), long-chain fatty acid (LCFA), amino acid (AA), nitrogen (glutamate), and Calvin cycle to provide secondary carbon sources [35]. Carbon is primarily consumed by the PHA module in AN phase, and then by the secondary modules of complete TCA cycle, phospholipid (PL), and LCFA. In contrast to the split and partial TCA cycles (transferring Acyl-CoA to PHA), the complete TCA cycle supplements electrons at the cost of Acyl-CoA.

Carbon flows from PHA to the complete TCA cycle in AE phase. Accumulibacter uses PHA as the main carbon source to feed the complete TCA cycle for electrons and energy in AE phase (Fig. 3a2 and Table 3). Accumulibacter employs the partial TCA cycle to partition the carbon flow for glycogen generation by shunting the decarboxylation steps of the complete TCA cycle [36]. Accumulibacter also invests carbon in LCFA, PL, AA, and exopolysaccharide (EPS) for cell synthesis.

Electron flow

Accumulibacter has flexible modules to maintain the redox condition in AN phase. In AN phase, the Gly module provides the primary electrons (electron donors) for Accumulibacter (Fig. 3b1). Moreover, the complete, partial, and split TCA (reductive branch) cycles and the LCFA module also supplement electrons. The electrons are consumed to synthesize PHA [37, 38]. Like most assimilatory metabolisms, PHA synthesis requires NADPH, while the electrons available are mainly in other forms (NADH, fdH2, FADH2, and QH2). Thus, it is crucial for Accumulibacter to maintain the balance between electron generation and transformation (transhydrogenases). To do that, Accumulibacter recruits the modules of TCA cycle, N modules (denitrification), ETP, and hydrogenases. ETP is proposed to be possible in AN phase with cytochrome b/b6 oxidase [7], using nitrate, nitrite, and fumarate as terminal electron acceptors (TEAs) [39, 40]. Thus, when Accumulibacter has excessive electrons, these flexible modules could be activated to consume electrons at the end of AN phase to maintain the recycling of electron carriers.

Accumulibacter uses the partial TCA cycle to control the carbon, electron, and energy flows. In AE phase, electrons are released from Acyl-CoA (Fig. 3b2) through the complete TCA cycle. Meanwhile, PHA and the partial TCA cycle can also release electrons. These electrons are mainly used by ETP for energy generation and by the Gly module for glycogen production. Although the partial TCA cycle generates electrons less efficiently than the complete TCA cycle, it is used to shunt the carbon flow to Gly. The flexibility to switch between partial and complete TCA cycles to control the carbon, electron, and energy flows is a crucial ecological benefit for Accumulibacter.

Energy flow

The use of PolyP as the energy source adapts Accumulibacter to cycling AN and AE environments. PolyP provides the energy for Accumulibacter to compete for and store carbon sources in AN phase (Fig. 3c1). The energy is used to transport and activate acetate. In AE phase, the PolyP is recharged using the energy generated by ETP (Fig. 3c2).

The status of Accumulibacter: one good example versus one bad example

The optimal status of Accumulibacter is maintained by the balance pair of carbon versus electrons in AN phase and the balance pair of electrons versus energy in AE phase. Based on the whole picture of the Accumulibacter pan-pathway, we summarize nine scenarios to represent all optimal and suboptimal statuses (Figs. 3 and 4). Under the optimal status (scenario A1), the primary sources of the carbon, electron, and energy flows are balanced and Accumulibacter expresses the primary pathways (solid lines in Fig. 3). However, a suboptimal status could alter the transcriptional patterns of Accumulibacter, as indicated by the other eight scenarios. We test this framework using two previously published metatranscriptomic datasets (IMG/M-3300002341-3300002346 and NCBI-SRP038016) of clades IB and IIA [11, 41] from two acetate-fed EBPR reactors (Additional file 1: Figures S3, S4 and Supplementary information). Even though both run under similar operational parameters, the reactor of clade IIA is stable and effective, while the reactor of clade IB experiences several deteriorations and has poor phosphorus removal performance.

Fig. 4

Nine scenarios of the behavior of Accumulibacter (metatranscriptomic pattern and chemical parameters) in response to all environmental conditions (balance of carbon, electron, and energy flows) with specific strategies proposed to balance the carbon and phosphorus feeding to optimize reactor operation. Number labels of modules: 1, acetate uptake; 2, PHA module; 3, LCFA module; 4, AA module; 5, glutamine/glutamate in N module; 6, Gly module; 7, complete TCA cycle; 8, reductive branch of split TCA cycle; 9, oxidative branch of split TCA cycle; 10, partial TCA cycle; 11, denitrification and dissimilatory nitrogen reduction in N module; 12, hydrogenases; 13, PolyP module; 14, phosphorus transporters (pst and pit); 15, ETP; 16, PL module; 17, EPS module; 18, transformation of acetate to pyr

The balance pair of carbon versus electrons

Accumulibacter can employ secondary modules to stabilize a slightly disturbed status in AN phase back to a balanced status for the following AE phase. When the primary carbon source (acetate) is excessive relative to the primary electron source (Gly) (scenarios B1–B3), Accumulibacter partitions acetate to provide additional electrons (no. 7 and no. 10 in Fig. 4; green-dotted lines in Fig. 3b1) and turns off the oxidative branch of the split TCA cycle (no. 9 in Fig. 4) to reserve electrons. Conversely, when the primary carbon source is insufficient relative to the primary electron source (scenarios C1–C3), Accumulibacter activates modules that supplement additional carbon (no. 3–5 in Fig. 4; green-dotted lines in Fig. 3a1) and modules that consume electrons (no. 11 and no. 12 in Fig. 4; red-dotted lines in Fig. 3b1) to maintain the redox condition.
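The rules in this paragraph can be written as a toy classifier over the module numbers of Fig. 4. This is a minimal sketch under loud assumptions: the decision rule (compare the fraction of each signature that is active), the function name, and the notion that a module has already been called "highly expressed" are all hypothetical and are not part of PAPP. Module 9 is left out because the paragraph describes it as switched off rather than activated.

```python
# Signature modules (Fig. 4 numbering) taken from the paragraph above.
EXCESS_CARBON_SIGNATURE = {7, 10}               # complete and partial TCA cycle
CARBON_SHORTAGE_SIGNATURE = {3, 4, 5, 11, 12}   # LCFA, AA, glutamate, N module, hydrogenases


def diagnose_carbon_electron(active_modules):
    """Guess the carbon-versus-electron state in AN phase.

    active_modules: iterable of module numbers judged highly expressed
                    (how that call is made is outside this sketch).
    Returns "A" (balanced), "B" (excess acetate), "C" (insufficient acetate)
    or "undetermined" when both signatures are equally represented.
    """
    active = set(active_modules)
    excess = len(active & EXCESS_CARBON_SIGNATURE) / len(EXCESS_CARBON_SIGNATURE)
    shortage = len(active & CARBON_SHORTAGE_SIGNATURE) / len(CARBON_SHORTAGE_SIGNATURE)
    if excess == 0 and shortage == 0:
        return "A"  # only primary modules are active
    if excess > shortage:
        return "B"
    if shortage > excess:
        return "C"
    return "undetermined"


balanced = diagnose_carbon_electron({1, 2, 6})
acetate_excess = diagnose_carbon_electron({1, 2, 6, 7, 10})
acetate_short = diagnose_carbon_electron({1, 2, 6, 3, 4, 5, 11, 12})
```

A real diagnosis would also need the AE-phase modules and the chemical parameters shown in Fig. 4; the sketch covers only the letter (A, B, or C) of the scenario label.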

Clade IB has overloading acetate (scenarios B1–B3), and clade IIA has insufficient acetate (scenarios C1–C3). In AN phase, carbon flows from acetate to PHA and electrons flow from glycogen to PHA in both clades. Only clade IIA is found to recruit secondary carbon sources (LCFAs, AAs, and glutamate), indicating that acetate is limited for clade IIA but not for clade IB. Regarding the downstream carbon flow, clade IB seems to use the complete TCA cycle to generate more electrons, which suggests that the carbon source may not be a limiting factor for clade IB. In addition, the electrons provided by Gly are insufficient in clade IB and overloading in clade IIA. Complete and partial TCA cycles are highly expressed only in clade IB to supply secondary electrons. Instead, clade IIA employs modules to consume excessive electrons, including the split TCA cycle (oxidative branch), hydrogenases, and denitrification. All these observations imply that the primary carbon source is overloading for clade IB and insufficient for clade IIA.

Both clades successfully coordinate the secondary modules to stabilize the disrupted status in AN phase back to a balanced status for the next AE phase. Clades IB and IIA share similar expression profiles of carbon and electron flows in AE phase. Electrons are even sufficient for hydrogenases and denitrification in both clades (nos. 11 and 12 in Fig. 4; red-dotted lines in Fig. 3b2). Since the primary energy source is determined by the preferability and availability of electron acceptors [42], when oxygen is sufficient, the other electron acceptors are mainly used for redox balance. This indicates that the electrons are excessive and that these two clades have recovered to a balanced status.

The balance pair of electron versus energy

When provided with limited phosphorus, Accumulibacter can store more electrons in AE phase to compensate for energy in the AN phase; scenarios A2, B2, and C2 describe the unbalanced status of overloading phosphorus. These three scenarios display an increased expression of the electron flow from the primary electron source (complete TCA cycle) to the primary energy source (ETP) in AE phase (no. 7 and no. 15 in Fig. 4), to supply extra energy for phosphorus uptake and storage. In contrast, in scenarios A3, B3, and C3, inadequate phosphorus can cause low expression of the primary energy consumer (PolyP) and phosphate transporters (no. 13 and no. 14 in Fig. 4). Since this directly influences the primary energy source (PolyP) in the coming AN phase, more electrons (i.e., Gly) will be used for energy compensation in AN phase. Thus, in the current AE phase, Accumulibacter can allocate more Acyl-CoA to the Gly module (no. 6 and no. 10 in Fig. 4; red-dotted lines in Fig. 3c2) for the following AN phase [43].

Clade IB has limited phosphorus (scenario B3), and clade IIA has sufficient phosphorus (scenarios C1 and C2). In AN phase, PolyP provides energy for both clades and releases part of the intracellular phosphorus [44, 45], while the phosphate transporters (pst and pit) are expressed only in clade IIA. The low expression of the PolyP module and phosphate transporters suggests that clade IB is provided with limited phosphorus, with additional energy provided by the high expression of ETP. In AE phase, ETP coupled with the complete TCA cycle is highly expressed in clade IIA to provide energy. Instead, in clade IB, where limited phosphorus is provided, we find that the partial TCA cycle is activated for additional glycogen replenishment to fuel ETP in the following AN phase [43].

Evaluation of status: good or bad for enrichment

We propose that scenarios A1, A2, C1, and C2, in which phosphorus is overloading and acetate is slightly limited, are advantageous for Accumulibacter enrichment (Fig. 4). Accumulibacter could prefer the acetate balance (scenarios A1–A3) or acetate shortage (scenarios C1–C3) situations over the acetate overloading (scenarios B1–B3) situation. Unconsumed acetate will promote the unwanted growth of its competitors, such as glycogen-accumulating organisms (GAOs). In contrast, phosphorus overloading (scenarios A2, B2, and C2) helps Accumulibacter compete for carbon, while limited phosphorus (scenarios A3, B3, and C3) weakens the ability of Accumulibacter to act as PAOs and turns it into GAOs [43].
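The nine scenario labels follow from crossing the three acetate states with the three phosphorus states, which can be enumerated in a short sketch. The letter and digit meanings are taken from the text (A, B, C for balanced, overloading, and insufficient acetate; 1, 2, 3 for balanced, overloading, and limited phosphorus), as is the set proposed as advantageous; the dictionary layout and variable names are assumptions made for illustration.

```python
from itertools import product

# State of each balance pair, keyed by the label used in the scenario names.
ACETATE = {"A": "balanced", "B": "overloading", "C": "insufficient"}
PHOSPHORUS = {"1": "balanced", "2": "overloading", "3": "limited"}

# Scenarios proposed in the text as advantageous for enrichment.
FAVORABLE = {"A1", "A2", "C1", "C2"}

# Build the 3 x 3 grid of scenarios.
scenarios = {
    letter + digit: {
        "acetate": ACETATE[letter],
        "phosphorus": PHOSPHORUS[digit],
        "favorable": (letter + digit) in FAVORABLE,
    }
    for letter, digit in product(ACETATE, PHOSPHORUS)
}

favorable_labels = sorted(name for name, s in scenarios.items() if s["favorable"])
```

Looking up a diagnosed label such as `scenarios["B3"]` then returns the acetate and phosphorus states together with whether the text proposes that scenario as advantageous.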

We demonstrate one good example (clade IIA) and one bad example (clade IB) for Accumulibacter enrichment. The two enrichment systems display opposite scenarios and should use different strategies for optimization. Clade IIA is fed with limited acetate and sufficient phosphorus, while clade IB is fed with overloading acetate and inadequate phosphorus. Compared to clade IIA, the status of clade IB is the least favorable for Accumulibacter, in line with our operational experience with this reactor over 4 years.

Accumulibacter diversity: contribution from non-core-genomes

The diversity within Accumulibacter [46,47,48] contributed by the non-core-genomes could provide strategies to enrich specific clades. Generally, the flexibility of Accumulibacter is mainly related to metabolism, cellular processes, and environmental information processing (Additional file 1: Figure S5), especially in membrane transport, signal transduction, and metabolisms of amino acids and carbohydrates. Specifically, as to carbon sources, type II has an additional feature to reclaim carbon from cysteine and serine while type I could only use the common carbon sources of SCFAs (Fig. 2). This suggests that cysteine and serine could be the selective substrates in future studies to enrich type II and suppress type I. For the electron sources, the catalase (cat) associated with partial TCA cycle is found in clades IIA, IIC, and IA to reoxidize electrons [49]. It implies the flexibility of these clades in maintaining redox balance, and that hydrogen peroxide could be a selective force for these clades. Denitrification could be an advantage of clades IIA, IIC, IIF, IA, IB, and IC to use nitrate and nitrite as TEAs in AN phase. Besides, all clades in type II and clade IB in type I have the potential of nitrogen fixation, while the regulatory and nitrogen stabilization genes (nifW and nifZ) are only harbored by type II.

Overall, Accumulibacter type II, especially clade IIA, encodes more diverse adaptations than type I, which could explain the wide distribution of clade IIA observed in different wastewater treatment plants (WWTPs) [18]. This observation also implies the importance of clade IIA and suggests that it should be prioritized in future studies.

Conclusion

In this study, we present a comprehensive mining method that combines genetic and transcriptional datasets to decode hidden traits and guide the enrichment of not-yet-cultured populations. We focus on the whole picture of the involvement and cooperation of pathways in the carbon, electron, and energy flows. A new transcriptional analysis is designed to diagnose the status of not-yet-cultured populations in experimental systems. In this way, the genomic and transcriptomic data can be linked to the environmental conditions, which could indicate a potential strategy to optimize the enrichment systems. The method is tested by in silico analysis on a group of functional microbes, Accumulibacter. We find that Accumulibacter can coordinate multiple pathways to stabilize a disrupted status back to balance. This method could help diagnose the status of not-yet-cultured populations and provide traits for their enrichment and even isolation. A limitation of this study is that no experimental validation has yet been conducted to test the method.