Background

Cytochrome P450 is the collective name for a superfamily of heme-containing monooxygenases. P450 enzymes not only participate in the production of diverse metabolites but also play critical roles in an organism's adaptation to specific ecological and/or nutritional niches by modifying potentially harmful environmental chemicals. In fungi, P450 enzymes have contributed to the exploration of and adaptation to diverse ecological niches [1, 2].

Rapidly accumulating genome sequences from diverse fungal species, including more than 80 species with more currently being sequenced [3], offer opportunities to study the genetic and evolutionary mechanisms underpinning different fungal lifestyles at the genome level [4-7]. To support such studies with a focus on cytochrome P450s, we constructed a new platform named the Fungal Cytochrome P450 Database (FCPD), which archives P450s in most sequenced fungal and oomycete species and allows comparison of the archived data with previously published datasets, such as the Cytochrome P450 Engineering Database [8], a manually curated P450 database at http://drnelson.utmem.edu/CytochromeP450.html (referred to as the Nelson's P450 database herein), and P450 datasets derived from extensive phylogenetic analyses of selected fungal taxon groups [9, 10]. The FCPD also supports multifaceted analyses of P450s using various web-based bioinformatics tools supported by the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr/) [3]. The FCPD, in combination with high-throughput experimental approaches, will advance our understanding of the roles and evolution of P450s.

Construction and content

Pipeline for identifying and classifying fungal P450s

To identify P450 proteins from genome sequences, we used the standardized genome databases managed by CFGP (http://cfgp.snu.ac.kr/) [3] and the InterPro scan [11] annotation of each ORF. The pipeline for the identification and archiving of P450s consists of four steps (Figure 1). In the first step, all proteins carrying one or more of 16 InterPro terms associated with cytochrome P450 were identified and classified according to the associated InterPro terms. Domain information of P450 proteins was also retrieved from the InterPro scan results. To filter out potential false positives (i.e., those carrying a very short domain), the minimum length for IPR001128 (Cytochrome P450) was set at 25 amino acids (aa). Since some of these potential false positives might indeed belong to novel P450s, they were labelled as "questionable P450" in FCPD rather than discarded. In the second step, cache tables were created from the collection of putative P450 sequences, especially for results from several statistical analyses, to speed up data retrieval. BLAST datasets were also generated to support BLAST searches of P450s via the FCPD web site and cluster analysis. In the third step, class-specific and cluster-specific neighbour-joining phylogenetic trees that show relationships among P450s within individual phylogenetic groups (e.g., Figure 2) were constructed (bootstrapped with 2,000 or 10,000 repeats); these trees are displayed by Phyloviewer (http://www.phyloviewer.org/; Park et al., unpublished) on the FCPD web site. Using the BLAST dataset, fungal P450s were clustered using tribe-MCL [12] and compared with the data in three publicly available sources: the Cytochrome P450 Engineering Database [8], the Nelson's P450 database, and a set of phylogenetically analyzed P450s in multiple fungal species [9, 10]. Results from this comparison were stored in the FCPD for viewing via the FCPD web site. For species with multiple versions of genome annotation, data generated using different versions were linked to provide the history of annotation.
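The first step of the pipeline can be illustrated with a minimal sketch. It assumes InterPro scan results have already been parsed into simple records; the `DomainHit` record, the function name, and the rule that a protein is "questionable" when all of its IPR001128 hits are shorter than 25 aa are illustrative assumptions, not the FCPD implementation. Only IPR001128 is named in the text, so the other 15 InterPro terms are left as a placeholder set.

```python
from dataclasses import dataclass

# Assumption: only IPR001128 is named in the text; the remaining 15
# P450-associated InterPro terms would be added to this set.
P450_TERMS = {"IPR001128"}
MIN_DOMAIN_LEN = 25  # minimum IPR001128 domain length in aa, as stated in the text


@dataclass
class DomainHit:
    """One InterPro scan hit (hypothetical record layout)."""
    protein_id: str
    interpro_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def classify_proteins(hits):
    """Label every protein carrying a P450-associated term.

    Returns {protein_id: "P450" or "questionable P450"}.
    Proteins without any P450-associated term are omitted.
    """
    by_protein = {}
    for hit in hits:
        if hit.interpro_id in P450_TERMS:
            by_protein.setdefault(hit.protein_id, []).append(hit)

    labels = {}
    for protein_id, protein_hits in by_protein.items():
        core = [h for h in protein_hits if h.interpro_id == "IPR001128"]
        # Assumed rule: questionable if every IPR001128 hit is too short.
        too_short = bool(core) and all(h.length < MIN_DOMAIN_LEN for h in core)
        labels[protein_id] = "questionable P450" if too_short else "P450"
    return labels
```

Keeping short-domain proteins under a separate label, rather than dropping them, mirrors the choice described above: they remain searchable in case they turn out to be novel P450s.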

Figure 1

Data retrieval pipeline in FCPD. The four steps involved in identifying and classifying fungal P450s in FCPD are presented as a flowchart.

Figure 2

Phylogenetic analysis of E-class P450, group IV. A bootstrapped phylogenetic tree was constructed using Phyloviewer (http://www.phyloviewer.org/). Four different clades in the tree are indicated by blue lines.

In the fourth step, all P450s archived in FCPD were matched by BLAST to the corresponding families in the Nelson's P450 database, which contains manually curated data based on the P450 International Nomenclature [13, 14]. For each P450, the assigned family name was considered highly confident ('> = 44% identity' in the site) when the degree of aa sequence identity was 44% or higher. When no match at that level could be found in the Nelson's P450 database, the best hit in the BLAST search was chosen to assign the family name, and the assignment was labelled as low confidence ('< 44% identity' in the site). Considering that P450s are very diverse and that the Nelson's P450 database covers fewer fungal species than FCPD, it is highly likely that some of the P450s with low confidence represent novel families that have yet to be registered in the Nelson's P450 database (Figure 3). This annotation result was stored in FCPD and can be viewed through the FCPD web site.
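The confidence rule for family assignment can be sketched as follows. The function name and the input layout (a best-first list of family name and percent identity pairs) are hypothetical, and the text does not specify which hit is used when several reach 44% identity, so taking the first such hit is an assumption. The family names in the usage lines are placeholders for illustration only.

```python
HIGH_CONFIDENCE_IDENTITY = 44.0  # percent aa identity threshold stated in the text


def assign_family(blast_hits):
    """Assign a P450 family from BLAST hits against curated family members.

    blast_hits: list of (family_name, percent_identity) tuples, assumed to be
    ordered best-first as reported by BLAST.
    Returns (family_name, "high" | "low"), or (None, None) without hits.
    """
    if not blast_hits:
        return None, None
    # Assumption: the first hit reaching the threshold decides the family.
    for family, identity in blast_hits:
        if identity >= HIGH_CONFIDENCE_IDENTITY:
            return family, "high"
    # No hit at >= 44% identity: fall back to the best hit, flagged low confidence.
    best_family, _ = blast_hits[0]
    return best_family, "low"


# Illustrative calls with placeholder family names and identities.
print(assign_family([("CYP51", 62.3), ("CYP61", 30.0)]))
print(assign_family([("CYP61", 38.5)]))
```

Low-confidence assignments produced this way are candidates for new families rather than settled classifications.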

Figure 3

Confidence levels in the family assignment of individual P450s in different fungal phyla. Five fungal phyla and oomycetes are shown below the X-axis. The Y-axis indicates the proportion of P450s classified with high confidence or low confidence. The numbers on the top of each bar indicate the number of P450s in each class.

In the genomes of 66 fungal and 4 oomycete species, 4,538 putative P450 genes were identified. Although oomycete species belong to the kingdom Stramenopila and show closer phylogenetic relationships to brown algae and diatoms [15], they have traditionally been studied by mycologists because of their morphological similarities with true fungi; their P450s were therefore included in FCPD.

Evaluation of the accuracy of annotation via the automated pipeline in FCPD by comparison with data archived in the manually curated Nelson's P450 database

The automated annotation process of P450s in FCPD may result in some false positives and false negatives. To evaluate its accuracy, all 886 P450s identified using the pipeline in 12 fungal species were compared with manually curated data in the Nelson's P450 database. The positive predictive value (PPV; the proportion of P450s predicted by FCPD that are also archived in the Nelson's P450 database) was 0.894 (792 out of 886 P450s in FCPD were matched to P450s in the Nelson's P450 database). Some putative false positives in FCPD appeared to be pseudogenes. Another factor that contributed to the discrepancy between the two sources is that some data in the Nelson's P450 database were based on a genome version earlier than the one used for FCPD (e.g., version 4 of the Magnaporthe oryzae genome was used for the former, whereas FCPD is based on version 5). The gene prediction models employed to analyze different versions might have produced different predictions. In contrast, 1,032 out of 1,034 fungal P450s curated in the Nelson's P450 database were identified as P450s by the FCPD pipeline (99.8% sensitivity), supporting the reliability of the FCPD pipeline. The two P450s not identified by FCPD came from Phytophthora sojae and P. ramorum, respectively, and corresponded to truncated sequences (34 and 89 aa, respectively) that were labelled as fragments of P450 in the Nelson's P450 database. Detailed analyses of the underlying reasons for the inconsistency between the two sources will help us improve the automated annotation pipeline of FCPD.
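The two accuracy figures follow directly from the counts reported above. The short sketch below reproduces the arithmetic; the function names are illustrative, and the counts are taken from the text.

```python
def positive_predictive_value(matched_predictions, total_predictions):
    """Fraction of pipeline predictions confirmed by the curated reference."""
    return matched_predictions / total_predictions


def sensitivity(recovered_reference, total_reference):
    """Fraction of curated reference entries recovered by the pipeline."""
    return recovered_reference / total_reference


# 792 of 886 FCPD predictions (12 species) matched curated P450s.
ppv = positive_predictive_value(792, 886)
# 1,032 of 1,034 curated fungal P450s were recovered by the pipeline.
sens = sensitivity(1032, 1034)

print(f"PPV = {ppv:.3f}")
print(f"Sensitivity = {sens:.1%}")
```

Note that the two measures use different denominators: PPV is computed over the pipeline's predictions, whereas sensitivity is computed over the curated reference set.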

Notable features in fungal P450s in the taxonomic context

The numbers of P450s in individual species exhibited certain taxon-specific features (Table 1). Within the phylum Ascomycota, members of the subphylum Pezizomycotina typically carry around 100 P450s, with the exception of four species (Coccidioides immitis, Histoplasma capsulatum, Uncinocarpus reesii and Neurospora crassa) that carry only 22 to 46 P450s. The proportion of P450s in the total proteome in the subphylum Pezizomycotina (0.63% on average) is nearly twice as large as that of vertebrates (0.33%) but is less than that of plant species (0.82%). In contrast to the Pezizomycotina, species in the subphyla Saccharomycotina and Taphrinomycotina have very few P450s (e.g., only 3 P450s in Saccharomyces cerevisiae and 2 P450s in Schizosaccharomyces pombe). Within the phylum Basidiomycota, Postia placenta carries 353 P450s (2.06% of the total proteome), while strains of Cryptococcus neoformans have 5 to 6 P450s (0.08% to 0.09% of the total proteome). Interestingly, Encephalitozoon cuniculi and Antonospora locustae, species in the phylum Microsporidia, do not appear to have any P450s, probably reflecting their obligate, intracellular parasitic lifestyle. The four oomycete species, Phytophthora infestans, P. sojae, P. ramorum and Hyaloperonospora parasitica, also carry relatively low numbers of P450s (9 to 35 P450s; 0.06% to 0.2% of the total proteome).

Table 1 P450s in the fungal kingdom

Three P450 classes defined by InterPro terms, including group I in E-class P450, group IV in E-class P450 and Cytochrome P450, contain 3,866 out of 4,538 (85.2%) fungal/oomycete P450s. Only 8 out of 16 classes have fungal/oomycete P450s. Among other classes, P450s belonging to the pisatin demethylase (PDA)-like class are present only in the subphylum Pezizomycotina (phylum Ascomycota) and in the phylum Basidiomycota, suggesting the possibility that PDA-related P450s might have emerged twice independently during fungal evolution.

Distribution patterns of fungal P450s among clusters and clans

When fungal/oomycete P450s were combined with 5,447 P450s extracted from 40 other eukaryotic and prokaryotic species and clustered using tribe-MCL (with an inflation factor of 5.0, the strictest condition for clustering based on sequence similarity), 141 clusters were identified. Among these, 74 clusters contain only fungal P450s, suggesting that many fungal P450s have a configuration unique to fungi. The taxonomic origins of fungal P450s in the 26 clusters that contain more than 10 fungal P450s were analyzed (Figure 4). P450s in the phylum Ascomycota are dominant because of the abundant genome sequences from members of this group. Cluster 19.1 is dominated by P450s encoded by members of the subphylum Agaricomycotina (phylum Basidiomycota), and Clusters 3.1 and 4.1 are Zygomycota-specific. Cluster 8.1 contains 101 out of 106 oomycete P450s (95.3%). Nine P450s encoded by Batrachochytrium dendrobatidis, the only sequenced species in the phylum Chytridiomycota, are scattered across 8 clusters, suggesting that they likely have distinct functions and evolutionary origins. Sequences of additional genomes are needed to further investigate the evolution of P450s in this phylum.
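Once clusters are available, the two summaries used above (clusters containing only fungal P450s, and clusters with more than 10 fungal P450s) reduce to simple counting. The sketch below assumes the clustering output has already been parsed into a dictionary mapping each cluster name to (P450 identifier, taxonomic group) pairs; this layout, the function name, and the toy identifiers are hypothetical, and the clustering itself (tribe-MCL) is not reproduced here.

```python
from collections import Counter


def summarize_clusters(clusters, fungal_groups, more_than=10):
    """Return (fungal_only, major) cluster name lists.

    clusters: {cluster_name: [(p450_id, taxonomic_group), ...]}
    fungal_groups: set of group labels counted as fungal (caller-defined)
    fungal_only: clusters whose members are all fungal
    major: clusters with more than `more_than` fungal members
    """
    fungal_only, major = [], []
    for name, members in clusters.items():
        counts = Counter(group for _, group in members)
        n_fungal = sum(n for group, n in counts.items() if group in fungal_groups)
        if members and n_fungal == len(members):
            fungal_only.append(name)
        if n_fungal > more_than:
            major.append(name)
    return fungal_only, major


# Toy input with made-up identifiers, for illustration only.
toy = {
    "A": [(f"asc{i}", "Ascomycota") for i in range(12)],
    "B": [("bas1", "Basidiomycota"), ("pl1", "Plantae")],
    "C": [("zyg1", "Zygomycota")],
}
print(summarize_clusters(toy, {"Ascomycota", "Basidiomycota", "Zygomycota"}))
```

Which group labels count as fungal is left to the caller, since oomycetes are archived in FCPD but are not true fungi.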

Figure 4

Distribution pattern of 25 major P450 clusters. Cluster names are shown below the X-axis, and the names of fungal phyla are shown along the Y-axis. Non-fungi indicates P450s in plants and animals. The Z-axis indicates the numbers of P450s in individual groups.

To compare the relationship between P450 clusters and clans, 115 clans (375 P450s in total) identified in four species, M. oryzae, Fusarium graminearum, N. crassa and Aspergillus nidulans [9], were collected and analyzed. Interestingly, only 7 out of 115 clans (6.1%) are scattered across more than one P450 cluster. For example, P450s included in clan FF59 were distributed among four P450 clusters (Clusters 4.1, 8.1, 31.1 and 73.1). Each of the remaining clans belongs to one specific cluster, supporting a good correlation between the two classification systems.
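The clan-versus-cluster comparison amounts to asking, for each clan, how many distinct clusters its members fall into. A minimal sketch is given below; the input layout (P450 identifier, clan, cluster) and the function name are assumptions, and the P450 identifiers and the clan "FF01" in the toy data are made up. Only clan FF59 and its four clusters are taken from the text.

```python
from collections import defaultdict


def clans_spanning_multiple_clusters(assignments):
    """Find clans whose members fall into more than one cluster.

    assignments: iterable of (p450_id, clan, cluster) tuples.
    Returns {clan: set_of_clusters} for clans spanning more than one cluster.
    """
    clusters_by_clan = defaultdict(set)
    for _p450_id, clan, cluster in assignments:
        clusters_by_clan[clan].add(cluster)
    return {
        clan: clusters
        for clan, clusters in clusters_by_clan.items()
        if len(clusters) > 1
    }


# Toy data: FF59 and its clusters come from the text; all else is hypothetical.
toy_assignments = [
    ("p1", "FF59", "4.1"),
    ("p2", "FF59", "8.1"),
    ("p3", "FF59", "31.1"),
    ("p4", "FF59", "73.1"),
    ("p5", "FF01", "1.1"),
    ("p6", "FF01", "1.1"),
]
split_clans = clans_spanning_multiple_clusters(toy_assignments)
print(sorted(split_clans))
```

Dividing the number of clans returned by the total number of clans gives the proportion of clans that do not map cleanly onto a single cluster.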

Assignment of P450s archived in FCPD to individual P450 families based on the international nomenclature scheme

The Nelson's P450 database classified 1,016 (98.26%) out of 1,034 fungal/oomycete P450s into 276 P450 families. Most P450s in FCPD (4,446 out of 4,538; 97.97%) were matched to corresponding families in the Nelson's P450 database (see above). Of these, 2,978 P450s (66.98%) were assigned to specific families with high confidence, while 1,468 P450s (33.02%) were assigned to families with low confidence (Figure 3). In the phylum Ascomycota, the assignment of 1,007 P450s (29.24%) was supported with low confidence. In the phylum Basidiomycota, the proportion was 44.56% (352 out of 790 P450s). More than 90% of P450s (104 out of 110) in the phylum Zygomycota and 100% of P450s in the phylum Chytridiomycota did not closely match any families in the Nelson's P450 database. These results strongly suggest that new fungal families need to be defined.

Update of FCPD

Considering the rapid increase in fungal genome sequencing [3], timely update of FCPD is critical to present the latest information to users. The BLAST dataset, the bootstrapped phylogenetic trees specific for individual classes and clusters, the results from clustering analysis and the annotation of P450s based on the international P450 nomenclature will be updated automatically once new P450s have been identified via the identification pipeline. Since the identification of P450s depends on the accuracy of the gene model employed to annotate the genome, FCPD will be updated whenever a new version of a previously released genome sequence becomes available, with the data based on earlier versions tagged as "Old putative P450 sequences." Links between new and old versions will be provided.

Utilities and discussion

Accessing lists and sequences of fungal P450s based on species of origin and taxonomic position

To support efficient search and retrieval of P450 sequences, data archived in FCPD can be browsed and searched through multiple methods. Upon selecting a species of interest, general information about the species and a list of its P450s can be viewed. From this list, any P450 sequences can be stored in a personal data repository called the Favorite, in which six useful bioinformatic tools can be used to analyze the stored data. The Favorite is a virtual space for storing sequences archived in CFGP [3]. A list of P450s belonging to each class defined by InterPro terms or to each cluster can also be displayed. The taxonomical distribution of P450s, resulting from comparison with data in the Cytochrome P450 Engineering Database (CYP450ED) [8] and two previous studies on fungal P450s [9, 10], can be browsed. P450 sequences in FCPD can also be searched by gene name.

BLAST search of all or subsets of P450s

In FCPD, five different databases of P450s can be searched using BLAST: all P450s (including those from plants and animals), all fungal/oomycete P450s and three fungal phylum-specific databases of P450s. Fungal P450 sequences in the Nelson's P450 database can also be searched. From BLAST search results, sequences of individual P450s can be saved in the Favorite for subsequent analyses.

Analyses of P450s using tools in the Comparative Fungal Genomics Platform

Many online databases that archive gene families allow downloading of all or part of their data to the user's computer but often do not provide data analysis tools via the database site. Consequently, to conduct the desired analyses, users may have to visit multiple websites to access data analysis tools and/or install programs on a personal computer. In FCPD, sequences of one or more fungal P450s can be selected by clicking the check boxes next to each P450 and stored in the Favorite. The Object Browser in FCPD supports the transfer of chosen sequences from the Favorite to CFGP, in which the data can be analyzed using six useful bioinformatics tools [3]. These tools include BLAST, ClustalW, InterPro Scan, PSort, SignalP 3.0 and BLASTMatrix. BLASTMatrix is a novel tool for surveying the presence of genes homologous to a query in multiple species simultaneously. Once a new analysis tool has been added to CFGP, users of FCPD will be able to use the tool immediately.

Visualization of chromosomal distribution patterns of P450s via SNUGB

To aid visualization of the chromosomal distribution pattern of P450s for species with available physical chromosome map information, FCPD provides a diagram illustrating the positions of P450s on individual chromosomes (Figure 5), which is drawn by a newly developed genome browser called SNUGB (http://genomebrowser.snu.ac.kr/; Jung et al., submitted). Currently, chromosomal maps of 13 fungal species are available.

Figure 5

Chromosomal distribution of P450s on the genome of Aspergillus fumigatus. P450s identified in FCPD are displayed as red bars with their names on the eight chromosomes of A. fumigatus. When the mouse cursor moves over a name, a yellowish label appears that provides a link to the information page of the chosen P450. This display is supported by SNUGB (http://genomebrowser.snu.ac.kr/).

Conclusion

To our knowledge, FCPD is the most comprehensive database that archives and classifies P450s in publicly available fungal and oomycete genomes (66 fungal and 4 oomycete species) through a systematic identification pipeline. The reliability of the pipeline in retrieving fungal P450 sequences was evaluated by comparing the resulting data with other established datasets, and the data from these sources were archived in FCPD for comparison and search. The pipeline also links annotated information from different versions of fungal genome sequences. The numbers of P450s in individual fungal species vary widely, and fungal-specific P450 clusters were found via clustering analysis. In combination with other bioinformatic platforms, such as CFGP (http://cfgp.snu.ac.kr/) [3], Phyloviewer (http://www.phyloviewer.org/; Park et al., unpublished), and SNUGB (http://genomebrowser.snu.ac.kr/; Jung et al., submitted), FCPD provides a highly integrated platform supporting systematic studies on fungal P450s.

Availability and requirements

All data described in this paper can be freely browsed and downloaded through the FCPD web site at http://p450.riceblast.snu.ac.kr/.