Background

Mutations in DJ-1 that are associated with recessively inherited Parkinson's disease (PD) have recently been described. Evidence to date suggests that the mutations cause disease by a loss of function mechanism. The reported mutations either delete several exons and result in an effective gene knockout [1] or are point mutations that destabilize the protein [2]. Therefore, the normal cellular function of DJ-1 is a critical piece of information in understanding how these mutations cause PD. DJ-1 has a number of reported functions, including cellular transformation [3], transcriptional effects [4], control of mRNA stability [5] and response to oxidative stress [6, 7], and it is unclear how all of these relate to the pathways involved in PD [8].

One way to understand protein function is to find other proteins of known function with sequence or structural homology. This approach has helped in understanding other PD proteins; parkin was found to be an E3 protein-ubiquitin ligase based on homology to other proteins with similar domain structures [9]. DJ-1 shows sequence homology to a number of proteins that contain a ThiJ domain, including protein chaperones [10], catalases [11], proteases [12, 13] and the ThiJ kinases [14, 15]. Previous analyses have suggested that the ThiJ domain may be a member of the large glutamine amidotransferase (GAT) superfamily [11]. Crystal structures of DJ-1 [16–20] and other members of this DJ-1/ThiJ/PfpI superfamily, including the protease PH1704 [13], have been reported. The proteins have an overall α/β sandwich structure, arranged similarly to the Rossmann fold, which is also present in members of the GAT superfamily [11]. The structure is similar to that of another protein of much lower sequence homology, the E. coli chaperone Hsp31 [10, 20].

The multitude of functional groupings within the DJ-1/ThiJ/PfpI superfamily limits our ability to make predictions about the cellular role of the human ortholog. A putative catalytic cysteine, Cys106, is present, which has led to the suggestion that DJ-1 may be a protease [17]. However, structural data generally argue against DJ-1 having protease activity, as the invariant catalytic triad seen in other cysteine proteases is present but in an unfavorable conformation [18, 20]. On the other hand, one recent report suggests human DJ-1 possesses weak protease activity [21], disputing another claim of chaperone activity [20]. In an attempt to gain further insight into possible roles of DJ-1, we performed a detailed analysis of several hundred sequences of DJ-1/ThiJ/PfpI superfamily members. These include orthologs (sequences that are separated by speciation) and paralogs (sequences that are separated by other types of rearrangements). Surprisingly, we found that the nearest homologous sequences are the bacterial ThiJ genes, suggesting that DJ-1 may have evolved from thiamine synthesis genes that have been dispensed with in eukaryotes.

Results

Using human DJ-1 as a seed sequence for PSI-BLAST, we identified 311 sequences of proteins with significant homology (see additional file 1 for a list of all the sequences we identified). Within this large group there are several distinct subgroups supported by bootstrap analysis. Proteins with similar annotations across different species generally clustered together into distinct clades (Figure 1). Those proteins annotated as DJ-1 clustered into a specific node that included several eukaryotic species, which are likely to be orthologs of each other. As expected, primate members (Homo sapiens and Cercopithecus aethiops, 100% identity) clustered together, with progressively lower similarity to rodent (Rattus norvegicus, Mus musculus and Mesocricetus auratus; 91–97% identity) or other vertebrate (Gallus gallus, Salmo salar and Xenopus laevis; 80–89% identity) homologues. Within each of two invertebrate species with reported sequences (Caenorhabditis elegans and Drosophila melanogaster) there appear to be two distinct DJ-1 paralogs, each with about 40% identity to the human protein. The closest grouping to the eukaryote DJ-1 orthologs is the ThiJ family of 4-methyl-5(β-hydroxyethyl)-thiazol monophosphate biosynthesis enzymes, which we have analyzed in more detail (discussed below).

Figure 1

Cladogram of the DJ-1/ThiJ/PfpI superfamily. Consensus maximum likelihood tree with branch distances corresponding to level of bootstrap support. Known structures are highlighted along with the corresponding PDB identifier. From this tree it is clear that the DJ-1 superfamily contains proteins with diverse functions and that the DJ-1 cluster is most similar to the ThiJ subgroup. Group labels are guided by the annotation of the constituent sequences (for more details see text). Unlabeled clusters had a majority of sequences with unknown or disparate function. Sequence identifiers and files for the construction of this tree can be found in the supplemental information. Numbers in parentheses indicate percentage identities; the first number is the identity within the group, the second is the identity with human DJ-1.

Outside of the DJ-1/ThiJ group, there are a number of distinct clades that have at least one member whose function is known. Of these, three can be separated from the DJ-1/ThiJ proteins by the presence of diagnostic structural elements. Firstly, a series of plant homologues group together and appear to be paralogs, as they have duplicated DJ-1/ThiJ (Pfam PF01965) domains, as described for Chinese cabbage [22]. These proteins, from Arabidopsis thaliana, Oryza sativa and Brassica rapa subsp. pekinensis, are annotated as ThiJ or protease-related, but cluster close to the ThiJ family. Secondly, there are a number of bacterial proteins containing a catalase domain and a DJ-1/ThiJ domain. These are large subunit catalases (EC 1.11.1.6), the structure of one of which has been solved [11]. Thirdly, another prominent clade includes the AraC type transcriptional regulators from bacteria. These proteins can be defined by the presence of one or more helix-turn-helix (HTH) motifs in the C-terminal portion of the protein. The HTH motif is thought to mediate DNA binding, whilst the ThiJ-like domain may be an amidase, although this is unproven.

Other families have a single DJ-1/ThiJ domain with variable extensions at the C- or N-termini. A major grouping includes two proteases from thermophilic bacteria, PfpI and PH1704, whose ATP-independent protease activity has been demonstrated [12, 23]. The crystal structure has been solved for PH1704 [13]. This protein is hexameric, in contrast to dimeric human DJ-1, and this difference in oligomer formation is mediated by differences at the C-termini of the two proteins [18]. The proteases that grouped together in this analysis all lack the most C-terminal α-helix found in DJ-1, and thus are also likely to be hexameric, distinguishing them from the DJ-1/ThiJ clade.

Several proteins annotated as sigma cross-reacting proteins cluster together. This family has a unique conserved composition (91.4% identity) that distinguishes it from neighboring families. This group also appears to be most similar to a larger family that includes E. coli Hsp31, a chaperone [10, 20], which we have annotated as ThiJ/PfpI-like proteins including chaperones. A Saccharomyces cerevisiae protein, YDR533C, whose transcription is upregulated when yeast cells enter the quiescent state after carbon starvation or in the presence of misfolded proteins [24, 25], is also present in this larger clade. Further analysis (see discussion) supports this group as having protease activities and we have annotated this family as ThiJ/PfpI-like protease/chaperones. How distinct this grouping is from the PfpI-like proteases is unclear, and they might be regarded as a single group. However, identity between these groups is only moderate (approximately 20%); therefore we have annotated them separately (Figure 1). The sigma cross-reacting proteins have a distinct ElbB domain (COG3155.1), related to ThiJ but with only moderate overall homology. Hence we have kept these as a separate branch. The structure of a member from E. coli has been solved (PDB entry 1OY1) and is dimeric.

To further assess the similarities and differences between the DJ-1 and ThiJ families, we extracted the sequences and realigned them. The resulting cladogram is shown in Figure 2. The member of the prokaryotic ThiJ enzymes with highest homology to eukaryotic DJ-1 proteins is from Leptospira interrogans, which has 42% identity with human DJ-1. This emphasizes the high degree of conservation between these two groups and indicates that there is likely to be structural conservation. Figures 3 and 4 show a multiple alignment of the DJ-1 and ThiJ families. A series of key residues that have been suggested to be important in DJ-1 function are well conserved between these two groups (see discussion).
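The percentage identities quoted in this section can be illustrated with a minimal sketch. The function name, the toy fragments and the choice of denominator (columns where both sequences have a residue) are assumptions made for illustration; the paper does not state which convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    Other conventions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not taken from the paper's alignment).
identity = percent_identity("AICAGPT-", "AICHGP-A")
print(f"{identity:.1f}% identity")
```

Because the result depends on how gapped columns are counted, values computed this way may differ slightly from those reported by alignment programs.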

Figure 2

The DJ-1 and ThiJ families. Cladogram of the alignment of the sequences belonging to the ThiJ and DJ-1 subgroups. Bootstrap support for this neighbor-joining tree is labeled at the vertices and the sequences are identified by their species name and accession number. The eukaryotic DJ-1 family members are boxed in blue to highlight their distinctness from the bacterial ThiJ proteins.

Figure 3

Amino acid conservation of the DJ-1/ThiJ homologues. Multiple alignment of sequences within the DJ-1/ThiJ family shows high homology and the presence of a number of absolutely conserved amino acids. Bars below each residue indicate the degree of conservation. As in Figure 2, the eukaryotic DJ-1 family is boxed in blue for clarity.

Figure 4

Discussion

The aim of this study was to compare sequence homologies between the clearly identifiable DJ-1 homologues and other members of this superfamily whose function or activity is known. The results of a search using PSI-BLAST yielded many genes with significant homology to human DJ-1. We were particularly interested in examining whether sequence analysis would support the previous suggestions that DJ-1 is a chaperone [20] or a protease [21]. Our analysis provides some degree of separation of members of the DJ-1/ThiJ superfamily with these functions.

Although human DJ-1 does not contain the strong catalytic Cys-His-Asp/Glu triad found in proteases such as PH1704, C106 and H126 have been suggested to contribute a catalytic dyad [19]. C106 is absolutely conserved in all of the ThiJ and DJ-1 sequences, as it is within most members of the superfamily (data not shown). However, H126 is conserved only within the DJ-1 family. All higher eukaryotic members have an equivalent histidine with the exception of one of the Drosophila genes, which has a phenylalanine. H126 is probably not involved in catalysis, based on the 1.1 Å crystal structure [18], and the significance of conservation of this residue is therefore unclear.

Further evidence that the DJ-1/ThiJ families would have only minor protease activity comes from examination of the sequence around this conserved cysteine. In the protease family, a consensus sequence AIC HGP is found. In the case of PH1704, the equivalent Cys100/His101 pair forms part of the catalytic triad [13]. In contrast, the equivalent sequence in human DJ-1 is AIC AGPT, and is conserved in all DJ-1 homologues. These data are consistent with the lack of protease activity in different assays [18, 20]. The AIC AGPT sequence may, however, contribute to the weak protease activity reported recently in vitro [21]. E. coli Hsp31 has both protease and chaperone activities [20], and contains the sequence SLC HGP. This prompted us to analyze all the members of the protease/chaperone family (as annotated in Figure 1). The consensus sequence is [Aliphatic] [Aliphatic]CH [SAG], with the Cys/His pair being invariant. As all known proteases contain adjacent Cys/His residues whereas substitution with Cys/X is found in all non-protease members, we predict that most of the "Hsp31-like proteases/chaperones" will have protease activity. However, our analysis supports the contention that DJ-1 has only minor protease activity. Further experimental evidence is required to assess the physiological relevance of this weak activity.
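The Cys/His versus Cys/X distinction described above can be expressed as a simple check on the residue that follows the conserved cysteine. The sketch below is illustrative only: the function name is hypothetical, the motif windows are the three quoted in the text with the spacing removed, and the position of the conserved cysteine within each window is supplied by hand.

```python
def classify_cys_site(window: str, cys_index: int) -> str:
    """Label a motif window by the residue directly after the conserved Cys.

    Returns "Cys-His" when the next residue is His (as in the proteases
    discussed in the text) and "Cys-X" otherwise.
    """
    window = window.upper()
    if cys_index < 0 or cys_index + 1 >= len(window):
        raise ValueError("cys_index must leave one residue after the cysteine")
    if window[cys_index] != "C":
        raise ValueError("residue at cys_index is not a cysteine")
    return "Cys-His" if window[cys_index + 1] == "H" else "Cys-X"


# Motif windows quoted in the text; the conserved Cys is the third residue.
motifs = {
    "protease consensus": "AICHGP",
    "human DJ-1": "AICAGPT",
    "E. coli Hsp31": "SLCHGP",
}

for name, window in motifs.items():
    print(name, classify_cys_site(window, cys_index=2))
```

This check captures only the adjacent-residue rule; it says nothing about the geometry of the catalytic triad, which the structural studies cited above address.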

It should be noted that using PSI-BLAST with human DJ-1 as a seed sequence has limitations. DJ-1 and related proteins represent part of the much larger type I glutamine amidotransferase (GATase; Pfam PF00117) superfamily based on structure and sequence similarities [11]. No type I GATase enzymes were identified with the methods we have used. This might not be a substantial limitation, as it is not possible to integrate all members of such large and divergent superfamilies in a single tree without loss of predictive value [26]. Equally, there may be important groups of enzymes within the DJ-1/ThiJ/PfpI superfamily, distinct from type I GATases, that have not been highlighted and that could be instructive for finding the function of DJ-1. An example is the phosphoribosylformylglycinamidine synthases (FGAM synthases; Pfam PF02700, EC 6.3.5.3). There are at least 50 enzymes with similar annotations within the DJ-1/PfpI superfamily annotated in the public database (PF01965 at http://www.sanger.ac.uk/Software/Pfam/index.shtml). The public domain superfamily was constructed using 44 seed sequences, including an FGAM synthase, and identifies a larger grouping (497 sequences) than found in our analysis (311 unique sequences). This is likely because our search generated a more specific sequence profile than the more broadly inclusive methods used to form Pfam families. The sequence identity between FGAM synthases and human DJ-1 is comparable to that between human DJ-1 and some proteins in additional file 1. Therefore, the limits of the DJ-1 "superfamily" are unclear and the dataset generated in this study may represent the most tractable set of similar sequences rather than the largest possible grouping.

One area that this analysis has allowed us to highlight is the degree of conservation of specific residues that are mutated in PD. As noted previously [1], leucine 166, which is mutated to proline, is highly conserved throughout the DJ-1 proteins and ThiJ enzymes (with the exception of a phenylalanine in Fusobacterium nucleatum). It appears that L166P, in the penultimate α-helix of human DJ-1, destabilizes the protein [2], perhaps by disrupting this α-helix. The site of another putative mutation, A104T [27], is almost completely conserved throughout all DJ-1 and ThiJ members. The site of the M26I mutation [28] is also absolutely conserved in all vertebrate orthologs, although a leucine is present in invertebrates and in the ThiJ enzymes.
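The degree of conservation at a single alignment position, as discussed for the PD-associated residues, can be summarized as the most common residue in that column and the fraction of sequences carrying it. This is a minimal sketch with hypothetical names and a toy alignment; it assumes gapped sequences are excluded from the denominator and is not the scoring used for the conservation bars in the figures.

```python
from collections import Counter


def column_conservation(alignment, column):
    """Return (most common residue, fraction of non-gap sequences with it).

    `alignment` is a list of equal-length aligned sequences and `column`
    is a 0-based column index. Gaps ('-') are ignored (an assumption).
    """
    residues = [seq[column].upper() for seq in alignment if seq[column] != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Toy alignment (hypothetical): one sequence carries Phe where the others
# carry Leu, loosely mirroring the single exception described in the text.
toy_alignment = ["KELA", "KELG", "KEFA", "KELA"]
print(column_conservation(toy_alignment, column=2))
```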

Conclusions

Our analyses demonstrate that the eukaryotic DJ-1 and prokaryotic ThiJ families are closely related. However, they also demonstrate the difficulty of predicting function based on sequence. ThiJ was cloned as an enzyme in the biosynthesis of thiamine in E. coli [14, 15]. As thiamine is an essential vitamin for many eukaryotes, presumably another use for the gene family has evolved. The mechanistic details of the enzymatic reaction of ThiJ have not been fully elucidated, but it catalyses a phosphorylation reaction of hydroxymethylpyrimidine phosphate, a precursor to thiamine [14]. An equivalent kinase activity has not been detected in human DJ-1 [18]. In contrast, human DJ-1 has been suggested to have either chaperone [20] or weak protease [21] activity. A tentative conclusion is that, as ThiJ activity was dispensed with, the eukaryotic DJ-1 orthologs have converged on a function that was present in one of the archaic paralogs, namely protein chaperone activity. However, it is equally feasible that the major function of DJ-1 is in binding RNA [5] or is an as yet unrecognized function. The role of the conserved cysteine residue, catalytic in other members of the family, is unclear.

Methods

We performed a homology search with iterative PSI-BLAST [29], using human DJ-1 (NP_009193.2) as the seed sequence. PSI-BLAST was run with default parameters at the NCBI site. The search converged in 7 iterations and the results were trimmed to remove duplicates and hypothetical results. The resulting 311 sequences were then aligned using CLUSTALW. The sequences were also aligned with T-COFFEE [30] and SAM 3.4 [31], neither of which offered a significant difference in quality (data not shown).
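The trimming step can be sketched as a simple filter over the search hits. The exact criteria are not given in the text, so the following is an assumption: a hit is treated as a duplicate if its sequence is identical to one already kept, and as hypothetical if its annotation contains the word "hypothetical". The function name and toy hits are illustrative only.

```python
def trim_hits(records):
    """Drop hypothetical and duplicate hits from (description, sequence) pairs.

    Assumed criteria (not stated in the paper): annotation contains
    'hypothetical', or the sequence is identical to one already kept.
    The first occurrence of each sequence is retained, in input order.
    """
    seen = set()
    kept = []
    for description, sequence in records:
        if "hypothetical" in description.lower():
            continue  # skip hits annotated as hypothetical proteins
        key = sequence.upper()
        if key in seen:
            continue  # skip exact duplicate sequences
        seen.add(key)
        kept.append((description, sequence))
    return kept


# Toy hits (hypothetical data, not real database entries).
hits = [
    ("DJ-1 [species A]", "MASK"),
    ("hypothetical protein [species B]", "MKKL"),
    ("DJ-1 duplicate entry [species A]", "mask"),
    ("ThiJ [species C]", "MTTA"),
]
print(trim_hits(hits))
```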

In order to assess the similarity and quality of subgroups in this alignment, trees were first built by neighbor joining with 1,000 bootstrap replicates from each of the three alignments (CLUSTALW, T-COFFEE and SAM 3.4). Each alignment gave similar subgroups. Subsequently, the final consensus tree was constructed by maximum likelihood using protpars of the PHYLIP package (version 3.5c, distributed by J Felsenstein, Department of Genetics, University of Washington, Seattle) with 100 bootstrap replicates. This second method adjusted the position of the subgroups relative to each other compared with neighbor joining but did not change the overall subgroup membership. For figure 2, the TREEVIEW program [32] was used to render the tree used for figure 1. The subgroup containing human DJ-1 was obtained by extracting the most specific subtree with 100% bootstrap support containing human DJ-1 and its neighbors, the ThiJ group. The resultant subgroup was realigned using T-COFFEE, visually inspected and manually corrected. One sequence was removed to obtain a higher quality semi-gapless alignment.
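The bootstrap replicates mentioned above are conventionally generated by resampling alignment columns with replacement; a tree is then built from each replicate and the support for a grouping is the fraction of replicate trees that contain it. The sketch below shows only the resampling step, with hypothetical function names and a toy alignment. It is not the PHYLIP implementation, and the tree-building and consensus steps are omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement to form one replicate."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate n_replicates resampled alignments (seed is for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (hypothetical); the study used 1,000 or 100 replicates.
toy_alignment = ["AICAGPT", "AICHGPA", "SLCHGPG"]
replicates = bootstrap_replicates(toy_alignment, n_replicates=5)
print(replicates[0])
```

Each replicate keeps the same number of sequences and columns as the original, so it can be passed to the same tree-building method as the original alignment.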