Introduction

Aldehyde dehydrogenases (ALDHs; EC 1.2.1.3) represent a group of enzymes that oxidise a wide range of endogenous and exogenous aldehydes to their corresponding carboxylic acids [1]. Endogenous aldehydes are formed during the metabolism of amino acids, carbohydrates, lipids, biogenic amines, vitamins and steroids. Biotransformations of a large number of drugs and environmental chemicals also generate aldehydes. Aldehydes are highly reactive electrophilic compounds which interact with thiol and amino groups; the resulting effects vary from physiological and therapeutic to cytotoxic, mutagenic or carcinogenic. In this respect, ALDHs efficiently oxidise and, in most instances, detoxify a significant number of chemically diverse aldehydes which otherwise would be harmful to the organism. Strong evidence supporting this notion comes from the fact that mutations in ALDH genes cause inborn errors of metabolism associated with clinical phenotypes such as Sjögren-Larsson syndrome (SLS), type II hyperprolinaemia and γ-hydroxybutyric aciduria [2]. In addition, mutations in ALDH genes contribute to clinically relevant diseases such as cancer and Alzheimer's disease.

There are instances, however, in which ALDHs catalyse reactions yielding chemically reactive or bioactive metabolites that are essential to the organism. Several ALDH enzymes, including ALDH1A1, ALDH1A2 and ALDH1A3, catalyse the irreversible oxidation of retinal to retinoic acid [3]. Whereas the light-absorbing properties of retinal are a necessary element for vision, the carboxylic acid isomers, all-trans-retinoic acid and/or 9-cis-retinoic acid, serve as ligands for the retinoic acid receptor (RAR) and the retinoid X receptor (RXR) that mediate gene expression for growth and development [4]. The importance of ALDH enzymes in retinoic acid formation became evident from the fact that homozygous disruption of the mouse Aldh1a2 gene results in an embryonic lethal phenotype due to defects in early heart morphogenesis [5, 6], whereas Aldh1a3 null mice die shortly after birth, due to respiratory distress caused by choanal atresia [7].

Formation of retinoic acid and γ-aminobutyric acid (GABA) are among the most intriguing functions of ALDHs regarding bioactivation. GABA is implicated in the regulation of the GABAergic, dopaminergic and opioid systems. Even though the main pathway for GABA synthesis is the decarboxylation of L-glutamate, this neurotransmitter can also be formed from putrescine by direct oxidative deamination to give γ-aminobutyraldehyde, which is then converted into GABA by an ALDH [8]. All in all, the ALDH gene family represents a truly diverse group of proteins which are critical to metabolism.

Multiple function(s) of the ALDH enzymes

Although the major function of ALDH enzymes is the NAD(P)+-dependent aldehyde oxidation, it has become increasingly clear that some, if not most, ALDHs exhibit multiple functions (Figure 1). For example, ALDH1A1, ALDH2, ALDH3A1 and ALDH4A1 are known to catalyse ester hydrolysis, suggesting that the ALDHs may have more than one catalytic function [9]. Indeed, it has recently been suggested that ALDH2 also possesses nitrate reductase activity, which catalyses the formation of 1,2-glyceryl dinitrate and nitrite from nitroglycerin within mitochondria, leading to the production of cGMP and vasorelaxation [10].

Figure 1

Multiple functions of aldehyde dehydrogenase (ALDH) enzymes. Endobiotics, endogenous compounds. Xenobiotics, foreign chemicals.

Aside from their catalytic properties, ALDH proteins are capable of non-catalytic interactions with chemically diverse endogenous compounds and chemotherapeutic agents. In this context, ALDH1A1 has been identified as an androgen-binding protein prominently expressed in human genital fibroblasts; as a cholesterol-binding protein in bovine lens epithelium; and as a cytosolic thyroid hormone-binding protein in Xenopus [11]. ALDH1A1 has also been identified as a flavopiridol-binding protein in non-small cell lung carcinomas and as a daunorubicin-binding protein in rat liver [1]. Similar to ALDH1A1, ALDH2 also binds exogenous compounds, as became evident from its identification as an acetaminophen-binding protein [1].

In addition, it has been suggested that some ALDHs may play a critical role in cellular homeostasis by maintaining redox balance [12]. For example, it has been proposed that ALDH3A1 may scavenge hydroxyl radicals via the -SH groups of Cys and Met residues, and that both ALDH3A1 and ALDH1A1 may contribute to the antioxidant capacity of the cell by generating NADPH and/or NADH [13]. The enzymatic activity of ALDH3A1 generates NADPH, which is linked to the regeneration of reduced glutathione (GSH) from its oxidised form (GSSG) via the glutathione reductase/peroxidase system. NAD(P)H may also function as a direct antioxidant by reducing glutathiyl radicals (GS•) or tyrosyl radicals [14]. The expression of ALDH3A1 and ALDH1A1 at very high concentrations in the mammalian cornea and lens (crystallins) has led to additional hypotheses regarding the multifunctional properties of these proteins, including a structural function contributing to transparency [15, 16]. Finally, the ALDH7A1 gene product is similar to the green garden pea '26g protein' involved in the regulation of turgor pressure, suggesting that the ALDH7A1 protein might have osmoregulatory properties.

Table 1. Human ALDH genes listed in the Human Gene Nomenclature Committee database, plus three pseudogenes

Figure 2

Dendrogram of the 19 human aldehyde dehydrogenase (ALDH) genes that are bona fide members of the ALDH superfamily. To avoid additional clutter, neither alternative splice variants of ALDH genes nor the three pseudogenes listed in Table 1 have been included in the construction of this tree. The neighbour-joining method gives branches of different lengths, reflecting that evolutionary divergence is not the same between different branches of the gene tree.

Evolution of the ALDH genes

ALDHs have a wide distribution in nature, ranging from bacteria and yeasts to plants and animals [17]. Sequence comparisons indicate extensive similarity between bacterial and human ALDHs and suggest that the superfamily has a common ancestral gene, dating back to ~3 billion years ago [18]. A systematic nomenclature scheme for the ALDH gene superfamily (in animals, plants, bacteria and yeasts) has been developed, based on evolutionary divergence [18], which has been implemented with biannual updates [19, 20] and is available via the internet (http://www.aldh.org).

ALDH proteins are conveniently classified into families and subfamilies based on the percentage of amino acid identity. Proteins sharing ≥ 40 per cent identity are assigned to a particular family designated by an Arabic numeral, whereas those sharing ≥ 60 per cent identity are classified in the same subfamily designated by a letter. These cut-off values follow the original recommendations by Margaret Dayhoff and were first applied to the cytochrome P-450 superfamily [21]. At present, more than 130 additional gene superfamilies and large gene families follow this same format.
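The cut-off rule described above can be illustrated with a minimal sketch. It assumes only the two thresholds stated in the text (40 and 60 per cent amino acid identity) and the root-plus-numeral-plus-letter symbol layout seen in names such as ALDH1A1. The function names `relationship` and `parse_symbol`, and the `AldhName` record, are hypothetical helpers introduced here for illustration and are not part of any published ALDH nomenclature tool.

```python
import re
from typing import NamedTuple, Optional

# Thresholds taken from the text (per cent amino acid identity).
FAMILY_CUTOFF = 40.0
SUBFAMILY_CUTOFF = 60.0


def relationship(percent_identity: float) -> str:
    """Classify a pair of proteins by their pairwise percentage identity."""
    if percent_identity >= SUBFAMILY_CUTOFF:
        return "same subfamily"
    if percent_identity >= FAMILY_CUTOFF:
        return "same family"
    return "different family"


class AldhName(NamedTuple):
    family: int               # Arabic numeral, e.g. 1 in ALDH1A1
    subfamily: Optional[str]  # letter, e.g. "A"; None for a bare symbol such as ALDH2
    gene: Optional[int]       # gene number within the subfamily


# Hypothetical pattern: ALDH + family numeral + optional (subfamily letter + gene number).
_SYMBOL = re.compile(r"^ALDH(\d+)(?:([A-Z])(\d+))?$")


def parse_symbol(symbol: str) -> AldhName:
    """Split a functional-gene symbol into family, subfamily and gene number."""
    match = _SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised ALDH gene symbol: {symbol!r}")
    family, subfamily, gene = match.groups()
    return AldhName(int(family), subfamily, int(gene) if gene else None)
```

For instance, under these assumptions a pair of proteins sharing 72.3 per cent identity would fall in the same subfamily, whereas a pair sharing about 35 per cent identity would fall in different families. The rule is a guideline rather than a strict mapping onto gene names: long-established names can be retained even when the identity values would place a gene elsewhere, and pseudogene symbols (with a trailing 'P' suffix) are deliberately rejected by this simple parser.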

Endogenous functions of ALDHs

Antioxidants and oxidative stress increase the expression of certain ALDH genes, leading to increased protection of the cell against insult by environmental chemicals and drugs [22]. Increased expression of certain ALDHs in tumour cells, however, leads to decreased cellular sensitivity to cyclophosphamide and other oxazaphosphorines and, thus, to clinical problems in the treatment of cancer patients [23]. Why certain ALDHs, and other non-P450 members of the [Ah] battery, are upregulated in some tumours [24] remains an enigma.

Numerous polymorphisms exist in the human ALDH genes, some of which cause inborn errors of metabolism and contribute to clinically relevant diseases [2]. Polymorphism in the ALDH2 gene is associated with altered acetaldehyde metabolism, alcohol-induced 'flushing' syndrome, decreased risk for alcoholism and increased risk of ethanol-induced cancers. The genetic ALDH2 deficiency has also been reported as a risk factor in late-onset Alzheimer's disease [25]. Epidemiological studies have revealed conflicting evidence about the association between the ALDH2 polymorphism and ethanol-induced hypertension [11]. Polymorphisms in the ALDH3A2, ALDH4A1, ALDH5A1 and ALDH6A1 genes are associated with metabolic diseases, which, in most cases, are characterised by neurological complications. Mutations in ALDH3A2 are the molecular basis for SLS, an autosomal recessive disorder characterised by congenital ichthyosis, mental retardation, spasticity, ocular abnormalities and pruritus [26, 27]. Premature birth has also been observed in 73 per cent of children with SLS [28]. Loss of ALDH4A1 function causes type II hyperprolinaemia, an autosomal recessive disorder characterised by plasma accumulation of proline and Δ1-pyrroline-5-carboxylate, as well as neurological manifestations such as seizures and mental retardation [29]. Loss of ALDH5A1 function leads to γ-hydroxybutyric aciduria, a rare autosomal recessive disorder in GABA metabolism associated with accumulation of both GABA and γ-hydroxybutyric acid in blood serum and cerebrospinal fluid [30]. ALDH6A1 (methylmalonic semialdehyde dehydrogenase) deficiency is an inborn metabolic disorder that results in developmental delay [31].

Latest genes in the ALDH database

A search of the Human Gene Nomenclature Committee (HGNC) database using 'ALDH' produced 20 'hits': 19 ALDH genes plus AGPS (encoding alkylglycerone phosphate synthase). This latter entry appears in the database because one of its aliases is 'ALDHPSY'. A search of the HGNC database using 'aldehyde dehydrogenase' produced 21 hits: the 19 putatively functional ALDH genes plus two others, AASDHPPT and ADH5 (Table 1). After various analyses, it was concluded that these latter two, evolutionarily, do not belong to the ALDH gene superfamily. AASDHPPT was found to belong to the 4'-phosphopantetheinyl transferase superfamily (pfam01648; ACPS) and ADH5 belongs to the alcohol dehydrogenase family (pfam00107; ADH_zinc_N). These two genes appear in the HGNC database in response to the cue 'aldehyde dehydrogenase' because these two words are included within their names: aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl transferase and formaldehyde dehydrogenase. Interestingly, the Enzyme Commission (EC) database gives L-aminoadipate-semialdehyde dehydrogenase the number EC 1.2.1.31, meaning that it is closely related functionally to the other ALDH activities (EC 1.2.1.3).

The two most recently discovered ALDH genes are ALDH1L2 and ALDH16A1. The ALDH1L2 protein is very similar to ALDH1L1, which is better known as 10-formyltetrahydrofolate dehydrogenase (TFDH), a bifunctional enzyme formed from the fusion of two unrelated genes; TFDH is highly expressed in human liver, kidney and pancreas [32]. The deduced amino acid sequence of ALDH1L1 contains three domains, including the amino terminal (residues 1-203), which is approximately 30 per cent identical to phosphoribosylglycinamide formyltransferase (EC 2.1.2.2), and the carboxyl terminal (residues 417-902), which belongs to the ALDH superfamily [33]. The intermediate domain (residues 204-416) does not appear to have any known catalytic function, although it shows significant homology with the structural domain of a calmodulin-like protein [34]. This intermediate domain is apparently an essential structural element that aligns the two functional domains together for 10-formyl-tetrahydrofolate (10-FTHF) dehydrogenase activity [35], the primary function of this enzyme [31]. This multidomain enzyme catalyses: (a) the NADP+-dependent oxidation of 10-FTHF to tetrahydrofolate (THF); (b) the NADP+-dependent oxidation of 2-propanal and acetaldehyde; and (c) the NADP+-independent hydrolysis of 10-FTHF to formate and THF [34]. ALDH1L1 is involved in formate metabolism, as well as the regulation of 10-FTHF and THF, which are principal sources of folate in the cell.

The ALDH1L2 gene encodes a protein that is 72.3 per cent identical to ALDH1L1 and is also a fusion gene, comprising three domains: (a) the formyl-trans-N-formyl transferase (pfam00551) at the amino terminal (residues 23-202); (b) the formyltransferase carboxyl terminal domain (pfam02911) in the middle (residues 226-327); and (c) the aldehyde dehydrogenase domain at the carboxyl terminal (residues 451-910). No functional data have yet been reported for the ALDH1L2 protein. It is worth mentioning that the BLAST scores for the ALDH domain in both the ALDH1L1 and ALDH1L2 genes are much higher than those for the other two domains, which strongly supports listing these two genes as ALDH genes.

The ALDH16A1 gene, located at 19q13.33, was identified recently by the National Institutes of Health Mammalian Gene Collection (MGC) Program, which represents a multi-institutional effort to identify and sequence a cDNA clone containing a complete open reading frame for each human and mouse gene [36]. The ALDH16A1 gene encodes a protein of 802 amino acids (listed in databases as 'hypothetical protein MGC10204'), which is ~35 per cent identical to putatively membrane-anchored ALDHs found in bacterial species such as Sinorhizobium meliloti 102. Putative orthologues of ALDH16A1 are found in mouse, rat and chimpanzee, and exhibit around 72 - 74 per cent amino acid identity with the human ALDH16A1.

Finally, there are three pseudogenes, ALDH7A1P1, ALDH7A1P2 and ALDH7A1P3, located at Chr 5q14, 7q36 and 10q21, respectively. Of these, only ALDH7A1P1 meets the HGNC criteria for a pseudogene (at least 50 per cent amino acid identity across 50 per cent of the open reading frame); however, the names ALDH7A1P2 and ALDH7A1P3 are proposed because sequence homology with the ALDH7A1 gene is significantly higher than that with any other ALDH gene. An alternative nomenclature system for naming four types of pseudogenes has recently been proposed [37]. As is commonly seen with pseudogenes in the mammalian genome [37], all three of these pseudogenes are found at chromosomal locations that differ from that of the ALDH7A1 functional gene from which the pseudogenes clearly originated.

Conclusions

The authors' current analysis of the ALDH genes within the human genome is now probably complete, and it can be concluded that the human ALDH gene superfamily comprises 19 genes in 11 families and four subfamilies (Figure 2). The ALDH1 family contains six functional genes: the cytosolic ALDH1A1 and the mitochondrial ALDH1B1 may be involved in acetaldehyde metabolism; ALDH1A1 also participates in retinal oxidation and the detoxification of cyclophosphamide; the ALDH1A2 and ALDH1A3 proteins are integral to the oxidation of retinal to retinoic acid; the ALDH1L1 gene codes for 10-FTHF dehydrogenase; the ALDH1L2 gene product is very similar to that of ALDH1L1, but no functional data are available yet. The ALDH2 family has a single member, encoding the mitochondrial ALDH that exhibits the highest affinity for acetaldehyde and is critical in ethanol metabolism. Although ALDH2 officially qualifies as a seventh member of the ALDH1 family, its longstanding name of 'ALDH2', associated with ethanol metabolism, has been grandfathered into the more recent nomenclature system based on evolutionary divergence [18]. The ALDH3A subfamily contains the dioxin-inducible ALDH3A1 and ALDH3A2, which are primarily involved in the oxidation of medium- and long-chain aliphatic and aromatic aldehydes. The ALDH3B subfamily consists of two structurally related genes, ALDH3B1 and ALDH3B2; as yet, there are no functional data for either gene product. ALDH5A1 encodes the succinic semialdehyde dehydrogenase. ALDH6A1 encodes the acetyl CoA-dependent methylmalonate semialdehyde dehydrogenase. The ALDH7A1 gene product, also known as 'antiquitin', is similar to the green garden pea '26g protein' involved in the regulation of turgor pressure. ALDH8A1 appears to metabolise retinal. ALDH9A1 codes for an enzyme that participates in the metabolism of γ-aminobutyraldehyde and aminoaldehydes derived from polyamines. The ALDH16A1 gene encodes an 802-amino acid protein with as-yet unknown function. Finally, the ALDH18A1 gene encodes Δ1-pyrroline-5-carboxylate synthetase, which qualifies for classification in the ALDH superfamily based on the sequence homology of one of the protein domains (residues 361-772).