Genomics and proteomics of vertebrate cholesterol ester lipase (LIPA) and cholesterol 25-hydroxylase (CH25H)

Cholesterol ester lipase (LIPA; EC 3.1.1.13) and cholesterol 25-hydroxylase (CH25H; EC 1.14.99.38) play essential roles in cholesterol metabolism in the body: LIPA hydrolyses cholesteryl esters and triglycerides within lysosomes, and CH25H catalyses the formation of 25-hydroxycholesterol from cholesterol, which acts to repress cholesterol biosynthesis. Bioinformatic methods were used to predict the amino acid sequences, structures and genomic features of several vertebrate LIPA and CH25H genes and proteins, and to examine the phylogeny of vertebrate LIPA. Amino acid sequence alignments and predicted subunit structures enabled the identification of key sequences previously reported for human LIPA and CH25H, and of transmembrane structures for vertebrate CH25H sequences. Vertebrate LIPA and CH25H genes were located in tandem on all vertebrate genomes examined, and several predicted transcription factor binding sites and CpG islands were located within the 5′ regions of the human genes. Vertebrate LIPA genes contained nine coding exons, while all vertebrate CH25H genes were without introns. Phylogenetic analysis demonstrated the distinct nature of the vertebrate LIPA gene and protein family in comparison with other vertebrate acid lipases; this family has apparently evolved from an ancestral LIPA gene which predated the appearance of vertebrates.
Electronic supplementary material: The online version of this article (doi:10.1007/s13205-011-0013-9) contains supplementary material, which is available to authorized users.


Introduction
Lysosomal acid lipase or cholesteryl ester hydrolase (also called lipase A or LIPA) (EC 3.1.1.13) catalyses the hydrolysis of cholesterol esters or triglycerides which have been localized within lysosomes following receptor-mediated endocytosis of low-density lipoprotein (LDL) particles (Goldstein et al. 1975; Anderson et al. 1994; Wang et al. 2008). Inborn errors of metabolism for the human gene encoding this enzyme (LIPA) have been described. These include Wolman disease (WOD), which results from a major defect of the gene and leads to a cholesteryl ester storage disease and death, usually within the first year of life, and a second defect of the human LIPA gene, which generates a milder late-onset cholesteryl ester storage disease (CESD) (Beaudet et al. 1977; Burton and Reed 1981; Hoeg et al. 1984).
The LIPA gene is localized on chromosome 10 of the human genome, contains nine coding exons and is highly expressed throughout the body (Koch et al. 1981; Anderson and Sando 1991; Ameis et al. 1994). Several other acid lipase genes, including LIPF (encoding gastric triacylglycerol lipase), LIPJ (encoding lipase J), and LIPK, LIPM and LIPN (encoding epidermis acid lipases K, M and N), are also located within an acid lipase gene cluster on human chromosome 10 (Bodmer et al. 1987; Deloukas et al. 2004; Toulza et al. 2007). A new acid lipase gene (designated as Lipo) has also been recently reported for mouse and rat genomes (Holmes et al. 2010). The human acid lipase gene cluster encodes enzymes with similar sequences which are distinct from the "neutral lipases", including endothelial lipase (EL), lipoprotein lipase (LPL) and hepatic lipase (HL), which perform specific roles in high-density lipoprotein (HDL), LDL and hepatic lipid metabolism, respectively (Wion et al. 1987; Martin et al. 1988; Cai et al. 1989; Ishimura-Oka et al. 1992; Hirata et al. 1999; Jaye et al. 1999).
Cholesterol 25-hydroxylase (CH25H or cholesterol 25-monooxygenase) (EC 1.14.99.38) catalyses the formation of 25-hydroxycholesterol from cholesterol, which may serve as a corepressor of cholesterol biosynthetic enzymes by blocking sterol regulatory element binding protein processing (Lund et al. 1998). 25-Hydroxycholesterol is also an activator of gene signalling pathways and an immunoregulatory lipid produced by macrophages to negatively regulate the adaptive immune response in mice (Dwyer et al. 2007; Baumann et al. 2009). CH25H is a member of an enzyme family that utilizes di-iron cofactors to catalyse the hydroxylation of sterol substrates. It is encoded by an intronless gene (CH25H) located proximally to LIPA on human chromosome 10 and is an integral membrane protein located in the endoplasmic reticulum of liver and many other tissues of the body (Lund et al. 1998; Deloukas et al. 2004). Epidemiological studies have suggested that cholesterol metabolism plays a role in Alzheimer's disease (AD) pathogenesis, and several cholesterol metabolism genes, including LIPA and CH25H, have been investigated as possible risk factors for AD (Riemenschneider et al. 2004; Shownkeen et al. 2004; Shibata et al. 2006). Even though a linkage peak was identified within the relevant linkage region on chromosome 10, LIPA and CH25H gene markers were not significantly associated with susceptibility to AD.
This study describes the predicted sequences, structures and phylogeny of several mammalian and other vertebrate LIPA and CH25H genes and compares these results with those previously reported for human (Homo sapiens) and mouse (Mus musculus) LIPA and CH25H (Koch et al. 1981; Anderson and Sando 1991; Ameis et al. 1994; Lund et al. 1998). Bioinformatic methods were used to predict the sequences, structures and gene locations for vertebrate LIPA and CH25H, using data from the respective genome sequences. Phylogenetic analyses also describe the relationships and potential origins of vertebrate LIPA genes during mammalian and vertebrate evolution in comparison with other acid lipase genes.

Materials and methods
Vertebrate lipase and cholesterol 25-hydroxylase gene and protein bioinformatic identification
BLAST (Basic Local Alignment Search Tool) studies were undertaken using web tools from the National Center for Biotechnology Information (NCBI; http://blast.ncbi.nlm.nih.gov/Blast.cgi) (Altschul et al. 1997). This procedure produced multiple BLAST "hits" for each of the protein databases, which were individually examined and retained in FASTA format, and a record was kept of the sequences of predicted mRNAs and encoded LIPA- and CH25H-like proteins. These were derived from annotated genomic sequences using the GNOMON gene prediction method, and predicted sequences with high similarity scores were identified for many of the vertebrate LIPA and CH25H genes and proteins examined (see Table 1). The orangutan (Pongo abelii) and marmoset (Callithrix jacchus) genomes were subjected to BLAT (BLAST-Like Alignment Tool) analysis using the human LIPA protein sequence and the UC Santa Cruz genome browser (http://genome.ucsc.edu/cgi-bin/hgBlat) with the default settings to obtain an Ensembl generated protein sequence (Hubbard et al. 2007). A similar BLAT analysis was conducted of the stickleback fish (Gasterosteus aculeatus) genome (http://genome.ucsc.edu/cgi-bin/hgBlat) using the frog (Xenopus tropicalis) LIPA sequence (see Table 1).
BLAT analyses were then undertaken for each of the predicted LIPA and CH25H amino acid sequences using the UC Santa Cruz web browser (http://genome.ucsc.edu/cgi-bin/hgBlat) (Kent et al. 2003) with the default settings to obtain the predicted locations for each of the vertebrate LIPA and CH25H genes, including predicted exon boundary locations and gene sizes. BLAT analyses were also performed for the human LIPF, LIPJ, LIPK, LIPM and LIPN genes and the mouse Lipo1-like gene using previously reported sequences for the encoded subunits in each case (see Table 1). Structures for the major human LIPA and CH25H isoforms (gene splicing variants) were obtained using the AceView website, with the human LIPA and CH25H genes used to interrogate the database of human mRNA sequences (Thierry-Mieg and Thierry-Mieg 2006).
Table 1 footnote: Sources for LIPA and CH25H sequences were provided by the above sources.
3 Biotech (2011) 1:99-109
Phylogenetic studies and sequence divergence
Alignments of protein sequences were assembled using BioEdit v.5.0.1 and the default settings (Hall 1999). Alignment ambiguous regions, including the amino and carboxyl termini, were excluded prior to phylogenetic analysis, yielding alignments of 365 residues for comparisons of vertebrate LIPA; human LIPJ; human, mouse and rat LIPF, LIPK, LIPM and LIPN; mouse and rat LIPO1; and Drosophila melanogaster LIP3 sequences (Table 1; Supplementary Table 1). Evolutionary distances were calculated using the Kimura option (Kimura 1983) in TREECON (Van De Peer and de Wachter 1994). Phylogenetic trees were constructed from evolutionary distances using the neighbor-joining method (Saitou and Nei 1987) and were rooted using the Drosophila melanogaster LIP3 sequence. Tree topology was reexamined by the bootstrap method of resampling (100 bootstraps were applied) (Felsenstein 1985).
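The distance and resampling steps described above can be illustrated with a minimal Python sketch. It assumes that the "Kimura option" corresponds to the empirical protein correction of Kimura (1983), d = -ln(1 - p - 0.2p²), where p is the proportion of differing residues; the function names `kimura_distance` and `bootstrap_columns` are hypothetical and are not part of TREECON, and the toy alignment is invented for illustration.

```python
import math
import random


def kimura_distance(p: float) -> float:
    """Kimura (1983) empirical protein distance from the proportion p of
    differing residues: d = -ln(1 - p - 0.2 * p**2).
    Assumption: this is the correction meant by the 'Kimura option'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with
    replacement, keeping the number and length of sequences unchanged."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


if __name__ == "__main__":
    # Toy alignment (hypothetical residues, not real LIPA data).
    toy = ["ACDE", "ACDF", "GCDE"]
    replicate = bootstrap_columns(toy, random.Random(0))
    print(replicate)
    # 72% identity corresponds to p = 0.28 before correction.
    print(round(kimura_distance(0.28), 4))
```

In a full analysis, each of the 100 bootstrap replicates would be converted to a distance matrix and passed to a neighbor-joining implementation; that step is omitted here.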

Results and discussion
Alignments of vertebrate LIPA amino acid sequences
The amino acid sequences of derived LIPA subunits are shown in Fig. 1 together with previously reported sequences for human and mouse LIPA (Anderson and Sando 1991; Ameis et al. 1994; Du et al. 1996). Alignments of human LIPA with other predicted vertebrate LIPA sequences showed 64-98% identities, whereas lower levels of identity were observed with human LIPF, LIPJ, LIPK, LIPM and LIPN and with mouse LIPO1 sequences (49-63% identities), and with the Drosophila melanogaster LIP3 sequence (38% identity) (Table 2; alignments of vertebrate LIPA sequences with human and mouse acid lipase gene families are not shown). This comparison suggested that the vertebrate subunits identified were all products of a single gene family (LIPA) which is distinct from those previously described for mammalian LIPF, LIPJ, LIPK, LIPM and LIPN gene families (Bodmer et al. 1987; Toulza et al. 2007; Hirata et al. 1999; Jaye et al. 1999; Wion et al. 1987; Martin et al. 1988) and for a new rodent acid lipase gene family, designated as Lipo (Holmes et al. 2010). The predicted amino acid sequences for these vertebrate LIPA subunits were all of similar length (397-404 residues) and shared many (~34%) identically aligned residues (Fig. 1; Table 1). In addition, key residues involved in catalysis and in maintaining enzyme structure, previously described for human gastric acid lipase (LIPF) (Roussel et al. 1999) and for human LIPA (Zschenker et al. 2004), were conserved. Those retained for catalytic function included the active site residues involved with the charge relay system (human LIPA residue numbers used) (Ser174; Asp345; His374); the active site motif (Gly-Xaa-Ser-Yaa-Gly) (residues 172-176); and cysteine residues forming a disulfide bond (Cys248/Cys257) to support the enzyme's structure.
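As an illustration of how the active site motif (Gly-Xaa-Ser-Yaa-Gly) described above might be located in a predicted sequence, the following Python sketch scans a protein string with a regular expression. This is not the procedure used in this study; the function name `find_lipase_motif` and the toy peptide are hypothetical.

```python
import re

# Gly-Xaa-Ser-Yaa-Gly, where Xaa and Yaa are any residues. The lookahead
# allows overlapping occurrences to be reported.
LIPASE_MOTIF = re.compile(r"(?=(G.S.G))")


def find_lipase_motif(sequence: str):
    """Return (1-based start position, matched pentapeptide) for every
    Gly-Xaa-Ser-Yaa-Gly occurrence in a one-letter protein sequence."""
    return [(m.start() + 1, m.group(1))
            for m in LIPASE_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy peptide for illustration only (not a real LIPA sequence).
    for start, motif in find_lipase_motif("MKAGHSQGTT"):
        # The candidate active site serine is the third motif residue.
        print(f"motif {motif} at {start}-{start + 4}, Ser at {start + 2}")
```

Positions are reported with 1-based numbering to match the residue numbering convention used in the text.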
The hydrophobic N-terminus signal peptide function (residues 1-18 for human LIPA), the mannose-6-phosphate containing N-glycosylation site (residues 161-163: Asn-Lys-Thr) and the C-terminal sequence (residues 396-397 Arg-Lys for human LIPA), which may contribute to the lysosomal targeting of LIPA (Sleat et al. 2006), have been retained or have undergone conservative substitution(s) for all vertebrate LIPA sequences examined, with the exception of the chicken LIPA C-terminal sequence (residues 399-400 Ile-Lys) (Fig. 1). Two of the other high probability N-glycosylation sites for human LIPA (Asn36-Val37-Ser38 and Asn273-Met274-Ser275) were retained for all of the vertebrate LIPA sequences examined, while another was conserved for some vertebrate LIPA sequences (Asn72-His73-Ser74) (Fig. 1; Table 3). There were species differences observed for the theoretical isoelectric points (pI) of the vertebrate LIPA subunits, with predicted higher values (pI values >8) for mouse and chicken LIPA (Table 1).
Fig. 1 See Table 1 for sources of LIPA sequences. * identical residues; colon, 1 or 2 conservative substitutions; dot, 1 or 2 non-conservative substitutions; residues involved in processing at N-terminus (signal peptide); potential N-glycosylation sites, including residues NKT (161-163) which serve as a lysosomal targeting sequence; active site residues Ser174, Asp345 and His374; disulfide bond C residues for human LIPA; helix (human LIPA) or predicted helix; sheet (human LIPA) or predicted sheet; possible basic amino acid "patch" for lysosomal targeting; bold underlined font shows known or predicted exon junctions
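The N-glycosylation sites discussed above (Asn-Lys-Thr, Asn-Val-Ser, Asn-Met-Ser and Asn-His-Ser) all fit the general Asn-Xaa-Ser/Thr sequon, where Xaa is any residue except proline. The Python sketch below shows one simple way to list such candidate sites; it is an illustration only, does not estimate site probability as a dedicated predictor would, and uses a hypothetical function name and an invented toy peptide.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro; the lookahead reports overlapping sites.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str):
    """Return (1-based Asn position, tripeptide) for each candidate
    N-glycosylation sequon in a one-letter protein sequence."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy peptide for illustration only: NKT and NVS match, NPS does not.
    print(find_sequons("AANKTGGNPSLNVS"))
```

A sequon match indicates only that glycosylation is possible at that asparagine; whether a site is used in vivo depends on structural context.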

Alignments of vertebrate CH25H amino acid sequences
Amino acid sequence alignments of derived CH25H subunits are shown in Fig. 2 together with previously reported sequences for human and mouse CH25H (Lund et al. 1998; Zhao et al. 2005). Most of the vertebrate CH25H sequences were 270-274 amino acid residues in length, with the exception of mouse and rat CH25H, which exhibited extended C-termini and contained 298 residues. Three histidine boxes reported for human CH25H (Lund et al. 1998) have been conserved for all vertebrate CH25H sequences examined, including box 1 (Trp-His-Leu/Val-Leu-Val-His-His) for residues 142-148; box 2 (Phe/Ile-His-Lys-Val/Met/Leu-His-His) for residues 157-162; and box 3 (His-His-Asp-Leu/Met-His-His) for residues 238-244 (Fig. 2). These have been previously shown to be essential for CH25H catalytic activity and to bind the iron atoms which assist in the hydroxylation reaction (Fox et al. 1994). Predicted transmembrane structures for vertebrate CH25H are also shown (Fig. 2); three such regions were predominantly retained for the sequences examined. Figure 3 examines in more detail the predicted positioning of the three transmembrane domains within the human CH25H sequence, which suggests that the N-terminus commences outside the endoplasmic reticulum and that the three active site histidine boxes are localized inside the membrane of the endoplasmic reticulum, where CH25H catalysis is likely to take place.
Fig. 2 See Table 1 for sources of CH25H sequences. * identical residues; colon, 1 or 2 conservative substitutions; dot, 1 or 2 non-conservative substitutions; histidine residues; active site boxes 1, 2 and 3; predicted helix; predicted sheet; predicted transmembrane regions; bold underlined font shows known or predicted exon junctions (single exon CH25H genes observed in each case)
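The three histidine boxes described above can be expressed as simple sequence patterns. The following Python sketch transcribes the residue alternatives exactly as listed in the text and searches a sequence for each box; the function name `find_histidine_boxes` and the toy peptide are hypothetical, and the sketch does not reproduce the alignment-based identification used in this study.

```python
import re

# Patterns transcribed from the text (one-letter codes):
# box 1: Trp-His-Leu/Val-Leu-Val-His-His
# box 2: Phe/Ile-His-Lys-Val/Met/Leu-His-His
# box 3: His-His-Asp-Leu/Met-His-His
HISTIDINE_BOXES = {
    "box1": re.compile(r"WH[LV]LVHH"),
    "box2": re.compile(r"[FI]HK[VML]HH"),
    "box3": re.compile(r"HHD[LM]HH"),
}


def find_histidine_boxes(sequence: str):
    """Return a dict mapping each box name to the list of 1-based start
    positions at which its pattern occurs in the sequence."""
    seq = sequence.upper()
    return {name: [m.start() + 1 for m in pattern.finditer(seq)]
            for name, pattern in HISTIDINE_BOXES.items()}


if __name__ == "__main__":
    # Toy peptide for illustration only (not a real CH25H sequence).
    print(find_histidine_boxes("AAWHLLVHHGGFHKVHHTTHHDLHHAA"))
```

A sequence lacking any one of the three boxes would return an empty list for that box, which gives a quick screen for candidate CH25H-like sequences before alignment.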
Comparative vertebrate LIPA and CH25H genomics
The AceView web browser defines the human LIPA gene by 1443 GenBank accessions from cDNA clones derived from spleen, brain, liver and many other tissues and reports a high expression level (~4.9 times the average human gene) (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/) (Thierry-Mieg and Thierry-Mieg 2006). Human LIPA transcripts included 22 alternatively spliced variants, which differed by truncations of the 5′ or 3′ ends, the presence or absence of 10 cassette exons, or overlapping exons with different boundaries. Of these, five encoded complete proteins, including isoform LIPAb (RefSeq NM_00235) shown in Fig. 4. The predicted 38.47 kb sequence contained ten pre-messenger exons and nine coding exons as well as several transcription factor binding sites (TFBS) and a CpG island (designated as CpG45) within the 5′-untranslated region of the human LIPA gene (Fig. 4). Figure 1 compares the locations of the intron-exon boundaries for the vertebrate LIPA gene products examined. Exon 1 corresponded to the encoded signal peptide in each case, and exon 4 encoded the lysosomal targeting sequence (for human LIPA, residues 161-163 Asn-Lys-Thr) (Sleat et al. 2006). There is identity or near identity for the intron-exon boundaries for each of the vertebrate LIPA genes, suggesting conservation of these exons during vertebrate evolution.
Fig. 3 Predicted locations for transmembrane regions for human CH25H. The graph shows probability (0-1 on y axis) of transmembrane regions (TrM1, TrM2 and TrM3 shown in red) for the human CH25H amino acid sequence (on x axis). Predicted outside membrane CH25H residues are shown in red; predicted inside membrane CH25H residues are shown in blue; predicted positions of the three histidine active site boxes are shown as H..HH or HH..HH and are localized inside the membrane
Fig. 4 Gene structures and tandem locations for the human CH25H and LIPA genes on chromosome 10 derived from the AceView website http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/ (Riemenschneider et al. 2004); isoform variant LIPAb and CH25H mRNAs are shown with capped 5′- and validated 3′-ends for the predicted sequences; predicted exon regions are shaded; note that CH25H is predicted as a single exon gene; 5′UTR and 3′UTR refer to untranslated 5′ and 3′ regions, respectively; predicted transcription factor binding sites are shown. NKX25 homeobox protein 2.5, RP58 transcriptional repressor RP58, ROAZ zinc finger protein 423, TAXCREB, CREBP1 and CREBP1C cyclic-AMP responsive element-binding proteins, PPARG peroxisome proliferator-activated receptor gamma, HNF4 hepatocyte nuclear factor 4-alpha, COMP1 muscle specific transcription enhancer, HNF3B hepatocyte nuclear factor 3-beta, GFI1 zinc finger protein GFI1, RORA2 alpha orphan nuclear receptor, EVI1 zinc finger protein EVI1, FREAC4 forkhead box protein, STAT3 identified in the promoters of acute-phase genes, HEN1 helix-loop-helix protein 1, and OCT1 transcription factor that binds to the octamer motif; predicted locations for CpG islands (CpG45; CpG33) are shown by shaded triangles
In contrast to human LIPA, the human CH25H gene is defined by only 29 GenBank accessions for the AceView web browser from cDNA clones derived from 14 tissues including pancreas, brain and lung and showed a reduced expression level (~25% of the average human gene) (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/) (Thierry-Mieg and Thierry-Mieg 2006). Moreover, a single human CH25H transcript was recorded covering 1.7 kb of sequence, which was intronless and contained a large 5′ untranslated sequence proximally located near the 3′ region of the LIPA gene (Fig. 4), consistent with a previous report (Lund et al. 1998). The human CH25H genome sequence contained several predicted TFBS and a CpG island (CpG33) located in the intergenic region (~7.5 kb) separating the human CH25H and LIPA genes on chromosome 10. Of particular significance were the CREB (cyclic-AMP response element-binding) binding sites, which may play a role in driving expression from the CH25H promoter (Watters and Nourse 2009). The close proximal location of these genes was also observed for all other mammalian genomes examined (<20 kb) (Table 1), while chicken (Gallus gallus) and frog (Xenopus tropicalis) LIPA and CH25H genes were more distantly located (~160 kb). CpG islands were observed in the human LIPA-CH25H intergenic region and in the 5′-untranslated LIPA region, which may reflect roles for these CpG islands in up-regulating gene expression (Saxonov et al. 2006), given their colocation with the LIPA and CH25H promoters.
Fig. 5 Comparison of predicted three-dimensional structures for human, mouse and chicken LIPA subunits with the known structure for dog LIPF (from Roussel et al. 2002). Predicted 3D structures were obtained using the SWISS MODEL (http://swissmodel.expasy.org/workspace/index.php) web site and the predicted amino acid sequences for vertebrate LIPA subunits (see Table 1). The rainbow color code describes the 3D structures from the N- (blue) to C-termini (red color). The structures are based on the known 3D structures for dog LIPF (from Roussel et al. 2002) (with a modeling range of residues 24-395 for human, mouse and chicken LIPA)
Secondary and tertiary structures for vertebrate LIPA sequences
Figure 1 shows the secondary structures predicted for vertebrate LIPA sequences. Similar α-helix and β-sheet structures were observed for all of the vertebrate LIPA subunits examined, particularly near key residues or functional domains, including the α-helix within the N-terminal signal peptide, the β-sheet and α-helix structures surrounding the active site Ser174 (for human LIPA), the α-helix enclosing the lysosomal targeting signal residues (Asn-Lys-Thr residues 161-163 for human LIPA) and the C-terminal α-helix containing the basic amino acid residue "patch" (residues 396-397 Arg-Lys), which may contribute to LIPA lysosomal microlocalization (Sleat et al. 2006). Predicted LIPA secondary structures, however, may not fully reflect structures in vivo and serve only as a guide to the comparative structures for vertebrate LIPA subunits. The predicted tertiary structures for human, mouse, cow and chicken LIPA were similar to the previously reported dog LIPF (gastric acid lipase) structure (Roussel et al. 2002) (Fig. 5) but were based on incomplete sequences for human, mouse and cow LIPA (residues 24-395 for human LIPA). These results suggested that the major structural features for human LIPA recently reported (Roussel et al. 1999) resemble those for other vertebrate LIPA proteins, as well as for the dog gastric LIPF structure.
Phylogeny of vertebrate LIPA and other human acid lipase genes and proteins
Phylogenetic trees (Fig. 6) were constructed from alignments of vertebrate LIPA-like amino acid sequences with human LIPJ; human, mouse and rat LIPF, LIPK, LIPM and LIPN; and mouse and rat LIPO1 sequences (for further details see Supplementary Table 1 and Holmes et al. 2010). The dendrogram was rooted using a Drosophila melanogaster LIP3 sequence (Pistillo et al. 1998) and showed clustering of all of the LIPA-like sequences, which were distinct from the other human and mouse acid lipase gene families. The results were consistent with these acid lipase genes being products of gene duplication events prior to vertebrate evolution, particularly for the LIPA gene family, which is apparently of ancient origin, dating back more than 500 million years (Donoghue and Benton 2007). Table 2 summarizes the percentages of identity for these enzymes and shows that vertebrate LIPA sequences are ≥64% identical, in comparison with the 44-63% identities observed between acid lipase families. In addition, more closely related species showed higher levels of sequence identity for LIPA: the primate species (human and rhesus monkey) were 98% identical, whereas the bird (chicken) and human LIPA sequences were 72% identical.
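The percentage identities summarized above are pairwise values taken over aligned positions. A minimal Python sketch of such a calculation is given below, assuming two sequences taken from the same alignment and assuming that columns with a gap in either sequence are excluded from the count; alignment tools differ in how they treat gaps, so this convention is an assumption rather than the method used for Table 2. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the columns in which neither
    aligned sequence has a gap (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy aligned peptides for illustration only (not real LIPA data).
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(percent_identity("AC-EF", "ACDEY"))
```

Applying such a function to every pair of sequences in an alignment yields an identity matrix of the kind summarized in Table 2.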

Conclusions
Based on this report, we propose that a primordial acid lipase gene predated the appearance of vertebrates and underwent successive gene duplication events generating at least seven acid lipase gene families, namely LIPA (encoding lysosomal lipase), LIPF (encoding gastric lipase) and five other gene families (LIPJ, LIPK, LIPM, LIPN and LIPO), which have been retained as separate vertebrate gene families for more than 500 million years. In addition, it is likely that the LIPA gene family has been conserved throughout vertebrate evolution to serve a major role as an acid lysosomal lipase, given the conservation of key residues and lysosomal targeting sequences for vertebrate LIPA proteins.