Journal of Molecular Evolution

Volume 72, Issue 1, pp 104–118

Characterization of Maltase Clusters in the Genus Drosophila

Authors

M. Gabriško, Institute of Molecular Biology, Slovak Academy of Sciences
Štefan Janeček, Institute of Molecular Biology, Slovak Academy of Sciences

DOI: 10.1007/s00239-010-9406-3

Cite this article as:
Gabriško, M. & Janeček, Š. J Mol Evol (2011) 72: 104. doi:10.1007/s00239-010-9406-3

Abstract

To reveal the evolutionary history of the maltase gene family in the genus Drosophila, we undertook a bioinformatics study of maltase genes from the available genomes of 12 Drosophila species. The molecular evolution of a closely related glycoside hydrolase, α-amylase, has long been studied extensively in Drosophila, and the α-amylases have even been used as a model for the evolution of multigene families. Maltase, i.e., α-glucosidase, by contrast, has received only scarce attention. In this study, we therefore investigated the spatial organization of the maltase genes in Drosophila genomes, compared the amino acid sequences of the encoded enzymes, and analyzed the intron/exon composition of orthologous genes. We found that the Drosophila maltases are more numerous than previously thought (ten genes instead of three) and are localized in two clusters on two chromosome arms (2L and 2R). To elucidate the approximate timeline of the evolution of the clusters, we estimated the order and dates of duplication of all 10 genes. Both clusters are the result of an ancient series of successive duplication events that took place from 352 to 61 million years ago, i.e., well before the speciation of the extant Drosophila species. We also observed a remarkable diversity in the intron/exon composition of particular maltase genes of these clusters, probably resulting from independent intron loss after the duplication of an intron-rich ancestral gene, which emerged well before speciation in a common ancestor of all extant Drosophila species.

Keywords

Molecular evolution; Maltase; Alpha-amylase family; Gene cluster; Drosophila; Intron/exon composition

Abbreviations

CAZy: Carbohydrate-Active enZymes
CSR: Conserved sequence region
EST: Expressed sequence tag
GH: Glycoside hydrolase
kb: Kilobase
LRT: Likelihood-ratio test
ML: Maximum likelihood
MP: Maximum parsimony
MYA: Million years ago
NJ: Neighbor-joining
S.E.: Standard error

Introduction

Maltases are α-glucosidases (EC 3.2.1.20) that catalyze the hydrolysis of the α-1,4-glucosidic linkage of maltose, releasing α-D-glucose (Chiba 1997). According to the Carbohydrate-Active enZymes (CAZy) classification system (Cantarel et al. 2009), α-glucosidases are found in four glycoside hydrolase (GH) families: GH4, GH13, GH31, and GH97. Crystal structures have been solved for α-glucosidases from all four families: the GH4 AglA from the thermophilic bacterium Thermotoga maritima (Lodge et al. 2003), the GH13 α-glucosidase from the thermophilic deep-sea bacterium Geobacillus sp. strain HTA-462 (Shirai et al. 2008), the GH31 MalA from the acidophilic and thermophilic archaeon Sulfolobus solfataricus (Ernst et al. 2006), and the GH97 SusB from the human intestinal bacterium Bacteroides thetaiotaomicron (Gloster et al. 2008; Kitamura et al. 2008). α-Glucosidases from families GH13, GH31, and GH97 share a (β/α)8-barrel fold in their catalytic domain, and a remote homology between the GH13 and GH31 enzymes has been revealed (Rigden 2002; Janecek et al. 2007). In contrast, GH4 α-glucosidases are structurally similar to NAD-dependent dehydrogenases (2-hydroxyacid dehydrogenases), with the typical Rossmann fold in their NAD+-binding site (Lodge et al. 2003).

From a taxonomic point of view, the GH31 α-glucosidases are the most widespread. They are found in all three domains of life: Archaea, Bacteria, and Eukarya, the latter ranging from protists through fungi and plants to metazoans. GH13 α-glucosidases occur in bacteria and, among eukaryotes, are limited to fungi and insects. The α-glucosidases from families GH4 and GH97 are exclusively bacterial (Cantarel et al. 2009).

Insect α-glucosidases are found exclusively in family GH13. GH13, also known as the α-amylase family (MacGregor et al. 2001), forms clan GH-H together with GH70 and GH77 (Cantarel et al. 2009). Enzymes classified in clan GH-H share the following features: (i) the catalytic domain (domain A) adopts the (β/α)8-barrel (TIM-barrel) fold, from which a small distinct domain (domain B) protrudes between strand β3 and helix α3; (ii) the catalytic machinery consists of an aspartate (catalytic nucleophile) at strand β4, a glutamate (proton donor) at strand β5, and another aspartate (transition-state stabilizer) at strand β7; (iii) the reaction mechanism is retaining; and (iv) the amino acid sequences contain four to seven conserved sequence regions (CSRs) located mainly at the β-strands of catalytic domain A (Matsuura et al. 1984; Janecek et al. 1997; Kuriki and Imanaka 1999; MacGregor et al. 2001; Janecek 2002).

Family GH13 is one of the largest GH families, containing almost 30 different enzyme specificities. It has been shown that different specificities can be grouped into subfamilies on the basis of their sequence similarities (Janecek 1995, 2002). Based on the specific sequence of the fifth CSR, two GH13 subfamilies were established (Oslancova and Janecek 2002): the oligo-1,6-glucosidase and neopullulanase subfamilies, represented by the oligo-1,6-glucosidase from Bacillus cereus with 167_QPDLN as CSR V and the neopullulanase from Bacillus stearothermophilus with 295_MPKLN, respectively (the number gives the position of the first residue of the motif). According to this classification, α-glucosidases belong to the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002). At present, family GH13 is divided into 36 subfamilies established by the CAZy curators (Stam et al. 2006), with the insect α-glucosidases grouped in subfamily GH13_17 (Cantarel et al. 2009).
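The CSR V-based assignment can be illustrated with a toy classifier. This is a minimal sketch, assuming that a query's CSR V pentapeptide has already been extracted from an alignment; it simply counts mismatches against the two reference motifs cited above (QPDLN and MPKLN). The function names are hypothetical, and this is not the procedure used by Oslancova and Janecek (2002).

```python
# Reference CSR V motifs quoted in the text (B. cereus oligo-1,6-glucosidase,
# B. stearothermophilus neopullulanase).
REFERENCE_CSR_V = {
    "oligo-1,6-glucosidase": "QPDLN",
    "neopullulanase": "MPKLN",
}


def mismatches(a: str, b: str) -> int:
    """Count position-wise differences between two equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_subfamily(csr_v: str) -> str:
    """Hypothetical helper: pick the reference motif with the fewest mismatches.

    Ties are resolved by dictionary order, i.e., arbitrarily.
    """
    return min(REFERENCE_CSR_V, key=lambda name: mismatches(csr_v, REFERENCE_CSR_V[name]))


# CSR V variants reported for Drosophila maltases
for motif in ("QPDLN", "QADLN", "QPDFN"):
    print(motif, assign_subfamily(motif))
```

All three Drosophila variants differ from QPDLN at a single position but from MPKLN at three, so each is assigned to the oligo-1,6-glucosidase subfamily.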

Although a large number of putative α-glucosidases have been predicted by sequence comparison, only a few have been biochemically characterized. The three α-glucosidase isoenzymes I, II, and III of the European honeybee Apis mellifera, encoded by the genes hbg1, hbg2, and hbg3, respectively, are expressed in different organs and differ in their substrate specificities and pH optima (Huber and Thompson 1973; Takewaki et al. 1980, 1993; Kimura et al. 1990; Nishimoto et al. 2001; Kubota et al. 2004). A putative α-glucosidase (malI) resembling yeast maltase was identified in the salivary glands of adult Aedes aegypti (James et al. 1989), and two probable maltase genes (agm1 and agm2) were found to be expressed in the midgut of Anopheles gambiae (Zheng et al. 1995).

In Drosophila melanogaster, a small gene cluster was identified and sequenced by Snyder and Davidson (1983). It spans 8 kb of DNA in chromosomal region 44D of chromosome 2R (the right arm of chromosome 2), 11 kb away from the larval cuticle gene cluster. It consists of three coordinately expressed genes designated lvpH, lvpD, and lvpL (larval visceral proteins H, D, and L). Owing to their high mutual amino acid sequence similarity (48–53%), the cluster was proposed to have arisen by gene duplication; because the intron/exon composition is not conserved between the genes, the duplication was probably ancient (Snyder and Davidson 1983). The function of these three genes was initially unknown (and unrelated to the nearby genes of the larval cuticle cluster), but they were later assigned as maltases based on their amino acid sequence similarity to the maltase from the yeast Saccharomyces carlsbergensis (Henikoff and Wallace 1988). To find out whether the spatial organization of the maltase cluster remained conserved in distantly related Drosophila species, the presumably homologous maltase cluster of Drosophila virilis was investigated (Vieira et al. 1997). It was found to consist of two genes (mav1 and mav2), oriented in the same direction and located, together with the larval cuticle cluster, on chromosome 4. Interestingly, chromosome 4 of D. virilis is not homologous to chromosome 2R of D. melanogaster but to chromosome 2L. This observation would violate the principle that genes do not migrate between chromosomal elements during Drosophila evolution, even though they can change position within one element (Hartl and Lozovskaya 1994). D. virilis possesses two maltase genes instead of the three found in D. melanogaster, and both are transcribed in the same direction, whereas the D. melanogaster lvpH, lvpD, and lvpL genes are not all transcribed in the same direction. Moreover, none of the intron positions is shared between mav1 or mav2 and any of the lvpHDL genes.
These substantial differences between the two clusters led to the conclusion that they originated independently (Vieira et al. 1997). Similar studies of the evolutionary history of the related α-amylase genes (classified in the same family GH13) have also been carried out in various Drosophila species (Da Lage et al. 1996, 2000; Zhang et al. 2003a). One of the most striking findings was an enzymatically inactive, remotely paralogous gene named Amyrel (Da Lage et al. 1998; Maczkowiak and Da Lage 2006).

In this study, we used the 12 available completely sequenced Drosophila genomes to better understand the evolutionary history of α-glucosidases in closely related species. To this end, we investigated the spatial organization of two maltase clusters in the 12 Drosophila species, compared the amino acid sequences encoded by these genes, and analyzed the intron/exon composition of orthologous genes of these clusters. To place the origin and changes of the maltase clusters on an evolutionary timeline of the genus Drosophila, we also attempted to estimate the approximate times and order of particular gene duplications.

Materials and Methods

Identification of Orthologues

Amino acid sequences were collected using protein BLAST (Altschul et al. 1990) against the default non-redundant database. The characterized α-glucosidase I from Apis mellifera (Huber and Thompson 1973; Kimura et al. 1990) (GenPept accession number NP_001035326; GeneID 409889, hbg1) was used as the query, and the search was limited to the genus Drosophila.

From the GenBank (Benson et al. 2009) results obtained, we selected sequences possessing the features characteristic of enzymes from the oligo-1,6-glucosidase subfamily of family GH13 (all seven CSRs and the conserved catalytic and substrate-binding residues). To ensure that a complete set of maltase genes was acquired for each species, only amino acid sequences from the 12 completely sequenced Drosophila genomes were used for further analysis. Genes with an identical intron/exon arrangement in different Drosophila species were proposed to be orthologous (Table S1).
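The orthology criterion (an identical intron/exon arrangement across species) can be expressed as grouping genes by a structure key. In this hypothetical sketch, each gene's arrangement is encoded as a tuple of (alignment column, phase) pairs; all gene names and values are invented. Identical structure alone cannot separate paralogues that happen to share an arrangement (e.g., intronless genes), so such a grouping would need cross-checking, for example against position within the cluster.

```python
from collections import defaultdict

# Hypothetical input: intron positions per (species, gene) record, given as
# (alignment column, intron phase) pairs. Values are illustrative only.
gene_structures = {
    ("D. melanogaster", "gene_1"): ((120, 0), (455, 2)),
    ("D. virilis", "gene_x"): ((120, 0), (455, 2)),
    ("D. melanogaster", "gene_2"): ((88, 1),),
    ("D. virilis", "gene_y"): ((88, 1),),
    ("D. virilis", "gene_z"): (),
}


def group_by_structure(structures: dict) -> dict:
    """Group records whose intron/exon arrangement is identical."""
    groups = defaultdict(list)
    for record, introns in structures.items():
        groups[tuple(sorted(introns))].append(record)
    return dict(groups)


for arrangement, members in group_by_structure(gene_structures).items():
    print(arrangement, members)
```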

For comparison, the two additional honeybee maltase genes, hbg2 and hbg3 (hbg1 having served as the query), as well as the two maltase genes agm1 and agm2 from A. gambiae, were included (Table 1).
Table 1

Representative maltases used in the present study

Gene     Organism                   Length (aa)   GenBank
hbg1     Apis mellifera             588           BAE86926
hbg2     Apis mellifera             580           BAE86927
hbg3     Apis mellifera             567           BAA11466
agm1     Anopheles gambiae          497           EAA00181
agm2     Anopheles gambiae          599           EAA00179
mal_A1   Drosophila melanogaster    577           AAF59089
mal_A1   Drosophila virilis         577           EDW6084
mal_A2   Drosophila melanogaster    567           AAF59088
mal_A2   Drosophila virilis         566           EDW60851
mal_A3   Drosophila melanogaster    574           AAM50308
mal_A3   Drosophila virilis         575           EDW60852
mal_A4   Drosophila melanogaster    579           ABY20547
mal_A4   Drosophila virilis         578           EDW60853
mal_A5   Drosophila melanogaster    630           AAF59085
mal_A5   Drosophila virilis         636           EDW60854
mal_A6   Drosophila melanogaster    601           AAS64893
mal_A6   Drosophila virilis         602           EDW60855
mal_A7   Drosophila melanogaster    599           AAF59084
mal_A7   Drosophila virilis         588           EDW60856
mal_A8   Drosophila melanogaster    588           AAF59083
mal_A8   Drosophila virilis         591           EDW60857
mal_B1   Drosophila melanogaster    584           AAF53127
mal_B1   Drosophila virilis         632           EDW64448
mal_B2   Drosophila melanogaster    583           AAF53128
mal_B2   Drosophila virilis         594           EDW64447

Length is the length of the amino acid sequence (in residues); GenBank is the accession number in the database. For details on all 113 studied maltases, see Supplementary Table S1.

Sequence Alignments

All alignments were made with amino acid sequences using the program ClustalX (Jeanmougin et al. 1998) and manually adjusted with regard to the CSRs known from the literature (Janecek et al. 1997; MacGregor et al. 2001; Janecek 2002; Oslancova and Janecek 2002).
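Manual adjustment around known CSRs can be supported by a simple consistency check: locate a conserved motif in each ungapped sequence and verify that it starts in the same alignment column in every sequence. The sketch assumes a gapped alignment held in a Python dict; the toy sequences are modeled on CSR IV (hbg1: 352_VPNWVMGNHD) but are otherwise invented, and the function names are hypothetical.

```python
import re


def column_of(gapped: str, residue_index: int) -> int:
    """Alignment column (0-based) holding the residue_index-th residue (0-based)."""
    seen = -1
    for col, ch in enumerate(gapped):
        if ch != "-":
            seen += 1
            if seen == residue_index:
                return col
    raise IndexError("residue index beyond sequence length")


def motif_columns(alignment: dict, pattern: str) -> dict:
    """Map each sequence name to the column where `pattern` starts (None if absent)."""
    result = {}
    for name, gapped in alignment.items():
        match = re.search(pattern, gapped.replace("-", ""))
        result[name] = column_of(gapped, match.start()) if match else None
    return result


# Toy alignment: the GNHD stretch should start in one shared column
toy = {
    "seq1": "VPNWVMGNHD",
    "seq2": "VPNW-MGNHD",
    "seq3": "VP-WVMGNHD",
}
cols = motif_columns(toy, "GNHD")
print(cols)
print("consistent" if len(set(cols.values())) == 1 else "needs manual adjustment")
```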

Evolutionary Analyses

Neighbor-joining (NJ) (Saitou and Nei 1987) and maximum parsimony (MP) (Eck and Dayhoff 1966) phylogenetic trees were calculated with the MEGA 4.1 package (Tamura et al. 2007). For the NJ method, the Jones–Taylor–Thornton model (Jones et al. 1992) of amino acid substitution was used, and unequal rates among sites were assumed; the gamma shape parameter (α) was estimated with the program PhyML 3.0 (Guindon and Gascuel 2003). For the MP method, the close-neighbor-interchange algorithm with default parameters was used for the tree search. The reliability of tree topologies was evaluated by the bootstrap test (Felsenstein 1985) with 1,000 replicates. The maximum likelihood (ML) (Felsenstein 1981) tree was calculated with PhyML 3.0 (Guindon and Gascuel 2003) using its default LG model (Le and Gascuel 2008) of sequence evolution, with unequal rates among sites modeled by eight substitution rate categories. Both the gamma shape parameter (α) and the proportion of invariable sites were estimated by PhyML 3.0 itself. Because this method is computationally demanding, the bootstrap test was limited to 100 replicates.
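The bootstrap test (Felsenstein 1985) builds each pseudoreplicate by drawing alignment columns with replacement, using the same columns for every sequence, and then repeats the tree reconstruction on each replicate. The sketch shows only the resampling step on a hypothetical toy alignment; building a tree for each replicate (1,000 times for NJ and MP, 100 times for ML in this study) is left out, and the function names are our own.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """One pseudoreplicate: sample columns with replacement, same columns for all rows."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict, n: int, seed: int = 1) -> list:
    """n pseudoreplicate alignments from a seeded generator (reproducible)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical three-sequence toy alignment
toy = {"seq1": "MKVLA", "seq2": "MKILA", "seq3": "MRVLG"}
replicates = bootstrap_replicates(toy, n=100)
print(replicates[0])
```

Sampling column indices once per replicate, rather than per sequence, is what keeps homologous positions aligned in every pseudoreplicate.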

Intron/Exon Arrangement

The intron/exon composition of each gene was determined by comparing the amino acid sequence of the respective protein obtained from GenBank (Benson et al. 2009) with its nucleotide sequence obtained from FlyBase (Tweedie et al. 2009), using the program GeneWise (Birney et al. 2004).
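Comparing intron positions between paralogues requires placing them on a common coordinate system. A minimal sketch, assuming that GeneWise-derived intron positions have already been converted to (residue index, phase) pairs: each residue index is mapped to its column in the protein alignment, and two introns are counted as shared only if both column and phase agree. The sequences, positions, and function names are hypothetical.

```python
def column_of(gapped: str, residue_index: int) -> int:
    """Alignment column (0-based) holding the residue_index-th residue (0-based)."""
    seen = -1
    for col, ch in enumerate(gapped):
        if ch != "-":
            seen += 1
            if seen == residue_index:
                return col
    raise IndexError("residue index beyond sequence length")


def intron_sites(gapped: str, introns) -> set:
    """Convert (residue_index, phase) pairs to (alignment column, phase) pairs."""
    return {(column_of(gapped, idx), phase) for idx, phase in introns}


# Hypothetical aligned proteins and intron positions
aligned = {"gene_a": "MKV-LAGHW", "gene_b": "MKVQLAG-W"}
introns = {"gene_a": [(2, 0), (5, 1)], "gene_b": [(2, 0), (6, 2)]}

sites = {gene: intron_sites(aligned[gene], introns[gene]) for gene in aligned}
shared = sites["gene_a"] & sites["gene_b"]
print(sorted(shared))  # [(2, 0)]
```

Here the second introns of both genes fall at the same glycine (column 6) but in different phases, so they are not counted as shared.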

Molecular Clock

To test the molecular clock hypothesis, a likelihood-ratio test (LRT) was used to compare the log likelihood of the phylogeny computed under the assumption of a constant rate of evolution among branches (i.e., with a molecular clock) with the log likelihood computed without this assumption. Divergence times between particular genes were estimated from the ML tree. The tree was calibrated using 39 million years ago (MYA) as the assumed time of divergence between the subgenera Sophophora and Drosophila (Russo et al. 1995). As a second calibration point, we used 3 MYA as the divergence time between D. melanogaster and the D. simulans complex (Throckmorton 1975).
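The LRT can be made concrete. The statistic is twice the log-likelihood difference and, for a tree of n sequences, is usually compared with a χ² distribution with n − 2 degrees of freedom (the clock-free tree has 2n − 3 branch lengths, the clock tree n − 1 node ages). The sketch below is a pure-Python illustration with hypothetical log-likelihood values; the function names are our own, and the series-based χ² tail is adequate only for moderate values of the statistic.

```python
import math


def lower_gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, 100000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution."""
    return max(0.0, 1.0 - lower_gamma_p(df / 2.0, stat / 2.0))


def clock_lrt(lnl_clock: float, lnl_free: float, n_seqs: int):
    """Likelihood-ratio test of a molecular clock: (statistic, df, p-value)."""
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_seqs - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical log likelihoods for a 10-sequence tree
stat, df, p = clock_lrt(lnl_clock=-1050.0, lnl_free=-1000.0, n_seqs=10)
print(stat, df, p)  # a small p-value rejects a strict clock
```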

To account for different evolutionary rates among branches, the local rate minimum deformation method implemented in the TreeFinder program (Jobb et al. 2004) was used. The same program was used to calculate the standard errors of the divergence time estimates by bootstrapping under the fixed topology of the input tree. The evolutionary history of the maltase clusters was inferred using the DILTAG algorithm (Lajoie et al. 2010).
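The role of the calibration points can be illustrated under a far simpler model than the one actually used. The sketch assumes a strict clock on an ultrametric tree, so that each node's depth (its distance to the tips, in substitutions per site) equals rate × age. It fits a single rate to the two calibration points (39 and 3 MYA) by least squares through the origin and converts another node's depth to an age. The node depths are hypothetical, and this is not the local rate minimum deformation method of TreeFinder, which allows rates to vary among branches.

```python
def fit_rate(calibrations) -> float:
    """Least-squares clock rate through the origin (depth = rate * age).

    calibrations: (node depth in substitutions/site, age in MYA) pairs.
    """
    num = sum(depth * age for depth, age in calibrations)
    den = sum(age * age for _, age in calibrations)
    return num / den


def node_age(depth: float, rate: float) -> float:
    """Age (MYA) of a node under a strict clock."""
    return depth / rate


# Hypothetical depths of the two calibrated nodes on an ultrametric tree
calibrations = [
    (0.39, 39.0),  # Sophophora vs. Drosophila subgenera
    (0.03, 3.0),   # D. melanogaster vs. D. simulans complex
]
rate = fit_rate(calibrations)
print(round(rate, 4), round(node_age(1.2, rate), 1))
```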

Results and Discussion

Description of the Maltase Clusters

In a previous study (Snyder and Davidson 1983), a maltase cluster composed of three genes was identified in Drosophila melanogaster. These genes were sequenced and designated lvpH, lvpD, and lvpL. They are located next to each other on the right arm of chromosome 2, near the larval cuticle cluster. Between the larval cuticle and cytochrome gene clusters on one side and the gene mangetout (mtt), which encodes the orphan G protein-coupled receptor DmxR, on the other side (Mitri et al. 2004, 2009), there is a region spanning 23 kb (positions 4337166 to 4361466 of chromosome 2R, cytological location 44D1). The D. melanogaster sequencing project (Adams et al. 2000) reported eight genes in this location. The first three are the maltases lvpH, lvpD, and lvpL, whereas the other five were uncharacterized and annotated as similar to insect α-glucosidases. Based on their amino acid sequence similarity and the presence of typical GH13 CSRs and catalytic residues, we propose that all eight genes encode α-glucosidases from the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002), i.e., CAZy subfamily GH13_17 (Stam et al. 2006). Because they presumably share the same enzyme specificity, show high sequence similarity (45–71% amino acid sequence identity), and are located close to each other, these genes probably form a cluster that arose by multiple successive duplications. We therefore designate them as maltases of cluster A (mal_A1, mal_A2, mal_A3, mal_A4, mal_A5, mal_A6, mal_A7, and mal_A8). Mal_A1, mal_A3, and mal_A7 are transcribed in one direction, whereas mal_A2, mal_A4, mal_A5, mal_A6, and mal_A8 are transcribed in the opposite direction (Fig. 1). In addition to this large cluster, we identified another one on the left arm of chromosome 2 (cytological location 33A3–33A4). It consists of two genes with high sequence similarity (65% amino acid identity), transcribed in the same direction (Fig. 1).
Both show sequence similarity to the genes of maltase cluster A (i.e., they should also encode GH13_17 α-glucosidases), but since they are located on a different chromosome arm, we designate them as genes of maltase cluster B (mal_B1 and mal_B2). According to the DroSpeGe database (Gilbert 2007), in all 12 Drosophila species cluster A is located in a syntenic region on Muller element C and cluster B on Muller element B (D. ananassae A: scaffold 13266, B: scaffold 12943; D. erecta A: s.4929, B: s.4929; D. grimshawi A: s.15245, B: s.15126; D. melanogaster A: 2R, B: 2L; D. mojavensis A: s.6496, B: s.6500; D. persimilis A: s.2, B: s.8; D. pseudoobscura A: s.3, B: 4_group3; D. sechellia A: s.1, B: s.16; D. simulans A: 2R, B: 2L; D. virilis A: s.12875, B: s.12963; D. willistoni A: s.180700, B: s.180708; and D. yakuba A: 2L, B: 2L). Based on the genomic maps provided by FlyBase (Tweedie et al. 2009), all putative orthologues occupy the same position within the gene cluster in all 12 Drosophila species (e.g., mal_A2 is always located between mal_A1 and mal_A3).
Fig. 1

Scheme of the spatial organization of the genes of the maltase clusters in the genus Drosophila. Genes are shown as arrows whose orientation indicates the direction of transcription. The top two lines represent the general organization of the maltase clusters, as found in Drosophila ananassae, Drosophila erecta, Drosophila melanogaster, Drosophila mojavensis, Drosophila persimilis, Drosophila virilis, Drosophila willistoni, and Drosophila yakuba. Distances between maltase genes (in nucleotides) are shown above them. The remaining lines depict potential deviations from this organization in four Drosophila species. Broken arrows stand for fragmentary or otherwise distorted genes. Duplicated genes are highlighted in gray.

All 10 genes of both clusters possess the sequence features characteristic of GH13_17 α-glucosidases (Oslancova and Janecek 2002; Stam et al. 2006), and none of them contains large gaps, inserts, or termination codons. Although the 3′ and 5′ flanking regions were not investigated, we suggest that all studied maltases are functional, transcribed genes and that none of them is a pseudogene. To support this assumption, we searched GenPept (Benson et al. 2009) and FlyBase (Tweedie et al. 2009) for expressed sequence tags (ESTs) of particular maltase genes and found ESTs covering the whole sequence of all 10 genes from D. melanogaster. In addition, the D. melanogaster RNA-Seq developmental profile and cell line expression data available in FlyBase (Tweedie et al. 2009) show that most genes are transcribed from the embryonic stage (not earlier than 4–6 h of development) through the adult stage. Of note are mal_A5, which is most heavily transcribed in the first 6 h of embryonic development, and mal_A7, which, in contrast, is transcribed only in the adult stage. Based on the mRNA-Seq data of head tissue from D. pseudoobscura stored in FlyBase (Tweedie et al. 2009), four genes (mal_A1, mal_A5, mal_A6, and mal_B2) are transcribed in the head of this species, possibly in the salivary glands.
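The sequence-level criteria used here against pseudogenes (no premature termination codons, no large gaps) can be expressed as a simple screen. This is a hypothetical helper, not a tool used in the study; the 10-column gap threshold is an arbitrary assumption, and terminal gaps are ignored because they usually reflect differing sequence ends rather than internal deletions.

```python
import re


def pseudogene_flags(protein: str, aligned: str, max_gap: int = 10) -> list:
    """Return warning flags suggestive of a pseudogene or a sequencing artifact.

    protein: translated sequence, with '*' marking stop codons.
    aligned: the same protein as it appears in the multiple alignment.
    """
    flags = []
    if "*" in protein.rstrip("*"):  # a terminal stop is expected; internal ones are not
        flags.append("internal stop codon")
    internal = aligned.strip("-")  # ignore leading and trailing gaps
    longest = max((len(run) for run in re.findall(r"-+", internal)), default=0)
    if longest > max_gap:
        flags.append(f"gap of {longest} columns")
    return flags


print(pseudogene_flags("MKVLA*", "---MKVLA---"))          # []
print(pseudogene_flags("MK*VLA", "MKVLA"))                 # ['internal stop codon']
print(pseudogene_flags("MKVLA", "MK" + "-" * 12 + "VLA"))  # ['gap of 12 columns']
```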

We also investigated the homologous maltase clusters A and B in the eleven other Drosophila species to learn whether their spatial organization has remained conserved during the evolution of this taxonomic group. We found that in seven species (Drosophila ananassae, Drosophila erecta, Drosophila mojavensis, Drosophila persimilis, Drosophila virilis, Drosophila willistoni, and Drosophila yakuba), the organization of both clusters is exactly the same as in D. melanogaster.

In the remaining four species (Drosophila grimshawi, Drosophila pseudoobscura, Drosophila sechellia, and Drosophila simulans), there are minor differences in the organization of maltase cluster A (Fig. 1). D. grimshawi appears to lack mal_A8; in D. pseudoobscura, mal_A7 is split into two GenPept records (XP_002138232, GI:6898154, and XP_002138231, GI:6898153), and the same applies to mal_A6 from D. sechellia (split into XP_002032930, GI:6608185, and XP_002032931, GI:6608186). In D. simulans, additional fragments of mal_A3 (XP_002080592, GI:6733537) and mal_A2 (XP_002080593, GI:6733538) lie between the complete mal_A2 and mal_A3 genes. D. simulans also has two copies of the mal_A5 gene (XP_002080596, GI:6733541; and XP_002080597, GI:6733542) and contains only a fragment of mal_A7 (XP_002080600, GI:6733545) between mal_A6 and mal_A8. In the D. sechellia GenPept record XP_002041985 (GI:6617669), two mal_B1 sequences and a fragment of mal_B2 are mixed together (not shown). Using bioinformatics approaches alone, it is not possible to distinguish whether these differences are real or result from sequencing errors. Nevertheless, the fragments mentioned above are too similar in amino acid sequence to their orthologues in other species to have been evolving neutrally, as decaying pseudogenes would, so we consider them more likely to be artifacts of the sequencing process. Moreover, the genomes of D. persimilis, D. sechellia, and D. simulans were sequenced to lower coverage (4×) than the other Drosophila genomes (more than 8×), so more artifacts are expected in these assemblies.

Sequence Comparison

We constructed a multiple alignment of 113 maltase sequences (Table S1). It comprises 108 amino acid sequences from the maltase clusters of the 12 Drosophila species, supplemented, for comparison with biochemically characterized insect maltases, by the three α-glucosidase isoenzymes (hbg1, hbg2, and hbg3) from A. mellifera (Huber and Thompson 1973; Takewaki et al. 1980, 1993; Kimura et al. 1990; Nishimoto et al. 2001; Kubota et al. 2004) and two maltases (agm1 and agm2) from A. gambiae (Zheng et al. 1995). The global multiple sequence alignment covers all three GH13 domains, A, B, and C (Fig. 2). The alignment clearly shows higher similarity between orthologues from different species than between any two paralogues from one species (data not shown).
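Identity values such as those quoted for the clusters (e.g., 45–71% within cluster A) are computed from aligned sequence pairs. One common convention, assumed in this sketch, counts identical residues over the columns where neither sequence has a gap; other conventions (e.g., dividing by the length of the shorter sequence) give somewhat different numbers, and the text does not state which convention was used. The function name and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over mutually ungapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


print(round(percent_identity("MKV-LAGHW", "MKVQLSG-W"), 1))  # 85.7
```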
Fig. 2

Amino acid sequence alignment of the 10 maltases from D. melanogaster (Drome) and D. virilis (Drovi), encoded by mal_A1–A8 and mal_B1–B2, with maltases from A. mellifera (Apime) and A. gambiae (Anoga), encoded by hbg1,2,3 and agm1,2, respectively (cf. Table 1). Positions of CSRs (I–VII) and introns (1–9 and N1–N3) are marked above the alignment blocks. Amino acid identities are shaded gray, and the catalytic triad is highlighted in black-and-white inversion.

Enzymes from the α-amylase family GH13 (MacGregor et al. 2001) possess four invariantly conserved residues (Janecek 2002): Asp206 (catalytic nucleophile), Glu230 (proton donor), Asp297 (transition-state stabilizer), and Arg204 (numbering as in the Taka-amylase A from Aspergillus oryzae). They form the basis of the four best-known GH13 CSRs (Nakajima et al. 1986; MacGregor et al. 2001) situated at strands β3 (CSR I), β4 (CSR II), β5 (CSR III) and β7 (CSR IV) of the catalytic (β/α)8-barrel domain (domain A). Three additional CSRs have also been proposed (Janecek 1992, 1994a, b, 1995, 2002): CSR V situated near the C-terminus of domain B, and CSRs VI and VII at strands β2 and β8, respectively.

Domain A (positions 1–122 and 203–489 in the hbg1-encoded maltase) begins with a region covering strand β1 and helix α1 that is well conserved in the mosquito, honeybee, and all Drosophila maltases (Fig. 2). Of note is the histidine at the C-terminal end of α1 (hbg1: 55_KLXH) in all three honeybee maltases, instead of the otherwise conserved tyrosine (KLXY). In the following CSR VI, covering strand β2 (hbg1: 63_GITAIWLSP), the initial glycine and the C-terminal part (WLSP) are identical in all sequences except mal_B1 from D. persimilis and D. pseudoobscura, where the leucine is replaced by methionine. The conserved tryptophan is a sequence feature typical of enzymes from the GH13 oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002).

The C-terminal part of CSR I, located at strand β3, is also invariantly conserved. Except for hbg1 (111_NLKVILDLVPNH), where a leucine (Leu118) follows the aspartate, a phenylalanine (DFVPNH) is conserved at this position in all other sequences (Fig. 2). Leucine (or, in general, a hydrophobic residue) at this position is a hallmark of the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002).

Domain B (positions 123–202) is a small domain that protrudes from the (β/α)8-barrel between strand β3 and helix α3. This typical feature of α-amylase family enzymes was also found to be conserved in a protein that has lost its enzymatic function and now serves as an auxiliary ectodomain of an amino acid transporter, rBAT (Janecek et al. 1997; Gabrisko and Janecek 2009). In all studied Drosophila maltases, domain B is of approximately the same length (differing by 1–2 residues), with the sequence conserved mostly within the β-strands. CSR V (loop 3 near the C-terminus of domain B) has proline in the second position in most sequences (hbg1: 198_QPDLN), but in mal_A7 (except in D. grimshawi and D. willistoni) and in mal_A1 from D. erecta, D. melanogaster, D. sechellia, D. simulans, and D. yakuba, the proline is replaced by alanine (QADLN). Of note is the replacement of leucine by phenylalanine (QPDLN to QPDFN) in the mal_A4 sequences (Fig. 2). The GH13 oligo-1,6-glucosidase subfamily, with the dominant sequence QPDLN in this stretch, was established on the basis of the specific sequence of this region (CSR V) (Oslancova and Janecek 2002).

The catalytic aspartate (the catalytic nucleophile, Asp230) of CSR II (hbg1: 223_GIDGFRIDAVPH), located at strand β4, is present in all studied proteins. The phenylalanine of the three-residue stretch GFR, invariantly conserved in all sequences, is found in most enzymes of the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002). Interestingly, the isoleucine preceding the catalytic aspartate is replaced by cysteine in the mal_A7 and mal_A8 sequences and by methionine in mal_A6 (except in representatives of the subgenus Drosophila and in D. willistoni) (Fig. 2). Instead of the proline (hbg1: Pro233), tyrosine is present in mal_A4, asparagine in hbg2 and mal_B2, and isoleucine in mal_B1 (except in D. grimshawi and D. melanogaster, where the proline is replaced by asparagine). Following CSR II, two potentially important residues (phenylalanine or tyrosine, and glutamate) are present in all sequences (Fig. 2).
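The residue numbers quoted in this section follow the convention "start_MOTIF", where the number is the hbg1 position of the first residue of the motif. A small, hypothetical helper recovers the number of any residue in such an annotation, which is a convenient check on values such as Asp230 in CSR II or Asp361 in CSR IV.

```python
def residue_numbers(annotated: str, residue: str) -> list:
    """Positions of `residue` in a 'start_MOTIF' annotation such as '223_GIDGFRIDAVPH'."""
    start_text, motif = annotated.replace(" ", "").split("_", 1)
    start = int(start_text)
    return [start + i for i, aa in enumerate(motif) if aa == residue]


print(residue_numbers("223_GIDGFRIDAVPH", "D"))  # [225, 230]: Asp230 is the nucleophile
print(residue_numbers("352_VPNWVMGNHD", "D"))    # [361]
print(residue_numbers("111_NLKVILDLVPNH", "L"))  # [112, 116, 118]
```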

Considerable sequence diversity is seen between strands β4 and β5 (Fig. 2). Compared with the remaining sequences, the least conserved segment is considerably longer in the mal_A1 sequences (a possible insertion of about 10 residues) and shorter in mal_B1 and mal_B2 (a presumed deletion of ~6 residues) and in mal_A2 (a deletion of 1 residue).

In CSR III (covering strand β5), the invariantly conserved catalytic glutamate (the proton donor; hbg1: 299_EAY) is followed, after one intervening residue, by an aromatic residue (tyrosine in all sequences except mal_A4 and agm2, where it is replaced by tryptophan). The intervening residue is alanine in mal_A1, mal_A2, mal_A3, mal_B2, agm1,2, and hbg1,2,3, glycine in mal_B1, and threonine in all remaining sequences (Fig. 2). About ten residues further, two aromatic residues (two tyrosines, or tyrosine and tryptophan) are conserved in all sequences except agm1, where the first aromatic residue is replaced by leucine. At strand β6, a stretch of four residues (hbg1: 318_PFNF) is conserved in all sequences except mal_A2, where the first phenylalanine is replaced by methionine (Fig. 2). Between this region and the invariantly conserved tryptophan that is followed, after four intervening residues, by a proline (hbg1: 343_WIKGTP), mal_A6 has an insertion of about 5 residues. This proline is almost universally conserved; the only exception is mal_A2, where it is replaced by tryptophan.

CSR IV (hbg1: 352_VPNWVMGNHD), located at strand β7, contains the asparagine-tryptophan pair (NW) and the stretch GNHD (with the catalytic aspartate Asp361, the transition-state stabilizer), both invariantly conserved in all studied sequences (NWXXXGNHD) (Fig. 2). In the following region of 7 amino acid residues, a stretch containing two arginines and a C-terminal glycine is present in all sequences (hbg1: 364_RVGTRYPG).

CSR VII (hbg1: 385_GVAVTYYGEE), located at strand β8, is bordered on its N-terminal side by a leucine (present in all sequences except mal_A7 from D. virilis, which has methionine) and a proline, which precede the conserved glycine of CSR VII. On its C-terminal side, the GEE stretch of CSR VII, found in all studied sequences, is followed by glycine and methionine (replaced by alanine in mal_A7 from D. persimilis and D. pseudoobscura) (Fig. 2). Between CSR VII and a well-conserved loop preceding helix α8, mal_A2 has a deletion of more than 10 residues.

In domain C (positions 490–588), the sequences of orthologues exhibit high mutual similarity, whereas the similarity between any two paralogues is low, with no residue invariantly conserved in all studied sequences (Fig. 2). One possible reason for the low amino acid similarity of domain C sequences is that the evolution of this domain has been constrained only by the need to preserve its tertiary structure (i.e., the overall fold). Alternatively, and more interestingly, domain C could be viewed as a module evolving independently of, e.g., the catalytic domain, as has been shown for various domains of GH13 α-amylase family enzymes (Janecek et al. 1997; Godany et al. 2010) and especially for the phylogenies of the so-called starch-binding domains (Janecek et al. 2003; Machovic and Janecek 2006a, b, 2008).

Evolutionary History of the Maltase Clusters

From the alignment of amino acid sequences of 113 maltases (Table S1), we calculated phylogenetic trees using NJ (Saitou and Nei 1987), MP (Eck and Dayhoff 1966), and ML (Felsenstein 1981) methods.
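For readers unfamiliar with the NJ method, its agglomerative core can be sketched in a few lines. This is a minimal illustration, assuming a precomputed symmetric distance matrix (in this study the distances came from the JTT model with gamma-distributed rates in MEGA, which is not reproduced here); branch-length estimation and tie handling are omitted, and the function name is hypothetical.

```python
def neighbor_joining(names, dist):
    """Minimal Saitou-Nei neighbor joining returning an unrooted topology.

    names: list of taxon labels; dist: symmetric dict-of-dicts of distances.
    Joined pairs become nested tuples; branch lengths are omitted.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy, grows with new nodes
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r(i): sum of distances to all other active nodes
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)
        d[new] = {}
        # Distance from the new node to each remaining node k:
        # d(new, k) = (d(a, k) + d(b, k) - d(a, b)) / 2
        for k in nodes:
            if k not in (a, b):
                d[new][k] = d[k][new] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    return tuple(nodes)  # final trichotomy of the unrooted tree


# Additive toy distances for the tree ((A,B),(C,D))
toy = {
    "A": {"B": 3, "C": 5, "D": 6},
    "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "D": 3},
    "D": {"A": 6, "B": 7, "C": 3},
}
print(neighbor_joining(["A", "B", "C", "D"], toy))  # ('C', 'D', ('A', 'B'))
```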

For the ML analysis, we used the default LG model (Le and Gascuel 2008) of sequence evolution. To account for rate heterogeneity among sites, we applied a gamma distribution (eight substitution rate categories) with the shape parameter α = 0.61 (SE = 0.03) estimated from the data. The same shape parameter was used for the gamma distribution in the NJ analysis.
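A discrete gamma model replaces the continuous gamma distribution of rates (mean 1, shape α) with K equally probable rate categories. The sketch below computes category rates for α = 0.61 and K = 8 using a median-based discretization (each category is represented by its median, and the medians are then rescaled to mean 1). This choice is an illustrative assumption: programs such as PhyML may represent each category by its mean instead, which gives slightly different values. The function names are our own, and only the standard library is used.

```python
import math


def lower_gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, 100000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def gamma_quantile(p: float, alpha: float) -> float:
    """p-quantile of a gamma distribution with shape alpha and mean 1 (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # CDF of Gamma(shape=alpha, rate=alpha) at mid equals P(alpha, alpha * mid)
        if lower_gamma_p(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list:
    """Median-based discrete gamma: k equiprobable categories rescaled to mean 1."""
    medians = [gamma_quantile((2 * i + 1) / (2 * k), alpha) for i in range(k)]
    scale = k / sum(medians)
    return [m * scale for m in medians]


rates = discrete_gamma_rates(alpha=0.61, k=8)
print([round(r, 3) for r in rates])
```

With α below 1, most sites fall into slow categories and a few into very fast ones, which is the heterogeneity pattern the estimated α = 0.61 implies.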

The phylogenetic trees obtained with all three methods (NJ, MP, and ML) clearly show that orthologous genes always group together on one branch (Fig. 3). This means that orthologous genes from different species are more similar to each other than any paralogous genes within one species. This observation also strongly suggests that all the duplications took place before the speciation events and that the duplicates thereafter evolved independently rather than in a concerted manner.
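The inference above rests on a simple criterion: orthologues from different species are closer to each other than paralogues within one species. The sketch below expresses that criterion as a distance-based check. It is a simplified proxy for the tree-based grouping examined here, and the species labels, gene names, and distances in the toy example are hypothetical.

```python
def orthologues_cluster(dist: dict) -> bool:
    """dist maps frozenset({(species, gene), (species, gene)}) -> pairwise distance.

    Returns True if every orthologue pair (same gene, different species) is closer
    than every paralogue pair (different genes, same species)."""
    ortho, para = [], []
    for pair, d in dist.items():
        (s1, g1), (s2, g2) = sorted(pair)
        if g1 == g2 and s1 != s2:
            ortho.append(d)
        elif g1 != g2 and s1 == s2:
            para.append(d)
        # Pairs differing in both species and gene are not informative here.
    return max(ortho) < min(para)


# Hypothetical toy distances (substitutions per site).
toy = {
    frozenset({("sp1", "mal_X"), ("sp2", "mal_X")}): 0.10,
    frozenset({("sp1", "mal_Y"), ("sp2", "mal_Y")}): 0.12,
    frozenset({("sp1", "mal_X"), ("sp1", "mal_Y")}): 0.80,
    frozenset({("sp2", "mal_X"), ("sp2", "mal_Y")}): 0.82,
    frozenset({("sp1", "mal_X"), ("sp2", "mal_Y")}): 0.85,
}
print(orthologues_cluster(toy))   # → True
```

If duplicates had instead evolved in concert (e.g., by gene conversion within each species), paralogues within a species would tend to be the closest pairs and the check would fail.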
Fig. 3

ML phylogram of the 10 genes of the two maltase clusters from 12 Drosophila species and of the maltases from A. mellifera (hbg1,2,3) and A. gambiae (agm1,2). Numbers represent the bootstrap support (in %) for a particular node. The scale bar shows evolutionary distance in amino acid substitutions per site.

All the methods (NJ, MP, and ML) provide the same general topology of the phylogenetic tree. The hbg genes from honeybee form one branch, separated from the Drosophila maltases, which are positioned on their own branch divided further into cluster A and cluster B (Fig. 3). To root the tree, we used the hbg1 gene as an outgroup. Concerning genes from cluster A, mal_A3, mal_A4, and mal_A5 form one group (with high bootstrap support of more than 97%), in which mal_A3 and mal_A4 are more similar to each other than either is to mal_A5 (bootstrap support of only 47 and 44% for the ML and MP trees, respectively, and 90% for the NJ tree). Genes mal_A6, mal_A7, and mal_A8 cluster together, with mal_A7 and mal_A8 more similar to each other than either is to mal_A6 (high bootstrap values of more than 97% for all three methods). Mal_A2 groups together with mal_A3, mal_A4, and mal_A5 (Fig. 3), but with lower bootstrap values (86% for ML, 70% for NJ, and only 50% for MP). Mal_A1 is positioned on its own branch, separated from the other Drosophila maltases, although with lower bootstrap support (86% for ML, 65% for NJ, and only 38% for MP). Interestingly, the NJ, MP, and ML methods do not yield comparable results for the position of the agm genes from mosquito. Both genes are situated near the root of the NJ tree, together with the hbg genes from A. mellifera (not shown), whereas in both the MP and ML (Fig. 3) trees, agm2 groups with the Drosophila maltases from cluster A and agm1 even shares a branch with the mal_A1 genes. None of these alternative branching patterns is supported by high bootstrap values (below 62% for all methods). The reason for this ambiguity could be that some duplications of genes from the maltase clusters took place before the split between the lineages leading to Brachycera and Nematocera, perhaps as early as in the common ancestor of Diptera. A study of maltase clusters in Nematocera could shed more light on this issue.
Nevertheless, it is very probable that the duplication of these genes did not begin before the Hymenoptera lineage split from the other Endopterygota (Holometabola).
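The bootstrap percentages quoted above are the fraction of pseudo-replicate trees that recover a given node. The sketch below shows the two generic ingredients of a nonparametric bootstrap: resampling alignment columns with replacement, and tallying clade frequencies across replicate trees. The tree-building step between them (NJ, MP, or ML on each pseudo-alignment) is omitted, and all function names and toy inputs are hypothetical.

```python
import random
from collections import Counter


def resample_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap pseudo-alignment: alignment columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def clade_support(replicate_clades: list[list[frozenset]]) -> dict[frozenset, float]:
    """Percentage of replicate trees containing each clade (clade = frozenset of taxa)."""
    counts = Counter(c for clades in replicate_clades for c in set(clades))
    n = len(replicate_clades)
    return {clade: 100.0 * k / n for clade, k in counts.items()}


rng = random.Random(1)
print(resample_columns({"a": "ACGT", "b": "AGGT"}, rng))
# Clades recovered from four hypothetical replicate trees:
print(clade_support([[frozenset("AB")], [frozenset("AB")], [frozenset("AC")], [frozenset("AB")]]))
```

A support of 47%, as for the mal_A3 + mal_A4 node in the ML tree, therefore means that fewer than half of the replicates recovered that grouping.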

Divergence times of the individual gene duplications were estimated from the branch lengths of the ML tree. To account for different rates of evolution among lineages, we used a local rate minimum deformation method, because the molecular clock hypothesis (a constant rate of evolution among lineages) could clearly be rejected (LRT: χ2 = 575.45; df = 111; P ≈ 0).
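The clock test compares a tree constrained to a molecular clock with an unconstrained tree; for s sequences the constrained model has s − 2 fewer free parameters, so with the 113 aligned maltases df = 111, matching the value reported. The sketch below recomputes the upper-tail chi-square probability for the reported statistic using only the standard library (series and continued-fraction evaluations of the incomplete gamma function); the function names are hypothetical.

```python
import math


def _lower_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by power series (for x < a + 1)."""
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0.0:
        return 1.0
    return 1.0 - _lower_series(a, x) if x < a + 1.0 else _upper_cf(a, x)


def clock_lrt(stat: float, n_sequences: int) -> tuple[int, float]:
    """Molecular-clock LRT: df = n - 2 for n sequences."""
    df = n_sequences - 2
    return df, chi2_sf(stat, df)


print(clock_lrt(575.45, 113))
```

The resulting probability is dozens of orders of magnitude below any conventional threshold, which is why it is reported as effectively zero.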

As a calibration point, we used 39 MYA, the assumed divergence time between the subgenera Sophophora and Drosophila (Russo et al. 1995). We chose this more recent divergence estimate, based on phylogenetic studies of alcohol dehydrogenase (Jeffs et al. 1994; Russo et al. 1995), because the more ancient calibration point (~60 MYA), based on immunological distance (Beverley and Wilson 1984) and genomic mutation distance data (Tamura et al. 2004), led to an unrealistically old divergence time estimate between Hymenoptera and Diptera (more than 500 MYA). The younger calibration point resulted in a more reasonable estimate of the divergence between Hymenoptera and Diptera (350 MYA), which agrees with the present-day understanding of the timing of insect evolution (Gaunt and Miles 2002). We also used a second calibration point: 3 MYA as the divergence time between D. melanogaster and the D. simulans complex (Throckmorton 1975).
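The effect of the calibration choice can be checked with simple arithmetic. Assuming, as a simplification, that all node ages scale in proportion to a single calibration age (the local-rate method and the second 3 MYA calibration make the real relationship less exact), moving the Sophophora/Drosophila calibration from 39 to 60 MYA stretches the 350 MYA Hymenoptera–Diptera estimate beyond 500 MYA. The function name is hypothetical.

```python
def rescale_age(age_mya: float, old_calibration: float, new_calibration: float) -> float:
    """Node age after changing a single calibration, assuming proportional scaling."""
    return age_mya * new_calibration / old_calibration


print(round(rescale_age(350, 39, 60)))   # → 538
```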

Based on the tree topology and time estimates above, we propose the following scenario for the evolution of genes from the Drosophila maltase clusters (Fig. 4). A single maltase gene (the ancestor of all maltase genes) duplicated 352 (S.E. 70) MYA into two genes: one became the ancestor of all maltases of cluster A (on chromosome 2R) and the other gave rise to the genes of cluster B (on chromosome 2L). Then, approximately 207 (S.E. 33) MYA, mal_A1 separated from the other cluster A maltases. The common ancestor of the remaining cluster A genes duplicated 174 (S.E. 55) MYA into two genes, one becoming the ancestor of mal_A2, mal_A3, mal_A4, and mal_A5, and the other the ancestor of mal_A6, mal_A7, and mal_A8. The split of mal_A2 from the group of mal_A3, mal_A4, and mal_A5 occurred 155 (S.E. 43) MYA. Mal_A6 separated from the ancestor of mal_A7 and mal_A8 124 (S.E. 25) MYA. Then, 119 (S.E. 19) MYA, mal_A5 split from the ancestor of mal_A3 and mal_A4, quickly followed, at approximately the same time, by the separation of mal_A3 and mal_A4. The cluster B genes mal_B1 and mal_B2 separated from each other 84 (S.E. 16) MYA. The last duplication, which occurred 61 (S.E. 14) MYA, gave rise to mal_A7 and mal_A8 (Fig. 4). All these duplications took place in the common ancestor of all extant Drosophila species, well before the split between the subgenera Sophophora and Drosophila some 40–60 MYA (Beverley and Wilson 1984; Jeffs et al. 1994; Russo et al. 1995; Tamura et al. 2004). This contradicts the conclusion of Vieira et al. (1997), who suggested an independent origin of the maltase cluster in D. melanogaster and D. virilis and misassigned the orthologue/paralogue relationships between the maltase genes: they very probably compared two paralogues instead of orthologues from the two species (Vieira et al. 1997). This was also due to the lack of relevant sequence data at the time, when only three maltase genes from D. melanogaster were known.
New data from the sequencing of complete Drosophila genomes (Drosophila 12 Genomes Consortium 2007) allowed us to identify, in the present study, seven more maltase genes organized in two clusters (designated cluster A and cluster B; Figs. 1 and 5) on two chromosomes. This enabled a more detailed analysis of their spatial organization and intron/exon composition, as well as a comparison of the amino acid sequences of the encoded enzymes. In contrast to the ancient origin of the maltase clusters, two duplication events that occurred recently and independently in different Drosophila lineages were reported for genes of the α-amylase cluster (Zhang et al. 2003b). When comparing the dynamic evolution of both gene number and primary structure of α-amylases (Brown et al. 1990; Popadic and Anderson 1995; Da Lage et al. 2000; Inomata and Yamazaki 2000; Maczkowiak and Da Lage 2006) with the stasis of the maltase clusters in the genus Drosophila, it should be taken into account that α-amylase, although preferentially active on starch, generally exhibits broader substrate specificity (starch and related poly- and oligosaccharides) and product specificity (various maltooligosaccharides and even some transferase activity) than maltase.
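The duplication scenario described above can be encoded as a small rooted tree whose internal nodes carry the estimated ages, which makes it easy to look up when any two maltase genes diverged and to confirm that the time line is internally consistent. The ages are those stated in the text; the mal_A3/mal_A4 split is reported only as occurring at approximately the same time as the 119 MYA event, so 119 is used here as a stand-in (an assumption). The data structure and function names are hypothetical.

```python
# (age in MYA, left subtree, right subtree); leaves are gene names.
TREE = (352,
        (207, "mal_A1",
              (174, (155, "mal_A2", (119, "mal_A5", (119, "mal_A3", "mal_A4"))),
                    (124, "mal_A6", (61, "mal_A7", "mal_A8")))),
        (84, "mal_B1", "mal_B2"))


def leaves(node) -> set[str]:
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def duplication_age(node, a: str, b: str) -> float:
    """Age of the duplication separating genes a and b (both must lie under node)."""
    _, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return duplication_age(child, a, b)
    return node[0]


def is_chronological(node, parent_age: float = float("inf")) -> bool:
    """True if no duplication is older than the duplication that preceded it."""
    if isinstance(node, str):
        return True
    age, left, right = node
    return age <= parent_age and is_chronological(left, age) and is_chronological(right, age)


print(duplication_age(TREE, "mal_A2", "mal_A6"))   # → 174
print(is_chronological(TREE))                      # → True
```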
Fig. 4

Schematic representation of the order and time of duplication of genes from the maltase clusters of the genus Drosophila. Maltase genes (mal_A1–A8 and mal_B1–B2) are in boxes; genes from cluster B are distinguished from cluster A genes by black-and-white inversion. Arrows pointing from the boxes represent particular duplication events. Times of duplication are given in MYA, with standard errors in brackets.

Fig. 5

Intron/exon composition of maltase genes from A. mellifera (hbg1,2,3), A. gambiae (agm1,2), and all 12 Drosophila species, as identified in the present study (mal_A1–A8 and mal_B1–B2). The 12 Drosophila species are Drosophila ananassae, Drosophila erecta, Drosophila grimshawi, Drosophila melanogaster, Drosophila mojavensis, Drosophila persimilis, Drosophila pseudoobscura, Drosophila sechellia, Drosophila simulans, Drosophila virilis, Drosophila willistoni, and Drosophila yakuba. The amino acid sequence is represented by a straight line and triangles stand for particular introns. The phase of an intron is given by the number inside the triangle (0, 1, or 2). The three-domain arrangement characteristic of all GH13 maltases is shown as boxes at the top. The numbers above the boxes indicate the domain boundaries in the amino acid sequence of hbg1 from A. mellifera. The numbers below the boxes stand for the nine identified intron positions in maltase genes; N1–N3 depict three introns found solely in the genus Drosophila.

The Intron/Exon Composition

We also investigated the intron/exon composition of all ten maltase genes from the 12 Drosophila species, of three maltase isoenzymes (hbg1, hbg2, hbg3) from A. mellifera (Huber and Thompson 1973; Takewaki et al. 1980, 1993; Kimura et al. 1990; Nishimoto et al. 2001; Kubota et al. 2004), and of two maltases (agm1 and agm2) from A. gambiae (Zheng et al. 1995).

In genes hbg1 and hbg3, we identified nine possible intron positions (introns 1–9): six in domain A, one in domain B, and two in domain C (Fig. 5). Intron 1 (phase 1) is at helix α1, intron 2 (phase 0) is before strand β3, intron 3 (phase 0) is in the middle of domain B, intron 4 (phase 0) is at helix α3 before strand β4, intron 5 (phase 0) is before β5, intron 6 (phase 0) is at β7, intron 7 (phase 1) is in the loop preceding α8, intron 8 (phase 2) is between the first and second β-strands of domain C, and intron 9 (phase 2) is located after the third β-strand of this domain (Fig. 5).
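Intron phase follows directly from the number of coding nucleotides upstream of the intron: phase 0 introns lie between codons, phase 1 introns after the first codon position, and phase 2 introns after the second. The short sketch below applies this convention; the exon lengths in the example are hypothetical and the function names are illustrative only.

```python
def intron_phase(coding_nt_upstream: int) -> int:
    """Phase = coding nucleotides upstream of the intron, modulo 3
    (0: between codons; 1: after codon position 1; 2: after codon position 2)."""
    return coding_nt_upstream % 3


def intron_phases(exon_coding_lengths: list[int]) -> list[int]:
    """Phases of successive introns from the coding lengths of the exons (5' to 3')."""
    phases, upstream = [], 0
    for length in exon_coding_lengths[:-1]:   # no intron follows the last exon
        upstream += length
        phases.append(intron_phase(upstream))
    return phases


def residue_at_intron(coding_nt_upstream: int) -> int:
    """1-based residue whose codon is split by the intron (phase 1 or 2),
    or the residue immediately upstream of it (phase 0)."""
    return (coding_nt_upstream - 1) // 3 + 1


# Hypothetical exon coding lengths, for illustration only.
print(intron_phases([301, 200, 150]))   # → [1, 0]
print(residue_at_intron(301))            # → 101
```

Because phase depends only on position modulo 3, an intron that slid by a multiple of three nucleotides would keep its phase, whereas a shift by one or two nucleotides would change it.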

Gene hbg1 contains seven introns, lacking introns 3 and 8. Hbg3 also contains seven introns but lacks introns 2 and 5. Interestingly, hbg2 is intronless. The two studied maltases from A. gambiae have markedly fewer introns than those from A. mellifera: agm1 contains only intron 6, and agm2 possesses introns 3, 6, and 9 (Fig. 5).

The number of introns in the studied Drosophila maltase genes varies from 2 to 5. This is considerably more than reported for Drosophila α-amylases (Amy genes), which are either intronless (monoexonic) or contain only one intron (Da Lage et al. 1996). Although some introns in some Drosophila maltase genes are about 100 bp long, most are between 60 and 80 bp, in agreement with most introns of the Amy genes (Da Lage et al. 1996). These lengths are also similar to those of the introns of the A. gambiae maltases. This contrasts starkly with the large introns of the honeybee hbg genes (especially hbg3), whose lengths range from 634 to 3165 bp. One notable exception is D. mojavensis, which possesses long introns in its maltase genes as follows: intron 9 in mal_A5, 1360 bp; intron 6 in mal_A6, 1049 bp; intron 8 in mal_A3, 28 bp; and intron 8 in mal_A7, 802 bp.

In Drosophila maltase genes, most introns are concentrated toward the 3′ end and only a few are present toward the 5′ end (Fig. 5); this contrasts with the significant 5′-biased distribution of introns reported for protein-coding genes in eukaryotic genomes (Lin and Zhang 2005). Intron 8 is found in all Drosophila maltase genes except mal_A6. Genes mal_A1, mal_A4, and mal_B1 lack intron 9. The third most conserved intron is intron 6 (situated at strand β7), which is present in mal_A4, mal_A5, mal_A6, and mal_A8. Introns 1 and 7 are each limited to two Drosophila maltase genes: intron 1 is found in mal_A3 and mal_A5, whereas intron 7 is in mal_A1 (probably absent in mal_A1 from D. willistoni) and mal_B2. Only one gene, mal_A5, contains intron 4. None of the Drosophila maltase genes possesses introns 2, 3, or 5 found in the hbg genes from A. mellifera (Fig. 5).
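The intron distribution described above can be tabulated to cross-check the figures quoted elsewhere in this section: 2 to 5 introns per Drosophila maltase gene, and six of the nine honeybee intron positions present in Drosophila. The table is transcribed from the text at the gene level, so species-level exceptions (e.g., the probable absence of intron 7 from D. willistoni mal_A1) are ignored; variable names are illustrative.

```python
GENES = {f"mal_A{i}" for i in range(1, 9)} | {"mal_B1", "mal_B2"}

# Intron position -> Drosophila maltase genes carrying it, as listed in the text.
CARRIERS = {
    "1": {"mal_A3", "mal_A5"},
    "4": {"mal_A5"},
    "6": {"mal_A4", "mal_A5", "mal_A6", "mal_A8"},
    "7": {"mal_A1", "mal_B2"},
    "8": GENES - {"mal_A6"},
    "9": GENES - {"mal_A1", "mal_A4", "mal_B1"},
    "N1": {"mal_A4"},
    "N2": {"mal_A4"},
    "N3": {"mal_B1", "mal_B2"},
}

# Honeybee genes: hbg1 lacks introns 3 and 8; hbg3 lacks introns 2 and 5.
ALL_APIS = {str(i) for i in range(1, 10)}
HBG1 = ALL_APIS - {"3", "8"}
HBG3 = ALL_APIS - {"2", "5"}


def introns_per_gene(carriers: dict[str, set[str]]) -> dict[str, int]:
    counts = {g: 0 for g in GENES}
    for genes in carriers.values():
        for g in genes:
            counts[g] += 1
    return counts


counts = introns_per_gene(CARRIERS)
print(min(counts.values()), max(counts.values()))      # → 2 5
print(sorted((HBG1 | HBG3) & CARRIERS.keys()))         # → ['1', '4', '6', '7', '8', '9']
```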

Notably, three introns are found in Drosophila maltase genes but are absent from all Apis and Anopheles maltase genes studied in this work. We designated them N1–N3 (Fig. 5). Introns N1 and N2 are positioned in the proximity of the β5 strand in mal_A4. Because they are situated near the position of intron 5 (absent in mal_A5), it is possible that one of them is actually intron 5. However, no maltase gene from Drosophila has intron 5 and, most importantly, both introns are in phase 2, whereas intron 5 is in phase 0 (Fig. 5). It could be speculated that the eventual phase change of one of these introns is a result of intron sliding (Stoltzfus et al. 1997; Lehmann et al. 2010). Nevertheless, these two introns are found in only one Drosophila maltase gene (mal_A4), and intron N1 is present only in the studied species of the subgenus Sophophora (and absent in representatives of the subgenus Drosophila, i.e., D. grimshawi, D. mojavensis, and D. virilis). It is, therefore, very likely that intron N1, or both N1 and N2, are novel introns, and that intron N1 appeared after the split between the Drosophila and Sophophora lineages. The third probable novel intron (intron N3) is positioned between introns 6 and 7 in mal_B1 and mal_B2 (Fig. 5). It is in phase 0 and is situated at the β8 strand. Although intron 6 is absent in mal_B1 and mal_B2, and both intron 6 and intron N3 are in the same phase 0, we assume that they are two different introns. Both introns are in well-defined, conserved, and therefore easily alignable regions, relatively far away from each other (111 bp). For intron 6 to have become intron N3, it would have had to slide a considerable distance to its new position. Moreover, mal_B1 and mal_B2 are positioned close to each other on a chromosome different from the one carrying the other maltases, and they exhibit higher mutual similarity than either does to the maltases from cluster A.
It is therefore reasonable to propose that they are more related to each other than either is related to the genes from the cluster A and that their shared intron N3 arose after they split from the common ancestor with the cluster A.

Six of the nine intron positions present in the hbg genes from A. mellifera can be found in maltase genes of the genus Drosophila (Fig. 5). In our opinion, the most parsimonious scenario explaining the observed intron distribution assumes an intron-rich common ancestor of all maltase genes, resembling A. mellifera more than A. gambiae in its intron/exon composition, followed by the loss of introns in particular Drosophila maltase genes.
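The loss scenario can be made concrete under a stated premise. Assuming, as argued above, that the six positions shared with A. mellifera (introns 1, 4, 6, 7, 8, and 9) were already present in the ancestral Drosophila maltase gene, the sketch below counts the minimum number of independent losses required on the duplication topology described in the text for Fig. 4 (each loss removes the intron from one whole clade). It is an illustrative loss count, not the procedure used in this study; presence/absence is taken at the gene level, and all names are hypothetical.

```python
# Gene tree from the duplication scenario (nested pairs; leaves are gene names).
TREE = (
    ("mal_A1", (("mal_A2", ("mal_A5", ("mal_A3", "mal_A4"))),
                ("mal_A6", ("mal_A7", "mal_A8")))),
    ("mal_B1", "mal_B2"),
)
GENES = {f"mal_A{i}" for i in range(1, 9)} | {"mal_B1", "mal_B2"}

# Positions shared with A. mellifera -> Drosophila genes that still carry them.
SHARED = {
    "1": {"mal_A3", "mal_A5"},
    "4": {"mal_A5"},
    "6": {"mal_A4", "mal_A5", "mal_A6", "mal_A8"},
    "7": {"mal_A1", "mal_B2"},
    "8": GENES - {"mal_A6"},
    "9": GENES - {"mal_A1", "mal_A4", "mal_B1"},
}


def leaves(node) -> set[str]:
    return {node} if isinstance(node, str) else set().union(*(leaves(c) for c in node))


def losses(node, carriers: set[str]) -> int:
    """Minimum independent losses below node, given the intron is present at node."""
    if not leaves(node) & carriers:
        return 1          # one loss on the branch leading to this intron-free clade
    if isinstance(node, str):
        return 0          # this gene retains the intron
    return sum(losses(child, carriers) for child in node)


per_intron = {pos: losses(TREE, genes) for pos, genes in SHARED.items()}
print(per_intron)
print(sum(per_intron.values()))   # → 21
```

Under this premise, every observed absence is explained by losses alone, without any independent gain at a shared position; whether that premise holds rests on the comparison with the A. mellifera genes, not on this count.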

In summary, several conclusions can be drawn: (i) the maltase genes in the genus Drosophila are grouped in two old and evolutionarily quite stable clusters; (ii) the particular paralogues are considerably similar in sequence, probably as a result of purifying selection; and (iii) the remarkable diversity in their intron/exon composition emerged after duplication, but well before speciation, in a common ancestor of all extant Drosophila species.

Acknowledgment

This study was supported by grant No. 2/0114/08 from the Slovak Grant Agency VEGA.

Supplementary material

Supplementary material 1 (XLS 38 kb)

Copyright information

© Springer Science+Business Media, LLC 2010