Characterization of Maltase Clusters in the Genus Drosophila
- Cite this article as:
- Gabriško, M. & Janeček, Š. J Mol Evol (2011) 72: 104. doi:10.1007/s00239-010-9406-3
To reveal the evolutionary history of the maltase gene family in the genus Drosophila, we undertook a bioinformatics study of maltase genes from the available genomes of 12 Drosophila species. The molecular evolution of a closely related glycoside hydrolase, the α-amylase, has been studied extensively in Drosophila for a long time. The α-amylases were even used as a model of the evolution of multigene families. In contrast, maltase, i.e., the α-glucosidase, has received only scarce attention. In this study, we therefore investigated the spatial organization of the maltase genes in Drosophila genomes, compared the amino acid sequences of the encoded enzymes, and analyzed the intron/exon composition of orthologous genes. We found that the Drosophila maltases are more numerous than previously thought (ten instead of three genes) and are localized in two clusters on two chromosomes (2L and 2R). To elucidate the approximate time line of evolution of the clusters, we estimated the order and dates of duplication of all 10 genes. Both clusters are the result of an ancient series of successive duplication events, which took place from 352 to 61 million years ago, i.e., well before speciation into the extant Drosophila species. We also observed a remarkable diversity in the intron/exon composition of particular maltase genes of these clusters, probably a result of independent intron loss after duplication of an intron-rich ancestral gene. This diversity emerged well before speciation, in a common ancestor of all extant Drosophila species.
Keywords: Molecular evolution, Maltase, Alpha-amylase family, Gene cluster, Drosophila, Intron/exon composition
Abbreviations: CSR, Conserved sequence region; EST, Expressed sequence tags; MYA, Million years ago
Maltases are α-glucosidases (EC 3.2.1.20), which catalyze the hydrolysis of α-1,4-glucosidic linkages of maltose with release of α-D-glucose (Chiba 1997). According to the Carbohydrate-Active enZymes classification system (CAZy) (Cantarel et al. 2009), the α-glucosidases are found in four families of glycoside hydrolases (GHs): GH4, GH13, GH31, and GH97. Crystal structures have been resolved for α-glucosidases from all four families. They are the GH4 AglA from the thermophilic bacterium Thermotoga maritima (Lodge et al. 2003), the GH13 α-glucosidase from the thermophilic deep-sea bacterium Geobacillus sp. strain HTA-462 (Shirai et al. 2008), the GH31 MalA from the acidophilic and thermophilic archaeon Sulfolobus solfataricus (Ernst et al. 2006), and the GH97 SusB from the human intestinal bacterium Bacteroides thetaiotaomicron (Gloster et al. 2008; Kitamura et al. 2008). α-Glucosidases from families GH13, GH31, and GH97 share a (β/α)8-barrel fold of their catalytic domain, and a remote homology between the enzymes from the families GH13 and GH31 was revealed (Rigden 2002; Janecek et al. 2007). In contrast, α-glucosidases from the family GH4 show structural similarity to NAD-dependent dehydrogenases (2-hydroxyacid dehydrogenases), with the typical Rossmann fold of their NAD+-binding site (Lodge et al. 2003).
From a taxonomical point of view, the most widespread are the α-glucosidases from family GH31. They are found in all three domains: in Archaea, Bacteria, and Eukarya, from protists through fungi and plants to metazoans. Those from GH13 originate from bacteria, and in eukaryotes are limited to fungi and insects. The α-glucosidases from families GH4 and GH97 are solely of bacterial origin (Cantarel et al. 2009).
Insect α-glucosidases are found exclusively in family GH13. The GH13, also known as the α-amylase family (MacGregor et al. 2001), together with GH70 and GH77 forms the clan GH-H (Cantarel et al. 2009). Enzymes classified in the clan GH-H share the following features: (i) the catalytic domain (designated as domain A) is formed by the (β/α)8-barrel fold (i.e., TIM-barrel), from which a small distinct domain (called domain B) protrudes between the strand β3 and the helix α3; (ii) the catalytic machinery consists of an aspartate (catalytic nucleophile) located at the β4-strand, a glutamate (proton donor) at the β5-strand, and another aspartate (transition-state stabilizer) positioned at the β7-strand; (iii) the reaction mechanism is retaining; and (iv) the amino acid sequences contain from four up to seven conserved sequence regions (CSRs) located mainly at the β-strands of the catalytic domain A (Matsuura et al. 1984; Janecek et al. 1997; Kuriki and Imanaka 1999; MacGregor et al. 2001; Janecek 2002).
Family GH13 is one of the largest GH families, containing almost 30 different enzyme specificities. It was revealed that different specificities could be grouped together into subfamilies because of their sequence similarities (Janecek 1995, 2002). Based on a specific sequence of the fifth CSR, two GH13 subfamilies were established (Oslancova and Janecek 2002): the oligo-1,6-glucosidase and neopullulanase subfamilies, represented by oligo-1,6-glucosidase from Bacillus cereus with 167_QPDLN as CSR V and neopullulanase from Bacillus stearothermophilus with 295_MPKLN, respectively. According to this classification, α-glucosidases belong to the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002). At present, the family GH13 is divided into 36 CAZy-curator-established subfamilies (Stam et al. 2006), the insect α-glucosidases being grouped in the subfamily GH13_17 (Cantarel et al. 2009).
Although a large number of putative α-glucosidases have been predicted by sequence comparison, only a few of them have been biochemically characterized. Three α-glucosidase isoenzymes I, II, and III from the European honeybee Apis mellifera, encoded by the genes hbg1, hbg2, and hbg3, respectively, and expressed in different organs (Huber and Thompson 1973; Takewaki et al. 1980, 1993; Kimura et al. 1990; Nishimoto et al. 2001; Kubota et al. 2004), were shown to differ in their substrate specificities and pH optima. A putative α-glucosidase (malI) with resemblance to yeast maltase was identified in the salivary glands of adult Aedes aegypti (James et al. 1989), and two probable maltase genes (agm1 and agm2) were found to be expressed in the midgut of Anopheles gambiae (Zheng et al. 1995).
In Drosophila melanogaster, a small gene cluster was identified and sequenced (Snyder and Davidson 1983). It spans 8 kb of DNA and is localized in chromosomal region 44D of chromosome 2R (right arm of chromosome 2), 11 kb away from the larval cuticle gene cluster. It consists of three coordinately expressed genes designated lvpH, lvpD, and lvpL (larval visceral protein H, D, and L). Owing to their high mutual amino acid sequence similarity (48–53%), this cluster was proposed to have arisen by gene duplication, but because the intron/exon composition is not conserved between the genes, the duplication was probably an ancient event (Snyder and Davidson 1983). The function of these three genes was initially unknown (unrelated to the nearby genes from the larval cuticle cluster), but later they were identified as maltases based on their amino acid sequence similarity to the maltase from the yeast Saccharomyces carlsbergensis (Henikoff and Wallace 1988). To find out whether the spatial organization of the maltase cluster remained conserved in distantly related Drosophila species, the assumed homologous maltase cluster in Drosophila virilis was investigated (Vieira et al. 1997). It was found to consist of two genes (mav1 and mav2), oriented in the same direction and situated together with the larval cuticle cluster on chromosome 4. Interestingly, chromosome 4 is not homologous to chromosome 2R of D. melanogaster, but is the homologue of chromosome 2L. This observation would violate the principle that genes do not migrate between chromosomal elements during Drosophila evolution even if they can change position within one element (Hartl and Lozovskaya 1994). D. virilis possesses two maltase genes instead of the three found in D. melanogaster, and they are transcribed in the same direction, in contrast to the opposite directions of transcription of the D. melanogaster lvpH, lvpD, and lvpL genes. Moreover, none of the intron positions is shared between mav1 or mav2 and any of the lvpHDL genes. These significant differences between the two clusters led to the reasonable conclusion that they originated independently (Vieira et al. 1997).
Similar studies focused on the evolutionary history of the related α-amylase genes (classified in the same family GH13) were also carried out with various Drosophila species (Da Lage et al. 1996, 2000; Zhang et al. 2003a). One of the most striking findings was the observation of an enzymatically inactive remote paralogous gene named Amyrel (Da Lage et al. 1998; Maczkowiak and Da Lage 2006).
In this study, we have used 12 available completely sequenced Drosophila genomes in a bioinformatics study with the aim of understanding better the evolutionary history of the α-glucosidases between closely related species. To achieve this aim, we investigated spatial organization of two maltase clusters in 12 Drosophila species, compared amino acid sequences encoded by these genes and analyzed intron/exon compositions of orthologous genes of these clusters. In an effort to place the origin and changes of the maltase clusters on an evolutionary time line of the genus Drosophila, estimations of an approximate time and order of particular gene duplications were also attempted.
Materials and Methods
Identification of Orthologues
Amino acid sequences were collected using protein BLAST (Altschul et al. 1990) against the default non-redundant database. Characterized α-glucosidase I from Apis mellifera (Huber and Thompson 1973; Kimura et al. 1990) (GenPept accession number: NP_001035326; GeneID: 409889, hbg1) was used as a query and search was limited to the genus Drosophila.
From the GenBank (Benson et al. 2009) results obtained, we selected sequences that possessed the sequence features characteristic of enzymes from the oligo-1,6-glucosidase subfamily of family GH13 (all seven CSRs and the conserved catalytic and substrate-binding residues). To ensure that a complete set of maltase genes was acquired for each species, only amino acid sequences from the 12 completely sequenced Drosophila genomes were used for further analysis. Genes that possessed an identical intron/exon arrangement in different Drosophila species were proposed to be orthologous (Table S1).
Representative maltases used in the present study
Neighbor-joining (NJ) (Saitou and Nei 1987) and maximum parsimony (MP) (Eck and Dayhoff 1966) phylogenetic trees were calculated by MEGA 4.1 package (Tamura et al. 2007). For NJ method, Jones–Taylor–Thornton model (Jones et al. 1992) of amino acid change was used, and unequal rates among sites were assumed. Gamma-shaped parameter (α) was estimated by the program PhyML 3.0 (Guindon and Gascuel 2003). For MP method, close-neighbor-interchange algorithm with default parameters was used for tree search. Reliability of tree topologies was evaluated using bootstrap test (Felsenstein 1985) with 1,000 replications. Maximum likelihood (ML) (Felsenstein 1981) tree was calculated with PhyML 3.0 algorithm (Guindon and Gascuel 2003) using default LG model (Le and Gascuel 2008) of sequence evolution, with unequal rates among sites. Eight substitution rate categories were applied. The gamma-shaped parameter (α) as well as proportion of invariable sites was estimated by program PhyML 3.0 itself (Guindon and Gascuel 2003). Because this method is too computationally demanding, bootstrap test was limited to 100 replications.
To test the molecular clock hypothesis, a likelihood-ratio test (LRT) was used to compare the log likelihood of the phylogeny obtained under the assumption of a constant rate of evolution among branches (with molecular clock) with that obtained without this assumption. Times of divergence between particular genes were estimated from the ML tree. The tree was calibrated using 39 million years ago (MYA) as the assumed time of divergence between the subgenera Sophophora and Drosophila (Russo et al. 1995). As a second calibration point, we used 3 MYA as the divergence time for D. melanogaster and the D. simulans complex (Throckmorton 1975).
To account for different evolutionary rates between branches, the local rate minimum deformation method, as implemented in the TreeFinder program (Jobb et al. 2004), was used. The same program was used to calculate the standard errors of the divergence time estimates using the bootstrap procedure under fixed topology of the input tree. Evolutionary history of maltase clusters was inferred using DILTAG algorithm (Lajoie et al. 2010).
Results and Discussion
Description of the Maltase Clusters
All 10 above-mentioned genes from both clusters possess the sequence features characteristic of the GH13_17 α-glucosidases (Oslancova and Janecek 2002; Stam et al. 2006), and none of them contains large gaps, inserts, and/or termination codons in its sequence. Although the 3′ and 5′ flanking regions were not investigated, we suggest that all the studied maltases are functional, transcribed genes, and that none of them is a pseudogene. To support this assumption, we searched GenPept (Benson et al. 2009) and FlyBase (Tweedie et al. 2009) for expressed sequence tags (ESTs) of particular maltase genes. We found ESTs covering the whole sequence for all 10 genes from D. melanogaster. In addition, the D. melanogaster RNA-Seq developmental profile and cell line expression data available in FlyBase (Tweedie et al. 2009) show that most genes are transcribed from the embryonic stage (not earlier than after 4–6 h) to the adult stage. Of note are the gene mal_A5, which is most heavily transcribed in the first 6 h of the developing embryo, and the gene mal_A7, which in contrast is transcribed only in the adult stage. Based on mRNA-Seq data of head tissue from D. pseudoobscura stored in FlyBase (Tweedie et al. 2009), four genes (mal_A1, mal_A5, mal_A6, and mal_B2) are transcribed in this species in the head region, possibly in the salivary glands.
We also investigated homologous maltase clusters A and B in eleven additional species from genus Drosophila to learn whether spatial organization remains conserved during evolution of this taxonomic group. We found out that in seven species (Drosophila ananassae, Drosophila erecta, Drosophila mojavensis, Drosophila persimilis, Drosophila virilis, Drosophila willistoni, and Drosophila yakuba), the organization of both clusters is exactly the same as it is in D. melanogaster.
In the remaining four species (Drosophila grimshawi, Drosophila pseudoobscura, Drosophila sechellia, and Drosophila simulans), there are some minor differences in the organization of maltase cluster A (Fig. 1). While D. grimshawi seems to be lacking mal_A8, in D. pseudoobscura, mal_A7 is divided into two records in GenPept (XP_002138232, GI:6898154, and XP_002138231, GI:6898153), and the same applies to mal_A6 from D. sechellia (split into XP_002032930, GI:6608185, and XP_002032931, GI:6608186). In D. simulans, between the complete mal_A2 and mal_A3, there are additional fragments of mal_A3 (XP_002080592, GI:6733537) and mal_A2 (XP_002080593, GI:6733538). D. simulans also has two copies of the mal_A5 gene (XP_002080596, GI:6733541; and XP_002080597, GI:6733542) and contains only a fragment of mal_A7 (XP_002080600, GI:6733545) between mal_A6 and mal_A8. In the D. sechellia GenPept record XP_002041985 (GI:6617669), two sequences of mal_B1 and a fragment of mal_B2 are mixed together (not shown). Using bioinformatics approaches alone, it is not possible to distinguish whether these observed differences reflect reality or are the results of sequencing errors. Nevertheless, the above-mentioned fragments are too similar in their amino acid sequences to their orthologues in other species to be under neutral evolution, and so we consider them more likely to be artifacts of the sequencing process. Moreover, the genomes of D. persimilis, D. sechellia, and D. simulans were sequenced to lower coverage (4×) than the other Drosophila genomes (more than 8×), and therefore more artifacts were expected in these assemblies.
Enzymes from the α-amylase family GH13 (MacGregor et al. 2001) possess four invariantly conserved residues (Janecek 2002): Asp206 (catalytic nucleophile), Glu230 (proton donor), Asp297 (transition-state stabilizer), and Arg204 (numbering as in the Taka-amylase A from Aspergillus oryzae). They form the basis of the four best-known GH13 CSRs (Nakajima et al. 1986; MacGregor et al. 2001) situated at strands β3 (CSR I), β4 (CSR II), β5 (CSR III) and β7 (CSR IV) of the catalytic (β/α)8-barrel domain (domain A). Three additional CSRs have also been proposed (Janecek 1992, 1994a, b, 1995, 2002): CSR V situated near the C-terminus of domain B, and CSRs VI and VII at strands β2 and β8, respectively.
Domain A (in the hbg1-encoded maltase it covers positions 1–122 and 203–489) begins with a region covering strand β1 and helix α1, well conserved in mosquito, honeybee, and all Drosophila maltase genes (Fig. 2). Of note is the presence of histidine at the C-terminal end of α1 (in hbg1: 55_KLXH) in all three honeybee maltases, instead of the otherwise conserved tyrosine (KLXY). In the following CSR VI, covering the strand β2 (hbg1: 63_GITAIWLSP), the starting glycine and the C-terminal part (WLSP) are identical in all the sequences except for mal_B1 from D. persimilis and D. pseudoobscura, where leucine is changed to methionine. The conserved tryptophan is a sequence feature typical of enzymes from the GH13 oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002).
The C-terminal part of the CSR I situated at the strand β3 is also conserved invariantly. Except for hbg1 (111_NLKVILDLVPNH) where leucine (Leu118) succeeds the aspartate, phenylalanine (DFVPNH) is conserved in all other sequences (Fig. 2). Leucine (a hydrophobic residue in general) is a hallmark of the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002).
Domain B (positions 123–202) is a small domain that protrudes out from the (β/α)8-barrel between strand β3 and helix α3. This typical feature of the enzymes from the α-amylase family was also found to be conserved in a protein that lost its enzymatic function and now serves as an auxiliary ectodomain of an amino acid transporter, rBAT (Janecek et al. 1997; Gabrisko and Janecek 2009). In all studied Drosophila maltases, domain B is of approximately the same length (differences concern 1–2 amino acid residues), with the sequence conserved mostly within the β-strands. The CSR V (loop 3 near the C-terminus of domain B) has proline in the second position in most sequences (hbg1: 198_QPDLN), but in mal_A7 (except for D. grimshawi and D. willistoni) and in mal_A1 from D. erecta, D. melanogaster, D. sechellia, D. simulans, and D. yakuba, proline is substituted with alanine (QADLN). Of note is the substitution of leucine by phenylalanine (from QPDLN to QPDFN) in the mal_A4 sequences (Fig. 2). Based on a specific sequence of this conserved region (CSR V), the GH13 oligo-1,6-glucosidase subfamily was established (Oslancova and Janecek 2002), with QPDLN as the dominant sequence in this stretch.
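The CSR V variants described above can be tallied with a simple pattern match. The following Python sketch is only an illustration and not the procedure used in this study: the function name `classify_csr5`, the descriptive labels, and the toy sequence fragment in the usage note are hypothetical, and only the three motif variants named in the text (QPDLN, QADLN, QPDFN) are labeled.

```python
import re

# CSR V variants named in the text; the labels are illustrative only.
CSR5_VARIANTS = {
    "QPDLN": "dominant oligo-1,6-glucosidase subfamily motif",
    "QADLN": "Pro->Ala variant",
    "QPDFN": "Leu->Phe variant",
}

# One pattern covering the variants: Q, P or A, D, L or F, N.
CSR5_PATTERN = re.compile(r"Q[PA]D[LF]N")


def classify_csr5(sequence):
    """Return (1-based position, motif, label) of the first CSR V-like match.

    Returns None when the sequence contains no match. A match that is not
    one of the three variants from the text is labeled as unlisted.
    """
    match = CSR5_PATTERN.search(sequence.upper())
    if match is None:
        return None
    motif = match.group(0)
    label = CSR5_VARIANTS.get(motif, "combination not listed in the text")
    return match.start() + 1, motif, label
```

For a made-up fragment such as `"GSQADLNWK"`, `classify_csr5` reports the motif QADLN starting at position 3. A first-match scan like this can hit spurious occurrences in full-length proteins, so in practice the search would be restricted to the aligned CSR V column.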
The catalytic aspartate (i.e., the catalytic nucleophile Asp230) of the CSR II (hbg1: 223_GIDGFRIDAVPH), positioned at the strand β4, is present in all the studied proteins. The phenylalanine from the stretch of three residues invariantly conserved in all sequences (GFR) is found in most enzymes from the oligo-1,6-glucosidase subfamily (Oslancova and Janecek 2002). Interestingly, the isoleucine preceding the catalytic aspartate is substituted with cysteine in the mal_A7 and mal_A8 sequences and with methionine in mal_A6 (except for the representatives of the subgenus Drosophila and D. willistoni) (Fig. 2). Instead of proline, aromatic tyrosine is present in mal_A4, asparagine in hbg2 and mal_B2, and isoleucine in mal_B1 (except for D. grimshawi and D. melanogaster, where proline is changed to asparagine). Succeeding the CSR II, two potentially important residues (phenylalanine or tyrosine, and glutamate) are present in all the sequences (Fig. 2).
A large amount of diversity between sequences is seen between the strands β4 and β5 (Fig. 2). The least conserved segment is considerably longer in mal_A1 sequences (possible insert of about 10 residues) and shorter in mal_B1, mal_B2 (presumed ~6-residue deletion), and in mal_A2 (deletion of 1 residue) when compared with remaining sequences.
In the CSR III (covering the β5 strand), the invariantly conserved catalytic glutamate (the proton donor) (hbg1: 299_EAY) is followed, one position further, by an aromatic residue (tyrosine in all the sequences except for mal_A4 and agm2, where it is changed to tryptophan). Between the conserved glutamate and the aromatic residue, there is an alanine in mal_A1, mal_A2, mal_A3, mal_B2, agm1, agm2, hbg1, hbg2, and hbg3, and a glycine in mal_B1, whereas all the remaining sequences contain threonine in this position (Fig. 2). About ten residues further, there are two aromatic residues (two tyrosines, or tyrosine and tryptophan) conserved in all the sequences except for agm1 (where the first aromatic residue is changed to leucine). At the strand β6, there is a stretch of four residues (hbg1: 318_PFNF) invariantly conserved in all sequences except for mal_A2, where the first phenylalanine is replaced by methionine (Fig. 2). Between this region and an invariantly conserved tryptophan succeeded by proline 4 amino acid residues further (hbg1: 343_WIKGTP), there is an insert of about 5 residues in mal_A6. The proline from WIKGTP is almost universally conserved, except for mal_A2, where it is changed to tryptophan.
CSR IV (hbg1: 352_VPNWVMGNHD), situated at the strand β7, possesses asparagine and tryptophan (NW) as well as the stretch GNHD (with the catalytic aspartate, Asp361; the transition-state stabilizer) invariantly conserved in all studied sequences (NWXXXGNHD) (Fig. 2). In the following region of 7 amino acid residues, there is a stretch containing two arginines and a C-terminal glycine present in all the sequences (hbg1: 364_RVGTRYPG).
Borders of the CSR VII (hbg1: 385_GVAVTYYGEE) positioned at the strand β8 are formed by invariantly present leucine (substituted in mal_A7 from D. virilis with methionine) and proline succeeded by a conserved glycine from CSR VII from the N-terminal side, and glycine and methionine at the C-terminal side (replaced in mal_A7 from D. persimilis and D. pseudoobscura by alanine) preceded by conserved GEE stretch from CSR VII found in all studied sequences (Fig. 2). Between the CSR VII and a well-conserved loop preceding the helix α8, there is a deletion of more than 10 residues in mal_A2.
Concerning domain C (positions 490–588), the sequences of orthologues exhibit high mutual similarity, whereas the similarity between any two paralogues is low, with no residues conserved invariantly in all the studied sequences (Fig. 2). The possibility that the evolution of this domain has been constrained only by the need to preserve its tertiary structure (i.e., the rough fold) could be one of the reasons for the clearly low amino acid similarity of the domain C sequences. Or, more interestingly, domain C could be seen as a module evolving independently of, e.g., the catalytic domain, as was shown for various domains in the GH13 α-amylase family enzymes (Janecek et al. 1997; Godany et al. 2010), and especially for the phylogenies of the so-called starch-binding domains (Janecek et al. 2003; Machovic and Janecek 2006a, b, 2008).
Evolutionary History of the Maltase Clusters
For the ML, we used default LG (Le and Gascuel 2008) model of sequence evolution. To account for rate heterogeneity among sites, we applied a gamma distribution (eight substitution rate categories) with the α-shape parameter 0.61 (SE = 0.03) estimated from the data. For the NJ, the same α-shape parameter 0.61 (SE = 0.03) for gamma distribution was used.
All the methods (NJ, MP, and ML) provide the same general topology of the phylogenetic tree. The hbg genes from honeybee are on one branch, separated from the Drosophila maltases, which are positioned on their own branch that is divided further into cluster A and cluster B (Fig. 3). To root the tree, we took the hbg1 gene as an outgroup. Concerning the genes from cluster A, mal_A3, mal_A4, and mal_A5 form one group (with high bootstrap support of more than 97%), where mal_A3 and mal_A4 are more similar to each other than either is to mal_A5 (bootstrap support of only 47 and 44% for the ML and MP trees, respectively, and 90% for the NJ tree). Genes mal_A6, mal_A7, and mal_A8 cluster together, with mal_A7 and mal_A8 being more similar to each other than either is to mal_A6 (high bootstrap values of more than 97% for all three methods). Mal_A2 groups together with mal_A3, mal_A4, and mal_A5 (Fig. 3), but with lower bootstrap values (86% for ML, 70% for NJ, and only 50% for MP). Mal_A1 is positioned on its own branch, separated from the other Drosophila maltases, although with lower bootstrap support (86% for ML, 65% for NJ, and only 38% for MP). Interestingly, the NJ, MP, and ML methods do not yield comparable results for the position of the agm genes from mosquito. Both genes are situated near the root of the NJ tree, together with the hbg genes from A. mellifera (not shown), whereas in both the MP and ML (Fig. 3) trees, agm2 is grouped with the Drosophila maltases from cluster A, and agm1 even shares a branch with the mal_A1 genes. None of these alternative branching patterns is supported by high bootstrap values (below 62% for all the methods). The reason for this ambiguity could be that some duplications of genes from the maltase clusters took place before the split between the lineages leading to Brachycera and Nematocera, maybe as early as in the common ancestor of Diptera. A future study of the maltase cluster in Nematocera could throw more light on this issue. Nevertheless, it is very probable that duplication of these genes was not initiated before the Hymenoptera lineage split from the other Endopterygota (Holometabola).
Times of divergence of particular gene duplication were estimated from branch lengths of the ML tree. To account for a different rate of evolution among lineages, we used a local rate minimum deformation method, because the molecular clock hypothesis (considering a constant rate of evolution among lineages) could clearly be rejected (LRT: χ2 = 575.45; df = 111; P = 0).
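The reported test result can be reproduced approximately with a few lines of Python. This is a minimal sketch under stated assumptions, not the software used in the study: the function names are hypothetical, and the chi-square tail probability is computed with the Wilson-Hilferty normal approximation (chosen here only to avoid external libraries), so the value is approximate. It suggests that the reported P = 0 is most likely a rounded, extremely small probability.

```python
import math


def lrt_statistic(lnl_clock, lnl_free):
    """Likelihood-ratio statistic: 2 * (lnL without clock - lnL with clock)."""
    return 2.0 * (lnl_free - lnl_clock)


def chi2_upper_tail(x, df):
    """Approximate P(X >= x) for a chi-square variate with df degrees of freedom.

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    adequate for an order-of-magnitude check but is not exact.
    """
    if x <= 0:
        return 1.0
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Statistic and degrees of freedom as reported in the text.
p_value = chi2_upper_tail(575.45, 111)
print(f"approximate P = {p_value:.3g}")
```

The statistic exceeds its degrees of freedom more than fivefold, so any reasonable approximation places P far below conventional significance thresholds, which is consistent with rejecting the molecular clock.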
As a calibration point, we took 39 MYA, an assumed divergence time between the subgenera Sophophora and Drosophila (Russo et al. 1995). We chose this more recent divergence estimate, which is based on phylogenetic studies of alcohol dehydrogenase (Jeffs et al. 1994; Russo et al. 1995). The more ancient calibration point (~60 MYA), based on immunological distance (Beverley and Wilson 1984) and genomic mutation distance data (Tamura et al. 2004), led to an unrealistically old estimate of the divergence time between Hymenoptera and Diptera (more than 500 MYA). The use of the younger calibration point resulted in a more reasonable estimate of the divergence between Hymenoptera and Diptera (350 MYA), which is in agreement with the present-day understanding of the timing of insect evolution (Gaunt and Miles 2002). We also used a second calibration point: 3 MYA as the divergence time for D. melanogaster and the D. simulans complex (Throckmorton 1975).
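The effect of the calibration choice can be checked with rough arithmetic. The sketch below assumes that node ages scale in direct proportion to the age assigned to the Sophophora/Drosophila calibration node. That proportionality is a simplification introduced here: the actual analysis used a second calibration point and lineage-specific rates. The function name `rescale_age` is hypothetical, and 60 MYA stands in for the approximate older calibration.

```python
def rescale_age(age_mya, calibration_used, calibration_alternative):
    """Rescale a node age, assuming ages are proportional to the calibration age."""
    if calibration_used <= 0:
        raise ValueError("calibration_used must be positive")
    return age_mya * calibration_alternative / calibration_used


# Hymenoptera/Diptera estimate obtained with the 39 MYA calibration (from the text).
hymenoptera_diptera_mya = 350.0

# What the same node would date to under the older (~60 MYA) calibration.
older_estimate = rescale_age(hymenoptera_diptera_mya, 39.0, 60.0)
print(f"{older_estimate:.0f} MYA")
```

Under this simple assumption the older calibration pushes the Hymenoptera/Diptera split beyond 500 MYA, in line with the unrealistically old estimate described above.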
The Intron/Exon Composition
We have also investigated the intron/exon composition of all ten maltase genes from 12 Drosophila species, three maltase isoenzymes (hbg1, hbg2, hbg3) from A. mellifera (Huber and Thompson 1973; Takewaki et al. 1980, 1993; Kimura et al. 1990; Nishimoto et al. 2001; Kubota et al. 2004), and two maltases (agm1 and agm2) from A. gambiae (Zheng et al. 1995).
In the genes hbg1 and hbg3 we identified 9 possible intron positions (intron 1–intron 9). Six intron positions are in domain A, one in domain B and two in domain C (Fig. 5). Intron 1 (phase 1) is at the α1 helix, intron 2 (phase 0) is before the β3 strand, intron 3 (phase 0) is in the middle of the domain B, intron 4 (phase 0) is at the helix α3 before strand β4, intron 5 (phase 0) is before β5, intron 6 (phase 0) is at the β7, intron 7 (phase 1) is in the loop preceding α8, intron 8 (phase 2) is between the first and the second β-strand of domain C and intron 9 (phase 2) is located behind the third β-strand of this domain (Fig. 5).
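Intron phase describes where an intron interrupts the reading frame: phase 0 lies between two codons, phase 1 after the first nucleotide of a codon, and phase 2 after the second. A minimal Python sketch of this definition follows. The helper names and the example offsets are hypothetical, while the phase table simply restates the nine hbg intron positions listed above.

```python
def intron_phase(cds_offset):
    """Phase of an intron located after `cds_offset` coding nucleotides.

    0: between two codons; 1: after the first base of a codon;
    2: after the second base of a codon.
    """
    if cds_offset < 0:
        raise ValueError("cds_offset must be non-negative")
    return cds_offset % 3


def same_phase(offset_a, offset_b):
    """True when two intron positions interrupt the reading frame identically."""
    return intron_phase(offset_a) == intron_phase(offset_b)


# Phases of the nine hbg intron positions, as listed in the text.
HBG_INTRON_PHASES = {1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 2, 9: 2}

phase0_introns = sorted(i for i, p in HBG_INTRON_PHASES.items() if p == 0)
print(phase0_introns)
```

Two positions separated by a multiple of 3 coding nucleotides necessarily share a phase, for example `same_phase(600, 711)` for a made-up offset of 600, so a shared phase alone does not show that two introns are the same.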
Gene hbg1 contains seven introns; it lacks introns 3 and 8. Hbg3 also contains seven introns, but it lacks introns 2 and 5. Interestingly, hbg2 is intronless. The two studied maltases from A. gambiae have markedly fewer introns than those from A. mellifera. Agm1 contains only intron 6, and the gene agm2 possesses introns 3, 6, and 9 (Fig. 5).
The number of introns in the studied Drosophila maltase genes varies from 2 to 5. This is considerably more than reported for Drosophila α-amylases (Amy genes), which are either intronless (monoexonic) or contain only one intron (Da Lage et al. 1996). Although some introns in some Drosophila maltase genes are about 100 bp long, the length of most introns in maltase genes is between 60 and 80 bp, i.e., in agreement with that observed for most introns of the Amy genes (Da Lage et al. 1996). The lengths are also similar to those of the introns of A. gambiae maltases. This is in stark contrast to the large introns of the honeybee hbg genes (especially hbg3), whose lengths range from 634 to 3165 bp. One notable exception is D. mojavensis, which possesses long introns in its maltase genes as follows: intron 9 in mal_A5 (1360 bp), intron 6 in mal_A6 (1049 bp), intron 8 in mal_A3 (28 bp), and intron 8 in mal_A7 (802 bp).
In Drosophila maltases, most introns are concentrated at the 3′ end and only a few are present at the 5′ end of the gene (Fig. 5); this finding contrasts with the reported significant 5′-biased distribution of introns of protein-coding genes in eukaryotic genomes (Lin and Zhang 2005). Intron 8 can be found in all Drosophila maltase genes except for mal_A6. Genes mal_A1, mal_A4, and mal_B1 lack intron 9. The third most conserved intron is intron 6 (situated at the strand β7). It is present in mal_A4, mal_A5, mal_A6, and mal_A8. The presence of introns 1 and 7 is limited to only two Drosophila maltase genes each. Intron 1 can be found in mal_A3 and mal_A5, whereas intron 7 is in mal_A1 (probably absent in mal_A1 from D. willistoni) and mal_B2. Only one gene, mal_A5, contains intron 4. None of the Drosophila maltase genes possesses introns 2, 3, or 5 found in the hbg genes from A. mellifera (Fig. 5).
It is worth noting that three introns are found in Drosophila maltase genes but are absent from all Apis and Anopheles maltase genes studied in this work. We designated them N1–N3 (Fig. 5). The introns N1 and N2 are positioned in the proximity of the β5-strand in mal_A4. Because they are situated near the position of intron 5 (absent in mal_A5), it is possible that one of them actually is intron 5. It needs to be taken into consideration, however, that no maltase gene from Drosophila has intron 5 and, most importantly, that both introns are in phase 2, whereas intron 5 is in phase 0 (Fig. 5). It could be speculated that the eventual phase change of one of these introns is a result of intron sliding (Stoltzfus et al. 1997; Lehmann et al. 2010). Nevertheless, these two introns are found in only one Drosophila maltase gene (mal_A4), and intron N1 is present only in the studied species from the subgenus Sophophora (and absent in the representatives of the subgenus Drosophila, i.e., D. grimshawi, D. mojavensis, and D. virilis). It is therefore very likely that intron N1, or both N1 and N2, are novel introns, and that intron N1 arose after the split between the Drosophila and Sophophora lineages. The third probable novel intron (intron N3) is positioned between introns 6 and 7 in mal_B1 and mal_B2 (Fig. 5). It is in phase 0 and is situated at the β8 strand. Although intron 6 is absent in mal_B1 and mal_B2, and both intron 6 and intron N3 are in the same phase 0, we assume that they are two different introns. Both introns are in well-defined, conserved, and therefore easily alignable regions, relatively far away from each other (111 bp). To be the same intron, intron 6 would have had to move a considerable distance to reach its new position. Moreover, both mal_B1 and mal_B2 are positioned close to each other on a chromosome different from the one where the other maltases are situated, and they exhibit higher mutual similarity than either does to the maltases from cluster A. It is therefore reasonable to propose that they are more closely related to each other than either is to the genes from cluster A, and that their shared intron N3 arose after they split from the common ancestor with cluster A.
In the maltase genes from the genus Drosophila, six of the nine intron positions present in the hbg genes from A. mellifera could be found (Fig. 5). The most parsimonious scenario explaining the observed intron distribution, in our opinion, assumes an intron-rich common ancestor of all maltase genes that would resemble A. mellifera more than A. gambiae in its intron/exon composition, followed by subsequent loss of introns in particular Drosophila maltase genes.
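The intron distribution described in this section can be written down as sets and checked for internal consistency. The Python sketch below is one reading of the prose and not a table from the study: the gene-by-gene sets are assembled here from the statements above (positions 1 to 9 follow the hbg numbering, and N1 to N3 are the probable novel introns), and species-level exceptions, such as intron 7 in mal_A1 of D. willistoni or intron N1 outside Sophophora, are ignored.

```python
# Intron positions present in the honeybee hbg genes (hbg numbering).
HBG_POSITIONS = {str(i) for i in range(1, 10)}

# Per-gene intron sets, transcribed from the text of this section.
APIS = {
    "hbg1": HBG_POSITIONS - {"3", "8"},
    "hbg2": set(),
    "hbg3": HBG_POSITIONS - {"2", "5"},
}
DROSOPHILA = {
    "mal_A1": {"7", "8"},
    "mal_A2": {"8", "9"},
    "mal_A3": {"1", "8", "9"},
    "mal_A4": {"6", "8", "N1", "N2"},
    "mal_A5": {"1", "4", "6", "8", "9"},
    "mal_A6": {"6", "9"},
    "mal_A7": {"8", "9"},
    "mal_A8": {"6", "8", "9"},
    "mal_B1": {"8", "N3"},
    "mal_B2": {"7", "8", "9", "N3"},
}

all_drosophila = set().union(*DROSOPHILA.values())

# Positions shared with the honeybee genes, and Drosophila-only introns.
shared_with_hbg = sorted(all_drosophila & HBG_POSITIONS)
novel = sorted(all_drosophila - HBG_POSITIONS)

# Number of introns per Drosophila gene.
intron_counts = {gene: len(introns) for gene, introns in DROSOPHILA.items()}

print("shared with hbg:", shared_with_hbg)
print("novel:", novel)
print("introns per gene:", min(intron_counts.values()), "to", max(intron_counts.values()))
```

Read this way, the sets agree with the summary statements in the text: six of the nine hbg positions occur in Drosophila, each gene carries between 2 and 5 introns, and the positions missing from Drosophila are 2, 3, and 5.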
In summary, several conclusions can be drawn: (i) the maltase genes in the genus Drosophila are grouped in two old and evolutionarily quite stable clusters; (ii) the particular paralogues are considerably similar in sequence, probably as a result of purifying selection; and (iii) the remarkable diversity in their intron/exon composition emerged after duplication, but well before speciation, in a common ancestor of all extant Drosophila species.
This study was supported by the grant No. 2/0114/08 from the Slovak Grant Agency VEGA.