Background

All genomes encode conserved genes. The arrangement of these genes on chromosomal elements is determined by a balance between stochastic rearrangements and functional constraints. The level of conservation of gene order (synteny) and linkage between two genomes will depend on the relative contributions of inter- and intrachromosomal rearrangements. Whereas shared ancestry and functional constraints will increase conservation of linkage and synteny between taxa, rearrangement events will tend to randomize gene order over time. In the Metazoa, several gene clusters have been identified that remain linked because of functional constraints. These include the histone genes [1], the Hox gene clusters [2], the immunoglobulin cluster [3], and the major histocompatibility complex (MHC) [4], but most genes are believed to be free to move within the genome. The tempo of gene rearrangement varies between taxa [5,6]. Vertebrate chromosomes are mosaic structures containing large conserved segments that can reside in different linkage groups in different species. There is a surprising conservation of synteny between distantly related species (approximately 450 million years (Myr) divergence) [7]. However, some lineages, such as rodents, show more extensive rearrangement than others, such as teleosts.

In protostomes, comparative studies of the genomes of closely related dipterans (Drosophila sp. and Aedes aegypti [5,8]) and nematodes (Caenorhabditis elegans and C. briggsae [6,9]) revealed a high rate of rearrangement. Chromosome rearrangements between closely related Drosophila species are mainly large pericentric inversions that may be facilitated by flanking transposon sequences [10,11]. C. elegans and C. briggsae are closely related, with estimates of 25-120 Myr divergence based on sequence comparisons [6,12]. Two groups have attempted to assess genome rearrangement rates and modes in comparisons between these two species. Kent and Zahler [9] compared 8.1 megabases (Mb) of fragmentary C. briggsae sequence derived from sequenced cosmid clones to C. elegans and derived a mean syntenic fragment length of 8.6 kilobases (kb), or approximately 1.8 genes (there is one gene per 5 kb in C. elegans) [13]. In contrast, Coghlan and Wolfe [6], comparing 12.9 Mb of C. briggsae cosmid-derived sequence, found a mean syntenic fragment length of 53 kb. The difference appears to be purely methodological, as Kent and Zahler analyzed a subset of the data of Coghlan and Wolfe, and probably derives from a more relaxed definition of matching genes and use of cosmid fingerprinting physical map information by the latter study [6]. Estimates of the rates of within-chromosome and between-chromosome rearrangements showed that both were very frequent (approximately fourfold greater than that observed in D. melanogaster). Again, repeat sequences were associated with rearrangement boundaries [6]. It remains to be established whether this high rate of rearrangement is peculiar to the Caenorhabditis lineage, or is a general feature of nematode genomes.

To address this question we have begun analysis of a third nematode genome, that of the human filarial parasite Brugia malayi, which is estimated to have last shared a common ancestor with C. elegans 300-500 Myr ago [14]. B. malayi has a genome size of 100 Mb [15] and a gene complement estimated to be similar to C. elegans [16], and is the subject of a mature, expressed sequence tag (EST)-based genome project [16,17]. Unlike C. elegans, which has five autosomes and an XX/XO sex-determination system [18], B. malayi has four autosomes and an XX/XY system [19]. The small size of condensed nematode chromosomes has precluded accurate in situ analysis of conservation of gene order. We have therefore taken a sequence-based approach, and here compare an 83 kb region surrounding the B. malayi macrophage-migration-inhibitory factor 1 locus (Bm-mif-1), a B. malayi homolog of a vertebrate cytokine [20], to the C. elegans genome and have found evidence for conservation of linkage and microsynteny between these two distantly related nematodes. The general features of this comparison were confirmed using a survey of genome sequences from B. malayi.

Results

General sequence features of an 83 kb segment of the B. malayi genome

Two overlapping bacterial artificial chromosome clones (BACs) were isolated that spanned the Bm-mif-1 locus. The inserts of BMBAC01L03 and BMBAC01P19 were 28,757 base pairs (bp) and 64,685 bp, respectively, with 10,637 bp of overlap, yielding a contiguated sequence of 82,805 bp (Figure 1). AT content overall was 68.0%; exonic DNA had an AT content of 59.9% and intergenic and intronic DNA had AT contents of 69.3% and 70.4% respectively. The average predicted gene size was 4.7 kb (range 0.6-20 kb). The average distance between genes was 3.1 kb (range 0.3-10.5 kb), giving an average gene density of one gene per 6.9 kb. There was an average of 9.3 introns per gene, with an average intron length of 316 bp (range 48-2,767 bp). The C. elegans orthologs of the B. malayi genes (see below) had a mean length of 3.2 kb, with an average of 5.5 introns per gene (mean size of 142 bp). The B. malayi genes were longer as a result of increased mean length and number of introns. Comparison to C. elegans presumed orthologs (see below) showed that only 50% of C. elegans introns were conserved in B. malayi (29 of 56 introns), and 25% of B. malayi introns (29 of 107) were conserved in C. elegans (Table 1). Of the 12 predicted B. malayi genes, seven were tested and confirmed by cDNA-PCR, and alternatively spliced transcripts were identified for four. Five of the 12 genes had corresponding ESTs (Table 1).
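The summary figures in this paragraph follow from simple arithmetic on the quoted values. The short Python sketch below reproduces them; the variable names are illustrative and are not taken from the original analysis. Note that the intron-conservation fractions come out at roughly 52% and 27%, which the text rounds to 50% and 25%.

```python
# Values quoted in the text for the two BAC inserts and the gene predictions.
insert_l03_bp = 28_757   # BMBAC01L03 insert
insert_p19_bp = 64_685   # BMBAC01P19 insert
overlap_bp = 10_637      # shared sequence between the two inserts
n_genes = 12             # predicted genes on the contig

# Contig length: the overlap is counted once, so subtract it from the sum.
contig_bp = insert_l03_bp + insert_p19_bp - overlap_bp

# Gene density expressed as kb of contig per predicted gene.
kb_per_gene = contig_bp / n_genes / 1000

# Intron conservation, as a percentage of each species' introns.
ce_introns_conserved = 100 * 29 / 56    # C. elegans introns also found in B. malayi
bm_introns_conserved = 100 * 29 / 107   # B. malayi introns also found in C. elegans

# B. malayi versus C. elegans ortholog introns: mean length and number per gene.
intron_length_ratio = 316 / 142
intron_number_ratio = 9.3 / 5.5

print(f"contig length: {contig_bp} bp")
print(f"gene density: one gene per {kb_per_gene:.1f} kb")
print(f"C. elegans introns conserved: {ce_introns_conserved:.1f}%")
print(f"B. malayi introns conserved: {bm_introns_conserved:.1f}%")
print(f"intron length ratio: {intron_length_ratio:.1f}")
print(f"intron number ratio: {intron_number_ratio:.1f}")
```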

Figure 1

The BMBAC01L03/BMBAC01P19 contig compared to the C. elegans genome. Genes are indicated by exon (box) and intron (bracket) structures. For each species, the direction of transcription of the genes is indicated by an arrow. The C. elegans gene structures are drawn to the same scale as the B. malayi contig. A, Match to B. malayi EST cluster BMC03169 [16]. Brugia EST (BMC) and Onchocerca volvulus (OVC) clusters are viewable in NemBase [39,60]. B, Highly similar to O. volvulus EST cluster OVC02481 [61]. C, Match to B. malayi EST cluster BMC00238. D, Match to B. malayi EST clusters BMC02055 and BMC01932. However, no ORF was identified, and it may not represent protein-coding sequence (see text for discussion). E, Match to B. malayi EST cluster BMC06334. F, Match to B. malayi EST cluster BMC00400. G, BMBAC01L03.1 and BMBAC01P19.7 are gene fragments. Percent identity was calculated on the alignable portion of the C. elegans ortholog. H, F13G3.9 (Ce-mif-3) is on C. elegans chromosome I. However, F13G3.9 is not the predicted ortholog of Bm-mif-1 and thus the relationship is indicated by a dashed arrow (see text). I, Percent identity was calculated for BMBAC01P19.3 and BMBAC01L03.4 only within the PWWP or dnaJ domains respectively. Homolog pairs are indicated by the colouring of the gene models.

Table 1 Genes predicted on the BMBAC01L03/BMBAC01P19 contig

Comparison of predicted genes to C. elegans

All 12 predicted genes had C. elegans homologs, but putative orthology could only be assigned to 11 pairs (Figure 1, Table 1). Orthology definition is possibly problematic, as the complete genome sequence of B. malayi is not known, and it is thus possible that genes more similar to these C. elegans comparators could be present. We note, however, that no B. malayi EST-defined genes (23,000 ESTs defining approximately 8,300 genes [16]) have better matches to these C. elegans proteins (data not shown), and that orthology definition included coextension of the proteins, and conservation of intron position and phase (Table 1). The exception, BMBAC01L03.3, contained two domains, an amino-terminal LON ATP-dependent serine protease domain (domain PF02190) and an anonymous carboxy-terminal domain (PFB022940). Proteins predicted from the Arabidopsis thaliana (AAC42255.1), Mus musculus (NP_067424), and Homo sapiens (XP_0421219) genomes share this architecture, but there are no C. elegans proteins that have both domains.

Some genes were similar to hypothetical, functionally uncharacterized genes from C. elegans. BMBAC01P19.7a/b had multiple predicted transmembrane segments also found in a number of peptides from other species (PFB002843) and were most similar to C36B1.12 (60% identity). There is only one homolog of BMBAC01P19.3a in any organism: F43G9.4 from C. elegans. The amino termini of both BMBAC01P19.3a and F43G9.4 contained PWWP domains (PF00855). PWWP domains are found in proteins with nuclear location and roles in cell growth and differentiation [21,22]. PSORT profiling indicated that BMBAC01P19.3 and F43G9.4 were likely to have nuclear localizations. The amino terminus of BMBAC01L03.4 contains a dnaJ-like domain (PF00684). The dnaJ domain is found in 41 C. elegans proteins, but BMBAC01L03.4 showed highest identity (57%) to F39B2.10. Both proteins had the dnaJ domain at their amino terminus and shared a common position of the first intron in this region. The remainder of the protein was not conserved.

BMBAC01P19.1 encodes Bm-mif-1 (Figure 2) [20]. Mammalian MIF is a cytokine involved in inflammation, growth, and differentiation of immune cells [23]: B. malayi MIF-1 may have a role in immunomodulation of the host [20,24]. C. elegans has four MIF-like genes: Ce-mif-1 (Y56A3A.3), Ce-mif-2 (C52E4.2), Ce-mif-3 (F13G3.9), and Ce-mif-4 (Y73B6BL.13). Transgenic reporter and immunolocalization studies suggest that C. elegans MIFs may have roles in development and the dauer stage [13,25]. Bm-MIF-1 has highest pairwise similarity to Ce-MIF-1 (41% compared to 23-29% for the other three paralogues; Figure 2) [20], and phylogenetic analysis of over seventy MIF-like proteins from eukaryotes confirms this assignment (D.B.G. and M.L.B., manuscript in preparation). Comparison of Bm-MIF-1 to the C. elegans MIFs, a second B. malayi MIF (Bm-MIF-2), and human MIF-1 (Figure 2) revealed that Bm-mif-1 and Ce-mif-1 shared two intron/exon boundaries also found in vertebrate MIFs. One of these introns was also present in Ce-mif-3, but Ce-mif-3 and the other two C. elegans mif genes shared a set of introns not present in the mif-1 genes. Bm-MIF-1 and other filarial MIF-1 homologs contain a CXXC motif (single-letter amino-acid code) critical for the thiol-oxidoreductase activities of vertebrate MIF [26]. None of the C. elegans MIF homologs contained this motif.

Figure 2

Comparison of B. malayi and C. elegans MIF proteins. Bm-MIF-1 (accession AAC82502) was aligned with human Hs-MIF-1 (AAA21814), C. elegans MIF homologs Ce-MIF-1 (CAB60512), Ce-MIF-2 (CAB01412), Ce-MIF-3 (CAA95795), Ce-MIF-4 (AAG23475), and Bm-MIF-2b (AAF91074). Intron positions are marked by triangles (red, conserved with Hs-MIF-1; blue, Ce-MIF-2, -3 and -4 specific). The proline at position 2 (white) is important for immune function, and the CXXC motif at positions 60-63 is essential for thiol-oxidoreductase activity in mammalian MIF. The percent identity of each protein to Bm-MIF-1 is given at the end of the alignment.

Conserved gene clusters

Two clusters of three genes in close proximity are conserved. The first involves BMBAC01L03.2, BMBAC01L03.5, and BMBAC01P19.3. The C. elegans orthologs of these genes are F43G9.5, F43G9.3, and F43G9.4, respectively. F43G9.5 and F43G9.3 are divergently transcribed from a 631 bp intergenic region. F43G9.3 is followed by F43G9.4 in the same transcriptional orientation with 501 bp separating the genes. In B. malayi this local synteny is conserved, except that two additional genes (BMBAC01L03.3 and .4) are found between BMBAC01L03.2 and .5.

The second cluster also involves three genes. Proteins predicted from both alternative transcripts of BMBAC01P19.2 were found to be homologous to large proteins from Homo sapiens (BAF180, AAG34760 [27]), Gallus gallus (JC5056 [28]), D. melanogaster (CG11375, AAF56339), and C. elegans (C26C6.1) (Figure 3). These proteins shared six bromodomains (PF00439), two BAH domains (bromo-adjacent homology, PF01426), a HMG box (high mobility group, PF00505), and an anonymous carboxy-terminal domain (PFB007669). The B. malayi, C. elegans, and D. melanogaster polybromodomain (PBR) proteins also contain two C2H2 zinc fingers. PBR proteins may be involved in chromatin-remodeling complexes. Bromodomains interact with acetylated lysine in histone complexes, while HMG boxes are found in chromatin proteins that bind to single-stranded DNA and unwind double-stranded DNA. Human BAF180 has been shown to localize to the kinetochores of mitotic chromosomes [27]. None of the vertebrate PBR homologs contains zinc fingers, which may indicate additional functions for the nematode and fly proteins.

Figure 3

The pbr synteny cluster and pbr homologs in other species. The genomic organization of the pbr synteny cluster in C. elegans and B. malayi, and the domain structure of the PBR homologs in Drosophila melanogaster, Gallus gallus, and Homo sapiens are illustrated. Intron/exon boundaries that are conserved between the nematodes are indicated by asterisks. White boxes represent the contiguous DNA underlying the gene models.

Two conserved genes were identified immediately upstream from pbr-1 (Figure 3). BMBAC01P19.5 (named Bm-ubr-1 (upstream of pbr-1)) showed significant similarity only to T28F4.4 from C. elegans (27% identity). The protein encoded by BMBAC01P19.4 is homologous to C. elegans T28F4.5 (30% identity). Iterative searches of GenBank using PSI-BLAST [29] indicated that BMBAC01P19.4 and T28F4.5 belong to a group of small peptides that include human DAP-1 (death-associated protein). DAP-1 is a nuclear protein and positive regulator of interferon gamma-induced apoptosis in HeLa cells [30]. PSORT profiling indicated that both nematode proteins may have a nuclear localization. BMBAC01P19.2 (Bm-pbr-1) and BMBAC01P19.5 (Bm-ubr-1) are divergently transcribed and BMBAC01P19.4 (Bm-dap-1) is found in the large third intron of BMBAC01P19.5 in the same transcriptional orientation as BMBAC01P19.2 (Figure 3). In the C. elegans instance of the PBR cluster, C26C6.1 (Ce-pbr-1) and T28F4.4 (Ce-ubr-1) are also divergently transcribed from a 1,233 bp intergenic region. The third gene, T28F4.5 (Ce-dap-1), is found in the large third intron of T28F4.4 on the same strand as C26C6.1.

Comparison of the intergenic and upstream regions of both clusters, and of the orthologous gene pairs, did not reveal any clear motifs that might be involved in transcriptional regulation. In particular, the intergenic DNA between pbr-1 and ubr-1, and the first intron of ubr-1, had less than 30% pairwise identity throughout, and there were no stretches of greater identity. The AT richness of the B. malayi genome compared to C. elegans may obscure any conserved elements. No RNA-coding genes were found. Two B. malayi ESTs matched at > 99.5% identity to two regions of BMBAC01P19 separated by 200 bp that were not predicted to be part of a transcript (see Figure 1). These regions are downstream of gene BMBAC01P19.3, and may derive from alternative 3' untranslated regions: the furthest downstream match includes a good polyadenylation site. The 3' end of the cDNA determined for this gene may have derived from internal priming from an A-rich segment of the 3' untranslated region.

Fractured synteny between the genomes of B. malayi and C. elegans

All of the C. elegans orthologs, except for Y56A3A.3 (Ce-mif-1, 41% identity to Bm-mif-1, on chromosome III), are located on chromosome I (Figure 4). F13G3.9 (Ce-mif-3, 23% identity to Bm-mif-1) is found on C. elegans chromosome I in close proximity to the orthologs of B. malayi genes BMBAC01P19.2, .4, and .5. This could suggest that our orthology assignment is wrong. As described above, however, Ce-mif-1 and Bm-mif-1 share two intron positions and are more similar to each other than either is to Ce-mif-3, which has one concordant intron position, and one discordant intron position. The conflict between location and structure could be due to a gene-conversion event in either lineage, or an event of directed movement or insertion.

Figure 4

Comparison of linkage and synteny with C. elegans. The B. malayi contig is compared to an approximately 9 Mb segment of C. elegans chromosome I. The relative positions of the ortholog pairs, colored as in Figure 1, are indicated. The link between Bm-mif-1 and Ce-mif-3 (F13G3.9) is dashed to indicate that these two genes are paralogs rather than orthologs (see text for details).

Eight of the 10 remaining C. elegans orthologs lay within a 2.3 Mb region in the center of chromosome I (6.7-9 Mb) (Figure 4). The orthologs of the other two genes (BMBAC01L03.4 and BMBAC01P19.6) are found at the distal tip of chromosome I. While there has been extensive rearrangement of gene order, when compared to the C. elegans orthologs, 10 of the B. malayi genes were in the same relative transcriptional orientation. Examination of the boundaries of the C. elegans cluster and individual gene regions did not show any association with repeat-sequence classes, including those shown to be commonly associated with rearrangements between C. elegans and C. briggsae [6].

Genome survey sequence comparison and synteny

To ascertain whether the segment sequenced was representative of the relationship between the B. malayi genome and that of C. elegans, we surveyed the B. malayi BAC-end derived genome survey sequences (GSSs; J. Daub, C. Whitton, N.H., M. Quail and M.L.B., unpublished observations). There are over 18,000 GSSs from B. malayi, derived from three independent libraries. Each BAC-end sequence was compared to the C. elegans proteome (Wormpep [31]) and significant similarities recorded (BLASTX probabilities < e-8). The chromosomal position of each matching C. elegans protein was derived from Wormbase [32]. One hundred and sixty-four BACs had matches at both ends to C. elegans proteins under these conditions (summarized in Table 2, details in Table 3). We note that these matches are not necessarily to orthologs, as we have not carried out intensive analysis of each one, but random selection of genes should not yield greater linkage estimation despite the problem of gene families and domain matches. While much of the C. elegans proteome consists of protein families, very few of these have a chromosomally restricted distribution [33,34].

Table 2 Synteny conservation between B. malayi BAC-end genome survey sequences and C. elegans genome sequence
Table 3 B. malayi BAC end comparisons to C. elegans

C. elegans has six chromosomes. Under a minimal model, if a genome rearrangement were equally likely to involve a between-chromosome as a within-chromosome event, and was only dependent on the length of DNA in the within-chromosome versus not-within-chromosome classes, we would expect approximately five of every six rearrangements to involve between-chromosome events and one-sixth to involve within-chromosome events. This model ignores the fact that B. malayi has only five chromosome pairs: four autosomes and one XY pair. The derivation of the two karyotypes is unknown, and cannot be deduced from phylogenetic comparisons (see [35]). While most nematodes of clade V have six chromosomes like C. elegans, other taxa in the Secernentea have from one to > 100 [36]. If we assume that the C. elegans complement derives from splitting of an ancestral chromosome retained in B. malayi, the expectation would be that 20% of rearrangements would be within-chromosome.

Significantly more BACs had both ends matching proteins on the same C. elegans chromosome than would be expected under these models (approximately 55%; χ2 test, p < 0.01 for all comparisons in Table 2 under the above model). The mean distance between the C. elegans matches was 4.4 Mb, which may be compared to an expected separation of approximately 45 kb between the B. malayi BAC ends.
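The comparison against the two null expectations (one-sixth or one-fifth of BACs with both ends on the same chromosome) can be sketched as a one-degree-of-freedom χ2 goodness-of-fit test. The Python sketch below is an illustration only: the function name is hypothetical, and the observed count of 90 same-chromosome BACs is an assumed stand-in for "approximately 55%" of the 164 BACs mentioned above, not a value taken from Table 2. Whether the original analysis was set up in exactly this way is also an assumption.

```python
import math

def chi2_same_chromosome(n_same, n_total, p_same):
    """Chi-square goodness-of-fit (1 degree of freedom) for the number of
    BACs whose two ends match the same chromosome, against a null
    probability p_same of a same-chromosome pair."""
    n_diff = n_total - n_same
    expected_same = n_total * p_same
    expected_diff = n_total - expected_same
    chi2 = ((n_same - expected_same) ** 2 / expected_same
            + (n_diff - expected_diff) ** 2 / expected_diff)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Assumed illustrative counts: about 55% of 164 BACs on the same chromosome.
N_TOTAL = 164
N_SAME = 90

for label, p_null in (("six chromosomes", 1 / 6), ("five chromosomes", 1 / 5)):
    chi2, p_value = chi2_same_chromosome(N_SAME, N_TOTAL, p_null)
    print(f"{label}: chi2 = {chi2:.1f}, p = {p_value:.3g}")
```

With counts of this order the excess of same-chromosome pairs is far beyond the p < 0.01 threshold under either null expectation.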

Discussion

B. malayi is a human parasite only distantly related to the model nematode C. elegans [14,37]; therefore, genome comparisons between these species will yield data concerning longer-term changes in structure and function that cannot be derived from within-genus comparisons. In the 83 kb of genomic DNA flanking the B. malayi mif-1 locus we found a fractured conservation of microsynteny between the two nematode genomes, and conservation of linkage. Twelve protein-coding genes were predicted, and 11 of these had putative orthologs in the C. elegans genome. Ten of these orthologs were on C. elegans chromosome I, with eight in a 2.3 Mb segment in the center of the chromosome and two at the distal tip of chromosome I. Some of these genes have remained tightly linked in the same or slightly modified relative transcriptional orientations in both species.

This pattern, of conservation of linkage with disruption of precise synteny, was confirmed using BAC-end sequences. Of the 171 clones with matches at both ends to C. elegans genes, over 55% were localized to the same chromosome in C. elegans. While the mean distance separating the B. malayi genes is 45 kb (the length of the BAC clones; [38] and C. Whitton and M.L.B., unpublished work), the mean distance between the matching C. elegans genes is approximately 4.4 Mb.

The 83 kb fragment of B. malayi genomic DNA is the largest contiguated portion of sequenced genomic DNA from a non-rhabditid nematode described to date. A large proportion (around 60%) of genes identified in the B. malayi EST dataset (23,000 ESTs corresponding to around 8,300 unique transcripts [39]) have no close C. elegans homologue [16]. In this study, however, C. elegans orthologs were identified for 11 of the 12 identified B. malayi genes. Some of these orthologous pairs were confirmed by congruence in length of open reading frame and shared intron positions, despite low pairwise identity. Global searches with ESTs would not have detected these pairs (BLAST probability values of approximately e-4), and thus the true proportion of B. malayi unique genes is likely to be less than 60%. B. malayi genes were found to have larger and more numerous introns than C. elegans genes (2.2 times longer and 1.7 times more frequent), in keeping with previous estimates made using data from several highly expressed genes [40]. If the contig is representative and gene complement is equivalent to C. elegans, the B. malayi genome may be larger (120-140 Mb) than estimated previously (100 Mb [41]). Four of seven genes confirmed by reverse transcriptase PCR had alternative transcripts, a figure consistent with C. elegans EST and cDNA projects [42]. Additionally, five genes had B. malayi EST matches, a proportion congruent with the estimate that the EST program has identified around 40% of the expected 20,000 B. malayi genes [16].
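The revised genome-size range can be illustrated by scaling the gene density of the contig to a whole-genome gene count. The Python sketch below assumes the figure of 20,000 expected B. malayi genes quoted above and the contig density of 82,805 bp per 12 genes; the function name is hypothetical, and it is an assumption that the estimate in the text was derived in this way.

```python
def genome_size_mb(bp_per_gene, n_genes):
    """Genome size in Mb if every gene occupies bp_per_gene on average."""
    return bp_per_gene * n_genes / 1e6

# Density observed on the sequenced contig (82,805 bp holding 12 predicted genes).
contig_bp_per_gene = 82_805 / 12

# Assumed gene complement: the 20,000 genes quoted in the text.
estimated_mb = genome_size_mb(contig_bp_per_gene, 20_000)

# For comparison, the C. elegans density of one gene per 5 kb quoted earlier.
at_elegans_density_mb = genome_size_mb(5_000, 20_000)

print(f"at contig density: {estimated_mb:.1f} Mb")
print(f"at C. elegans density: {at_elegans_density_mb:.1f} Mb")
```

Under these assumptions the contig density gives about 138 Mb, which falls inside the 120-140 Mb range, whereas the C. elegans density gives 100 Mb.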

Conserved linkage between the genomes of closely related eukaryotic organisms has been shown in several taxa. But it is only recently, with the sequencing of discrete segments or whole genomes, that examples of conservation of microsynteny between the genomes of distantly related species (not involving functionally related genes) have been described [43,44]. The microsyntenic gene clusters retained between C. elegans and B. malayi do not fall into any clear functional categories. However, all genes contained in the second cluster (BMBAC01P19.2, .4, and .5) are predicted to have nuclear localization signals and could be co-regulated. Alternatively, promoters or cis-acting regulatory elements required for their proper function could be embedded within other cluster members. Interdigitation of these regulatory elements could be constraining the movement of genes away from this cluster. No conserved motifs were found, however, and this possibility can thus only be tested by transgenesis experiments. This phenomenon has been observed in other systems such as fungal genomes, where gene pairs predicted to have overlapping regulatory elements are more likely to be conserved between species [45].

Many genes in C. elegans are co-transcribed in operons [46,47] and this could constrain synteny breakage. The C. elegans orthologs of BMBAC01L03.5 and BMBAC01P19.3 are separated by 501 bp, an intergenic distance found in other C. elegans operons, and the downstream gene (Ce-F43G9.4) was shown to be trans-spliced to the SL2 spliced leader, a feature of downstream genes in C. elegans operons [47]. However, in B. malayi, BMBAC01L03.5 and BMBAC01P19.3 are separated by 2.8 kb, which is outside the range of operon intergenic spacing. The functions of C. elegans genes on chromosome I have been investigated by RNA-mediated interference and a phenotype was identified for one gene in each cluster: embryonic lethality (F39G4.5 [48]) and altered adult morphology (C26C6.1 [49]). Therefore, it is possible that the clusters are conserved because removing other members would interfere with functions of these essential genes. The one exception to the conservation of linkage is the Bm-mif-1/Ce-mif-1 ortholog pair. Another C. elegans MIF homolog, Ce-mif-3, is found in close proximity to the genes in the pbr-1 synteny cluster, raising the possibility that a gene-conversion event may have obscured orthology assignment for this gene.

In the Metazoa, long-range synteny between the genomes of distantly related species (>300 Myr divergence) has only been identified previously in vertebrates (teleost fish and humans [50,51]). In vertebrates, interchromosomal exchanges seem to be rare events, and some linkage groups, such as human chromosomes 6 and X, are conserved across most eutherian mammals [7]. From the analyses presented here we can suggest some general patterns of gene rearrangement in nematodes. Most of the C. elegans orthologs were located in a small segment of chromosome I (nine of eleven genes in 2.3 Mb or 16% of the chromosome), suggesting that local intrachromosomal inversions or rearrangements have occurred more frequently than long-range intrachromosomal, or interchromosomal rearrangements. This is consistent with patterns observed in closely related dipterans, where the composition of linkage groups is conserved but not the order within the chromosome. Mechanistically this may occur because intrachromosomal rearrangements require fewer DNA breaks than interchromosomal translocations, and the nuclear scaffold may hold local chromosomal regions in closer association. The high rate of rearrangement of genes within the nematode chromosomes makes it unlikely that the positional information of genes in the Caenorhabditis genomes will be useful in finding orthologous genes in the genomes of distantly related nematodes such as B. malayi.

Materials and methods

Identification of candidate genomic clones for sequencing

A probe for Bm-mif-1 was synthesized by labeling full-length cDNA (GenBank accession U88035) with biotin (Phototope; New England Biolabs), hybridized to high-density arrays of 18,000 BAC clones containing B. malayi genomic DNA [52], and detected with the Phototope detection kit (New England Biolabs). Hybridization-positive BACs were PCR verified using gene-specific primers Bm-MIF-1.F1a (ATGCCATATTTTACGATTGATAC) and Bm-MIF-1.R1a (GAACACCATCGCTTGTCCACC) using standard reaction and cycling conditions (0.2 mM dNTPs, 1.5 mM MgCl2, 0.5 pM primer; 1 cycle of 94°C for 3 min; 35 cycles of 94°C for 15 sec, 55°C for 20 sec, 72°C for 3 min; 1 cycle of 72°C for 10 min). BMBAC01P19 was selected for sequencing. Sequence from the T7 end of the insert was used to design specific primers 01P19.T7.F1 (GCAGCAAATGCTTATTTGTCTTG) and 01P19.T7.R1 (GTTTGGTGATTCATGTCCATGAGC). Primers 01P19.T7.R1 and 2BiotinBACF3 (designed to the BAC vector; (biotinU)2GAGTCGACCTGCAGGCATGC; New England BioLabs Organic Synthesis Unit) were used to synthesize a biotin-labeled end probe. The probe was hybridized to the BAC library filter using a modified hybridization and detection protocol [38]. Positive BACs were PCR verified with primers 01P19.T7.R1 and 01P19.T7.F1, and insert DNA was prepared using a kit (Qiagen). BAC ends were sequenced using the Sanger Institute protocol [53]. BMBAC01L03 showed minimal overlap with BMBAC01P19 compared to other clones and was selected for sequencing.

Preparation, subcloning, and sequencing of BACs

The BACs were sequenced using a standard two-stage strategy involving random sequencing of subcloned DNA followed by directed sequencing to resolve problem areas. In the first stage, DNA prepared from BAC clones was sheared by sonication and fragments of 1.4-2 kb were cloned into pUC18. DNA from randomly selected clones was sequenced with dye-terminator chemistry and analyzed on automated sequencers. Each BAC was sequenced to a depth of sevenfold coverage. Contigs were assembled using phrap (Phil Green, Washington University Genome Sequencing Center, unpublished). Manual base calling and finishing were carried out using Gap4 [54]. Gaps and low-quality regions were resolved by techniques such as primer walking, PCR, and resequencing clones under conditions that give increased read lengths.

Sequence analysis

The finished sequences of BMBAC01P19 and BMBAC01L03 were compared to the GenBank nonredundant (nucleic acid and protein) and EST (dbEST) databases, the C. elegans genome and protein databases, and the custom B. malayi clustered EST database [16] using BLAST [55,56]. GeneFinder (P. Green and L. Hillier, Washington University Genome Sequencing Center, unpublished) was trained with 162 publicly available B. malayi gene sequences and used to analyze the contiguated sequence. The sequence was annotated on the Artemis workbench [57]. Predicted protein sequences were compared to Pfam [58] and cellular localization was examined using PSORTII [59]. The annotated sequence is available in GenBank (accession AL606837).

Verification of gene predictions

To confirm gene predictions from BMBAC01P19, gene-specific primers were designed and PCR was carried out on oligo(dT)-primed B. malayi mixed adult first-strand cDNA. To isolate cDNA ends, the GeneRacer 3' RACE primer (Invitrogen) (GCTGTCAACGATACGCTACGTAACGGCATGACAGTG) or the nematode SL1 sequence (GGTTTAATTACCCAAGTTTGAG) was used with specific primers. Secondary PCRs were carried out using nested primers and 2% of the primary PCR product. Positive PCR products were cloned and sequenced.

BAC-end sequence analysis

The B. malayi BAC-end sequence dataset was compared to the C. elegans proteome in Wormpep [31] using BLASTX. Significant matches (probabilities < e-8) were filtered, and BAC clones with matches at both ends were retained. The chromosomal position of the C. elegans genes was determined from Wormbase [32].
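The filtering steps described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline that was actually used: the function name, the tuple layout of the hits, the BAC identifiers (BAC_A to BAC_C), the end labels, and the choice to keep only the best-scoring hit per BAC end are all hypothetical. The three C. elegans genes and their chromosomes are those given in the Results.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-8  # significance threshold quoted in the text

def paired_end_linkage(hits, gene_chromosome, cutoff=E_VALUE_CUTOFF):
    """Return {bac_id: True/False} for BACs with significant matches at both
    ends; True means both matched genes lie on the same chromosome.

    hits: iterable of (bac_id, end_label, ce_gene, e_value) tuples.
    gene_chromosome: dict mapping each C. elegans gene to its chromosome.
    """
    # Keep the best (lowest E-value) significant hit for each BAC end.
    best = {}
    for bac_id, end_label, ce_gene, e_value in hits:
        if e_value >= cutoff:
            continue
        key = (bac_id, end_label)
        if key not in best or e_value < best[key][1]:
            best[key] = (ce_gene, e_value)

    # Group the retained hits by BAC.
    ends_by_bac = defaultdict(dict)
    for (bac_id, end_label), (ce_gene, _) in best.items():
        ends_by_bac[bac_id][end_label] = ce_gene

    # Retain only BACs with a match at both ends and compare chromosomes.
    linkage = {}
    for bac_id, ends in ends_by_bac.items():
        if len(ends) == 2:
            chromosomes = {gene_chromosome[gene] for gene in ends.values()}
            linkage[bac_id] = len(chromosomes) == 1
    return linkage

# Hypothetical toy input; E-values and BAC names are invented for illustration.
toy_hits = [
    ("BAC_A", "T7", "F43G9.5", 1e-20),
    ("BAC_A", "SP6", "C26C6.1", 1e-12),
    ("BAC_B", "T7", "F43G9.5", 1e-30),
    ("BAC_B", "SP6", "Y56A3A.3", 1e-10),
    ("BAC_C", "T7", "C26C6.1", 1e-15),
    ("BAC_C", "SP6", "Y56A3A.3", 1e-3),  # fails the cutoff, so BAC_C is dropped
]
toy_chromosomes = {"F43G9.5": "I", "C26C6.1": "I", "Y56A3A.3": "III"}

linkage = paired_end_linkage(toy_hits, toy_chromosomes)
same_fraction = sum(linkage.values()) / len(linkage)
print(linkage, same_fraction)
```

The fraction of retained BACs flagged True is the quantity compared with the one-sixth and one-fifth expectations in the Results.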