Mycobacterium avium (M. avium) subspecies represent a closely related group of mycobacteria that are commonly found in the environment; some of which are frequently associated with infections of birds and ruminants. The M. avium subspecies are distinguished from each other by nucleic acid hybridization [1, 2] along with growth characteristics [3]. M. avium subspecies paratuberculosis (MAP) is the causative agent of Johne's disease, a chronic and economically significant infection primarily of ruminant animals characterized by a prolonged subclinical phase leading eventually to a severe gastroenteritis which results in malnutrition and ultimately death. Much remains to be elucidated regarding the pathogenesis and population genetics of MAP. Specifically, it remains unclear what MAP virulence factors are important for both infection and persistence, and despite observed phenotypic and genetic differences between MAP isolates obtained from sheep and cattle, the biological basis for host specificity remains unclear.

Previous work in our laboratory has utilized DNA microarrays to compare the genome content of members of the M. avium complex (MAC) which includes MAP, M. avium subspecies avium (MAA), M. avium subspecies silvaticum (MAS), M. avium subspecies hominissuis (MAH) and M. intracellulare [4]. These findings revealed that non-MAP MAC isolates do not contain several large regions of genomic DNA that are present in MAP K-10. Extensive genomic conservation was observed for the MAP isolates examined in the study, most of which were obtained from cattle [4]. Recently, Marsh and coworkers have described the presence of several large sequence polymorphisms among sheep and cattle isolates of MAP [5, 6], while Semret et al have reported on the presence of polymorphic regions that are shared between MAA and MAP sheep isolates [7]. The biological consequence of these large sequence polymorphisms has not yet been determined.

In the work presented here, we have utilized a DNA microarray constructed with oligonucleotides representing all of the predicted coding and intergenic regions from the MAP K-10 genome as well as the remaining novel coding sequences from the MAH 104 [8] genome to examine the genome content of M. avium subspecies obtained from a variety of host animals. We hypothesize that genes found to be polymorphic among M. avium subspecies isolated from different hosts will serve as ideal targets for future studies designed to elucidate the biological basis of host specificity and pathogenicity.


Validation of microarray sensitivity and specificity

The reference isolate MAP K-10 was compared to the sequenced MAH isolate 104 in order to evaluate the performance of the oligonucleotide microarray as a platform for comparative genomic hybridizations. The hybridizations and data analyses were performed as described in the Materials and Methods, and the resulting intensity ratios for each oligonucleotide probe were matched to the BLASTn score of the probe when searched against the complete genomes of MAP K-10 and MAH 104. Probe sequences with an E-value of greater than 1 × 10-5 were identified as not present in the reference genome. The distribution of log transformed hybridization intensity ratios (MAH 104/MAP K-10) is presented in Figure 1. A majority of the probes (75%) were observed to have ratios falling between -0.5 and 0.5, as would be expected for the genomes of these closely related mycobacteria. The distribution of MAP K-10 specific probes peaked below a log ratio of -1.0, with a majority of the probes having ratios lower than this. Mirroring this trend, the MAH 104 specific probes peaked above a log ratio of 1.0 with most of the probes observed to have ratios greater than this. Only five of the MAP K-10 specific probes had positive log ratios, while only three of the MAH 104 specific probes had negative log ratios. In both cases, the log ratios of these outlying probes fell within the range of -0.5 to 0.5. Based on these reference hybridizations, we predict that probes on the MAP K-10 microarray with log ratios greater than 1.2 can be identified as absent from the MAP K-10 genome, while probes with log ratios less than -1.2 are missing from the test isolates relative to MAP K-10. Utilizing these cutoffs, both the false positive and false negative rates are less than 1%.

Figure 1
figure 1

Distribution of log transformed microarray fluorescence intensity ratios for competitive genomic hybridizations of MAH 104 and MAP K-10 (MAH 104/MAP K-10). The results presented (n = number of results) are the average values of multiple hybridizations and replicate spots as described in Material and Methods. Log ratio distributions are displayed for (A) all probes present on the microarray; (B) MAP K-10 specific probes (as determined by BLAST E-value < 1 × 10-5); and (C) MAH 104 specific probes.

Sheep genomotype

A similar genomic profile was observed for MAP sheep isolates 397, 467, 7565, and 9040. These sheep isolates were distinguished from the other MAP isolates examined in this study by the presence of four clusters of ORFs homologous to sequences found in the MAH 104 genome but absent in MAP K-10. Additionally, these isolates are missing three clusters of ORFs that are present in the other MAP isolates examined. The sheep isolate 9040 also appears to be a member of this genomotype, although the observed differences in hybridization intensity compared to MAP K-10 are not as pronounced as the other isolates (Figure 2).

Figure 2
figure 2

Most variable ORFs amongst mycobacterial isolates examined. 474 ORFs with 3-fold or greater differences in hybridization between at least two isolates and MAP K-10 are displayed in rows. Columns represent individual mycobacterial isolates that were grouped by hierarchical clustering (Pearson correlation, average linkage). Grey colored data points represent no data. Isolate designations are listed in the right margin next to a colored bar indicating the location of the isolates on the graph. Selected gene clusters are labeled by the first and last ORFs. A complete listing of the ORFs and log ratios has also been provided [see Additional file 1].

Consistent with previous findings, we identified several regions that were present in MAP sheep isolates and MAH, but not in MAP cattle isolates. These included the MAH 104 ORFs MAH1834-1858 (Figure 2). This region was partially identified as the 197 bp sequence PIG-RDA 20 (AY266301) and was mapped to a 197 Kbp segment of the MAH 104 genome by Dohmann and coworkers [9] and was subsequently described by Semret and coworkers as LSPA4-II [7] (Table 1). This region contains a copy of the IS1311 insertion sequence (MAH1844) and within the MAH 104 genome is flanked by an additional copy of IS1311 (MAH1833). Another previously described LSP included ORFs MAH4893 through MAH4910 (Table 1). This region was partially identified as the 233 bp sequence PIG-RDA10 (AY266300), mapped to a 16 Kbp segment of the MAH 104 genome [9] and the full sequence was later identified as LSPA18 [7].

Table 1 Mycobacterium avium large sequence polymorphisms. Genomotypes from the present study that contain the indicated sequences are listed. Supporting evidence from previous studies is provided and includes the designations used to describe individual polymorphisms.

Several new or only partially described LSPs common to MAP sheep and MAH isolates were identified in the present study. One of the genomic regions included ORFs MAH2772 – 2791 (Figure 2 and Table 1). This gene cluster was partially identified as the 548 bp sequence PIG-RDA 30 (AY266302) and was mapped to a 27 Kbp region on the MAH 104 genome [9]. This cluster of genes encodes several proteins that may contribute to pathogenesis. MAH2777 is predicted to encode a cytochrome P450 enzyme. Members of this enzyme family have been implicated in basic cellular processes as well as virulence (reviewed in [10]). MAH2782 is homologous to an arylsulfatase characterized in Pseudomonas aeruginosa and like other sulfatases may modulate mycobacterial interactions with host cells (reviewed in [11]). This gene cluster is also characterized by the presence of several genes encoding putative proteins involved in lipid and energy metabolism. Seven of the identified genes encode proteins with no predicted function.

Another novel LSP found in MAP sheep and MAH isolates but absent from MAP cattle isolates was comprised of ORFs MAH3041 – 3052 (3,399,659 – 3,411,697) which are predicted to encode proteins involved in the biosynthesis of glycopeptidolipids [12, 13] (Table 1). Sequencing of this region in the sheep isolate MAP 397 revealed the presence of additional ORFs (hyp, hlpA, gdhgA and gtfC) with homology to glycopeptidolipid biosynthesis genes immediately downstream of MAH3052 (gtfB in Figure 3). These additional ORFs were not homologous to any MAH 104 sequences (and thus were not detected by the microarray analysis) but rather were homologous to MAA TMC724 (ATCC 25291), a serovar 2 isolate of MAA. A direct comparison of the glycopeptidolipid biosynthesis gene clusters from MAP 397 and MAA TMC724 (accession number AF125999) revealed that they are approximately 99% identical with the exception of a deletion at the 5' end of mtfA as well as the absence of the insertion sequence IS2534 from the MAP 397 locus (Figure 3). Despite the presence of these genes, no glycopeptidolipids have been observed to be produced by sheep isolates of MAP under standard in vitro culture conditions (T. Eckstein, personal communication); therefore, the practical contribution of glycopeptidolipids to the pathogenesis of MAP sheep isolates remains unclear at this point.

Figure 3
figure 3

Glycopeptidolipid biosynthesis gene cluster in MAP sheep isolate 397. Solid black lines represent homology (approximately 99% identity) with the MAA TMC724 (ATCC 25291) glycopeptidolipid biosynthesis cluster (accession number AF125999), while dashed green lines represent sequences absent in isolate 397. Solid or dashed red and blue lines represent regions of homology with MAH 104 and MAP K-10, respectively. Vertical lines are spaced at 1600 base intervals for reference and represent the number of nucleotides from the start of the Genbank entry.

A putative transcriptional regular labeled as MAH315 in MAH 104 was identified by microarray hybridization as present in MAP sheep isolates. This ORF was subsequently confirmed as present in the MAP 397 genome by sequencing. The protein encoded by MAH315 has homology to the GntR-family of transcriptional regulators which are widely distributed across bacterial species and regulate a variety of cellular processes [14, 15].

A second subset of LSPs were characterized as being present in all MAP isolates with the exception of the sheep genomotype. Several of these MAP sheep isolate deletions have already been described. The deletion encompassing MAP1485c – 1491 was previously identified by Marsh and coworkers as S strain deletion #1 and by Semret and coworkers as LSPA20 [5, 7] (Table 2). An additional deletion in the sheep genomotype included the cluster of ORFs between MAP1728c and MAP1744 (Table 2). This deletion was partially identified by Marsh and coworkers as RDA3 [6], and later fully described as S deletion #2 [5]. MAP2325 was identified as being absent from Australian sheep isolates of MAP [5]. This ORF was not identified as missing from the MAP sheep isolates examined in this study, and subsequent sequencing of this region in MAP 397 confirmed the presence of an ORF with 100% identity to MAP2325. Although further examination of this observation is warranted, this discrepancy may represent a geographic difference between MAP isolates recovered from sheep in Australia and the United States.

Table 2 Mycobacterium avium large sequence polymorphisms. Genomotypes from the present study that lack the indicated sequences are listed. Supporting evidence from previous studies is provided and includes the designations used to describe individual polymorphisms. ID numbers correspond to Figure 4.

A novel LSP comprised of the ORFs MAP1432 – 1438c was identified in the present study as absent from sheep MAP isolates (Table 2). This gene cluster is predicted to encode four energy metabolism enzymes as well as a lipase (MAP1438c). MAP1432 encodes a hypothetical protein with homology to the REP13E12 family of repetitive elements that were originally described in Mycobacterium tuberculosis and have been shown to be targets of phage integration [16]. MAP2656 was initially identified as absent via microarray analysis but sequencing of MAP 397 identified a homologue with 100% identity. This represents the only observed discrepancy between the microarray and sequencing results.

Avium-silvaticum genomotype

The genome profile of MAP sheep isolate 5001 was unlike the other isolates obtained from sheep that were examined in this study. This isolate grouped together with four independent laboratory stocks of what has historically been referred to as M. avium strain 18 [1722] as well as five isolates of MAS (Figure 2). Additionally, two isolates of MAA (ATCC25291 and ATCC35719) were observed to have genomic profiles similar to this group. The hybridization pattern of these isolates is collectively referred to here as the avium-silvaticum genomotype.

Similar to the MAP sheep isolates, the avium-silvaticum genomotype isolates contained LSPs that are absent from MAP cattle isolates but present in MAH 104. They possessed the MAH3041-3052 and MAH4893-4910 gene clusters also found in MAP sheep isolates; however, they did not possess the MAH1834-1858 or MAH2772-2791 clusters of MAH 104 ORFs (Figure 2 and Table 1). These isolates contained additional LSPs homologous to the MAH 104 genome. A cluster of MAH 104 ORFs including MAH4657-4665 (5,122,587–5,131,447) was present in the avium-silvaticum genomotype isolates. The ORFs in this LSP are predicted to encode proteins that function as transcriptional regulators and dehydrogenases, while several have no known function. A second LSP with homology to the MAH 104 genome included ORFs MAH4704-4709 (5,173,587–5,179,816) (Table 1). With the exception of two ORFs predicted to encode components of a restriction modification system, the remaining coding sequences present in this LSP have no predicted function.

Several LSPs comprised of MAP K-10 ORFs were absent from the avium-silvaticum genomotype isolates (Figure 2). In many cases, the avium-silvaticum genomotype isolates were distinguished from each other by these LSPs. Isolates 6007 and 6049 lack a cluster of ORFs between MAP0746 and MAP0766c (Figure 2 and Table 2). This LSP was present in all of the other isolates examined and verified in selected isolates by PCR (Table 3). Included within this LSP are nine ORFs homologous to mammalian cell entry gene clusters [23, 24]. The other ORFs in this cluster encoded degradation enzymes or had no predicted function. The avium-silvaticum genomotype isolates were also missing ORFs MAP1376-1381, which were also absent in the hominissuis genomotype isolates 5835cc and 5836cc described below (Table 2). The absence of this region was verified by PCR with primers designed for representative ORFs (Table 3). The MAS isolates 6007, 6049, 6058, and ATCC49884 were distinguished from all the other isolates examined by the absence of ORFs MAP1424c-1464 (Table 2). This LSP includes a gene cluster that was identified as absent from the sheep genomotype (MAP1432-1438c). A large number of the ORFs included in this region (n = 16) are predicted to be involved in lipid metabolism. Notably, just as the smaller sheep isolate LSP was flanked by a REP13E12 family repetitive element (MAP1432), the larger MAS LSP is also flanked by a REP13E12 element (MAP1465) in MAP K-10. The isolates 5001, 6006, 6058, and ATCC49884 lack the ORF clusters MAP2372-2375c and MAP3063-3079c (Table 2). These LSPs were present in all of the other isolates examined and contain ORFs primarily encoding proteins with no predicted functions, although MAP3069-3071 encode proteins that are predicted to be involved in terpenoid biosynthesis. MAH2829-2830 were identified as present only in isolates ATCC35719, ATCC25291, and 6071 as well as the strain 18 isolates.

Table 3 PCR amplification results for selected ORFs and isolates

All of the avium-silvaticum genomotype isolates were also observed to have greater amounts of probe hybridized to the oligonucleotide target representing MAP3814c relative to MAP K-10 (Figure 2). This ORF is 31% identical to transposase sequences and may represent part of an insertion sequence that is present in multiple copies within MAS and MAA genomes. The MAP K-10 ORFs immediately downstream of MAP3814c (MAP3815-3818) are missing in both the avium-silvaticum and hominissuis genomotype isolates based on the microarray hybridization results. Notably, ORFs MAP3815-3817 lack homology to any other sequences deposited in Genbank. This appears to provide an additional example of mycobacterial genome diversity mediated by a mobile genetic element.

The avium-silvaticum genomotype isolates could be further separated into two subgroups based on their genomic hybridization profiles. One group included the isolate 5001 and the isolates originally identified as MAS, while the other consisted of the MAA and strain 18 isolates, which the exception of isolate 6071. While both groups of isolates displayed very similar hybridization profiles, they could be distinguished by the polymorphic regions MAP1424c-1464 and MAH2829-2830. The consistent genomic profile amongst the strain 18 isolates obtained from independent laboratories suggests that these isolates have not experienced significant genomic polymorphisms despite several decades of laboratory use. Additionally, these results indicate that the microarray platform used in this study provides reproducible results when comparing isolates.

Hominissuis genomotype

Three of the MAP isolates examined (7265, 7301, and 9034) had hybridization profiles similar to MAH 104 and are collectively referred to as the hominissuis genomotype. Also included with these isolates were two mycobacterial isolates (5835cc and 5836cc) from a duck, which were distinguished from each other upon culture by pigmentation. This group is characterized by the absence of several genes and LSPs that are present in MAP, as well as the presence of additional LSPs not found in MAP. Many of these LSPs are shared with the MAS and MAA isolates examined in this study. The sequenced MAP K-10 and MAH 104 genomes have been previously compared with both bioinformatic and experimental approaches, including DNA microarrays [4, 2528]. The LSP that includes ORFs MAP1230-1237c appears to be a distinguishing factor of the hominissuis genotype as these ORFs were absent from this genomotype but present in all of the other isolates examined in this study (Table 1). This LSP corresponds to the region previously identified by Tizard and coworkers as a low-GC genetic island present only in MAP and MAS [29]. Isolates 5835cc and 5836cc were distinguished from the other hominissuis genomotype isolates by several polymorphic regions. MAP0072c-0076 is absent from both isolates and includes several genes that are predicted to encode membrane proteins, while the deletion of MAP2523c-2532 includes several enzymes involved in energy metabolism (Figure 2). The hybridization profiles of the remaining hominissuis genomotype isolates closely mirrored MAH 104.

Cattle genomotype

The remaining MAP isolates examined in this study displayed hybridization profiles similar to the sequenced cattle isolate MAP K-10 and are collectively referred to as the cattle genomotype. The distinguishing feature of this genomotype is the large variety of host species represented. This group includes isolates obtained from cattle (n = 5), birds (n = 4), goats (n = 3), bison (n = 3), humans (n = 9), a cat, and an armadillo (Figure 2). Relative to the MAP K-10 genome, no LSPs were observed in any of the cattle genomotype isolates.


The use of specific genetic markers has allowed MAP isolates to be separated into two general populations: a relatively homogenous group comprised of primarily bovine isolates and a more heterogeneous group that includes isolates from small ruminants and other mammals. The goal of the present study was to identify variations in total genome content between M. avium subspecies isolated from several different host animals in order to determine which genes may contribute to host specificity and pathogenesis. The isolates obtained from goat (n = 3), bison (n = 3), bird (n = 4), armadillo (n = 1), cat (n = 1), and human (n = 10) hosts did not contain any large polymorphic regions when compared with the MAP K-10 cattle isolate. Five of the seven cattle isolates examined similarly did not contain any large polymorphisms. The remaining isolates (n = 7) grouped into three distinct genomotypes. Four of the six sheep isolates examined (397, 467, 7565 and 9040) shared large polymorphisms that included the absence of ORFs present in the MAP K-10 genome as well as the presence of ORFs that are also found in the MAH 104 genome. The ORFs included in the missing regions (n = 32) are predicted to encode proteins involved in a variety of functions including lipid and energy metabolism, virulence, and transcriptional regulation. The regions containing homologues to ORFs in the MAH 104 genome (n = 73) encoded proteins involved in glycopeptidolipid biosynthesis, transcriptional regulation, virulence, and metabolism. These polymorphic regions also contained a number of proteins with unknown function. The remaining sheep (5001 and 9034) and cattle (7265 and 7301) MAP isolates had genomic profiles similar to MAH or MAS.

The identification of a glycopeptidolipid biosynthesis operon in sheep isolates of MAP raises several interesting possibilities. Considering the absence of glycopeptidolipids in cattle isolates of MAP, it may contribute to the phenotypic differences observed between the two isolate types, although the production of glycopeptidolipids by sheep MAP isolates has not yet been experimentally verified. Glycopeptidolipids from MAA have been identified as Toll-like receptor 2 agonists that are capable of activating macrophages [30]. Sequencing of this region indicates that the sheep isolate glycopeptidolipid biosynthesis cluster is most similar to a serovar 2 isolate of MAA (ATCC 25291) rather than the sequenced serovar 1 isolate MAH 104. ATCC 25291 was examined as part of this study, however, since no sequence data from this isolate was present on the microarray, the presence or absence of these genes was not detected by hybridization. These results suggest that additional insights into the pathogenicity and evolutionary history of MAP sheep isolates may be gained by a closer examination of MAA serovar 2 genomes.

This study provides some of the first molecular evidence that distinguishes MAS from other M. avium subspecies. The MAS and MAA isolates examined in this study were observed to have very similar genomic hybridization profiles, but could be distinguished by several large sequence polymorphisms. Specifically, the silvaticum and avium subspecies could generally be separated from each other by the presence or absence of the polymorphic regions MAP1424c-1465 and MAH2829-2830. Isolate 5001 was grouped together with the classical MAS wood pigeon isolates. This isolate was originally isolated from a sheep that appeared to be suffering from Johne's Disease. It is noteworthy that in a previous study by Burrells and coworkers MAS was also cultured from sheep that were diagnosed with Johne's Disease [31]. The silvaticum genomotype isolates share some distinguishing genome features with the MAP sheep isolates, including all or part of three LSPs. It is unclear whether the recovery of MAS isolates from diseased animals is due to the increased susceptibility of sheep to MAS infections or if the inherent difficulties of culturing sheep MAP isolates in vitro results in the isolation of secondary opportunistic pathogens. Regardless, the association of MAS with infections resembling Johne's Disease warrants additional study.

The LSPs observed in the mycobacterial isolates were often closely associated with mobile genetic elements and a reduction in the proportion of GC-base pairs relative to the MAP genome average of 69.3% (Figure 4). A majority of the LSPs contain or are flanked by insertion sequences, phage remnants, or REP-family elements, which themselves have been predicted to be sites of phage integration. Considering the paucity of recombination observed amongst mycobacteria, it is not surprising that much of the sequence diversity amongst this closely related group of mycobacteria appears to be localized to loci that are highly mobile and hotspots for recombination.

Figure 4
figure 4

Location of polymorphic regions mapped to the MAP genome. The outermost ring is the nucleotide position from the origin of replication, while the green ring is the percentage of GC nucleotides. The innermost rings represent the ORFs that were identified as variably present (log ratio < -1.2) in at least one of the mycobacterial genomes (blue) as well as ORFs conserved in all of the isolates examined (black). ORFs that are annotated as mobile genetic elements are colored red. The numbers adjacent to polymorphic sequences correspond to the LSP numbers listed in Table 2. The circular genome MAP was generated with the software program Microbial Genome Viewer [50].

It is notable that two of the LSPs include mce gene clusters. The mce genes were originally described in M. tuberculosis as facilitating mycobacterial cell entry [32] and subsequent genome sequencing identified four mce genes clusters in the M. tuberculosis genome (mce1, mce2, mce3, and mce4) [33]. Deletion of the mce gene clusters in M. tuberculosis resulted in the attenuation of virulence in mouse models [34, 35]. Recent work has suggested that the mce gene clusters may function as ATP-binding cassette transport systems [36] with cholesterol subsequently identified as one potential substrate [37]. A phylogenetic analysis of the mce-like genes from a variety of microbes further supports the predicted function of these gene clusters as ABC type transporters [38]. Analysis of the MAP genome sequence has identified eight mce gene clusters [39]. The two mce gene clusters identified as polymorphic in the isolates examined in this study (MAP0757-0765 and MAP2189-2194) were annotated by Casali and Riley as mce5 and are predicted to have arisen from a recent gene duplication event [38]. The MAP2189-2194 mce5 cluster was absent from most hominissuis and avium-silvaticum genomotype isolates, while the MAP0757-0765 cluster was only absent from MAS isolates 6007 and 6049.

While the LSPs that distinguish cattle and sheep MAP isolates represent a potential source of species specificity, the lack of LSPs amongst the cattle genomotype isolates suggests that other mechanisms may also have an influence. Alternatively, the variety of species from which the cattle genomotype isolates were recovered may represent transient or "pass through" infections by the more environmentally prevalent cattle type isolates. From a human health perspective, all of the human MAP isolates examined in this study appear to be highly similar to the prototypical bovine isolates. While the source and route of MAP infections in humans remains controversial, it appears that both cattle and humans are susceptible to infection by MAP isolates with similar genotypes. The diversity of SSR genotypes that displayed similar hybridization patterns suggests that additional insights into host specificity may be gained by future studies designed to examine small nucleotide changes.

The microarrays used in this study are not sensitive enough to detect small genetic changes such as single nucleotide polymorphisms (SNPs). Alternate approaches such as directed sequencing are required to elucidate these types of genetic differences which may also result in significant biological effects. Additionally, only sequences present on the microarray can be detected via hybridization, thus novel sequences that may be present in other mycobacterial genomes will not be observed. This limitation was partially addressed by sequencing portions of the MAP sheep isolate 397 genome. Genome sequencing and subtractive genomic approaches are warranted in future studies to identify additional sequences not represented on the MAP microarray used in this study.


This study has identified several polymorphic regions within the genomes of M. avium subspecies obtained from a variety of host animals. Many of the subspecies are distinguished from each other by both host specificity as well as large sequence polymorphisms, which suggests that genes encoded within these regions influence the host range of the mycobacteria. The lack of genetic diversity and widespread distribution of cattle genomotype isolates of MAP suggest that this subspecies is either capable of infecting a variety of host species or alternatively is widespread in the environment and therefore comes into transient contact with a large number of hosts. Future experiments will be designed to elucidate the functions of the genes contained within the large sequence polymorphisms. Insights into the variable pathogenicity and host specificity of the closely related M. avium subspecies are likely contained within these regions.


SSR Genotyping

PCR amplification of two of the most discriminatory short sequence repeat (SSR) loci – mononucleotide G and tri-nucleotide GGT repeats [40] was carried out as described (2, 30) using primer sets (i) 5'-TCA GAC TGT GCG GTA TGG AA-3' and 5'-GTG TTC GGC AAA GTC GTT GT-3', and (ii) 5'-AGA TGT CGA CCA TCC TGA CC-3' and 5'-AAG TAG GCG TAA CCC CGT TC-3', respectively. The PCR products were purified with a QIAquick PCR purification kit (Qiagen Inc., Valencia, CA) and sequenced by using standard dye terminator chemistry, and the sequences were analyzed on an automated DNA sequencer (3700 DNA Analyzer, Applied Biosystems, Foster City, CA). The alleles were assigned a number congruent to the number of G and GGT residues.

Polymerase chain reaction

All primers were designed with Primer3 software [41]. PCR reactions used genomic DNA as a starting template and used standard conditions reported in detail elsewhere [4].

Microarray Design

The software program ArrayOligoSelector [42] was used to identify 70 mer oligonucleotides specific for every predicted open reading frame (ORF) in the MAP K-10 genome [39]. One 70 mer was designed for each MAP K-10 ORF with a total length of less than 4000 bp, while longer ORFs were split in half and one 70 mer was designed for each half. One 70 mer was also designed for every MAP intergenic region greater than 500 bp. Additionally, an automated annotation of the unannotated MAH 104 genome sequence (downloaded on 03/25/05) was performed using the methods described by McHardy and coworkers [43] and one 70 mer was designed for every predicted ORF that was less then 30% identical to MAP K-10 sequences as determined by BLAST analysis [44]. The annotated MAH 104 ORFs were numbered sequentially starting at the origin of replication, although for reference the nucleotide start and stop positions for each ORF or cluster are also reported. The arbitrarily assigned ORF designations were subsequently matched to the ORF designations used when the completed MAH 104 genome sequence was deposited in Genbank (accession number NC_008595) [see Additional file 1]. A detailed description of this microarray platform has been deposited in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus under the accession number GPL3433.

Bacterial growth and DNA extraction

Mycobacteria were cultured and DNA extracted as previously described [45] or as follows. Bacteria were grown in Middlebrook 7H9 broth (pH 6.0) supplemented with oleic acid-albumin-dextrose-catalase (Becton Dickinson Microbiology, Sparks, MD), and 0.05% Tween 80. Cultures of MAP were further supplemented with ferric mycobactin J (2 mg/liter; Allied Monitor Inc., Fayette, MO). Genomic DNA was extracted from MAP isolates (Table 4) with Genomic-tip 100/G anion-exchange columns (QIAGEN, Valencia, CA) as previously described [46] with the following modification: D-cycloserine was not added as part of the extraction procedure. For the purposes of this study, Mycobacterium avium subsp. hominissuis is used to designate Mycobacterium avium isolates cultured from non-avian sources, while Mycobacterium avium subsp. avium is used for isolates originating from birds [8].

Table 4 Mycobacterial isolates used in this study.

Microarray hybridization and data analysis

Microarray hybridization and preliminary data filtering was performed as described in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database entries (see below). The software program Cluster 3.0 [47] was used to filter out ORFs that were lacking values for greater than 80% of the MAP isolates examined and median center the results for each isolate. Hierarchical clustering of the results was performed and visualized with Cluster 3.0 and Java TreeView [48] or MultiExperiment Viewer [49]. Putative large sequence polymorphisms (LSPs) were identified by a visual examination of the hybridization results and selected LSPs were subsequently confirmed by genome sequencing and/or PCR. All microarray data has been deposited in the NCBI Gene Expression Omnibus database as series number GSE7622 which includes sample accession numbers GSM172202 – GSM172344.

Genome sequencing

Purified genomic DNA from MAP isolate 397 was prepared as described above for microarray hybridizations. Initial shotgun genome sequencing was performed by 454 Life Sciences (Branford, CT). Additional sequences were generated by directly sequencing PCR products and genomic DNA using an ABI3100 capillary sequencer (Applied Biosystems). Sequence data was generated as part of a genome sequencing project and will be published upon completion (M. Paustian, unpublished data).