Background

Huanglongbing (HLB) is a destructive disease of citrus worldwide. All known commercial citrus cultivars are susceptible to HLB. The disease was first noted in the Chaoshan area of Guangdong Province in the People's Republic of China in the late 1800s [1] and is currently distributed in 10 citrus-producing provinces in South China. HLB is now established in Sao Paulo, Brazil [2] and Florida, United States [3], where it poses a great threat to the citrus industry. The disease is associated with three species of non-culturable, phloem-limited α-Proteobacteria: 'Candidatus Liberibacter asiaticus', 'Ca. L. africanus', and 'Ca. L. americanus' [4, 5]. In both China and the U.S., only 'Ca. L. asiaticus' has been detected. Because no pure culture is available, 'Ca. L. asiaticus' has been poorly characterized, and little is known about its biology, genetic diversity, and epidemiology.

Sequence analyses of conserved genomic loci such as the 16S rRNA gene and the 16S/23S intergenic spacer region have been used to define 'Ca. Liberibacter' species [4, 6]. However, more variable genomic loci need to be identified to better characterize the bacterium. Before the whole genome sequence became available, Bastianel et al. [7] used an outer membrane protein gene (omp) to differentiate isolates/strains of 'Ca. L. asiaticus' from different geographical origins, although each region was represented by only one to three strains. Tomimura et al. [8] analyzed the single nucleotide polymorphisms (SNPs) in a bacteriophage-type DNA polymerase gene and revealed three clusters of 'Ca. L. asiaticus' strains from Southeast Asia. All Indonesian strains clustered in one group, and the other two clusters were not correlated with geographical origin (Vietnam, Thailand, Taiwan, and Japan).

The complete genome sequence of 'Ca. L. asiaticus' strain Psy62 is now available [9]. The annotated genome has 1,109 protein and 53 RNA coding loci and is readily accessible for genomic analyses. Based on the variation of tandem repeat number (TRN) at the locus of CLIBASIA_01645, the population of 'Ca. L. asiaticus' strains in Guangdong, China was found to differ from that in Florida, U.S. [10]. This analysis of TRN also detected the possible presence of two genotypes in Florida: a TRN < 10 genotype that was widely distributed statewide and a TRN > 10 genotype that was limited to central Florida. In Guangdong, TRN variations were more heterogeneous and correlations with geographical origins were not established. A recent report that used four tandem repeat loci to analyze 'Ca. L. asiaticus' strains from Japan, Taiwan and Indonesia revealed various levels of population diversity, yet correlation with other genotypes or geographical origins was not known [11]. More recently, a prophage terminase gene (CLIBASIA_05610) was used to evaluate the population diversity of 'Ca. L. asiaticus' in two geographically distinct citrus-growing provinces (Yunnan and Guangdong) in China [12]. The 'Ca. L. asiaticus' populations in these two locations differ significantly in their prophage terminase gene frequencies. In other bacteria, such as Escherichia coli, Haemophilus influenzae and Xylella fastidiosa, genomic loci with variable TRN or prophage genes are also known to be valuable descriptors of bacterial genetic diversity [13-17].

This study aimed to further explore the use of available genomic information for 'Ca. L. asiaticus' characterization. We report our observation of DNA mosaicism, or hyper-sequence variation, at the locus of CLIBASIA_05650 and the downstream intergenic region in the genome of 'Ca. L. asiaticus'. PCR analyses using a primer set flanking this genomic locus revealed eight electrophoretic types (E-types) among 'Ca. L. asiaticus' strains from China and the U.S. Analyses of the DNA mosaic phenomenon depicted the inter- and intra-continent diversity of 'Ca. L. asiaticus'. The molecular nature of the DNA mosaicism was identified through sequence analyses.

Methods

Sample collection

HLB symptomatic citrus leaves were collected from nine provinces in China (Figure 1, Table 1) and from Florida in the U.S. between 2007 and 2010. Each sample originated from a single tree and was tentatively considered a single strain. All samples collected in China were shipped by mail to the Citrus Research Institute of Southwest University in Chongqing or to the Citrus HLB research laboratory of South China Agricultural University in Guangdong. Collection of HLB samples in Florida has been described previously [10].

Figure 1

A map of China showing the geographical locations (solid and open triangles), with altitudes, where citrus Huanglongbing (HLB) samples were collected. The dashed oval indicates a high altitude region and the solid oval indicates a low altitude region.

Table 1 Distributions and frequencies of 'Candidatus Liberibacter asiaticus' electrophoretic types (E-types) at different locations in China and U.S.

DNA extraction

In Chongqing, midribs of citrus leaves were excised and DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method as previously described [18]. Procedures for DNA extraction in Guangdong and Florida were described previously [10]. 'Ca. L. asiaticus' was identified by PCR with primer sets OI1/OI2c [4] and ITSAf/ITSAr [19]. DNA preparations were sent to the San Joaquin Valley Agricultural Sciences Center, United States Department of Agriculture-Agricultural Research Service, Parlier, CA, U.S.A. for further analyses.

Primers and PCR assays

The whole genome sequence of 'Ca. L. asiaticus' strain Psy62 (accession number CP001677) was obtained from the NCBI GenBank database. Fifteen primer sets, which targeted genomic loci with tandem repeats and prophage genes, were designed with Primer 3 software [20] by setting the Tm at 60°C and the amplicon size at around 800 bp. Tandem repeat loci were identified using Tandem Repeat Finder (version 4.03) with default parameters [21]. Of the 45 tandem repeat loci, eight loci with 97-100% matches of each repeat were used in the study. Seven prophage loci were selected directly from the annotated 'Ca. L. asiaticus' strain Psy62 genome. DNA from a set of 10 'Ca. L. asiaticus' strains (5 from China and 5 from Florida) was used to test the capacity of each primer set to detect strain diversity. Primer set Lap5640f/Lap5650r, flanking the chromosomal region from CLIBASIA_05640 to CLIBASIA_05650, was selected for further analysis because it generated different electrophoretic profiles from different strains. Primer specificity to 'Ca. L. asiaticus' was verified in silico by BLASTn search against the GenBank database. Primer set LapGP-1f/LapGP-1r, targeting a tandem repeat locus of CLIBASIA_01645 [10], was also included in this study for comparison. All primer sets used in the study are listed in Table 2 and Additional file 1.
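The idea of predicting an amplicon size from a primer pair and a reference sequence can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the template and primer strings are invented toy data (the real primers are listed in Table 2), the function name `predict_amplicon_size` is hypothetical, and only exact matches on a linear template are considered.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def predict_amplicon_size(template: str, forward: str, reverse: str):
    """Predict the amplicon length for an exact-match primer pair.

    Returns None if either primer site is missing or the reverse primer
    site does not lie downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement appears on the given strand, downstream of the forward primer.
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, start + len(forward))
    if rev_start == -1:
        return None
    return rev_start + len(rev_site) - start


# Toy data (hypothetical, for illustration only).
toy_template = "GGGG" + "ATGCCGTA" + "A" * 20 + "CCTGAAGT" + "TTTT"
toy_forward = "ATGCCGTA"
toy_reverse = reverse_complement("CCTGAAGT")  # primer written 5'->3'

size = predict_amplicon_size(toy_template, toy_forward, toy_reverse)
print(size)
```

Applied to a real reference sequence and the actual primer pair, the same logic would give the single predicted product size; mismatches, which a BLASTn-based check tolerates, are deliberately ignored here.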

Table 2 List of primers and their related properties used in this study

Analyses of different 'Ca. L. asiaticus' populations

Although a single amplicon of 797 bp from primer set Lap5640f/Lap5650r was predicted based on the available genome sequence of strain Psy62 [9], multiple amplicons were observed from other 'Ca. L. asiaticus' strains from China and Florida. Amplicon profiles on agarose gels were designated as electrophoretic types, or E-types. E-type frequencies were summarized and a Chi-square test was used to determine the significance of E-type differences among geographical locations.

DNA sequencing and analysis

DNA bands were excised from the gel and purified using the QIAquick Gel Extraction kit (Qiagen, Valencia, CA). Purified DNAs were cloned with the pGEM-T Easy vector (Promega Corp., Fitchburg, WI) and sequenced using the BigDye Terminator v3.1 Cycle Sequencing Kit in a 3130xl Genetic Analyzer (Applied Biosystems, Inc.). Multiple sequence alignments were performed using the ClustalW (Ver. 1.74) program with the default parameters [22]. Manual adjustment was performed when appropriate. Protein secondary structure prediction was performed by the method of Bryson et al. [23] available on the PSIPRED server http://bioinf.cs.ucl.ac.uk/psipred/. The protein 3-D structure model was built based on a fold prediction protocol with the help of Phyre [24].

Nucleotide sequence accession numbers

Nine DNA sequences of 'Ca. L. asiaticus' representing different amplicon sizes and collection origins have been deposited in GenBank with accession numbers JF412691 to JF412699 (Additional file 2).

Results

Detection of DNA mosaicisms by primer set Lap5640f/Lap5650r

A total of 262 HLB samples that tested positive with primer sets OI1/OI2c [4] and ITSAf/ITSAr [19] were analyzed. Among them, 188 samples were from nine provinces in China and 74 samples were from Florida (Table 1). The HLB samples from China came from both high altitude region (HAR) and low altitude region (LAR) locations (Figure 1). PCR amplification with primer set Lap5640f/Lap5650r produced eight E-types, designated E-types A to H. Each E-type was composed of one or more of five DNA amplicons, designated P1 to P5 (Figure 2). DNA polymorphisms were not detected with the other 14 primer sets listed in Additional file 1 (data not shown); that is, each of the 14 primer sets generated a single amplicon.

Figure 2

Electrophoretic profiles (E-types) of representative 'Candidatus Liberibacter asiaticus' strains from PCR amplification with primer set Lap5640f/Lap5650r. Lane M on the left contains molecular markers. Amplicons of unique size are labeled by numbers and designated P1 to P5, with sequence lengths indicated on the right.

The 797 bp amplicon calculated from the genome of 'Ca. L. asiaticus' strain Psy62 placed the strain in E-type C (Figure 2, Table 1). Surprisingly, E-type C was found in only 3 of the 74 Florida HLB samples (4.1%). The other E-types detected in Florida were A, G, and H. E-type G was predominant (82.4%), followed by E-type A (9.5%) and E-type H (4.1%) (Table 1). Six E-types (A, B, C, D, E, and F) were found in the 188 samples from China (Figure 2, Table 1). E-type A was the most frequent (71.3%), followed by E-type B (19.7%). When geographical origins were considered, E-type A was mostly from LAR locations and E-type B was mostly from HAR locations. As in Florida, E-type C (the type of the Florida strain Psy62) was rare in China: only 11 samples (5.9%) belonged to it, and they were all from HAR locations (Table 1).

To avoid small expected values in the Chi-square test, data in Table 1 were regrouped into four categories for location comparisons: E-type A, E-type B, E-type G and other E-types. The results showed that the E-type distribution of the 'Ca. L. asiaticus' population in China was significantly different from that in Florida (P = 1.12 × 10⁻⁴⁴). Within the samples from China, the E-type distribution in the LAR population was significantly different from that in the HAR population (P = 1.59 × 10⁻²²).
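The China versus Florida comparison can be sketched in a few lines of Python. The counts below are not copied from Table 1; they are reconstructed from the percentages and totals quoted in the text (for example, 71.3% of 188 gives 134 E-type A samples in China, and seven E-type A strains are reported for Florida), so they should be treated as an assumption. The closed-form tail probability used here holds only for three degrees of freedom, which is what a 2 × 4 table gives.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Columns: E-type A, E-type B, E-type G, other E-types.
# Counts reconstructed from the text (assumption, see above).
china = [134, 37, 0, 17]   # 188 samples
florida = [7, 0, 61, 6]    # 74 samples

stat, dof = chi_square_statistic([china, florida])
p_value = chi2_sf_3dof(stat)
print(f"chi-square = {stat:.1f}, d.f. = {dof}, P = {p_value:.2e}")
```

With these reconstructed counts the statistic is a little over 200 and P is on the order of 10⁻⁴⁴, in line with the reported value. The LAR versus HAR comparison would use the same function with the per-region counts from Table 1.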

Correlation between E-types and TRN genotypes

To evaluate the correlation between E-types and TRN genotypes, all 74 'Ca. L. asiaticus' strains from Florida (Table 1) were also tested for TRN variation with primer set LapGP-1f/LapGP-1r [10]. All seven E-type A strains belonged to the TRN > 10 genotype, whereas the strains of the other three E-types grouped with the TRN < 10 genotype. Therefore, the Florida strains could be divided into E-type A and non-E-type A groups, matching the TRN > 10 and TRN < 10 genotypes, respectively. This supports the previous observation that there are at least two groups of 'Ca. L. asiaticus' strains in Florida. No significant correlation between E-type and TRN genotype was found after testing all 'Ca. L. asiaticus' strains from Yunnan, Guangxi, and Guangdong provinces (data not shown).

Sequence analyses of five amplicons from primer set Lap5640f/Lap5650r

The five amplicons (P1, P2, P3, P4, and P5) from primer set Lap5640f/Lap5650r were sequenced and found to be 797, 869, 906, 1071, and 1143 bp long, respectively (Figure 2). The size of each amplicon was confirmed by sequencing three to five additional 'Ca. L. asiaticus' strains. Alignment data showed that the five DNA sequences shared a common backbone, P1, with P2, P3, P4 and P5 derived from insertion events at nucleotide positions 574 and 722 (Figure 3). P2 (869 bp) had a 72-bp direct repeat at position 574 inside open reading frame (ORF) CLIBASIA_05650. P3 (906 bp) had an insertion of a 109-bp fragment at position 722 within the annotated intergenic region. Like P3, P4 (1,071 bp) had an insertion at position 722, but with a fragment size of 274 bp. P5 had both the P2-type and P4-type insertions. BLASTn searches using the five amplicon sequences (P1 to P5) showed that only P1 and P5 were nearly identical to bacterial sequences currently deposited in the GenBank database. The P1 sequence was identical to that in strain Psy62 [9]. P5 was over 99% similar to sequences of 'Ca. L. asiaticus' strain UF506 (HQ377374.1), Liberibacter phage SC1 (HQ377372.1), and Liberibacter phage SC2 (HQ377373.1) [25]. The insertion/deletion sequences made P2, P3 and P4 unique compared with the 'Ca. L. asiaticus' sequences available in GenBank.
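The size relationships among the five amplicons follow directly from the backbone length and the insertion lengths given above. A minimal sketch of that arithmetic (the dictionary layout and variable names are illustrative and not part of the original analysis):

```python
BACKBONE_BP = 797  # P1, identical to strain Psy62

# Insertion lengths (bp) keyed by insertion site on the P1 backbone.
INSERTIONS = {
    "P1": {},
    "P2": {574: 72},            # 72-bp direct repeat inside CLIBASIA_05650
    "P3": {722: 109},           # 109-bp insert in the intergenic region
    "P4": {722: 274},           # 274-bp insert at the same site
    "P5": {574: 72, 722: 274},  # both the P2-type and P4-type insertions
}

amplicon_sizes = {
    name: BACKBONE_BP + sum(inserts.values())
    for name, inserts in INSERTIONS.items()
}

for name, size in amplicon_sizes.items():
    print(name, size)
```

The computed sizes agree with the sequenced lengths of 797, 869, 906, 1071, and 1143 bp.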

Figure 3

Sequence comparison of five types of PCR amplicons (P1-P5) derived from primer set Lap5640f/Lap5650r. The annotation of 'Candidatus Liberibacter asiaticus' strain Psy62 is used as a reference and shown in the first row, where primer set Lap5640f/Lap5650r flanks a region of 797 bp. Open reading frames CLIBASIA_05640, CLIBASIA_05645 and CLIBASIA_05655 encode hypothetical proteins. CLIBASIA_05650 encodes a phage-associated protein. Nucleotide positions 574 and 722 are marked as insertion/deletion sites.

In silico analyses of CLIBASIA_05650 alleles

ORF CLIBASIA_05650 was annotated as interrupted gp229, a phage-associated protein [9]. A 72-bp (24 amino acids) insertion, as shown in P2 and P5, which occurred in E-types F, G, and H (Figure 3), created an in-frame mutation. Close examination showed that CLIBASIA_05650 was mostly composed of imperfect tandem repeats of six amino acids (18 bp), each led by residue V (Figure 4). Such hexapeptide domains are common to many bacterial transferases represented by LpxA-like enzymes. Secondary and tertiary (3-D) structures were predicted from the translated amino acid sequences (Figure 4). The 24-amino acid insertion apparently shortened many of the beta-sheets (Figure 4A) and added a structural motif (Figure 4B), along with an increase in prediction stability for both the secondary and tertiary structures. Interestingly, of the 66 strains that had P2 and P5 amplicons, 64 (97.0%) were collected from Florida, U.S., and only 2 (3.0%) were from Guangdong, China (Table 1).
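The hexapeptide periodicity can be checked with a simple frame scan. The sketch below is illustrative only: the toy sequence is invented and is not the CLIBASIA_05650 translation, and the function name is hypothetical. It splits a protein sequence into consecutive six-residue units and counts how many are led by valine.

```python
def v_led_hexapeptides(protein: str, frame: int = 0):
    """Split a protein into consecutive 6-residue units starting at `frame`
    and return (total units, units whose first residue is V)."""
    seq = protein.upper()[frame:]
    usable = len(seq) - len(seq) % 6  # ignore a trailing partial unit
    units = [seq[i:i + 6] for i in range(0, usable, 6)]
    v_led = sum(1 for unit in units if unit.startswith("V"))
    return len(units), v_led


# Invented toy sequence: four hexapeptides, three of them led by V.
toy = "VGDNVR" + "VGAHAT" + "IGSNCT" + "VGENAV"
total_units, v_units = v_led_hexapeptides(toy)
print(total_units, v_units)

# A 24-amino-acid in-frame insertion adds 24 / 6 = 4 hexapeptide units.
units_added_by_insert = 24 // 6
```

Four additional units are consistent with the 18 versus 22 hexapeptides noted in the Discussion for the alleles without and with the insertion.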

Figure 4

Predictions of secondary and tertiary (3-D) structures of CLIBASIA_05650 by PSIPRED and Phyre servers. Panel A (top): CLIBASIA_05650 allele with a 24-amino acid sequence insert. Six motifs are shown in tertiary structure. The 24-amino acid repeat unit is underlined in red and the second 24-amino acid sequence insert is underlined in green. Panel B (bottom): CLIBASIA_05650 allele without a 24-amino acid sequence insert. Five motifs are shown with the tertiary structure. The potential 24-amino acid repeat unit is underlined in black. In both A and B, the first amino acid of a hexapeptide unit, V, is highlighted in red. Confidence of prediction is presented in bar graph (1-9) in the secondary structure and in P-value in the tertiary structure.

Discussion

In this study, primer set Lap5640f/Lap5650r yielded one to three amplicons for a given HLB sample. A total of five amplicons with different sizes were identified. They are related by insertion/deletion events, demonstrating mosaicism in the population genome of 'Ca. L. asiaticus'. In other words, at the locus of CLIBASIA_05640-CLIBASIA_05650, 'Ca. L. asiaticus' possesses alleles composed of sequences that are identical in some parts but polymorphic in others. The DNA mosaicism described in this study is inferred largely from size variation among PCR amplicons and was confirmed by sequencing a limited number of strains. Deng et al. [19] showed the co-amplification of different amplicons from primer sets targeting the rrn locus in the chromosome of 'Ca. L. asiaticus'. However, further sequencing investigation was not reported.

As shown in Figure 2, the mosaicism of E-types B, D, E, G and H is represented by multiple DNA bands from the same PCR primer set, raising the question of whether an HLB sample contains a single clone or multiple clones (or clonal strains) of 'Ca. L. asiaticus'. This is of particular interest because the 'Ca. L. asiaticus' DNA was not obtained from a clonal pure culture. Further complicating the issue is the variation in amplicon intensity, which suggests different concentrations of PCR templates. Under a single-clone scenario, the bacterium should have multiple Lap5640f/Lap5650r loci, in the chromosome, in the form of a phage, or both. A lytic phage possessing this genomic locus has recently been reported [25]. Alternatively, the HLB samples may contain multiple clones of 'Ca. L. asiaticus'; more evidence is, however, needed. A third scenario is a combination of the two.

Since the sequenced Florida strain Psy62 belongs to E-type C (Table 1, Figure 2), it is interesting that the frequency of E-type C is low in Florida (4.1%) as well as in China (5.9%). This could mean that strain Psy62 is not the most representative strain. We noted that Psy62 originated from a psyllid, whereas all the 'Ca. L. asiaticus' samples in this study were from citrus. Could the bacterial populations differ between psyllids and plant hosts? Zhang et al. [25] recently reported that phages behaved differently in plants and psyllids in Florida: phages SC1 and SC2 were lytic in dodder plants but remained lysogenic in psyllids.

Among the six E-types in China, five were found in Yunnan and two in Guangdong (Table 1). The higher number of E-types suggests that the 'Ca. L. asiaticus' population in Yunnan could be more diverse than that in Guangdong. The uniqueness of P3 (E-types D and E) to Yunnan samples further supports this speculation. It should be noted that Yunnan is one of the world's centers of origin of citrus species [26]. It remains to be tested whether a long history of citrus presence is associated with greater diversity of the 'Ca. L. asiaticus' population. Information about the population diversity of 'Ca. L. asiaticus' in Yunnan is currently very limited.

The challenge of in vitro culture of 'Ca. L. asiaticus' has been a critical factor limiting our capacity to study the biology of the bacterium. DNA sequencing and in silico analyses provide a different avenue for collecting information about unculturable bacteria. Regarding CLIBASIA_05650, the P1/P3/P4 alleles, which encode 18 hexapeptides, occurred predominantly in 'Ca. L. asiaticus' populations in China, whereas the P2/P5 alleles, which encode 22 hexapeptides, were distributed mostly in Florida populations. Hexapeptide variation has been reported in other bacteria [27]. This type of genetic heterogeneity may be associated with phenotypic variation for environmental adaptation [17, 28].

Conclusions

This study described and analyzed a DNA mosaic phenomenon in the unculturable 'Ca. L. asiaticus' associated with citrus HLB. In addition to the previous studies on two different genomic loci [10, 12], we identified a new genomic locus that generated single to multiple amplicons from different HLB samples. Analyses of the DNA mosaicism revealed significant inter- and intra-population variations of 'Ca. L. asiaticus' from South China and Florida. Further investigation showed that insertion/deletion events contributed to the DNA mosaicism.