Status of the genome

Ploidy and genome size

Diploidy is the general rule in Citrus and related genera of Aurantioideae with a basic chromosome number x = 9 (Krug 1943). However, some euploid genotypes have been found in the citrus germplasm. The most common euploid variations are triploids and tetraploids (Lee 1988). Longley (1925) was the first to identify a tetraploid wild form, the ‘Hong Kong’ kumquat (Fortunella hindsii Swing.). Triploid ‘Tahiti’ lime (Citrus latifolia Tan.), tetraploid strains of Poncirus trifoliata (L.) Raf., allotetraploid Clausena excavata Burm. F., tetraploid Clausena harmandiana Pierre (Guill) and hexaploid Glycosmis pentaphylla Retz. (Corrêa) are other examples of some natural polyploids found in the germplasm of Aurantioideae.

Citrus species have small genomes. The size of the Citrus sinensis haploid genome is estimated to be 372 Mb (Arumuganathan and Earle 1991). In an earlier study that estimated citrus genome size by flow cytometry for at least four genotypes per species, Ollitrault et al. (1994) found significant genome size variation between citrus species. The largest and smallest genomes were, respectively, those of Citrus medica L. (the citron, with an average value of 398 Mb/haploid genome) and Citrus reticulata Blanco (the mandarin, with an average value of 360 Mb/haploid genome). Citrus maxima (Burm.) Merrill, the pummelo, was intermediate with an average of 383 Mb/haploid genome. Interestingly, the secondary species presented intermediate values between their putative ancestral parental taxa: 370, 368, 381 and 380 Mb for the C. sinensis (L.) Osb., Citrus aurantium (L.), Citrus paradisi Macf. and Citrus limon (L.) Burm. haploid genomes, respectively.
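The statement that the secondary species are intermediate can be checked with simple arithmetic. The sketch below uses only the average values quoted above; the helper name `within_ancestral_range` is introduced here for illustration, and the check deliberately avoids assuming any particular parentage for each secondary species.

```python
# Average haploid genome sizes (Mb) as quoted in the text from Ollitrault et al. (1994).
ANCESTRAL_MB = {"C. medica": 398, "C. reticulata": 360, "C. maxima": 383}
SECONDARY_MB = {
    "C. sinensis": 370,
    "C. aurantium": 368,
    "C. paradisi": 381,
    "C. limon": 380,
}


def within_ancestral_range(size_mb, ancestral=ANCESTRAL_MB):
    """Return True if size_mb lies between the smallest and largest ancestral genome."""
    return min(ancestral.values()) <= size_mb <= max(ancestral.values())


# Hypothetical helper output: one boolean per secondary species.
intermediate = {sp: within_ancestral_range(mb) for sp, mb in SECONDARY_MB.items()}
for species, ok in intermediate.items():
    print(f"{species}: {SECONDARY_MB[species]} Mb, within ancestral range: {ok}")
```

All four secondary species fall inside the 360 to 398 Mb span of the three ancestral taxa, which is consistent with the observation in the text but does not by itself identify the parents of each species.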

Sequenced haploid and diploid genomes

The International Citrus Genome Consortium (ICGC) was organized in 2003, and a series of operating principles and goals was established at that time. The ICGC recognized that the task of developing useful genomic resources for citrus genetics and breeding research exceeded the capacity of any of the individual labs or national groups, and sought therefore to develop a set of global and freely available genome-based tools. These included extensive EST datasets, consensus genetic linkage maps integrated with physical maps, and new tools for functional genomics and bioinformatics studies. However, the cornerstone objective for the ICGC was a whole citrus genome sequence. Initially, the ICGC selected a diploid clone of sweet orange, ‘Ridge Pineapple’, as the target genome for the sequencing project. Sweet orange was selected because it is the most widely grown citrus type worldwide, and the ‘Ridge Pineapple’ clone was selected specifically because it produces seedy fruit, and as such presented a minimal possibility of chromosomal rearrangements as have previously been associated with seedless types (Gmitter et al. 1992). The goal was to have 8−10x coverage of the genome, and as this was prior to the advent of second generation sequencing platforms, it was to be done using Sanger technology. With a clear plan, the members of the ICGC proceeded to seek funding for the project from within their respective nations or supported by their citrus industries. The efforts to secure funding for the citrus genome sequencing project took several years, and the goal was finally realized with first investments announced in 2008.

In the 5 years between the formation of the ICGC and the availability of funding to support the sequencing project, two significant developments took place that had implications for the citrus genome community. First, there was the release of the diploid poplar genome (Tuskan et al. 2006), which represented the first highly heterozygous diploid woody plant genome. Through discussions held with authors of the poplar genome manuscript, it became clear that to minimize complications associated with assembly of a highly heterozygous genome (such as sweet orange) the ICGC should identify and use a haploid citrus clone. In response, the ICGC characterized three known haploid lines, all of which were derived from the Clementine mandarin (Citrus clementina cv. Nules). These plants were characterized using over 180 available SSR markers, mitotic chromosome counts and cDNA array complete genome hybridization to assure homozygosity and the structural integrity of the selected haploid target genome (Gmitter et al. unpublished data). One of the candidates proved to be a trihaploid, while the others were true haploids, with no evidence of compromised genome integrity, no recognizable deletions of genomic regions and homozygosity at all loci surveyed. Finally, a haploid line that was retained in pathogen-free status, and which had already been propagated substantially, was selected for sequencing (Aleza et al. 2009). The sequencing, assembly and annotation of this haploid genome is a collaboration of Genoscope in France, Istituto di Genomica Applicata in Italy and the US Department of Energy Joint Genome Institute (JGI)/HudsonAlpha in the USA, with contributions from LifeSequencing, a private company in Valencia, Spain.
Citrus researchers from Brazil (Centro de Citricultura Sylvio Moreira), France (CIRAD, Centre de Coopération Internationale en Recherche Agronomique pour le Développement), Italy (CRA-ACM, Centro di Ricerca per l’Agrumicoltura e le Colture Mediterranee), Spain (IVIA, Instituto Valenciano de Investigaciones Agrarias) and the USA (UF-CREC, the University of Florida Citrus Research and Education Center, and the University of California at Riverside) served as the steering committee for the project and were actively involved in and contributed to all aspects. Because of its quality, the Clementine genome sequence will serve as the primary reference genome for all citrus and related genera into the future.

A second significant development was the advent of second generation genome sequencing platforms, most notably 454 pyrosequencing technology. In parallel with the ICGC haploid project, a second project to sequence the genome of the diploid sweet orange (C. sinensis cv. Ridge Pineapple) using 454 technology was undertaken. This project is a collaboration of UF-CREC, UF-Interdisciplinary Center for Biotechnology Research, Roche-454 Life Sciences, JGI and the Georgia Institute of Technology. Preliminary annotated assemblies of both genomes have been released and can be found at JGI’s portal and in the Tree Fruit Genome Database Resources (tfGDR). Preliminary comparisons of the two genome sequences show that the sweet orange sequence assembly is substantially more fragmented than the haploid Clementine, as might be expected based on genome complexity and technical differences between Sanger and 454 sequencing technologies. However, the numbers of genes predicted within each genome are very similar, indicating at least good coverage of coding gene space in both assemblies. The next steps ahead are briefly outlined below.

Other omics

Functional evaluation of transcriptome

More than a half million citrus ESTs (Expressed Sequence Tags) have been obtained and deposited to public databases in recent years (Delseny et al. 2010). These sequences were obtained from various tissues of over 15 citrus accessions, related genera and hybrids but about 80 % of these sequences were derived from four major types, sweet orange (C. sinensis, 38.4 %), Clementine (C. clementina, 22.2 %), mandarin (C. reticulata, 9.45 %) and trifoliate orange (P. trifoliata L. Raf, 10.6 %).

Unigene data sets for sweet orange, Clementine and trifoliate orange have been constructed by NCBI and by tfGDR. A total of about 99,000 assembled sequences were obtained from all EST sequences in public databases by clustering analysis (Shimizu et al. 2009). Although the number of assembled sequences was larger than the expected number of genes in the citrus genome, 67.6 to 81.9 % of them could be matched to orthologs among protein sequences (Uniprot), cDNA sequences of Arabidopsis and rice, and citrus unigenes (Table 1). Comparison of these sequences with the whole genome shotgun sequences obtained from sweet orange by JGI in 2007, which were also preliminarily assembled, demonstrated 86.1 % similarity (Table 1).

Table 1 Functional classification of EST sequences

Functional assignment according to Gene Ontology (GO; Ashburner et al. 2000) categorized 54.6 to 65.7 % of the sequences into the three GO categories: molecular function, biological process and cellular component (Table 1). Classification of the sequences within each category according to GO slim terms suggested their primary roles (Fig. 1), and the distributions were similar to those obtained from unigene sets of apple and peach (data not shown). Another approach, aimed at estimating molecular function in terms of metabolic pathways, classified 16.4 % of the assembled sequences with KEGG ontology terms, and 11.3 % of the sequences were mapped to a known metabolic pathway (Table 1). Furthermore, 22.2 and 4.4 % of the assembled sequences showed similarity with known gene families and transcription factor genes in Arabidopsis, respectively (Table 1). In addition to these efforts, initial attempts to identify microRNAs in Citrus were reported (Xu et al. 2009).

Fig. 1

Functional categorization of citrus EST sequences according to Gene Ontology. The numbers of GO slim terms for the three ontology categories (molecular function, biological process and cellular component) estimated from the assembled EST sequences are shown. The functional annotations were deduced by comparison with cDNA sequences of Arabidopsis

Gene expression profiling

Accumulation of nucleotide sequences for expressed genes with functional annotation enables expression profiling in citrus tissues. These efforts have relied on the development of DNA microarrays. Different types of microarray have been developed, including spotted arrays and in situ synthesized oligonucleotide arrays with short (25-mer) or long (60-mer) probes (Table 2). Applications of these microarrays to expression profiling for gene mining have been reported (Fujii et al. 2007, 2008; Terol et al. 2007).

Table 2 DNA microarray platforms for gene expression profiling of citrus

Currently, analysis of gene expression in plants by deep sequencing of short reads has also become possible. Massively parallel signature sequencing was used for gene expression profiling of lycopene accumulation in a red-flesh sweet orange mutant (Xu et al. 2010). Expression profiling by the deep sequencing approach does not require designing probe sequences prior to experiments, and it is superior for identifying rare transcripts not represented on microarrays, splice variants, or expressed genes with different alleles. However, assembling short reads and functionally annotating the assembled sequences for every comparison is time-consuming. The microarray-based approach to expression profiling is advantageous for its moderate cost, wide dynamic range (>10^4) and, owing to long oligonucleotide probes, sufficient sensitivity for genes expressed at low levels. Sequence assembly and annotation need not be repeated for every experiment. These techniques, therefore, will be used for different and complementary purposes.

New insight on polyploid citrus genome expression

Despite the scarcity of polyploid accessions in citrus germplasm banks, it appears that polyploidization events are relatively frequent when seedling populations are analysed. Frequency of female 2n gametes ranges from rates below 1 % to over 20 % (Esen and Soost 1971, 1973; Geraci et al. 1975; Soost 1987; Iwamasa et al. 1988; Aleza et al. 2010), probably due to the abortion of the first meiotic division (Chen et al. 2008c) or second meiotic division (Luro et al. 2004) in the megaspore. Tetraploidization seems also to occur frequently in apomictic citrus genotypes. Lapin (1937) found tetraploid seedlings among eight citrus species (rate ranging from less than 1 to 5.6 %) and Poncirus (4 %). Cameron and Frost (1968) estimated that 2.5 % of nucellar progenies from a broad range of citrus genotypes grown in California were tetraploid; they proposed that chromosome doubling in nucellar tissue might be the general mechanism underlying tetraploidization in apomictic citrus genotypes. Today, there is renewed interest in polyploid citrus for seedless triploid breeding programs (Ollitrault et al. 2008a; Grosser and Gmitter 2010) and for tetraploid rootstock breeding (Saleh et al. 2008; Grosser and Gmitter 2010). Moreover, somatic hybridization has become an integral part of citrus improvement programs aiming to create new allotetraploid rootstocks or to synthesize tetraploid parents for further triploid breeding (Grosser and Gmitter 2010). This leads to new research questions on polyploid citrus genome expression. Indeed, recent research on polyploid plants has shown that allopolyploidization greatly affects genomic and phenotypic expression (Adams et al. 2004; Comai et al. 2000; Wang et al. 2004, 2006a, b; Wendel and Doyle 2005; Flagel and Wendel 2010), with numerous examples of non-additive inheritance of gene expression (He et al. 2003; Hegarty et al. 2006; Wang et al. 2006a, b; Chen 2007; Chen et al. 2008d; Flagel et al. 2008).
Neo-regulation of parental genome expression in allopolyploid plants would partially explain their higher adaptability and why they often give rise to new phenotypes, exceeding the variability range of the diploid gene pool. Somatic hybrids allow combining genomes without sexual recombination and are interesting models to study the immediate effect of allopolyploidization on the regulation of gene expression and subsequent phenotypic variation. For these reasons, citrus is an excellent model for polyploid genome expression studies. Allotetraploid citrus from somatic hybridization displays certain transgressive morphological vegetative traits (leaf thickness, stomata density and size, etc.) similar to autotetraploids arising from chromosome doubling of nucellar cells that can be associated with tetraploidy per se (Ollitrault et al. 2008a). However, inheritance of other traits is clearly linked with the parental combinations with codominance or dominance of one or the other parent according to the traits under consideration (Bassene et al. 2009a).

Allotetraploid hybrids have been obtained at CIRAD in France by protoplast fusion between Citrus deliciosa (‘Willowleaf’ mandarin) and six other citrus species: four species belonging to Citrus (C. limon, Citrus aurantifolia, C. sinensis and C. paradisi) and P. trifoliata and Fortunella margarita (kumquat) (Ollitrault et al. 2001). Molecular analysis using 96 SSR markers did not reveal any inconsistency with total addition of parental genomes. Evaluations of the phenome, proteome and transcriptome were conducted using a field trial planted in a complete randomized design with three trees (replications) grafted onto Carrizo citrange rootstock for each genotype. Leaf volatile compounds of the six allotetraploid hybrids sharing Willowleaf mandarin as their common parent were analyzed by GC–MS, and the systematic dominance of mandarin traits was observed (Gancel et al. 2003). Particularly notable was the absence of monoterpene aldehydes and monoterpene alcohols and the very low levels of sesquiterpene hydrocarbons, sesquiterpene alcohols and sesquiterpene aldehydes (β- and α-sinensals) in all hybrids, as these compounds were found at high concentrations in the non-mandarin parents. The leaf proteomes of two allotetraploid somatic hybrids combining C. deliciosa with C. aurantifolia and Fortunella margarita were analysed by 2D electrophoresis (Gancel et al. 2006). The two allotetraploid hybrids were closer to their mandarin parent than to their other parents in terms of presence/absence of protein spots as well as at a quantitative expression level. Seventy-five percent of the protein spots specific to the non-mandarin parent were silenced in the somatic hybrids. Moreover, 14 and 29 % of the spots of the C. deliciosa + C. aurantifolia and C. deliciosa + F. margarita hybrids, respectively, were not encountered in their parental genotypes, suggesting a derepression of these genes in the allotetraploids.
Recently, genome-wide gene expression analysis on fruit pulp of a Citrus interspecific somatic allotetraploid between C. reticulata cv ‘Willowleaf’ mandarin + C. limon cv ‘Eureka lemon’ was performed using a Citrus 20K cDNA microarray (Bassene et al. 2010). About 4 % transcriptome divergence was observed between the two parental species, and 212 and 160 genes were highly expressed in C. reticulata and C. limon, respectively. For these genes, the authors observed a global down-regulation of the allotetraploid hybrid transcriptome compared to a theoretical midparent. The genes under-expressed in mandarin compared to lemon were also repressed in the allotetraploid. However, when genes were over-expressed in C. reticulata compared to C. limon, the distribution of gene expression in the allotetraploid was much more balanced, with evidence of transgressive overexpression as well. This led to a global dominance of the mandarin transcriptome. The potential implication of non-additive gene expression on the phenotype of citrus somatic hybrids was illustrated by Bassene et al. (2009b) by analyzing the carotenoid and ABA contents and the expression of the genes of the carotenoid/ABA biosynthesis pathway.
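The comparison with a theoretical midparent can be illustrated with a minimal sketch. It is not the statistical procedure of Bassene et al. (2010): the function name `classify_expression`, the 20 % `tolerance` and the example values are hypothetical, and expression values are assumed to be positive and on a linear scale.

```python
from statistics import mean


def classify_expression(parent_a, parent_b, hybrid, tolerance=0.2):
    """Classify hybrid expression relative to the midparent of two parents.

    tolerance is a hypothetical relative margin, not a published threshold.
    """
    midparent = mean([parent_a, parent_b])
    low, high = sorted([parent_a, parent_b])
    # Transgressive: the hybrid lies clearly outside the range of both parents.
    if hybrid > high * (1 + tolerance):
        return "transgressive over-expression"
    if hybrid < low * (1 - tolerance):
        return "transgressive under-expression"
    # Additive: the hybrid is close to the average of the two parents.
    if abs(hybrid - midparent) <= tolerance * midparent:
        return "additive"
    # Otherwise the hybrid is non-additive and resembles one parent more.
    closer = "parent A" if abs(hybrid - parent_a) < abs(hybrid - parent_b) else "parent B"
    return f"non-additive, closer to {closer}"


# Toy values only: parent A = 100, parent B = 50 (arbitrary units).
for hybrid_value in (75, 98, 130, 30):
    print(hybrid_value, classify_expression(100, 50, hybrid_value))
```

In a real analysis, replicate measurements and a statistical test would replace the fixed tolerance used here.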

Molecular mechanism of citrus–Xylella fastidiosa interactions revealed by transcriptome characterization

Citrus variegated chlorosis (CVC), caused by X. fastidiosa, is currently one of the most serious diseases limiting citrus production in Brazil. The annual loss due to CVC exceeds US$100 million, spent on chemical control of the vectors (sharpshooters), pruning infected branches and trees, and replacing old orchards (Bové and Ayres 2007). The diseased plants show a strong correlation between symptoms and disturbances caused by water stress, which is in accord with the hypothesis of xylem blockage caused by X. fastidiosa (Almeida et al. 2001). The C. sinensis–X. fastidiosa interaction is a compatible interaction because all varieties of sweet orange (C. sinensis L. Osb.) are susceptible to CVC. However, this interaction is unusual compared with classical compatible interactions between bacteria and plants, since X. fastidiosa does not have avr genes from which avirulence (Avr) proteins from the pathogen are injected into the plant cells (Simpson et al. 2000). On the other hand, mandarins (Citrus reticulata) and some hybrids with sweet orange seem to be tolerant or resistant to X. fastidiosa (Coletta-Filho et al. 2007), which is characterized as an incompatible-like interaction. Curiously, the persistence of X. fastidiosa seems to be brief inside mandarin plants, since the bacteria can be isolated at 30 days after inoculation (DAI) but not past 60 DAI (Coletta-Filho et al. 2007; De Souza et al. 2009). An explanation for the resistance of these plants could be the morphology of the vessels, but no correlation between diameters or numbers of vessels with resistance was found (Coletta-Filho et al. 2007), suggesting that resistance may be related to active defense responses. To understand molecular mechanisms that take place in the interaction between susceptible/resistant plants and X. fastidiosa, EST libraries were constructed from sweet orange plants with and without CVC symptoms, and ‘Ponkan’ mandarin in the presence of the pathogen (De Souza et al. 2007a, b).
All EST sequences are available in GenBank, accession numbers from EY758170 (dbEST id: 51298891) to EY783598 (dbEST id: 51324319).

The evaluation of susceptible C. sinensis plants in the presence of X. fastidiosa, with or without CVC symptoms, revealed changes in gene expression patterns. Photosynthesis-related genes were down-regulated in symptomatic tissues, reflecting the phenotype of chlorophyll degradation observed in plants with CVC symptoms. However, in asymptomatic tissues of infected plants these genes were up-regulated, possibly representing early host responses to the bacterium. Regardless of the damage caused by the disease, CVC does not kill affected plants. Genes related to reorganization of cell walls, ion transport and water stress were up-regulated in plants with CVC, supporting functions to allow plant survival. Up-regulated defense response genes associated with oxidative stress (Peroni et al. 2007) and PR proteins (PR 7, PR 10) (Campos et al. 2007) were found only in symptomatic plants, suggesting that although the defense machinery was activated, it was insufficient to block the disease completely. The processes of pathogen recognition may be triggered later, when the bacteria are already established within the plant, a common characteristic in susceptible responses to pathogens. Another possibility is that these genes are expressed as a secondary response to physiological effects caused by the infection, such as nutrition deficiencies and water stress (De Souza et al. 2007a).

The responses in CVC-tolerant ‘Ponkan’ mandarin showed an induction of different sets of genes at two different time points compared with mock-inoculated plants (De Souza et al. 2009), revealing a probable multifactor anti-pathogen response involving perception, signal transduction and activation of defense-related genes. At the first time point, various genes involved in recognition and signal transduction were up-regulated, including one possibly involved with pathogen recognition (a CC-NBS-LRR-like disease resistance protein), normally responsive to the pathogen’s Avr proteins. Interestingly, no genes for Avr proteins or type III secretion apparatus are found in the genome of X. fastidiosa (Simpson et al. 2000).

Genes induced and linked with the signal transduction cascade were those encoding MAPK, ethylene-related transcription factor, lipoxygenase (LOX, the key enzyme in jasmonic acid (JA) synthesis) and S-adenosyl-L-methionine:salicylic acid methyltransferase. The latter is responsible for the synthesis of methyl salicylate, a plant-signaling compound of the salicylic acid (SA) pathway which activates defense responses and systemic acquired resistance (Deng et al. 2005; Park et al. 2007). These results indicate crosstalk among regulatory pathways controlling different cellular processes in the mandarin–X. fastidiosa interaction, similar to that demonstrated in Arabidopsis (Schenk et al. 2000) and found in some cases of effector-triggered immunity (ETI; Tsuda and Katagiri 2010). Up-regulation of genes associated with oxidative stress (ROS), anti-microbial compounds (specifically miraculin, PR 6 and PR 17) and P450 suggests participation of these processes in the inactivation of the pathogen inside the host.

Bacteria from inoculated ‘Ponkan’ mandarins can be detected by PCR at 60 DAI, but their isolation is not possible (suggesting very few or no living cells) (De Souza et al. 2009). Most induced genes at this time were associated with resistance response systems, including one encoding a glyoxylate aminotransferase 2 similar to At2 from Cucumis melo; transgenic plants expressing this gene were resistant to Pseudoperonospora cubensis (Taler et al. 2004). This aminotransferase could activate defense responses (e.g. cell wall stretching, callose deposition and lignification), which may culminate in cell death and arrest of the pathogen (Taler et al. 2004). Five genes that encode P450 were also induced at 60 DAI. Aside from their functions in defense against pathogens and synthesis of plant growth regulators, plant P450s are also related to the detoxification process (Paquette et al. 2000). These enzymes may play a role in detoxification either before or concomitantly with the biosynthesis of secondary metabolites, since genes related to both processes were found, including flavonol synthase and hydroxycinnamoyl transferase, which participate in the phenylpropanoid pathway (Lukacin et al. 2003; Hoffmann et al. 2004). The roles of flavonoids in plant defense against pathogens, herbivores and environmental stresses are well established due to their antimicrobial function (Treutter 2005; Meragelman et al. 2005), and their role in the late stage of infection by X. fastidiosa in ‘Ponkan’ mandarin seems to be important to eliminate the pathogen from the plant.

Experiments to identify genes differentially expressed in CVC-tolerant mandarin used plants 30 and 60 DAI. The early high expression of a gene encoding a CC-NBS-LRR was particularly interesting since the likely mechanism for this resistance was non-host resistance, generally mediated by microbe-associated or pathogen-associated molecular patterns (MAMPs or PAMPs) through pattern-recognition receptors (PRRs). The defense responses mediated by PAMPs are called PAMP-triggered immunity (PTI) (Tsuda and Katagiri 2010). The induced responses are transient in PTI and occur in a different way than in ETI, which is mediated by effector recognition by CC-NBS-LRR proteins; by contrast, ETI defense responses are more prolonged and robust than those of PTI (Tsuda and Katagiri 2010). Analysis of inoculated ‘Ponkan’ mandarin revealed defense responses even at 60 DAI, so it is tempting to hypothesize that the pathogen has a cytosolic effector and that ETI is leading to the resistance of mandarin, despite the fact that the X. fastidiosa genome does not possess either a type III secretion system or avr genes. Further, as xylem is mainly composed of non-living cells, it is difficult to explain how this effector could be recognized. Therefore, to verify whether the CC-NBS-LRR gene was expressed in an early stage of infection and whether crosstalk among signaling pathways in fact occurs, a time course experiment was conducted sampling tissues at 1, 7, 14 and 21 DAI, and gene expression was evaluated by real-time quantitative PCR of the CC-NBS-LRR, NPR1 and PR proteins and SA-, JA- and ETI-related genes (Rodrigues et al. 2010). The CC-NBS-LRR showed higher expression just 1 DAI, followed by the induction of genes involved with SA and NPR1. No induction of genes related to JA or ETI was observed in early stages of infection. The higher expression of CC-NBS-LRR at 1 DAI reinforces the hypothesis that ETI might occur in the mandarin–X. fastidiosa interaction and that the signaling is mediated by SA.
However, possible crosstalk among different signaling pathways was observed 30 DAI, when the bacteria are still alive in the plant. If X. fastidiosa triggers ETI in mandarin, then the response would be different from the classical ETI response, since X. fastidiosa has a lifetime of several weeks in the plant. This may be a consequence of the pathogen life style, in which the bacteria are injected directly by the insect vector into the xylem vessel, where they need to adhere and form a biofilm. The colonization process takes time and a prolonged plant response could be necessary for resistance to be fully expressed. It may be that in late stages of infection the ETI response becomes stronger and involves redundant activities of SA and JA–ET pathways, contributing a substantial level of resistance. These compensatory interactions may simply result from the higher signal flux in ETI, making this response more robust against pathogen interference (Dodds and Rathjen 2010).

On the other hand, a PTI response must also be considered since X. fastidiosa is able to produce many PAMPs such as cell wall degradation enzymes, exopolysaccharide, lipopolysaccharide and surface adhesion proteins. However, PTI confers a transient response, whereas gene expression associated with plant defense was observed even after 60 days. One possible explanation for the participation of PTI, as mentioned above, is that colonization by X. fastidiosa takes time and the production of PAMPs that could be recognized by surface PRRs is slow, consequently resulting in a prolonged response. These possible mechanisms are intriguing; if the response is mediated by PTI, then why is there a CC-NBS-LRR expressed in mandarin during the infection? And how can this response be prolonged? If the response is mediated by ETI, what is the cytosolic effector produced by X. fastidiosa if this bacterium does not have a type III secretion system or avr genes? And how can the cellular interactions occur in xylem cells? These are subjects for future research efforts. Taken together, we suggest a hypothetical model to explain mandarin defense responses induced by X. fastidiosa (Fig. 2).

Fig. 2

A common signaling machinery is used differently in PTI and ETI

Even though there is no information about which strategy is used by mandarin in response to X. fastidiosa, the high expression of one gene encoding CC-NBS-LRR during early and late stages of infection suggests that X. fastidiosa could have an unknown cytosolic effector that can be recognized by this resistance gene. This could activate MAPKs leading to signal transduction. The plant hormone SA seems to be involved in signaling mediated by ETI and could induce the expression of NPR-dependent genes in early and late stages of infection. Genes associated with JA-ET as well as ethylene transcriptional factor, oxidative stress, PR proteins, miraculin, P450 and others were expressed at 30 DAI and could be involved in the increase of the resistance to the pathogen. At 60 DAI there is no expression of genes involved with plant hormone signaling or the CC-NBS-LRR, but there is an increase of P450, PR proteins and phenolic compounds. Although ETI seems more likely in the mandarin–X. fastidiosa interaction, a possible prolonged PTI response may also be considered. X. fastidiosa colonizes and forms a biofilm in mainly dead xylem vessel cells; the process takes time, and multiple, distinct PAMPs may be produced and recognized by surface PRRs to trigger a basal resistance response.

Application to breeding

Mining of polymorphisms for molecular marker development

Data mining of nucleotide sequences obtained from expressed genes or genome sequence assemblies provides various types of putative polymorphic regions that could be useful for molecular marker development. Simple sequence repeat (SSR) regions have been mined from EST sequences and utilized for marker development and linkage mapping (Chen et al. 2006; Luro et al. 2008; Roose et al. 2006). Similar approaches applied to the assemblies of EST and shotgun sequences from sweet orange identified about 144,000 and 382,000 putative SSR regions, respectively. Over 89.5 % of identified SSR regions in expressed genes (EST-SSR) were occupied by SSR sequences with motifs of up to ten nucleotides (Fig. 3a). In contrast, SSR regions with short motif sizes were abundant (83.6 % of total putative SSR regions with motifs up to five nucleotides) in genomic SSRs (Fig. 3b). Putative SSR regions found for both EST and genome sequences should be available to design SSR markers that could compensate for the unequal distribution of EST-SSRs or genomic-SSRs.

Fig. 3

Size distribution of putative SSR region for EST and genome sequences. Distributions of possible SSR regions for EST-SSR (a) and genomic SSR of sweet orange (b) with motif sizes longer than 2 bp that repeated at least four times. Upper numbers represent motif size (2 to 10 bp with more than 10 bp for EST-SSR and 2 to 8 bp with more than 8 bp for genomic SSR) and numbers in parentheses represent their composite ratio
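The criterion used for putative SSR regions (a motif of at least 2 bp repeated at least four times) can be sketched with a regular expression. This is an illustrative sketch only, not the pipeline used to produce Fig. 3: the function name `find_ssrs`, its parameters and the toy sequences are assumptions, and imperfect or compound repeats are not handled.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=10, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # One motif of length k followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each region is reported only under its shortest motif.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence: a dinucleotide repeat (AT x 5) flanked by other bases.
print(find_ssrs("GGATATATATATCC"))
```

Binning the reported motif lengths would give a size distribution comparable in spirit to the one summarized in the figure, under the simplifying assumptions stated above.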

Significant numbers of EST sequences obtained from more than 15 citrus accessions also provide important clues for other types of polymorphisms. Homology searches of the assembled EST sequences against all raw sequences, taking their cultivar of origin into account, estimated the frequency of single nucleotide polymorphisms (SNPs) at 2 SNPs per 100 bp on average among two to four cultivars (Fig. 4a). The SNP frequency increased with the number of aligned sequences, up to 9.8 SNPs per 100 bp, but decreased for more than 14 cultivars. The drop in SNP frequency in that range could reflect fewer sequences and possible bias against certain genes. Omura and colleagues demonstrated an initial attempt at the development and application of a 384-SNP array for genotyping citrus accessions (Omura et al. 2008). In conclusion, sufficient numbers of putative SSRs and SNPs within EST or genome sequences of citrus should facilitate two-way integration between genes and loci of interest by expression profiling and genotyping studies. When associated with plant phenotypes, these polymorphisms facilitate the development of markers that breeders could use to select parents for hybridizations and to select superior offspring from such hybridizations at an early stage.
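As a rough illustration of how a frequency expressed as SNPs per 100 bp can be derived, the sketch below counts polymorphic columns in a set of pre-aligned sequences of equal length. It is not the method used for Fig. 4: the function name `snp_frequency_per_100bp` and the toy alignment are hypothetical, and a real analysis would also need to filter sequencing errors and low-quality bases.

```python
def snp_frequency_per_100bp(aligned):
    """Return polymorphic columns per 100 comparable columns of an alignment."""
    aligned = [s.upper() for s in aligned]
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    snps = 0
    compared = 0
    for column in zip(*aligned):
        # Ignore gaps and ambiguous bases; a column is comparable only
        # when at least two sequences have a called base.
        called = [base for base in column if base in "ACGT"]
        if len(called) < 2:
            continue
        compared += 1
        if len(set(called)) > 1:
            snps += 1
    return 100.0 * snps / compared if compared else 0.0


# Toy alignment of three 10-bp sequences with two polymorphic columns.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACCTACGTAA"]
print(snp_frequency_per_100bp(toy))
```

Because every additional sequence can only add polymorphic columns, this kind of count tends to rise with the number of aligned cultivars, in line with the trend described in the text.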

Fig. 4

Frequency of SNPs identified by comparing EST sequences obtained from different cultivars. a Frequency of SNP per 100 bp deduced from comparison. The number of GO slim terms for three ontology categories (biological function, biological process and cellular component) estimated from the assembled EST. Bar represents standard error. b Numbers of aligned sequences for the comparison to detect SNPs
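The "SNPs per 100 bp" measure plotted in Fig. 4a can be sketched as follows. This is a simplified illustration and not the authors' homology-search procedure: it assumes the sequences from different cultivars have already been aligned to equal length (gaps written as "-"), and the function name `snps_per_100bp` and the toy sequences are hypothetical.

```python
def snps_per_100bp(aligned):
    """Count polymorphic columns per 100 compared columns.

    Assumes pre-aligned sequences of equal length; a column is compared
    only when at least two sequences carry an unambiguous base there.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    compared = 0
    for column in zip(*aligned):
        called = [b.upper() for b in column if b.upper() in "ACGT"]
        if len(called) < 2:
            continue  # gap or ambiguity in all but one sequence
        compared += 1
        if len(set(called)) > 1:
            snps += 1
    return 100.0 * snps / compared if compared else 0.0

# Toy data: three 50-bp sequences, one of which differs at a single site.
ref = "ACGT" * 12 + "AC"
alt = ref[:10] + "T" + ref[11:]   # G -> T substitution at position 10
freq = snps_per_100bp([ref, ref, alt])
```

With real EST data, sequencing errors inflate the count as more reads are added, so a minimum allele count per column would usually be required before calling a SNP.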

Nuclear marker development

Following isozyme markers (Torres et al. 1978), several kinds of nuclear markers have been developed and used for citrus genetic studies. Dominant markers such as random amplified polymorphic DNA (RAPDs; Luro et al. 1992; Higashi et al. 2000), inter-simple sequence repeats (ISSRs; Fang and Roose 1997; Fang et al. 1997, 1998), and amplified fragment length polymorphisms (AFLPs; Liang et al. 2006) have been useful for large-scale characterization of genomes for which previous genomic sequence information was not available. These marker types are still used in citrus for nucellar/zygotic differentiation (Rao et al. 2007), genetic mapping (de Simone et al. 1998; Sankar and Moore 2001; de Oliveira et al. 2007; Gulsen et al. 2010), germplasm studies (Krueger and Roose 2003; Pang et al. 2007; Kumar et al. 2010; Biswas et al. 2010b; Yang et al. 2010) and somatic hybrid genome characterization (Scarano et al. 2002; Guo et al. 2002; Fu et al. 2004). More powerful single-locus codominant markers such as restriction fragment length polymorphisms (RFLPs; Federici et al. 1998), sequence characterized amplified regions (SCARs; Nicolosi et al. 2000) and cleaved amplified polymorphic sequences (CAPS) from ESTs (Omura et al. 2005, 2006) have also been developed and utilized in citrus.

In the last 15 years, a broad international collaboration in the citrus community has developed sets of simple sequence repeat (SSR) markers. Only a limited number of SSR markers obtained from genomic libraries have been published (Kijas et al. 1995; Corazza-Nunes et al. 2002; Novelli et al. 2006; Barkley et al. 2006; Froelicher et al. 2008). The implementation of large EST databases has allowed the development of many more SSR markers. Chen et al. (2006) published 56 SSRs derived from the GenBank citrus EST database, and more than 200 SSR markers have been developed (Luro et al. 2008) from the 1,600 microsatellite sequences in 37,000 ESTs characterized by Terol et al. (2007). More recently, Terol et al. (2008) identified more than 7,600 microsatellite sequences from end sequencing of Clementine BACs. They were used to develop 79 SSRs for direct anchoring of the Clementine genetic and physical maps (Ollitrault et al. 2010). SSR markers have been included in citrus genetic maps (Chen et al. 2008a; Lyon et al. 2007; Luro et al. 2007; Ollitrault et al. 2008b; Bernet et al. 2010). In addition to genetic mapping, SSRs have been used for the analysis of genetic diversity (Luro et al. 2001; Corazza-Nunes et al. 2002; Barkley et al. 2006), discrimination of zygotic and nucellar seedlings (Ruiz et al. 2000; Oliveira et al. 2002; Rao et al. 2007), control of the origin of plants obtained by induced gynogenesis (Froelicher et al. 2007; Aleza et al. 2009), molecular characterization of triploid cultivars (Aleza et al. 2010) or somatic hybrids (Chen et al. 2008b; Bassene et al. 2009a) and the analysis of the origin of 2n gametes (Luro et al. 2004; Chen et al. 2008c; Cuenca et al. 2009; Ferrante et al. 2010).

Because they are vegetatively propagated and their varieties have diversified without sexual recombination, some citrus species such as C. sinensis, C. clementina or C. paradisi display very low or no intraspecific molecular polymorphism. Some studies have successfully targeted markers based on transposable elements to identify molecular polymorphism between varieties within these species (Breto et al. 2001; Bernet et al. 2004; Tao et al. 2005). Retrotransposon-based IRAP and REMAP markers have also recently been used to study genetic similarity within the genus Citrus and its relatives (Biswas et al. 2010a, b). High-throughput methods for marker saturation are needed for efficient QTL and association genetic studies, as well as for positional cloning of genes. For this purpose, arrays for SNP markers (Close et al. 2006; Omura et al. 2008; Ollitrault et al. 2011) were developed in the USA, Japan and France; they permitted the elaboration of saturated genetic maps (Ollitrault et al. 2011; Roose, personal communication) that should have an important impact on citrus genetics and genomics research.

Fine mapping of a locus involved in polyembryony

Polyembryony is the development within a seed of embryos derived from maternal tissues. It is a type of apomixis that produces plants asexually, without fertilization or meiosis. In polyembryonic citrus cultivars, somatic embryos develop from nucellar tissues in seeds of the maternal plant, as clones of that plant (Kobayashi et al. 1967, 1979). The numbers of embryos produced in seeds of many economically important citrus cultivars vary from ~2 (lemon, lime), to ~20 (grapefruit, sweet orange, satsuma, kumquat, rough lemon and trifoliate orange), to many (>20; Ponkan mandarin). Polyembryony strictly limits the use of these cultivars for cross breeding, but it is a useful trait for stable maintenance and propagation of citrus rootstocks from seeds, ensuring plant uniformity in the orchard.

Previous studies using test crosses suggested that a single dominant gene, or a few, was involved in the determination of polyembryony in citrus (Hong et al. 2001; Iwamasa et al. 1967; Parlevliet and Cameron 1959). An evaluation of a cross between two polyembryonic cultivars (C. volkameriana and P. trifoliata), confirmed by evaluation of other crosses, proposed several QTLs for polyembryony (Asins et al. 2002). Recent approaches toward DNA marker mapping of the polyembryony locus have been reported by several groups.

Nakano and colleagues identified several DNA markers neighboring a major polyembryony locus of Satsuma mandarin by evaluating a cross between monoembryonic ‘Kiyomi’ tangor (Satsuma mandarin × Sweet orange) and a polyembryonic Satsuma mandarin cultivar, in which polyembryony segregated in a 1:1 ratio (Nakano et al. 2008b). The linkage map with RAPD and SCAR markers obtained from bulked segregant analysis covered a region 47 cM in length flanking the polyembryony locus. In another approach, Kepiro and Roose (2009) identified five AFLP markers tightly linked to the polyembryony locus in P. trifoliata, although a linkage map was not presented. Through a continuous effort of DNA marker landing and BAC walking with DNA sequencing of Satsuma BAC clones, Nakano and colleagues enriched CAPS or SNP markers surrounding the polyembryony locus. They evaluated haplotype structure by means of these markers and constructed haplotype-specific BAC contigs that cover the genomic region of the polyembryony locus (Nakano et al. 2008a). Comparative mapping among five cross populations with the developed DNA markers flanking the polyembryony locus demonstrated similar linkage of these markers, which suggested that a region flanking the polyembryony locus could be conserved among a wide variety of polyembryonic citrus plants. Very recently, they sequenced a 380-kb region of several BAC clones covering the polyembryony locus and identified several ORFs of interest according to their functional annotations. Sequence-to-sequence comparisons in this region against the reference sequences are underway to identify the gene(s) associated with citrus polyembryony.
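The 1:1 segregation of polyembryony mentioned above is the kind of expectation usually checked with a chi-square goodness-of-fit test. The sketch below shows that arithmetic under stated assumptions: the progeny counts are invented for illustration and are not data from Nakano et al. (2008b), and the function name `chi_square_1to1` is hypothetical. The p-value uses the identity that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x/2)).

```python
import math

def chi_square_1to1(n_poly, n_mono):
    """Chi-square test of a 1:1 segregation ratio (1 degree of freedom).

    Returns (chi2, p_value). Hypothetical helper for illustration.
    """
    total = n_poly + n_mono
    if total == 0:
        raise ValueError("no progeny scored")
    expected = total / 2.0
    chi2 = ((n_poly - expected) ** 2 + (n_mono - expected) ** 2) / expected
    # Survival function of the chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented counts: 48 polyembryonic and 52 monoembryonic progeny.
chi2, p = chi_square_1to1(48, 52)
# A strongly skewed (invented) family for contrast.
chi2_skewed, p_skewed = chi_square_1to1(30, 70)
```

A chi-square value below 3.841 (the 5 % critical value for one degree of freedom) is consistent with a single locus segregating 1:1, as expected when one parent is heterozygous for a dominant allele and the other is homozygous recessive.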

Early flowering by transgenic study and its application for shortening the juvenile period

Flower development is a matter of interest for stable fruit production and cross breeding of many trees. In citrus, the seedling juvenile period is considerably long (1.5 to 20 years), hampering classical breeding efforts, linkage analysis for various important traits and functional evaluations of genes of interest relating to reproductive organs. This is a common problem among woody plants, and various molecular approaches to shorten juvenile periods have been attempted (Flachowsky et al. 2009).

Studies in model plants have revealed the involvement of many genes in flower development (Bemer and Angenent 2010). Initial attempts to promote early flowering in citrus by transgenic techniques were reported by Peña et al. (2001). They introduced Arabidopsis LEAFY (LFY) or APETALA1 (AP1), genes that promote flower initiation in Arabidopsis (Bowman et al. 1993; Weigel and Meyerowitz 1994), into citrange (hybrid of C. sinensis L. Osbeck × P. trifoliata L. Raf.). Transgenic citrange plants expressing LFY or AP1 under control of the 35S CaMV promoter flowered early and produced normal and fertile flowers. Time to the first flower for these transgenic citrange plants was 2 to 20 months, and fruits were obtained as early as the first year. The juvenile periods for these plants were clearly shorter than the 5-year duration for control transgenic plants. Some zygotic seedlings obtained by crosses with transgenic plants (LFY and AP1) as pollen donors likewise had a very short juvenile phase, flowering in their first spring and setting normal fruit. Nucellar seedlings of AP1 transgenic plants also had short juvenile periods, but those of LFY transgenic plants exhibited strong inhibition of apical growth, possibly because they were not grafted onto vigorous rootstocks. These observations confirmed that the introduced transgenes were inherited and stably expressed, inducing early flowering in the offspring.

Use of an endogenous citrus gene to shorten the juvenile phase was also reported (Endo et al. 2005). They introduced the citrus ortholog of the FLOWERING LOCUS T (FT) gene, which is involved in the photoperiodic induction of flowering (Koornneef et al. 1991), under regulation of the 35S CaMV promoter into trifoliate orange (P. trifoliata). Transgenic plants expressing the citrus FT gene flowered as early as 6 to 11 months after transformation. Flowers and pollen of these transgenic plants were normal and fertile. Zygotic seedlings obtained by crossing with monoembryonic ‘Kiyomi’ tangor (C. unshiu × C. sinensis) showed extremely early flowering, and some of them flowered immediately after germination, confirming that the endogenous FT gene is also capable of inducing early flowering in citrus, as demonstrated with the LFY or AP1 genes.

Coincidence between FT gene expression and the period of putative flower bud initiation was reported by comparing the expression profiles of citrus orthologues of FT, APETALA1 (CsAP1), LEAFY (CsLFY) and TERMINAL FLOWER 1 (CsTFL1), genes that are involved in floral organ development in Arabidopsis (Nishikawa et al. 2007). The expression profile of FT coincided with floral induction in Satsuma, induced by low temperature treatment at 15 °C. They found that seasonal profiles of FT expression in evergreen Satsuma mandarin and deciduous trifoliate orange were also associated with the supposed period of flower bud formation for these plants (Nishikawa et al. 2009). In relation to these observations, the citrus homolog of the MOTHER OF FT AND TFL1 (MFT) gene was isolated and its preferential expression in seeds was reported (Nishikawa et al. 2008). Ectopic up-regulation of 13 citrus genes in FT transgenic plants was detected with the Citrus 22K oligo DNA microarray (Nishikawa et al. 2010). Among them, citrus orthologs of two SEPALLATA (SEP) genes and one FRUITFULL (FUL) gene (CuSEP1, CuSEP3 and CuFUL) were introduced into Arabidopsis thaliana and were confirmed to induce early flowering in the transgenic Arabidopsis plants.

Meanwhile, precocious citrus plants found occasionally as spontaneous mutants or among zygotic seedlings have been evaluated (Iwamasa et al. 1967; Wakana et al. 2005; Yadav et al. 1980). Expression profiling of a precocious trifoliate orange plant that was found as a spontaneous mutant revealed differential expression of three genes (BARELY ANY MERISTEM, FLOWERING LOCUS T and TERMINAL FLOWER1) encoding proteins previously reported to be associated with, or involved in, developmental processes in other species (Zhang et al. 2009). Similar attempts at shortening generation cycles for breeding in Populus by inducing A. thaliana FT genes were reported, although flowering responses of the transgenic Populus plants varied depending on the combination of FT genes and genetic background (Zhang et al. 2010). These observations suggest that the FT gene primarily determines the period of floral induction in citrus, and that its overexpression may shorten the juvenile phase. In addition to these efforts, shortening of the juvenile phase has been applied to establish an efficient system for functional evaluation of transgenes in reproductive organs of citrus (Cervera et al. 2009; Endo et al. 2009). Similar attempts to evaluate transgenes using the spontaneous precocious trifoliate orange were also reported (Tong et al. 2009).

Database and information services

As mentioned previously, preliminary annotated assemblies of the Clementine haploid and diploid sweet orange genomes can be accessed through and tfGDR ( These web portals each have their own specialized tools for genome browsing, searching and utilization for specific targeted research; they also allow access to citrus genome resources housed in NCBI, including but not limited to ESTs and SSRs from both genomes, and BES of diploid Clementine. In addition, version 1.32 of HarvEST:Citrus currently displays 141 libraries and 469,618 ESTs from Citrus and Poncirus, and it can be accessed through As additional research results are generated, it is anticipated that the growth and utility of these public resources will increase exponentially.

Future directions

The currently available versions of the haploid Clementine and the diploid sweet orange genomes were released to the public so that the citrus research community could begin to use them for a wide variety of research interests, including studies of disease resistance mechanisms, fruit quality attributes, tolerance of abiotic stresses and many others. However, the work of improving the assemblies and annotations will continue. The Clementine version 0.9 at Phytozome, for example, consisted of 1,128 main genome scaffolds, and the sweet orange assembly was even more fragmented, assembling into 12,574 main genome scaffolds. Work is in progress to link the Clementine genome sequence with ongoing efforts in integrative physical and genetic linkage mapping (based on BES-derived markers), so that the genome sequence can be displayed as nine pseudomolecules. A number of BAC clones will be sequenced and compared with the genome assembly to validate it. Similar efforts are underway to improve the sweet orange genome assembly.

Currently, three haplotypes are represented by the sequences derived from the haploid Clementine and diploid sweet orange. Additional WGS of the diploid progenitor Clementine is being undertaken, and thus a fourth haplotype will be produced. Plans for future analyses of the currently available genomes include exploration of citrus genome evolution and duplication (paleopolyploidy), phylogeny and origins of the sweet orange (the most widely grown citrus type in the world), analysis of transposable elements within the genomes, identification of SSR and SNP sets, and comparisons of gene content and genome structure. In the longer term, studies will address gene and allelic content as they relate to citrus biochemistry, metabolism, physiology, disease resistance, stress tolerance, fruit quality and productivity.

As sequencing technology evolves and becomes more efficient and less costly, there will undoubtedly be many new citrus genomes sequenced; the foundational reference genome from the haploid Clementine will serve well for accessing and utilizing these newly produced genome sequences. Citrus geneticists will be able to survey and comprehend the genetic diversity resident in the citrus germplasm pool, including not only species within the genus but also closely related genera. It may very well be that the most important genes or alleles required to address serious challenges to citrus production, leading to the genetic improvement of cultivars to meet those challenges, could come from outside of the commercial citrus gene pool. Deep sequencing of citrus transcriptomes is already underway in gene expression studies of host responses to pathogens, with a major emphasis currently on Huanglongbing (citrus greening or HLB, a most serious disease ravaging citrus production on a nearly global basis). Some emphasis is now also being placed on proteomic and metabolomic studies of host responses to pathogens, and this movement toward integration of data sets from the various levels of plant responses will continue and expand to other critical aspects of citrus plant performance and function. Ultimately, greater understanding of such phenomena will lead to development of tools that can be used to manipulate plants and to better manage their interactions with biotic and abiotic factors. The future chapters of citrus science and the application of genome sequence-derived knowledge and tools will be written on the basis of what has been accomplished with these very first citrus genome sequence resources.