Genomic insights advance the fight against black rot of crucifers

Xanthomonas campestris pv. campestris, the causal agent of black rot of crucifers, was one of the first bacterial plant pathogens ever identified. Over 130 years later, black rot remains a threat to cabbage, cauliflower, and other Brassica crops around the world. Recent genomic and genetic data are informing our understanding of X. campestris taxonomy, dissemination, inoculum sources, and virulence factors. This new knowledge promises to positively impact resistance breeding of Brassica varieties and management of inoculum sources.

Bacteria of the genus Xanthomonas cause an array of diseases on over 300 plant species (Vauterin et al. 2000). Xanthomonas campestris pv. campestris (Xcc), the causal agent of black rot of Brassicaceae, was one of the first identified plant pathogenic bacteria, described by Harrison Garman in 1889 in Kentucky, USA (Alvarez et al. 2000; Garman 1890). Garman determined that a disease of cabbages causing "brown watery lesions" appearing in times of high rainfall and humidity was caused by a bacterium (Garman 1890). Erwin Smith further described the disease and studied control methods in the years following its discovery (Smith 1898). Smith used Xcc as a central example during heated debates with Alfred Fischer between 1897 and 1901 on whether bacteria could cause plant disease (Fischer and Smith 1981).
Over more than a century of research, the taxonomy, modes of transmission, and virulence of Xcc have been described based on observations of pathogenicity on crops and survival in weeds, soil, and seeds. Despite this rich history, effective control of black rot remains elusive. Recent advances in genetic characterization and next-generation whole-genome sequencing have enabled a more conclusive determination of the relationship of Xcc to other X. campestris (Xc) pathovars, a better understanding of the distribution and sources of Xcc inoculum, and the characterization of Xcc virulence factors. Beginning with an overview of the disease and a snapshot of the genome sequences currently available for Xc, this review summarizes these advances and highlights prospects for improved control of black rot.

Symptoms, diagnosis, and control of black rot
Black rot symptoms include V-shaped lesions originating from bacterial points of entry at hydathodes or wounds (Figs. 1c, 2a). Xcc travels through the plant vasculature, leaving a trail of blackened veins from which the disease gets its name. In severe cases, Xcc can move systemically through the plant to reach vascular tissue in the head (Fig. 1d), resulting in discolored, unmarketable cabbage prone to secondary infection during storage (Ragasová et al. 2020). Xcc typically enters through hydathodes as guttation fluid exuded overnight is drawn back into the plant in the morning hours (Cook et al. 1952). It may also enter through wounds, from populations on the leaf surface or via rain splash. Unlike many other bacterial pathogens, Xcc cannot infect through stomates. There is evidence that Xcc can cause a hypersensitive response (HR) at stomates (Cook et al. 1952) that might limit further ingress of the pathogen. Alternatively, the ability of stomates to close in response to a pathogen may limit entry of Xcc; hydathodes have not been found to close in this way (Cerutti et al. 2017). Other barriers, including incompatibility with the mesophyll environment, might also play a role (Laurent Noël, personal communication).
Xcc is primarily seed-dispersed; a single infected seed in 10,000 can cause an epidemic in the field (Schaad et al. 1980;Williams 1980). Cabbage seeds are typically started in high-density transplant facilities with overhead watering, which can result in extensive bacterial spread (Fig. 1b). Germinating seed contaminated with Xcc will often remain asymptomatic under greenhouse conditions (Roberts et al. 1999). Yet, when these infected seedlings are moved to the field, they become sources of secondary inoculum later in the growing season.
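The 1-in-10,000 threshold above can be put in perspective with simple binomial arithmetic. The sketch below is a hypothetical illustration, not a published seed-testing protocol: it assumes that infected seeds occur independently and at random within a lot (real lots may be clustered) and uses the infection rate of one seed in 10,000 quoted above. Under these assumptions, a random sample of n seeds contains at least one infected seed with probability 1 - (1 - p)^n.

```python
def expected_infected(n_seeds, rate):
    """Expected number of infected seeds among n_seeds at a given infection rate."""
    return n_seeds * rate


def prob_at_least_one(n_seeds, rate):
    """Probability that n_seeds drawn at random include at least one infected seed.

    Assumes each seed is infected independently with probability `rate`
    (a simplifying binomial model, not a validated seed-health standard).
    """
    return 1.0 - (1.0 - rate) ** n_seeds


RATE = 1 / 10_000  # one infected seed in 10,000, the level cited in the text

for n in (1_000, 10_000, 30_000, 100_000):
    print(f"{n:>7,} seeds: expected infected = {expected_infected(n, RATE):5.1f}, "
          f"P(at least one infected) = {prob_at_least_one(n, RATE):.3f}")
```

Under this model, a 10,000-seed sample from a lot at that infection level has only about a 63% chance of containing even one infected seed, which illustrates why sample size matters for seed testing.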
Black rot can be diagnosed visually based on lesions and blackened veins. The presence of the pathogen can be confirmed by PCR or by plating tissue or seed extracts on MCS20ABN, FS, or YDC media (Roberts and Koenraadt 2007). Mucoid, yellow (Fig. 1a; FS or YDC) or green (MCS20ABN) colonies grow within four days (Roberts and Koenraadt 2007).
Control measures for black rot are limited but include copper application, crop rotation, removal of crop debris and cruciferous weeds, seed treatment, and use of pathogen-free seed (Vicente and Holub 2013). Planting tolerant varieties is recommended but not always practiced, owing to growers' varietal preferences and a lack of available tolerance in certain brassicas. No major genes for black rot resistance have been identified in commercial cole crops. Progress and challenges in Brassicaceae resistance breeding have been reviewed elsewhere (Debieu et al. 2016; Huard-Chauveau et al. 2013; Singh et al. 2018; Vicente et al. 2002). Because traditional breeding has not yet produced black rot-resistant cabbage, a better understanding of the global spread, sources of inoculum, and virulence mechanisms of Xcc is needed, both to help growers make informed choices that limit disease impact in the field and to enable breeders to pursue targeted strategies for more resistant Brassica varieties. It has been hypothesized that, as temperatures rise and weather patterns change, bacterial diseases, including those caused by xanthomonads, will emerge or reemerge more intensely, even in previously disease-free areas, increasing the need for sustainable control strategies (Kudela 2009).

Genome sequences available for understanding X. campestris taxonomy and virulence
Xylella fastidiosa, a member of the family Xanthomonadaceae, was the first plant pathogen to be sequenced (Simpson et al. 2000), and the first Xanthomonas genomes soon followed in a comparison of Xcc strain ATCC33913 and a strain of Xanthomonas axonopodis pv. citri (da Silva et al. 2002). Next, the genome of Xcc 8004, a rifampicin-resistant derivative of an isolate from infected cauliflower, was published and compared to that of ATCC33913 (Qian et al. 2005). Besides being an important plant pathogen, Xcc is used in the industrial production of xanthan, a polysaccharide used as a thickening agent and stabilizer. Thus, the next set of published genomes focused on commercial, xanthan-producing Xcc strains for optimization of xanthan biosynthesis (Tao et al. 2012; Vorhölter et al. 2008; Wibberg et al. 2015). In 2013, draft genome sequences of strains CN14, CN15, and CN16, isolated in Guilin, China, were produced using next-generation Illumina sequencing (Bolot et al. 2013b). Long-read single-molecule real-time (SMRT) sequencing (Pacific Biosciences, Menlo Park, CA, USA) has been applied to Xcc strains CN03, CN12, CN14, CN15, CN16, CN17, and CN18, isolated from areas across China (Denancé et al. 2018; He et al. 2007). Long-read sequencing overcomes limitations of earlier methods by allowing easier assembly of repetitive regions, including transcription activator-like effector (TALE) genes, discussed later in this review. In addition, six X. campestris strains from pathovars raphani (756c and CFBP 5828r), campestris (Xca5, formerly classified as armoraciae), incanae (CFBP 1606r and CFBP 2527r), and barbareae (CFBP 5825r) were sequenced, allowing comparisons of virulence factor repertoires (Bogdanove et al. 2011; Bolot et al. 2013a). The nonpathogenic X. campestris strain E1, isolated from Brassica oleracea seed, was sequenced and found to be similar to pathogenic X. campestris strains, yet lacking characteristics common to those strains, such as the type III secretion system (Lee et al. 2020). Genome sequences of 71 X. campestris strains had been published as of December 2020 (Supplementary Table 1). Each has ~65% GC content and consists of a 4.9-5.2 Mb chromosome with up to three plasmids (Supplementary Table 1).

X. campestris genomic relationships reflect pathovar groupings
In the past, the species X. campestris included an abundance of pathogenic variants (pathovars), defined by the host(s) on which they cause disease and the nature of the disease. Current nomenclature, supported by molecular, genetic, and genomic characterization, limits X. campestris to pathogens and endophytes found on Brassicaceae, including cruciferous crops, ornamentals, and the model plant Arabidopsis thaliana (Vauterin et al. 2000; Vicente et al. 2001). Six pathovars of X. campestris were proposed based on DNA-DNA hybridization: pvs. aberrans (Xcab), armoraciae (Xca), barbareae (Xcb), campestris (Xcc), incanae (Xci), and raphani (Xcr) (Vauterin et al. 2000; Vicente et al. 2001, 2006). However, more recent pathogenicity assays on an array of host plants indicate that X. campestris strains cause only three disease types and should therefore be regrouped into three pathovars, Xcc, Xci, and Xcr, with a fourth group, designated X. campestris nonpathogenic (Xcnp), for strains not known to cause disease symptoms on any host (Fargier et al. 2011; Fargier and Manceau 2007). The first disease type is vascular black rot, caused by Xcc and Xcab on Brassicas (Fig. 2a). While this disease has the greatest impact on cultivation of Brassica oleracea, including cabbage, cauliflower, broccoli, and Brussels sprouts, Xcc can also cause disease in radish (Raphanus sativus), ornamental Brassicas, and some ecotypes of the model plant Arabidopsis thaliana (Fargier and Manceau 2007; Guy et al. 2013a; Qian et al. 2005). The second disease type, bacterial blight of ornamental crucifers, including garden stock (Matthiola incana) and wallflower (Erysimum cheiri), is caused by Xci (Fargier and Manceau 2007). Xci is not known to cause disease on cabbage or other Brassica oleracea crops (Fargier and Manceau 2007).
Like black rot, bacterial blight is a vascular disease in which the pathogen enters through leaf hydathodes or wounds (Fig. 2b). Finally, Xcr and some strains classified as Xca cause bacterial spot of crucifers and tomato (Solanum lycopersicum) (Fig. 2c). Bacterial spot is nonvascular; Xcr enters through stomates to cause necrotic spots surrounded by chlorotic halos. Xcr and Xcc can be found together in Brassica fields; however, Xcc has the greater economic significance.
Molecular and whole-genome sequence comparisons support the classification of X. campestris into three pathovars and two groups of so-called nonpathogenic strains. Vicente and colleagues found that repetitive extragenic palindromic PCR (rep-PCR) of 50 strains from global collections distinguished Xcc from Xcr (Vicente et al. 2006), and Fargier and colleagues later showed that multilocus sequence analysis (MLSA) of 42 Xcc and Xcr strains did as well (Fargier et al. 2011). Of the available genomes, 23 sequences classified as X. campestris in the NCBI genomes database showed low (<95%) average nucleotide identity (ANI), indicating that they are misnamed and belong to other Xanthomonas species. We used the remaining 48 published Xc genome sequences to build an ANI matrix (Fig. 3a) with the enveomics ANI matrix calculator (Rodriguez-R and Konstantinidis 2016) and a maximum likelihood (ML) tree (Fig. 3b) with the Reference sequence Alignment-based Phylogeny builder (REALPHY) (Bertels et al. 2014), using Xcc type strain ATCC33913 as the reference. Both ANI and ML analyses revealed that Xci and the nonpathogenic strains group separately from Xcc (Fig. 3). Xcr strains were heterogeneous but grouped more closely with Xci than with Xcc (Fig. 3). However, strains from all three pathovars shared ANI of ≥97%, demonstrating that they belong to the same species despite their different host ranges and disease pathologies. The so-called nonpathogenic strains are of two types: those that appear more genetically similar to endophytes and lack a type III secretion system (T3SS), such as Xc nE1 (Lee et al. 2020) and Xc CFBP7700 (Meline et al. 2019), and those that have a full virulence repertoire (previously named Xcb; type strain CFBP5825r) (Fargier and Manceau 2007; Roux et al. 2015). Strains of these two types cluster separately from each other in both ANI and ML analyses (Fig. 3).
Because the second group of strains carries full virulence repertoires, even though these strains were not found to cause disease on the plant species tested, they could be pathogens of as-yet unidentified hosts. We therefore designate the first and second groups Xcnp and X. campestris of unknown pathogenicity (Xcup), respectively.
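The logic of using an ANI cutoff to set aside misnamed genomes and to group related strains can be sketched as follows. This is a minimal illustration that assumes a precomputed table of pairwise ANI values; it does not compute ANI itself (the analysis above used the enveomics calculator for that). The strain names and ANI values are hypothetical, not the data behind Fig. 3, and the 95% cutoff mirrors the species-level threshold used in the text. Strains are grouped by single linkage: two strains fall in the same group if a chain of pairs at or above the cutoff connects them.

```python
from itertools import combinations

SPECIES_THRESHOLD = 95.0  # ANI (%) below which genomes are treated as different species

# Hypothetical, made-up pairwise ANI values (%) for illustration only.
ANI = {
    frozenset({"Xcc_A", "Xcc_B"}): 99.8,
    frozenset({"Xcc_A", "Xci_A"}): 97.6,
    frozenset({"Xcc_B", "Xci_A"}): 97.5,
    frozenset({"Xcc_A", "misnamed_A"}): 88.0,
    frozenset({"Xcc_B", "misnamed_A"}): 88.2,
    frozenset({"Xci_A", "misnamed_A"}): 88.1,
}


def group_strains(strains, threshold):
    """Single-linkage grouping of strains via union-find on pairs with ANI >= threshold."""
    parent = {s: s for s in strains}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps the trees shallow
            s = parent[s]
        return s

    for a, b in combinations(strains, 2):
        if ANI[frozenset({a, b})] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


STRAINS = ["Xcc_A", "Xcc_B", "Xci_A", "misnamed_A"]
print(group_strains(STRAINS, SPECIES_THRESHOLD))  # species-level groups
print(group_strains(STRAINS, 99.0))               # finer groups at a stricter cutoff
```

Raising the cutoff splits the species-level group into finer clusters, loosely analogous to the pathovar-level structure visible in an ANI matrix; single linkage is only one of several possible clustering rules.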

Xcc is primarily disseminated via contaminated seed
Xcc is found in almost every area of the world where cabbage or other cruciferous crops are grown. Cabbage and other cruciferous vegetable seeds are shipped around the world. Though Xcc can overwinter in crop debris and soil and has been hypothesized to survive on cruciferous weeds or endophytically in other plants, genome sequencing and genetic comparisons of strains have shed light on the geographic distribution of Xcc and shown that seeds are the primary source of inoculum.
As early as 1898, it was hypothesized that weeds were sources of Xcc inoculum, and growers were advised to destroy weeds surrounding fields (Smith 1898). X. campestris has been isolated from many species of cruciferous weeds and from Arabidopsis (Ignatov et al. 2007; Kniskern et al. 2007; Schaad and Dianese 1981). Moreover, one study showed that black rot could spread to cabbage planted in a field around Brassica campestris, Raphanus raphanistrum, and Chenopodium amaranticolor plants artificially inoculated with an Xcc strain (Schaad and Dianese 1981). These data further encouraged removal of cruciferous weeds in cabbage-growing areas. However, a later MLSA and amplified fragment length polymorphism (AFLP) comparison of Xcc from crops with strains from cruciferous weeds surrounding agricultural areas and at uncultivated sites in California concluded that the populations were distinct (Ignatov et al. 2007). Although many of these California weed isolates were reported to cause black rot symptoms on Brassica oleracea, strains collected from crop plants in fields adjacent to the weed collection sites clustered with other crop isolates rather than with weed isolates from the same area. Krauthausen et al. (2018) observed the same separation between weed and crop isolates of Xc in the Rhine Valley in Germany. Based on the data from these two regions, as concluded by Krauthausen et al. (2018), although some weed isolates can cause disease when inoculated onto Brassica crops, strains from weeds do not appear to be a major source of black rot inoculum, perhaps because unknown factors make weed-to-crop transmission inefficient.
Seed was deduced to be the major source of inoculum by comparing the relatedness of strains found in distinct geographic regions. If the same strains reinfect crops year after year, Xcc is mainly surviving in fields, either in soil, on debris, or on alternative hosts. If the pathogen is primarily seedborne, different strains would be expected in a field or region each year, depending on the origin of the seed planted. Indeed, this was found to be the case (Lange et al. 2016). MLSA of 154 strains showed that Xcc populations in New York change each year and that strains are diverse and more closely related to isolates from around the world than to each other. Multiple studies have shown that Xcc strains group into two clades (Denancé et al. 2018; Fargier et al. 2011; Lange et al. 2016; Vicente et al. 2006). However, genetic data show high degrees of heterogeneity, derived from recombination and point mutations, among X. campestris isolates that cannot be explained by geographic origin or host identity (Fargier et al. 2011). While strain relatedness does appear to correlate with country of origin, genetic and whole-genome comparisons of global collections have revealed little geographic population structure (Denancé et al. 2018; Guy et al. 2013a; Lange et al. 2016; Vicente et al. 2006).
Thus, although crop debris and contaminated soil may be important sources of inoculum in some locations, for example tropical agricultural areas where cruciferous crops are grown year-round or areas where cultural practices do not include crop rotation, the data point to high levels of global pathogen dispersal, underscoring the conclusion that seed is the primary source of inoculum, moving Xcc around the world. While hot water seed treatments are common, and seed testing is standard practice, these do not appear sufficient to control the spread of black rot. More research in Brassica seed health may help provide guidelines for improved practices to decrease the global spread of black rot via seed.
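The inference above, that a resident population should recur from year to year whereas seedborne introductions should not, can be made concrete by measuring how many strain types are shared between consecutive seasons. The sketch below is a hypothetical illustration, not the method of Lange et al. (2016), who used MLSA and phylogenetic comparisons: it assumes each isolate has already been assigned a haplotype label, and the labels and years are made up. It reports the Jaccard overlap (shared haplotypes divided by all haplotypes seen in either year) for each pair of consecutive years.

```python
# Hypothetical MLSA haplotype labels observed in one region each year (made-up data).
HAPLOTYPES_BY_YEAR = {
    2016: {"h1", "h2", "h3"},
    2017: {"h3", "h4", "h5"},
    2018: {"h6", "h7", "h8"},
}


def jaccard(a, b):
    """Shared haplotypes divided by all haplotypes observed in either year."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consecutive_overlaps(by_year):
    """Jaccard overlap for each pair of consecutive sampled years."""
    years = sorted(by_year)
    return {(y1, y2): jaccard(by_year[y1], by_year[y2])
            for y1, y2 in zip(years, years[1:])}


for (y1, y2), overlap in consecutive_overlaps(HAPLOTYPES_BY_YEAR).items():
    print(f"{y1} -> {y2}: Jaccard overlap = {overlap:.2f}")
```

Overlaps near 0 are consistent with a population replaced each season, for example by seedborne introductions, whereas overlaps near 1 would suggest a persistent local population. Sampling depth strongly affects such estimates, so low overlap alone is not proof of seed transmission.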

Genomic analyses provide insights into mechanisms of Xcc virulence
Xanthomonas genomes have been extensively examined for genes associated with virulence factors, including exopolysaccharides, lipopolysaccharides, secretion systems, and secreted toxins and virulence effectors. The availability of many Xanthomonas genomes enables comparative analyses to identify which pathogenicity- or virulence-associated genes are conserved and therefore likely most important.
In a study of over 50 published Xanthomonas genomes representing vascular and non-vascular pathogens, presence of a hydrolase gene, cbsA, was strongly correlated with the ability to invade the vascular tissue (Gluck-Thaler et al. 2020). Heterologous expression of cbsA enabled strains from non-vascular Xanthomonas pathovars to cause vascular disease symptoms. When cbsA was knocked out in some vascular Xanthomonas pathovars, the pathogen was unable to move through the xylem (Gluck-Thaler et al. 2020). While this study focused on Xanthomonas translucens, the findings suggest that the presence of this gene in Xcc and its absence from Xcr could be the reason the former is a vascular pathogen and the latter is not. Broader comparative analysis of Xcc and Xcr genomes as well as functional analysis will be needed to explore this hypothesis.
A critical component for pathogenicity in most Xanthomonas species is the type III secretion system (T3SS), which translocates type III effector (T3E) proteins into plant host cells (Arlat et al. 1991). Once inside the cell, T3Es target host proteins or, in the case of the transcription activator-like effector (TALE) proteins discussed below, the host genome, in ways that contribute to bacterial proliferation and disease development. While a few plant pathogenic Xanthomonas species, such as X. cannabis, do not encode a functional T3SS, most do and rely heavily on a repertoire of T3Es to cause disease (Jacobs et al. 2015; Pieretti et al. 2009; White et al. 2009). Mutations in the Xcc hrp (hypersensitive response and pathogenicity) genes that encode the T3SS leave the pathogen unable to translocate effectors and render it nonpathogenic (Arlat et al. 1991; Guy et al. 2013b). Genomic and functional analyses have revealed that Xcc strains encode between 17 and 27 T3Es, while other X. campestris pathovars have 13-24 (Roux et al. 2015). Eighteen T3Es are shared among eight sequenced Xcc strains and are currently considered the core Xcc 'effectome' (Roux et al. 2015). The core effectome of the X. campestris species, based on a comparison of 14 strains from pathovars campestris, raphani, and incanae and a nonpathogenic strain, is quite small, consisting only of xopF1, xopP, and xopAL1. The functions of characterized Xanthomonas T3Es have been described elsewhere (Timilsina et al. 2020; Vicente and Holub 2013).
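Defining a core effectome is a set operation: the core is the intersection of the T3E repertoires of the strains compared, and the pan-effectome is their union. The sketch below assumes each strain's repertoire is already available as a set of effector names. Only xopF1, xopP, and xopAL1, the species-level core named above, are taken from the text; the strain names and the other entries (`effector_A` and so on) are hypothetical placeholders.

```python
from functools import reduce

# Hypothetical repertoires; only xopF1, xopP and xopAL1 come from the text.
REPERTOIRES = {
    "strain_1": {"xopF1", "xopP", "xopAL1", "effector_A", "effector_B"},
    "strain_2": {"xopF1", "xopP", "xopAL1", "effector_A"},
    "strain_3": {"xopF1", "xopP", "xopAL1", "effector_C"},
}


def core_and_pan(repertoires):
    """Return (core, pan): effectors found in every strain, and in any strain."""
    sets = list(repertoires.values())
    core = reduce(set.intersection, sets)
    pan = reduce(set.union, sets)
    return core, pan


core, pan = core_and_pan(REPERTOIRES)
accessory = pan - core  # effectors present in some but not all strains
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("pan-effectome size:", len(pan))
```

Because the core is an intersection, adding strains can only keep it the same size or shrink it, which is consistent with the species-wide core being much smaller than the 18-effector Xcc core.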
TALE proteins are found in many Xanthomonas species. In X. oryzae, X. axonopodis pv. manihotis, X. citri pv. citri, and X. gardneri, TALEs are important virulence determinants. TALEs localize to the nucleus, bind specific plant gene promoters directly, and upregulate these genes by virtue of a C-terminal activation domain. TALEs have a conserved central region of tandem repeats, each 33-35 amino acids long, that determines DNA-binding specificity: each repeat contacts a single base, so consecutive repeats specify consecutive nucleotides. A two-residue polymorphism in each repeat, called the repeat variable diresidue (RVD), predictably determines the nucleotide that repeat binds. The plant promoter sequences bound by TALEs are called effector binding elements (EBEs); they can be predicted computationally in a plant genome and confirmed via expression studies. The most common TALE targets discovered so far belong to clade III of the SWEET gene family, which encodes sucrose transporters. TALE-mediated upregulation of a SWEET gene has been shown to increase susceptibility to Xanthomonas diseases of rice, cassava, citrus, and cotton (Hutin et al. 2015; Perez-Quintero and Szurek 2019). TALE targets whose upregulation leads to increased disease, including SWEET genes, are called susceptibility (S) genes. S gene alleles with EBE mutations that prevent TALE binding have been introduced by breeding or genome editing to provide "resistance through loss-of-susceptibility" (Blanvillain-Baufumé et al. 2017; Hutin et al. 2015; Oliva et al. 2019).
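Because each RVD specifies one nucleotide, an RVD sequence can be translated into a predicted EBE and searched for in a promoter. The sketch below is a simplified illustration, assuming exact matching and only the four most common RVD specificities (NI for A, HD for C, NG for T, and NN for G or A), plus the thymine commonly required immediately 5' of the first repeat-specified base (position 0). Published prediction tools, such as TALE-NT, instead score imperfect matches. The RVD sequence and promoter fragments are hypothetical; the second promoter carries a single substitution in the EBE, mimicking the loss-of-susceptibility alleles described above.

```python
import re

# Simplified code for the four most common RVDs (as regex character classes).
RVD_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "[AG]"}


def rvds_to_pattern(rvds):
    """Regex for the predicted EBE: a 5' T at position 0, then one base per repeat."""
    return "T" + "".join(RVD_CODE[rvd] for rvd in rvds)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ebes(rvds, promoter):
    """Return (strand, start) for exact EBE matches on either strand.

    Starts are 0-based coordinates on the given (+) promoter sequence.
    """
    # A lookahead lets overlapping matches be reported.
    pattern = re.compile(f"(?=({rvds_to_pattern(rvds)}))")
    hits = [("+", m.start()) for m in pattern.finditer(promoter)]
    rc = reverse_complement(promoter)
    for m in pattern.finditer(rc):
        # Convert a start on the reverse complement back to + strand coordinates.
        hits.append(("-", len(promoter) - m.start() - len(m.group(1))))
    return hits


RVDS = ["HD", "NG", "NI", "NN", "HD"]  # hypothetical TALE
print(rvds_to_pattern(RVDS))            # TCTA[AG]C
print(find_ebes(RVDS, "GGGTCTAGCAAA"))  # [('+', 3)]
print(find_ebes(RVDS, "GGGTCTTGCAAA"))  # [] after one substitution in the EBE
```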
Initially, it was thought that Xcc lacked TALEs, as the first three complete genomes published did not contain TALE genes or gene fragments (da Silva et al. 2002; Qian et al. 2005; Vorhölter et al. 2008). The first evidence that Xcc strains might encode TALE genes came with the sequencing and reclassification of Xca5, an Xcc strain formerly classified as X. campestris pv. armoraciae (Bolot et al. 2013a). This strain expresses two TALE genes, hax3 and hax4, located on the chromosome and one, hax2, located on a plasmid. Because TALE genes consist of ~102-105 bp repeats, short-read whole-genome sequencing cannot resolve complete TALE gene sequences. The fact that a single Xanthomonas genome often carries multiple TALE genes encoding different RVD sequences adds to the challenge. However, third-generation single-molecule real-time (SMRT) long-read sequencing can resolve repetitive TALE gene sequences in their genomic context, even those in large clusters (Booher et al. 2015). SMRT sequencing is currently the gold standard for sequencing Xanthomonas genomes to capture TALE genes, and other platforms, such as MinION (Oxford Nanopore, Oxford, UK), hold promise (Bansal et al. 2018; Kaur et al. 2019).
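Why short reads struggle with TALE genes can be shown with a toy sequence: tandem repeats that are identical except for their RVD codons produce many read-length windows that occur more than once, so such reads cannot be placed unambiguously, whereas windows long enough to span the repeat array and reach unique flanking sequence are far more often unique. The sketch below builds a hypothetical TALE-like region (random flanks, 102-bp repeats sharing one random backbone, and illustrative codons for NI, HD, NG, and NN) and compares a 150-bp window, typical of many short-read runs, with a 1,200-bp window. It is not an assembler or a model of any real TALE gene.

```python
import random
from collections import Counter

# Illustrative codon pairs for each RVD (Asn-Ile, His-Asp, Asn-Gly, Asn-Asn).
RVD_CODONS = {"NI": "AACATT", "HD": "CACGAT", "NG": "AACGGC", "NN": "AACAAC"}


def build_repeat_region(rvds, flank_len=300, seed=1):
    """Hypothetical TALE-like region: random flanks around 102-bp tandem repeats
    that share one random 96-bp backbone and differ only in the 6-bp RVD codons."""
    rng = random.Random(seed)

    def random_seq(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    backbone = random_seq(96)
    # The RVD codons occupy nucleotides 34-39 of each repeat (residues 12-13).
    repeats = "".join(backbone[:33] + RVD_CODONS[rvd] + backbone[33:] for rvd in rvds)
    return random_seq(flank_len) + repeats + random_seq(flank_len)


def unique_window_fraction(seq, window):
    """Fraction of all window-length substrings that occur exactly once in seq."""
    kmers = [seq[i:i + window] for i in range(len(seq) - window + 1)]
    counts = Counter(kmers)
    return sum(counts[k] == 1 for k in kmers) / len(kmers)


RVDS = ["HD", "NG", "NI", "HD", "NG", "NI", "HD", "NN", "NG"]  # hypothetical, 9 repeats
region = build_repeat_region(RVDS)
for window in (150, 1_200):
    print(f"{window:>5}-bp windows: {unique_window_fraction(region, window):.2f} unique")
```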
A survey of 46 Xcc isolates from a global collection, which included long-read sequencing, showed that 29 strains encode a total of 26 unique TALEs (Denancé et al. 2018). Xcc strains have been found to contain between zero and four TALE genes, located on the chromosome or on plasmids (Denancé et al. 2018). Unlike TALE genes in other Xanthomonas species such as X. oryzae, Xcc TALE genes are not found in genomic clusters (Denancé et al. 2018). Analysis of Xcc TALEs provided evidence that two ancestral TALE genes underwent multiple DNA rearrangements, giving rise to the diversity of TALEs found in sequenced strains (Denancé et al. 2018). Interestingly, none of >20 Xcr strains tested by PCR and western blot analysis has been found to encode TALEs (ZD and Laurent Noël, unpublished data). This is surprising, because Xcc TALE genes are often found on plasmids and in association with mobile genetic elements, and Xcr and Xcc can coexist in the same fields or even on the same host plant. In contrast to Xcr, Xci and Xcup both encode TALEs (Roux et al. 2015). While TALEs play important roles in virulence in many Xanthomonas pathogens, more research is needed to determine whether they act as virulence factors in X. campestris.
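Tallies such as the number of strains carrying TALEs and the number of unique TALEs can be computed from per-strain TALE calls. The sketch below assumes, for illustration, that two TALEs count as the same when their RVD strings are identical, which may not match the criteria of Denancé et al. (2018); the strains and RVD strings are hypothetical.

```python
from collections import Counter

STRAINS = ["strain_1", "strain_2", "strain_3", "strain_4"]
# Hypothetical (strain, RVD string) calls; strain_4 carries no TALE.
TALE_CALLS = [
    ("strain_1", "HD-NG-NI-NN-HD"),
    ("strain_2", "HD-NG-NI-NN-HD"),  # same RVDs as the strain_1 TALE
    ("strain_2", "NI-NI-HD-NG"),
    ("strain_3", "NN-HD-NG-NI-NI"),
]


def summarize_tales(strains, tale_calls):
    """Return TALEs per strain, strains with >= 1 TALE, and distinct RVD strings."""
    per_strain = Counter({strain: 0 for strain in strains})
    per_strain.update(strain for strain, _ in tale_calls)
    with_tales = sum(1 for n in per_strain.values() if n > 0)
    unique_tales = len({rvds for _, rvds in tale_calls})
    return dict(per_strain), with_tales, unique_tales


per_strain, with_tales, unique_tales = summarize_tales(STRAINS, TALE_CALLS)
print(per_strain)
print(f"{with_tales} of {len(STRAINS)} strains carry TALEs; {unique_tales} unique TALEs")
```

Because an identical TALE can occur in several strains, the number of unique TALEs can be lower than the number of TALE-carrying strains, as with the 26 unique TALEs found across 29 strains.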
Further application of long-read sequencing to diverse Xcc isolates will help determine the prevalence and diversity of TALEs in this pathogen and help identify relevant targets in the crop species it infects. While no Xcc TALE targets have yet been experimentally confirmed, a 2018 study revealed that an unsequenced Xcc strain, Xc1, upregulates SWEET15, a clade III SWEET, in Arabidopsis thaliana (Zhang et al. 2018). Whether SWEET15 is upregulated by a TALE in Xc1 and functions as an S gene remains unknown. Future work to identify TALE-upregulated S genes in Brassica hosts could provide targets for informed breeding strategies. Breeding of varieties with naturally occurring or gene-edited promoter variations preventing TALE-mediated upregulation could result in plants less susceptible to black rot. Conceivably, even strains without TALEs may benefit from basal expression of these S genes, so gene variants with lower basal expression might reduce susceptibility broadly. Resistance through loss-of-susceptibility is considered an approach that can provide more durable means of disease control than deployment of dominant resistance genes, especially if loss-of-function alleles for multiple S genes are deployed together (Hutin et al. 2015).

Prospects for disease management
Genomic and genetic studies have informed our understanding of host specificity, transmission, and virulence. Already, genomic data have changed management strategies for black rot. However, Xcc continues to cause losses in yield and quality to Brassica growers around the world. How can the knowledge we have gained from genomic and genetic studies be further used to improve management of black rot, and what additional information is needed?
As discussed above, Xcc strains do not group geographically, indicating a global population likely disseminated on seed each year. Genetic groupings of Xcc weed and crop isolates also indicate that cruciferous weeds are unlikely to be major sources of inoculum, further highlighting seed dispersal as the primary mechanism of spread. Despite existing seed treatment and testing, Xcc remains a major problem for growers. More information is needed to understand and prevent seed transmission of Xcc, including how Xcc remains viable after seed treatment. Such studies will benefit from the available genomic and genetic resources. It will also be important to explore mechanisms driven by host biology, for example by examining variation in seed transmission across host genotypes and by investigating why infected seedlings often show no symptoms until they are transplanted to the field.
Genetic resistance is the most promising solution to prevent damage caused by black rot. However, resistant varieties are not currently available, and, for long-term storage cabbage and cauliflower, even tolerant varieties are limited. To breed more resistant Brassicas, modes of Xcc virulence must be better understood, and Xcc genomic and genetic resources will continue to advance this goal. In particular, they will help define the diversity and roles of TALEs in Xcc. If TALEs are important for Xcc virulence, one potential road toward more resistant cabbage would be to determine TALE-targeted S genes and to breed more resistant cabbage with naturally occurring or edited non-functional (or non-TALE-inducible) alleles of those genes. As noted earlier, such alleles would result in brassicas more resistant at least to strains carrying certain TALEs, and the non-functional alleles could provide even broader resistance if the S genes were important at basal levels of expression even for strains without TALEs. New insights could also be gained by further sequencing of non-pathogenic X. campestris strains and comparing them to pathogenic ones.
Continued advances in Xcc genomics and genetics, along with focused inquiry on seed transmission and host mechanisms in disease, promise to set us ahead in the historical fight against this globally destructive pathogen.