Conservation of putative transcription factor binding sites of co-expressed Plasmopara halstedii genes in two Phytophthora species

Oomycetes, a large group of fungus-like organisms, include some destructive plant pathogens that cause enormous economic damage. Phylogenetically, oomycetes belong to the kingdom Straminipila and have diverse lifestyles, ranging from saprotrophs to both generalist and specialized pathogens of various eukaryotic supergroups. The rapid increase in genomic studies and next-generation sequencing technologies has led to significant progress in understanding oomycete lifestyles. However, their genetics, including transcriptional regulation, have been studied to a much lesser extent. Here, we provide a cross-species analysis of oomycete promoters as a first step towards elucidating gene regulatory networks related to pathogenicity and life cycle stages. Clustered sequences from a Plasmopara halstedii time-series transcriptome (expression level) dataset from a previous study were used as the core reference for cross-species comparisons. Using a computational pipeline, we found 46 potential transcription factor binding site (TFBS) motifs in 25 clusters whose downstream genes are functionally conserved between this downy mildew and two Phytophthora species, irrespective of the gene expression levels in the Phytophthora transcriptomes. These candidates can now be followed up by knock-out experiments in oomycete species amenable to genetic modification.


Introduction
Oomycete plant pathogens are eukaryotic, filamentous organisms of the kingdom Straminipila that exhibit obligate biotrophic, hemi-biotrophic, or necrotrophic lifestyles (Fawke et al. 2015). The most notorious and devastating oomycetes belong to the genus Phytophthora (almost 200 species) and the downy mildews (over 700 species) (Thines 2014; Scanu et al. 2021; Chen et al. 2022). Phytophthora species are hemi-biotrophic pathogens with two lifestyle phases: they initially act as biotrophs and then switch to a necrotrophic phase in which they kill their host and feed on the dead tissue. Plasmopara halstedii, an obligate biotroph, maintains a close interaction with its sunflower host, on which it depends for survival. Most obligate biotrophs have limited or very narrow host ranges, unlike many hemi-biotrophic Phytophthora species, some of which can parasitize members of several plant families. In a ranking of oomycete pathogens by scientific and economic or environmental importance, Kamoun et al. (2015) placed Phytophthora infestans, which causes late blight of potatoes and tomatoes, in first position and Phytophthora sojae, which causes seed, root, and stem rot of soybean, in fourth position. Pl. halstedii causes downy mildew of sunflower and significant yield losses worldwide in commercial seed production (Sharma et al. 2015; Gascuel et al. 2015; Laura Martínez et al. 2021). A recent study demonstrated that several virulence mechanisms of filamentous fungal and oomycete pathogens are highly conserved and include core functions such as transport, carbohydrate metabolism, secondary metabolite synthesis, signal transduction, and amino acid metabolism (Pandaranayaka et al. 2019). However, cross-species studies of transcriptional regulation that identify regulatory motifs associated with conserved gene functions are still lacking.
Both obligate biotrophic and hemi-biotrophic oomycetes use secreted effector proteins to manipulate the host immune system to their own benefit (Bozkurt et al. 2011). Among the pathogenicity-related genes or proteins, secreted apoplastic effectors are targeted to the space outside the host cytoplasm (Tian et al. 2007; Li et al. 2020). These include small cysteine-rich proteins such as cystatin-like protease inhibitors (EPIC), Nep1 (necrosis- and ethylene-inducing peptide 1)-like proteins (NLPs), toxins, and cell wall-degrading enzyme (CWDE) glycoside hydrolase families. In plant pathogenic oomycete species, a second class of effectors, the cytoplasmic effectors, is targeted to the host cytoplasm. These include effectors with a recognizable N-terminal host-targeting domain, such as the RxLR (arginine-any amino acid-leucine-arginine)-dEER (Asp-Glu-Glu-Arg) effectors and the Crinklers (CRN), which carry the N-terminal domains LXLFLAK (Leu-Xaa-Leu-Phe-Leu-Ala-Lys), DWL (Asp-Trp-Leu), and HVLVVVP (recombination domain). Cytoplasmic effectors suppress or manipulate host processes by acting on targets within plant cells (Asai and Shirasu 2015; Selin et al. 2016).
Over the past few years, a large number of completely sequenced oomycete genomes have become available (Sivashankari and Shanmughavel 2007; Fletcher et al. 2022; Matson et al. 2022; Cox et al. 2022). In addition, rapid advances in RNA sequencing techniques have led to a growing number of studies exploring gene regulation during pathogen development and infection (Wang et al. 2009; Seidl et al. 2012). Comparative analysis of multiple genomes can provide insights into the biology of species that cannot be obtained from single genomes (O'Brien and Fraser 2005). For example, a study comparing human genes with genes from other organisms was able to assign functions to unannotated genes in all genomes investigated (Sivashankari and Shanmughavel 2007).
Phylogenetic analyses of core orthologous genes from deeply sequenced oomycete genomes have revealed a close relationship between downy mildews and Phytophthora (Voglmayr et al. 2004; Sharma et al. 2015; McCarthy and Fitzpatrick 2017; Bourret et al. 2018). To identify conserved gene regulatory elements, the current study compared the genome sequences of three related oomycete species, because closely related species are expected to share conserved sequences and are likely to encode many identifiable orthologs (Judelson et al. 1992; McCue et al. 2002). Two Phytophthora species, Ph. infestans and Ph. sojae, and Pl. halstedii were included in the analysis. A study on human and mouse tissues has shown that the DNA sequences controlling the expression of genes that are regulated similarly in related species can be expected to be conserved (Sivashankari and Shanmughavel 2007). So far, such studies are lacking for oomycetes. More recently, a de novo DNA motif discovery study in Phytophthora predicted conserved DNA motifs by correlating them with gene expression levels during infection (Seidl et al. 2012). Given the close phylogenetic relationship, key transcriptional regulatory networks can be expected to be conserved between Phytophthora and downy mildews [16,17].
The present study aims to identify regulatory motifs in co-regulated gene clusters that are also functionally conserved among three oomycete species.

Pl. halstedii time-series transcriptome dataset
In a previous study, gene expression across the life cycle of Pl. halstedii was analyzed (Bharti et al. 2023). To generate the corresponding dataset, genes with a false discovery rate-corrected (adjusted) p-value (p-adj) ≤ 0.01 were grouped by hierarchical clustering and visualized as Z-scores using DEGreport version 1.29.0 (Love et al. 2014; L Pantano 2019). The 400-nt (nucleotide) upstream sequences of these expressed genes were the starting point for the cross-species sequence conservation analyses. Figure 1 shows the procedural framework for motif discovery used in the current study.
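To make the Z-score representation concrete, the following minimal Python sketch filters genes by adjusted p-value and scales each gene's time-series profile to Z-scores. It assumes a simple, hypothetical input layout (dictionaries keyed by gene ID) and uses the sample standard deviation, as R's scale() does; it does not reproduce DEGreport itself, and the hierarchical clustering step is not shown.

```python
import statistics


def zscore_profile(values):
    """Scale one gene's expression values across time points to Z-scores.

    Uses the sample standard deviation (n - 1 denominator), as R's scale() does.
    A gene with constant expression gets all-zero Z-scores.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def select_and_scale(expression, padj, alpha=0.01):
    """Keep genes with adjusted p-value <= alpha and Z-score their profiles.

    expression: dict gene_id -> expression values, one per time point (hypothetical layout)
    padj:       dict gene_id -> adjusted p-value (genes missing here are dropped)
    """
    return {
        gene: zscore_profile(values)
        for gene, values in expression.items()
        if padj.get(gene, 1.0) <= alpha
    }
```

Scaling each gene separately means that clusters group genes by the shape of their expression profile over time rather than by absolute expression level.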

Selection of other oomycete species
The key requirement for comparing upstream regions between oomycete species is the availability of their whole genome sequences. To include only high-quality data, the well-studied Phytophthora species Ph. infestans (GCA_000142945.1) and Ph. sojae (GCA_000149755.2) were included in addition to the reference Pl. halstedii (genome assembly GCA_900000015.1). Whole genomes and functional annotations were retrieved from Ensembl Protists release 49 and the Joint Genome Institute (JGI) fungal portal (Grigoriev et al. 2014; Howe et al. 2020).

Upstream DNA sequence extraction and genomic alignments
Gene sequences and their upstream 400-nt flanking regions were retrieved using an in-house Python script. First, local nucleotide alignments of oomycete orthologues were constructed with the command-line BLAST 2.12.0 tool (blastn) (Altschul 1990; Zhang et al. 2000; Camacho et al. 2009), using an e-value threshold of < 0.01 and a word size of 11. The local alignments were then retained if they met the following criteria: hit length > 11 nucleotides (nt), identity of 50% or higher, and hits on the same strand in both the upstream sequence alignment and the gene sequence alignment.
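The extraction and filtering steps can be sketched in Python as follows. This is not the authors' in-house script: all helper names are hypothetical, gene coordinates are assumed to be 1-based and inclusive (as in GFF files), BLAST output is assumed to be in the default 12-column tabular format (-outfmt 6), and upstream and gene database entries are assumed to carry the same gene ID so that the two searches can be intersected.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=400):
    """Return up to `length` nt immediately 5' of a gene, in the gene's orientation.

    start/end: 1-based inclusive gene coordinates on chrom_seq (assumption).
    The region is truncated at the sequence (contig) boundaries.
    """
    if strand == "+":
        return chrom_seq[max(0, start - 1 - length):start - 1]
    # On the minus strand the upstream region lies to the right of `end`.
    return reverse_complement(chrom_seq[end:end + length])


def parse_outfmt6(lines):
    """Parse default 12-column BLAST tabular output (-outfmt 6)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])
        yield {
            "qseqid": f[0],
            "sseqid": f[1],
            "pident": float(f[2]),
            "length": int(f[3]),
            "evalue": float(f[10]),
            # blastn reports minus-strand subject hits with sstart > send
            "strand": "+" if sstart <= send else "-",
        }


def passing_hits(hits, max_evalue=0.01, min_length=12, min_pident=50.0):
    """Set of (query, subject, strand) for hits meeting the thresholds.

    Defaults mirror the text: e-value < 0.01, hit length > 11 nt, identity >= 50%.
    """
    return {
        (h["qseqid"], h["sseqid"], h["strand"])
        for h in hits
        if h["evalue"] < max_evalue
        and h["length"] >= min_length
        and h["pident"] >= min_pident
    }


def consistent_pairs(upstream_hits, gene_hits):
    """Query/subject pairs that pass both searches on the same strand."""
    return sorted({(q, s) for q, s, _ in upstream_hits & gene_hits})
```

Because min_pident is a parameter, the stricter identity threshold reported elsewhere for upstream alignments (> 70%) can be applied to the upstream search while the gene search keeps 50%.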

Motif discovery among aligned upstream sequences
Motifs of 4 to 18 nt (e-value ≤ 0.5) were discovered using MEME 5.2.0 (Multiple Em for Motif Elicitation; Bailey et al. 2009) for clusters with fewer than 50 genes and STREME (Bailey 2021) for clusters with more than 50 genes. The following parameters were used: zero or one occurrence per sequence, a binomial test based on differential enrichment, and a p-value threshold ≤ 0.05. In addition, an empirical 3rd-order background Markov model built from all upstream regions of the combined Pl. halstedii, Ph. infestans, and Ph. sojae genomes, which accounts for mono-, di-, tri-, and tetranucleotide distributions, was applied. A reference motif database was generated from the pairwise aligned upstream regions using STREME, and motifs discovered in clusters of more than 50 genes were searched against this database. AME (Analysis of Motif Enrichment) of the MEME suite detects motifs that are relatively enriched (e-value ≤ 0.5) in a set of sequences compared with the reference database motifs (McLeay and Bailey 2010).

Fig. 1 Procedural framework: de novo and reference database motif discovery and genomic location analysis (clusters Cl 1 to Cl 97). (2) Independent retrieval of Ph. sojae and Ph. infestans sequences from the whole genomes (Ensembl Protists release 49) and creation of two local databases (dbase U for upstream sequences and dbase G for downstream gene sequences). (3) Sequence alignments between the reference Pl. halstedii clustered sequences (Cl 1 to Cl 97) and the database sequences were made using BLAST (blastn) with e-value < 0.01, word size 11, and sequence identity > 70% for upstream sequences and > 50% for genes. (4) Upstream sequences intersecting with their gene sequences were filtered for motif analysis (motif length 4-18 nt) using the MEME (cluster size < 50) and STREME (cluster size > 50) tools of the MEME suite. In addition, a motif database (dbase M, with STREME-IDs) was generated from the orthologous upstream sequences using STREME. For the discovered motif IDs (de novo and STREME) and the reference hits (STREME-IDs), motif enrichment and exact genomic location searches were performed using AME and FIMO of the MEME suite.
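A 3rd-order background model conditions each nucleotide on the three preceding ones, so it is estimated from mono- to tetranucleotide frequencies. The minimal sketch below illustrates that estimation on the given strand only; in practice the MEME suite's fasta-get-markov utility produces background files in MEME's own format. The function names are hypothetical, and the uniform fallback for unseen contexts is an assumption.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seqs, max_k=4):
    """Relative frequencies of all 1..max_k-mers (ACGT only) in `seqs`.

    An order-3 Markov background needs k-mers up to length 4 (3-nt context + 1 nt).
    k-mers containing N or other symbols are skipped.
    """
    freqs = {}
    for k in range(1, max_k + 1):
        counts = Counter()
        for seq in seqs:
            seq = seq.upper()
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if set(kmer) <= set("ACGT"):
                    counts[kmer] += 1
        total = sum(counts.values())
        freqs[k] = {
            "".join(p): (counts["".join(p)] / total if total else 0.0)
            for p in product("ACGT", repeat=k)
        }
    return freqs


def transition_prob(freqs, context, base):
    """P(base | context) under the Markov model of order len(context)."""
    k = len(context) + 1
    row = [freqs[k][context + b] for b in "ACGT"]
    total = sum(row)
    if total == 0:
        return 0.25  # unseen context: fall back to uniform (an assumption)
    return freqs[k][context + base] / total
```

Pooling the upstream regions of all three genomes into `seqs` corresponds to the combined background described above.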

Interpretation of enriched and conserved motifs associated with genes with known functions
Genes annotated as pathogenicity-related, as encoding transcription factor (TF) proteins, or as belonging to core biological pathways or to the secretome, together with the clustered gene expression profiles, were obtained from the published Pl. halstedii genome and time-series transcriptome data (supplementary data of Sharma et al. (2015) and Bharti et al. (2023)). Experimentally validated oomycete effector sequences and WY domain information were retrieved from the published literature and the supplementary data of EffectorO (Nur et al. 2023).

Alignment of upstream nucleotide and downstream gene sequences
In a previous study on the time course of sunflower infection by Pl. halstedii, gene expression analysis revealed a total of 97 expression clusters, with 8444 genes successfully mapped to 15,707 reference gene transcripts (Bharti et al. 2023). In the current study, sequence alignments with orthologous genes from the genomes of Ph. infestans and Ph. sojae were obtained for 96 of the 97 clusters. Upstream alignments were retained with an e-value < 0.01, identity > 70%, and hits in the same orientation; a similar procedure applied to the downstream gene regions (~ 25 K gene sequences) used a sequence identity threshold of > 50% for the gene sequence alignments (Fig. 2a; Additional file 1). No orthologous genes were found in Phytophthora spp. for the 17 gene members of cluster 22, so no alignments could be performed for this cluster. The remaining 96 clusters contained 6279 sequences of up to 400 nt upstream of the ATG start codon (Additional file 1). A large number of orthologous gene members were assigned to cluster 11 (Additional file 3).

Reference motif database generation from orthologous oomycete genomes and motif discovery on clustered orthologous upstream regions intersected with downstream gene sequences
STREME generated a reference motif database containing 15 over-represented, statistically significant motifs (p-value < 0.05). Of these 15 reference motifs, STREME-5 (AGCGCGTG, 622 occurrences) was the most frequent, followed by STREME-1 (CAGCGGGGCTGCCGT, 369 occurrences) (Table 1; Additional file 2). In total, 245 novel, statistically significant (p-value < 0.05) motifs were found using MEME and STREME (Additional file 3). AME derived the genomic coordinates (p-value < 0.05) of 112 putative, cross-species conserved motifs (STREME-IDs) against the reference motif database (Additional file 3).
No significantly enriched motifs were detected for 25 of the 97 expression clusters.

Motif categorization
The 245 motifs found were further divided into five categories on the basis of upstream genomic alignment conservation (Fig. 1, 2, 3). The reference gene cluster expression profiles for Category A motifs were adapted from Bharti et al. (2023), as shown in Fig. 3. Using sequence similarity-based criteria, the 112 Category A motifs conserved in Pl. halstedii, Ph. sojae, and Ph. infestans were further analyzed for association with genes related to pathogenicity, exocytosis and vesicle transport, ion channels and calcium-binding proteins, plant cell wall-degrading enzymes (PCWDEs), and transcription factor (TF) proteins. None of the motifs was associated with protease inhibitors or ion channels (Table 1).

Association of TFBS motifs with pathogenicity-related genes, exocytosis and vesicle transport, calcium-binding proteins, plant cell wall-degrading enzymes (PCWDEs), and transcription factors
Genes in 25 clusters involved in pathogenicity, exocytosis and vesicle transport, ion channels and calcium-binding proteins, plant cell wall-degrading enzymes (PCWDEs), and transcription factor (TF) proteins showed significant over-representation of regulatory motifs (46 motifs; p < 0.05). This suggests that genes involved in these biological functions are regulated in a conserved fashion in Pl. halstedii and the two Phytophthora species, despite their different genomic locations (Table 1).
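What significant over-representation means can be illustrated with a one-sided binomial test: given n promoters in a cluster, of which k contain a motif, and a background rate p of motif-containing promoters, the p-value is the probability of observing k or more such promoters by chance. This is a simplified sketch with a hypothetical function name, not the exact statistic computed by STREME or AME.

```python
from math import comb


def binomial_enrichment_pvalue(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p).

    k: cluster promoters containing the motif
    n: promoters in the cluster
    p: fraction of background promoters containing the motif
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For instance, binomial_enrichment_pvalue(12, 30, 0.15) would give the p-value for a hypothetical motif found in 12 of 30 cluster promoters when 15% of background promoters contain it; values below 0.05 would count as over-represented under the threshold used in this study.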

RxLR and Crinkler (CRN) motifs and their upstream regulatory motifs
Upstream of putative RxLR effector-coding genes, the motifs MOTIF-60 and MOTIF-61 were found in cluster (cl) 36, and STREME-3 in cl 90 (Additional file 4). The cross-protein sequence similarity analysis showed that these motifs were associated with previously reported effectors (Fig. 4). Interestingly, two RxLR-like protein-coding genes, CEG44730 and PHYSODRAFT_293198, with the putative regulatory motifs MOTIF-60 and MOTIF-61, and the RxLR-like protein-coding gene CEG41057, with STREME-3 upstream, were retrieved (Sharma et al. 2015). Further, sequence similarity with the functionally characterized effectors Ph. infestans Avrblb2 and Ph. sojae avirulence homolog-5 (Avh5) revealed a putative location and an alternative RxLR motif (Fig. 4, Table 2). Interestingly, protein sequences are also conserved for other genes within the clusters containing RxLR-like proteins, such as radial spoke protein 11 and a DnaJ heat shock protein.

Table 1 Category A motifs classified according to gene groups and protein sequence conservation. Based on the gene groups, the upstream motifs and clusters of Category A (Pl. halstedii, Ph. sojae, Ph. infestans) were separated out. On the basis of a protein sequence similarity search and an mBed-like clustering guide tree generated using Clustal Omega, the protein sequences within the clusters were further divided into subgroups to analyze orthologous sequence conservation among the gene group(s) within cluster(s). The secretion motif information was adapted from Sharma et al. (2015) and examined for occurrences in the identified RxLR and CRN sequences within clusters. *Presence of a secretion motif in the cluster member(s).
In cl 61, MOTIF-84 was found upstream of the conserved CRN domain-containing proteins CEG41789 and PHYSODRAFT_316479, which had no similarity to previously experimentally validated effectors. In cl 65, three CRN-like genes, CEG44506 (crn1- and PsCRN108-like; conserved LQLFLAK domain) and PHYSODRAFT_471561 (crn2-like) with an LFLAK motif (FLAR) and the upstream MOTIF-85, were significantly over-represented in the cluster (Fig. 5, Table 2). Two orthologous CRN-like protein-coding genes with HVLVVVP motifs had the upstream MOTIF-38, and an HVL-VAL motif was found downstream of MOTIF-13 (in cl 9 and cl 5, respectively) (Additional file 4). The motif STREME-5 was found upstream in cl 54, where one gene, CEG44994 (Avr4/6-like), contains HVLVVVD.
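The N-terminal effector motifs discussed above can be located in protein sequences with regular expressions. The sketch below is a simplified illustration with hypothetical function names: the search windows (RxLR within the first 100 residues, EER within 40 residues after it, LxLFLAK within the first 100 residues) are assumed defaults rather than the criteria used by Sharma et al. (2015) or in this study, and signal peptide prediction is not included.

```python
import re

# Patterns for the N-terminal host-targeting motifs named in the text.
RXLR = re.compile(r"R.LR")
EER = re.compile(r"EER")
LXLFLAK = re.compile(r"L.LFLAK")
HVLVVVP = re.compile(r"HVLVVVP")


def find_rxlr_eer(protein, rxlr_window=100, eer_gap=40):
    """Return 1-based (RxLR start, EER start) or None.

    Window sizes are hypothetical defaults, not published criteria.
    """
    protein = protein.upper()
    for m in RXLR.finditer(protein[:rxlr_window]):
        e = EER.search(protein[m.end():m.end() + eer_gap])
        if e:
            return m.start() + 1, m.end() + e.start() + 1
    return None


def find_crn_motifs(protein, lflak_window=100):
    """Return 1-based start positions of LxLFLAK and HVLVVVP motifs found."""
    protein = protein.upper()
    hits = {}
    m = LXLFLAK.search(protein[:lflak_window])
    if m:
        hits["LxLFLAK"] = m.start() + 1
    m = HVLVVVP.search(protein)
    if m:
        hits["HVLVVVP"] = m.start() + 1
    return hits
```

The L.LFLAK pattern also matches the LQLFLAK variant of CEG44506, whereas the exact HVLVVVP pattern misses variants such as the HVLVVVD of CEG44994; a relaxed pattern such as HVLVVV[PD] (again an assumption) would capture it.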

Other groups and the upstream regulatory motifs
Pathogenic microbes produce a variety of peptidases, enzymes that catalyze the breakdown of host proteins into small polypeptides to disrupt host defenses and create conditions suitable for pathogen colonization (Marshall et al. 2017; Figaj et al. 2019). In the current study, orthologous ubiquitin-specific proteases, OTU (ovarian tumor)-like cysteine proteases, and serine protease family members were found in several clusters with the following regulatory motifs: MOTIF-15 (cl 6), MOTIF-17 (cl 7), MOTIF-86 (cl 65), MOTIF-98 (cl 81), MOTIF-103 (cl 88), and MOTIF-117 (cl 111) (Additional Table 1; Additional file 4).
ATP-binding cassette (ABC) transporter proteins constitute one of the largest protein families, present in both prokaryotes and eukaryotes, and transport a broad range of substances across biological membranes (Dassa and Bouige 2001). The regulatory motif MOTIF-78 (cl 56) was found enriched upstream of genes encoding ABC transporter proteins (Table 1; Additional file 4).

Fig. 3 Reference gene cluster expression profiles for Category A motifs, adapted from Bharti et al. (2023), with an enlarged image of cluster 47 on the right highlighting the different stages. For the 61 selected clusters, upstream gene sequences are conserved among Pl. halstedii, Ph. sojae, and Ph. infestans. The X-axis shows the time points and the Y-axis the Z-score (scaled relative gene expression). Clusters obtained by hierarchical clustering with at least 15 genes per cluster were selected for further motif discovery.
In oomycetes, cytoplasmic Ca2+ levels are controlled by calcium-binding proteins and channel proteins, and channels represent key targets of anti-oomycete fungicides for pathogen control (Judelson and Blanco 2005). In the current study, the regulatory motifs MOTIF-15 (cl 6) and MOTIF-17 (cl 7) were found to be associated with such proteins (Table 1; Additional file 5).
Exocytosis delivers vesicle contents, such as proteases, glucanases, and callose, from the pathogen cell to the host extracellular matrix and to the plasma membrane (Leborgne-Castel and Bouhidel 2014). Rab family GTPases and the transfer proteins SEC20 and SEC14, involved in export from the endoplasmic reticulum to the Golgi apparatus, were conserved in Pl. halstedii and the two Phytophthora species and associated with the motifs MOTIF-54 (cl 28), MOTIF-61 (cl 36), STREME-5 (cl 12), MOTIF-85 and MOTIF-86 (cl 65), and MOTIF-104 (cl 90) (Additional Table 1; Additional file 5).

Discussion
A previous study focused on regulatory motif discovery in the upstream sequences of clustered genes from an infection time series to identify stage-specific putative TFBS (Bharti et al. 2023). In this study, we took advantage of the previously published dataset and investigated whether motifs in closely related obligate biotrophic and hemi-biotrophic oomycete genomes are associated with similar genes and are thus potentially functionally conserved. To this end, we identified conserved regulatory sequence patterns in different oomycete species through motif discovery and functional conservation analysis of the clusters.

Biological interpretation of regulatory motifs in other categories
The Category B motif MOTIF-95 (cl 71; AAGGCGGAGA; 10 nt; 8 sites; genes; -97/-302 to -87/-312) contains the previously reported GAGA motif and is similar to the 7-nt cold box GGACGAG located upstream of the transcription start site essential for …

Sequence similarity with experimentally validated effector sequences from the literature and relationship between upstream sequence conservation and protein sequences. Subgrouped protein sequence conservation and the putative upstream motifs MOTIF-60 and MOTIF-61 (a) and STREME-3 (b), identified using the MEME and STREME tools of the MEME suite, for clusters 36 and 90, respectively, which contain RxLR-like and RxLR-dEER proteins.

High upstream conservation for zinc finger and myb TFs coding genes
In the present study, zinc finger and Myb TF-coding genes accounted for the largest number of Category A genes with high upstream sequence conservation. Interestingly, promoters containing one CACCT and one CACCTG element, known as targets of Smad-interacting protein 1 (Sip1) in the mouse embryo (mutations in the human gene cause a form of Hirschsprung disease), appear to be similar to the core of MOTIF-15 (cl 6; GSCACCAASYT; 11 nt; 24 sites; 21 genes; -16/-136 to -42/-376; putative CCAAT box; Remacle 1999; Postigo and Dean 2000). For the cluster 3 genes, the motif STREME-12 (TCTTCGCCAGGA; 12 sites; 11 genes; -228/-390 to -217/-379) might also be targeted by zinc finger transcription factors (Table 1).
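Degenerate consensus motifs such as MOTIF-15 (GSCACCAASYT) use IUPAC nucleotide codes. The sketch below converts such a consensus into a regular expression and reports match positions relative to the ATG, assuming the input ends immediately 5' of the ATG so that its last base is position -1. The function names are hypothetical; the sketch scans the given strand only and matches the consensus exactly, which is a simplification of the position weight matrix scoring that FIMO performs.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Compile an IUPAC consensus; the lookahead also reports overlapping sites."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=" + body + ")")


def sites_relative_to_atg(upstream, consensus):
    """Start offsets of consensus matches, counted back from the ATG.

    `upstream` must end immediately 5' of the ATG, so its last base is -1.
    Only the given strand is scanned.
    """
    n = len(upstream)
    return [m.start() - n for m in iupac_to_regex(consensus).finditer(upstream.upper())]
```

Applied to each 400-nt upstream sequence of a cluster, this yields site positions comparable in spirit to the upstream coordinates reported for each motif; scanning the reverse complement as well would make the search strand-aware.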

Similarity with experimentally validated effector sequences from literature and relationship between the upstream sequence conservation and protein sequences
Interestingly, the functionally characterized effector Avr-blb2 of Ph. infestans shows similarity to the RxLR-like protein-coding gene set associated with the 16-nt conserved MOTIF-60 (TCAWBKNSMRKCYGRD) and the 6-nt MOTIF-61 (ACAAGC) of cluster 36 (Fig. 5a; Additional file 4; Oh et al. 2009; Bozkurt et al. 2011; Sun …).

Fig. 6 Sequence similarity with experimentally validated effector sequences from the literature and relationship between upstream sequence conservation and protein sequences. Protein sequence conservation for genes associated with the motifs MOTIF-60 and MOTIF-61, as identified using MEME and STREME on cluster 36, containing DnaJ heat shock proteins (a) and radial spoke protein 11 (b). On the basis of a protein sequence similarity search and an mBed-like clustering guide tree generated using Clustal Omega, the protein sequences within the clusters were further divided into subgroups to analyze orthologous sequence conservation among the gene group …

… (Torto et al. 2003) were found to be conserved downstream of MOTIF-38 (AGAAKRYRATCAAGG) in cluster 19. The start position of the motif varies between -353 and -131 for the set of genes associated with MOTIF-38 and is at position -66 for the crn2-like gene of Ph. infestans. The conserved HVLVVVP positions of the crn1 and crn2 genes from Torto et al. (2003) were also conserved in the Pl. halstedii gene CEG48632 (Sharma et al. 2015). CEG44994, a CRN-like protein of Pl. halstedii, was found to be similar to Avr4/6 (previously Avh171), which is recognized by soybean plants carrying Rps4 and Rps6 (Dou et al. 2010). The conserved central CG-rich motif (STREME-5: AGCGCGTG), located at position -170 relative to the ATG of the CRN-like gene CEG44994, is a variant of the TGC[A/G][T/G]G[C/G]GA motif implicated in the regulation of glycolysis pathway genes in a genome-wide upstream motif analysis of the protist parasite Cryptosporidium parvum (Mullapudi et al. 2007).
Co-expression of genes, as an indicator of concerted regulation, suggests that the genes may share similar regulatory motifs. However, co-expressed genes need not share the same or similar regulatory motifs, even in closely related species. Within a single species, similar expression patterns might also result from a master regulator that controls the expression of other transcription factors at various nodes, leading to coordinated expression of multiple genes. Alternatively, the combined action of two or more independently regulated transcription factors could produce the same expression pattern (Fig. 6).

Conclusions and outlook
Genes with similar mRNA expression profiles are likely to be regulated by similar mechanisms. To test this hypothesis, the current study focused on potential transcription factor binding motifs in the promoter regions of oomycete genes and found a total of 46 conserved motifs associated with members of 25 expression clusters with a potential role in pathogenesis. Examples are MOTIF-13, associated with cluster 5, and STREME-3, associated with cluster 90. The approach taken in this study could be extended to orthologous upstream regions of other oomycete species to gain further insight into conserved regulatory pathways. Such studies could be followed up by identifying the transcription factors and co-factors that bind to the orthologous TFBS. Highly conserved, oomycete-specific binding sites could also be potential targets for devising control strategies against oomycete pathogens.

SB was supported by the DAAD doctoral program for carrying out this research.

Disclaimer
The funder had no role in study design, data collection and interpretation, decision to submit the work for publication, or preparation of the manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.