Introduction

The growth and differentiation of organisms are continuously adjusted in response to multiple environmental factors. These factors vary in intensity from moderate to severe, and they may be abiotic or biotic. Abiotic factors include nutrition, light, oxygen, water, temperature, gravity, and wind. Biotic factors are other organisms that have symbiotic, pathogenic, or herbivorous interactions with plants. Plants perceive each of these environmental factors independently and specifically (Liu et al. 2023).

Perception and global response are linked through signal transduction pathways at the cellular, systemic, and inter-organismal levels (Kucera et al. 2005; Mostofa et al. 2022). To ensure proper adaptation to the environment, the signals generated after the perception of multiple environmental factors need to be integrated and weighted according to their importance. Crosstalk between different signaling pathways in a network appears to be the basis for assessing the importance of incoming signals. Understanding these complex processes leads to a better understanding of the molecular mechanisms of adaptation, and adjusting individual signaling components can improve a plant’s adaptation to its environment.

The regulation of gene transcription is one of the key factors in plant adaptation to environmental and regional stress, and transcription factors play an important role in this regulation (Jan et al. 2019; Li et al. 2016, 2022; Shahzad et al. 2020; Shen 2022; Yin et al. 2021). Transcription factors bind to specific sequences upstream of a gene, at its 5′ end, thereby controlling when and where the gene is expressed (Jan et al. 2019; Li et al. 2016, 2022; Shahzad et al. 2020; Shen 2022; Yin et al. 2021). They generally consist of one or more DNA-binding domains and a transcription regulatory region (Li et al. 2016, 2022; Shahzad et al. 2020; Shen 2022; Yin et al. 2021). They can activate or inhibit the transcription of multiple target genes, both coding and non-coding (Li et al. 2016, 2022; Shahzad et al. 2020; Shen 2022; Yin et al. 2021). Because they perform multiple functions in plant growth and stress signaling pathways, studying the role of transcription factors in plant growth and development is of great biological significance. Studies have shown that about 7% of the genes in vascular plant genomes encode transcription factors (Li et al. 2016, 2022; Shahzad et al. 2020; Shen 2022; Yin et al. 2021). At present, more than 50 transcription factor families have been screened and identified in different plant species through bioinformatics (Li et al. 2016, 2022; Shahzad et al. 2020; Shen 2022; Yin et al. 2021). However, no such studies have been reported in Rhizophoraceae plants.

Carallia brachiata is an evergreen tree of the genus Carallia in the Rhizophoraceae and is distributed in tropical regions of the Eastern Hemisphere, including China, India, Sri Lanka, Burma, Thailand, Vietnam, Malaysia and northern Australia (Patil and Chavan 2015). C. brachiata tolerates barren soil, drought, and pollution (Junejo et al. 2020; Nalinratana et al. 2023; Xiang et al. 2016; Zhou et al. 2015; Qiao et al. 2020). Unlike other genera of Rhizophoraceae, which grow in marine water, C. brachiata grows in inland freshwater environments with salinity close to zero and may even face drought stress (Junejo et al. 2020; Nalinratana et al. 2023; Xiang et al. 2016; Zhou et al. 2015; Qiao et al. 2020). Studies have shown that C. brachiata adapts well to drought: mild drought can promote its growth, and the degree of drought affects how it allocates resources between above-ground and underground parts (Junejo et al. 2020; Nalinratana et al. 2023; Xiang et al. 2016; Zhou et al. 2015; Qiao et al. 2020). C. brachiata grows well even in poor soil or a seriously polluted atmosphere, so it is also an excellent tree species for forest transformation and has earned the name “green diamond” (Junejo et al. 2020; Nalinratana et al. 2023; Xiang et al. 2016; Zhou et al. 2015; Qiao et al. 2020).

In view of the importance of transcription factors in plant growth and development and in physiological and biochemical metabolism, as well as their potential value in improving plant adaptation, this study conducted genome-wide identification and expression analysis of the transcription factor families of C. brachiata, in order to provide data support for improving the adaptation of C. brachiata and for its use in garden and landscape design.

Materials and Methods

Description of Materials used in this Study

The materials used for genome sequencing were collected in June 2016 from healthy trees in Gaoqiao Town, Zhanjiang City, Guangdong Province, China, a typical inland area. Eight tissues for transcriptome sequencing were collected from three trees of similar age at the same location from June 2016 to May 2017. Both the genome and transcriptome data were generated by Qiao et al. (2020).

Data Preparation

Genome and transcriptome data (roots, stems, leaves, flowers, ovules, fruits, seeds, embryos) of C. brachiata were downloaded from NCBI under accession number PRJNA632974 (Qiao et al. 2020).

Data Analysis

Transcription Factors Identification on the Genome

HMM analysis (Finn et al. 2011) was performed using the PFAM search tool (Punta et al. 2012) to obtain protein domain information for all genes. Transcription factors were identified and predicted according to their DNA-binding domains (DBDs), the structural feature that characterizes transcription factors. The PFAM domains matching this criterion were selected according to the information on the PlantTFDB and PlnTFDB websites (http://plntfdb.bio.uni-potsdam.de/v3.0/).
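
The domain-based classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the rule table `FAMILY_RULES`, the function `classify_tfs`, and the gene IDs are hypothetical, and the actual PlantTFDB/PlnTFDB rules cover many more families and can also specify domains that must be absent.

```python
from collections import defaultdict

# Illustrative rules only (assumption): family -> Pfam domain names that must all be present.
FAMILY_RULES = {
    "MYB": {"Myb_DNA-binding"},
    "NAC": {"NAM"},
    "bHLH": {"HLH"},
    "AP2/ERF": {"AP2"},
    "C2H2": {"zf-C2H2"},
}


def classify_tfs(domain_hits, rules=FAMILY_RULES):
    """Map each gene to the TF families whose required domains it contains.

    domain_hits: iterable of (gene_id, pfam_domain_name) pairs,
    e.g. parsed from a Pfam/HMMER domain table.
    Genes matching no rule are left out of the result.
    """
    domains_by_gene = defaultdict(set)
    for gene, domain in domain_hits:
        domains_by_gene[gene].add(domain)

    families = {}
    for gene, domains in domains_by_gene.items():
        matched = sorted(f for f, required in rules.items() if required <= domains)
        if matched:
            families[gene] = matched
    return families


# Hypothetical domain hits for four genes.
hits = [
    ("g1", "Myb_DNA-binding"),
    ("g2", "NAM"),
    ("g3", "Pkinase"),  # no DNA-binding domain, so not reported as a TF
    ("g4", "AP2"),
    ("g4", "B3"),
]
result = classify_tfs(hits)
```

Counting the genes per family in `result` would give a table of the same shape as the per-family totals reported for the genome.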

Quality Control of Transcriptome Data

The raw data were saved in FASTQ format, and FastQC (version 0.11.2) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) was used for quality control. Adapters were removed with Trim Galore (version 0.4.0) (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).

Reference Sequence Alignment Analysis

The quality-controlled reads were aligned to the reference genome using HISAT with default parameter settings (Kim et al. 2015).

Analysis of Gene Expression Level

Gene expression was represented as FPKM (fragments per kilobase of transcript sequence per million mapped fragments). HTSeq (Anders et al. 2015) was used to calculate the gene expression level of each sample, and a gene was considered expressed if its FPKM was greater than 1 (Brooks et al. 2011). The final expression value of a gene is the average across replicates.
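
As a minimal sketch of the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments), the calculation can be written as follows. The function names and the example numbers are illustrative assumptions. HTSeq itself produces raw counts, so a step of this kind would run on its output.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """FPKM = count * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def mean_expression(replicate_values):
    """Average the FPKM values of the replicates of one tissue."""
    return sum(replicate_values) / len(replicate_values)


def is_expressed(mean_fpkm, threshold=1.0):
    """Apply the FPKM > 1 rule used to call a gene expressed."""
    return mean_fpkm > threshold


# Hypothetical gene: 500 fragments on a 2000 bp transcript, 20 million mapped fragments.
example = fpkm(500, 2000, 20_000_000)  # 12.5
```

For example, replicate values of 0.5, 1.0 and 1.5 average to 1.0, which does not pass the strict FPKM > 1 cutoff.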

Differential Gene Expression Analysis

Read counts were normalized using the edgeR package (Robinson et al. 2010). The DEGSeq R package (1.20.0) (Wang et al. 2010) was then used for differential expression analysis between tissues, and the P-values were corrected using Benjamini and Hochberg’s method (Benjamini and Hochberg 2000). A corrected P < 0.05 and |log2(fold change)| ≥ 1 were set as the thresholds for significant difference. A gene was considered tissue-specific if its expression in one tissue exceeded 80% of the sum of its expression across all tissues.
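
The selection rules in this section can be sketched in plain Python. This is an illustration under stated assumptions, not the edgeR/DEGSeq code used in the study: all function names are hypothetical, and the Benjamini-Hochberg adjustment shown is the standard step-up procedure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(adjusted_p, log2_fold_change):
    """Corrected P < 0.05 and an absolute log2 fold change of at least 1."""
    return adjusted_p < 0.05 and abs(log2_fold_change) >= 1


def specific_tissue(expression_by_tissue, fraction=0.8):
    """Return the tissue holding more than `fraction` of the summed expression, else None."""
    total = sum(expression_by_tissue.values())
    if total <= 0:
        return None
    tissue, value = max(expression_by_tissue.items(), key=lambda kv: kv[1])
    return tissue if value > fraction * total else None


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
tissue = specific_tissue({"root": 90.0, "leaf": 5.0, "stem": 5.0})  # "root"
```

Because the 80% rule compares one tissue against the sum over all tissues, at most one tissue can satisfy it for a given gene.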

GO (Gene Ontology) Enrichment Analysis

GO enrichment analysis is a direct way to characterize gene function. This study used the free analysis platform (http://www.omicshare.com/tools) for enrichment analysis of differentially expressed genes. A GO term with P ≤ 0.05 was defined as significantly enriched.

KEGG (Kyoto Encyclopedia of Genes and Genomes) Enrichment Analysis

Pathway enrichment analysis is a direct way to identify the metabolic and signal transduction pathways in which genes are involved, and it is also an important method for pinpointing key genes. This analysis was performed on the free platform (http://www.omicshare.com/tools). Pathways with P ≤ 0.05 were considered significantly enriched.
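
The statistic behind the platform's enrichment P-value is not stated here. A common choice for GO and KEGG enrichment is the one-sided hypergeometric test, sketched below as an assumption rather than as a description of the platform. The function name and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P-value, P(X >= k).

    k: genes in the study list annotated to the term or pathway
    n: size of the study list
    K: background genes annotated to the term or pathway
    N: size of the background
    """
    upper = min(n, K)
    # Sum the integer numerators first so the final division is done once.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Toy example: all 5 listed genes fall in a 5-gene pathway from a 10-gene background.
p = enrichment_p_value(5, 5, 5, 10)  # 1/252
significant = p <= 0.05
```

In practice the resulting P-values are usually corrected for multiple testing across all terms before a cutoff is applied.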

RT-qPCR Validation

To verify the expression levels of key C. brachiata genes, we used RT-qPCR. For reverse transcription, first-strand cDNA was synthesized using 5X All-In-One RT MasterMix (ABM, Canada). Relative quantification was performed with ChamQ™ Universal SYBR® qPCR Master Mix (Vazyme, China) on three biological replicates, normalized to the reference actin gene from two species respectively, and calculated by the 2^−ΔΔCt method. Primer sequences are shown in Table S1.
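
The relative quantification formula can be written out as a short function. This is a minimal sketch of the standard 2^−ΔΔCt calculation. The function name, argument names, and Ct values are illustrative, and actin is the reference gene as stated above.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target is detected 3 cycles earlier in the sample
# tissue than in the control tissue, with identical reference Ct values.
fold = relative_expression(22, 18, 25, 18)  # 8
```

With replicate measurements, the Ct values of each gene and condition are typically averaged before the calculation, or the fold change is computed per replicate and then summarized.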

Results

Transcription Factors Identification

A total of 2322 transcription factors were identified in the C. brachiata genome. According to the homology of their DNA-binding domains, these transcription factors were divided into 91 families. The five largest families were MYB, C2H2, bHLH, AP2/ERF-ERF and NAC, with 146, 135, 131, 114 and 109 members, respectively (Table 1).

Table 1 Number of transcription factors in C. brachiata

KEGG and GO Enrichment Analysis of Transcription Factors

Enrichment analysis of the 2322 transcription factors showed that they were enriched in 15 pathways, of which the first 12 were significantly enriched (Fig. 1a). Of the 15 pathways, eight belong to the metabolism class and three to the genetic information processing class (Fig. 1b). Of the remaining four pathways, two belong to environmental information processing and two to organismal systems (Fig. 1b). These four are plant hormone signal transduction, circadian rhythm-plant, plant-pathogen interaction, and MAPK signaling pathway-plant (Fig. 1a). They also have the highest numbers of enriched genes: 323, 125, 93 and 89, respectively (Fig. 1a). KEGG pathway annotation showed that 373 and 218 genes are involved in signal transduction and environmental adaptation, respectively (Fig. 1c). We list fifteen significantly enriched GO terms, of which twelve are biological processes and the other three are molecular functions (Fig. 2).

Fig. 1

KEGG pathways enriched by transcription factors expressed in C. brachiata. A, Bar diagram of enriched KEGG pathways. The darker the color, the more significant the enrichment; B, KEGG enrichment loop diagram. First circle: the top 15 enriched pathways; the scale outside the circle is the coordinate ruler for the number of differential genes, and different colors indicate different A classes. Second circle: the number of genes in the pathway in the differential gene background and the Q value; the more background genes, the longer the bar, and the smaller the Q value, the redder the color. Third circle: the total number of foreground genes. Fourth circle: the RichFactor value of each pathway (the number of differential genes in the pathway divided by the number of all genes in the pathway); each background grid line indicates 0.1; C, Statistical chart of the B-class classification of each pathway; the numbers represent the number of genes.

Fig. 2

GO enrichment of transcription factors expressed in C. brachiata. A, GO terms loop diagram; B, Description of each GO term. The legend of Fig. 2a is the same as that of Fig. 1b.

Tissue-Specific Expression (TSE) Analysis of Transcription Factors

The expression profiles of the 2322 transcription factors showed that 204 TFs were tissue-specifically expressed (Figure S1). Their numbers in the eight tissues were 34 (root), four (stem), 20 (leaf), 48 (flower), 42 (ovule), 32 (fruit), 13 (seed) and 11 (embryo), respectively (Table 2). These 204 TFs belonged to 40 families. Of them, 25, five, nine and seven genes were annotated in the plant hormone signal transduction, circadian rhythm-plant, plant-pathogen interaction, and MAPK signaling pathway-plant pathways, respectively (Table S2). Among the 40 families, the MYB family had the largest number of genes and was specifically expressed in every tissue (Table 2).

Table 2 Number of tissue specific expression TFs (TSEs) in tissues of C. brachiata.

Transcription Factors Expression Trends Analysis

We analyzed the expression trends of the 630 genes that were significantly enriched in plant hormone signal transduction, circadian rhythm-plant, plant-pathogen interaction and MAPK signaling pathway-plant. Of the 20 trend profiles, profile 0 and profile 16 were significant (P < 0.05) and were assigned 257 and 116 genes, respectively (Fig. 3). The remaining 18 profiles had P values ≥ 0.5, and only a small number of genes were assigned to them (Fig. 3).

Fig. 3

Trend analysis of transcription factors expressed in C. brachiata. A, Statistical map of the number of genes in different trend sets, red to green indicates P values from small to large; B, Profiles ordered based on the P value significance of number of genes assigned versus expected.

From profile 0 and profile 16, we screened 37 genes that were significantly differentially expressed and most highly expressed in a particular tissue, and displayed them in a heatmap (Fig. 4). These genes belong to 15 families. The three largest families were AUX/IAA, bHLH and bZIP, with six, five and five genes, respectively. The numbers of genes in root, stem, leaf, flower, ovule, fruit, seed and embryo were eight, one, ten, three, six, two, four and two, respectively (Fig. 4). IAA-2 and NAC-1 were specifically expressed in leaf, IAA-5 and ARR-A were specifically expressed in ovule, and MYB-1 was specifically expressed in fruit.

Fig. 4

Expression of key transcription factors that were significantly enriched in the plant hormone signal transduction pathway in C. brachiata. Red means higher expression.

Gene Expression Validation by RT-qPCR

To validate the reliability of the gene expression data obtained by transcriptome sequencing, six genes with differential expression among tissues were selected for RT-qPCR validation. These genes were divided into two groups: tissue-specific genes (IAA-2, IAA-5, MYB-1) and genes highly expressed in certain tissues (ARF18, ARR-B-3, ABI5). The validation results were consistent with the transcriptome data (Fig. 5).

Fig. 5

RT-qPCR validations of key genes of C. brachiata. IAA-2, IAA-5 and MYB-1 are tissue-specific genes. ARF18, ARR-B-3 and ABI5 are highly expressed in certain tissues.

Discussion

C. brachiata differs markedly from other species in the same family, for example in habitat and adaptive characteristics. It is a typical terrestrial species and, because of its attractive tree shape and its tolerance to barren soil and drought, has become a new favorite in landscaping in recent years. In this study, we explore, from the perspective of transcription factors, the molecular mechanisms by which the growth and development of C. brachiata adapt to its environment.

Transcription factors can activate or inhibit the expression of the genes they regulate by interacting with DNA or other proteins, and they play an important role in many physiological and biochemical metabolic processes of plants by participating in hormone signal transduction (Jan et al. 2019; Li et al. 2022; Shahzad et al. 2020; Shen 2022). In our study, 2322 transcription factors were identified in C. brachiata, of which 146 belonged to the MYB family. In plants, MYB is one of the largest families and is involved in regulating numerous functions, such as gene regulation in different metabolic pathways, especially secondary metabolic pathways, and regulation of different plant hormone signaling pathways (Thakur and Vasudev 2022). Among the 2322 transcription factors, 204 were tissue-specifically expressed. The MYB family had the largest number of these and was specifically expressed in every tissue (Table 2). This result also illustrates the broad role of this family.

The 2322 transcription factors were significantly enriched in 12 pathways. Four of these relate to environmental adaptation, and 698 genes were enriched in them: plant hormone signal transduction, circadian rhythm-plant, plant-pathogen interaction and MAPK signaling pathway-plant. The 204 tissue-specific genes were also significantly enriched in these four pathways, which demonstrates their importance. These pathways have been repeatedly reported to be involved in plant responses to environmental signals. The fruit of C. brachiata is often infested by pests. Genes enriched in plant-pathogen interaction and MAPK signaling pathway-plant respond to this signal (Yin et al. 2021). Understanding the interaction between plant and pathogen is of great significance for studying the mechanism of plant response to disease.

Most of the tissue-specific genes and differentially expressed genes are involved in plant hormone signal transduction, such as IAA-2, IAA-5, MYB-1, ARF18, ARR-B-3, ABI5, DELLA, PIF3/4, MYC2, NPR1 and TGA. These genes are responsible for the signal transduction of cytokinin, auxin, gibberellin, jasmonic acid, salicylic acid, ethylene and abscisic acid (Fig. 6). Transcription factors in the MYB, NAC and bHLH families have been reported to play roles in responses to drought, UV and high temperature by regulating hormone signal transduction (Blázquez et al. 2020; Li et al. 2020; Liu et al. 2007; Papon and Courdavault 2022; Wang et al. 2018, 2023; Xiang et al. 2023). C. brachiata grows in inland areas of tropical regions, where the climate is characterized by drought, UV radiation and high temperature. By participating in hormone signal transduction, these transcription factors help C. brachiata respond to its environment. Jasmonic acid and salicylic acid are two important hormones for stress response and disease resistance in plants (Hu et al. 2022; Lee et al. 2010). MYC2 is a key factor in jasmonic acid signal transduction, and NPR1 and TGA are key factors in salicylic acid signal transduction. Ethylene and abscisic acid are two further hormones that plants rely on to survive adversity (Blázquez et al. 2020). ERFs respond to ethylene and abscisic acid and help activate ABA-dependent and ABA-independent stress-responsive genes (Matilla 2020). For example, ABI5, which regulates seed dormancy, is an important factor in the response to ABA, and ERF is also involved in this process. The high expression of these genes in the reproductive organs is consistent with their roles in these processes. Hormones often have multiple roles and are involved in various processes of plant growth and development (Xie et al. 2019; Varshney et al. 2023; Zhang et al. 2023).
Cytokinin, auxin and gibberellin are the three main growth-promoting hormones in plants, with roles such as promoting cell differentiation, which leads to the formation of new branches, and inducing seed germination and stem growth (Blázquez et al. 2020; Papon and Courdavault 2022). ARRs, DELLAs and PIFs are key factors in cytokinin and gibberellin signal transduction. IAAs and ARFs are transcription factors that regulate the transcription of auxin-responsive genes during plant growth and development (Blázquez et al. 2020; Li et al. 2020). These factors with positive roles in plant growth and development acted mainly in vegetative organs (root, stem, leaf), while ethylene and abscisic acid signaling acted specifically in reproductive organs. Through hormone signal transduction, the transcription factors regulate the different tissues so that C. brachiata completes its growth and development in its environment.

Fig. 6

Regulation of key transcription factors in C. brachiata adaptation. A, B, C, D, E, F and G are the signal response pathways of cytokinin, auxin, gibberellin, jasmonic acid, salicylic acid, ethylene and abscisic acid, respectively.