Introduction

Caleosins are known as structural proteins of nano-sized oil bodies1. The first caleosin was discovered in 1996 in rice seeds responding to osmotic stress2. The name derives from a calcium-binding motif involved in the Ca2+-signalling pathway in response to biotic and abiotic stresses3,4. Unlike oleosins, which are mostly found in the seed or pollen oil bodies of higher plants, caleosins occur in microsomes and in almost all plant tissues, as well as in microalgae and fungi3. It is therefore conceivable that caleosins play a role beyond the construction of oil bodies.

The diverse activities of caleosins raise the possibility that they adopt different molecular structures. The best-known model for oleosins and caleosins is based on hydropathy analysis: a central non-cytoplasmic (N) domain is flanked by two cytoplasmic (C) domains (hereafter, CNC)5. A proline knot motif in the central domain anchors the protein in the oil body, so the terminal domains are thought to rest on the phospholipid layer in contact with the cytoplasm. However, our investigations show that CNC is not the only functional arrangement of caleosins. Although several bioinformatics analyses have been conducted on oleosins and caleosins6,7,8,9,10,11, to our knowledge this is the first time that the presence of transmembrane (TM) domains has been included in the description and classification of caleosins.

Results

Classification of caleosins

A total of 1877 plant caleosins were extracted from UniProt (supplementary file) and classified into five classes according to their transmembrane topology. Each class was named after its domain arrangement. An overview of the classification is shown in Fig. 1. Briefly, most caleosins (51% of sequences) possess no transmembrane domain and are classified as non-cytoplasmic proteins (N-class). By contrast, the well-known CNC structure accounts for only 10% of caleosins. In two further classes, the cytoplasmic domain is absent from either the N-terminus (NC-class) or the C-terminus (CN-class). Finally, 2% of caleosins have an NCN structure, in which the cytoplasmic region is flanked by non-cytoplasmic domains. The criteria and characteristics of each class are summarized in Table 1. Notably, a signal peptide was detected in some of the sequences with a non-cytoplasmic N-terminus (Table 1). DeepLoc 1.0 predicted the endoplasmic reticulum as the predominant subcellular location of caleosins, followed by the plastid and the mitochondrion (Fig. 2a).
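The naming rule described above (a class is named after the order of its non-cytoplasmic and cytoplasmic domains) can be illustrated with a minimal sketch. The region labels and the function name `caleosin_class` are hypothetical and chosen only for illustration; this is not the parsing or classification code used in the study, and a real predictor's output would first have to be converted into such an ordered list of regions.

```python
from typing import List

# Hypothetical region labels: a real predictor's output would first have to be
# parsed into an ordered list like this (N-terminus first).
CYTO = "cytoplasmic"
NON_CYTO = "non-cytoplasmic"
TM = "transmembrane"
SIGNAL = "signal-peptide"


def caleosin_class(regions: List[str]) -> str:
    """Name a topology after the order of its N and C domains."""
    letters: List[str] = []
    for region in regions:
        if region == NON_CYTO:
            letter = "N"
        elif region == CYTO:
            letter = "C"
        else:
            # TM helices and signal peptides separate domains but are not named.
            continue
        # Collapse repeats so adjacent segments on the same side count once.
        if not letters or letters[-1] != letter:
            letters.append(letter)
    return "".join(letters)


# Illustrative topologies only; none of these is taken from the dataset.
examples = {
    "no TM helix": [NON_CYTO],
    "two TM helices": [CYTO, TM, NON_CYTO, TM, CYTO],
    "signal peptide plus one TM helix": [SIGNAL, NON_CYTO, TM, CYTO],
}
for label, regions in examples.items():
    print(label, "->", caleosin_class(regions))
```

Under these assumptions the three illustrative topologies are named N, CNC and NC, respectively. A topology outside the five classes described here would simply be returned under its own letter sequence.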

Figure 1

Caleosin classification according to the transmembrane topology predicted by Phobius. The signal peptide, cytoplasmic and non-cytoplasmic regions are indicated with red, green and blue lines, respectively. The purple line shows the hydropathy profile from ProtScale, and the black dashed line marks a score of zero.

Table 1 Classification of caleosins based on the number of transmembrane (TM) domains and the domain arrangement.

Figure 2

(a) The frequency of subcellular locations for each caleosin class. (b) The percentage of each caleosin class in the major clades of plants. The caleosin classes have been named based on the order of non-cytoplasmic (N) and cytoplasmic (C) domains.

The frequencies in Table 1 mainly represent angiosperms, which make up 95% of the collected data. Nevertheless, the N-class is the most abundant caleosin class in almost all plant clades, as displayed in Fig. 2b. According to the statistical analysis, the molecular weight of caleosins is higher in Chlorophyta than in angiosperms (P < 0.01) and gymnosperms (P < 0.05).

The motif study revealed four regions (Table 2), all of which belong to the caleosin-related protein family (PF05042). The arrangement of the motifs and the distances between them were largely conserved in all classes (P < 0.01). However, the N- and NC-classes showed more substitutions in the sequences.

Table 2 Caleosin motifs discovered using MEME web server.

Phylogenetic analysis

A phylogenetic tree was generated with MAFFT using minimum linkage clustering. It resolved two main clades, in agreement with the previous report12. In general, clade I included ~90% of the CNCs and CNs, while ~70% of the NC caleosins, as well as half of the N and NCN members, were placed in clade II. Largely similar clades (~85%) were obtained using only the sequences of motifs 3 and 4.

Further investigation revealed no significant difference in acetylation between the two clades. However, the average MW was higher in clade I and the average pI was higher in clade II (P < 0.01). Previously, caleosins were divided into H- and L-isoforms according to MW, with the L-isoform suggested to have evolved from the H-isoform7.

Expression profiles of caleosin genes

The expression profiles of 154 caleosin genes from 17 plant species were extracted from the Expression Atlas database. The average expression of each class is summarized as a heatmap in Fig. 3.
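Averaging expression per class and tissue, the quantity that such a heatmap displays, can be sketched as follows. The record layout, the function name `mean_expression` and all numeric values are hypothetical and serve only to show the grouping step; they are not data from the Expression Atlas.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (caleosin class, tissue, expression level).
Record = Tuple[str, str, float]


def mean_expression(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Average the expression values of each (class, tissue) pair."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for caleosin_class, tissue, value in records:
        grouped[(caleosin_class, tissue)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Made-up values for illustration only.
records = [
    ("CNC", "seed", 120.0),
    ("CNC", "seed", 80.0),
    ("N", "leaf", 10.0),
    ("N", "seed", 20.0),
]
for (caleosin_class, tissue), value in sorted(mean_expression(records).items()):
    print(caleosin_class, tissue, value)
```

Each key of the returned dictionary corresponds to one cell of a class-by-tissue heatmap.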

Figure 3

The average expression of the caleosin classes in each tissue. The caleosin classes have been named based on the order of non-cytoplasmic (N) and cytoplasmic (C) domains.

Although a combination of caleosins is expressed throughout the plant, CNCs, CNs, and NCs accumulate more specifically in parts of the seed, fruit and flower, respectively. By contrast, Ns are ubiquitous, with high expression reported in flower, fruit, seed, leaf, and root. The highest average expression level belonged to some CNCs found predominantly in the embryo and the aleurone layer.

RNA-seq of the wheat aleurone and starchy layers revealed a decrease in NCs and a considerable increase in CNCs during the post-anthesis period13. However, the results were mixed for Ns, which could be related to their phylogenetic clade (Fig. 4).

Figure 4

Expression profile of wheat caleosins through different developmental stages13. The caleosin classes have been named based on the order of non-cytoplasmic (N) and cytoplasmic (C) domains. Moreover, the phylogenetic clade (I or II) is mentioned next to the name of each caleosin class. DPA: days post-anthesis; DAP: days after pollination.

Comparing the expression profiles with the phylogenetic tree suggested that highly expressed embryonic caleosins (e.g. Q7XQ03 and C5YBZ6) might belong to clade I, while floral caleosins (e.g. C5Z7Z3 and C5XZA3) could be located in clade II. However, statistical verification requires more data, especially for the CNCs and NCs that fall in their non-dominant clade.

Abiotic and biotic stress response

Abscisic acid (ABA) and osmotic stress upregulate caleosins and promote seed dormancy14. However, expression patterns differ depending on the tissue, developmental stage, type of caleosin and some unknown factors. An investigation of the differential expression shows the upregulation of clade I caleosins in response to ABA (Fig. 5). Moreover, there is a significant difference (P < 0.05) between N-II and caleosins of clade I in response to drought. A notable example of clade I is RD20, which is expressed predominantly in non-seed tissues and regulates stomatal closure in response to drought15. By contrast, D6PW68, an N-II caleosin, is upregulated more by salicylic acid (SA) and methyl jasmonate (MeJA) than by ABA16. The same results have been reported for some N-II caleosins from cotton17 and rice18. Interestingly, the GhCLO genes within each class have shown broadly similar expression patterns; for example, the NC-II caleosins (GhCLO1, GhCLO10 and GhCLO11) in response to NaCl, PEG and MeJA, and the CN-I caleosins (GhCLO16 and GhCLO7) in response to ABA17.

Figure 5

The differential expression of caleosins in response to 20 Mm abscisic acid. The data were extracted from the Expression Atlas database. The caleosin classes have been named based on the order of non-cytoplasmic (N) and cytoplasmic (C) domains. Moreover, the phylogenetic clade (I or II) is mentioned next to the name of each caleosin class.

Although the Expression Atlas database contained no reports on NC caleosins, AtCLO4 and AtCLO7, both NC-II caleosins, have been reported to be downregulated in response to ABA and salt stress19,20. By contrast, A2XVG1 and Q7FAX1 (NC-I) are upregulated in response to ABA2, PEG6000 and drought18,21.

Further investigation of the expression profiles revealed no significant differences between caleosins in response to biotic stress. RD20 is a well-studied caleosin that is strongly induced by the reactive oxygen species (ROS) caused by pathogens. Some reports indicate that the interaction of RD20 with an α-dioxygenase in leaf lipid droplets catalyzes the conversion of α-linolenic acid to an antifungal phytoalexin22,23.

Comparison of topology prediction approaches

Membrane proteins are underrepresented in the PDB database (∼ 1–2% of all available structures) because they are difficult to crystallize. Alternative experimental and computational methods are therefore used to study them. Here we compare the Phobius results with other algorithms and with published research to arrive at a more reliable prediction.

The description of caleosin in 1996 as an insoluble and non-cytoplasmic protein supports the possibility of a transmembrane structure2. Furthermore, the role of the hydrophobic central region of oil body membrane proteins has been confirmed by structural domain deletion24, protease protection assay25, and a structural proteomic approach26. All of these methods suggest a CNC structure for an oleosin from Arabidopsis thaliana (P29525), in agreement with computational predictions. However, different algorithms do not always produce identical outcomes, and not all predictions are supported by experiments.

As illustrated in Fig. 6a, Philius and Phobius assigned most caleosins to the N-class, whereas TOPCONS, SCAMPI and PolyPhobius favoured the CNC-class. However, the percentages of the NC- and NCN-classes were almost constant across algorithms. Nearly half of the predictions from the other algorithms matched those of Phobius, and most differences concerned the number of TMs (Fig. 6b). The highest overlap occurred between TOPCONS and SCAMPI, with 87% similarity, while OCTOPUS agreed with the other algorithms in less than 50% of cases (supplementary data).
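A comparison of this kind, in which each prediction either matches the reference or differs in the number of TMs, the domain order, or both, can be sketched as below. This is a minimal illustration under stated assumptions: a topology is reduced to a pair (number of TM helices, side of the N-terminus), "domain order" is taken to mean the N-terminal side, and the sequence identifiers and values are invented. None of this is the authors' actual procedure.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumed representation: (number of TM helices, side of the N-terminus),
# where the side is "N" (non-cytoplasmic) or "C" (cytoplasmic).
Topology = Tuple[int, str]

CATEGORIES = ("same", "tm_number", "domain_order", "both")


def difference_type(reference: Topology, other: Topology) -> str:
    """Say how two predicted topologies of one sequence differ."""
    tm_differs = reference[0] != other[0]
    order_differs = reference[1] != other[1]
    if tm_differs and order_differs:
        return "both"
    if tm_differs:
        return "tm_number"
    if order_differs:
        return "domain_order"
    return "same"


def agreement_summary(
    reference: Dict[str, Topology], other: Dict[str, Topology]
) -> Dict[str, float]:
    """Fraction of shared sequences that fall in each comparison category."""
    shared = reference.keys() & other.keys()
    if not shared:
        raise ValueError("the two prediction sets share no sequences")
    counts = Counter(difference_type(reference[k], other[k]) for k in shared)
    return {category: counts[category] / len(shared) for category in CATEGORIES}


# Invented predictions for four hypothetical sequences.
reference = {"seq1": (0, "N"), "seq2": (2, "C"), "seq3": (1, "N"), "seq4": (2, "C")}
other = {"seq1": (0, "N"), "seq2": (4, "C"), "seq3": (1, "C"), "seq4": (1, "N")}
print(agreement_summary(reference, other))
```

In this toy input each of the four categories receives exactly one sequence, so every fraction is 0.25.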

Figure 6

(a) Transmembrane topology of caleosins detected by Philius, Phobius, PolyPhobius, OCTOPUS, SCAMPI, SPOCTOPUS, and TOPCONS. (b) The frequency of identical results between Phobius and the other algorithms, and of differences arising from the number of TMs, the domain order, or both.

Further investigation indicated that TOPCONS and Phobius distinguish the phylogenetic clades better than the other programs. These two programs have also shown the highest performance in topology prediction of transmembrane and globular proteins containing signal peptides27.

Comparing the predictions with published enzymatic digestion experiments showed that Phobius is more consistent with laboratory data than the other algorithms. For example, the microsome-associated AtCLO3 is protected from proteinase K digestion3, which is in line with the N structure proposed by Phobius. However, the secondary structure of caleosins is dramatically affected by the polarity of the environment28. Regardless of how close the predictions are to the experiments, the results can be considered a suitable descriptor for distinguishing the location and function of caleosins.

Discussion

After the first description of caleosins in the 1990s, many studies focused on their structures and functions. Unlike oleosins, caleosins are ubiquitous proteins with a wide range of activities, from stress response to seed development. Although CNC has been regarded as the main structure of the oleo-proteins, caleosins were recently divided into three groups based on the position of the hydrophobic domain, which can lie in the centre, at the N-terminus or at random sites (e.g., algal and fungal caleosins)29. However, the importance of the transmembrane topology was overlooked. Consequently, this article introduces five classes (Table 1) that could explain the functions and locations of plant caleosins.

A review of other research agreed with our results and partly untangled this puzzle. Most CN and CNC caleosins have been reported as seed-specific proteins6,9,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44, while NC members have been isolated from floral parts9,17,42,45,46,47,48,49,50,51. That might explain the higher proportion of NC and CNC in gymnosperms and angiosperms, respectively (Fig. 2).

Furthermore, the N members are expressed in almost all plant tissues, including root52, flower42,53, seed21,37,54, and leaf9,16,55. Hydrophobicity is expected to localize N caleosins outside the cells, where they could sense exogenous stimuli and participate in signal transduction. However, optical coherence tomography revealed them on the periphery of the plasma membrane, passing through the epicyte and extending into the cytoplasm16. Ns have been reported to play a role in basal resistance to drought18,44, salinity29, low temperature56,57, and pathogen infection16, and to react with various substrates in the oxylipin pathway58,59. The best-known member of the N-class, RD20 or AtCLO3, is upregulated following exposure to stress, which leads to the production of oxidized fatty acids in the ABA and salicylic acid signalling pathways. Lipid oxidation results in reduced ROS levels, minor cell death, and delayed floral transition60,61. Additionally, rd20 knock-out plants show enhanced stomatal opening and reduced tolerance to drought62.

Although the N-class is the prevalent caleosin class in plants and bacteria, fungi contain a combination of classes in which CNCs predominate (~40%) (data not shown). In addition, the CNC- and NC-classes account for 90% of plant oleosins and are mainly expressed in seed and flower parts, respectively (data not shown). This agrees with other structural classifications of oleosins8,11. These data clarify the importance and role of each class of oleo-proteins.

Methods

Characteristics of caleosin proteins

Protein sequences were collected from UniProt version 2022–11 (http://uniprot.org/) using “caleosin” as a keyword, and the results were limited to Viridiplantae in the taxonomy section. Transmembrane topology was then predicted with Phobius (http://phobius.sbc.su.se)63 and TOPCONS (https://topcons.cbr.su.se)27. Hydropathy plots were produced with ProtScale (http://web.expasy.org/protscale/) using the Kyte-Doolittle method64. Furthermore, the subcellular location, acetylation, and theoretical isoelectric point (pI) and molecular weight (MW) values were predicted with DeepLoc 1.0 (https://services.healthtech.dtu.dk/service.php?DeepLoc-1.0)65, NetAcet 1.0 Server (http://www.cbs.dtu.dk/services/NetAcet/)66, and Expasy (https://web.expasy.org/compute_pi/)67, respectively.
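A Kyte-Doolittle hydropathy profile is a sliding-window average of per-residue hydropathy values. The sketch below shows the idea with a plain, unweighted window. The window size of 9 and the function name `hydropathy_profile` are assumptions made for illustration, and the web tool may apply different defaults or window weighting.

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Unweighted sliding-window mean of Kyte-Doolittle values.

    The window size is an assumed default. Non-standard residue codes
    (for example X) raise a KeyError instead of being silently skipped.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# Toy sequence: a hydrophobic stretch followed by a charged residue.
print(hydropathy_profile("IIIIIIIIIR"))
```

Windows scoring above zero indicate hydrophobic stretches, which is why a zero line is a natural reference when reading such a plot.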

Motif analysis of the caleosins

The dataset was analyzed using MEME Version 5.2.0 (meme-suite.org/tools/meme)68 with zero or one occurrence per sequence for four different motifs. The annotations of the motifs were investigated using the GenomeNet website (http://www.genome.jp/tools/motif/).

Gene expression profile analysis

The gene expression levels of plant caleosins were obtained from the Expression Atlas database (https://www.ebi.ac.uk/gxa/home). The data were plotted with Heatmapper (www.heatmapper.ca/)69 using default parameters and the Pearson distance measure.
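A Pearson distance between two expression profiles is commonly defined as one minus their Pearson correlation coefficient. The sketch below uses that definition as an assumption; the exact formula applied by the plotting tool was not checked here, and the function name `pearson_distance` is illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - r, where r is the Pearson correlation of x and y.

    Assumed definition: 0 for perfectly correlated profiles and 2 for
    perfectly anti-correlated ones.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1 - covariance / sqrt(spread_x * spread_y)


# Made-up expression profiles across three tissues.
print(pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Because the measure depends on the shape of a profile and not on its magnitude, two caleosins with very different absolute expression levels can still cluster together.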

Protein alignment and phylogenetic study

Multiple sequence alignment and phylogenetic analysis were performed with MAFFT (https://mafft.cbrc.jp/alignment/server/index.html) using default parameters70.

Statistical analysis

Data were subjected to analysis of variance (ANOVA), t-test and Wilcoxon test using R software (R-3.6.3). P values below 0.05 were considered significant.