Combination of transcriptomic and proteomic approaches helps to unravel the protein composition of Chelidonium majus L. milky sap

Main conclusion A novel annotated Chelidonium majus L. transcriptome database composed of 23,004 unique coding sequences significantly improved the sensitivity of proteomic C. majus assessments, which revealed novel defense-related proteins characteristic of its latex.
To date, the composition of Chelidonium majus L. milky sap and the biosynthesis of its components are poorly characterized. We, therefore, performed de novo sequencing and assembly of the C. majus transcriptome using Illumina technology. Approximately 119 Mb of raw sequence data was obtained. Assembly resulted in 107,088 contigs, with an N50 of 1913 bp and an N90 of 450 bp. Of 34,965 unique coding sequences (CDS), 23,004 were annotated, and the resulting CDS database served as a basis for further proteomic analyses. The database was then used for the identification of proteins from C. majus milky sap and whole plant extracts analyzed using a liquid chromatography–electrospray ionization-tandem mass spectrometry (LC–ESI-MS/MS) approach. In total, 334 different putative proteins were identified in C. majus milky sap and 1155 in C. majus whole plant extract. The quantitative comparative analysis confirmed that C. majus latex contains proteins connected with response to stress conditions and generation of precursor metabolites and energy. Notable proteins characteristic of latex include major latex protein (MLP, presumably belonging to the Bet v1-like superfamily), polyphenol oxidase (PPO, which could be responsible for browning of the sap after exposure to air), and enzymes responsible for anthocyanidin, phenylpropanoid, and alkaloid biosynthesis.
Electronic supplementary material The online version of this article (doi:10.1007/s00425-016-2566-7) contains supplementary material, which is available to authorized users.


Introduction
Greater celandine (Chelidonium majus L.) (Fig. 1), an herbaceous perennial plant, belongs to the Papaveraceae family, which is an important source of biologically active substances. Milky sap and extracts of greater celandine are used in traditional medicine to treat papillae, warts, and condylomas, which are symptoms of human papilloma virus (HPV) infections. C. majus extracts are also used to treat liver disorders and fight fever (Hiller et al. 1998). The medicinal and pharmaceutical interest in this plant is based on its ability to synthesize alkaloids, flavonoids, and phenolic acids (Colombo and Bosisio 1996). It is often suspected that many of these substances may be either synthesized or stored in the laticifers of the plant. These specialized cells produce milky sap (latex) that exudes when the plant is injured (Hagel et al. 2008). They are located in phloem areas, forming an internal, articulated, and non-anastomosing system throughout the whole plant (Hagel et al. 2008). Sequestration of bioactive compounds into laticifers might therefore protect the plant from the effects of its own toxins and provide a defense against herbivores (Hagel et al. 2008; Konno 2011; Escalante-Perez et al. 2012).
Previous proteomic studies of C. majus milky sap using two-dimensional electrophoresis (2-DE) and liquid chromatography-electrospray ionization-tandem mass spectrometry (LC-ESI-MS/MS) revealed the presence of about 21 proteins, mostly with defense- and pathogenesis-related properties (Nawrot et al. 2007a). However, the sensitivity of the methods used suffered from the lack of a relevant database of reference sequences.
Chelidonium majus (Fig. 1) is a non-model plant, and to date, its genome is unsequenced. Since assembly of such a genome could be very laborious and correct gene annotation would require either homology-based or transcriptomic information, we decided to use RNA sequencing to create such a database. Recent advances in next-generation sequencing (NGS), with the possibility of quantifying transcript abundances, have greatly lowered its cost and provided insight into diverse plant species, including those with medicinal properties (Hao et al. 2011; Hamilton and Buell 2012). De novo transcriptome sequencing and characterization has been performed successfully for Taxus wallichiana var. mairei (Lemée & H.Lév.) L.K. Fu & Nan Li (Hao et al. 2011), Phalaenopsis spp. (orchid) (Fu et al. 2011), Macleaya spp.
The aim of the present study was to sequence, assemble, and annotate the C. majus transcriptome. We created a database for protein identification from mass spectrometry data. The database allowed us to significantly improve the sensitivity of previous assessments of the composition of C. majus milky sap, which relied only on NCBInr (National Center for Biotechnology Information non-redundant) database searches with the Viridiplantae filter (Nawrot et al. 2007a, b, 2014). A schematic representation of the overall sequencing, annotation, and protein identification workflow is presented in Fig. 2. The integration of transcriptomic and proteomic data in this study shows that emerging biological questions, such as the protein composition of plant latex, can be addressed in non-model plant species for which no genome sequence is available.

Plant material
Chelidonium majus L. plants were grown in the IPK Gatersleben greenhouse from C. majus seeds collected in Poznań, Poland, on 12 June 2012. A voucher specimen of the seeds is deposited in the Department of Molecular Virology, Faculty of Biology, Adam Mickiewicz University in Poznań, Poland. The plants were cultivated in small cuvettes with soil at pH 5.5 and, after 4 weeks, transferred to bigger pots. Supplemented with mineral substrates (N 6 g/m², P₂O₅ 7 g/m², K₂O 9 g/m²), they were cultivated for 6 weeks until reaching a height of ca. 25-30 cm.

Illumina HiSeq 2000 sequencing
Chelidonium majus RNA samples were isolated from the stem of a plant with exuding milky sap (Fig. 1) using the RNeasy Plant Mini Kit (Qiagen). cDNA synthesis was carried out and a cDNA library was prepared using the TruSeq RNA Library Prep Kit v2 (Illumina Inc., San Diego, CA, USA). After RNA quality validation on an Agilent Technologies 2100 Bioanalyzer, the library was sequenced using an Illumina HiSeq 2000 (Illumina Inc.).

De novo transcriptome assembly
Paired RNA-seq reads were pre-trimmed at the first undetermined base or at the first base having a Phred quality below 20. Pairs in which one or both reads were shorter than 31 bases after trimming were excluded from the assembly process. In total, 112,544,804 (84 %) of 133,550,365 read pairs passed filtering. Subsequently, the transcripts were assembled de novo using Trinity r2013-02-25, which is specifically designed for de novo transcriptome assembly, with default settings (Grabherr et al. 2011). Protein coding sequences (CDS) were predicted using TransDecoder (http://transdecoder.sourceforge.net/) from the Trinity package. Non-redundant CDS were selected for annotation and further analyses. De novo assembly was performed by VitaInSilica (http://www.vitainsilica.pl/).
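The trimming and filtering rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the in-memory representation of reads as (sequence, list of Phred scores) tuples, and the use of "N" to mark an undetermined base are assumptions made for illustration only.

```python
def trim_read(seq, quals, min_q=20):
    """Cut a read just before the first undetermined base ('N', assumed)
    or the first base whose Phred quality is below min_q."""
    for i, (base, q) in enumerate(zip(seq, quals)):
        if base == "N" or q < min_q:
            return seq[:i]
    return seq


def filter_pairs(pairs, min_len=31, min_q=20):
    """Keep only read pairs in which both mates are at least min_len
    bases long after trimming."""
    kept = []
    for (seq1, quals1), (seq2, quals2) in pairs:
        t1 = trim_read(seq1, quals1, min_q)
        t2 = trim_read(seq2, quals2, min_q)
        if len(t1) >= min_len and len(t2) >= min_len:
            kept.append((t1, t2))
    return kept


# Hypothetical toy data: the first pair passes, the second is dropped
# because one mate has a low-quality base at position 10.
good = ("A" * 40, [30] * 40)
low = ("C" * 40, [30] * 10 + [5] + [30] * 29)
pairs = [(good, good), (good, low)]
print(len(filter_pairs(pairs)))
```

With the thresholds from the text (Phred 20, 31 bases), a pair is discarded as soon as either mate becomes too short, which is why the number of surviving pairs, rather than single reads, is reported.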

Gene annotation and analysis
To obtain comprehensive information on the function of predicted CDS, we used Blast2GO, which annotates sequences based on ontological data retrieved for the top BLAST (Basic Local Alignment Search Tool) hits of each sequence. To generate the comparison, a local BLASTx search with an e-value cutoff of 1e-10 against the complete RefSeq (NCBI Reference Sequence) protein database (release 64, available at the NCBI FTP server from 14 March 2014) was performed. The resulting XML document was imported into the Blast2GO software and refined by imposing a more stringent cutoff (1e-25). After InterProScan analysis, the program assigned provisional names and gene ontology (GO) terms to annotated CDS. The additional BLAST analysis performed during the annotation step allowed us to scrutinize sequences for remaining contaminants of non-plant origin. Additionally, KAAS [Kyoto Encyclopedia of Genes and Genomes (KEGG) Automatic Annotation Server] was used to functionally annotate genes by amino acid sequence comparisons against a manually curated set of ortholog groups from dicot plants in KEGG GENES (http://www.genome.jp/kegg/genes.html).
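The two-stage e-value filtering can be sketched as follows. This is a simplified illustration, not the Blast2GO procedure itself: hits are assumed to be already parsed into hypothetical (query, subject, e-value) tuples, and the function name and the choice to keep a single best hit per query are assumptions.

```python
def best_hits(hits, cutoff=1e-25):
    """Return, for each query, the hit with the lowest e-value among
    those at or below the cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > cutoff:
            continue  # fails the stringent cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits that already passed the permissive 1e-10 search cutoff.
hits = [
    ("cds1", "refA", 1e-40),
    ("cds1", "refB", 1e-60),
    ("cds2", "refC", 1e-12),  # passes 1e-10 but not 1e-25
]
print(best_hits(hits))
```

Running the search with a permissive cutoff and tightening it afterwards means the stricter threshold can be changed without repeating the BLAST search.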

One-dimensional gel electrophoresis (1-D, SDS-PAGE)
To verify the protein composition of protein samples, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was carried out in a slab mini-gel apparatus according to Laemmli (1970), using 10 % polyacrylamide as the separating gel and 5 % polyacrylamide as the stacking gel. The proteins were reduced by heating them to 100 °C in the presence of 2-mercaptoethanol for 5 min. About 50 µg of each sample was loaded onto the gel, including two technical replicates. After SDS-PAGE, the gels were fixed and stained using sensitive Coomassie blue staining (Neuhoff et al. 1988).

Liquid chromatography and tandem mass spectrometry analysis (LC-MS/MS)
Gel bands were digested with trypsin and, finally, all were subjected to LC-ESI-MS/MS analysis in the Mass Spectrometry Laboratory, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland. Tryptic peptide mixtures were analyzed by LC-MS/MS using nanoflow HPLC and a Linear Trap Quadrupole (LTQ) Orbitrap XL mass spectrometer (Thermo Fisher Scientific) as the mass analyzer. Peptides were eluted from a 75-µm analytical column on a linear gradient from 10 to 30 % acetonitrile over 50 min and sprayed directly into the LTQ-Orbitrap mass spectrometer.

Quantitative comparative analysis of C. majus milky sap proteins
For quantitative, comparative analysis of C. majus milky sap with C. majus whole plant samples based on emPAI (exponentially modified protein abundance index) values of individual proteins, we analyzed data from three independent specimens of latex and three corresponding whole plant extracts. The emPAI value is proportional to protein abundance in a protein mixture (Ishihama et al. 2005). The approximate relative abundance of each protein, calculated from emPAI according to the formula (emPAI value × 100)/sum of all emPAI values in the category, was averaged among three latex replicates and compared with the corresponding value calculated for whole plant extract. Proteins with a sap/extract ratio >2 and a t test significance level <0.1 were considered significantly overrepresented in the sap.
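The relative abundance formula and the sap/extract ratio can be illustrated with a short Python sketch. It is a simplified reading of the procedure: each replicate is assumed to be a dictionary mapping a protein identifier to its emPAI value, the "category" is taken to be the whole replicate, proteins absent from a replicate are counted as zero, and the t test is left out. The function names and the toy values are hypothetical.

```python
from statistics import mean


def relative_abundance(empai):
    """Convert emPAI values of one replicate into percentages:
    (emPAI value x 100) / sum of all emPAI values."""
    total = sum(empai.values())
    return {protein: value * 100.0 / total for protein, value in empai.items()}


def sap_extract_ratio(sap_reps, extract_reps, protein):
    """Ratio of the mean relative abundance in sap replicates to the
    mean relative abundance in whole plant extract replicates."""
    sap = mean(relative_abundance(r).get(protein, 0.0) for r in sap_reps)
    ext = mean(relative_abundance(r).get(protein, 0.0) for r in extract_reps)
    return sap / ext if ext > 0 else float("inf")


# Hypothetical emPAI values for two proteins in one replicate of each type.
sap_reps = [{"MLP": 1.0, "other": 3.0}]       # MLP = 25 % of the sap
extract_reps = [{"MLP": 1.0, "other": 9.0}]   # MLP = 10 % of the extract
print(sap_extract_ratio(sap_reps, extract_reps, "MLP"))
```

A protein detected only in the sap has no finite ratio, so the sketch returns infinity for it; such proteins would need to be reported separately, as "present only in the sap".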

Biological networks gene ontology tool (BiNGO) analysis
The overrepresentation of ontological terms in the tested set (milky sap) compared with the whole plant extract proteome and transcriptome was assessed using the BiNGO tool. BiNGO (Maere et al. 2005) is a Cytoscape (Saito et al. 2012) plugin that determines which GO categories are statistically over- or underrepresented in a set of genes. It maps the predominant functional themes of a given gene set on the GO hierarchy and outputs this mapping as a Cytoscape graph.
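The idea behind such an overrepresentation test can be sketched with a one-sided hypergeometric test. Which test and which multiple-testing correction were selected in BiNGO is not stated here, so the choice of the hypergeometric test, the function name, and the numbers in the example are assumptions for illustration only.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of observing at least k annotated items in a test set
    of size n, drawn without replacement from a reference set of size N
    that contains K items annotated with the GO term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical example: 10 reference proteins, 3 annotated with a term,
# and a test set of 4 proteins of which 2 carry the term.
print(hypergeom_pvalue(2, 4, 3, 10))
```

A small p-value indicates that the term occurs in the test set more often than expected from the reference set; when many GO terms are tested, the p-values would additionally need a multiple-testing correction.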

Illumina sequencing and de novo C. majus transcriptome assembly
To characterize the C. majus transcriptome, RNA was extracted from the young stem of a 6-week-old C. majus plant with exuding milky sap (Fig. 1c) and sequenced with Illumina paired-end sequencing technology, generating approximately 119 Mb of raw sequence data from 133,550,365 reads with an average length of almost 100 bp (Fig. 3; Table 1).
After the quality and adaptor trimming process, de novo transcriptome assembly was performed using the Trinity suite, which is specifically designed for de novo transcriptome assembly (Grabherr et al. 2011). This resulted in 107,088 contigs, with an N50 of 1913 bp and an N90 of 450 bp (Table 1).
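The N50 and N90 statistics quoted above follow the standard definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches 50 % or 90 % of the total assembly length. A minimal Python sketch of that definition is given below; the function name and the toy contig lengths are hypothetical and unrelated to the actual assembly.

```python
def nx(lengths, x=50):
    """Return the Nx statistic: the length of the contig at which the
    cumulative length of contigs sorted from longest to shortest first
    reaches x % of the total length."""
    threshold = sum(lengths) * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths (bp), total length 54.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(contigs, 50), nx(contigs, 90))
```

Because N90 is reached only after including many short contigs, it is always less than or equal to N50, consistent with the reported values of 450 bp and 1913 bp.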
Detailed analysis of BLAST results revealed that 33 coding sequences (0.09 %) may be contaminants of either fungal or bacterial origin (likely transcripts of plant commensal flora). These sequences were removed from the dataset (Supplementary Table S1). About 65.85 % (23,004) of the sequences were annotated by Blast2GO with provisional names and GO categories (Table 1, Supplementary Table S1). Among them, 57.00 % were associated with molecular function, 46.42 % with biological processes, and 37.26 % were assigned to cellular components (Fig. 4; Supplementary Table S2). GO terms associated with primary metabolites were found, such as universal building blocks of sugars, amino acids, nucleotides, lipids, and energy sources. In addition, GO terms associated with macromolecule metabolic processes (6995), small molecule metabolic processes (5042), and protein metabolic processes (4189) were found. Analyzing the obtained transcripts, we found sequences connected to the major branches of metabolism and signal transduction that are expected in complex systems (Garzon-Martinez et al. 2012). A brief summary of the GO annotation is shown in Fig. 4. It is noteworthy that we found 4424 transcripts that fall into the "response to stimulus" category (BP level 2), as this group includes candidate genes for pathogen resistance and many putative components of the milky sap. All unique coding sequences were used to construct the reference database for further mass spectrometry analysis.

Analysis of overrepresented GO terms in C. majus latex
We used BiNGO to compare GO terms between C. majus milky sap and C. majus whole plant extract proteomes. The analysis comprised two steps. The first step was the comparison of the C. majus latex proteome with the C. majus transcriptome (Fig. 5a). The abundance of proteins involved in response to stress, response to biotic stimulus, response to abiotic stimulus, secondary metabolic process, and cellular homeostasis is in agreement with the putative protective function of the milky sap. In contrast, it is much harder to explain the overrepresentation of photosynthesis, carbohydrate metabolic process, or generation of precursor metabolites and energy. The second step was the comparison of the latex proteome against the whole plant extract proteome. The results (Fig. 5b) confirmed overrepresentation of the terms response to biotic stimulus, response to stress, and generation of precursor metabolites and energy.

Analysis of C. majus protein composition
Previous functional assignments of C. majus proteins relied on homology-based information, because sequence information for this plant was not available (Remmerie et al. 2011; Nawrot et al. 2007a, b, 2014). The novel transcriptome-based C. majus database allowed us to significantly improve the sensitivity of previous proteomic assessments of the composition of C. majus milky sap and extracts and to provide meaningful biological information. For proteomic assessments using the novel database, we used two types of datasets. One of them was previously published data on the proteomic content of C. majus whole plant extracts, obtained by protein separation using 1-D SDS-PAGE followed by LC-MS/MS analysis of individual gel bands (i.e., a shotgun approach) (Nawrot et al. 2014). The main advantage of such an approach is sample purity after SDS-PAGE, which improves proteome coverage of the analyzed samples (Schulze and Usadel 2010; Matros et al. 2011). The other dataset comprised C. majus milky sap samples analyzed using the same approach. For this purpose, we prepared three independent specimens of latex (isolated from stems of 6- to 8-week-old plants) and applied the shotgun approach to identify proteins (1-D gel electrophoresis, subsequent whole lane trypsin digestion, and LC-ESI-MS/MS analysis). For both types of datasets, raw proteomic data were searched against the annotated C. majus CDS database using Mascot software. In total, 334 different putative proteins were identified for C. majus milky sap (for all three replicates) and 1155 for C. majus whole plant extract (Supplementary Tables S3, S4). The complexity of proteins in C. majus milky sap is hence about 3.5 times lower than in whole plant extract. The identified proteins from both milky sap and whole plant extract and the unique CDS from transcripts were assigned to the most represented KEGG pathways in C. majus, and their numbers are compared in Supplementary Table S5.

Chelidonium majus milky sap protein composition based on comparative analysis
To demonstrate the power of our new C. majus CDS database, we performed a quantitative, comparative analysis of C. majus milky sap and whole plant protein samples based on emPAI values obtained during the Mascot search (Supplementary Table S3) to reveal proteins that are overrepresented in the latex and hence potentially "sap-specific". The average relative abundance of each protein across three latex replicates was compared with the corresponding value calculated for three whole plant extract replicates (Supplementary Table S4). The resulting ratio indicates the level of overrepresentation of the identified protein in the specified type of sample. Proteins observed in all sap samples with a sap/extract ratio >2 and a t test significance level <0.1 were considered significantly overrepresented in C. majus milky sap. Overall results of the analysis are presented in Supplementary Table S4. Of the 334 initially identified proteins, 29 met the above-mentioned criteria and were classified as overrepresented in the milky sap (Table 2).
Table 2 Protein identification results are presented according to their relative abundance in the latex compared with the relative abundance of the same protein hits in C. majus whole plant extract (ratio of the averages of the relative emPAI values of the latex and whole plant extract). Proteins were selected according to t test values (P < 0.1), the presence of identified proteins in at least two of three sap replicates, mean sap/extract ratio >2, and mean % emPAI milky sap values >0.1. The list of all overrepresented proteins is enclosed in Supplementary Table S4. * The t test was used to determine whether the emPAI values of selected identified proteins are significantly different between C. majus milky sap (n = 3) and C. majus whole plant extract samples (n = 3) with P < 0.1.

Stress- and defense-related proteins overrepresented in C. majus milky sap
One of the overrepresented, highly abundant proteins identified only in C. majus milky sap, but not in C. majus whole plant extract, was MLP-like protein 28, which belongs to the major latex protein (MLP) class (Table 2, nos. 1-4, 7, 10). It accounted for more than a quarter (26.47 %) of the protein content of the sap belonging to stress- and defense-related proteins. The second major latex protein identified was MLP-like protein 34 (Table 2, no. 5), which accounted for 1.73 % of the protein content of the milky sap. The major latex protein/ripening-related protein (MLP/RRP) subfamily is the second largest subfamily among plant proteins, with 60 members, 31 of which are from Arabidopsis thaliana (L.) Heynh. (Radauer et al. 2008). The members of this subfamily were first described as proteins abundantly expressed in the latex of opium poppy (Papaver somniferum L.) (Nessler and Burnett 1992; Decker et al. 2000). The biological function of the MLP/RRP proteins is still unknown, but they have been associated with fruit and flower development and with defense or stress response (Radauer et al. 2008). Based on modest sequence similarity, they have been characterized as members of the Bet v1 protein superfamily (Bet_v_I [Pfam:PF00407]) (Lytle et al. 2009). The most distinctive feature of the Bet v1 fold is a large solvent-accessible hydrophobic cavity, which may function as a ligand-binding site (Radauer et al. 2008). Another stress- and defense-related protein found in C. majus milky sap, accounting for 2.22 % of the protein content of the sap, is polyphenol oxidase (PPO), which is present mainly in the latex (Table 2, nos. 6, 8-9). Two unigenes of PPO were found in C. majus latex: one of them is present only in the sap (Table 2, no. 8) and the second is ca. 33 times more abundant in the sap than in whole plant extract (32.76-fold, Table 2, no. 6).
Polyphenol oxidases or tyrosinases (PPO), also known and reported under various names (phenolase, catechol oxidase, catecholase, monophenol oxidase, o-diphenol oxidase, and orthophenolase) based on substrate specificity, are widely distributed in plants and fungi (Wititsuwannakul et al. 2002). The activation of PPO leads to the oxidation of phenolic compounds, consequently enhancing resistance. PPO is also involved in the generation of reactive oxygen species (ROS). It was observed that exuding yellow C. majus milky sap becomes brown after exposure to air. Such browning reactions are caused by the presence of PPO family genes (Mayer 2006). Other data show that the latex of different plants coagulates when exposed to air. Therefore, it is proposed that natural latex has a protective function, sealing wounds, acting as a barrier to microorganisms, and discouraging herbivory (El Moussaoui et al. 2001; Wahler et al. 2009).
The consistency of latex itself has a defensive role because the glue-like exudate seals wounds in the plant against pathogen attack and coats the mouthparts of herbivores (Hagel et al. 2008; Konno 2011; Escalante-Perez et al. 2012).
Anthocyanidin-o-glucosyltransferase-like is an enzyme involved in the biosynthesis of anthocyanidins, which belong to the flavonoids. Anthocyanidins are common plant pigments. They are the sugar-free counterparts of anthocyanins and form a large group of polymethine dyes (Mouradov and Spangenberg 2014).
Caffeoyl-CoA O-methyltransferase (EC 2.1.1.104) is an enzyme that catalyzes the conversion of S-adenosyl-L-methionine and caffeoyl-CoA into S-adenosyl-L-homocysteine and feruloyl-CoA. A large number of natural products are generated via a step involving this enzyme, which participates in phenylpropanoid biosynthesis (Boerjan et al. 2003).
Reticuline oxidase-like is the berberine bridge enzyme [(S)-reticuline:oxygen oxidoreductase (methylene-bridge-forming), EC 1.5.3.9], a vesicular plant enzyme that catalyzes a reaction in the pathway leading to benzophenanthridine alkaloid biosynthesis. (S)-Reticuline is a key branch-point intermediate that can be directed into several alkaloid subtypes with different structural skeleton configurations (Ziegler et al. 2009). Cytotoxic benzophenanthridine alkaloids are accumulated in certain species of Papaveraceae and Fumariaceae in response to pathogenic attack and, therefore, function as phytoalexins (Dittrich and Kutchan 1991).

Antioxidant and metabolic proteins in C. majus latex
The analysis confirmed the presence of the protein components of the antioxidant defense system in the C. majus latex. These proteins form the first line of defense against different stress conditions and help to prevent attack by different pathogens (Walz et al. 2002); they are highly abundant in the milky sap (Nawrot et al. 2007a, b).
Chelidonium majus latex also contains metabolic and storage proteins (Table 2, nos. 23-26). Beta-amylase precursor was relatively abundant (no. 23; 5.83 %), with 34.17-fold overrepresentation in the milky sap compared with whole plant extract. Beta-amylase is an enzyme that hydrolyzes glucans derived from starch granules to maltose and is present in different plant organs (Fulton et al. 2008). Its presence in the milky sap could be explained by the differentiation of the articulated laticifers of C. majus. Such gradual degeneration of the cytoplasm occurs in many species and is often accompanied by the appearance of altered plastids and characteristic starch grains (Hagel et al. 2008).

Conclusions
Our study presents the de novo assembly and characterization of the C. majus transcriptome. Further comparative proteomic analysis of C. majus milky sap and whole plant extract samples provided new insights into milky sap protein composition. The novel transcriptome-based C. majus database significantly improved the sensitivity of proteomic identifications. In the present study, 334 different putative proteins were identified in C. majus milky sap samples, compared with only 21 in the previous study (Nawrot et al. 2007a, b). Moreover, our approach enabled identification of the major latex protein (MLP) 28, which could not be detected without a species-specific database. The quantitative analysis confirmed that C. majus latex is rich in proteins connected with response to stress conditions and generation of precursor metabolites and energy. We also identified polyphenol oxidase (PPO), several enzymes involved in the biosynthesis of natural products, and a range of abundant antioxidant proteins. These findings support the importance of C. majus latex for plant defense against pathogens and herbivores.
The annotated C. majus CDS database will serve as a valuable dataset for further studies of C. majus proteomes and as comparative material for other plant species.
The data sets supporting the results of this article are available at the NCBI Sequence Read Archive (SRA) under accession number SRR1998045 (related BioProject PRJNA264791, Chelidonium majus transcriptome; related BioSample SAMN03142649).
Author contribution statement RN performed the study, collected plant material, performed proteomic analyses and prepared the manuscript. JB performed gene annotation, bioinformatics and comparative analyses. RL participated in design and coordination of the study, collected plant material. LA performed sequencing and initial assembly. HPM supervised the work, participated in its design and coordination and corrected the manuscript. All authors read and approved the final manuscript.