Metabolomics and molecular marker analysis to explore pepper (Capsicum sp.) biodiversity

An overview of the metabolic diversity in ripe fruits of a collection of 32 diverse pepper (Capsicum sp.) accessions was obtained by measuring the composition of both semi-polar and volatile metabolites in fruit pericarp, using untargeted LC–MS and headspace GC–MS platforms, respectively. Accessions represented C. annuum, C. chinense, C. frutescens and C. baccatum species, which were selected based on variation in morphological characters, pungency and geographic origin. Genotypic analysis using AFLP markers confirmed the phylogenetic clustering of accessions according to Capsicum species and separated C. baccatum from the C. annuum–C. chinense–C. frutescens complex. Species-specific clustering was also observed when accessions were grouped based on their semi-polar metabolite profiles. In total 88 semi-polar metabolites could be putatively identified. A large proportion of these metabolites represented conjugates of the main pepper flavonoids (quercetin, apigenin and luteolin) decorated with different sugar groups at different positions along the aglycone. In addition, a large group of acyclic diterpenoid glycosides, called capsianosides, was found to be highly abundant in all C. annuum genotypes. In contrast to the variation in semi-polar metabolites, the variation in volatiles corresponded well to the differences in pungency between the accessions. This was particularly true for branched fatty acid esters present in pungent accessions, which may reflect the activity through the acyl branch of the metabolic pathway leading to capsaicinoids. In addition, large genetic variation was observed for many well-established pepper aroma compounds. These profiling data can be used in breeding programs aimed at improving metabolite-based quality traits such as flavour and health-related metabolites in pepper fruits.


Introduction
Pepper (Capsicum spp.) is one of the most important fruit crops worldwide. It is cultivated all over the world, primarily in the tropical and subtropical countries. Pepper fruit production reached up to 1.8 million HA with more than 28 million metric tonnes harvested from the crop (FAO 2011). The fruits are mainly consumed as a fresh or cooked vegetable, a condiment or a spice and used as a colouring agent in the food industry. Pepper is increasingly recognized as an excellent source of health-related metabolites, such as ascorbic acid (vitamin C), carotenoids (provitamin A), tocopherols (vitamin E), flavonoids and capsaicinoids (Howard and Wildman 2007). We recently showed that different Capsicum accessions display a large variation in morphological characters as well as in the levels and composition of the above-mentioned health related metabolites (Wahyuni et al. 2011). The accessions belong to four widely cultivated species: Capsicum annuum, Capsicum chinense, Capsicum baccatum, and Capsicum frutescens. These four species can be intercrossed and produce fertile hybrids (Pickersgill 1997). Variation in morphological characters included fruit features, such as colour, type, shape and size. Fruit colour in the accessions varied from red, orange and yellow to brown and salmon and this was determined by the accumulation of specific carotenoids, i.e. capsanthin and capsanthin-esters for red accessions, violaxanthin and violaxanthin-esters for yellow accession. Fruit spiciness or pungency is determined by total levels of capsaicinoids, which varied strongly among the accessions analyzed (Wahyuni et al. 2011). Based on the range of capsaicinoid levels the accessions can be categorized into non-pungent, low, mild and highly pungent (Howard and Wildman 2007).
While the targeted metabolite analyses described above focused on the analyses of specific groups of known pepper compounds, untargeted metabolomics approaches based on mass spectrometry (MS) allow the simultaneous detection of metabolites in a biological sample, without a priori knowledge of the identity of the metabolites detected. Such untargeted profiling approaches have been used to obtain an overview of the metabolic diversity in germplasm collections, such as Arabidopsis (Keurentjes et al. 2006) and tomato (Tikunov et al. 2005). These analyses may help to understand metabolic pathways underlying the main metabolic contrasts between genotypes and, in combination with genetic analyses, to identify mQTLs and genes underlying key steps in metabolic pathways. In addition, metabolic profiling allows the detection and subsequent identification of unknown compounds correlating with a trait of interest, such as pungency, colour or flavour. There is no doubt that the concept of metabolomics-assisted breeding is a novel and powerful approach leading to new targets for breeding programs aimed at the improvement of metabolite-based quality traits (Fernie and Schauer 2009).
In addition to phenotypic information, knowledge about genetic diversity among accessions is also important to maximize the rate of crop improvement (Geleta et al. 2005). Genetic diversity can be measured using molecular markers and one of the methods is amplified fragment length polymorphism (AFLP) developed by Vos et al. (1995), which has been used extensively for genetic mapping and fingerprinting in plants. Combining genetic and phenotypic information, such as metabolite profile, may help breeders to develop new varieties with desired specific metabolite compositions and related traits.
Here we present a broad overview of the metabolic variation in the fruits of a collection of 32 diverse pepper accessions, by measuring the composition of semi-polar and volatile metabolites using LC-MS and headspace GC-MS, respectively. Our results showed that the diversity of volatile and semi-polar metabolites detected in the accessions separated them according to species and pungency, which correlated well with the phylogenetic relationships measured using AFLP molecular markers. Metabolite identification revealed the metabolic pathways underlying the metabolic differences between genotypes, including several known flavour-related pathways. This result offers both new sources and novel traits for the genetic improvement of the quality of cultivated pepper.

Plant materials and sampling protocol
A total of 32 accessions from Capsicum annuum, Capsicum chinense, Capsicum baccatum and Capsicum frutescens were selected based on the variations in morphological characters and geographical origins as presented in Table 1 (Wahyuni et al. 2011). Plant growth condition, experimental design and sampling protocol were performed as described by Wahyuni et al. (2011). Briefly, the experimental design consisted of two randomized blocks, each comprising all 32 accessions grown in plots of four plants. Fruits of two biological replicates, consisting of 10-50 ripe fruits (depending on fruit size) from the two innermost plants of a plot, were harvested for each accession. Before harvesting fruits for metabolic analyses, we carefully followed the ripening stages of the first set of fruits produced by each accession. Fruit ripeness was indicated by a combination of (i) its mature colour based on the description in the CGN database, (ii) fruit firmness and (iii) number of days after fruit setting (on average 70-80 days after fruit setting). Young leaves and fruit pericarp tissue (separated from placenta plus seeds) were collected, frozen in liquid nitrogen, ground and stored at -80°C prior to analysis.

Extraction and analysis of semi-polar metabolites
Semi polar metabolites were extracted using the protocol described previously by de Vos et al. (2007). Briefly, 500 mg of freeze-ground material of pepper pericarp were extracted with 1.5 ml of 99.875 % methanol acidified with 0.125 % formic acid. The extracts were sonicated for 15 min and filtered through a 0.2 lm polytetrafluoroethylene (PTFE) filter. For each accession two biological replicates were prepared, resulting in a total of 64 extracts.
To check the technical variation, including extraction, sample analysis and data-processing, quality control samples were prepared by pooling fruit material of several randomly chosen accessions, extracted using the same procedure and injected after every 16 accession sample extracts.
All the extracts were analysed using reversed phase liquid chromatography coupled to a photodiode array detector and a quadrupole time of flight high-resolution mass spectrometry (LC-PDA-QTOF-MS) system, using C18-reversed phase chromatography and negative electrospray ionization, as described previously (de Vos et al. 2007). For LC-PDA-QTOF-MS, 5 ll of the extract were injected and separated using a binary gradient of ultrapure water (A) and acetonitrile (B), both acidified with 0.1 % formic acid, with a flow rate of 0.19 ml/min. The initial solvent composition consisted of 95 % of A and 5 % of B; increased linearly to 35 % A and 65 % B in 45 min and maintained for 2 min. The column was washed with 25 % A and 75 % B for 5 min and equilibrated to 95 % A and 5 % B for 2 min before the next injection.
The putative identification of differential semi-polar metabolites was performed by re-injection of the extract from one of the biological replicates of each accession on a LC-LTQ-Orbitrap FTMS hybrid mass spectrometer (Thermo Fisher Scientific) and performing MS n fragmentation The LTQ-Orbitrap hybrid mass spectrometer system was set up in negative ionization mode, as previously described by van der Hooft et al. (2011). Xcalibur software (Thermo Fisher Scientific) was used to control all instruments and for data acquisition and data analysis.

Extraction and analysis of volatile metabolites
Volatile metabolites were extracted using the protocol previously described by Tikunov et al. (2005). Briefly, 750 mg of freeze-ground pepper pericarp of each sample was weighed in a 5-ml screw-cap vial, closed, and incubated at 30°C for 10 min in a water bath. An aqueous 0.75 ml of EDTA-NaOH solution (100 mM EDTA solution pH adjusted to 7.5 with NaOH) was added to the incubated sample and followed by the addition of solid CaCl 2 (5 M final concentration). The closed vials were sonicated for 5 min and 1 ml of the extract was transferred into a 10-ml crimp vial (Waters), capped, and used directly for headspace SPME-GC-MS analysis as described in Tikunov et al. (2005). As for LC-MS, quality control samples were prepared after eight accession samples to analyse the analytical variation. Each accession was analysed with two biological replicates.

Metabolite data processing
Both volatile and semi-polar data were processed separately through some steps as described in several points below:

Mass spectral alignment, filtering and clustering
Volatile and semi-polar metabolite profiles derived using the SPME-GC-MS and the LC-PDA-QTOF-MS platforms, respectively, were processed as described by Tikunov et al. (2005Tikunov et al. ( , 2010. Both datasets were processed independently by the MetAlign software package (www. metalign.nl) for baseline correction, noise estimation, and ion-wise mass spectral alignment. The MetAlign outputs for both the GC-MS and LC-MS data were processed separately with MSClust software for data reduction and compounds mass spectra extraction ).

Putative identification of semi-polar and volatile metabolites
The identification of semi-polar metabolites was carried out by means of their UV spectra, exact molecular weight and MS n fragmentation pattern. Putative identification of semi-polar metabolites was obtained using different metabolite databases such as Dictionary of Natural Products (http://dnp.chemnetbase.com), KNApSAcK (http:// kanaya.naist.jp/KNApSAcK) and in-house metabolite databases, and using previous results on pepper described by Marin et al. (2004) and Wahyuni et al. (2011). Putative identification of volatile metabolites was performed by automatic matching of their mass spectra extracted by MSClust with the National Institute of Standards and Technology (NIST) mass spectral library entries, using the NIST MS Search v2.0 software (http:// chemdata.nist.gov/mass-spc/ms-search/). The compound hit that showed the highest matching factor (MF) value (C600) and the lowest deviation from the retention index (RI) value was used for the putative metabolite identity. Additional manual spectral matching was performed for selected metabolites with low MF and high RI deviations by deconvoluting the chromatographic peak at the corresponding retention time using AMDIS (version 2.64) (http://www.amdis.net/) followed by matching the resulting spectra with those in the NIST library database.

Multivariate analysis
Semi-polar and volatile metabolite data sets containing the intensity levels of all centrotypes for all pepper samples were analysed separately using multivariate statistical analyses included at the Genemath XT version 1.6.1 software. Pre-treatment of the data was performed by log 2 transformation and mean centering. The pre-treatment data were subjected to analysis of variance (ANOVA) and multiple testing error rates (Bonferroni procedure) to determine the least significance variances among pepper accessions. Metabolites that showed significant differences between accessions, determined by p-values lower than 0.001, were subjected to principal component analysis (PCA) and hierarchical cluster analysis (HCA). HCA was performed by using the UPGMA method and Pearson's coefficient matrix in Genemath XT software. To test the reliability of the dendrogram produced by HCA, bootstrap analysis was performed with 100 replications.

Genetic diversity of Capsicum accessions based on AFLP analysis
The collection of 32 pepper accessions derived from diverse origins (Table 1). To get insight into the genetic relationships, all plants were genotyped using AFLP analysis. Screening of the accessions with five EcoRI 9 MseI primer combinations resulted in the identification of 255 polymorphic bands. A dendrogram resulting from HCA was constructed by counting the presence or absence of these polymorphic bands. The dendrogram showed that the 32 accessions were separated based on Capsicum species (Fig. 1). Two main clusters were observed, differentiating between C. baccatum, denoted as A-1, and the C. annuum-C. frutescens-C. chinense cluster, denoted as A-2. Cluster A-1 grouped all C. baccatum accessions analysed, including C. baccatum var. baccatum (no. 15) and three accessions of C. baccatum var. pendulum: Aji Blanco Christal (no. 30), no. 31, and RU 72-51 (no. 32). Cluster A-2 contained two sub-clusters that distinguished all C. annuum accessions from C. frutescens-C. chinense.
In the C. frutescens-C. chinense sub-cluster, the two C. frutescens accessions, Lombok (no. 28) and Tabasco (no. 29) clustered together and were separated from the C. chinense cluster. Within the C. annuum sub-cluster, several subgroups could be observed, e.g. accessions with pointed fruits (accession no. 4, 7, 1, 5, 8, and 20) and the blocky-type breeding varieties (accession no. 21, 6, 11, 10 and 14). In addition, we observed that accession C. chinense AC 2212 (no. 24) was positioned in the C. annuum cluster, suggesting that this accession, genetically, more likely belongs to C. annuum than to C. chinense. Based on this dendrogram, C. chinense AC2212 (no. 24) is denoted as C. annuum for further discussions.

Principal component and hierarchical cluster analyses of Capsicum accessions based on their metabolites profiles
In order to explore the metabolic variation in pepper fruits, two different analytical platforms were used: LC-PDA-QTOF-MS for the detection of semi-polar metabolites, such as alkaloids, flavonoids and other phenolic compounds, and headspace GC-MS to detect flavour-related volatiles.

Semi-polar metabolites
Pericarp extracts from ripe fruits of the 32 pepper accessions were analysed by LC-PDA-QTOF-MS and the  Table 1 corresponding chromatograms were subjected to full mass spectral alignment using MetAlign software followed by filtering out of low intensity ions. Using the MSClust algorithm, a total of 11,372 ions were grouped into 881 ion clusters, representing reconstructed metabolites. From each mass cluster, the most abundant ion was selected as the representative of each putative compound and subsequently used for analysis of variance. In total 297 compounds showed significant intensity differences (ANOVA; p \ 0.001) between Capsicum accessions. These were used for further analyses, supplemented with 34 additional metabolites, which did not meet the above criteria, but for which a putative identification was obtained (Supplementary Table S1). PCA revealed a separation of the 32 pepper accessions into species groups, based on their semi-polar metabolite patterns (Fig. 2a). The first principal component (PC1) explained 43.4 % of the variation and separated a group of 15 C. annuum accessions, denoted as C. annuum-spI, from the other 17 Capsicum accessions. The second principal component (PC2) explained 9.6 % of the variation and separated the C. chinense group from C. baccatum, C. frutescens and the sub-group of C. annuum, denoted as C. annuum-spII. The latter group contained C. annuum Long Sweet (no. 12), C. annuum Sweet Banana (no. 13) and C. annuum Chili Serrano (no. 22).
The dendrogram produced by HCA of the pepper accessions, based on the variation in 331 semi-polar metabolites (Fig. 3), confirmed the results observed with PCA and provided a more detailed view on the relationships between accessions. HCA distinguished two major clusters, denoted as B-1 and B-2. Cluster B-1 consisted of the C. annuum-spI accessions and showed a clear separation between the wild variety C. annuum AC 1979 (no. 19) and the land races and breeders' varieties in this group. Cluster B-2 contained two sub-groups, one consisting of the C. annuum-spII and the second one separating the C. chinense, C. frutescens, and C. baccatum accessions, conform the PCA results. Accession no. 2, C. chinense I1 PI 281428, which in the PCA plot was located in close proximity to the C. annuum-spII group (Fig. 2a), was biochemically different from the other C. chinense accessions and forms an outlier in cluster B2.
HCA of the set of 331 semi-polar metabolites revealed the presence of several metabolite groups, arbitrarily denoted A to H, characterised by their specific expression pattern across the 32 pepper accessions (Fig. 3). By using different metabolite databases, in combination with accurate mass MS n fragmentation experiments, we could putatively identify in total 88 out of the 331 semi-polar metabolites (Supplemental Table S1). These metabolites belonged to a number of compound classes, such as flavonoids, phenylpropanoids, phenolics, acyclic diterpenoids, branched chain amino acid derivatives and fatty acid derivatives. Structurally related metabolites derived from the same metabolic pathway clustered together, as shown previously for volatiles in tomato (Tikunov et al. 2005) and pepper (Eggink et al. 2011). Compared to the three other species, all C. annuum accessions accumulated relatively high levels of the acyclic diterpenoid phytatetraene, which was present in the form of different conjugated derivatives (groups D, E and F). These diterpenoid glycosides are also called capsianosides (De Marino et al. 2006;Izumitani et al. 1990;Lee et al. 2009) and currently receive increasing interest for their presumed antioxidant (Shin et al. 2012) and pharmaceutical activities (Hashimoto et al. 1997). The difference between C. annuum-spI (cluster B-1) and C.annuum-spII (part of cluster B-2) was mainly due to variation in the decoration of capsianosides, as shown in metabolite groups D and E versus F, respectively. The C. baccatum group could be distinguished from all other genotypes due relatively high levels of several phenolic and phenylpropanoid glycosides, i.e. several coumaroyl, feruloyl and benzyl glycosides. Interestingly, three of these compounds had a sulfate conjugation as well. In addition, the C. baccatum accessions contained several unique double charged compounds which we have not been able to annotate so far.  Table 1 Metabolic diversity in pepper 135 A large proportion of the identified metabolites consisted of flavonoid glycosides. These represented different intermediates in the biosynthetic pathway leading from chalcones (naringenin chalcone), through flavanones to the major pepper flavonoids belonging to both flavones (luteolin and apigenin) and flavonols (quercetin and kaempferol). The abundance of specific flavonoid glycosides appeared to be, at least partly, responsible for the  Table S1. The numbers below the dendrogram correspond to the accession numbers in the first column of Table 1 separation of C. chinense and C. frutescens from the other species (group B and C). Other flavonoid glycosides showed accumulation patterns which were subspecies or genotype-specific rather than species-specific (Group G and H). This was most obvious for C. annuum Long Sweet (no. 12), which contained strongly elevated levels of various flavanone, flavone and flavonol glycosides. This accession was previously shown to be a high-flavonoid pepper accession (Wahyuni et al. 2011). Unfortunately, it was not possible to see a species-or genotype-specific pattern in the flavonoid conjugation suggestive of the presence of specific modifying enzymes in certain genotypes or species groups.
Capsaicinoids were generally not detected in the LC-MS set-up used. However, we could detect one capsaicin analogue ( Fig. 3; group B). The levels of this compound showed a very high correlation with the total capsaicinoid content in pericarp (R 2 = 0.98; Table 2), which was measured previously (Wahyuni et al. 2011). Other metabolites present in the same cluster showed a good correlation with capsaicinoid content as well (Table 2) and, interestingly, those metabolites for which we had a putative annotation all derived from the phenylpropanoid and the branched chain fatty acid pathway (consisting of methyl-branched amino acid degradation and fatty acid synthesis modules), both of which also serve to provide the immediate precursors for capsaicin biosynthesis (Mazourek et al. 2009).
Metabolite cluster C consists of compounds with relatively high abundance in accession cluster B2, consisting of C. annuum-spII, C. chinense, C. frutescens and C. baccatum. Apart from three flavonoids, most metabolites within this cluster could not be identified so far.

Volatile metabolites
Headspace SPME-GC-MS chromatograms of ripe fruit pericarp samples from the 32 pepper accessions were subjected to full mass spectral alignment using MetAlign software, followed by grouping of the resulting 13,833 ion fragments into 408 putative volatile metabolites, using MSClust software. Of those 408 metabolites, 347 showed differential expression patterns based on ANOVA (p \ 0.001) and were selected for further analysis. These 347 volatiles were putatively identified on the basis of both their mass spectrum and retention index, and are listed in Supplementary Table S2. In total 194 volatiles had a NISTmatch factor above 600 and a retention index deviation of less than 50 units, and their putative identification was therefore considered as reliable. PCA analysis based on 347 volatiles (Fig. 2b) revealed that the separation into species groups was not as clear as observed for both the genetic data ( Fig. 1) and the semi-polar metabolites (Fig. 2a). In fact, PC1, which explained 37.7 % of the total variation, differentiated the species groups according to pungency, as previously determined by measuring total capsaicinoid levels (Wahyuni et al. 2011). Accessions of C. annuum were clustered into two groups, denoted as C. annuum-vocI and C. annuum-vocII. C. annuum-vocI consisted of three mild pungent accessions: C. anuum Bisbas (no. 9), Chili Serrano (no. 22) and Chili de Arbol (no. 23), and one high pungent accession: C. annuum AC1979 (no. 19). The second group, C. annuum-vocII, contained the remaining C. annuum accessions, consisting of seven non-pungent accessions and seven low pungent accessions. In addition, PC1 also differentiated the C. chinense accessions into two groups based on pungency. Group one, denoted as C. chinense-vocI, consisted of six C. chinense accessions with a mild level of pungency, whereas C. chinense-vocII contained two low pungent accessions: C. chinense No. 4661 (no. 17) and No. 4661 Selection (no. 18). Also the four C. baccatum accessions were differentiated based on pungency: the high pungent C. baccatum var. baccatum (no. 15) was clearly separated from the three low/mild pungent accessions of C. baccatum var. pendulum and plotted close to the two high pungent accessions of C. frutescens.
HCA revealed two main accession clusters, denoted as D-1 and D-2, (Fig. 4). This separation was driven by a large set of volatiles which were much more abundant in fruits of accession cluster D-1 compared to D-2. We arbitrary divided the volatiles characteristic for cluster D-1 into two groups A and B based on their accumulation pattern (Fig. 4). On average the similarity distances between volatiles of group B were the shortest over the whole dendrogram, indicating a very high pattern similarity across the accessions. A major portion of this volatile cluster consisted of various methyl-branched esters, which are products of the branched chain amino acid degradation pathway. Some of these compounds could also be observed in the neighbouring cluster A together with sesquiterpenes, which were clustered in this group. In summary, methylbranched esters and sesquiterpenes were much more abundant in fruits of accession cluster D-1 compared to D-2. Interestingly, these two accession clusters correspond to the division that could be made based on fruit pungency, which was also apparent in the PCA results. Cluster D-1 was comprised of accessions with mild and high levels of pungency, corresponding to C. annuum-vocI, C. chinense-vocI, C. frutescens and C. baccatum var. baccatum. Cluster D-2 contained the low and non pungent accessions and consisted of three sub-clusters containing C. annuum-vocII, C. chinense-vocII and C. baccatum var. pendulum. The total amount of methyl-branched esters, in general, fits well with the quantitative patterns of capsaicinoids in both pericarp (R 2 = 0.9) and placenta (R 2 = 0.7) across all 32 accessions, as previously determined by Wahyuni et al. (2011), but detailed analysis of the pungent group shows that accumulation of these volatiles fit the capsaicinoid accumulation in pericarp better than capsaicinoid amounts found in placenta ( Fig. 5; Table 2). The apparent division of genotypes based on pungency likely reflects the activity of the branched-chain amino-acid degradation pathway, which not only leads to the production of methyl-branched esters, but also provides precursors for the production of the acyl moiety of capsaicinoids. Monoterpene volatiles, clustered in compound group E, drove further differentiation of C. annuum-vocII into six sweet breeders ' varieties (accession no. 6, 11, 10, 14, 13 and 21) and the low pungent, pointed C. annuum accessions (accession no. 24, 1, 20, 5, 8, 12, 4 and 7), which showed a significantly higher emission of monoterpenes. In addition, C. chinense accessions generally showed a low emission of monoterpenes. Another group of volatiles showing biochemically driven compound clustering is group D, in which several volatiles originating from the lipoxygenase pathway showed very similar intensity patterns across the accessions and were on average low in the six sweet breeder's C. annuum accessions mentioned above. Group G did not reveal a clear biochemically driven compound clustering as it consisted of volatiles of various biochemical origin.
In general, methyl-branched esters and sesquiterpenes appeared to be the most differentiating compounds between the accessions analysed. Their quantitative patterns show a high similarity to the pungency pattern of the accessions suggesting that these volatiles and capsaicinoids may have common factors which control their accumulation in pepper fruit.

Discussion
The dendrogram produced using AFLP markers revealed the phylogenetic relationship between the four Capsicum species used in this study. The phylogenetic relationships fit well with the geographic dispersion of the species and the common ancestor of the C. annuum-C. chinense-C. frutescens species complex, which diverged from C. baccatum at an early stage during evolution (Basu and De 2003;Djian-Caporilano et al. 2007;Eshbaugh 1970). This is also in line with results from previous studies in which different Capsicum germplasm was compared (Ince et al. 2010;Kochieva et al. 2004;Toquica et al. 2003). Further differentiation can be seen within the C. annuum group, in which bell pepper breeding lines and pointed peppers formed separate groups. This molecular differentiation most likely reflects breeders' efforts to develop and genetically select for specific pepper fruit types.
The overall composition of semi-polar metabolites was strongly determined by the species group, indicating that genetic differences between species are reflected in Ret retention time (minutes), Mass nominal mass (in case of volatiles) or exact mass (in case of semi-polar metabolites), RI retention index, HCA cluster refers to the cluster in the HCA for volatiles (Fig. 3) or semi-polar metabolites (Fig. 4). Only metabolites with a putative identity are shown. Total capsaicinoids used in the correlation analysis was measured previously (Wahyuni et al. 2011) Metabolic diversity in pepper 139 metabolic differences. This may be due to differences in the regulation of metabolic pathways, differences in the activity of key enzymes determining the flux through the pathway, or the activity or substrate specificity of specific modifying enzymes. The importance of the latter enzyme group was illustrated by the large, species-specific, variation in the ''decoration'' of flavonoids and capsianosides. In contrast to the AFLP data, semi-polar metabolites differentiated the C. annuum group from C. chinense-C. frutescens-C. baccatum. This differentiation may be due to the domestication and selection history of the accessions studied. Indeed, C. annuum accessions have been most commonly used for breeding purposes and therefore may have been exposed to the strongest selection history, including selection for desirable traits during breeding. Selection on specific mutations will have a marginal impact  Table S2. The numbers below the dendrogram correspond to the accession numbers in the first column of Table 1 on the total genetic make-up of the species group, but may strongly affect the expression of genes thereby altering specific metabolic pathways, as shown for tomato ripening (Kovács et al. 2009), or high pigment Levin et al. 2006) mutants. In pepper, this is clearly demonstrated by comparing the position of the wild variety C. annuum AC 1979 (no. 19) relative to those of the cultivated C. annuum accessions in the phylogenetic tree based on AFLP markers ( Fig. 1) with its position in the HCA based on semi-polar metabolites (Fig. 3).
Unlike the overall metabolite profiles, targeted analyses of a specific set of metabolites, such as carotenoids, capsaicinoids and vitamins C and E, revealed a large variation between individual accessions rather than between species groups (Wahyuni et al. 2011). Variation in these health-related metabolites may have resulted from breeding efforts aimed at selecting for consumer-driven attributes such as carotenoids for fruit color and capsaicinoid levels to enhance or avoid pungency, rather than from differences between species.
In case of volatiles, the accessions were primarily clustered based on pungency rather than on species (Fig. 2b). Identification of the volatiles driving this contrast revealed a relatively high abundance of sesquiterpenes and methylbranched fatty acid esters in the pericarp of pungent accessions. The total amount of methyl-branched fatty acid esters showed a strong correlation with the total amount of capsaicinoids in the pericarp and, to a lesser extent, to capsaicinoids in the placenta (Fig. 5) and the same holds for the individual volatiles in metabolite cluster A of the HCA, including four sesquiterpenes (Table 2). In this respect we also observed a close clustering of methyl-branched esters with the capsaicin analog in a HCA based on a combined dataset of both semi-polar and volatile metabolites (Supplemental Fig. 1; Table S3). However, there is no obvious relation between the metabolic pathways leading to sesquiterpenes and the pungent capsaicinoids, suggesting that this correlation is most likely due to population structure in this germplasm collection rather than to a causal relationship. This can, however, only be addressed by the use of segregating populations. We can not exclude that population structure may also account for the correlation of methyl-branched esters with pungency. However, methylbranched fatty acid esters are derived from the catabolism of branched-chain amino acids, such as valine, leucine and isoleucine, which also serve as direct precursors for the acyl branch of capsaicinoid biosynthesis (Mazourek et al. 2009). Hence, methyl-branched fatty acid esters may reflect the activity of the capsaicinoid pathway. The primary site of capsaicinoid production is the placenta, from where capsaicinoids are transported into the apoplast and stored in so-called 'blisters', or towards other tissues or organs, such as the fruit pericarp, stems and leaves (Broderick and Cooke 2009;Estrada et al. 2000;Kim et al. 2011;Stewart et al. 2007). The strong correlation of methyl-branched fatty acid ester levels with capsaicinoid levels in the pericarp suggests that these methyl-branched esters are also synthesized in the placenta from where they are subsequently transported to the pericarp. Indeed, Moreno et al. (2012) recently showed a 12-48-fold higher accumulation of methyl-branched esters in the placenta compared to the pericarp.
In pepper, pungency is primarily determined by the Pun1 gene, which encodes capsaicin synthase, the final step in capsaicinoid biosynthesis. This putative acyltransferase is responsible for the formation of capsaicinoids through transfer of the acyl moiety of the capsaicin pathway to vanillin-amide, the product of the aromatic branch of the capsaicin pathway (Mazourek et al. 2009;Stewart et al. 2005). Loss of function mutations in this gene lead to the absence of capsaicinoids and a lack of pungency. Quantitative differences in pungency likely reflect the activity through the aromatic and acyl branches of the capsaicin pathway and several QTLs affecting pungency have been described (Ben-Chaim et al. 2006;Blum et al. 2003;Mazourek et al. 2009). Most of the current sweet pepper breeding germplasm has a loss of function mutation in the pun1 gene (Mazourek et al. 2009). It is evident that these accessions not only lack capsaicinoids, but are also Fig. 5 Bar-plots representing relative abundance of metabolites in 32 pepper accessions: a total methyl-branched esters in pericarp, b total capsaicinoids in pericarp and c total capsaicinoids in combined placenta and seeds. Total capsaicinoids in pericarp and placenta and seeds were based on Wahyuni et al. (2011). The numbers below the bar plots correspond to the accession numbers in the first column of Table 1 Metabolic diversity in pepper 141 low in methyl-branched fatty acid esters, suggesting that the entire pathway leading to capsaicinoids is down-regulated in these accessions. There are several possible explanations for this observation. Firstly, it is possible that the Pun1 locus or capsaicinoids, directly or indirectly, regulate the flux through the entire upstream biochemical pathway, as suggested by increased and co-regulated expression of several capsaicin pathway genes in Pun1 compared to pun1 genotypes (Stewart et al. 2005;Stewart et al. 2007). Secondly, we cannot exclude that the capsaicin synthase enzyme has a broad substrate preference and also functions as an ester-forming enzyme required for the esterification of methyl-branched fatty acid esters (Stewart et al. 2005). Thirdly, it is possible that breeding efforts selecting for sweet bell peppers not only targeted the pun1 mutation, but also selected for genotypes with a low  (2010). Genotypes are represented in the same order as in the HCA (Fig. 4). Values represent log 2 values of mass peak intensities determined for each volatile in the set of 32 pepper accessions. Colours represent relative intensities for each volatile from dark green (low intensity) to dark red (high intensity) activity of the upstream branches of the capsaicin pathway, which may be due to mutations in one or more genes encoding enzymes or regulatory factors of this pathway. The latter option would make it possible to introgress the pungent aroma into a non-pungent pepper background.
(Sub)species-specific clustering of volatiles was mainly due to qualitative and quantitative differences in the accumulation of metabolites derived from the major pathways leading to volatile production: aromatic and branched chain amino acid catabolism, fatty acid degradation, mono-and sesquiterpene biosynthesis and, to a lesser extent, carotenoid degradation, reflecting variation in the activities and substrate specificities of the corresponding enzymes. Although we did not conduct any sensorial analyses on these accessions, it may be expected that the observed metabolic differences form the basis for species-specific differences in aroma and opens up possibilities to breed for novel aromas in cultivated pepper backgrounds. Rodríguez-Burruezo et al. (2010) determined the volatile composition in ripe fruits of 16 Capsicum accessions from the annuum-chinensefrutescens complex and combined their metabolite analysis with taste panel data and sniffing port analyses. They concluded that the diversity in aromas found in their accessions was due to variation in the levels of at least 23 odour-contributing volatiles. Of those, 16 could also be detected in our data set (Table 3). In agreement with Rodríguez-Burruezo et al. (2010), we found that pungent accessions contain the highest levels of aroma-contributing volatiles, such as the sweet, fruity methyl-branched esters hexyl-2/3-methylbutanoate, 4-methylpentyl-3-methylbutanoate and 4-methylpentyl-4-methylpentanoate, the fatty acid derived aldehydes nonenal and nonedienal, which give rise to a green cucumber-like odour impression, as well as the fruity, floral carotenoid-derived volatiles alpha-and beta-ionone (Fig. 4, clusters A and B). The characteristic green bell pepper volatiles pyrazine and 2-heptanethiol were detected at varying levels in all genotypes analysed, irrespective of pungency level or species. Pointed C. annuum accessions were relatively rich in the monoterpenes alpha-pinene (pine, wood-like), linalool (citrus, fruity, floral) and 1,8-cineole (eucalyptus-like), as well as the hydrocarbon ectocarpene which has a green, sweet odor description (Fig. 4, cluster E). Finally, we observed that the important fatty-acid derived volatile hexanal was most abundant in all C. baccatum accessions analysed (Fig. 4, cluster D).

Conclusions
The results of this study clearly demonstrate that there is a large metabolic variation present in the pepper germplasm collection analysed. Species-driven metabolic differences are the major determinants of the variation in semi-polar metabolites, whereas pungency was the main driver responsible for the variation in aroma volatiles. The levels of the metabolites analysed varied greatly among fruits of different accessions, demonstrating the potential of the current germplasm collection for genetic improvement of metabolic traits. In addition to promising sources of healthrelated flavonoids and capsianosides, we identified accessions with high levels of several established flavour-related volatiles, such as methyl-branched fatty acid esters, fatty acid derived volatiles, such as hexanal, nonenal and nonedienal, and monoterpenes. These accessions are potential candidates for breeding programs aimed at developing new pepper cultivars with improved flavour and other consumer quality characteristics. Our results also indicate the value to explore the metabolic variation with different analytical platforms and to couple metabolomics with genetic analysis as a strategy to target crop breeding programs to phenotypic diversity for important quality traits.