Mitochondrial genome variation between different accessions of Matricaria chamomilla L. (Asteraceae) based on SNP mutation analysis

Matricaria chamomilla L. (chamomile, Asteraceae) (GRIN, The Plant List 2013) has a long history of usage in traditional herbal medicine and is still today amongst the most important medicinal plants. Despite this importance, genetic diversity of cultivated and wild germplasm of M. chamomilla was rarely investigated so far. The objective of this study was to estimate the mitochondrial (mt) diversity of various cultivated M. chamomilla genotypes by determining point mutations in the mt genome. 89 SNPs (single nucleotide polymorphisms) were identified in the next generation sequencing data of 33 genotypes from 11 di- and tetraploid chamomile accessions representing a sequence diversity of 0.32 SNPs/kb. Based on the SNP analysis 19 mitochondrial haplotypes (mitotypes) could be specified with genetic distances ranging between 0.011 and 0.851. The examined mt variability within the accessions was higher than expected; only one monomorphic accession (variety ‘Camoflora’) was identified. Diploid accessions exhibited with 1.9 mitotypes per accession a higher variability than tetraploid accessions with a ratio of 1.3. Although some of the mitotypes were distributed over different accessions, identical mitotypes within di- and tetraploid accessions could not be determined. Furthermore, the mitotypes did not correspond to the geographical origin of the accessions. Although not the whole mt genome could be assembled in this study, the substitutions identified represent a valuable tool for further investigations of maternal phylogenetic relationships within M. chamomilla.


Introduction
Matricariae flos (flowers from Matricaria chamomilla L., common name chamomile, Asteraceae) (GRIN, The Plant List 2013) is one of the most important medicinal plant raw materials with numerous applications in the pharmaceutical and cosmetic area. M. chamomilla is native to Europe and West Asia from where it gradually spread over many temperate zones worldwide (Franke and Schilcher 2007). The first documentation of cultivation of this plant dates back to 9000-7000 BC (Franke and Schilcher 2005). To meet the extensive demand on plant raw material, first breeding attempts started approximately 50 years ago (Franke and Schilcher 2007) with the centers being situated in the Czech and Slovak Republic, Poland, Hungary and Germany (Das 2014;Seidler-Lozykowska 2016). Breeding material was originally gained from wild collections, but information about the parental populations is available for only a few varieties (Das 2014). M. chamomilla is mainly outbreeding and originally diploid with tetraploid varieties induced by artificial polyploidisation to obtain higher crop yields due to larger flower heads or to obtain higher contents of essential oils or medicinally active compounds (Das 2014). Another aspect for the implementation of tetraploid varieties was to avoid crossings between cultivated and wild chamomile plants (Das 2014). However, accessions of both ploidy levels are cultivated and most of them still exhibit high phenotypic variability (Das 2014;Plescher and Sonnenschein 2013).
Genomic approaches to investigate the genetic diversity of cultivated and wild chamomile germplasm in order to improve breeding processes have already been made (Wagner et al. 2005;Solouki et al. 2008;Pirkhezri et al. 2010;Okon et al. 2013;Ahmadi et al. 2014;Otto et al. 2017). Nevertheless, the number of studies conducted on this topic is low compared to other crop plants and no effort has been made to investigate the diversity of the chloroplast (cp) or mitochondrial (mt) genome of chamomile lines.
One major problem in chamomile cultivation is the germination of chamomile (as weed) in the following crops because of self-seeding and seed shedding, which complicates regular crop rotation (Franke and Hannig 2012). The use of male sterility systems, already common in many crop species (Balk and Leaver 2001;Rao et al. 2018), for the prevention of self-fertilization would indeed improve the attractiveness of chamomile cultivation. Beneath others, the cytoplasmic male sterility (CMS) is one way to produce male sterile lines with preserved female fertility. Rearrangements or mutations within the mt genome, especially in the open reading frames (ORFs), often are responsible for the development of novel chimeric genes and thus the probability of male sterility in the progenies (Schnable and Wise 1998;Budar et al. 2003;Hanson and Bentolila 2004;Chase 2007). In contrast to animal and fungal mitochondrial DNA (mtDNA), the mt genome of plants differs greatly in size, gene content and gene order and alternative mitotypes coexisting with the main mtDNA are described (Gualberto et al. 2014;Kersten et al. 2016). The rare point mutations in the mitochondrion of plants compared to that of animals are believed to be explained by the occurrence of repair mechanisms because of an active DNA recombination (Gualberto et al. 2014). Due to these innate recombination and replication processes, the plant mt genome undergoes homologous recombination frequently, although only a low rate of point mutations is observed (Gualberto et al. 2014). The use of mt sequences for phylogenetic studies in plant genera therefore stands back behind the application of cp sequences although mtDNA can be rather informative for the study of maternal relationships, migration routes of populations in phylogeographic studies or the exploration of genetic backgrounds (Tomaru et al. 1998;Gugerli et al. 2001;Besnard et al. 2002;van de Paer et al. 2018). Another drawback of using mt markers is the low number of published universal primers. This is another reason why mt sequence information of plants is still underrepresented in public genebanks. Newer techniques like next generation sequencing (NGS) now enable the production of a high amount of sequences thus facilitating the exploitation of whole genome information and the identification of high quality SNPs in the resulting sequences.
Nevertheless, to date, the mt genome sequences of only seven Asteraceae species are deposited in Genbank with the total lengths of 363,342 bp in Lactuca sativa (Kozik et al. 2019), 363,328 bp in Lactuca serriola (Kozik et al. 2019), 208,097 bp in Chrysanthemum nankingense (Wang et al. 2018), 211,002 bp in Chrysanthemum boreale (Won et al. 2018), 277,718 bp in Diplostephium hartwegii, 300,945 bp in Helianthus annuus (Grassa et al. 2016) and 453,334 bp in Conyza canadensis (Peng et al. 2014). Sequence information of a total mt genome of M. chamomilla is not published yet.
In this study, we examine the mt diversity of different genotypes of M. chamomilla in order to determine their maternal relationships.
The obtained data can be useful for reconstructing the breeding history of the examined accessions and, eventually, to assign cultivated accessions to geographical origins. The identification of mitochondrial markers can further help organize genetically wide crossings increasing the probability of finding male sterile plants.

Plant material
Based on the data of published GBS (genotyping by sequencing) analyses (Otto et al. 2017), eleven genetically diverse di-and tetraploid accessions of Matricaria chamomilla L. (Table 1) were selected and cultivated in a greenhouse. Three plants per accession were selected randomly for the extraction of mitochondrial DNA and further analysis.
Enrichment of mitochondria and extraction of mitochondrial DNA Mitochondrial DNA was extracted according to the protocol of Triboush et al. (1998) with minor modifications. 1 g of very young leaves and apical shoot tips were ground in 10 ml of the recommended prechilled buffer STE with mortar and pistil using autoclaved sea sand. We did not filter the lysate but removed sea sand and cell debris by the first centrifugation step. Centrifugation steps for removal of nuclei and pelleting of mitochondria were done twice to optimize mt enrichment. All steps for isolation of mt organelles were performed on ice, centrifugation steps were done at 0°C using the Hettich 320R centrifuge (Hettich, Bäch, Switzerland). After lysis of the organelles, centrifugation steps were done at room temperature using the microcentrifuge Hettich Mikro 200 (Hettich, Bäch, Switzerland). For the extraction of DNA phenol:chloroform:isoamylalcohol (25:24:1) and chloroform:isoamylalcohol (24:1) were consecutively used instead of phenol:chloroform 1:1 for removal of proteins. The second precipitation step of the protocol for a better removal of chloroplast DNA was skipped and the reaction was stopped after elution of the first DNA pellet in TE pH 8.0.
The quality and quantity of the DNA extracts were measured using a spectrophotometer (NanoDrop2000 TM , Thermofisher Scientific, Waltham, USA).
To evaluate residual amounts of nuclear and chloroplast DNA, samples were tested in a qPCR run by amplifying one mitochondrial (cox 1), chloroplast (trnL-F-IGS) and nuclear (ITS1) gene region. Concentrations of 1 ng/ll were compared to a predefined standard sample extracted with the same protocol and a two-fold dilution series (2-0.2 ng/ll) of a standard sample of total genomic DNA extracted using a modified CTAB-protocol (Schmiderer et al. 2013). Standard samples were set up by using a mixture of 'Bona' seedlings for both extractions. Implementing that the analysed gene regions are representatives of the whole plant genome, the standard sample showed a relation of 0.1%, 1.5% and 98.4% of the nuclear, chloroplast and mitochondrion amplicon, respectively. With the CTAB protocol without mt enrichment the relation of amplicons was 36.1%, 33.2% and 30.7%. The samples foreseen for sequencing were in the range of the standard sample.
For a 10 ll PCR reaction 1 ng of DNA was added to a master mix containing 19 HOT FIREPol Ò EvaGreen Ò HRM Mix (no ROX) (Solis BioDyne, Tartu, Estonia) and 150 nM forward and reverse primers (Life Technologies, Vienna, Austria) (Online Resource 1), respectively. The PCR cycle profile included a denaturation step at 95°C for 14 min, followed by 50 cycles of 95°C for 10 s, 55°C for 20 s and 72°C for 20 s. Samples and no-template controls were analyzed in duplicates. The qPCR was performed with the Rotor-Gene TM 6000 (Qiagen, Hilden, Germany) and analyses were conducted using the Rotor-Gene TM 6000 software.

NGS
Library construction, adapter ligation and sequencing (5 Mio. read pairs, 0.6 GB of an Illumina Miseq run) were outsourced (LGC Genomics GmbH, Berlin, Germany). After DNA fragmentation with a focusedultrasonicator (Covaris, Woburn, Massachusetts, USA) libraries were prepared using the Ovation Rapid DR Multiplex System 1-96 (NuGEN) including the following steps: end repair, ligation, final repair, library purification and library amplification (14 cycles). Libraries were pooled, purified and size selected via preparative gel electrophoresis and quality controlled using the BioAnalyzer (Agilent, Santa Clara, CA 95051, USA) and Qubit fluorometer (Thermofisher Scientific, Waltham, USA).
One sample from the accession 'Bona' was previously sequenced in a separate run to test for sequence quality and mt enrichment. The Illumina bclfastq 1.8.4 software was used for demultiplexing and quality filtering of the reads under the allowance of one or two mismatches and removal of short reads (\ 20 bases).
NGS data were deposited in Genbank (NCBI) under the following dataset: Temporary Submission ID: SUB5046906/BioProject ID: PRJNA515664.

Sequence assembly and SNP detection
After quality filtering, paired reads were mapped to Diplostephium hartwegii (KX063855), an Asteraceae species, which was used as reference sequence using the Geneious mapper [Geneious 9.1.5 (http://www. geneious.com, Kearse et al. 2012)] with low sensitivity under default parameters and choosing the function to find structural variants, short insertions and deletions of any size (Online Resource 2).
The resulting consensus sequence for mtDNA of all chamomile samples was used as new reference sequence to which paired reads of the samples were mapped separately under the same parameters. This resulted in individual consensus sequences for the samples, which were again aligned with Diplostephium hartwegii as reference using the LASTZ alignment as implemented in Geneious 9.1.5. Annotations were consecutively adopted from Diplostephium hartwegii. SNPs were detected with the SNP caller as implemented in Geneious 9.1.5 with default parameters and were used to set up the different mitochondrial haplotypes (= mitotypes). In order to compare the SNP frequency of chamomile with that of other species, the mitochondrial genomes of Lactuca sativa (NC_042756) and L. serriola (NC_042378) (Kozik et al. 2019) were downloaded from NCBI, aligned with MAFFT v7.308 (Katoh and Standley 2013) and SNPs detected as explained above.

Statistics
All genetic distances were calculated and visualized in R 3.5.2 (R Core Team 2018) with the R package poppr 2.8.1 (Kamvar et al. 2014(Kamvar et al. , 2015 using Nei's distance for distance calculation and the Davidson-Harel layout algorithm (Davidson and Harel 1996) for the minimum spanning network.

Results
Mitochondria enriched DNA of eleven accessions (32 plants) was sequenced on an Illumina Miseq run (5 Mio. read pairs). The run yielded 10,163,050 reads ranging between 87,428 and 741,788 raw sequences per sample (Online Resource 3). One sample ('Bon-a5') was previously sequenced in a separate run (30,806 raw sequences) to test for sequence quality and mitochondrial enrichment. An overall mitochondrial consensus sequence (162,685 bp) was established using paired reads of one sample per accession (about 30% of paired reads) and Diplostephium hartwegii (KX063855) as reference with a mean coverage of 5109 sequences. Paired reads of the samples were mapped separately to this consensus to generate individual mt consensus sequences. The mean coverage of individual samples ranged from 71 in 'Promyk10' to 479 in 'Hun2_1'. Only two samples were outliers in the low range, the overall mean coverage was 201. The test sample 'Bona5' yielded a sequence coverage of 12. Consensus sequences were aligned with the reference D. hartwegii in a LASTZ alignment and SNPs were detected. Ninety-seven percent of the sequences (157,993 bp) could be aligned to the genome of D. hartwegii.
Twenty-three percent (36,383 bp) of the assembled sequences belonged to coding regions, 77% (121,610 bp) to non-coding regions. The area of highest diversity was found within the first 20,000 bp, in the region between the sequence coordinates 1 and 20,000 of D. hartwegii (Fig. 1).
In total, 102 SNPs were detected in M. chamomilla in a 277,718 bp alignment, 89 SNPs (representing 0.03% of the aligned sequences) when sequence ambiguities within a single sample or mutations with low sequence coverage were omitted ( Table 2). The sequence diversity between M. chamomilla and D. hartwegii was 0.45% or 1238 mutations within the D. hartwegii genome.
Out of the 89 SNPs 67 polymorphisms were substitutions from purine to pyrimidine bases or vice versa (Table 3). If the mutations would be evenly distributed between coding and non-coding regions, we would expect 20 SNPs in the coding regions based on the 23% coding regions found in our assembly. However, only 11% of the SNPs (10 mutations) were found in the genic regions, 89% (79 SNPs) were located in the intergenic regions. Three of the point mutations in the genic regions were found within the cox2 gene.
The SNP markers identified in this study distinguish nineteen haplotypes (mitotypes) in M. chamomilla (Online Resource 4). The mtDNA of only one accession ('Camoflora') was monomorphic for all three individuals, all other accessions were polymorphic. However, individuals of different accessions shared the same mitotype, although this was restricted to individuals of the same ploidy level (Table 4 and Fig. 2). Within the diploid accessions, fifteen mitotypes in twenty-four individuals were found, whereas four different mitotypes were identified in nine individuals of the tetraploid accessions. The ratio of mitotypes per accession was with 1.9 therefore slightly higher in diploid accessions than the ratio of 1.3 in tetraploid accessions. Genetic distances between the mitotypes ranged between 0.01 and 0.85. Three main clusters separating from each other with higher distances (0.43-0.45) could be identified in a minimum spanning network (Fig. 2). Cluster I comprising the tetraploid Mitotype 16 ('Goral', 'Margaritar') and several diploid accessions, cluster II is composed by the tetraploid Mitotype 19 and the diploid Mitotype 17 and cluster III is built by the rest of the tetraploid individuals (Mitotype 18) and the main group of diploid individuals ('Promyk', 'Camoflora', 'Soroksári', 'Argenmilla' and 'Bona'). Clusters I and III could be further divided into several subclusters. Genetic distances between Mitotype 16 (49) and Mitotype 10 (29) (0.11) or Mitotype 16 (49) and Mitotype 13 (29) (0.07) for example are higher than between diploid mitotypes within the same cluster. The highest genetic distance (0.85) was calculated between Mitotype 1 ('Bona') and Mitotype 17 ('Margaritar'). The accessions with the highest variability within were the Croatian provenance 'PG029' (29), the Argentinean accession 'Argenmilla' (29), 'Bona' (29) from the Slovak republic and 'Margaritar' (49) from Romania. The three mitotypes of the accession 'Margaritar' are so diverse that they are dispersed on the three main clusters of the minimum spanning network with genetic distances of 0.5-0.7. The three mitotypes of 'Bona' and 'PG029' respectively can be found in clusters I and III with one mitotype each that is more distantly related to the others (* 0.6). Due to the high amount of polymorphic accessions it is impossible to determine which of the diploid accessions was used to attain a corresponding tetraploid accession or to determine a common ancestor of diand tetraploid accessions. According to Das (2014), the Slovak accessions 'Bona' (29) and 'Goral' (49) e.g. resulted from breeding experiments using the same plant material. This fact, however, could based on this limited sample set be neither verified nor disproved. The analysed data also did not offer a clear connection between the mt lines and geographic origin of our samples. Anyhow, the high variability within the accessions shows that plants with different genetic background must have been used to establish distinct chamomile varieties.

Discussion
The use of a closely related mt genome as a reference-although of limited use for the construction of the whole mt genome-was here a straightforward approach for the identification of point mutations enabling the evaluation of mt sequence data of a relatively large sample set in a short time.
Anyhow, the real number of mitotypes in M. chamomilla will be underestimated in this study because not the whole mt genome of M. chamomilla was assembled and indels, SSRs and rearrangements were excluded from the analyses.
As mtDNA is-like in most seed plants-in the closely related species Helianthus annuus solely maternally inherited (Wills et al. 2005), the probability that it is likewise in M. chamomilla is high. Anyhow, the mitochondrion e.g. is maternally inherited in Arabidopsis thaliana (Martínez et al. 1997) while paternal or biparental inheritance is specified in Brassica napus (Erickson and Kemble 1990) (both Brassicaceae). Different organelle inheritance modes are also described within Oleaceae ( Van de Paer et al. 2018), which demonstrates that the way of inheritance of mitochondria is not always the same within plant families and it is not proven yet if the mt genome in M. chamomilla indeed is uniparentally inherited. As a slow mutation rate is expected for plant mitochondria (Gualberto et al. 2014), a relatively large number of SNPs would be contradictory in this case. However, an unexpected high variation of substitution rates of mt sequences across plants with some species exhibiting exceptionally high or low substitution rates were detected (Mower et al. 2007). Defects in mt repair mechanisms and/or DNA replication processes were beneath others discussed as possible reason for higher mt mutation rates in those cases (Mower et al. 2007).  Actually, reliable comparisons of our results to other populational studies concerning mt polymorphisms within Asteraceae are limited due to little comparable information. Mt sequence comparison of one cultivated and wild type of Helianthus annuus revealed a significantly lower amount of point mutations (0.027/kb) (Makarenko et al. 2016), the mitochondrions of Lactuca sativa and L. serriola even differed in 6 SNPs only (Kozik et al. 2019). Other publications examining inter-and intraspecific mt variability between related Asteraceae species did either not refer to mt data only (Peng et al. 2014) or used RFLP and no sequence data for the evaluation (Vermeulen et al. 1994). Two breeding lines of Capsicum revealed only one SNP (Wang et al. 2019). Kersten et al. (2016) found a mt sequence divergence of 0.065 SNPs/kb between Populus tremula and P. tremula x P. alba. Mitochondrial sequence comparisons in the subtribe Oleinae revealed 0.21 SNPs/kb between species of this subtribe ( Van de Paer et al. 2018). In comparison to those studies we would at first glance assume a high amount of substitutions in our samples.
A biparental inheritance would explain a high variability but can almost be excluded as the number of substitutions we found in the mt genome was with 0.32 SNPs/kb very low compared to the 9.51 SNPs/kb identified in the GBS analysis of the same species (Otto et al. 2017). The high sequence diversity found in this study might also be a result of the preselection of the analysed accessions by GBS analysis (Otto et al. 2017) that clearly enhanced the probability to find as many mitotypes as possible. Otto et al. (2017) found a tendency towards lower genetic variability within tetraploid accessions and a lacking congruence of genetic structure and geographic origin of the analysed accessions. Our results, although focusing on mt variability, indeed concurred with the results of the GBS analysis.
Breeding of such small-scale crops is very pragmatically performed because of economic and time constraints. The high variability of mitotypes within accessions shows that plants collected for selection were not isolated from each other (so regarded as one big population) and selection was performed by mass selection. Our data gives reason to the assumption that active exchange of plant material between plant breeders resulted in the establishment of numerous haplotypes within one accession. The Romanian variety 'Margaritar' e.g., resulted from breeding experiments using the Polish 'Zloty Lan' and chamomile plants collected in the wild in Poland (Das 2014). Plants from the same populations were used to develop the Polish 'Promyk' via further selection (Das 2014). A clear connection between the two varieties 'Margaritar' and 'Promyk' could not be found in our data but the high variability within the variety 'Margaritar' could be explained by a combination of different chamomile germplasm. Further on, it is possible that the chamomile lines were developed by combining several plants of different natural populations thus enhancing mt variability of the progenies. The varieties 'Bona' and 'Goral' both were attained by breeding experiments using 'Bohemia' and a Spanish chamomile (Das 2014). The use of geographically distinct chamomile populations might have resulted in the establishment of several genetically diverse mitotypes in the accession 'Bona'. As cultivated tetraploid chamomile varieties result from polyploidisation of diploid plants, it is not astonishing that tetraploid and diploid accessions both reflect the picture of high mt genetic diversity. Still it is remarkable that none of the mitotypes determined in diploid accessions could be found in tetraploid accessions as well. However, due to the unexpected high variability, three individuals per accession will not be enough to explain relationships between chamomile varieties and to clarify the breeding history of chamomile.
More samples should be examined in the intention to investigate the variability of the mt genome within the different chamomile origins more deeply. A bigger sample set and the additional analysis of natural populations would possibly provide an insight into the genetic background of the origins suitable for parentage analysis and population genetics.

Conclusion
To date, little is known about heritability, size, organisation or substitution rate of the mt genome of M. chamomilla. By means of this study a first step towards an exploration of mt genome variability in this medicinal plant was made. Promising regions for the development of phylogenetic markers were found which would enable further investigation of the distribution of different mitotypes in chamomile accessions, populations or varieties. Thus, the markers could be used for the development of a fast screening method of mt haplotypes, for evaluation of the organellar inheritance in M. chamomilla and for facilitation of phylogenetic relationships analyses between cytoplasmic lineages or chamomile populations.