Abstract
In this study, the complete plastome sequence of Nigella sativa (black seed) was analyzed for the first time. The plastome spans 154,120 bp and comprises four sections: the Large Single-Copy (LSC) region (85,538 bp), the Small Single-Copy (SSC) region (17,984 bp), and two Inverted Repeat (IR) regions (25,299 bp each). A comparative study of the N. sativa plastome with those of ten other species from various genera in the Ranunculaceae family reveals substantial structural variations. The contraction of the inverted repeat region in N. sativa influences the boundaries of the single-copy regions, resulting in a shorter plastome than in the other species. Comparison of the N. sativa plastome with those of its related species shows significant divergence from all of them except N. damascena. Among these, the plastome of A. glaucifolium displays the highest average pairwise sequence divergence (0.2851) from N. sativa, followed by A. raddeana (0.2290) and A. coerulea (0.1222). Furthermore, the study identified 12 distinct hotspot regions characterized by elevated Pi values (> 0.1). These regions include trnH-GUG-psbA, matK-trnQ-UUG, psbK-trnR-UCU, atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, petA-psaJ, trnN-GUU-ndhF, trnV-GAC-rps12, ycf2-trnI-CAU, and ndhA-ycf1. In total, 24 tandem repeats and 48 palindromic and forward repeats were detected in the N. sativa plastome. The analysis revealed 32 microsatellites, the majority being mononucleotide repeats. In the N. sativa plastome, phenylalanine had the highest number of codons (1982 codons), while alanine was the least common amino acid with 260 codons. A phylogenetic tree constructed from protein-coding genes revealed a distinct, robustly supported monophyletic clade comprising N. sativa and N. damascena, closely aligned with the Cimicifugeae tribe. This plastome provides valuable genetic information for precise species identification, phylogenetic resolution, and evolutionary studies of N. sativa.
Introduction
Chloroplast genome (plastome) comparative analysis has proven to be a valuable tool in phylogeny reconstruction and resolving complex evolutionary relationships1,2,3,4,5. In angiosperms, it has been observed that the number and order of genes in the plastome are generally conserved6. This conservation is attributed to the relatively slower evolution rate of chloroplast sequences compared to nuclear regions7,8. However, it is worth noting that sequence rearrangements in plastome have been reported in various plant species9,10,11. Inverted repeats region (IR) expansions or contractions into single-copy areas containing inversions, as well as significant inversions in large single-copy regions (LSC), are some examples of these rearrangements12,13. These inversion occurrences were most likely caused by intragenomic recombination in areas with varying G + C concentrations14,15 or tRNA activity16. The importance of gene rearrangements and inversions in plastomes for phylogenetic analyses lies in their rarity, ease of homology estimation, and simplicity in determining the polarity of inversion events17,18,19. The comparisons facilitate the investigation of molecular evolutionary patterns linked to structural rearrangement and the clarification of the molecular mechanisms responsible for those occurrences.
With a global distribution, the Ranunculaceae family has about 2000 primarily herbaceous species20,21,22 and is considered one of the oldest families to diverge from the eudicots. It is a large family of approximately 59 genera, and numerous Ranunculaceae plants have significant medicinal uses23. In recent years, molecular phylogenetics has enabled new insights into, and a reevaluation of, the taxonomy of Ranunculaceae. The results of molecular phylogenetic research have led to the reduction of several genera and the proposal of a new genus21,24,25,26,27,28. To date, a small number of widely used plastid regions and tandemly repeated DNA have been the primary data sources for this molecular research, and few complete plastomes have been published and made available through GenBank (http://www.ncbi.nlm.nih.gov).
Nigella, commonly known as fennel flower, constitutes a compact genus within the Nigelleae tribe, comprising 18 species in the Ranunculaceae family29,30. This genus is indigenous to Southern Europe, North Africa, South Asia, Southwest Asia, and the Middle East31,32. Nigella comprises fourteen species, of which N. sativa L. (black cumin) stands out as the most popular medicinal plant. Moreover, the seeds of N. sativa L. are used as a spice in various culinary applications. N. damascena L. and N. arvensis are annual plants known for their ornamental and medicinal qualities33,34,35. A limited number of studies have examined genetic variation in N. sativa (black cumin) using DNA-based molecular markers36,37. Plastid phylogenomic investigations can be especially effective in elucidating the generic relationships within the Ranunculaceae family. Structural variations in the plastome, such as gene inversions, gene transpositions, and expansion–contraction of the inverted repeat (IR), offer valuable systematic insights into the family22,38,39.
In this study, we sequenced, assembled, and analyzed the complete plastome sequence of N. sativa, a member of the Ranunculaceae family, for the first time. We compared it with ten previously published chloroplast genome sequences from the Ranunculaceae family obtained from the National Center for Biotechnology Information (NCBI). We analyzed the general characteristics of the plastomes of all species, including structure, gene composition, and other relevant attributes, and compared them with those of N. sativa. Furthermore, we identified microsatellites (SSRs), long repeat sequences, and highly variable regions within the chloroplast genomes of N. sativa and the other studied species.
Results
General features and composition of plastome
This research investigates the plastome structure of N. sativa and compares it with the plastomes of ten additional species within the Ranunculaceae family. The complete plastome of N. sativa exhibits a quadripartite structure, consistent with the typical organization found in most land plant plastomes (Fig. 1). The plastome of N. sativa is 154,120 bp in size and is divided into four main sections: the LSC region, which spans 85,538 bp, the SSC region, covering 17,984 bp, and two IR regions of 25,299 bp each. In this study, the plastome of P. anemonoides emerged as the largest, spanning 164,383 bp, whereas the plastome of N. sativa was the shortest among the 11 selected plastomes. The plastome of N. sativa contains a total of 128 genes, consisting of 83 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes (Table 1). This gene count is the lowest among all the plastomes; A. coerulea, by comparison, has a total of 140 genes. The number of protein-coding genes (PCGs) varies across the studied species, ranging from 81 to 94, with 83 in N. sativa. Across all species in the study, A. glaucifolium has the highest number of PCGs, while A. coerulea has the lowest. Within the plastome of N. sativa, 11 genes (rps11, rps12, rps14, rps15, rps18, rps19, rps2, rps3, rps4, rps7 and rps8) encode small ribosomal subunit proteins, while another set of eight genes (rpl14, rpl16, rpl2, rpl20, rpl22, rpl23, rpl33 and rpl36) encode large ribosomal subunit proteins. Furthermore, there are 45 genes associated with proteins related to photosynthesis, and an additional four genes (rpoA, rpoB, rpoC1, and rpoC2) encode DNA-dependent RNA polymerase. 
Lastly, nine genes (accD, ccsA, cemA, matK, clpP, infA, ycf1, ycf2, and ycf4) encode other proteins, as outlined in Table 2. The tRNA gene count ranges from 36 (in A. glaucifolium and A. raddeana) to 45 (in A. coerulea), while the rRNA gene count remains constant at 8 across all plastomes. We found 11 intron-containing genes in the N. sativa plastome: eight (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, and rpoC1) contain a single intron, whereas three (clpP, rps12 and ycf3) have two introns each (Table 3). The GC content of the plastome was generally similar among the 11 species, with N. sativa exhibiting a GC percentage of approximately 38%. A. coerulea displayed the highest GC content (39%) of all the plastomes examined. The total PCG length in the N. sativa plastome was 76,339 bp. Comparative analysis across species revealed diverse PCG lengths, ranging from 75,870 bp (N. damascena) to 84,105 bp (A. glaucifolium). Additionally, IR lengths varied from 25,162 bp (N. damascena) to 31,279 bp (A. raddeana), indicating a positive correlation between overall plastome length and IR size across species (Table 1). We examined the codon usage frequency of protein-coding genes in the N. sativa plastome; phenylalanine had the most codons (1982), followed by lysine (1912), while alanine was the least common amino acid (260 codons). Of the codons analyzed, 35 exhibited a relative synonymous codon usage (RSCU) greater than 1 in the N. sativa plastome. The most favored codon was AGA, encoding arginine, with an RSCU value of 1.78, followed by CAU, which encodes histidine, with an RSCU value of 1.44 (Table S1).
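The RSCU values reported above follow the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch illustrates that calculation under stated assumptions; it is not the tool used in this study. The function name `rscu` and the input format (a list of in-frame coding sequences as strings) are hypothetical, and the standard amino-acid assignments are assumed to apply to plastid genes.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard amino-acid assignments in TCAG order (assumed here for plastid genes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for a list of in-frame coding sequences.

    RSCU = observed codon count * (number of synonymous codons)
           / total count of all codons for the same amino acid.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


if __name__ == "__main__":
    values = rscu(["TTTTTTTTC"])  # two TTT codons and one TTC codon
    print(values["TTT"], values["TTC"])
```

A value above 1 means the codon is used more often than expected under equal usage, which is the sense in which 35 codons are described as having RSCU greater than 1.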
Comparative analysis and divergence
The mVISTA analysis uncovered sequence variability among 11 plastomes. In our results, the coding regions displayed comparatively low sequence divergence, while more significant divergence was observed in the non-coding regions. The results of the analysis revealed a noteworthy resemblance between N. damascena and N. sativa in comparison to other species. However, a distinctive pattern of divergence was observed in the region spanning from trnL to ycf1, particularly in the SSC region, as illustrated in Fig. 2. The analysis of various species revealed a variable number of divergences, with a notable pattern observed across different genomic regions. The most substantial divergences were identified within the LSC region, with A. raddeana and A. glaucifolium. Noteworthy divergences were also observed in other species, especially across the psbA to the atpH, rpoB to the trnT, and ycf3 to the ndhJ regions. A striking divergence pattern was also evident in A. coerulea, exhibiting significant distinctions, especially within the rbcL to clpP region in the LSC position. In the SSC region, all plastomes exhibited pronounced divergences compared to N. sativa (Fig. 2). High divergence was noted from ndhF to ycf1, with A. glaucifolium showcasing a particularly significant divergence. Contrastingly, the IR region displayed relatively lower levels of divergence compared to the LSC and SSC regions. The ycf2 gene, however, demonstrated substantial divergence in the IR region across all species, with P. anemonoides exhibiting heightened distinctions. Furthermore, the rpl2 gene displayed notable divergence, particularly in A. coerulea.
The average pairwise sequence divergence was also calculated for the complete plastome and for the protein-coding genes. The plastome of A. glaucifolium displayed the highest average pairwise sequence divergence (0.2851) from N. sativa, followed by A. raddeana (0.2290) and A. coerulea (0.1222). In contrast, N. damascena exhibited a low pairwise sequence divergence of 0.0117 from N. sativa (Table S1 and Fig. 3). The divergence of protein-coding genes in the selected plastomes reveals a distinct pattern, depicted in a heatmap. Notably, the ycf1 gene exhibits significant divergence from N. sativa, and other divergent genes include rpl14, rpl16, rpl20, ccsA, cemA, matK, psbT, ndhA, and ndhF across all species except N. damascena, which resembles N. sativa. The highest pairwise sequence divergence is observed in ycf1, at 0.2283.
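As a rough illustration of what a pairwise divergence value represents, the sketch below computes the uncorrected proportion of differing sites (p-distance) between two aligned sequences. This is an assumption made for illustration: the distance model actually used for the values above is not stated in this section, and a corrected model would give somewhat different numbers. The function name `p_distance` is hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected pairwise divergence between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    Illustrative stand-in; not necessarily the model used in the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # Five comparable sites (the gap column is skipped), one difference.
    print(p_distance("ACGT-A", "ACGA-A"))
```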
The complete plastome of N. sativa was aligned with that of N. damascena, and nucleotide variability (Pi) was calculated with DnaSP software to identify mutational hotspots. Nine highly variable loci with elevated Pi values were detected between the chloroplast genomes of the two species. These include six divergent hotspots in the LSC region, trnD-GUC-psbD (0.055), trnS-GGA-trnL-UAA (0.09), atpB-psaI (0.06), ycf4-cemA (0.08), psbE-petL (0.065), and rps8-rpl16 (0.1), and three in the SSC region, ndhF-ndhG (0.8), ndhI-rps15 (0.065), and ycf1 (0.21) (Fig. 3B). We also performed a multiple alignment of nine plastomes, excluding A. glaucifolium and A. raddeana because of their substantial divergence from N. sativa. This analysis revealed 12 divergent hotspot regions with Pi values exceeding 0.1. Noteworthy loci in the LSC region include trnH-GUG-psbA, matK-trnQ-UUG, psbK-trnR-UCU, atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, and petA-psaJ. In the IR region, trnN-GUU-ndhF, trnV-GAC-rps12, and ycf2-trnI-CAU exhibited divergence. In the SSC region, the ndhA-ycf1 locus (0.27) stands out, as depicted in Fig. 3B. The ndhC-cemA region shows the highest Pi value at 0.31, followed closely by ndhA-ycf1 at 0.27.
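The hotspot search described above amounts to computing Pi in windows that slide along the alignment. The following Python sketch shows the idea under stated assumptions; it is not DnaSP and does not reproduce its exact handling of gaps. The function names and the default window and step sizes (600 bp and 200 bp) are assumptions, since this section does not state the settings used.

```python
from itertools import combinations


def window_pi(aligned, start, length):
    """Nucleotide diversity (Pi) for one alignment window.

    Pi is the mean number of pairwise differences per site. Columns that
    contain a gap or an ambiguous base in any sequence are excluded.
    """
    columns = [
        col
        for col in zip(*(seq[start:start + length].upper() for seq in aligned))
        if all(base in "ACGT" for base in col)
    ]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(aligned)), 2))
    differences = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return differences / (len(pairs) * len(columns))


def sliding_pi(aligned, window=600, step=200):
    """Return [(window_start, Pi), ...] along the alignment.

    Window and step defaults are assumed values, not taken from the study.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    n = len(aligned[0])
    if any(len(seq) != n for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    last_start = max(n - window, 0)
    return [(s, window_pi(aligned, s, window)) for s in range(0, last_start + 1, step)]


if __name__ == "__main__":
    toy = ["AAAA", "AAAT"]
    for start, pi in sliding_pi(toy, window=2, step=2):
        print(start, pi)
```

Windows whose Pi exceeds a chosen threshold (0.1 in the multi-species comparison above) would then be reported as candidate hotspots and mapped back to the annotated genes or spacers they overlap.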
Plastomes structure variations, inversions, and divergence hotspots
Although the plastome of the Ranunculaceae family is typically highly conserved, our study revealed variations in certain species compared to the N. sativa plastome. A significant 36 kb inversion in the LSC region (ycf3 to atpA genes) was identified in the plastomes of A. raddeana and A. glaucifolium. Additionally, a 19 kb inversion between the ycf1 and ndhF genes in the SSC region was observed in the latter species (Fig. 4). Similarly, in the plastome of A. coerulea, a 22 kb inversion from atpB to clpP in the LSC region was observed (Fig. 4). A. raddeana and A. glaucifolium displayed minor inversions and shifts in the psbA and trnH-GUG to psbK region (LSC region). Rearrangements in A. raddeana and A. glaucifolium included the relocation of trnR-UCU, trnG-UCC, and trnS-GCU near ndhJ, as well as the movement of rps4 and rps16 to the start of the genome. Notably, trnK-UUU and matK shifted to between the rps16 and psbA genes. The rps16 gene was absent in N. sativa and A. angustius. Additionally, the ycf15 gene was present exclusively in A. glaucifolium (Fig. 4), highlighting distinct genomic variations and structural rearrangements in these chloroplast genomes.
IR expansion and contraction
To explore the potential expansion and contraction of IRs, the distributions of IR and SC border regions in the plastomes of 11 taxa within the family Ranunculaceae were compared. The rps19 gene, present in all species except A. raddeana, A. glaucifolium, P. anemonoides, and A. coerulea, crossed the boundary between the LSC and IRb regions. The rpl22 gene consistently resided in the LSC region across species, except for A. raddeana, A. glaucifolium, and A. coerulea, where it was absent (Fig. 5). Additionally, the typical placement of the rpl2 gene in the IRb region shifted to the LSC region in A. coerulea. The ycf1 gene in A. glaucifolium fully overlaps the JSB boundary, while across all species, it spans the JSA boundary, predominantly in the IRa region. In N. damascena and N. sativa, ycf1 is in the SSC region. The ndhF gene is closer to the JSB boundary in all species except A. glaucifolium, where it extends beyond the JSA boundary. The psbA gene is absent in A. glaucifolium, N. damascena, and N. sativa. The trnH gene is absent in A. raddeana and A. glaucifolium. A. coerulea has the rpl23 gene in the IRb region, which is absent in other species. A. raddeana and A. glaucifolium exhibit unique gene arrangements, with rps11 in the LSC region, infA in the IRb region in A. raddeana, and rps4 exclusively in the LSC region of A. glaucifolium (Fig. 5). This analysis highlights distinct plastome patterns among species. Structural variations in the IR and SSC regions can lead to gene rearrangements40,41. In this study, the IR regions were extended in P. anemonoides (30,979 bp), A. glaucifolium (31,256 bp), and A. raddeana (31,279 bp). This extension may contribute to the comparatively larger plastome sizes observed in these species relative to N. sativa (IR length 25,299 bp) and N. damascena (25,162 bp). Contraction and extension were identified in the IR and SSC regions across all studied species. 
Additionally, in species such as A. coerulea, which has an extended plastome, there is an observable extension in the LSC region (Fig. 5).
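Because the IR is present in two copies, a change in IR length affects total plastome size twice over. The short Python sketch below makes that arithmetic explicit using the N. sativa region lengths reported in this article; the function names are hypothetical and the code is only an illustration of the relationship, not part of the study's pipeline.

```python
def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def ir_fraction(lsc, ssc, ir):
    """Fraction of the plastome occupied by the two IR copies."""
    return 2 * ir / plastome_length(lsc, ssc, ir)


if __name__ == "__main__":
    # Region lengths for N. sativa as reported in the text (bp).
    lsc, ssc, ir = 85_538, 17_984, 25_299
    print(plastome_length(lsc, ssc, ir))        # total plastome length in bp
    print(round(ir_fraction(lsc, ssc, ir), 3))  # share of the genome in the IRs
```

The same relationship shows why a difference of roughly 6 kb in single-copy IR length, as between N. sativa and A. raddeana, corresponds to about 12 kb of total plastome length before any change in the LSC or SSC is considered.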
Repeat and SSR analysis
The number of repeats identified in the selected species ranges from 46 to 50, encompassing 16 to 28 palindromic repeats, 17 to 26 forward repeats, and 0 to 15 reverse repeats (Fig. 6). In N. sativa, there are 48 repeats in total, including 23 palindromic repeats and 25 forward repeats, with no reverse repeats observed. Across the selected species, all repeat types are predominantly about 18–30 bp in length (Fig. 6). Tandem repeats vary from 14 to 49 across species, most falling within the 11–20 bp range; N. sativa has 24 tandem repeats (Fig. 6C). The SSR analysis of the 11 plastomes revealed diversity in microsatellite counts. N. sativa displayed 32 SSRs, predominantly mononucleotide repeats, along with some di- and trinucleotide repeats. P. anemonoides exhibits the highest number of SSRs among all species, totaling 65 (Fig. 7A). The predominant type of SSR across all plastomes was the mononucleotide repeat, followed by dinucleotide and trinucleotide repeats. Tetranucleotide, pentanucleotide, and hexanucleotide repeats were absent in all plastomes. A and T repeats constitute a larger proportion of mononucleotide repeats than G and C repeats. Similarly, among dinucleotide repeats, AT motifs represent a larger proportion than GC motifs (Fig. 7B).
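For readers unfamiliar with how SSR counts of this kind are obtained, the following Python sketch detects perfect mono-, di-, and trinucleotide repeats with regular expressions. It is a simplified illustration, not the software used in this study. The function name `find_ssrs` and the minimum repeat thresholds (10, 5, and 4 repeats for motif lengths 1, 2, and 3) are assumptions, since the thresholds are not given in this section.

```python
import re


def find_ssrs(sequence, min_repeats=None):
    """Find perfect SSRs; returns sorted (start, motif, repeat_count) tuples.

    min_repeats maps motif length to the minimum number of repeat units.
    The defaults are assumed thresholds, not those of the study.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 5, 3: 4}
    seq = sequence.upper()
    hits = []
    for k, minimum in sorted(min_repeats.items()):
        # One motif of length k followed by at least (minimum - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, minimum - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA"),
            # which are already counted at the shorter motif length.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GC"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

Counting the hits by motif length and by motif base composition gives summaries comparable to those in Fig. 7, for example the predominance of A and T mononucleotide runs.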
Phylogenetic analysis
This study inferred phylogenetic relationships within Ranunculaceae from 73 shared protein-coding genes. The Glaucidioideae, Hydrastidoideae, and Coptidoideae emerged as the earliest diverging lineages within the Ranunculaceae family. The plastid phylogenomic analysis revealed a well-supported sister relationship between subfamily Thalictroideae and tribe Adonideae, with a strong bootstrap value of 95. The tribes Asteropyreae and Caltheae formed a single clade, but the support for this grouping is relatively low, with a bootstrap value of 44. Within Ranunculoideae, our analysis resolved the sister relationship between the tribes Anemoneae and Ranunculeae, with a robust bootstrap support value of 100 (Fig. 8). Based on the protein-coding gene data set, Nigelleae is positioned between Callianthemum and Cimicifugeae. This tribe showed its closest relationship with Cimicifugeae, a connection supported by a robust bootstrap value of 100. The phylogenetic trees strongly indicate that N. sativa is most closely related to N. damascena, which belongs to the same genus, Nigella, and falls in the same clade.
Discussion
In recent years, the plastome has frequently been employed as a DNA super barcode for the identification, classification, and phylogenetic research of medicinal plants42,43. In this study, we utilized next-generation sequencing to sequence the first complete plastome of N. sativa. The observed quadripartite structure is consistent with the typical organization found in the majority of plastomes of land plants22,44. The plastome sizes exhibited a range, with N. sativa having a size of 154,120 bp and P. anemonoides displaying the largest size at 164,383 bp (Table 1). These findings align with previous studies indicating size variation among plastomes from different genera within the Ranunculaceae family. The plastome sizes in Aquilegia, Delphinium, and Ranunculus have been estimated at 151 kb, 149 kb, and 157 kb, respectively45. Earlier studies on different angiosperm groups have indicated that plastome can be conserved46 or highly polymorphic47,48. Currently, the comparison of 11 plastomes from various genera in the Ranunculaceae family has shown significant variation in plastome structure. Our research aligns with earlier research that found structural variation in Clematis, opposing the assumption of conserved characteristic structures in the plastome49,50. In the present study, we observed significant divergence in the plastome and gene order among genomes such as A. raddeana, A. glaucifolium, and A. coerulea compared to N. sativa. The most notable distinction from the plastome of N. sativa involved a substantial inversion of 36 kb. It was identified between ycf3 to atpA genes (LSC region) in the plastome of A. raddeana and A. glaucifolium, and another inversion of 19 kb was observed in between the ycf1 and ndhF genes (SSC region) in the latter species (Fig. 4). Similarly, an inversion of about 22 kb was detected in the plastome of A. coerulea between the atpB to clpP2 gene in the large single-copy (LSC) region. 
Furthermore, we observed several smaller inversions, shifts in genes, and rearrangements in the plastomes of these species. However, the other species, including N. sativa, lack inversions and transpositions in their plastomes. Our findings are consistent with previous research39 indicating that Clematis has undergone four rearrangements compared to Coptis. Coptis, which represents an ancestral condition in Ranunculaceae, exhibits a typical chloroplast structure. Similarly, minor changes were documented in the family Orchidaceae, specifically involving the inversion of the petN-psbM region51. In contrast, gymnosperms belonging to the Pinaceae family exhibited a distinct pattern with five different plastome structures52. The identification of inversion and transposition events in the plastomes of A. raddeana, A. coerulea, and A. glaucifolium is consistent with prior research indicating that the occurrence of structural rearrangements in the plastome varies within the family. Previous studies have reported the presence of inversions in genera such as Anemone, Adonis, and Clematis53. In addition, a previous report54 is in line with our study: within Ranunculeae species, the plastome gene orders align with those of numerous other genera (e.g., Aconitum, Thalictrum), and no gene inversions or translocations have been observed.
Plastome sequences among species of the family Ranunculaceae show significant genetic divergence, as documented in prior research55. Aligned sequences indicate substantial differentiation, particularly in noncoding regions and in the SSC and LSC regions. Nucleotide diversity (Pi) measures the extent of variation in DNA sequences, providing insights into genetic diversity56. Pi values were higher in the chloroplast genes of N. sativa and its related species within the LSC and SSC regions than in the IR region. This observation is consistent with findings in other angiosperms57,58. Our findings indicate that the plastome of N. sativa exhibits a high degree of sequence similarity with that of N. damascena, consistent with both species belonging to the same genus. Nevertheless, there are regions where the identity is relatively lower. In contrast, the other nine plastomes display substantial sequence divergence from N. sativa. We compared the N. sativa plastome with those of eight other sequenced species, excluding A. raddeana and A. glaucifolium because of their higher divergence. Through sliding window analysis, we identified 12 divergent hotspot regions: trnH-GUG-psbA (0.12), matK-trnQ-UUG (0.13), psbK-trnR-UCU (0.1), atpF-atpI (0.12), rpoB-psbD (0.19), ycf3-ndhJ (0.22), ndhC-cemA (0.31), petA-psaJ (0.24), trnN-GUU-ndhF (0.23), trnV-GAC-rps12 (0.17), ycf2-trnI-CAU (0.092), and ndhA-ycf1 (0.27). The significantly divergent regions identified here offer valuable resources for developing molecular markers for plant identification and for exploring the phylogenetic relationships of N. sativa and related species. The elevated divergence in regions such as atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, and petA-psaJ may reflect adaptation to environmental stressors59. 
The identification and classification of Ranunculaceae species are crucial for understanding their evolutionary relationships and ecological roles59,60. Previous research revealed that the combination of markers such as ndhC-trnV-UAC, psbE-petL, rps8-rpl14, petN-psbM, atpF-atpI, trnT-GGU-psbD, rpl32-trnL-UAG, rpl16-rps3, rps16-trnQ-UUG, ndhG-ndhI, accD-psaI, trnG-GCC-trnfM-CAU, trnT-UGU-trnL-UAA, psbZ-trnG-GCC, and trnK-UUU-rps16 resulted in a 100% species identification rate, which is significantly higher than the rates achieved by individual markers59,60,61,62. That work also showed that combined markers can identify seven-fold more variant sites than conventional single barcode markers (Kim et al.). This observation aligns with previous findings in the Ranunculaceae family, where over 20 divergent hotspot regions were identified59. Similarly, nine divergent hotspot regions were identified previously in seven species of Pulsatilla (Ranunculaceae), including six intergenic spacer regions (rps4-rps16, rps16-matK, ndhC-trnV, psbE-petL, ndhD-ccsA and ccsA-ndhF) and three protein-coding regions (ycf1, ndhF and ndhI)60. These findings underscore the value of using multiple markers to account for the varying rates of nucleotide variation across different loci. Combined markers can be particularly advantageous for identifying closely related species, where individual markers may not be sufficient to distinguish between them. In earlier research, the most effective multi-locus barcode for identifying Pulsatilla species was found to be a combination of cpDNA barcodes such as rbcL, matK and trnH-psbA60. Furthermore, the ycf1 gene was found to be the most efficient barcode for Aconitum species identification61.
Additionally, our findings indicate that angiosperms tend to accumulate variations at the genus level in the LSC and SSC regions of the plastome. This pattern is consistent with the distribution of variations reported in the plastomes of other genera, such as Cymbidium, Oenothera, and Pyrus63. Moreover, the observed distribution of divergent regions, predominantly in the LSC and SSC regions, aligns with previous reports on Chaenomeles and Lancea species64,65. Previously, five types of plastome were identified based on distinctions in the LSC region. N. damascena (Type I) represents an ancestral condition. A. raddeana and A. glaucifolium exhibit the second type (Type II), with a unique gene arrangement pattern involving inversions. Likewise, A. coerulea (Type V) features an inversion between accD and clpP1, distinguishing it from Type I chloroplast genomes. In the Ranunculaceae, the Type I plastome is considered the most primitive. According to a previous study39, all other types originated from Type I through the inversion of different genes.
Codon usage bias (CUB) refers to the differential frequency with which the synonymous codons encoding the same amino acid occur in the coding sequences of a given organism's genome48. CUB preferences are specific to different genes in different species and can even vary within a particular species. This variability is shaped by a combination of factors, including mutation, selection, and genetic drift, which act during the long-term evolution of genes and species66. In our study, we examined the codon usage frequency of protein-coding genes in the N. sativa plastome; among all amino acids, phenylalanine had the highest number of codons (1982). Additionally, 35 of the codons analyzed exhibited a relative synonymous codon usage (RSCU) greater than 1, and the most favored codon was AGA, encoding arginine, with an RSCU value of 1.78.
The plastome of higher plants is known for its high degree of conservation. However, variations in genome length between species do arise due to the dynamic processes of extension and contraction occurring in the IR, LSC, and SSC regions67,68,69,70,71. Throughout plastome evolution, the IR region undergoes dynamic changes involving expansion and contraction, with genes entering either the IR region or the LSC and SSC regions72. We thoroughly compared 11 species, examining the two IRs and the two single-copy regions. In N. sativa, a notable contraction was observed in the IRs, while only a slight expansion was noted in the SSC region due to the shifting of rpl2 and ycf1 genes, leading to a shortened plastome length (Fig. 7). On the contrary, in P. anemonoides, there is an extension in the IR region. The larger genome size of this species might be due to the rps19 gene entering the junction of the LSC and IR borders, and 107 bp appeared in the IR region and was duplicated. Similarly, A. raddeana and A. glaucifolium exhibit expanded IR regions with placed genes infA, rps8, rpl2, ycf1, and rpl36 extending to the JLB Junction. Additionally, rps11 and rps4 genes are situated in the LSC region, contributing to increased genome size. The expanded genome size in A. coerulea results from LSC region enlargement, while SSC and IR regions simultaneously contract. This aligns with previous research indicating significant structural changes in land plant plastomes, including IR region loss or specific gene families73. The events of expansion and contraction in IRs are crucial in evolution as they can lead to alterations in gene content and plastome size47,74. The expansion of IRs has been documented in Araceae74,75. In certain cases, the LSC region expands while the SSC region decreases, reaching a size of only 7000 bp in Pothos76. 
The expansion and contraction of IR regions can result in the duplication or conversion of certain genes from duplicate to a single copy, respectively47,74. Modifications in IR size can also prompt rearrangements of genes in the SSC region, as recently observed in Zantedeschia74.
Long repeats are crucial contributors to the complete plastome’s variation, expansion, and rearrangement77. N. sativa was found to have approximately 48 long repeats. In comparison, the long repeats in these plastomes ranged from 46 (A. coerulea) to 50 (A. raddeana, A. macrophylla, A. angustius). The SSRs and long repeats in the 11 plastomes showed considerable variation. SSRs were mainly present in the non-coding region, and their sequence variation was higher compared to the coding region78. Additionally, SSRs can be employed for studying conservation genetics in endangered plant species, molecular identification, and exploring genetic relationships among related species79,80. The analysis of SSRs in the plastome of N. sativa revealed variations in the number of SSRs among 11 species, ranging from 24 (A. raddeana) to 65 (P. anemonoides). Mononucleotide repeats are the most common, followed by dinucleotide repeats, and the prevalent motifs across all species are A and T. Our results align with previous reports indicating that mononucleotide and dinucleotide repeats were the most and second most abundant SSRs in the plastomes of two Caldesia species81. Additionally, our findings are in line with earlier research suggesting that SSRs in plastome predominantly consist of polythymine (polyT) or polyadenine (polyA) repeats and less frequently contain tandem cytosine (C) and guanine (G) repeats82. This consistency supports the previous observation that plastome SSRs are primarily dominated by ‘A’ or ‘T’ mononucleotide repeats83,84.
The current classification of Ranunculaceae, as proposed by Wang et al.85, relies on a comprehensive analysis that combines morphological and molecular phylogenetic data. This classification results from examining 6957 molecular characters and 65 morphological characters. It divides Ranunculaceae into five monophyletic subfamilies: Glaucidioideae, Hydrastidoideae, Coptidoideae, Thalictroideae, and Ranunculoideae. The Ranunculoideae subfamily is further subdivided into ten strongly supported monophyletic tribes. Our findings agree with previous research supporting Glaucidium as the first diverging taxon and sister to all other Ranunculaceae species85,86,87. Our results are also consistent with those of Wang et al.85, indicating that Hydrastis is the second diverging taxon with robust support and that Coptidoideae represents the third diverging clade. In earlier studies, the position of Nigelleae within the Ranunculaceae family has been inconsistent. A previous analysis of plastomes from 38 Ranunculaceae species found that Nigelleae is closely related to Delphinieae. This relationship was strongly supported by a bootstrap value of 100, providing robust evidence for the clustering of Nigelleae and Delphinieae in the same clade88. Furthermore, based on 77 protein-coding genes and four rRNA genes, another analysis revealed that Caltheae is the sister group to Asteropyreae. In turn, Asteropyreae was identified as the sister group to the combined clade of Caltheae, Delphinieae, and Nigelleae39. Nevertheless, our findings agree with those of Johansson and Jansen89 and Johansson90, who identified Nigelleae as the sister group to Cimicifugeae. Similar results for Nigelleae were reported previously91. Furthermore, in line with our study, they also identified the sister relationship between Thalictroideae and Adonideae.
Moreover, in our analysis, the most strongly supported grouping among the tribes of Ranunculoideae (bootstrap value of 100) is the sister relationship between Anemoneae and Ranunculeae. This finding is consistent with previous studies and further confirms the relationship between these two tribes85,92,93,94. The data obtained in our study offer valuable insights for future genetic and evolutionary investigations of N. sativa and the broader Ranunculaceae family.
Conclusions
In conclusion, the complete plastome of N. sativa was sequenced and analyzed for the first time, and the results were compared with those of related species. The comparison highlighted the conservation of the overall structure of the N. sativa plastome. However, notable variations were observed in gene order, and certain structural changes were identified, primarily caused by the expansion or contraction of the IR regions into or out of adjacent single-copy regions. The comparative analysis of the N. sativa plastome and those of the other studied plants revealed highly variable regions, including trnH-GUG-psbA, matK-trnQ-UUG, psbK-trnR-UCU, atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, petA-psaJ, trnN-GUU-ndhF, trnV-GAC-rps12, and ycf2-trnI-CAU. These regions are fast-evolving loci and show promise as molecular markers in future studies. The numbers and types of SSRs and long repeat sequences were also characterized, providing further candidates for developing molecular markers. The phylogenetic analysis showed that N. sativa forms a clade with N. damascena with a high bootstrap value (100). This tribe is, in turn, sister to the Cimicifugeae tribe with strong support. The analysis of these complete plastomes contributes valuable insights to conserving medicinal resources, understanding genetic diversity, exploring genome evolution and adaptation history, and investigating the phylogenetic relationships of N. sativa.
Materials and methods
Fresh leaves were collected from N. sativa cultivated at the Agriculture Research Center, KPK, Pakistan, and transported in liquid nitrogen to a − 80 °C facility. The specimen was deposited in the herbarium of the Agriculture Research Center, KPK, Pakistan, under voucher number AGN-NG1 (N. sativa). Dr. Muhammad Waqas, one of the leading agronomists at the Agriculture Research Center, KPK, Pakistan, identified the plants. The plant samples were collected and processed in accordance with national guidelines and legislation, and a permit (NJ334/15/78) was obtained from the Environmental Protection Agency, Khyber Pakhtunkhwa, Pakistan.
DNA extraction and sequencing
High-quality DNA was extracted from young, immature leaves of N. sativa. The leaves were ground to a fine powder in liquid nitrogen, and DNA was isolated with the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) following the kit's protocol. The chloroplast DNA was then sequenced on an Illumina HiSeq-2000 platform at Macrogen (Seoul, Korea), yielding approximately 578,630,881 raw reads for N. sativa. Low-quality reads, defined as those with a Phred score below 30, were filtered out, and only the remaining high-quality reads were retained for further analysis. The plastome was assembled with two tools: the GetOrganelle v1.7.5 pipeline95, which is designed for plastome assembly, and SPAdes version 3.10.1 (http://bioinf.spbau.ru/spades).
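The quality-filtering step above can be illustrated with a minimal sketch. The text does not state which tool was used or whether the Phred threshold was applied per base or per read, so the sketch assumes a mean-per-read criterion and Phred+33 encoding; the function names are hypothetical and chosen only for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line):
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


def filter_fastq(lines, min_mean_q=30):
    """Yield (header, sequence, quality) for reads whose mean Phred score is >= min_mean_q.

    `lines` is any iterable of FASTQ lines in the usual four-line layout.
    """
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        if mean_quality(qual) >= min_mean_q:
            yield header, seq, qual


# Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
toy_fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
kept_reads = list(filter_fastq(toy_fastq))
```

In practice a dedicated read trimmer would normally be used for a data set of this size; the sketch only makes the threshold logic explicit.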
Genome annotation
The plastome was annotated in several steps using established tools. Initial annotation was carried out with two widely used online tools, CPGAVAS2 (ref. 96) and GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html), and tRNA genes were identified with tRNAscan-SE97. To verify the annotations, the plastomes were compared with reference genomes using Geneious Pro v.10.2.3 (ref. 98) and tRNAscan-SE (v.1.21)97. This step was used to identify start and stop codons, determine intron boundaries, and make manual corrections where necessary. The structural features of the plastomes were visualized with Chloroplot99. Genomic divergence was assessed using mVISTA in Shuffle-LAGAN mode, with the plastome of N. sativa serving as the reference55. The average pairwise sequence divergence between the N. sativa plastome and those of ten related species (N. damascena, A. asiatica, A. angustius, A. raddeana, A. coerulea, A. glaucifolium, P. anemonoides, L. fumarioides, D. fargesii and A. macrophylla) was determined. We compared gene order and performed multiple sequence alignment, using comparative sequence analysis to identify missing or unclear gene annotations. Whole-genome alignment was performed with MAFFT version 7.222 using default parameters100. Pairwise sequence divergence was calculated using Kimura’s two-parameter (K2P) model. Nucleotide diversity (Pi) was calculated in DnaSP version 6.13.03 (ref. 101) by sliding window analysis with a window size of 200 bp and a step size of 100 bp.
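To make the sliding window calculation concrete, the following sketch computes nucleotide diversity (Pi) as the mean number of pairwise differences per site in each window of an alignment. It is not DnaSP's implementation: the function names are hypothetical, sequences are assumed to be uppercase and already aligned to equal length, and columns containing gaps or ambiguous bases are simply skipped.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Mean pairwise differences per valid site for alignment columns [start, end)."""
    pairs = list(combinations(range(len(seqs)), 2))
    valid_sites = 0
    differences = 0
    for col in range(start, end):
        column = [s[col] for s in seqs]
        # Skip columns with gaps or ambiguity codes (an assumption of this sketch).
        if any(base not in "ACGT" for base in column):
            continue
        valid_sites += 1
        differences += sum(column[i] != column[j] for i, j in pairs)
    if valid_sites == 0 or not pairs:
        return 0.0
    return differences / (len(pairs) * valid_sites)


def sliding_pi(seqs, window=200, step=100):
    """Return (window midpoint, Pi) tuples along an alignment of equal-length sequences."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append(((start + end) // 2, window_pi(seqs, start, end)))
    return results


# Toy alignment of three sequences with one variable site.
toy_alignment = ["AAAA", "AAAT", "AAAT"]
toy_result = sliding_pi(toy_alignment, window=4, step=2)
```

With the defaults, `sliding_pi(alignment)` uses the 200 bp window and 100 bp step stated in the text; windows whose Pi exceeds a chosen cutoff can then be flagged as candidate hotspots.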
To visualize the shared genes and gene divergence among different species plastomes, we utilized the heatmap2 package in the R software. Additionally, we created a synteny plot using the pyGenomeViz version 0.2.1 package, employing the pgv-mmseqs mode and setting an identity threshold of 50%. The relevant source for pyGenomeViz can be found on GitHub at the following URL: https://github.com/moshi4/pyGenomeViz.
Characterization of repetitive sequences and SSRs
We identified repetitive sequences within the plastomes of N. sativa and 10 other species of the Ranunculaceae family. Palindromic, forward, and reverse repeats were identified using the online tool REPuter102, with a minimum repeat size of 8 bp and a maximum of 50 computed repeats. Simple sequence repeats (SSRs) were detected with MISA103 using the following thresholds: ≥ 8 repeat units for mononucleotide repeats, ≥ 6 for dinucleotide repeats, ≥ 4 for tri- and tetranucleotide repeats, and ≥ 3 for penta- and hexanucleotide repeats. Tandem repeats were identified using the online tool Tandem Repeats Finder v.4.09 (ref. 104).
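The SSR thresholds above can be expressed as a short regular-expression search. This is only a sketch of the thresholding logic, not MISA itself: it does not merge compound SSRs, it reports 0-based start positions, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds stated in the text.
MIN_REPEATS = {1: 8, 2: 6, 3: 4, 4: 4, 5: 3, 6: 3}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so a poly-A tract is reported once as a mononucleotide SSR.
            is_compound = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence with a 10-unit poly-A tract and a 6-unit AT repeat.
toy_sequence = "GGC" + "A" * 10 + "CG" + "AT" * 6 + "C"
toy_ssrs = find_ssrs(toy_sequence)
```

Counting the tuples by motif length gives the mono-, di-, and higher-order SSR tallies of the kind reported for each plastome.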
Genome divergence
We assessed the variation in shared protein-coding genes and complete plastomes among N. sativa and its related species. A comparative analysis based on multiple sequence alignment was carried out, in which gene order was examined to improve missing and ambiguous gene annotations. Plastome alignments were generated with MAFFT version 7.222 using default parameters100. Pairwise sequence divergence was calculated using Kimura’s two-parameter (K2P) model100. The synteny plot was generated with pyGenomeViz as described under Genome annotation.
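For reference, the K2P distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The sketch below applies this formula directly; the function name is hypothetical, and ignoring sites with gaps or ambiguous bases is an assumption rather than a description of the software used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Ignore gaps and ambiguity codes (an assumption of this sketch).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair.issubset(PURINES) or pair.issubset(PYRIMIDINES):
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises a math domain error for saturated pairs where the log argument is not positive.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites: P = 0.1, Q = 0.
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Averaging this distance over all genes or alignment blocks for each species pair gives a single pairwise divergence value per comparison.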
Phylogenetic analyses
To determine the phylogenetic position of N. sativa within the family Ranunculaceae, 76 published plastome sequences of Ranunculaceae species were downloaded from the NCBI database. The analysis used a dataset of 73 genes shared among 75 members of the family Ranunculaceae, representing 11 different genera. The nucleotide sequences of these 73 protein-coding genes were aligned and concatenated using MAFFT with default settings105. The best-fitting model of nucleotide evolution, TVM + F + I + G4, was determined with jModelTest 2 (ref. 106). Two approaches were used to infer the phylogenetic relationships of N. sativa. First, a Bayesian inference (BI) tree was constructed in MrBayes 3.12 using Markov chain Monte Carlo sampling. Second, a maximum likelihood (ML) tree was generated in PAUP* 4.0 (ref. 107) with 1000 bootstrap replicates to provide node support values. For the BI analysis, four chains (three heated and one cold) were run for 10,000,000 generations, with a sampling frequency of 1000 and a print frequency of 10,000. A burn-in of 2500 samples (25% of the total number of generations divided by the sampling frequency) was discarded. Finally, a 50% majority-rule consensus tree was derived from the resulting trees, and FigTree108 was used to visualize the relationships among the Ranunculaceae species based on their plastome sequences.
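The burn-in value follows directly from the run settings, as the short calculation below shows. The helper name is hypothetical, and the sketch ignores any sample recorded at generation 0.

```python
def mcmc_sample_counts(generations, sample_freq, burnin_fraction=0.25):
    """Return (total samples, burn-in samples, retained samples) for one MCMC run."""
    total_samples = generations // sample_freq
    burnin_samples = int(total_samples * burnin_fraction)
    return total_samples, burnin_samples, total_samples - burnin_samples


# Settings stated in the text: 10,000,000 generations sampled every 1000 generations.
total, burnin, retained = mcmc_sample_counts(10_000_000, 1000)
```

With these settings the run stores 10,000 samples, of which the first 2500 are discarded and 7500 contribute to the consensus tree.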
Ethics approval and consent to participate
The authors declare that the experimental research on the plant described in this paper complies with institutional, national, and international guidelines. Field studies were conducted in accordance with local legislation, with permission from the provincial Department of Forest and Grass of Khyber Pakhtunkhwa Province, Pakistan.
Data availability
All data generated or analyzed during this study are included in this published article. The N. sativa plastome was submitted to NCBI under accession number OR473632.
References
Park, I. et al. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species. PLoS One 12, e0184257 (2017).
Shaw, J., Lickey, E. B., Schilling, E. E. & Small, R. L. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The tortoise and the hare III. Am. J. Bot. 94, 275–288 (2007).
Mardanov, A. V. et al. Complete sequence of the duckweed (Lemna minor) chloroplast genome: Structural organization and phylogenetic relationships to other angiosperms. J. Mol. Evol. 66, 555–564 (2008).
Moore, M. J., Soltis, P. S., Bell, C. D., Burleigh, J. G. & Soltis, D. E. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proc. Natl. Acad. Sci. 107, 4623–4628 (2010).
Sun, M., Li, J., Li, D. & Shi, L. Complete chloroplast genome sequence of the medical fern Drynaria roosii and its phylogenetic analysis. Mitochondrial DNA Part B 2, 7–8 (2017).
Asaf, S., Ahmad, W., Al-Harrasi, A. & Khan, A. L. Uncovering the first complete plastome genomics, comparative analyses, and phylogenetic dispositions of endemic medicinal plant Ziziphus hajarensis (Rhamnaceae). BMC Genom. 23, 1–16 (2022).
Jansen, R. K. et al. Methods in Enzymology Vol. 395, 348–384 (Elsevier, 2005).
Walker, J. F., Zanis, M. J. & Emery, N. C. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae). Am. J. Bot. 101, 722–729 (2014).
Doyle, J. J., Doyle, J. L., Ballenger, J. & Palmer, J. The distribution and phylogenetic significance of a 50-kb chloroplast DNA inversion in the flowering plant family Leguminosae. Mol. Phylogenet. Evol. 5, 429–438 (1996).
Tangphatsornruang, S. et al. Characterization of the complete chloroplast genome of Hevea brasiliensis reveals genome rearrangement, RNA editing sites and phylogenetic relationships. Gene 475, 104–112 (2011).
Walker, J. F., Jansen, R. K., Zanis, M. J. & Emery, N. C. Sources of inversion variation in the small single copy (SSC) region of chloroplast genomes. Am. J. Bot. 102, 1751–1752 (2015).
Palmer, J. D., Nugent, J. M. & Herbon, L. A. Unusual structure of geranium chloroplast DNA: A triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families. Proc. Natl. Acad. Sci. 84, 769–773 (1987).
Tangphatsornruang, S. et al. The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: Structural organization and phylogenetic relationships. DNA Res. 17, 11–22 (2010).
Fullerton, S. M., Bernardo Carvalho, A. & Clark, A. G. Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol. 18, 1139–1142 (2001).
Smith, N. G., Webster, M. T. & Ellegren, H. Deterministic mutation rate variation in the human genome. Genome Res. 12, 1350–1356 (2002).
Hiratsuka, J. et al. The complete sequence of the rice (Oryza sativa) chloroplast genome: Intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol. Gen. Genet. MGG 217, 185–194 (1989).
Johansson, J. T. There large inversions in the chloroplast genomes and one loss of the chloroplast gene rps 16 suggest an early evolutionary split in the genus Adonis (Ranunculaceae). Plant Syst. Evol. 218, 133–143 (1999).
Jansen, R. K., Wojciechowski, M. F., Sanniyasi, E., Lee, S.-B. & Daniell, H. Complete plastid genome sequence of the chickpea (Cicer arietinum) and the phylogenetic distribution of rps12 and clpP intron losses among legumes (Leguminosae). Mol. Phylogenet. Evol. 48, 1204–1217 (2008).
Yan, M., Moore, M. J., Meng, A., Yao, X. & Wang, H. The first complete plastome sequence of the basal asterid family Styracaceae (Ericales) reveals a large inversion. Plant Syst. Evol. 303, 61–70 (2017).
Tamura, M. Flowering Plants· Dicotyledons: Magnoliid, Hamamelid and Caryophyllid Families 563–583 (Springer, 1993).
Ro, K.-E., Keener, C. S. & McPheron, B. A. Molecular phylogenetic study of the Ranunculaceae: Utility of the nuclear 26S ribosomal DNA in inferring intrafamilial relationships. Mol. Phylogenet. Evol. 8, 117–127 (1997).
Liu, H. et al. Comparative analysis of complete chloroplast genomes of Anemoclema, Anemone, Pulsatilla, and Hepatica revealing structural variations among genera in tribe Anemoneae (Ranunculaceae). Front. Plant Sci. 9, 1097 (2018).
The Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161, 105–121 (2009).
Compton, J. A., Culham, A. & Jury, S. L. Reclassification of Actaea to include Cimicifuga and Souliea (Ranunculaceae): Phylogeny inferred from morphology, nrDNA ITS, and cpDNA trnL-F sequence variation. Taxon 47, 593–634 (1998).
Miikeda, O., Kita, K., Handa, T. & Yukawa, T. Phylogenetic relationships of Clematis (Ranunculaceae) based on chloroplast and nuclear DNA sequences. Bot. J. Linn. Soc. 152, 153–168 (2006).
Falck, D. & Lehtonen, S. Two new names in Clematis (Ranunculaceae). Phytotaxa 163, 58 (2014).
Jiang, N. et al. Phylogenetic reassessment of tribe Anemoneae (Ranunculaceae): Non-monophyly of Anemone s.l. revealed by plastid datasets. PLoS One 12, e0174792 (2017).
Compton, J. A. & Hedderson, T. A. A morphometric analysis of the Cimicifuga foetida L. complex (Ranunculaceae). Bot. J. Linn. Soc. 123, 1–23 (1997).
Zohary, M. The genus Nigella (Ranunculaceae)—A taxonomic revision. Plant Syst. Evol. 142, 71–105 (1983).
Dönmez, A. A., Aydin, Z. U. & Dönmez, E. O. Taxonomic monograph of the tribe Nigelleae (Ranunculaceae): A group including ancient medicinal plants. Turk. J. Bot. 45, 468–502 (2021).
Tutin, T. & Akeroyd, J. Nigella. Flora Europaea 1, 209–210 (1964).
Raab-Straube, E. V., Hand, R., Hörandl, E. & Nardi, E. Ranunculaceae. Euro+ Med Plantbase–the information resource for Euro-Mediterranean plant diversity. http://ww2.bgbm.org/EuroPlusMed/ (Accessed December 10, 2020) (2014).
Ghosh, A. & Datta, A. K. Karyotyping of Nigella sativa L. (black cumin) and Nigella damascena L. (love-in-a-mist) by image analyzing system. Cytologia 71, 1–4 (2006).
Malhotra, S. Handbook of Herbs and Spices 391–416 (Elsevier, 2012).
Shaker, S. S., Mohammadi, A. & Shahli, M. K. Cytological studies on some ecotypes of Nigella sativa L. in Iran. Cytologia 82, 123–126 (2017).
Birhanu, K., Tileye, F., Yohannes, P. & Said, M. Molecular diversity study of black cumin (Nigella sativa L.) from Ethiopia as revealed by inter simple sequence repeat (ISSR) markers. Afr. J. Biotechnol. 14, 1543–1551 (2015).
Mirzaei, K. & Mirzaghaderi, G. Genetic diversity analysis of Iranian Nigella sativa L. landraces using SCoT markers and evaluation of adjusted polymorphism information content. Plant Genet. Resour. 15, 64–71 (2017).
Sun, Y. et al. Complete plastome sequencing of both living species of Circaeasteraceae (Ranunculales) reveals unusual rearrangements and the loss of the ndh gene family. BMC Genomics 18, 1–10 (2017).
Zhai, W. et al. Chloroplast genomic data provide new and robust insights into the phylogeny and evolution of the Ranunculaceae. Mol. Phylogenet. Evol. 135, 12–21. https://doi.org/10.1016/j.ympev.2019.02.024 (2019).
Hirao, T., Watanabe, A., Kurita, M., Kondo, T. & Takata, K. Complete nucleotide sequence of the Cryptomeria japonica D. Don chloroplast genome and comparative chloroplast genomics: Diversified genomic structure of coniferous species. BMC Plant Biol. 8, 70. https://doi.org/10.1186/1471-2229-8-70 (2008).
Zeng, S. et al. The complete chloroplast genome sequences of six Rehmannia species. Genes 8, 103 (2017).
Rønsted, N., Law, S., Thornton, H., Fay, M. F. & Chase, M. W. Molecular phylogenetic evidence for the monophyly of Fritillaria and Lilium (Liliaceae; Liliales) and the infrageneric classification of Fritillaria. Mol. Phylogenet. Evol. 35, 509–527 (2005).
Xia, C., Wang, M., Guan, Y. & Li, J. Comparative analysis of the chloroplast genome for Aconitum species: Genome structure and phylogenetic relationships. Front. Genet. 13, 878182. https://doi.org/10.3389/fgene.2022.878182 (2022).
Tang, Y., Yukawa, T., Bateman, R. M., Jiang, H. & Peng, H. Phylogeny and classification of the East Asian Amitostigma alliance (Orchidaceae: Orchideae) based on six DNA markers. BMC Evol. Biol. 15, 1–32 (2015).
Palmer, J. D. Plastid chromosomes: Structure and evolution. Mol. Biol. Plastids 7, 5–53 (1991).
Henriquez, C. L. et al. Molecular evolution of chloroplast genomes in Monsteroideae (Araceae). Planta 251, 1–16 (2020).
Mehmood, F. et al. Chloroplast genome of Hibiscus rosa-sinensis (Malvaceae): Comparative analyses and identification of mutational hotspots. Genomics 112, 581–591 (2020).
Qian, W., Yang, J.-R., Pearson, N. M., Maclean, C. & Zhang, J. Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. 8, e1002603 (2012).
Park, S., An, B. & Park, S. Recurrent gene duplication in the angiosperm tribe Delphinieae (Ranunculaceae) inferred from intracellular gene transfer events and heteroplasmic mutations in the plastid matK gene. Sci. Rep. 10, 2720 (2020).
Sinn, B. T., Sedmak, D. D., Kelly, L. M. & Freudenstein, J. V. Total duplication of the small single copy region in the angiosperm plastome: Rearrangement and inverted repeat instability in Asarum. Am. J. Bot. 105, 71–84 (2018).
Yang, J.-B., Tang, M., Li, H.-T., Zhang, Z.-R. & Li, D.-Z. Complete chloroplast genome of the genus Cymbidium: Lights into the species identification, phylogenetic implications and population genetic analyses. BMC Evol. Biol. 13, 1–12 (2013).
Wu, C.-S., Wang, Y.-N., Hsu, C.-Y., Lin, C.-P. & Chaw, S.-M. Loss of different inverted repeat copies from the chloroplast genomes of Pinaceae and cupressophytes and influence of heterotachy on the evaluation of gymnosperm phylogeny. Genome Biol. Evol. 3, 1284–1295 (2011).
Hoot, S. B. & Palmer, J. D. Structural rearrangements, including parallel inversions, within the chloroplast genome of Anemone and related genera. J. Mol. Evol. 38, 274–281. https://doi.org/10.1007/BF00176089 (1994).
Ji, J. et al. Complete plastid genomes of nine species of Ranunculeae (Ranunculaceae) and their phylogenetic inferences. Genes 14, 2140 (2023).
Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M. & Dubchak, I. VISTA: Computational tools for comparative genomics. Nucleic Acids Res. 32, W273–W279. https://doi.org/10.1093/nar/gkh458 (2004).
Akhunov, E. D. et al. Nucleotide diversity maps reveal variation in diversity among wheat genomes and chromosomes. BMC Genomics 11, 1–22 (2010).
Wang, C. et al. Complete chloroplast genome sequence of Sonchus brachyotus helps to elucidate evolutionary relationships with related species of Asteraceae. BioMed Res. Int. 2021, 9410496 (2021).
Zhang, Y. et al. Complete chloroplast genome analysis of two important medicinal Alpinia species: Alpinia galanga and Alpinia kwangsiensis. Front. Plant Sci. 12, 705892 (2021).
Kim, K.-R. et al. Complete chloroplast genome determination of Ranunculus sceleratus from Republic of Korea (Ranunculaceae) and comparative chloroplast genomes of the members of the Ranunculus genus. Genes 14, 1149 (2023).
Zhang, T. et al. Comparative analysis of the complete chloroplast genome sequences of six species of Pulsatilla Miller, Ranunculaceae. Chin. Med. 14, 53. https://doi.org/10.1186/s13020-019-0274-5 (2019).
Cossard, G. et al. Subfamilial and tribal relationships of Ranunculaceae: Evidence from eight molecular markers. Plant Syst. Evol. 302, 419–431. https://doi.org/10.1007/s00606-015-1270-6 (2016).
He, J. et al. Structural variation of the complete chloroplast genome and plastid phylogenomics of the genus Asteropyrum (Ranunculaceae). Sci. Rep. 9, 15285. https://doi.org/10.1038/s41598-019-51601-2 (2019).
Korotkova, N., Nauheimer, L., Ter-Voskanyan, H., Allgaier, M. & Borsch, T. Variability among the most rapidly evolving plastid genomic regions is lineage-specific: Implications of pairwise genome comparisons in Pyrus (Rosaceae) and other angiosperms for marker choice. PLoS One 9, e112998. https://doi.org/10.1371/journal.pone.0112998 (2014).
Sun, J. et al. Evolutionary and phylogenetic aspects of the chloroplast genome of Chaenomeles species. Sci. Rep. 10, 11466 (2020).
Chi, X., Wang, J., Gao, Q., Zhang, F. & Chen, S. The complete chloroplast genomes of two Lancea species with comparative analysis. Molecules 23, 602 (2018).
Parvathy, S. T., Udayasuriyan, V. & Bhadana, V. Codon usage bias. Mol. Biol. Rep. 49, 539–565. https://doi.org/10.1007/s11033-021-06749-4 (2022).
Wang, R.-J. et al. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol. Biol. 8, 36. https://doi.org/10.1186/1471-2148-8-36 (2008).
Maréchal, A. & Brisson, N. Recombination and the maintenance of plant organelle genome stability. New Phytol. 186, 299–317. https://doi.org/10.1111/j.1469-8137.2010.03195.x (2010).
Rao, R. et al. The complete chloroplast genome of Ranunculus yunnanensis (Ranunculaceae). Mitochondrial DNA Part B 7, 60–61. https://doi.org/10.1080/23802359.2021.2002211 (2022).
Raubeson, L. A. et al. Comparative chloroplast genomics: Analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus. BMC Genomics 8, 174. https://doi.org/10.1186/1471-2164-8-174 (2007).
Marcel, D., Sidonie, B., Sylwia, S., Hanno, S. & Aurélien, T. Mutation rates in seeds and seed-banking influence substitution rates across the angiosperm phylogeny. bioRxiv https://doi.org/10.1101/156398 (2017).
Huang, H., Shi, C., Liu, Y., Mao, S.-Y. & Gao, L.-Z. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: Genome structure and phylogenetic relationships. BMC Evol. Biol. 14, 1–17 (2014).
Daniell, H., Lin, C.-S., Yu, M. & Chang, W.-J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genome Biol. 17, 1–29 (2016).
Henriquez, C. L. et al. Evolutionary dynamics of chloroplast genomes in subfamily Aroideae (Araceae). Genomics 112, 2349–2360 (2020).
Wang, W. & Messing, J. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA. PLoS One 6, e24670 (2011).
Abdullah, et al. Complete chloroplast genomes of Anthurium huixtlense and Pothos scandens (Pothoideae, Araceae): Unique inverted repeat expansion and contraction affect rate of evolution. J. Mol. Evol. 88, 562–574 (2020).
Asaf, S. et al. Chloroplast genomes of Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea: Structures and comparative analysis. Sci. Rep. 7, 7556 (2017).
Powell, W., Morgante, M., McDevitt, R., Vendramin, G. & Rafalski, J. Polymorphic simple sequence repeat regions in chloroplast genomes: Applications to the population genetics of pines. Proc. Natl. Acad. Sci. 92, 7759–7763 (1995).
Clark, C. M., Wentworth, T. R. & O’Malley, D. M. Genetic discontinuity revealed by chloroplast microsatellites in eastern North American Abies (Pinaceae). Am. J. Bot. 87, 774–782 (2000).
Huang, J. et al. Development of chloroplast microsatellite markers and analysis of chloroplast diversity in Chinese jujube (Ziziphus jujuba Mill.) and wild jujube (Ziziphus acidojujuba Mill.). PLoS One 10, e0134519 (2015).
Mwanzia, V. M. et al. The complete chloroplast genomes of two species in threatened monocot genus Caldesia in China. Genetica 147, 381–390 (2019).
Yi, X., Gao, L., Wang, B., Su, Y.-J. & Wang, T. The complete chloroplast genome sequence of Cephalotaxus oliveri (Cephalotaxaceae): Evolutionary comparison of Cephalotaxus chloroplast DNAs and insights into the loss of inverted repeat copies in gymnosperms. Genome Biol. Evol. 5, 688–698 (2013).
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E. & Tabata, S. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6, 283–290 (1999).
Qian, J. et al. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza. PLoS One 8, e57607 (2013).
Wang, W., Lu, A.-M., Ren, Y., Endress, M. E. & Chen, Z.-D. Phylogeny and classification of Ranunculales: Evidence from four molecular loci and morphological data. Perspect. Plant Ecol. Evol. Syst. 11, 81–110 (2009).
Kim, Y.-D., Kim, S.-H., Kim, C. H. & Jansen, R. K. Phylogeny of Berberidaceae based on sequences of the chloroplast gene ndhF. Biochem. Syst. Ecol. 32, 291–301 (2004).
Soltis, D. E. et al. Gunnerales are sister to other core eudicots: Implications for the evolution of pentamery. Am. J. Bot. 90, 461–470 (2003).
Park, J. M., Oh, A. & Koo, J. Complete chloroplast genome sequence of Eranthis byunsanensis BY Sun (Ranunculaceae), an endemic species in Korea. Mitochondrial DNA Part B 8, 570–574 (2023).
Johansson, J. T. & Jansen, R. Chloroplast DNA variation and phylogeny of the Ranunculaceae. Plant Syst. Evol. 187, 29–49 (1993).
Johansson, J. T. Systematics and Evolution of the Ranunculiflorae 253–261 (Springer, 1995).
Cossard, G. et al. Subfamilial and tribal relationships of Ranunculaceae: Evidence from eight molecular markers. Plant Syst. Evol. 302, 419–431 (2016).
Hoot, S. B., Kramer, J. & Arroyo, M. T. Phylogenetic position of the South American dioecious genus Hamadryas and related Ranunculeae (Ranunculaceae). Int. J. Plant Sci. 169, 433–443 (2008).
Hoot, S. B. Systematics and Evolution of the Ranunculiflorae 241–251 (Springer, 1995).
Wang, W., Hu, H., Xiang, X.-G., Yu, S.-X. & Chen, Z.-D. Phylogenetic placements of Calathodes and Megaleranthis (Ranunculaceae): Evidence from molecular and morphological data. Taxon 59, 1712–1720 (2010).
Jin, J.-J. et al. GetOrganelle: A simple and fast pipeline for de novo assembly of a complete circular chloroplast genome using genome skimming data. BioRxiv 4, 256479 (2018).
Shi, L. et al. CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res. 47, W65–W73 (2019).
Schattner, P., Brooks, A. N. & Lowe, T. M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, W686–W689 (2005).
Kearse, M. et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Zheng, S., Poczai, P., Hyvönen, J., Tang, J. & Amiryousefi, A. Chloroplot: An online program for the versatile plotting of organelle genomes. Front. Genet. 11, 1123 (2020).
Katoh, K. & Toh, H. Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 26, 1899–1900 (2010).
Librado, P. & Rozas, J. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452 (2009).
Kurtz, S. et al. REPuter: The manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642 (2001).
Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: A web server for microsatellite prediction. Bioinformatics 33, 2583–2585 (2017).
Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: More models, new heuristics and parallel computing. Nat. Methods 9, 772–772 (2012).
Wilgenbusch, J. C. & Swofford, D. Inferring evolutionary trees with PAUP. Curr. Protoc. Bioinform. Chapter 6, Unit 6.4 (2003).
Rambaut, A. FigTree v1.3.1. http://tree.bio.ed.ac.uk/software/figtree/ (2009).
Acknowledgements
This work was carried out with the support of the “Cooperative Research Program for Agriculture Science and Technology Development (Project No. RS-2024-00348677)”, Rural Development Administration, Republic of Korea.
Author information
Contributions
L, SA, RJ, and SAsif performed the experiments; IK, SB, and SA wrote the original draft and carried out the bioinformatics analysis; KMK and AH supervised the work and arranged resources. All authors have read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lubna, Asaf, S., Khan, I. et al. Genetic characterization and phylogenetic analysis of the Nigella sativa (black seed) plastome. Sci Rep 14, 14509 (2024). https://doi.org/10.1038/s41598-024-65073-6