Background

Mitochondria are semi-autonomous organelles in eukaryotes. Their primary function is the production of metabolic intermediates and cellular ATP through the citric acid cycle and oxidative phosphorylation pathway. For this reason, mitochondria are involved in a wide variety of cellular and developmental processes including pollen development and cytoplasmic male sterility (CMS) [1, 2]. Mitochondria have their own genomes, which harbor genes for ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and subunits of the respiratory complexes. Extensive research has been performed to understand the organization and function of mitochondrial genomes. To date (September 30, 2012), more than 70 plant mitochondrial genomes have been sequenced, including those of 22 seed plant species (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=33090&opt=organelle), and of a large number of protists, algae, fungi, and animals. These studies have greatly improved our understanding of mitochondrial gene content, genome size and organization, mutation rate as well as gene shuffling events. The sequenced mitochondrial genomes exhibit significant variation in both size and actual gene content, despite the universally conserved sequence that exists between the mitochondrial genomes of diverse species [3]. The size of sequenced plant mitochondrial genomes varies more than 12-fold among angiosperms, ranging from 208 kbp in white mustard (Brassica hirta) [4] to over 2,700 kbp in muskmelon (Cucumis melo) [5], while the number of genes only varies between 50 and 69 including 30 to 37 protein-coding genes [6]. The significant variation in size of the mitochondrial genome between species is explained by expansion of the inter-genic regions, structural rearrangements and intra- or intermolecular recombination events [7]. Gene shuffling events in higher plant mitochondrial genomes have occurred due to the presence of repeated sequences [8]. In combination with sequence duplication events, this has resulted in a unique diversity of plant mitochondrial genomes [9]. Therefore, the DNA sequence of plant mitochondria has become an important tool in phylogenetics for comparison of the evolutionary relationships among species. In addition, sequencing of the mitochondrial genome has the potential to increase our understanding of the complex genetic interactions between the nuclear and the organellar genomes.

Perennial ryegrass (Lolium perenne L.) is a diploid (2n = 2× = 14) member of the Poaceae family and one of the most important forage and turf grass species of temperate regions worldwide [10]. Its economic importance has led to the establishment of high-density genetic maps as well as genome and transcriptome sequence resources. For example, the complete chloroplast genome sequence has recently been published [11], and assembly of the genome sequence is currently in progress [12]. However, the complete mitochondrial genome sequence of perennial ryegrass, as well as of any other forage and turf grass species, was hitherto unknown.

Therefore, the main objective of this study was to sequence, assemble and annotate the perennial ryegrass mitochondrial genome. Specifically, we aimed at (i) describing the organization of the perennial ryegrass mitochondrial genome for future comparative analyses of mitochondrial genomes within Lolium and between closely related grass species, (ii) identifying protein-coding genes, rRNA genes, tRNA genes and open reading frames (ORFs) to understand the function of the mitochondrial genome, and (iii) gaining first insights into the mitochondrial transcriptome of perennial ryegrass.

Results

Isolation of intact mitochondria and extraction of mtDNA

A cellular fraction containing crude mitochondria was isolated from perennial ryegrass leaf tissue by homogenization followed by differential centrifugation. Further attempts to purify the mitochondria by Percoll density gradient centrifugation failed. The crude mitochondrial fraction was characterized by measuring the activity and latency of cytochrome c oxidase (CCO) as a marker enzyme for the intactness of the inner mitochondrial membrane, and malate dehydrogenase (MDH), an enzyme residing in the mitochondrial matrix as well as in the cytosol and several other places in the cell [13] (Table 1). The large increase in specific CCO activity indicates that, as expected, there was a 7.7-fold enrichment of mitochondria from the homogenate to the crude mitochondrial fraction. The latency of CCO measures the ability of the substrate, reduced cytochrome c, to reach the active site of the enzyme on the outer surface of the inner membrane. The high CCO latency in both homogenate and crude mitochondria indicates that the outer membrane was mainly intact [14, 15]. Only a small fraction of MDH activity co-purified with the mitochondria, but its latency increased dramatically (3.5-fold), indicating that the major part of the lost MDH had been present outside the mitochondria and that the crude mitochondria contain most of their MDH behind the permeability barrier of an intact inner membrane [16]. Thus, the crude mitochondrial fraction contained mainly intact mitochondria (90%), in which the mtDNA was protected inside intact outer and inner membranes (Table 1).

Table 1 Characterization of mitochondrial enrichment and intactness

Subsequently, contaminating nuclear and chloroplast DNA was removed by treating the crude mitochondrial fraction with DNase. The isolation of mtDNA from intact mitochondria from 60 g (two batches of 30 g) fresh weight leaves resulted in 3.5 μg mtDNA.

Sequencing and assembly of the perennial ryegrass mitochondrial genome

A total of 287,367 single reads were generated with a mean length of 403 bp (approximately 116 Mbp of sequence information) from the mitochondrial genome of the perennial ryegrass genotype F1-30 using Roche 454 GS-FLX Titanium sequencing (Table 2). This resulted in a 167-fold coverage of the mitochondrial genome. The contaminating chloroplast sequence reads were removed by performing a reference assembly against the chloroplast genome (GenBank Acc. No.: NC_009950.1). The isolated mitochondrial DNA was contaminated with approximately 2% chloroplast DNA (Table 2).
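The sequencing yield quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the read count and mean read length from this section and the final genome size reported later in the Results. The raw value it gives (about 171-fold) is slightly higher than the reported 167-fold, presumably because the reported figure excludes chloroplast and other contaminating reads; that explanation is an assumption, not something stated in the text.

```python
# Figures taken from the text; variable names are illustrative.
reads = 287_367
mean_read_length_bp = 403
genome_size_bp = 678_580  # final assembly size reported in the Results

# Total sequence information, reported as approximately 116 Mbp.
total_bp = reads * mean_read_length_bp

# Raw depth before removal of chloroplast and other contaminating reads.
raw_coverage = total_bp / genome_size_bp

print(f"total sequence: {total_bp / 1e6:.1f} Mbp")
print(f"raw coverage:   {raw_coverage:.0f}-fold")
```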

Table 2 Summary of the perennial ryegrass mitochondrial genome sequencing and assembly

The initial assembly generated 2,403 contigs totaling 1.7 Mbp with an average length of 703 nucleotides. The longest contig was 219,170 bp, and the shortest contig was 116 bp. A BLASTn search was performed against the nucleotide collection of NCBI to remove contaminating contigs. Nine out of 2,403 contigs were identified as plant mitochondrial DNA sequences with a mean length of 80,314 bp (total size 722,827 bp). The remaining 2,394 contigs, corresponding to 0.83% of the 116 Mbp total sequence information, were contaminating sequences, which were discarded. These contigs were mainly single read sequences related to other organisms (Table 2). The nine mitochondrial contigs of the initial assembly could not be further arranged into a single circular molecule, mainly for two reasons. Firstly, there were cases of misassembled contigs, and secondly there were cases where repetitive sequences led to a breakdown of the assembly process. In order to further resolve the arrangement of the nine contigs, we made use of the sequence information available from a perennial ryegrass nuclear genome sequencing project that is ongoing in our lab, which contained assembled contigs and scaffolds originating from the mitochondrial genome. This assembly included mate-pair Illumina libraries with insert sizes of up to 9 kbp, which helped us to predict the order and orientation of a number of the nine mitochondrial contigs. Primers were then designed to span contigs, and Sanger sequencing was used to fill in the gaps and, ultimately, merge the contigs (Figure 1). The complete nucleotide sequence of the mitochondrial genome of perennial ryegrass has been deposited in GenBank under the accession number JX999996.

Figure 1

Map of the perennial ryegrass mitochondrial genome. Protein, tRNA and rRNA-coding genes are shown inside and outside the circles. The second outer circle represents the circular master molecule. Genes and exons are indicated by arrowheads, p indicates a pseudogene. The forward and reverse DNA strands are shown in clockwise and anticlockwise orientation, respectively. The middle black peaked circle represents the G+C content of the master molecule. The inner circle shows the size markers in kbp in clockwise orientation. The first nucleotide of the cox1 gene is the starting point of the circular master molecule. This figure was generated using the CGView Server [17].

Features of the perennial ryegrass mitochondrial genome

The genome size was 678,580 bp with a G+C content of 44.1%. Annotation of the mitochondrial genome was performed and a total of 73 genes including protein-coding genes, rRNA genes, tRNA genes as well as 149 ORFs were identified. These regions account for 21.03% of the genome (Figure 1, Table 3, Table 4 and Additional file 1: Table S1).
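For readers who want to reproduce a G+C figure such as the one above from a sequence file, a minimal sketch follows. The function name `gc_content` and the choice to ignore ambiguous bases are assumptions made for this illustration; the text does not describe how the authors computed the value.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Ambiguous bases (anything other than A, C, G, T) are ignored;
    this is an assumption of the sketch, not a documented choice.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


print(gc_content("ATGCGCTA"))  # toy sequence, not ryegrass data
```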

Table 3 Main features of the assembled perennial ryegrass mitochondrial genome
Table 4 Genes identified in the perennial ryegrass mitochondrial genome

Protein-coding genes

The perennial ryegrass mitochondrial genome contains 39 genes encoding 34 different proteins including one pseudogene, rps4-p (Table 4). The genes encode 19 proteins of the electron transport chain. They include nine subunits of complex I: NADH dehydrogenase subunits 1, 2, 3, 4, 4L, 5, 6, 7 and 9 (nad1, 2, 3, 4, 4L, 5, 6, 7 and 9) of which nad7 has two copies; one subunit of complex III: apocytochrome b (cob); three subunits of complex IV: cytochrome c oxidase subunits 1, 2 and 3 (cox1, 2 and 3); five subunits of complex V: ATP synthase F1 subunits 1, 4, 6, 8 and 9 (atp1, 4, 6, 8 and 9). No genes were found to encode subunits of complex II: succinate dehydrogenase subunits 3 and 4 (sdh3 and sdh4). In addition, four genes encode proteins involved in cytochrome c biogenesis: subunits B, C and F (ccmB, C, FN and FC). Two genes, matR and mttB, encode maturase and transport membrane proteins, respectively. The thirteen genes rps1, rps2, rps3, rps4, rps7-1, rps7-2, rps12, rps13, rps14-1 and rps14-2, and rpl5-1, rpl5-2 and rpl16 encode ribosomal proteins. In total, the protein-coding regions cover 5.23% of the mitochondrial genome (Figure 1, Table 3 and Table 4).

RNA-coding genes

The perennial ryegrass mitochondrial genome contains a total of 34 RNA-coding genes: three rRNA genes (present in two copies each) for the ribosomal subunits 18S, 26S and 5S, and 28 tRNA genes including one pseudogene, tRNA-PheGAA (Table 4). The anticodons of the 28 tRNA genes match the codons of a total of 14 amino acids. The RNA-coding genes represent 1.95% of the mitochondrial genome (Figure 1, Table 3). The lengths of the rRNA genes range from 122 to 3,461 nucleotides, and those of the tRNA genes from 71 to 88 nucleotides (Table 4). No tRNA genes were found for the amino acids alanine (Ala), arginine (Arg), glycine (Gly), isoleucine (Ile), threonine (Thr) and valine (Val) in the perennial ryegrass mitochondrial genome (Table 5).

Table 5 Differences in the tRNA gene content in sequenced mitochondrial genomes of grasses

Introns

Among the 39 protein-coding genes, only nine (nad1, nad2, nad4, nad5, nad7-1, nad7-2, cox2, ccmFC and rps3) contain introns. All the introns in sequenced mitochondrial genomes are classified as group II introns [18]. A total of 26 group II introns were found within the nine protein-coding genes, including four trans-spliced introns that are part of nad1 and nad2 (Table 4). The remaining twenty-two introns, present in nad1, nad2, nad4, nad5, nad7, cox2, ccmFC and rps3, are cis-spliced. Among the 28 tRNA genes, one, trnLCAA, contains an intron.

Open reading frames (ORFs)

An ORF may be defined as an in-frame DNA sequence of 300 bp or longer that is bordered by a start and stop codon, with no match to a coding sequence in the public databases [19]. In the perennial ryegrass mitochondrial genome, we found 149 ORFs with a minimum and maximum length of 303 and 2,571 bp, respectively, and with a mean length of 452 bp covering 9.93% of the genome (Table 3, Additional file 1: Table S1).
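The structural part of this ORF definition (an in-frame stretch of at least 300 bp from a start codon to the next stop codon) can be sketched as a six-frame scan. The code below is an illustration under stated assumptions, not the authors' pipeline: it uses ATG as the only start codon and the standard stop codons, the function names are hypothetical, and it does not perform the database search needed to exclude ORFs that match known coding sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq: str, min_len: int):
    """Yield (start, end) for ORFs on one strand.

    Coordinates are 0-based on the scanned strand; end is exclusive
    and includes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield (start, i + 3)
                start = None


def find_orfs(seq: str, min_len: int = 300):
    """Return (strand, start, end) tuples for both strands."""
    seq = seq.upper()
    hits = [("+", s, e) for s, e in orfs_on_strand(seq, min_len)]
    rc = reverse_complement(seq)
    hits += [("-", s, e) for s, e in orfs_on_strand(rc, min_len)]
    return hits


# Toy example with a lowered threshold; real use keeps min_len=300.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_len=9))
```

Minus-strand coordinates refer to the reverse-complemented sequence, so they would need converting before being placed on a genome map.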

Repetitive regions and their gene content

A variety of repetitive DNA sequences were found in the perennial ryegrass mitochondrial genome. There are four pairs of large inverted repeat (IR) sequences, with repeat lengths of 50,267, 30,833, 24,985 and 1,534 bp, as well as one large directly repeated (DR) sequence of 8,558 bp (Figure 2), with 99% sequence identity. Overall, these five large repeats account for 17.12% of the mitochondrial genome. The genes, nad7, rps7, rps14, rpl5, rrn5, rrn18, rrn26, trnD, trnC, trnQ, trnK, trnM, trnP, trnS and trnW, were found as multiple copies located in the large inverted repeat sequences in the perennial ryegrass mitochondrial genome (Figure 2, Table 4).
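The 17.12% figure can be reproduced from the repeat lengths given above. The short calculation below is a sketch using only numbers from this section; it shows that the reported percentage corresponds to counting each repeat length once rather than both copies of each pair. That reading is an inference from the arithmetic, not a statement of the authors' method.

```python
genome_size_bp = 678_580

# Repeat lengths from the text: four inverted repeats and one direct repeat.
large_repeats_bp = {
    "IR1": 50_267,
    "IR2": 30_833,
    "IR3": 24_985,
    "IR4": 1_534,
    "DR": 8_558,
}

total_repeat_bp = sum(large_repeats_bp.values())
repeat_pct = 100 * total_repeat_bp / genome_size_bp

print(f"{total_repeat_bp:,} bp = {repeat_pct:.2f}% of the genome")
```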

Figure 2

Dot plot of the perennial ryegrass mitochondrial genome. Five repeats including four large inverted repeat pairs, IR1-IR4 and one direct repeat, DR (green) are marked by arrows. The inverted and direct repeat coordinates in the master molecule are: IR1, (178,334-228,600 bp; 484,825-434,554 bp); IR2, (147,726-178,558 bp; 410,808-379,975 bp); IR3, (106,923-131,907 bp; 361,199-336,212 bp); IR4 (94,022-95,555 bp; 604,126-602,594 bp) and DR (498,590-507,147 bp; 566,604-575,159 bp). This figure was generated using the PipMaker software [20].

A total of 96 pairs of short inverted repeat (SIR) sequences were identified covering 4,886 bp (0.72%) of the mitochondrial genome. Percent matches between SIRs were higher than 68%, with 83 pairs of SIRs showing values higher than 80%, while thirteen pairs of SIRs had a sequence identity of 100%. The average SIR length was 52 bp, and the longest SIR identified was 333 bp (Additional file 2: Table S2).

The mitochondrial genome of perennial ryegrass contained 29 tandem repeats, which covered 1,647 bp corresponding to 0.24% of the total sequence. The average period size and copy number were 26.07 and 2.17, respectively. The most common type of tandem repeat had period sizes of 42, 14 and 12, which totaled 42% of all tandem repeats found in the mitochondrial genome (Additional file 3: Table S3).

Simple sequence repeat sequences

We found 250 SSRs in the perennial ryegrass mitochondrial genome, including 23, 196, 26 and 5 with mono-, tri-, tetra- and pentanucleotide repeats, respectively. SSRs with dinucleotide repeats were not found. The lengths of the mononucleotide, trinucleotide and pentanucleotide repeats range from 10 to 13, 9 to 12 and 15 to 20 bp, respectively. All the tetranucleotide repeats are 12 bp long (Additional file 4: Table S4). Of the 196 trinucleotide repeats, only 14 (7.14%) were present in the coding regions of the perennial ryegrass mitochondrial genome.
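A simple way to scan a sequence for SSRs of this kind is a back-referencing regular expression per motif length. The sketch below is a hedged illustration: the minimum repeat counts are assumptions chosen to match the shortest lengths reported above (10 bp mononucleotide, 9 bp trinucleotide, 12 bp tetranucleotide, 15 bp pentanucleotide), the dinucleotide threshold is a guess because no dinucleotide SSRs were reported, and the function name is hypothetical. It does not reproduce the tool the authors used.

```python
import re

# Assumed minimum copy numbers per motif length (see lead-in).
MIN_REPEATS = {1: 10, 2: 6, 3: 3, 4: 3, 5: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, copies) tuples; end is exclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are repeats of a shorter motif (e.g. "AAA").
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            copies = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), unit, copies))
    return hits


# Toy sequence, not ryegrass data.
toy_seq = "GC" + "A" * 11 + "CGT" + "ATC" * 3 + "GG"
print(find_ssrs(toy_seq))
```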

Transposable element-related sequences

The presence of transposable elements (TEs) was also investigated by comparing the genome sequence against two TE databases, Poaceae and Triticeae, from the Genetic Information Research Institute (http://www.girinst.org/censor/index.php). In total, 22,545 bp (3.32%) and 3,008 bp (0.44%) of the total genome sequence showed homology with TEs in Poaceae and Triticeae, respectively. The TEs were mainly long terminal repeat (LTR) retrotransposons (Table 6). The circular master molecule coordinates of the TEs are presented in Additional file 5: Table S5.

Table 6 Transposable elements found in the perennial ryegrass mitochondrial genome

Transcriptome analyses

We performed an expression analysis of the 39 mitochondrial protein-coding genes (Table 7) using in-house RNA-seq data from the reproductive tissue of perennial ryegrass inflorescence (unpublished). The results are presented both as the total number of reads matching each gene and as the number of reads corrected for gene length (normalized expression). The most abundantly expressed genes in terms of total numbers were cob, cox1 and atp1, which all had more than 10,000 matching reads. However, when comparing normalized expression, the most highly expressed genes were nad9, cob, cox3, atp9 and rps12, with more than 10,000 reads per kbp gene length. Eleven genes had low expression (<1,000 matching reads per kbp gene length), namely ccmB, rps4, rps4-p, rps7-1, rps7-2, rps14-1, rps14-2, rpl5-1, rpl5-2, matR and mttB. Of these, the genes rps14-1, rps14-2, rpl5-1 and rpl5-2 had fewer than 100 matching reads per kbp gene length. The genes encoding subunits of the respiratory complexes (nad1 to nad9, cob, cox1 to cox3 and atp1 to atp9) all showed high expression both in absolute numbers and after normalization. In contrast, the genes encoding ribosomal proteins varied enormously in their expression levels (Table 7).
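The length normalization described here is reads per kbp of gene length. The sketch below shows the calculation and why the ranking of genes can change after normalization. The gene names and numbers are hypothetical placeholders, not values from Table 7.

```python
def reads_per_kbp(read_count: int, gene_length_bp: int) -> float:
    """Normalize a read count by gene length (reads per kbp)."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return read_count * 1000 / gene_length_bp


# Hypothetical (read count, gene length in bp) pairs for illustration.
example_genes = {
    "long_gene": (12_000, 1_600),
    "short_gene": (4_000, 300),
}

normalized = {
    name: reads_per_kbp(count, length)
    for name, (count, length) in example_genes.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.0f} reads per kbp")
```

In this toy case the longer gene has three times as many reads in total, but the shorter gene has the higher normalized expression, which is the same effect that moves short genes up the ranking in the normalized comparison.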

Table 7 Expression profile of the protein-coding genes in the perennial ryegrass mitochondrial genome

Discussion

Intactness of mitochondria

To obtain mtDNA uncontaminated by nuclear or chloroplast DNA, crude but intact mitochondria were isolated. Because the mtDNA is located behind two permeability barriers in intact mitochondria (an intact inner membrane as monitored by MDH latency [16] and an intact outer membrane as monitored by CCO latency [14, 15]), treatment with DNase removed contaminating chloroplastic and nuclear DNA without degrading the mtDNA. The fact that only 2% of the sequenced contigs were chloroplastic (Table 2) shows that this strategy was successful.

Difficulties of de novo assembly of a plant mitochondrial genome

Although the average length of the gaps between the contigs was only 122 bp, it was not straightforward to close the gaps through PCR amplification of the missing DNA segments. This was mainly due to misassembled, duplicated and repetitive DNA sequences in the nine de novo assembled contigs. Few reports have been published on mitochondrial genome sequencing using next-generation sequencing due to assembly difficulties of short reads even when a reference genome exists [6].

Many next-generation sequencing platforms produce paired-end or mate-pair reads, which collectively can be referred to as read pairs. Because the approximate physical distance between the two reads of a pair is known, and because read pairs can span repetitive regions, they constitute a powerful source of information that significantly facilitates genome assembly and can be used to join contigs.
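The idea of joining contigs with read pairs can be illustrated by counting how many pairs have their two mates mapped to different contigs. The sketch below is a deliberately simplified illustration: the function name, the input format and the `min_support` threshold are hypothetical, and it ignores mate orientation and insert size, both of which a real scaffolder uses to decide order, orientation and gap length.

```python
from collections import Counter


def contig_links(pairs, min_support=2):
    """Count read pairs that link two different contigs.

    pairs: iterable of (contig_of_mate1, contig_of_mate2).
    Returns {(contig_a, contig_b): n_pairs} for links supported by
    at least min_support pairs; contig names are sorted within a key.
    """
    counts = Counter(
        tuple(sorted(pair)) for pair in pairs if pair[0] != pair[1]
    )
    return {link: n for link, n in counts.items() if n >= min_support}


# Toy mapping results, not data from this study.
toy_pairs = [
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig1", "contig1"),  # both mates in one contig: no link
    ("contig2", "contig3"),  # single pair: below the support threshold
]
print(contig_links(toy_pairs))
```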

It is currently not possible to isolate nuclear DNA from plants without having the nuclear DNA contaminated with organellar DNA (mitochondrial and chloroplast). Thus, organellar genomes are also to a certain degree being sequenced as part of a nuclear genome sequencing project, and prior to filtering, the perennial ryegrass genome draft therefore also contains mitochondrial contigs and scaffolds. The mitochondria-related scaffold information was used to re-assemble, order and orientate the mitochondrial contigs in our mitochondrial genome assembly, and for primer design to facilitate PCR amplification across gaps.

Features of the perennial ryegrass mitochondrial genome

The final assembly of the perennial ryegrass mitochondrial genome resulted in a single circular molecule of 678,580 bp with an average G+C content of 44.1% (Figure 1, Table 3). The G+C content is very similar to that of other sequenced plant mitochondrial genomes such as rice (Oryza sativa L.), 43.8%; bamboo (Ferrocalamus rimosivaginus L.), 44.1%; sugar beet (Beta vulgaris L.), 43.9%; melon (Cucumis melo L.), 44.5%; Arabidopsis (Arabidopsis thaliana L.), 44.8%; and rapeseed (Brassica napus L.), 45.2% [5, 7, 21–25]. The perennial ryegrass mitochondrial genome contains 73 genes including genes encoding known proteins and RNAs. Among the identified genes, 36 genes (30 encoding proteins and 6 encoding tRNAs) are single-copy genes. The remaining 35 genes are multi-copy genes of nad7, rps7, rps14, rpl5, rrn5, rrn18, rrn26, trnD, trnC, trnQ, trnK, trnM, trnP, trnS, trnW and two pseudogenes, rps4-p and trnF-p (Table 4, Table 8). The 73 genes give a density of one coding region per 9.30 kbp, which is less compact than bamboo, rice and Arabidopsis (one coding region per 7.73, 7.91 and 8.0 kbp, respectively) [21, 22, 25]. Gene distribution between the two DNA strands depends on the genomic configuration [26], but generally shows no extreme strand bias [25]. In the perennial ryegrass mitochondrial genome, two protein-coding genes, nad1 and nad2, are trans-spliced, 21 genes are encoded on the forward strand, and 16 on the reverse strand. All the rRNA genes are located on the forward strand except the rrn26-1 gene, while 15 tRNA genes are found on the forward strand and 13 on the reverse strand (Figure 1, Table 4).
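The gene density quoted above follows directly from the genome size and gene count. The calculation below is a small worked check using only those two figures from the text; the variable names are illustrative.

```python
genome_size_bp = 678_580
gene_count = 73

# Average genome length per annotated gene, in kbp.
density_kbp = genome_size_bp / gene_count / 1000

print(f"one coding region per {density_kbp:.2f} kbp")
```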

Table 8 Copy number of mitochondrial genes that differ in perennial ryegrass, bamboo, wheat, rice and maize

The coding and intron sequences occupy 7.2% (48,723 bp) and 3.9% (26,631 bp) of the genome, respectively, including 39 protein, 28 tRNA and 6 rRNA genes (Table 3). In the maize (Zea mays L.) NB mitochondrial genome coding sequences account for 6.22% of the total genome [19]. Generally, the functional mitochondrial rRNA and tRNA genes of the sequenced angiosperm mitochondrial genomes lack introns [19]. We found that one tRNA gene, trnLCAA contains an intron in the perennial ryegrass mitochondrial genome. Similar results have been found in the date palm (Phoenix dactylifera L.) mitochondrial genome where three tRNA genes, trnKTTT, trnNATT and trnSupCTA, also contained an intron [28]. Further work is needed to determine if the perennial ryegrass intron containing tRNA gene, trnLCAA is functional.

The variation in the number of mitochondrial genes between species is mainly due to differences in gene content for the subunits of complex II, and especially ribosomal proteins and tRNAs [18]. Multiple copies of rRNA genes were found in perennial ryegrass (Table 8), and also in the mitochondrial genomes of sugar beet and wheat (Triticum aestivum L.) [23, 27]. All the known respiratory genes, except for the complex II genes sdh3 and sdh4, are present in the perennial ryegrass mitochondrial genome (Table 4). Both sdh3 and sdh4 genes are functional in tobacco and melon [5, 29] but absent in all other species, as reviewed by Ma et al. [22]. The sdh4 gene has been identified as a pseudogene in Arabidopsis, rapeseed and sugar beet [7, 23, 25]. Although the perennial ryegrass mitochondrial genome contains some multi-copy genes, it lacks the rpl2, rps10, rps11 and rps19 genes (Table 4, Table 8). The rpl2 gene is missing in sorghum (Sorghum bicolor L.), Tripsacum, maize and sugar beet [22]; it is functional in Arabidopsis, rice, rapeseed, tobacco and melon [5, 7, 21, 25, 29]; and it is a pseudogene in wheat and bamboo [22, 27]. The rps10 gene is only present in tobacco and melon [5, 29]. The rps11 gene is present in liverwort (Marchantia polymorpha) but has not been found in eudicotyledonous and monocotyledonous species except for the rice mitochondrial genome, which retains rps11 as a pseudogene [21, 30]. The rps19 gene is functional in rice, bamboo and tobacco, and has been identified as a pseudogene in wheat and Arabidopsis. In the perennial ryegrass mitochondrial genome, we found multiple copies of the rps14 gene, which has only been identified as a functional gene in rapeseed [7]. The comparison of all 14 ribosomal protein and both complex II (sdh) genes in 280 diverse angiosperms has demonstrated frequent loss of some of these 16 mitochondrial genes during angiosperm evolution [31]. The complement of genes encoding ribosomal proteins and complex II proteins therefore appears to be species-specific. To understand gene loss and gain events in angiosperm mitochondrial genomes, it would be informative to know how the functions of the genes missing from the mitochondrial genome are compensated. One explanation is that genes whose function is no longer required by the cell can disappear entirely. A second explanation is gene substitution or gene replacement, where the function of the missing mitochondrial gene is still needed and is directly replaced by a preexisting nuclear gene whose product can play the same role in the mitochondrion [32].

The perennial ryegrass mitochondrial genome has three rRNA genes, rrn18, rrn5 and rrn26, encoding the small subunit and large subunit rRNAs, which are present in all characterized mitochondrial genomes. The rrn5 gene is very small (122 nucleotides). In contrast to the other rRNA genes, the mitochondrial 5S rRNA gene is absent from the mitochondrial genomes of some fungi, animals and protists [3], and is present only in land plants, a subset of algae [33] and a protozoan [34].

Plant mitochondrial genes are translated using the universal genetic code and require tRNAs for all 20 amino acids, and the composition of the tRNA genes encoded by plant mitochondrial genomes is to a high degree unique in angiosperms. In the perennial ryegrass mitochondrial genome, 27 functional tRNA genes are found for 14 amino acids. One pseudogene, trnF-p, remains non-functional in the genome. Post-transcriptional modification within the anticodon sequence might be necessary to generate a functional tRNA from trnF-p. Thus, functional tRNA genes for six essential amino acids, Ala, Arg, Gly, Ile, Thr and Val, are missing from the perennial ryegrass mitochondrial genome (Table 5), although tRNAs for 20 amino acids are required for protein synthesis in mitochondria. The missing six are presumably encoded by the nuclear genome and imported from the cytosol into the mitochondria [35, 36]. The tRNA gene content of the perennial ryegrass mitochondrial genome was compared with eight other grass mitochondrial genomes, and differences were observed with respect to presence or absence of tRNAs for Ala, Arg, Gly, Ile, Leu, Thr, Trp and Val among these plant species. Plastid-derived tRNAs were not considered in the comparison (Table 5). In the perennial ryegrass mitochondrial genome, twenty-four tRNAs display a classical clover leaf structure, whereas each of the two tRNA-Ser (tRNA-SerTGA and tRNA-SerGCT) folds into an unusual four-loop secondary structure. One of the tRNAs (tRNA-TyrGTA) has a two-stem clover leaf structure. In the maize NB mitochondrial genome, tRNA-SerGCU and tRNA-SerUGA have a five-loop secondary structure [19].

In addition to protein and RNA-coding genes, we identified 149 ORFs larger than 300 bp in the perennial ryegrass mitochondrial genome (Additional file 1: Table S1). Only ORFs found outside the genic regions of the mitochondrial genome were included in the analysis. The number of ORFs larger than 300 bp identified in the perennial ryegrass mitochondrial genome is comparable to that previously reported for other species such as maize (121), sugar beet (93), Arabidopsis (85), rice (461), wheat (179) and tobacco (110) [19, 24, 25, 27, 29, 37].

Gene content in the repetitive regions

The mitochondrial genome of perennial ryegrass contains multiple copies of the genes nad7, rps7, rps14, rpl5, rrn5, rrn18, rrn26, trnD, trnC, trnQ, trnK, trnM, trnP, trnS and trnW (Table 4, Table 8). All of the multi-copy protein genes are located in the inverted repeat regions, and multi-copy RNA genes are located in both direct and inverted repeat regions (Figure 2). As for trnP, two copies were identical, whereas the third copy differed by a single nucleotide. The trnM-1 gene also differed from trnM-2 by a single nucleotide. Similarly, for trnQ in the wheat mitochondrial genome, two copies are identical, whereas the third copy differs by a single nucleotide [27]. A comparison of multi-copy mitochondrial genes among grass genomes such as ryegrass, bamboo, wheat, rice and maize (Table 8) suggests that gene duplication is a species-specific phenomenon [27]. The large repeated sequences cover 17.35% of the maize NB mitochondrial genome [19], while such sequences cover 17.12% of the perennial ryegrass mitochondrial genome.

Splicing

Splicing is often part of post-transcriptional modification of messenger RNAs (mRNAs). It involves the excision of non-coding intron sequences from a precursor RNA and subsequent ligation of the flanking exon sequences to produce a mature transcript. Two types of splicing were found in the perennial ryegrass mitochondrial genome: cis-splicing, the intramolecular ligation of exon sequences on the same precursor RNA, and trans-splicing, the intermolecular ligation of exon sequences from different primary transcripts [38]. Trans-splicing is characteristic of angiosperm mitochondrial introns, particularly for genes encoding complex I subunits [18]. Of the 26 introns of the protein-coding genes, only four were trans-spliced (in nad1 and nad2), confirming that trans-splicing is less common than cis-splicing [38]. In the perennial ryegrass mitochondrial genome, two genes, nad1 and nad2, encoding NADH dehydrogenase subunits 1 and 2, were each split into five exons. In nad1, exon nad1a was located far from the other four exons, on the other strand. In the case of the nad2 gene, exons nad2a and nad2b were found approximately 265 kb from the other three exons, on the other strand.

Genome diversity

The mitochondrial genomes of flowering plants are more complex than those of animals and fungi [39, 40]. They vary extensively in size (ranging from 208 kbp in white mustard [4] to over 2,700 kbp in muskmelon [5]), gene content, genome rearrangement patterns and presence of repetitive sequences. Multiple copies of a few of the conserved full-length genes or exons are found in the mitochondrial genome (Table 8), which has also undergone size expansion when compared between plant species. In the perennial ryegrass mitochondrial genome, only 142,764 bp (21.03%) of the total DNA sequence encodes proteins, RNAs and ORFs (Table 3), while the vast majority of the genome sequence has unknown function.

In the perennial ryegrass mitochondrial genome, the rps4-p and rps4 genes are conserved at the 5′ end (599 nucleotides with 99% sequence identity) but do not share sequence at the 3′ end. The BLASTn result confirmed that the 3′ end of the rps4-p gene has no homology with the rps4 gene of other species reported so far. Thus, the rps4-p gene might be a variant of the ribosomal gene rps4 or a pseudogene in the perennial ryegrass mitochondrial genome. The amino acid sequences of the rps4 and rps4-p genes in perennial ryegrass were aligned with the amino acid sequences of the rps4 gene of maize (Acc. No. YP_588274.1), sorghum (Acc. No. YP_762343.1), rice (Acc. No. YP_514660.1), wheat (Acc. No. ADE08097.1) and bamboo (Acc. No. AEK66732.1) (Figure 3). The alignment showed that only 196 amino acids (45%) of the rps4-p gene are conserved relative to the rps4 gene. Transcriptome analysis confirmed that both rps4 and rps4-p are expressed in the reproductive tissue of perennial ryegrass. Both the rps4 and rps4-p genes had low normalized expression (<1,000 reads per kbp length) (Table 7). In addition, we also found two ribosomal protein genes, rps3 and rpl16, sharing 110 bp of sequence (Figure 1, Table 4). The shared sequence was located in the second exon of rps3 (rps3b) and at the beginning of the rpl16 gene. Similarly, in the wheat KS3-type mitochondrial genome, KSorf1484 shares 46 bp of sequence with the cob gene [41].

Figure 3

Amino acid sequence alignment of the perennial ryegrass rps4 and rps4-p genes with the rps4 gene of maize, sorghum, rice, wheat and bamboo. Amino acid sequence of the rps4 gene of maize (Acc. No. YP_588274.1), sorghum (Acc. No. YP_762343.1), rice (Acc. No. YP_514660.1), wheat (Acc. No. ADE08097.1) and bamboo (Acc. No. AEK66732.1). Color code: white, conserved; green, identical; cyan, similar and yellow, different residues. This alignment was generated using the SDSC Biology WorkBench [42].

Plant mitochondrial genomes contain TEs, DNA sequences that can move from one position to another. TEs can constitute an appreciable fraction of the genome and are found in most species with the exception of liverwort [33]. In the perennial ryegrass mitochondrial genome, we found 22 and 95 TE fragments of various sizes covering a total of 3,008 and 22,545 bp based on comparison to the Triticeae and Poaceae databases, respectively (Table 6). The sequences varied in length and the elements were dispersed throughout the perennial ryegrass mitochondrial genome (Additional file 5: Table S5). The TEs appear to be more abundant in grasses than in cereals. In the perennial ryegrass mitochondrial genome, only 0.44% (Triticeae database) or 3.32% (Poaceae database) of the genome showed homology with transposable element-like sequences, most of which resemble retrotransposable elements, suggesting that their contribution to the expanded perennial ryegrass mitochondrial genome is negligible.
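
The TE percentages above can be cross-checked against the coding fraction reported earlier in this section. The following is a minimal sketch, assuming that the genome length can be inferred from the statement that 142,764 bp corresponds to 21.03% of the genome; the variable names are illustrative and the inferred length is an approximation, not the published genome size.

```python
# Figures quoted in the text.
coding_bp = 142_764          # bp encoding proteins, RNAs and ORFs
coding_fraction = 0.2103     # 21.03% of the genome

# Assumption: genome length inferred from the two numbers above.
genome_bp = coding_bp / coding_fraction

# TE coverage reported for the two reference databases.
te_bp = {"Triticeae": 3_008, "Poaceae": 22_545}

# Percentage of the genome covered by TE fragments, rounded to two decimals.
te_percent = {db: round(100.0 * bp / genome_bp, 2) for db, bp in te_bp.items()}

for db, pct in te_percent.items():
    print(f"{db}: {pct}% of the inferred genome length")
```

Under this assumption the two values round to the 0.44% and 3.32% given in the text.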

Gene expression profile

In this study, the expression profile of perennial ryegrass mitochondrial genes was studied in reproductive flower tissues (Table 7). In reproductive tissues, the mitochondrial density and activity are especially high, presumably because the energy and biosynthetic requirements are particularly high during reproduction, i.e., during pollen development [43, 44]. The generally high expression level of all of the encoded subunits of the respiratory complexes in the mitochondrial genome (Table 7) is consistent with that observation. However, there was some variation between the expression levels of the complexes:

Complex I, the NADH dehydrogenase, contains around 40 subunits in higher plants, one copy each per complex [45], of which nine are mitochondrially encoded in perennial ryegrass (Table 4). The normalized expression levels for all of the complex I subunits were generally high and varied about 3-fold (Table 7).

Complex IV, cytochrome c oxidase, contains 12–13 subunits out of which three are mitochondrially encoded in most plants including perennial ryegrass (Table 4). All three subunits were highly expressed (Table 7).

Complex V, the ATP synthase, consists of about 15 subunits, five of which are encoded by the mitochondrial genome in perennial ryegrass (Table 4), while the remaining subunits are encoded in the nucleus, synthesized on free ribosomes in the cytosol and imported into the mitochondria to be assembled with the mitochondrially encoded subunits into a complex in the inner membrane. The normalized expression levels of all the ATP synthase (complex V) genes were relatively high in perennial ryegrass reproductive tissues and varied <3-fold with the exception of atp9 (encoding subunit c of the Fo), which had a 2.5 times higher expression than the second highest, atp1 (encoding subunit alpha of the F1) (Table 7). This may be significant given that subunit c is present in 10–15 copies per complex and the alpha subunit is present in three copies, while all the other mitochondrially encoded subunits are present in only one copy each per complex V [46]. Thus, the mRNA levels in perennial ryegrass mitochondria, as expressed by the normalized read numbers, are correlated with the biosynthetic requirement for complex V subunits. Previous studies have shown that the atp1 gene is highly expressed in male flowers of date palm and in pollen mother cells of Arabidopsis [28, 47].

Finally, the four genes encoding proteins involved in cytochrome c biosynthesis, ccmB, ccmC, ccmFC and ccmFN, were all expressed, but not at particularly high levels (Table 7). An earlier microarray analysis of mitochondrial gene expression in early-stage wheat shoot tissues reported that the ccmFN gene showed increased transcript levels under three different stress conditions: low temperature (4°C), high salinity (0.2 M NaCl) and high osmotic potential (0.3 M mannitol) [48].

Protein biosynthesis in plant mitochondria takes place on bacterial-type ribosomes, where 14 of the subunits are encoded in the perennial ryegrass genome (Table 4). Out of these, eight had low normalized expression levels (<1,000 reads per kbp length), while three (rps3, rps12 and rpl16) showed high expression levels (>5,000 reads per kbp length) (Table 7). Four of the ribosomal protein genes were hardly expressed at all (rps14-1, rps14-2, rpl5-1 and rpl5-2), which may be because they are non-functional, or because they are required in other tissues but not in the reproductive tissues of perennial ryegrass. Consistent with the latter hypothesis, two ribosomal protein genes, rps1 and rps19, are much more abundantly expressed in roots than in other tissues of date palm [28].

Transcription does not necessarily lead to protein synthesis. An astonishing 48.5% and 30.8% of the total mitochondrial genomes of rice and date palm, respectively, are transcribed, which is due to RNA synthesis from large parts of the regions outside the annotated genes. For comparison, only 6.5% of the date palm mitochondrial genome is translated into proteins [28, 37]. The functions of the transcribed inter-genic regions of plant mitochondrial genomes are not well understood.

The expression of the mitochondrial genes in plants is carried out by phage-type RNA polymerase encoded in the nuclear genome [49]. Gene expression in plant mitochondria is rather complex, influenced by multiple promoters and by RNA processing, particularly the post-transcriptional processes of splicing and editing [50–52]. The RNA polymerase is capable of promoter recognition, initiation and elongation on its own but needs auxiliary factors to recognize all transcription initiation sites [53]. Sequence analysis of the Arabidopsis mitochondrial genome showed that potential promoter motifs exist in the inter-genic regions [54]. In addition, a number of annotated genes do not show potential promoter sequences, confirming the possibility that other sequences can initiate transcription [55]. For this reason, transcription is actually initiated from a variety of promoter sites in the genome [50, 55]. Thus, transcription in plant mitochondria produces both cryptic transcripts, from regions that do not contain genes or from the opposite DNA strand of the genes, and defective transcripts, which are encoded by the genes but fail to complete the complex post-transcriptional process to become functional transcripts [56]. Once initiated, transcription sometimes gives rise to extremely large transcripts due to the absence of efficient transcription termination in plant mitochondria [57]. This contributes significantly to the transcription of the inter-genic regions. Therefore, large portions of the mitochondrial genome are transcribed but not translated into proteins.

Conclusions

For the first time, the mitochondrial genome of perennial ryegrass has been sequenced, successfully assembled and annotated. The data presented here constitute a primary platform to understand the organization and function of the mitochondrial genome in one of the most important forage and turf grass species. The circular mitochondrial master molecule will be useful for comparative mitochondrial genomics and for future research on agronomically important traits such as CMS.

Perspectives

Perennial ryegrass is a dominant forage species in temperate regions worldwide, and its main role is to provide forage for ruminant animals. Eighty per cent of the world’s cow milk and 70% of the world’s beef and veal are produced from temperate grasslands [58]. A major portion of these grasslands is covered by perennial ryegrass, which is, however, not well adapted to regions with severe winters or hot summers [58]. The geographic range of the species could be extended by developing more robust cultivars. One way to increase productivity, nutritional quality and tolerance towards biotic and abiotic stress is to maximize the genetically available heterosis using hybrid breeding schemes. However, hybrid seed production requires a tool to efficiently control pollination, such as CMS. The mitochondrial genome is a key to understanding the origin and function of CMS and will, in the long term, facilitate the development of hybrid cultivars in allogamous forage grasses.

Methods

Plant material

The perennial ryegrass genotype F1-30 was used for mitochondrial genome sequencing. F1-30 was developed from a cross between a genotype of the Italian cultivar Veyo and the Danish ecotype Falster [59]. The F1-30 genotype was multiplied by clonal propagation and grown in 15 cm × 15 cm plastic pots in the greenhouse in order to develop plant material for mitochondrial DNA (mtDNA) extraction.

Isolation of intact mitochondria

The plants were kept in darkness for 24 h prior to mitochondrial isolation in order to reduce the amount of starch in the chloroplasts. For each batch of mtDNA isolation a total of 30 g young leaves were collected from 4-month-old clones of F1-30 to isolate intact mitochondria. All the equipment and buffers were kept at 4°C before the extraction, and all the steps were conducted on ice or at 4°C.

The leaves were cut into small pieces (5–10 mm) with scissors. Thirty g of leaf pieces were homogenized for 1 min in 300 ml extraction buffer containing 0.3 M mannitol, 5 mM EDTA and 30 mM MOPS (pH 7.3 adjusted with 1 M KOH), using a chilled mortar and pestle. The reagents 0.2% (w/v) BSA, 5 mM DTT and 1% (w/v) PVPP were added to the extraction buffer prior to use. The crude homogenate was filtered through two layers of cotton cloth followed by centrifugation at 2,000 g for 10 min to pellet starch, nuclei and chloroplasts. The recovered supernatant was centrifuged at 10,000 g for 15 min to pellet intact mitochondria.

The mitochondrial pellet was resuspended in 7 ml DNAse-I buffer containing 0.44 M sucrose, 50 mM Tris–HCl (pH 8.0) and 10 mM MgCl2. Eight mg of DNAse-I recombinant, grade I (Roche, Mannheim, Germany) was dissolved in 1 ml DNAse-I buffer, and added to the mitochondrial suspension to give a final concentration of 1 mg/ml DNAse-I. Digestion was allowed to continue on ice for 2 h to degrade any nuclear and chloroplast DNA present outside the mitochondria. The digestion was terminated by adding 0.5 M EDTA (pH 8.0) to a final concentration of 25 mM. Mitochondria were re-pelleted at 16,000 g for 10 min. The pellet was resuspended in 25 ml of wash buffer (0.3 M mannitol, 1 mM EDTA, 10 mM MOPS, pH 7.2 adjusted with 1 M KOH). Intact mitochondria were washed twice by resuspension in wash buffer and re-pelleting at 16,000 g for 10 min.
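
The concentrations quoted in this step follow from simple dilution arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 8 ml suspension volume at the time of EDTA addition is an assumption (7 ml suspension plus 1 ml DNAse-I solution), since the added EDTA volume is not stated in the text.

```python
def final_concentration(amount, volumes_ml):
    """Concentration after pooling several volumes: amount / total volume."""
    return amount / sum(volumes_ml)


def stock_volume_for_target(stock_conc, target_conc, start_volume_ml):
    """Volume of stock (ml) to add so that the mixture reaches target_conc.

    Solves stock_conc * v = target_conc * (start_volume_ml + v) for v.
    """
    if target_conc >= stock_conc:
        raise ValueError("target concentration must be below the stock concentration")
    return target_conc * start_volume_ml / (stock_conc - target_conc)


# 8 mg DNAse-I dissolved in 1 ml, added to the 7 ml mitochondrial suspension.
dnase_mg_per_ml = final_concentration(8.0, [7.0, 1.0])

# 0.5 M (500 mM) EDTA stock added to reach 25 mM; 8 ml start volume is assumed.
edta_ml = stock_volume_for_target(500.0, 25.0, 8.0)

print(f"DNAse-I: {dnase_mg_per_ml:.2f} mg/ml; EDTA stock to add: {edta_ml:.2f} ml")
```

The first value reproduces the 1 mg/ml stated in the text; the EDTA volume is only an estimate under the stated volume assumption.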

Extraction, purification and precipitation of mtDNA

The washed mitochondrial pellet was lysed by suspension in 2 ml lysis buffer containing 10 mM Tris–HCl (pH 8.0), 10 mM NaCl and 1 mM EDTA (pH 8.0), followed by the addition of 10% SDS to a final concentration of 1% (v/v) and incubation at 37°C for 30 min. DNA was purified according to the standard method [60] with slight modifications. An equal volume of phenol:chloroform (25:24 v/v) was added to the sample, which was then centrifuged at 20,800 g for 15 min at room temperature. The aqueous phase was transferred to an Eppendorf tube, and two additional cycles of phenol:chloroform extraction and two cycles of chloroform extraction were performed. The purified DNA was precipitated by adding 0.1 volume of 3 M sodium acetate (pH 5.5) and 2 volumes of cold (4°C) absolute ethanol (99.9%). The mixture was vortexed briefly and incubated at −20°C overnight. The precipitated DNA was recovered by centrifugation at 4°C at 20,375 g for 15 min. The ethanol was removed by decantation, 300 μl ice-cold 70% (v/v) ethanol was added and the sample was centrifuged again at 4°C at 20,375 g for 5 min. The ethanol was removed and the pellet was air dried and resuspended in sterile Tris/EDTA buffer (pH 8.0) followed by an equal volume of R40 (40 μg/ml RNAse A (Roche, Mannheim, Germany) in Tris/EDTA buffer). The mtDNA was checked for quality by gel electrophoresis on a 1.5% agarose gel in 1× TAE buffer (Additional file 6: Figure 1). DNA from two batches of mtDNA isolation was pooled in order to obtain a sufficient amount for sequencing.

Monitoring mitochondrial purity and intactness and protein concentration

During the mitochondrial preparation, we kept 1 ml samples from each preparation step (homogenate to supernatant) to be used for the determination of enzyme activity and protein concentration. The activity and latency of cytochrome c oxidase (CCO), an inner membrane enzyme, were measured at 550 nm in an assay medium containing 0.3 M sucrose, 50 mM Tris, 100 mM KCl and 45 μM reduced cytochrome c, pH adjusted to 7.2 using 1 M acetic acid, plus or minus 0.05% (w/v) Triton X-100. The activity and latency of NAD+-dependent malate dehydrogenase (MDH), a matrix enzyme, were measured at 340 nm in the cuvette using 1 ml assay medium containing 0.3 M sucrose, 20 mM MOPS-KOH, pH 7.0, 20 μl of 100 mM oxaloacetate, pH 7.0, 5 μl of 200 mM salicylhydroxamic acid, 2 μl of 0.2 mM antimycin and 4 μl of 50 mM NADH, plus or minus 0.05% (w/v) Triton X-100. In both cases the enzyme latency was calculated as

$$\text{Percentage intact} = \frac{\text{rate}_{+\text{Triton}} - \text{rate}_{-\text{Triton}}}{\text{rate}_{+\text{Triton}}} \times 100\%$$

[14]. The latency of CCO activity is a measure of the integrity of the mitochondrial outer membrane, as the substrate, reduced cytochrome c, cannot penetrate an intact outer membrane to reach the active site on the outer surface of the inner membrane. The latency of MDH activity is a measure of the integrity of the mitochondrial inner membrane, as NADH cannot cross an intact inner membrane to reach the enzyme present in the mitochondrial matrix [14, 15].
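
As an illustration of the latency calculation, the following minimal sketch computes the percentage of intact mitochondria from the enzyme rate measured with Triton X-100 (membranes disrupted, full activity) and without it (only the activity accessible in the intact preparation). The function name and the example rates are hypothetical and are not measurements from this study.

```python
def percent_intact(rate_plus_triton, rate_minus_triton):
    """Latency as a percentage: the share of total activity that is
    only detectable after the membranes are disrupted by Triton X-100."""
    if rate_plus_triton <= 0:
        raise ValueError("rate with Triton must be positive")
    return 100.0 * (rate_plus_triton - rate_minus_triton) / rate_plus_triton


# Hypothetical rates in arbitrary units (for illustration only).
example = percent_intact(rate_plus_triton=200.0, rate_minus_triton=20.0)
print(f"{example:.1f}% intact")
```

The same function applies to both the CCO and the MDH assay, since the two assays differ only in which membrane the latency reports on.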

The protein concentration in the various fractions was measured at 562 nm using Bicinchoninic acid (BCA) protein assay kit (Sigma) containing BCA working reagent, BSA protein standard and 5% deoxycholic acid as recommended by the manufacturer.

Library preparation and next-generation sequencing

A sequencing library for F1-30 was prepared according to the Rapid Library Preparation Method Manual October 2009 (Rev. Jan 2010) using 500 ng mtDNA. Sequencing was performed on a Roche 454 GS-FLX Titanium instrument (software version 2.3) following the manufacturer’s recommendations.

De novo assembly of the mitochondrial genome

Adaptor removal, quality filtering (quality score 99.8%) and removal of chloroplast reads were performed using the CLC Genomics Workbench software (v.5.0). Chloroplast reads were identified by a reference assembly against the perennial ryegrass chloroplast genome (GenBank Acc. No.: NC_009950.1) using the parameters: similarity, 0.98; conflict resolution, vote (A, C, G, T); non-specific matches, random; and masking of references, none. Reads mapping to the chloroplast genome were subsequently removed from the dataset. The remaining sequence reads were assembled into contigs using the CLC Genomics Workbench software (v.5.0) with the parameters: Mismatch cost, 2; Insertion cost, 3; Deletion cost, 3; Length fraction, 0.5; Similarity fraction, 0.99. Gap closure and manual inspection and editing were performed using the SeqMan software (v.5.0.3). Mitochondrial genome contigs were identified by BLASTn (E-value 1×10⁻¹⁰). The PipMaker software [20] was used to identify repetitive regions within and among the mitochondrial contigs in order to facilitate primer design. DNASIS Max (v.2.9) was used to blast in-house against the perennial ryegrass genome scaffolds in order to validate the order of the contigs in the mitochondrial genome. Primers were designed using the Primer3 software (v.0.4.0). Genomic DNA of F1-30 was used as template for PCR to amplify the gaps between contigs. The purified PCR products were sequenced by Eurofins MWG Operon (Ebersberg, Germany). The gap sequences were incorporated into the assembly using the SeqMan software.

Genome annotation and analysis

Annotation of the mitochondrial genome was done using the Maker2 pipeline [61]. In a first round of analysis we used an in-house assembly of the F1-30 transcriptome (unpublished) as initial evidence for gene prediction. We also utilized a collection of plant mitochondrial protein sequences from various organisms included in the genome annotation software package Mitofy for plant mitochondria (http://dogma.ccbb.utexas.edu/mitofy). Repeat masking was performed with a grass-specific repeat database from RepBase [62]. After an initial round of gene prediction, a training file was generated for the ab initio gene predictor SNAP, and an additional round of gene prediction was performed. The resulting GFF3 file was loaded into Apollo [63] for visualization and manual curation of the genes after taking all the available evidence into account. Structural RNA genes were identified using tRNAscan-SE 1.21 (for tRNAs) and RNAmmer 1.2 for rRNAs [64, 65]. Search for ORFs was performed using CLC Genomics Workbench (v.5.0).

Sequence repeats were investigated using PipMaker with default parameter settings [20]. SIR were detected using the inverted repeat finder software [66] (match 2; mismatch 3; delta 5; match probability 80; indel probability 10; minimum alignment score 40; maximum length to report 100,000; maximum loop 100,000; maximum loop separation for tuple of length 4). Tandem repeats were detected using the Tandem Repeat Finder (v.4.04) [67] with the parameters: alignment parameters (match, mismatch, indel), 2, 7, 7; min. align. score, 50; max. period size, 2,000. SSRs were identified using the msatcommander 0.8.2-WINXP.Zip software package [68]. The parameters for SSR detection were 1- to 2-nucleotide (nt) repeats of at least 10 nt length and 3- to 6-nt repeats with at least three repeat units (Additional file 4: Table S4).
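
To make the SSR thresholds concrete, the sketch below applies the same criteria (1- to 2-nt motifs spanning at least 10 nt, 3- to 6-nt motifs with at least three repeat units) using regular expressions. This is not the msatcommander algorithm; it is a simplified stand-in, and the function name and the test sequence are hypothetical.

```python
import re


def find_ssrs(seq):
    """Return (start, motif, repeat_units) for perfect SSRs in seq.

    Thresholds follow the text: motifs of 1-2 nt must span >= 10 nt,
    motifs of 3-6 nt must occur in >= 3 consecutive units.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, 7):
        # Ceiling of 10 / unit_len for short motifs, otherwise 3 units.
        min_units = -(-10 // unit_len) if unit_len <= 2 else 3
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter motif (e.g. "AA"),
            # since those runs are reported under the shorter motif.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence: a 12-nt A run and four ATG units.
demo = "GGC" + "A" * 12 + "CT" + "ATG" * 4 + "C"
ssrs = find_ssrs(demo)
print(ssrs)
```

Only perfect repeats are reported here; dedicated SSR tools typically offer more options, so results from this sketch should not be expected to match Table S4.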

TEs were identified using CENSOR with default parameter settings, using Poaceae and Triticeae as reference [62].

RNA preparation and sequencing

For transcriptome analysis, pollen and stigma tissue samples were collected from the F1-30 genotype grown under standard growing conditions in a greenhouse. A sealed paper bag was put over the inflorescences for 8 hours at anthesis to collect the pollen. The pollen was harvested after 8 hours, frozen in liquid nitrogen and stored at –80°C. Unpollinated stigmas were isolated from flowers just before anthesis, frozen in liquid nitrogen and stored at –80°C. Total RNA was extracted from each sample using the RNeasy™ Plant Mini Kit following the manufacturer’s instructions (Qiagen, Valencia, CA, USA), and the RNA integrity was measured with an RNA 6000 Nano Labchip™ on the Agilent 2100 Bioanalyzer™ (Agilent Technologies, Santa Clara, CA, USA). Samples were sequenced on an Illumina HiSeq2000 system.

Read quality and trimming of sequences

Using the program FastQC (Babraham Institute, Cambridge, UK), we visualized the read quality and length of the Illumina raw reads. The output showed that Illumina adaptors were present at the 3′ end of the reads and that the paired-end reads were overlapping. We therefore removed the Illumina adaptors from the 3′ end of the reads using the program Homer-Tools [69] and then merged reads with an overlap of 10 bp using the program fastq-join.pl [http://code.google.com/p/ea-utils].
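
The idea behind merging overlapping paired-end reads can be sketched as follows. This is a simplified illustration and not the fastq-join implementation: it requires an exact match in the overlap, ignores base qualities, and treats the 10 bp from the text as a minimum overlap, which is an assumption. All names and the example fragment are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a forward read with its reverse mate if the 3' end of read1
    exactly matches the start of the reverse-complemented mate over at
    least min_overlap bases; return None if no such overlap exists."""
    mate = reverse_complement(read2)
    # Try the longest possible overlap first.
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Hypothetical 34-nt fragment sequenced from both ends with a 12-nt overlap.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACGGCTA"
read1 = fragment[:22]
read2 = reverse_complement(fragment[10:])
merged = merge_pair(read1, read2)
print(merged == fragment)
```

Production tools tolerate a limited number of mismatches in the overlap and combine quality scores, which this sketch deliberately omits.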

Transcriptome analyses

Illumina 101 bp reads from reproductive tissue samples were used for gene expression analysis of the 39 protein-coding genes. Reads were mapped onto the sequences of the 39 genes using Bowtie [70], allowing a maximum of 2 mismatches in the first 25 bp. The program RSEM [71] was used to calculate RNA-seq read abundance from the SAM alignments. The expression of each gene was calculated by dividing the abundance estimates from RSEM by the length of the gene (kbp).
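
The length normalization described above amounts to dividing each abundance estimate by the gene length in kbp. The sketch below illustrates this, together with the low (<1,000) and high (>5,000) reads per kbp thresholds used in the Discussion. The function names, the "intermediate" label and the example values are hypothetical and are not taken from Table 7.

```python
def reads_per_kbp(abundance, gene_length_bp):
    """Normalize an abundance estimate by gene length expressed in kbp."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return abundance * 1000.0 / gene_length_bp


def expression_class(value, low=1000.0, high=5000.0):
    """Classify a normalized value using the thresholds quoted in the text."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "intermediate"


# Hypothetical genes: (abundance estimate, gene length in bp).
genes = {"geneA": (9000.0, 1500), "geneB": (300.0, 600)}
normalized = {name: reads_per_kbp(a, n) for name, (a, n) in genes.items()}
classes = {name: expression_class(v) for name, v in normalized.items()}
print(normalized, classes)
```

Normalizing by length keeps long genes from appearing more highly expressed simply because they collect more reads.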