18S and ITS2 rDNA sequence-structure phylogeny of Prototheca (Chlorophyta, Trebouxiophyceae)

Protothecosis is an infectious disease caused by organisms currently classified within the green algal genus Prototheca. The disease can manifest as cutaneous lesions, olecranon bursitis, or disseminated or systemic infections in both immunocompetent and immunosuppressed patients. For diagnostics, a valid taxonomy is important. Prototheca, closely related to the Chlorella species complex, is known to be polyphyletic, branching with Auxenochlorella and Helicosporidium. The phylogeny of Prototheca has been discussed and revisited several times in the last decade, and new species have been described. Phylogenetic analyses were performed using ribosomal DNA (rDNA) and partial mitochondrial cytochrome b (cytb) sequence data. In this work we use Internal Transcribed Spacer 2 (ITS2) as well as 18S rDNA data. For the first time, however, we reconstruct phylogenetic relationships of Prototheca using primary sequence and RNA secondary structure information simultaneously, a concept shown to increase robustness and accuracy of phylogenetic tree estimation. Using encoded sequence-structure data, Neighbor-Joining, Maximum-Parsimony and Maximum-Likelihood methods yielded well-supported trees that agree with other trees calculated on rDNA but differ in several aspects from trees using cytb as a phylogenetic marker. ITS2 secondary structures of Prototheca sequences are in agreement with the well-known common core structure of eukaryotes but show unusual differences in their helix lengths. An elongation of the fourth helix of some species seems to have occurred independently in the course of evolution.


Material and methods
For a material and methods workflow, see Fig. 1. ITS2 and 18S rDNA sequences of Prototheca and its affiliated species (Tables S1, S2) were obtained from the NCBI Nucleotide database (retrieved on 2021-04-26) (Benson et al. 2009). ITS2 sequences were annotated using the "annotate" option implemented in the ITS2 database, which uses Hidden Markov Models to annotate eukaryote ITS2 (Eddy, 1998; Keller et al. 2009; Schultz et al. 2006; Ankenbrand et al. 2015).
In ClustalX (Larkin et al. 2007), ITS2 as well as 18S rDNA sequences were aligned. Introns were removed from the 18S rDNA alignment with the help of the sequence editor Align (Hepperle et al. 2004).
Secondary structures of selected ITS2 sequences (Tables S1, S2) were predicted with RNAstructure (Reuter et al. 2010), based on minimum free energy and constrained folding (using lower case letters). These structures were then used as templates for homology modeling (Selig et al. 2008) of the remaining secondary structures. Homology modeling was performed with the "model" option as implemented in the ITS2 database. Secondary structures of 18S rDNA sequences were also predicted via homology modeling using the ITS2 database. The template structure (Jaagichlorella luteoviridis (Chodat) Darienko & Pröschold, 2019) was obtained from the Comparative RNA Web Site (Cannone et al. 2002; Figure S1).
ITS2 and 18S rDNA sequence-structure datasets were each aligned using 4SALE. 4SALE uses a 12-letter alphabet consisting of the four nucleotides and their structural states (unpaired, paired left, paired right) to encode sequence and structure information simultaneously. 4SALE was also used to visualize a consensus structure for Prototheca ITS2 sequences.
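The 12-letter encoding described above can be illustrated with a minimal sketch. It assumes that the structure is given in dot-bracket notation and that each nucleotide/state combination is mapped to one arbitrary symbol; the symbols `A` to `L` and the function name `encode` are hypothetical and are not the codes actually used by 4SALE.

```python
# Minimal sketch of a one-letter sequence-structure encoding.
# Assumption: the 12 symbols below are arbitrary placeholders,
# not the symbols used internally by 4SALE.
NUCLEOTIDES = "ACGU"
STATES = {".": 0, "(": 1, ")": 2}  # unpaired, paired left, paired right
LETTERS = "ABCDEFGHIJKL"           # 4 nucleotides x 3 states = 12 symbols


def encode(sequence: str, structure: str) -> str:
    """Encode an aligned sequence and its dot-bracket structure as one string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    encoded = []
    for nucleotide, state in zip(sequence.upper().replace("T", "U"), structure):
        if nucleotide == "-":          # alignment gaps are passed through
            encoded.append("-")
            continue
        if nucleotide not in NUCLEOTIDES or state not in STATES:
            raise ValueError(f"unsupported symbol: {nucleotide!r}/{state!r}")
        index = NUCLEOTIDES.index(nucleotide) * 3 + STATES[state]
        encoded.append(LETTERS[index])
    return "".join(encoded)


print(encode("GCAU", "(..)"))
```

With such an encoding, a paired G and an unpaired G are treated as different characters, so that standard tree reconstruction programs can use structural information without modification.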
Sequence-structure alignments were exported from 4SALE for further analysis. Specifically, a sequence-structure Neighbor-Joining (NJ) (Saitou and Nei, 1987) tree was calculated based on both ITS2 and 18S rDNA sequence-structure alignments using ProfDistS (Friedrich et al. 2005; Wolf et al. 2008). For ITS2 sequence-structure data, Q_ITS2, a sequence-structure specific General Time Reversible correction model (cf. Lanave et al. 1984) as implemented in ProfDistS, was used, while for 18S rDNA sequence-structure data a sequence-structure specific JC model (Jukes and Cantor, 1969) was used as distance estimation method.
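To illustrate what a Jukes-Cantor-type correction looks like on a 12-state alphabet, the following sketch uses the standard generalization of the JC formula to k equally frequent states, d = -((k-1)/k) ln(1 - k p/(k-1)), where p is the proportion of differing sites. Whether ProfDistS implements its sequence-structure specific JC model in exactly this form is an assumption; the function names are hypothetical.

```python
import math


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned, encoded sequences.

    Columns in which either sequence has a gap are ignored.
    """
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in columns) / len(columns)


def jc_distance(p: float, k: int = 12) -> float:
    """Jukes-Cantor-type distance for an alphabet of k states.

    k = 4 gives the classical nucleotide formula; k = 12 corresponds to a
    one-letter encoded sequence-structure alphabet (assumption: all states
    are equally frequent and all substitutions equally likely).
    """
    limit = (k - 1) / k
    if p >= limit:          # distance is undefined at saturation
        return math.inf
    return -limit * math.log(1.0 - p / limit)


p = p_distance("HDAL", "HDAA")
print(p, jc_distance(p, k=4), jc_distance(p, k=12))
```

The corrected distance is always at least as large as the observed proportion of differences, because multiple substitutions at the same site are taken into account.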
From each dataset, a subset with fewer taxa was manually chosen (Tables S1, S2). For each subset, a sequence-structure NJ tree was calculated using ProfDistS. Maximum-Parsimony (MP) and Maximum-Likelihood (ML) trees based on sequence-structure data were calculated with PAUP (Swofford, 2002) (using one-letter encoded sequence-structure data) and R (R Core Team, 2018), respectively. The R script is available at http://4sale.bioapps.biozentrum.uni-wuerzburg.de/mlseqstr.html. Additionally, ITS2 and 18S rDNA sequences were aligned in ClustalX and sequence-only NJ trees were calculated in ProfDistS. For all methods, due to the complexity of the sequence-structure approach, bootstrap support (Felsenstein, 1985) was estimated based on 100 pseudo-replicates.
Fig. 1 Flowchart of all methods applied in this work. Sequences which could not be properly annotated or aligned were discarded, as well as Prototheca strains classified as "sp.". For alignment editing, Align (Hepperle et al. 2004) was used (not shown). After reconstruction of Neighbor-Joining (NJ) overview trees using ProfDistS, subsets were manually chosen for Maximum-Likelihood (ML), Maximum-Parsimony (MP) and Neighbor-Joining (NJ) analysis. Further figures available with this manuscript are indicated.
A third dataset was created consisting of combined ITS2 and 18S rDNA sequence-structure alignments. For this dataset, sequence-structure NJ, MP and ML trees were calculated with the programs and methods described above (for comparison, each marker was additionally analyzed separately for this dataset, cf. Figures S4 and S5). All trees were rooted with Chlorella vulgaris Beyerinck, 1890 and Parachlorella kessleri L.Krienitz, E.H.Hegewald, D.Hepperle, V.A.R.Huss, T.Rohr & M.Wolf, 2004. All alignments are available on request.
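The bootstrap procedure mentioned above (Felsenstein, 1985) resamples alignment columns with replacement. The following is a minimal sketch of how 100 pseudo-replicates could be generated from a one-letter encoded alignment. It is not the implementation used by ProfDistS, PAUP or the R script; the function name, the dictionary layout and the fixed random seed are assumptions made for illustration.

```python
import random


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Yield pseudo-replicate alignments by resampling columns with replacement.

    `alignment` maps taxon names to aligned (encoded) sequences of equal length.
    Each replicate has the same taxa and the same number of columns.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same length")
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


toy = {"taxon_1": "HDAL", "taxon_2": "HDAA", "taxon_3": "HDBL"}  # hypothetical data
replicates = list(bootstrap_replicates(toy))
print(len(replicates), replicates[0])
```

A tree is then reconstructed from every replicate, and the bootstrap value of a branch is the percentage of replicate trees in which that branch is recovered.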

Taxon sampling
From NCBI, 192 ITS2 sequences of Prototheca and affiliated species could be obtained, as well as 165 18S rDNA sequences (Tables S1, S2). For ITS2 sequences, re-annotation was performed using the "annotate" tool in the ITS2 database with "Viridiplantae" as the model, inclusion of the proximal stem (last 25 nucleotides of 5.8S and first 25 nucleotides of 28S rDNA) and an E-Value of < 0.01 or < 0.1.
For ITS2 and 18S rDNA, sequence alignments were created using ClustalX. Sequences that could not be annotated or aligned were discarded. The ITS2 sequence of Prototheca wickerhamii was significantly longer than all other ITS2 sequences and could therefore not be properly aligned. The final alignments consist of 118 ITS2 sequences and 73 18S rDNA sequences (cf. Figure 1).
For ITS2 sequences, six secondary structure templates were created using RNAstructure (P. blaschkeae, P. cutis, P. stagnorum W.B.Cooke, 1968, P. ulmea R.S.Pore, 1986, P. xanthoriae Jagielski, 2019, P. zopfii). With these templates, structures of all other sequences could be predicted using the "model" tool of the ITS2 database with at least 70 percent transfer of the structure for most and 60 percent of the structure for three sequences (P. tumulicola Nagatsuka, Kiyuna, Kigawa & J.Sugiyama, 2016). Structures of P. wickerhamii could not be predicted with the templates described and showed a significantly longer and bifurcated fourth helix when modeled with RNAstructure. This taxon is therefore missing in further analyses. Phylogenetic trees were calculated on sequence-structure alignments generated in 4SALE, consisting of 112 taxa and a subset with 30 taxa.
For 18S rDNA sequences, a structure template was obtained from CRW (Jaagichlorella luteoviridis, X73998, Figure S1). Structures of all 73 18S rDNA sequences could be predicted with at least 70 percent transfer of the template structure, except for Helicosporidium sp. (67.82%). Prototheca sequences classified as "sp." were discarded. For 18S rDNA data, phylogenetic trees were calculated with 71 taxa and a subset of 26 taxa. From ITS2 and 18S rDNA subsets, a combined sequence-structure alignment of 15 strains / sequences was created.
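Building the combined alignment amounts to concatenating, for every strain present in both subsets, its aligned ITS2 and 18S rDNA sequence-structure strings. A minimal sketch under the assumption that both alignments are keyed by a common strain identifier (the identifiers and the function name are hypothetical):

```python
def concatenate_alignments(first, second):
    """Concatenate two alignments for the strains that occur in both.

    Both arguments map strain identifiers to aligned (encoded) sequences.
    Strains missing from either alignment are dropped.
    """
    for alignment in (first, second):
        if len({len(seq) for seq in alignment.values()}) > 1:
            raise ValueError("sequences within an alignment must have equal length")
    shared = sorted(set(first) & set(second))
    return {strain: first[strain] + second[strain] for strain in shared}


its2 = {"strain_1": "HDAL", "strain_2": "HDAA", "strain_3": "HDBL"}  # hypothetical
ssu = {"strain_1": "AAC", "strain_3": "AAD", "strain_4": "ABD"}      # hypothetical
print(concatenate_alignments(its2, ssu))
```

Only strains represented by both markers remain, which explains why the combined dataset is smaller than either subset.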

Phylogeny of Prototheca based on ITS2 sequence-structure data
A Neighbor-Joining tree was calculated based on 112 ITS2 sequence-structure pairs (Fig. 2). From the clades shown in this overview tree, 30 taxa were manually selected for NJ, MP and ML analysis (Fig. 3). Towards the root of this tree, a highly supported supergroup consisting of P. miyajii, P. cutis and P. paracutis groups with Jaagichlorella luteoviridis and Auxenochlorella protothecoides, showing the polyphyly of the genus Prototheca. Strains of P. xanthoriae form a sister clade to all other Prototheca strains in the ML tree, although their position differs in the trees based on NJ and MP algorithms. P. moriformis W.Krüger, 1894 is very highly supported as a sister group to the remaining taxa, which are then further divided into a P. tumulicola / P. stagnorum clade and a second clade, a supergroup consisting of P. zopfii / P. bovis, P. ciferrii Negroni & Blaisten, 1941, P. pringsheimii Jagielski, 2019, P. cerasi Jagielski, 2019, P. cookei Jagielski, 2019, and P. blaschkeae. In this supergroup, P. ciferrii appears to be polyphyletic, with P. pringsheimii sequences branching within the P. ciferrii clade. P. ciferrii / P. pringsheimii and their sister group P. cerasi appear to be a sister group to the P. zopfii / P. bovis clade. All of these strains form a sister group to P. cookei. P. blaschkeae appears as sister to just the P. zopfii / P. bovis clade in the overview NJ tree, but in trees calculated on the subset data the sister group also includes P. ciferrii, P. pringsheimii, P. cerasi, P. cookei and P. moriformis (only in the MP tree).
In general, the trees calculated on ITS2 sequence-structure data show a topology similar to that of the trees calculated by Masuda et al. (2016) and Hirose et al. (2018), with the additional taxa proposed by Jagielski et al. (2019) and Kunthiphun et al. (2019). Auxenochlorella protothecoides and Jaagichlorella luteoviridis are sister groups in our tree based on ITS2 sequence-structure data with a bootstrap support of 100, and both cluster with P. cutis / P. paracutis / P. miyajii with a bootstrap support of 81. Auxenochlorella protothecoides branches with Prototheca wickerhamii (with a bootstrap support of 58) in the work of Hirose et al. (2018), and is sister group to all Prototheca sequences except P. wickerhamii in the tree proposed by Masuda et al. (2016) with a bootstrap support of 87. P. ulmea is poorly supported as a sister group to P. zopfii, P. moriformis and P. blaschkeae sequences in the same work, whereas we show P. tumulicola / P. stagnorum as sister group to these species with a bootstrap support of 76.
The phylogenetic position of P. xanthoriae remains unresolved here as its position differs in all constructed trees always with low bootstrap support. Several other relationships (e.g. the close relationship of P. miyajii + P. cutis /P. paracutis, P. tumulicola + P. stagnorum or P. moriformis + the supergroup consisting of P. zopfii /P. bovis, P. ciferrii, P. pringsheimii, P. cerasi, P. cookei, P. blaschkeae, P. tumulicola, and P. stagnorum) are very highly supported by bootstrap values > 90 in all (NJ, MP, ML) calculated trees. A single P. moriformis sequence (MK445153) was positioned within the P. zopfii /P. bovis clade in the ITS2 sequence-structure overview tree (Fig. 2). This strain (SAG 263-2) appears in the P. moriformis clade (cluster IX) in the phylogram based on the partial cytb sequences by Jagielski et al. (2019).
Comparing the sequence-structure tree to a tree based on sequence data only (Figure S2), it is apparent that the sequence-only tree is similar to, and sometimes less well supported than, the tree based on the sequence-structure alignment and shows several differences in the topology, e.g. the positions of the P. tumulicola / P. stagnorum clade or the P. moriformis clade. Auxenochlorella protothecoides and Jaagichlorella luteoviridis branch inside the P. cutis / P. paracutis / P. miyajii clade in the sequence-only tree. This latter clade without A. protothecoides and J. luteoviridis is highly supported in the sequence-structure tree with a bootstrap value of 93.
Fig. 2 [...] An alignment of 112 sequence-structure pairs (.xfasta format) of Prototheca and affiliated species was created using 4SALE and encoded by a 12-letter alphabet (Wolf et al. 2014) for reconstruction of this tree. GenBank accession numbers accompany each taxon name. Clades are alternately marked green and blue and are additionally named alongside the tree in accordance with the clade names proposed in the phylogram by Jagielski et al. (2019). Taxa which were manually chosen for the subset are marked bold. The tree is rooted with Chlorella vulgaris FM205854.

Phylogeny of Prototheca based on 18S rDNA sequence-structure data
Using the Neighbor-Joining algorithm, an overview tree based on 71 18S rDNA sequence-structure pairs was created (Fig. 4). Here, as in several other trees based on rDNA data (e.g. Masuda et al. 2016; Hirose et al. 2018; Shave et al. 2021), P. wickerhamii appears to be polyphyletic with two strains (X56099, X74003) branching outside of the P. wickerhamii clade. Jagielski et al. (2019) [...] 26 sequence-structure pairs. Trees calculated on this subset data (Fig. 5) show P. xanthoriae as the sister group to all strains except the outgroup. Auxenochlorella protothecoides and Jaagichlorella luteoviridis are sister group to all remaining taxa, which are then further divided into a P. miyajii / P. cutis clade and a second clade, in which Helicosporidium is sister to P. wickerhamii and another supergroup of several Prototheca species. This supergroup forms two clades, the first being P. ulmea / P. moriformis and their sister group P. tumulicola / P. stagnorum, the second divided into a P. blaschkeae clade and a second clade consisting of P. ciferrii, P. moriformis and P. zopfii / P. bovis. Bootstrap support of this tree is generally high, as all but one of the external nodes are supported by a bootstrap value > 65. The trees calculated on 18S rDNA sequence-structure data show similar topology to the trees based on LSU rDNA data proposed in literature (e.g. Masuda et al. 2016; Hirose et al. 2018), but differ from the phylogram based on partial cytb sequences by Jagielski et al. (2019) in several aspects. P. stagnorum, P. tumulicola and P. moriformis appear towards the root of the tree in the cytb sequence based phylogram, while our tree shows all three species distant to the root and forming the sister group to P. zopfii / P. bovis, P. moriformis, P. ciferrii and P. blaschkeae. In the cytb sequence based phylogram, a multifurcation occurs including P. wickerhamii, the sister groups P. miyajii and P. cutis as well as a clade consisting of P. xanthoriae, Helicosporidium sp. and Auxenochlorella protothecoides. P. wickerhamii is shown to be the sister group of P. zopfii / P. bovis, P. ciferrii, P. blaschkeae, P. tumulicola, P. stagnorum and P. moriformis in our tree, although with low bootstrap support. Helicosporidium sp. is sister group to all of these species (with moderate bootstrap support), and P. miyajii / P. cutis are sister to these species including Helicosporidium sp. with bootstrap values > 70 for all methods applied (NJ, MP, ML).
Fig. 3 [...] analyses. For NJ tree reconstruction the global multiple sequence-structure alignment (.xfasta format) as derived by 4SALE was automatically encoded by a 12-letter alphabet (Wolf et al. 2014). For ML and MP tree reconstruction the "one letter encoded" fasta format (12-letter alphabet) as derived by 4SALE was used. GenBank accession numbers accompany each taxon name. Clades are alternately marked green and blue and are additionally named alongside the tree in accordance with the clade names proposed in the phylogram by Jagielski et al. (2019). The tree is rooted with Chlorella vulgaris FM205854 and Parachlorella kessleri FM205885.
Fig. 4 [...] An alignment of 71 sequence-structure pairs (.xfasta format) of Prototheca and its affiliated species was created using 4SALE and encoded by a 12-letter alphabet (Wolf et al. 2014) for reconstruction of this tree. GenBank accession numbers accompany each taxon name. Clades are alternately marked green and blue and are additionally named alongside the tree in accordance with the clade names proposed in the phylogram by Jagielski et al. (2019). Taxa which were manually chosen for the subset are marked bold. The tree is rooted with Chlorella vulgaris FM205854 and Parachlorella kessleri FM205885.
Comparing the sequence-structure tree to a tree based on sequence data only (Figure S3), it is apparent that bootstrap support in the sequence-structure tree is mostly higher, although sometimes similar. The topology differs slightly, e.g. in the P. zopfii / P. bovis + P. ciferrii + P. moriformis clade, the P. tumulicola + P. stagnorum clade or within the P. wickerhamii clade.
Fig. 5 18S rDNA sequence-structure Maximum-Likelihood tree calculated with R (R Core Team, 2018) including a representative subset of 26 sequence-structure pairs from Prototheca and its affiliated species which were manually selected from Fig. 4. Bootstrap values from 100 pseudo-replicates mapped at the internodes are from Maximum-Likelihood (ML), Maximum-Parsimony (MP, obtained from PAUP (Swofford, 2002)) and Neighbor-Joining (NJ, obtained from ProfDistS) analyses. For NJ tree reconstruction the global multiple sequence-structure alignment (.xfasta format) as derived by 4SALE was automatically encoded by a 12-letter alphabet (Wolf et al. 2014). For ML and MP tree reconstruction the "one letter encoded" fasta format (12-letter alphabet) as derived by 4SALE was used. GenBank accession numbers accompany each taxon name. Clades are alternately marked green and blue and are additionally named alongside the tree in accordance with the clade names proposed in the phylogram by Jagielski et al. (2019). The tree is rooted with Chlorella vulgaris FM205854 and Parachlorella kessleri FM205885.

Phylogeny of Prototheca based on combined ITS2 and 18S rDNA sequence-structure data
A combined ITS2 and 18S rDNA sequence-structure alignment was created from strains which appeared in both the ITS2 and 18S rDNA subset. NJ, MP and ML trees were calculated on this 15 taxa sequence-structure alignment (Fig. 6). This tree is highly supported by bootstrap values ≥ 70 at all nodes throughout the whole tree, with most of them being > 95. P. cutis and P. miyajii form a clade outside all other Prototheca clades, which are then further divided into a P. moriformis clade and the sister group consisting of P. stagnorum, P. tumulicola, P. blaschkeae, P. ciferrii and P. zopfii / P. bovis. In this supergroup, P. stagnorum and P. tumulicola group together as sister to the remaining taxa, in which P. blaschkeae is sister group to the P. ciferrii and P. zopfii / P. bovis clades.
Comparing the ITS2 and 18S rDNA subset trees to the tree based on the combined alignment, the trees show similar topology despite several species missing in the combined alignment. In all three trees, P. zopfii / P. bovis (and one P. moriformis strain in the 18S rDNA tree) is the sister group to P. ciferrii, forming a supergroup which then is sister group to P. blaschkeae.
While P. moriformis and P. tumulicola / P. stagnorum are sister groups in the 18S rDNA tree, P. moriformis is sister group to several more species in the ITS2 and the combined tree. P. cutis / P. miyajii form a sister group to all other Prototheca strains in the combined and the 18S rDNA tree (except P. xanthoriae, which sisters with all species except the outgroup in the 18S rDNA tree) but are more closely related to Auxenochlorella protothecoides and Jaagichlorella luteoviridis in the tree based on ITS2 sequence-structure data. Accordingly, these nodes are the nodes showing a relatively low bootstrap support in the very highly supported tree based on the combined 18S rDNA and ITS2 alignment.
Calculating trees on ITS2 and 18S rDNA sequence-structure data of the 15 chosen taxa for the combined alignment separately, it is apparent that the combined ML tree shows more similarity to the 18S rDNA tree (Figure S4) than the ITS2 tree (Figure S5). Both separate trees show the supergroup consisting of P. moriformis, P. tumulicola, P. stagnorum, P. blaschkeae, P. ciferrii and P. zopfii / P. bovis with a bootstrap value of 100. The relationships between the Prototheca strains in this supergroup vary; however, the combined tree and the 18S rDNA tree show P. blaschkeae as a sister group to P. ciferrii and P. zopfii / P. bovis, while in the ITS2 tree P. blaschkeae is related to P. zopfii / P. bovis, but with low bootstrap support. The ITS2 tree shows P. miyajii / P. cutis as sister group to Auxenochlorella protothecoides and Jaagichlorella luteoviridis, whereas this relationship does not appear in the 18S rDNA or combined ML tree. The bootstrap support of the 18S rDNA tree is overall high, with all but one bootstrap value > 60. In the ITS2 tree, two bootstrap values are lower than 50, with the accompanying nodes being the ones where the ITS2 tree does not show the same topology as either the 18S rDNA or combined tree.
Fig. 6 [...] (Swofford, 2002)) and Neighbor-Joining (NJ, obtained from ProfDistS) analyses. For NJ tree reconstruction the global multiple sequence-structure alignment (.xfasta format) as derived by 4SALE was automatically encoded by a 12-letter alphabet (Wolf et al. 2014). For ML and MP tree reconstruction the "one letter encoded" fasta format (12-letter alphabet) as derived by 4SALE was used. Strain numbers accompany each taxon name. The tree is rooted with Chlorella vulgaris CCAP 211/81 and Parachlorella kessleri CCAP 211/11G.
Finally, when comparing ITS2 and 18S rDNA trees, we must not forget that P. wickerhamii cannot be included in the comparison. In order to deduce the phylogeny of the entire genus Prototheca, i.e., including P. wickerhamii, one always needs at least one additional marker gene besides ITS2.

Evolution of protothecean ITS2 secondary structures
ITS2 secondary structures of six Prototheca sequences were constructed using RNAstructure (Fig. 7). In general, these structures folded into the common core structure known for eukaryotes with four helices. Protothecean ITS sequences are known to vary in length (Marques et al. 2015). Prototheca sequences in this work were between 269 (all three P. tumulicola strains) and 543 / 544 bp (P. moriformis MF163495 / P. ulmea MF163497) long. ITS2 sequences of P. wickerhamii were significantly longer (1171-1358 bp). ITS2 structures from P. blaschkeae and P. cutis showed an exceptionally large fourth helix, while helix IV of P. stagnorum and P. zopfii was rather short. The third helix of P. ulmea appears to be bifurcated. Figure 8 visualizes the sequence-structure alignment of all Prototheca strains in the subset by a 51% consensus structure. A few base pairs in helix II, between helix II and III and at the end of helix III are shown to be 80% conserved; these are the regions where known ITS2 structure motifs (the U-U mismatch in helix II, the triple A between helix II and III and the UGGU motif in helix III) are generally located.
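The idea of a percentage consensus structure can be sketched as follows: for aligned structures in dot-bracket notation, a base pair between two alignment columns is kept if it occurs in at least the given fraction of sequences. This is only an illustration under that assumption; the exact consensus definition used by 4SALE may differ, and the function names are hypothetical.

```python
from collections import Counter


def base_pairs(structure: str) -> set:
    """Return the set of (i, j) column pairs of an aligned dot-bracket string."""
    stack, pairs = [], set()
    for column, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(column)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), column))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def consensus_pairs(structures, threshold=0.51) -> dict:
    """Map each base pair found in at least `threshold` of the structures
    to the fraction of structures that contain it."""
    counts = Counter()
    for structure in structures:
        counts.update(base_pairs(structure))
    total = len(structures)
    return {pair: n / total for pair, n in counts.items() if n / total >= threshold}


aligned = ["((..))", "((..))", "(-..-)", ".(..)."]  # hypothetical toy alignment
print(consensus_pairs(aligned))        # pairs present in at least 51%
print(consensus_pairs(aligned, 0.80))  # pairs conserved in at least 80%
```

In this toy example both pairs occur in three of four structures, so they belong to the 51% consensus but would not be highlighted as 80% conserved.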
Despite the differences in length in the Prototheca ITS2 sequences, homology modeling of the secondary structures was possible with just three templates (P. zopfii, P. cutis, P. stagnorum) at 50% consensus level for all Prototheca sequences except P. blaschkeae and P. wickerhamii. The P. zopfii template could be used to model other P. zopfii structures and those of P. bovis, P. cerasi, P. ciferrii, P. cookei, one P. moriformis strain and P. pringsheimii. These species also form a supergroup in the ML sequence-structure tree (Fig. 3). With the P. cutis template, all strains of the P. cutis /P. paracutis + P. miyajii clade could be predicted. The P. stagnorum template could be used for prediction of the secondary structures of other P. stagnorum and the P. tumulicola sequences as well as P. ulmea, P. xanthoriae and P. moriformis sequences with a lower consensus. Therefore, three additional templates were created (P. blaschkeae, P. ulmea and P. xanthoriae).
Given their distant relationship in the phylogenetic trees based on ITS2 sequence-structure data, elongation of helix IV of the ITS2 in P. blaschkeae and P. cutis seems to have occurred independently in the course of evolution.
ITS2 is one of the most effective phylogenetic markers. Its high variability allows the study of closely related organisms, while its conserved structure reveals larger relationships. In most cases, the secondary structure helps to better align variable sequences. Sometimes, however, the length variations and differences even within a genus are already so large that alignments (whether based only on sequence or on sequence-structure information) should be viewed with caution. Prototheca is such an example. Homology is difficult to discern and individual sequences are even impossible to align at all. On the other hand, if only a few sequences are taken out (e.g. those with an extremely elongated fourth helix), the alignment quickly becomes much more compact. With this study, we reconstructed phylogenetic trees on extremely diverse Prototheca sequences whose sequence-structure information was encoded into a new alphabet, and indeed the results show robust trees similar to those based on other markers (e.g. 18S, LSU or cytb). We encourage the community to draw on additional markers and, by comparison and/or concatenation, to better understand the phylogeny of Prototheca and related taxa.
To understand ITS2 length differences, further research is needed. Given the extreme sequence differences compared to other genera, it seems possible that additional species will be discovered in the Prototheca species complex. Such species will then put the sequence differences into perspective and/or significantly advance our understanding of length variation (e.g. by expansion, duplication, and/or alternative splicing), or more generally, our understanding of RNA sequence-structure evolution.

Conclusion
In this work, using sequence and structure information simultaneously for two phylogenetic markers (ITS2 and 18S rDNA), we reconstructed generally well-supported phylogenetic trees that are in overall agreement with the trees based on rDNA sequences (mainly LSU data) proposed in literature but show several topological differences to trees calculated on cytb sequences. Prototheca wickerhamii, the main causative agent of human protothecosis, could not be included in the analysis based on ITS2 data since its ITS2 sequences were exceptionally long and could therefore not be aligned with other Prototheca sequences. The phylogenetic trees calculated on sequence-structure alignments of our subset data show Maximum-Likelihood support (> 50) for all but three branches in both the ITS2 and the 18S rDNA tree. Bootstrap support values are generally higher than those from sequence-only analyses (in this study or in the available literature using RNA and/or protein data).
The ITS2 of Prototheca is known to vary in length. Our study shows that out of the Prototheca ITS2 structures we reconstructed, P. blaschkeae and P. cutis displayed an elongated fourth helix. Helix III of P. moriformis (formerly P. ulmea) appears to be bifurcated. Despite the differences in length, a 51% consensus structure showing all but the fourth helix could be visualized with some nucleotide bonds being 80% conserved throughout all examined Prototheca structures.
Fig. 7 [...] Ankenbrand et al. 2015). Templates were created in RNAstructure (Reuter et al. 2010) based on minimum free energy and constrained folding. The stem, consisting of the last 25 nucleotides of the 5.8S and the first 25 nucleotides of the 28S rDNA, is highlighted in purple (5.8S) and blue (28S) using Varna (Darty et al. 2009).
Fig. 8 Visualization of the subset Prototheca sequence-structure alignment without gaps (outgroups, Jaagichlorella and Auxenochlorella species were excluded) by a 51% consensus structure created in 4SALE. Nucleotide bonds that are at least 80% conserved are marked in yellow. Conservation of the sequence is indicated by red (low conservation) to green (high conservation) color. Nucleotides which are 100% conserved in all sequences are written as A, U, G or C.