Using RNA-Seq for Transcriptome Profiling of Botrylloides sp. Regeneration

Meier, Michael; Wilson, Megan J.

doi:10.1007/978-1-0716-2172-1_32

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2450))

5820 Accesses
3 Citations
2 Altmetric

Abstract

The decrease in sequencing costs and technology improvements has led to the adoption of RNA-sequencing to profile transcriptomes from further non-traditional regeneration model organisms such as the colonial ascidian Botrylloides leachii. The relatively unbiased way in which transcripts are identified and quantified makes this technique suitable to detect large-scale changes in expression, and the identification of novel transcripts and isoforms. Of particular interest to many researchers is the discovery of differentially expressed transcripts across different treatment conditions or stages of regeneration. This protocol describes a workflow starting from processing raw sequencing reads, mapping reads, assembly of transcripts, and measuring their abundance, creating lists of differentially expressed genes and their biological interpretation using gene ontologies. All programs used in this protocol are open-source software tools and freely available.

You have full access to this open access chapter, Download protocol PDF

Bioinformatic Methods for the Analysis of High-Throughput RNA Sequencing in Arbuscular Mycorrhizal Fungi

RNA-Seq for Transcriptome Analysis in Non-model Plants

RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis

Key words

1 Introduction

Ascidians, also known as tunicates or sea squirts, are marine invertebrate filter-feeding animals. They belong to the Tunicata subphylum , an extant sister clade of the vertebrate clade in the chordate phylum. The close evolutionary relationship is most evident during embryo development where ascidians share vertebrate morphological features such as a notochord and a neural tube. Broadly, ascidians can be classified as solitary or colonial and whether they are pelagic or sessile. Botrylloid colonial ascidians have become an important model organism to study whole-body regeneration, chordate evolution, immunobiology, and allorecognition [1,2,3,4,5,6,7]. Fueled by recent advances in next-generation sequencing technology, RNA-sequencing has become a standard tool to characterize entire transcriptomes of cells, tissues, and whole organisms. This protocol uses publicly available RNA-seq data from a regeneration time course experiment in Botrylloides leachii [1]. Additionally, with the availability of a sequenced genome [8], a wider range of software tools (for sequenced organisms) are now also available to analyze this RNA-seq data from B. leachii.

Despite the wide use of RNA-seq, data analysis workflows are equally numerous and not yet standardized [9]. RNA-seq can provide a snapshot of all transcripts present in a cell or tissue of interest. Often though, the research questions are centered around quantifying changes in gene expression across time points or treatment conditions. To accurately compare different samples to each other important post-processing steps of RNA-seq data must be done before meaningful conclusions can be drawn. The focus of this chapter is to provide the user with a workflow to produce a table of differentially expressed genes (DEGs) from an RNA-seq experiment. The dataset used in this protocol has been previously published by our lab and consists of eight samples, each consisting of pooled RNA from multiple regenerating fragments isolated during a regeneration time-course [1]. Although no replicates were included in this earlier study, limiting differential expression analysis of lowly expressed genes, this protocol includes a “simulated” workflow for DESeq2 [10] to obtain a list of differentially expressed genes across stages of regeneration. We also provide an overview of library preparation strategies and overall experimental design.

Figure 1 presents a general workflow for species with a sequenced and annotated genome.

In principle, this protocol can also be applied to RNA-seq datasets generated from other organisms. Although a genome reference is needed when following the protocol described here, we also suggest software to quantify gene expression without such a reference.

2 Materials

One of the challenges in conducting RNA-seq analysis or any other bioinformatic task is the installation and maintenance of various software packages and programs which can represent a major hurdle for users new to this field. Commands written in shell will be indicated by the “$” prefix, commands in R will be proceeded by “>” and outputs by “#”.

1.
RNA-seq data (see Notes 1–4).
2.
Reference genome (see Notes 5 and 6).
3.
Genome annotations and gene ontology information (see Notes 6–8).
4.
A medium to high-performance computer (see Note 9).
5.
A terminal access to that computer (see Note 10).
6.
A scientific software management environment: Bioconda (see Notes 11 and 12).
7.
An RNA sequence mapper: STAR (see Note 12).
8.
Differential expression software: DESeq2 (see Note 12).
9.
Command-line access to the sequence read archive (SRA): SRA-Tools (see Note 13).
10.
Gene Ontology (GO) analysis toolkit: GOATOOLS (see Note 14).
11.
R statistical computing environment (see Note 15).

3 Methods

This protocol will be exemplified using our paired RNA-seq data of regeneration in B. leachii [1]. This dataset consists of eight samples which are listed in Table 1 and can be retrieved from the SRA archive SRP064769 (see Note 16).

Table 1 RNA-seq sample list from SRA archive SRP064769

Full size table

3.1 Quality Control and Trimming

First, before mapping can be done, quality and adapter trimming should be performed.

1.
Open a terminal on the computer.
2.
Change directory to the folder containing the RNA-seq data to be analyzed:
$ cd /Users/yourusername/projectfolder/RNAseqdata
3.
Quality control (QC) paired-end data by running the following command:
$ trim_galore --paired --fastqc embr_1.fq embr_2.fq
4.
QC single-ended data run this command instead:
$ trim_galore --fastqc embr.fq
5.
Open the newly created embr.fastqc file using a web browser.
6.
If there are persistent QC issues with the sequencing data additional filtering steps might be necessary (see Note 17).
7.
Otherwise, proceed to mapping the reads to the reference genome.

3.2 Mapping

Before we can align the reads to the reference genome, we have to build a genome index using STAR [11]. Then samples will be mapped and gene expression values will be calculated and expressed as a raw gene count number.

1.
In terminal create a folder for the reference genome files (see Note 18).
2.
Create the genome index (see Note 19):
$ STAR --runMode genomeGenerate --runThreadN 4 --genomeDir genome_Boleac --genomeFastaFiles Boleac_SBv3_genome.fasta --sjdbGTFfile Boleac_transcripts.gtf
3.
Align the reads (see Notes 20 and 21):
$ STAR --runMode alignReads --genomeDir genome --outFileNamePrefix BL_regeneration/reg_0 --sjdbGTFfile Boleac_transcripts_v5.gtf --quantMode GeneCounts --runThreadN 4 --outSAMtype BAM SortedByCoordinate --readFilesIn reg_0_1_val_1.fq reg_0_2_val_2.fq
4.
Repeat for other samples (see Note 22).

3.3 Normalizing Count Data with DESeq2

1.
Start R software environment [12].
2.
Set the working directory to the path where your STAR output files are contained.
> setwd(“path to your local folder”)
3.
Load DESeq2 by typing:
> library(DESeq2)
4.
Concatenate count files by typing (see Notes 23 and 24):
> ff <- list.files( path = "./", pattern = "*ReadsPerGene.out.tab$", full.names = TRUE ) > counts.files <- lapply( ff, read.table, skip = 4 ) > counts <- as.data.frame(sapply( counts.files, function(x) x[ , 4 ])) > ff <- gsub( "[.]ReadsPerGene[.]out[.]tab", "", ff ) > ff <- gsub( "[.]/counts/", "", ff ) > colnames(counts) <- c("embr","wcol","reg_0","reg_1","reg_2","reg_3","reg_4","reg_5") > row.names(counts) <- counts.files[[1]]$V1
5.
DESeq2 requires a sample information sheet (see Note 25) that can be created using a spreadsheet software such as MS Excel and imported into R by typing:
coldata <- read.csv(file="coldata.csv", row.names=1)
6.
Generate a dds object which normalizes all count data across samples to library size by typing:
> dds <- DESeqDataSetFromMatrix(countData = counts,colData = coldata,design = ~ stage) > dds <- DESeq(dds)
7.
Create a sample distance matrix type (see Note 26):
> res <- results(dds) > vsd <- vst(dds, blind=FALSE) > sampleDists <- dist(t(assay(vsd))) > data <- plotPCA(vsd, returnData=TRUE) > percentVar <- round(100 * attr(data, "percentVar"))
8.
Load ggplot2 by typing:
> library(ggplot2)
9.
Plot the principal component analysis (PCA) of your data (Fig. 2) with:
> p<-ggplot(data, aes(PC1, PC2)) + geom_text(label=rownames(data), nudge_x=0.25, nudge_y=0.25,check_overlap=T) + xlab(paste0("PC1: ",percentVar[1],"% variance")) + ylab(paste0("PC2: ",percentVar[2],"% variance")) + theme_light() > p

3.4 Differential Gene Expression Analysis Using DESeq2

To extract differentially expressed genes (DEGs) between conditions (stages 1 and 2 and 0, arbitrarily chosen), we define the contrasts we are interested in with the results() function from DESeq2 (see Note 27).

1.
To get differentially expressed genes between stages 1 and 0, we type:
> res1_0 <- results(dds, contrast=c("stage","1","0"))
2.
We then using the subset function to generate result tables of differentially up- and down-regulated at an adjusted p-value also described as false discovery rate (FDR) of 5%.
> res_1_0_sig<-subset(res1_0, padj < 0.05) > res_1_0_sig_up<-subset(res_1_0_sig, log2FoldChange > 0.5) > res_1_0_sig_up_order<-res_1_0_sig_up[order(res_1_0_sig_up$padj),] > write.csv(res_1_0_sig_up_order, file=“res_1_0_sig_up_order.csv”) > res_1_0_sig_down<-subset(res_1_0_sig, log2FoldChange < 0.5) > res_1_0_sig_down_order<-res_1_0_sig_down[order(res_1_0_sig_down$padj),] > write.csv(res_1_0_sig_down_order, file=“res_1_0_sig_down_order.csv”)
3.
To view the results table (Table 2):
> head(res_1_0_sig_up_order)
4.
Perform the same process to generate results for comparing stages 2 and 0.

Table 2 List of DEGs ordered by increasing significance

Full size table

> res2_0 <- results(dds, contrast=c("stage","2","0")) > res_2_0_sig<-subset(res2_0, padj < 0.05) > res_2_0_sig_up<-subset(res_2_0_sig, log2FoldChange > 0.5) > res_2_0_sig_up_order<-res_2_0_sig_up[order(res_2_0_sig_up$padj),] > write.csv(res_2_0_sig_up_order, file=“res_2_0_sig_up_order.csv”) > res_2_0_sig_down<-subset(res_2_0_sig, log2FoldChange < 0.5) > res_2_0_sig_down_order<-res_2_0_sig_down[order(res_2_0_sig_down$padj),] > write.csv(res_2_0_sig_down_order, file=“res_2_0_sig_down_order.csv”)

3.5 Gene Ontology Enrichment Analysis

To get more insight into the biological significance of DE genes we obtained in the above step, we will assign gene ontologies and perform Gene Ontology (GO) enrichment analysis using goatools [13].

1.
In R create a list of gene IDs from the result file:
> gene_ids_up<-rownames(res_1_0_sig_up_order) > write.table(gene_ids_up,file="gene_ids_up.csv", row.names=FALSE, quote = FALSE, col.names=FALSE)
2.
Repeat this step for all the result output files created.
3.
Define our background gene list:
> res1_0_universe<-subset(res1_0, baseMean > 10) > gene_ids_univ<-rownames(res1_0_universe) > write.table(gene_ids_univ,file="gene_ids_univ.txt", row.names=FALSE, quote = FALSE, col.names=FALSE)
4.
Modify the GAF file so the identifiers match the gene IDs in our lists:
> gaf<-read.table(file="Boleac_slimTunicate.gaf",skip=5,sep="\t") > gaf_red<-gaf_new[ ,c("V6","V5")] > gaf_red_col<-aggregate(V5 ~V6, gaf_red, paste, collapse=";") > write.table(gaf_red_col,file="ids2go_BL.txt", row.names=FALSE, quote = FALSE, col.names=FALSE)
5.
Find enrichments with a python script run in terminal (see Note 28).
$ find_enrichment.py gene_ids_up.txt gene_ids_univ.txt ids2go_BL.txt --pval=0.05 --method=fdr_bh --pval_field=fdr_bh --outfile=results_id2gos_1_0_up.xlsx
6.
Plot top 15 enriched terms of each category (see Notes 29 and 30, Fig. 3).
> library(“readxl”) > go<-readxl::read_excel("results_id2gos_1_0_up.xlsx") > go_MF<-subset(go, NS=="MF") > go_MF_15<-go_MF[1:15, ] > b<-ggplot(go_MF_15, aes(x=reorder(name, -p_fdr_bh),y=study_count,color=p_fdr_bh,size=study_count))+geom_point() +coord_flip() > b + theme_minimal() + labs(x="Molecular Function",y="Gene count",color="p.adjust", size="Gene count")

4 Notes

1.
Although this protocol focuses largely on the analysis of RNA-seq data, a few important considerations about the study design are mentioned here. How many reads per sample and how many biological replicates should an RNA-seq experiment have? As a general rule, biological replicates should be prioritized over-read depth. To better estimate the number of reads needed, Illumina offers a read coverage calculator called Scotty (https://support.illumina.com/downloads/sequencing_coverage_calculator.html). This web-based tool is designed to use user data from a small trial experiment consisting of at least two biological replicates per condition to estimate the number of replicates and sequencing depth needed. Alternatively, publicly available datasets closely resembling the user’s data can be used. The higher the expected biological variation the more replicates and the higher the sequencing depth needed to identify DEGs.
2.
Due to the high sensitivity of RNA-seq, so-called batch-effects created during sample or library preparation can have negative implications on identifying DEGs stemming from “true” biological variation. To control for these effects, processing (e.g., RNA-extraction ) of samples and library prep should ideally happen at the same time.
3.
This experiment used an RNA-seq library preparation method generating strand-specific paired-end reads which are recommended when no reference genome is available. Strand-specific information is also useful if anti-sense transcripts such as long-non-coding RNAs are of interest. If the goal is to simply measure differential gene expression between conditions, then an un-stranded library can be sufficient although the price difference in stranded vs. non-stranded kits almost warrants a stranded kit.
4.
RNA quality is very important and should be assessed using Bioanlayzer (Agilent Technologies) which gives an RNA integrity number (RIN) based on the ratio of the major ribosomal RNA fractions. Especially samples that are widely different from the average RIN of other samples should be reconsidered. Most sequencing providers will recommend a certain threshold above which they will process a sample (>7.0). Ascidians biosynthesize cellulose which is incorporated into their protective tunic imposes the use of relatively harsh methods to extract RNA, which can lead to degradation of the RIN in some samples. We have had success using plant RNA extraction kits to circumvent this issue.
5.
If there is no genomic sequence available to map RNA-seq reads, programs such as Trinity [14] or SOAPdenovo [15] can be used to assemble and quantify transcripts.
6.
Download the genome sequence (FASTA) and annotation files (GTF and GAF) for B. leachii can be obtained from Aniseed https://www.aniseed.cnrs.fr/aniseed/download/download_data. This can be done directly in terminal using the commands:
$ wget https://www.aniseed.cnrs.fr/aniseed/download/?file=data%2Fboleac%2FBoleac_SBv3_genome_gff3_fasta.zip $ wget https://www.aniseed.cnrs.fr/aniseed/download/?file=data%2Fboleac%2FBoleac.gaf.gz
7.
Download the gene annotation and gene ontology files in terminal by typing:
$ wget https://www.aniseed.cnrs.fr/aniseed/download/?file=data%2Fboleac%2FBoleac_slimTunicate.gaf.gz -O Boleac_slimTunicate.gaf.gz $ gunzip Boleac_slimTunicate.gaf.gz $ wget http://purl.obolibrary.org/obo/go/go-basic.obo -O go-basic.obo
8.
If there is no genome annotation available, it is possible to perform automated genome annotation. However, this analysis lies outside of the scope of this protocol . For a generic pipeline for performing such an analysis, see Blanchoud et al. [8].
9.
A computer with more than 4 GB RAM is required for analysis. The most memory-intensive step is the alignment of the reads to the genome which scales with the genome size of the organism. This step could be also performed on a high-capacity server more suitable for this task.
10.
We recommend that you familiarize yourself with basic command line use (e.g., https://ubuntu.com/tutorials/command-line-for-beginners#1-overview) to create, rename, and navigate folders and files.
11.
To also make RNA-seq analysis more reproducible, we recommend using a software manager called Bioconda [16]. This protocol will be using software available in this environment, but similar software could be installed from other sources too.
12.
To install Miniconda (a version of Anaconda which supports Bioconda) on a computer, run the following commands:
$ curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-${ARCH}.sh $ sh Miniconda3-latest-${ARCH}.sh

where the environment variable ARCH should be set to the type of your local operating system (e.g., Linux-ppc64le, Linux-x86_64, MacOSX-x86_64, Windows-x86). This installation includes software tools used in this protocol such as STAR [11], statistical environment R [12], and DESeq2 [10]. For a full list of installed packages run:
$ conda list $ conda install -c bioconda bioconductor-deseq2
13.
To install SRA-Tools run:
$ conda install -c bioconda sra-tool

Detailed instructions and information on setting up downloads from SRA archives can be found on the website https://ncbi.github.io/sra-tools/
14.
Install goatools [13] by typing the code below in terminal.
$ conda install -c bioconda goatools
15.
Go to R project website (http://www.r-project.org/), download, and install an up-to-date R version [12].
16.
Individual files from the archive SRP064769 can be downloaded using the fastq-dump command:
$ fastq-dump --split-files SRR2729873

This will create two FASTQ files for each sample, rename all files with their short identifier according to Table 3, e.g., SRR2729873 to embr before proceeding.
17.
A detailed explanation of FastQC and examples of good/bad datasets and what to look for in the reports created can be found here on the website: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
18.
To create the directory that will contain the genome index, use the command:
$ mkdir genome_Boleac
19.
The option “--genomeDir” specifies where the generated indexes are stored. The path to the FASTA file is specified with “–genomeFastaFiles” and the GTF annotation file with “–sjdbGTFfile.” In this case, the annotation file is supplied in GFF format “–sjdbGTFtagExonParentTranscript,” Parent option can be used instead. “–runThreadN” option defines the number of threads used for parallelization.
20.
This commands give paths to genome indexes (--genomeDir), the GTF file (--sjdbGTFfile), the trimmed FASTQ files (– readFilesIn), and will quantify transcripts(--quantMode GeneCounts). This step is computationally intensive and can take several hours.
21.
The option—quantMode GeneCounts—produces a file named ReadsPerGene.out.tab with four columns specified in Table 3: geneID; 2: counts for unstranded RNA; 3: counts for first strand; 4: counts for second strand. Additionally, the number of unmapped and multi-mapping reads are in the header. If the percentage of unmapped reads is higher than 25% of total reads, the sample is problematic and should be reconsidered.
22.
This command detailed in step 3 of Subheading 3.2 is only for sample reg_0. For other samples, the “–outFileNamePrefix” as well as “–readFilesIn” part needs to be altered. Again, specify a directory for the output files (see Note 20), and give paths to genome indexes (--genomeDir) as well as GTF file (--sjdbGTFfile).
23.
For subsequent analysis, using DESeq2 read counts must be selected based on the library preparation protocol was used, unstranded or stranded. To identify the library preparation method and sequencing strategy of the sample, the number of reads for each strand and in total can be counted using the awk function:
$ grep -v "N_" reg_0ReadsPerGene.out.tab | awk '{unst+=$2;forw+=$3;rev+=$4}END{print\ unst,forw,rev}'

which results in “$ 8842238 508841 8718480.” This is interpreted as 8,718,480 reads map to the reverse or second strand and only 508,841 reads map to the first strand, which indicates a stranded library was made and the reverse strand was sequenced.
24.
Depending on the library preparation method, it is crucial to select the right column in this step. In our case, second-strand synthesis was used, so column 4 was selected for further analysis.

> counts <- as.data.frame( sapply( counts.files, function(x) x[ , 4 ))
25.
To define which condition or time-points samples are associated, we create a coldata object which can be made in a text-editor or Microsoft Excel and saved in a CSV file format in the working directory specified earlier. In this example, the coldata object has a unique identifier for each sample (column 1) and one column specifying conditional information (Table 4). The unique identifier of samples must be identical in the coldata object as well as in the counts object. To check if that is the case type:
> all(rownames(coldata) == colnames(counts))

In this case, the counts object column names can be replaced with the sample identifiers in coldata. Please note that the condition in this chosen arbitrary is for demonstration purposes only.
26.
A principal component plot (PCA) is a useful tool to assess if and how individual samples cluster and if the variation in the expression of the most heterogeneously expressed genes can be explained by the nature of the sample (e.g., samples clustering along developmental time).

In this case, we can see (Fig. 2) a strong separation of the embryonic stage (explaining most of the variation) and separation of regeneration stages and the whole colony as the second-largest component of variation.
27.
It is not recommended to perform differential gene expression analysis on data sets without biological replicates. In our case and for demonstration purposes only, we arbitrarily pooled samples from reg_1 and reg_2 (Stage 1) and reg_3 and reg_4 (Stage 2) to compare it to reg_0 (Stage 0) in the coldata object (Table 4) when running DESeq2 in step 6 of Subheading 3.3. To get differentially expressed genes between these conditions, we need to specify which comparison we want to make.
28.
The results of the goatools enrichment analysis contains GO terms which are found significantly (Benjamini/Hochberg: fdr_bh at a 5% false discovery rate) overrepresented in the differentially expressed genes compared to the GO terms of all expressed genes in the samples. The resulting file contains terms for three major categories such as biological process (BP), cellular component (CC), and molecular function (MF).
29.
To import xls files back into R, install the library “readxl”:
> install.packages("readxl")
30.
To plot the other categories or more terms, you can change the subset and name the output accordingly.

Table 3 An example of a text file showing the first few lines of gene raw count data

Full size table

Table 4 Coldata object with sample information

Full size table

References

Zondag LE, Rutherford K, Gemmell NJ, Wilson MJ (2016) Uncovering the pathways underlying whole body regeneration in a chordate model, Botrylloides leachi using de novo transcriptome analysis. BMC Genomics 17:114
Article Google Scholar
Zondag L, Clarke RM, Wilson MJ (2019) Histone deacetylase activity is required for Botrylloides leachii whole-body regeneration. J Exp Biol 222:jeb203620
Article Google Scholar
Ballarin L, Zaniolo G (2007) Colony specificity in Botrylloides leachi. II. Cellular aspects of the non-fusion reaction. Invert Surviv J 4:38–44
Google Scholar
Blanchoud S, Rinkevich B, Wilson MJ (2018) Whole-body regeneration in the colonial tunicate Botrylloides leachii. Results Probl Cell Differ 65:337–355
Article CAS Google Scholar
Paz G, Rinkevich B (2002) Morphological consequences for multi-partner chimerism in Botrylloides, a colonial urochordate. Dev Comp Immunol 26:615–622
Article Google Scholar
Burighel P, Caicci F, Zaniolo G, Gasparini F, Degasperi V, Manni L (2008) Does hair cell differentiation predate the vertebrate appearance? Brain Res Bull 75:331–334
Article Google Scholar
Hirose E, Saito Y, Watanabe H (1997) Subcuticular rejection: an advanced mode of the allogeneic rejection in the compound ascidians Botrylloides simodensis and B. fuscus. Biol Bull 192:53–61
Article CAS Google Scholar
Blanchoud S, Rutherford K, Zondag L, Gemmell NJ, De Wilson MJ (2018) Novo draft assembly of the Botrylloides leachii genome provides further insight into tunicate evolution. Sci Rep 8:5518
Article Google Scholar
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628
Article CAS Google Scholar
Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550
Article Google Scholar
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
Article CAS Google Scholar
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for statistical computing. Vienna, Austria. https://www.R-project.org/
Klopfenstein DV, Zhang L, Pedersen BS, Ramirez F, Warwick Vesztrocy A, Naldi A, Mungall CJ, Yunes JM, Botvinnik O, Weigel M, Dampier W, Dessimoz C, Flick P, Tang H (2018) GOATOOLS: a Python library for Gene Ontology analyses. Sci Rep 8:10872
Article CAS Google Scholar
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, Chen ZH, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
Article CAS Google Scholar
Xie YL, Wu GX, Tang JB, Luo RB, Patterson J, Liu SL, Huang WH, He GZ, Gu SC, Li SK, Zhou X, Lam TW, Li YR, Xu X, Wong GKS, Wang J (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30:1660–1666
Article CAS Google Scholar
Gruning B, Dale R, Sjodin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Koster J, Bioconda T (2018) Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods 15:475–476
Article Google Scholar

Download references

Acknowledgments

This work is supported by a Royal Society of New Zealand grant (UOO1713) to M.J.W. We would like to thank Berivan Temiz for critical reading of the manuscript.

Author information

Authors and Affiliations

Developmental Genomics Laboratory, Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
Michael Meier & Megan J. Wilson

Authors

Michael Meier
View author publications
You can also search for this author in PubMed Google Scholar
Megan J. Wilson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Megan J. Wilson .

Editor information

Editors and Affiliations

Department of Biology, University of Fribourg, Fribourg, Fribourg, Switzerland
Simon Blanchoud
Dept Génétique et Evolution, Université de Genève, Genève, Geneva, Switzerland
Brigitte Galliot

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this protocol

Cite this protocol

Meier, M., Wilson, M.J. (2022). Using RNA-Seq for Transcriptome Profiling of Botrylloides sp. Regeneration. In: Blanchoud, S., Galliot, B. (eds) Whole-Body Regeneration. Methods in Molecular Biology, vol 2450. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2172-1_32

Download citation

DOI: https://doi.org/10.1007/978-1-0716-2172-1_32
Published: 01 April 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2171-4
Online ISBN: 978-1-0716-2172-1
eBook Packages: Springer Protocols

Publish with us

Policies and ethics

Using RNA-Seq for Transcriptome Profiling of Botrylloides sp. Regeneration

Abstract

Similar content being viewed by others

Bioinformatic Methods for the Analysis of High-Throughput RNA Sequencing in Arbuscular Mycorrhizal Fungi

RNA-Seq for Transcriptome Analysis in Non-model Plants

RNA-Seq Data Analysis Pipeline for Plants: Transcriptome Assembly, Alignment, and Differential Expression Analysis

Key words

1 Introduction

2 Materials